Scale‑Across Networking Trend

- Hyperscalers are shifting from traditional data-center interconnects to 'scale-across' GPU backend networking.
- A Cisco architect said this shift requires roughly 14x more bandwidth, deep buffers, and 800G optics.
- The change forces data-center network designs to prioritize ultra-high bandwidth and buffering to support distributed model training and inference. (x.com)

Training a big artificial intelligence model already means splitting the work across thousands of graphics processors. Now the network linking those processors is stretching beyond one building and into “scale-across” backbones that tie multiple data centers into one computing system. (nvidia.com)

NVIDIA publicly framed that shift in August and September 2025, when it introduced Spectrum-XGS Ethernet and described “scale-across” as a third networking layer alongside “scale-up” inside racks and “scale-out” across clusters. Cisco followed on October 8, 2025 with its Silicon One P200 and 8223 platform for distributed artificial intelligence traffic between data centers. (nvidia.com) (cisco.com)

The basic problem is distance. Inside one rack, links are short and fast; across campuses or metro areas, latency rises, congestion builds, and the network has to keep synchronized training jobs from stalling while thousands of processors wait on each other. (nvidia.com) (arista.com)

That is why vendors are now talking less about ordinary data-center interconnect and more about graphics-processor back-end fabrics. Cisco said on February 10, 2026 that its latest Nexus and 8000 systems pair 102.4-terabit-per-second Silicon One G300 chips for “scale-out” with deep-buffer, 800-gigabit P200-based systems for “scale across.” (cisco.com)

A deep buffer is a larger packet holding area inside the switch or router, like a longer on-ramp that can absorb bursts before traffic merges. Cisco’s pitch is that artificial intelligence traffic moving between sites needs that shock absorber, while NVIDIA argues those same deep buffers can add latency and jitter that hurt tightly synchronized jobs. (cisco.com) (nvidia.com)

The hardware numbers show how much this market is moving up the stack. Cisco’s P200 is a 51.2-terabit-per-second deep-buffer router processor, and its February 2026 launch said the newer G300 switching silicon reaches 102.4 terabits per second, with 800-gigabit optics in current systems and 1.6-terabit “scale-out” performance in the new line. (cisco.com 1) (cisco.com 2)

Software is moving with the hardware. NVIDIA said its NeMo Framework 25.02 and Megatron-Core 0.11.0 reached 96% scaling efficiency training a 340-billion-parameter model across two data centers about 1,000 kilometers apart, using thousands of graphics processors. (nvidia.com)

The reason hyperscalers care is physical, not theoretical. NVIDIA said power, cooling, and space now cap the size of a single artificial intelligence facility, pushing operators to pool multiple sites into one “AI factory” instead of waiting for one giant campus to come online. (nvidia.com)

Ethernet vendors also see an opening to take more of the artificial intelligence back end. Arista’s September 2025 artificial intelligence networking paper said 20% to 50% of a typical training job can be spent on communication, which turns every network bottleneck into a direct hit on job completion time. (arista.com)

So the networking fight is no longer just about connecting data centers to each other. It is about whether the links between buildings can behave enough like one machine that distributed training and inference keep running at graphics-processor speed. (nvidia.com) (cisco.com)
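
To put the distance problem in rough numbers, here is a back-of-envelope sketch in Python. It is not any vendor's sizing method. It assumes light travels through fiber at about 200,000 kilometers per second (roughly two-thirds of its speed in a vacuum), ignores switching and queuing delay, and uses hypothetical function names chosen only for illustration. It estimates one-way delay and the amount of data "in flight" over one round trip on a single fully used 800-gigabit link.

```python
# Back-of-envelope sketch: one-way delay and bandwidth-delay product (BDP)
# for a long-haul link. All constants below are illustrative assumptions.

FIBER_SPEED_M_PER_S = 2.0e8  # assumed: light in glass travels at roughly 2/3 of c


def one_way_delay_s(distance_km: float) -> float:
    """Propagation delay over fiber, ignoring switching and queuing."""
    return distance_km * 1_000 / FIBER_SPEED_M_PER_S


def bdp_bytes(link_gbps: float, distance_km: float) -> float:
    """Bytes in flight over one round trip on a fully used link."""
    rtt_s = 2 * one_way_delay_s(distance_km)
    return link_gbps * 1e9 * rtt_s / 8


if __name__ == "__main__":
    # Roughly: within a building, across a metro area, between distant sites.
    for km in (0.1, 10, 1_000):
        delay_ms = one_way_delay_s(km) * 1e3
        in_flight_mb = bdp_bytes(800, km) / 1e6
        print(f"{km:>7} km: {delay_ms:.4f} ms one way, "
              f"{in_flight_mb:,.1f} MB in flight at 800G")
```

Under these assumptions, a 1,000-kilometer span adds about 5 milliseconds each way, and a single 800-gigabit link holds roughly 1 gigabyte of unacknowledged data per round trip, compared with about 100 kilobytes over 100 meters. That gap is one way to see why buffering and congestion handling become central once traffic leaves the building.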

