DigiTimes: AI data centers hit interconnect limits
- DigiTimes reported on April 26 that AI data-center operators are hitting network limits before compute limits, as east-west traffic inside large GPU clusters overwhelms the links connecting servers and switches.
- The clearest signal is the rush to faster optics and denser fabrics: Broadcom says Tomahawk 6 ships at 102.4 terabits per second, while Nvidia says fifth-generation NVLink links 72 GPUs.
- Demand is shifting toward 800G and higher optical modules as AI clusters scale out and add more hops, tightening focus on the network layer (digitimes.com).

An AI data center is a warehouse of chips, but the chips only work as a group if they can keep talking fast enough. DigiTimes reported April 26 that many large AI clusters are now hitting that communication limit first. (digitimes.com)

Those internal conversations are called east-west traffic: data moving sideways between servers, graphics processors, and switches inside the same facility rather than out to users on the internet. In AI training and inference, that traffic explodes because models are split across many accelerators that must exchange results constantly. (developer.nvidia.com)

That is why the network fabric matters. The fabric is the web of links and switches joining thousands of chips, and every extra hop adds delay, congestion risk, and power use. (broadcom.com)

Nvidia said in August 2025 that fifth-generation NVLink supports 72 GPUs with 1,800 gigabytes per second per GPU and 130 terabytes per second of aggregate bandwidth. That scale-up design is meant to keep many GPUs acting like one larger machine. (developer.nvidia.com)

Broadcom made the same bottleneck explicit when it shipped its Tomahawk 6 switch on June 3, 2025. The company said the chip delivers 102.4 terabits per second and was built because AI clusters are scaling from tens to thousands of accelerators, turning the network into a critical bottleneck. (broadcom.com)

Once copper cables and older optics run out of reach or bandwidth, operators move to optical modules, which turn electrical signals into light so data can travel farther at higher speed. That is why faster pluggable optics have become a procurement priority alongside graphics processors. (coherent.com) (investor.lumentum.com)

Lumentum said on April 1, 2025, that it began limited sampling of new 400G and 800G ZR+ L-band pluggable transceivers and made its 800G ZR+ C-band module generally available. Coherent said on March 28, 2025, that its 800G ZR/ZR+ QSFP-DD transceiver had reached general availability for data-center interconnect use. (investor.lumentum.com) (coherent.com)

Market researchers saw the same shift last year. LightCounting said in its June 2025 update that optical-transceiver sales were expected to grow 10% sequentially in the quarter, with most of that growth coming from 800G Ethernet transceivers and first sales of 1.6T modules adding a smaller lift. (lightcounting.com)

Arista, one of the main suppliers of cloud networking gear, also leaned harder into AI transport in 2025. The company said its October 29, 2025, R4 launch targeted low AI job completion time, lower power use, and high-performance routing for data centers and AI backbones. (arista.com)

The result is a change in what counts as scarce inside an AI build. For the biggest clusters in 2026, the limiting resource is often no longer just graphics processors or electrical power, but the quality, speed, and layout of the links between them. (digitimes.com) (broadcom.com)
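
As a rough sanity check on the vendor figures quoted earlier, the NVLink aggregate and the Tomahawk 6 capacity can be reproduced with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, decimal units (1 TB = 1,000 GB) are assumed, and the per-port speeds of 800G and 1.6T are borrowed from the module speeds mentioned in this article, not from any Broadcom configuration list.

```python
# Figures as quoted in the article (vendor claims, not measurements).
GPUS_PER_NVLINK_DOMAIN = 72
NVLINK_GB_PER_S_PER_GPU = 1_800
TOMAHAWK6_GBPS = 102_400  # 102.4 terabits per second, in gigabits


def nvlink_aggregate_tb_per_s(gpus: int, gb_per_s_per_gpu: int) -> float:
    """Sum per-GPU bandwidth across the domain, in terabytes per second."""
    return gpus * gb_per_s_per_gpu / 1_000


def ports_at_speed(switch_gbps: int, port_gbps: int) -> int:
    """How many ports of a given speed the switch capacity could fill.

    Plain integer division; an assumption for illustration, not a
    statement about which port layouts the chip actually offers.
    """
    return switch_gbps // port_gbps


if __name__ == "__main__":
    total = nvlink_aggregate_tb_per_s(
        GPUS_PER_NVLINK_DOMAIN, NVLINK_GB_PER_S_PER_GPU
    )
    print(f"NVLink aggregate: {total:.1f} TB/s")
    for port_gbps in (800, 1_600):
        count = ports_at_speed(TOMAHAWK6_GBPS, port_gbps)
        print(f"{port_gbps}G ports at full capacity: {count}")
```

The NVLink product comes to 129.6 terabytes per second, consistent with the roughly 130 terabytes per second Nvidia cites, and 102.4 terabits per second divides evenly into 128 links at 800G or 64 at 1.6T.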