Chip crunch and backlogs

- Fab capacity is tightening as AI demand floods the supply chain for GPUs and accelerators.
- TSMC plans to ramp 2nm to about 120k–150k wafers per month by 2026 while NVIDIA reportedly has a Blackwell backlog near 3.6 million units.
- High utilization (around 95%) and forecasts that inference will be two-thirds of AI compute are driving long lead times.

The bottleneck in artificial intelligence is no longer just software. It is the number of advanced chips and the factory slots needed to build them. (tsmc.com)

Taiwan Semiconductor Manufacturing Co. said its 2-nanometer process entered volume production in the fourth quarter of 2025, and its N2P follow-on is scheduled for the second half of 2026. TrendForce, citing supply-chain sources, reported TSMC’s combined 2nm capacity could reach 120,000 to 130,000 wafers a month by the end of 2026. (tsmc.com, trendforce.com)

A wafer is the round slice of silicon that many chips are cut from, so wafer capacity is the industry’s basic unit of supply. TrendForce said TSMC’s Baoshan and Kaohsiung 2nm fabs were each expected to scale toward roughly 60,000 to 65,000 wafers a month by late 2026 or early 2027. (trendforce.com)

On the demand side, NVIDIA said in March 2025 that Blackwell systems were moving into full production, with thousands of Grace Blackwell GPUs already live at CoreWeave and thousands more being deployed with Oracle Cloud Infrastructure. SemiAnalysis’s March 2026 archive shows the firm separately published “The Great AI Silicon Shortage” and “The Great GPU Shortage,” underscoring how tight the market had become. (blogs.nvidia.com, blogs.nvidia.com, newsletter.semianalysis.com)

That squeeze is widening because AI demand is shifting from training models to serving them to users. Gartner said inference-focused spending is expected to reach $20.6 billion in 2026, up from $9.2 billion in 2025, and 55% of AI-optimized infrastructure-as-a-service spending will support inference workloads in 2026. (gartner.com)

NVIDIA has been pushing the same argument from the vendor side. In an April 2025 post, the company said inference is the step that turns trained models into live outputs, and linked profitability to lowering the cost per token across the full hardware and software stack. (blogs.nvidia.com)

The pressure is not limited to chip fabs. Goldman Sachs Research said in December 2025 that data-center occupancy was predicted to remain at peak levels through 2026, with demand increasing slightly faster than supply over the prior nine months. (goldmansachs.com)

That means the backlog is stacking up across several choke points at once: leading-edge wafers, advanced packaging, networking gear, power equipment and data-center space. When one of those runs short, cloud providers can have GPUs on order without having finished systems ready for customers. (goldmansachs.com, blogs.nvidia.com)

TSMC has said customer demand for 2nm is higher than it was for 3nm at the same stage, according to the TrendForce report. If that demand holds through 2026, the chip shortage story will be less about whether companies want more AI hardware and more about which part of the supply chain breaks first. (trendforce.com)
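
For readers who want to check the arithmetic behind two of the figures above, here is a minimal Python sketch. It uses only numbers quoted in the article (TrendForce's per-fab wafer range and Gartner's inference spending estimates). The function and variable names are illustrative, and treating the two fabs as reaching the same range at the same time is a simplifying assumption, since the per-fab figures are dated to late 2026 or early 2027.

```python
# Figures quoted in the article (TrendForce and Gartner).
FAB_RANGE_WPM = (60_000, 65_000)   # per-fab 2nm wafers per month (Baoshan, Kaohsiung)
NUM_FABS = 2                       # assumption: both fabs hit the same range together
INFERENCE_SPEND_2025_BN = 9.2      # USD billions
INFERENCE_SPEND_2026_BN = 20.6     # USD billions


def combined_capacity(per_fab_range, num_fabs):
    """Scale a (low, high) per-fab range across identical fabs."""
    low, high = per_fab_range
    return low * num_fabs, high * num_fabs


def growth_pct(old, new):
    """Year-over-year growth as a percentage of the old value."""
    return (new - old) / old * 100


low, high = combined_capacity(FAB_RANGE_WPM, NUM_FABS)
growth = growth_pct(INFERENCE_SPEND_2025_BN, INFERENCE_SPEND_2026_BN)

print(f"Combined 2nm capacity: {low:,} to {high:,} wafers per month")
print(f"Inference spending growth, 2025 to 2026: about {growth:.0f}%")
```

Under those assumptions, the two per-fab ranges add up to the 120,000 to 130,000 combined range TrendForce reported, and the Gartner figures imply that inference-focused spending more than doubles in one year.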
