Hardware trend: ASICs next
Social commentary this week traced the hardware arc from CPUs to GPUs to FPGAs and argued that ASICs such as TPUs may be next to take a dominant role, much as FPGAs once did in crypto and trading. The discussion framed that shift as a potential challenge to incumbent GPU‑centric stacks. (x.com)
A chip is a factory line for math, and the line in artificial intelligence has kept getting more specialized: central processing units handled general code, graphics processing units sped up parallel math, and custom chips are now being built for narrower jobs. Google says its Tensor Processing Unit is an application-specific integrated circuit, or ASIC, designed for neural networks. (cloud.google.com)
That is why the latest debate is not really about one viral post. It is about whether the next big shift in computing moves from graphics processing units to purpose-built silicon such as Tensor Processing Units, Amazon Web Services Trainium chips, and Microsoft’s Maia accelerator. (cloud.google.com) (aws.amazon.com) (techcommunity.microsoft.com)
Google says Cloud Tensor Processing Units are optimized for large deep-learning models with many matrix calculations, and Amazon says Trainium is built for cost-efficient training and inference across generative artificial intelligence workloads. Microsoft said in August 2024 that Maia 100 was its first custom accelerator for large-scale artificial intelligence jobs in Azure. (cloud.google.com) (aws.amazon.com) (techcommunity.microsoft.com)
The argument against graphics processing units is not that they stopped working. It is that a general-purpose accelerator can lose ground on cost or power when a cloud company controls the model, the data center, the networking, and the software well enough to design a narrower chip around one workload. (techcommunity.microsoft.com) (cloud.google.com) (aws.amazon.com)
That is the same logic that pushed field-programmable gate arrays, or reprogrammable chips, into earlier speed races. Academic and industry material still describes field-programmable gate arrays as a way to cut latency in high-frequency trading systems, where microseconds can decide whether an order wins or loses. (ieeexplore.ieee.org) (sec.gov)
The challenge for any ASIC push is software, not just silicon. Nvidia says CUDA is “the foundation for GPU computing,” with compilers, libraries, and tools that sit underneath frameworks such as PyTorch, which helps explain why graphics processing units remain the default stack for many developers. (developer.nvidia.com)
That software edge is exactly where rivals are pressing. PyTorch/XLA already documents how to move PyTorch workloads onto Tensor Processing Units, and Google said on April 7, 2026 that TorchTPU is meant to let developers migrate existing PyTorch workloads to TPUs with minimal code changes. (docs.pytorch.org) (developers.googleblog.com)
The money behind custom chips is no longer theoretical. Broadcom said on September 4, 2025 that third-quarter artificial intelligence revenue rose 63 percent from a year earlier to $5.2 billion, driven by custom AI accelerators, and it guided to $6.2 billion in fourth-quarter AI semiconductor revenue. (broadcom.com)
That does not mean graphics processing units disappear. Google still compares Tensor Processing Units against graphics processing units rather than replacing the category altogether, and Amazon markets Trainium on “price performance” against GPU-based instances, which suggests a mixed market where specialized chips take more of the work that repeats at scale. (cloud.google.com) (aws.amazon.com)
The thread running through all of this is specialization. When the workload is stable enough, the winning chip often stops being the most flexible one and starts being the one built for that exact job. (cloud.google.com) (developer.nvidia.com)
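As a quick check on the Broadcom figures quoted above, the short Python sketch below works out what they imply: the approximate year-ago revenue behind a 63 percent rise to $5.2 billion, and the quarter-over-quarter growth needed to reach the $6.2 billion guidance. The variable names are illustrative, and because the inputs are the rounded numbers from the article, the outputs are approximations rather than reported figures.

```python
# Figures as quoted in the article (Broadcom, September 4, 2025).
q3_ai_revenue_bn = 5.2   # third-quarter AI revenue, billions of dollars
yoy_growth = 0.63        # 63 percent rise from a year earlier
q4_guidance_bn = 6.2     # guided fourth-quarter AI semiconductor revenue

# current = prior * (1 + growth), so prior = current / (1 + growth)
implied_prior_year_bn = q3_ai_revenue_bn / (1 + yoy_growth)

# Sequential growth implied if the fourth-quarter guidance is met
implied_sequential_growth = q4_guidance_bn / q3_ai_revenue_bn - 1

print(f"Implied year-ago AI revenue: ${implied_prior_year_bn:.2f} billion")
print(f"Implied sequential growth: {implied_sequential_growth:.1%}")
```

On these rounded inputs, the year-ago quarter comes out near $3.19 billion and the guidance implies roughly 19 percent growth in a single quarter.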