Inference moves: Groq, Marvell

- Nvidia acquired Groq to strengthen its inference capabilities and product lineup, according to recent industry chatter (x.com).
- Google is partnering with Marvell on custom ASICs that reportedly include a memory‑processing unit alongside TPUs (x.com).
- Those moves show vendor strategies splitting between buying inference specialists and designing custom ASIC blocks for memory‑centric tasks.

Running an AI model after it is trained is called inference, and the bottleneck is often moving data, not doing math. Two recent chip moves point at different fixes: Nvidia reached for Groq’s inference technology, while Google is discussing new inference chips with Marvell. (cnbc.com)

CNBC reported on December 24, 2025 that Nvidia was buying Groq assets for about $20 billion, which would make it Nvidia’s largest deal on record. Groq had raised $750 million at a $6.9 billion post-money valuation on September 17, 2025 and had pitched itself as a specialist in fast, low-cost inference. (cnbc.com, groq.com)

Nvidia disputed the idea that it bought the whole company. Bloomberg reported on March 20, 2026 that Nvidia said it “did not acquire Groq” and instead bought a non-exclusive license to Groq intellectual property and hired some engineering talent; TechCrunch reported the same denial on December 24, 2025. (bloomberg.com, techcrunch.com)

Groq’s pitch is straightforward: build chips and software for the serving step, when a model answers a prompt, rather than for the training step, when a model learns from huge datasets. Crunchbase describes Groq as focused on inference with its Language Processing Unit hardware, and Groq said in 2024 that it planned to deploy more than 108,000 of those chips by the end of the first quarter of 2025. (crunchbase.com, groq.com)

Google is pursuing a different route. The Information reported on April 19, 2026 that Google is in talks with Marvell to build two chips for inference: a memory processing unit designed to work alongside Google’s Tensor Processing Unit, and a new Tensor Processing Unit built specifically for running AI models. (theinformation.com)

That design choice tracks a broader problem in AI hardware. Google said its Ironwood Tensor Processing Unit is built for “high-throughput, low-latency inference,” while Marvell has been marketing custom silicon and CXL memory-pooling gear aimed at breaking the AI “memory wall,” the industry term for systems that wait on data movement instead of computation. (cloud.google.com, marvell.com)

Marvell has spent the past year telling investors that custom silicon is becoming a bigger part of AI infrastructure. On its custom application-specific integrated circuit page, Marvell says it builds chips tailored to cloud and AI customers, and in a recent company blog it said the custom silicon opportunity is becoming “more diverse” as hyperscalers ask for specialized compute and interconnect parts. (marvell.com, marvell.com)

Google has already shown it wants its own chips to cover both training and serving. In its Trillium launch, Google said the sixth-generation Tensor Processing Unit was built to train foundation models and serve them with lower latency and lower cost; by late 2025, Bloomberg reported, Google had expanded Tensor Processing Unit access to Anthropic at very large scale. (cloud.google.com, bloomberg.com)

The split is now visible in the supplier map. Nvidia has leaned on a deal tied to Groq’s inference stack, while Google is exploring a setup in which a memory-side chip works next to a Tensor Processing Unit instead of replacing it. (bloomberg.com, theinformation.com)

Both approaches start from the same constraint: inference demand is rising, and the cost of serving models is increasingly set by latency, bandwidth and memory traffic. Whether companies buy an inference specialist or add custom blocks around their own accelerators, the race is moving from general-purpose AI compute to narrower chips built for the serving step. (groq.com, marvell.com, cloud.google.com)
