Nvidia Reportedly Tapping Groq for New AI Chip
Nvidia is expected to unveil a new AI inference chip at its upcoming GTC conference that incorporates technology from Groq. The move is seen as an effort to address the hardware bottleneck in AI inference and deliver faster responses for models from providers like OpenAI.
The AI industry is pivoting from training, a cost center, to inference, the revenue-generating process of running models for live queries. While Nvidia's GPUs dominate the training market, which accounted for about 40% of its data center revenue in 2024, customers now demand more cost-effective, low-latency chips specifically for inference.
Groq's technology centers on its Language Processing Unit (LPU), an Application-Specific Integrated Circuit (ASIC) built from the ground up for inference speed. Unlike repurposed GPUs, the LPU architecture is designed solely for the linear algebra calculations that dominate AI inference workloads, enabling it to run large language models at significantly faster speeds.
The LPU's performance advantage comes from its unique design, which uses large amounts of fast SRAM directly on the chip instead of the High-Bandwidth Memory (HBM) found in GPUs. This, combined with a software-first, deterministic architecture controlled by a custom compiler, eliminates the variable latency common in GPUs and ensures predictable, ultra-low-latency responses.
The collaboration is reportedly a $20 billion licensing deal, structured as a non-exclusive technology license and a talent transfer, or "acqui-hire." The agreement brings Groq founder Jonathan Ross and the majority of his engineering team to Nvidia, a move that secures critical IP and talent without the regulatory hurdles of a formal acquisition.
This strategic move is Nvidia's defense against rising competition from hyperscalers like Google and Amazon, which are developing their own custom AI chips, and a growing number of startups focused purely on the inference market. The deal followed reports of customer frustration, including from OpenAI, over the inference speed of Nvidia's existing hardware for specific tasks like code generation.
OpenAI was reportedly in talks with startups, including Groq and Cerebras, to find faster inference alternatives that would cover about 10% of its future computing needs. Nvidia's multibillion-dollar deal with Groq effectively ended those discussions, securing OpenAI as a lead customer for the forthcoming chip and reinforcing Nvidia's partnership with the AI leader.
The formal announcement is anticipated at Nvidia's GTC 2026 conference, scheduled for March 16-19 in San Jose, California. CEO Jensen Huang's keynote is expected to detail the new chip, which analysts speculate could feature a hybrid "chiplet" design integrating Groq's LPU cores alongside traditional GPU components.