Groq Taps Samsung for 4nm AI Inference Chips

AI chip startup Groq has partnered with Samsung Foundry to produce its next-generation LPU inference chips using a 4nm process. The deal aims to optimize production for Groq's hardware, which is designed for ultra-fast performance on AI workloads.

The partnership places Groq's next-generation Language Processing Units (LPUs) in Samsung's new $17 billion fab in Taylor, Texas. For Samsung's foundry division, which holds about 13% of the global market compared with TSMC's dominant 62%, the deal is a strategic win: Groq is the first publicly named client for its advanced U.S.-based 4nm process node. The deal is bolstered by a $6.4 billion grant to Samsung under the CHIPS and Science Act, which is aimed at strengthening the domestic U.S. semiconductor supply chain.

Groq's LPU architecture differs fundamentally from that of GPUs and is designed exclusively for AI inference. It uses a single-core, deterministic "assembly line" approach with on-chip SRAM, which provides significantly higher memory bandwidth: up to 80 TB/s, compared with roughly 8 TB/s for GPU HBM. This design avoids the bottlenecks of traditional GPUs, enabling higher tokens-per-second throughput and lower latency, both crucial for real-time AI applications.

Performance benchmarks highlight this architectural advantage. On models like Llama 2 (70B parameters), Groq's LPU can deliver around 300 tokens per second, while comparable GPU-based systems typically achieve 10-30 tokens per second. This leap in inference speed is a direct challenge to Nvidia's dominance in an AI hardware market that is increasingly shifting focus from training to the operational deployment of models.

The competitive landscape was dramatically reshaped in late 2025 by a non-exclusive licensing deal between Nvidia and Groq, valued at approximately $20 billion. This was not a traditional acquisition but a strategic "acqui-hire" that saw Groq's founder, Jonathan Ross (a key designer of Google's TPU), and a significant portion of the engineering team move to Nvidia. Nvidia gains access to Groq's low-latency IP to integrate into its own architecture, effectively neutralizing a direct competitor while absorbing top-tier talent.

This move occurs within a tightening regulatory framework. U.S. export controls, first implemented in October 2022 and expanded since, aim to limit China's access to advanced AI chips and semiconductor manufacturing equipment. These rules now affect nearly all global exports of advanced AI accelerators, requiring government approval and positioning the U.S. as a gatekeeper for AI infrastructure. This directly affects supply chain strategies and international sales for all major hardware producers.

For Apple's internal roadmap, Groq's architecture and the subsequent talent migration to Nvidia signal a critical trend: specialized, power-efficient hardware is key to unlocking next-generation, on-device AI experiences. The LPU's design principles, focused on eliminating memory bottlenecks and ensuring deterministic performance, offer a blueprint for the kind of yield optimization and performance gains that future iOS and macOS engineering challenges will demand, especially as AI integration deepens.
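To put the throughput figures quoted above in concrete terms, the short Python sketch below converts tokens per second into per-token latency and total response time. The 300 and 10-30 tokens-per-second values come from the article; the 500-token response length and all function names are hypothetical, chosen only for illustration.

```python
# Back-of-the-envelope arithmetic on the throughput figures quoted in the article.
# RESPONSE_TOKENS is an assumed value, not a figure from the article.

LPU_TOKENS_PER_SEC = 300.0               # quoted for Groq's LPU on Llama 2 70B
GPU_TOKENS_PER_SEC_RANGE = (10.0, 30.0)  # quoted range for comparable GPU systems
RESPONSE_TOKENS = 500                    # hypothetical response length


def ms_per_token(tokens_per_sec: float) -> float:
    """Average time to emit one token, in milliseconds."""
    return 1000.0 / tokens_per_sec


def seconds_for_response(tokens_per_sec: float, n_tokens: int) -> float:
    """Time to generate n_tokens at a steady throughput, in seconds."""
    return n_tokens / tokens_per_sec


def speedup_range(fast: float, slow_range: tuple[float, float]) -> tuple[float, float]:
    """Return (minimum, maximum) speedup of `fast` over a range of slower rates."""
    slow_low, slow_high = slow_range
    return fast / slow_high, fast / slow_low


if __name__ == "__main__":
    low, high = speedup_range(LPU_TOKENS_PER_SEC, GPU_TOKENS_PER_SEC_RANGE)
    print(f"Speedup over GPU range: {low:.0f}x to {high:.0f}x")
    print(f"LPU latency per token: {ms_per_token(LPU_TOKENS_PER_SEC):.1f} ms")
    for rate in (LPU_TOKENS_PER_SEC, *GPU_TOKENS_PER_SEC_RANGE):
        secs = seconds_for_response(rate, RESPONSE_TOKENS)
        print(f"{rate:>5.0f} tok/s -> {secs:.1f} s for {RESPONSE_TOKENS} tokens")
```

Under these assumptions, a 500-token reply would take under 2 seconds at the quoted LPU rate, versus roughly 17 to 50 seconds at the quoted GPU rates. This treats throughput as steady and ignores prompt processing and network overhead, so it is a rough comparison rather than a benchmark.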
