Nvidia Plans New AI 'Inference' Chip
Nvidia is planning a new AI chip designed for rapid 'inference' processing. The move is a direct challenge to competitors and aims to capture a larger share of the booming AI market by speeding up how AI models generate responses and content.
While Nvidia's GPUs have dominated the power-intensive "training" phase of AI development, this new chip targets "inference," which is the less computationally demanding, but higher volume, process of an AI model actually generating an answer or content. As AI applications become widespread, inference is becoming a major profit center for the industry, shifting the focus to speed, efficiency, and lower operational costs.
The new processor is expected to be unveiled at Nvidia's GTC developer conference in March. It will reportedly incorporate technology from Groq, an AI chip startup known for its Language Processing Units (LPUs) that use high-speed SRAM memory embedded directly on the chip, a design intended to deliver faster response times.
The move addresses a booming market, with the AI inference sector projected to grow from over $100 billion to more than $250 billion by 2030. Some forecasts predict the market could reach nearly $350 billion by 2032, driven by the expansion of edge computing and the Internet of Things (IoT) ecosystem.
This strategic pivot comes as major customers, including OpenAI, have explored alternatives to Nvidia for inference tasks, seeking faster and more efficient hardware. Nvidia has reportedly secured OpenAI as a lead customer for the new system, a significant win as cloud giants like Amazon and Google continue to develop their own in-house AI chips (Trainium and TPUs, respectively).
Nvidia holds an estimated 80-90% of the overall AI chip market, but the inference space is more contested. Competitors like AMD, Intel, and a host of startups are specifically targeting the inference market, where factors like price-performance become more critical than the raw power needed for training.