DeepSeek Withholds AI Model From US Chipmakers

Chinese AI firm DeepSeek is reportedly withholding its newest model from U.S. chipmakers, including Nvidia. The move reflects growing geopolitical complexity and supply chain risks in the advanced AI sector. For companies that rely on third-party models, it highlights the need for flexible, hardware-agnostic system architectures.

- DeepSeek's previous reasoning models, DeepSeek-R1 and R1-Zero, were built around reinforcement learning (RL); R1-Zero was trained without any initial supervised fine-tuning, a technique that lets the model explore and learn complex problem-solving on its own. The company has also released smaller, "distilled" versions of its large models, a method for transferring the reasoning capabilities of a powerful model to a more efficient one.
- The standard industry practice DeepSeek is breaking involves sharing pre-release models with chipmakers like Nvidia and AMD so they can optimize the software's performance on their widely used hardware. By instead giving Chinese firms like Huawei weeks of advance access, DeepSeek allows them to tailor the model to their own processors.
- This move is part of an escalating rivalry over semiconductor technology, in which the U.S. has imposed export controls on advanced AI chips to China, arguing they could have military applications. In response, China has accelerated its push for domestic chip manufacturing to reduce its reliance on American technology.
- Despite U.S. export bans, a senior Trump administration official stated that DeepSeek's latest model was likely trained on Nvidia's top-tier Blackwell chips within a cluster in mainland China. Reports suggest DeepSeek may attempt to obscure the use of U.S. hardware and publicly credit Chinese chips for the training process.
- DeepSeek's models have been downloaded over 75 million times on the open-source platform Hugging Face, highlighting a broader trend of influential open-source AI development from China. This contrasts with the predominantly closed-source approach of leading American AI labs.
- The computational requirements for training and running large-scale models are substantial; for instance, training DeepSeek-V3 required over 2.6 million H800 GPU hours. Even running a distilled 70B parameter model requires significant hardware resources, making hardware-software optimization critical for performance.
- The situation underscores the value of hardware-agnostic software design, which enables AI systems to operate across different hardware platforms. For startups, this flexibility can mitigate risks associated with supply chain disruptions and dependency on specific vendors affected by geopolitical tensions.
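
To make the last two points concrete, here is a minimal Python sketch of one way a hardware-agnostic layer could be structured. It is an illustration under stated assumptions, not a description of how DeepSeek or any vendor does it: the `Backend` class, the `select_backend` and `weight_memory_gb` helpers, the backend names, and the memory figures are all hypothetical. The only hard number is the arithmetic for model weights: 70 billion parameters at 2 bytes each (16-bit precision) come to roughly 140 GB, before counting activations or caches.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Backend:
    """Hypothetical description of one accelerator target."""
    name: str
    memory_gb: float                  # usable device memory (illustrative)
    is_available: Callable[[], bool]  # availability probe supplied by the caller


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in GB (10^9 bytes).

    Activations, KV cache, and runtime overhead are deliberately ignored.
    """
    return num_params * bytes_per_param / 1e9


def select_backend(backends: Sequence[Backend], required_gb: float) -> Backend:
    """Return the first available backend with enough memory.

    The sequence order is the preference order, so swapping vendors is a
    configuration change rather than a code change.
    """
    for backend in backends:
        if backend.is_available() and backend.memory_gb >= required_gb:
            return backend
    raise RuntimeError(f"no available backend offers {required_gb:.0f} GB")


# Illustrative values only: names, sizes, and availability are assumptions.
BACKENDS = [
    Backend("vendor_a_gpu", memory_gb=160.0, is_available=lambda: False),
    Backend("vendor_b_npu", memory_gb=192.0, is_available=lambda: True),
    Backend("cpu_fallback", memory_gb=512.0, is_available=lambda: True),
]

required = weight_memory_gb(70e9, bytes_per_param=2)  # 70B parameters, 16-bit
chosen = select_backend(BACKENDS, required)
print(f"{required:.0f} GB of weights -> {chosen.name}")
```

In this sketch the preferred vendor is marked unavailable, so the selection falls through to the next backend that fits the model. Application code depends only on the `Backend` interface, which is the property that limits exposure to any single supplier.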
