DeepSeek Reportedly Trained AI on Banned Nvidia Chips

A Reuters exclusive reports that Chinese AI firm DeepSeek trained its models on Nvidia's top-tier chips, despite a U.S. export ban, by leveraging local infrastructure and supply chains. The report features in a broader analysis of China's AI market, which notes that billions in cash are fueling intense model competition among established firms like Alibaba and ByteDance and new challengers like DeepSeek.

- The specific hardware in question is reportedly Nvidia's top-tier "Blackwell" AI chip, which a senior Trump administration official claims is being used in a DeepSeek data center located in Inner Mongolia. U.S. policy explicitly prohibits the shipment of these advanced chips to China, a rule that has been progressively tightened since initial controls were introduced in October 2022.
- DeepSeek was founded in mid-2023 by Liang Wenfeng, who previously co-founded the quantitative hedge fund High-Flyer. The company is fully funded by High-Flyer, which managed assets around $8 billion as of 2023, allowing DeepSeek to operate without seeking external venture capital.
- The company's models, such as DeepSeek V3 and DeepSeek R1, leverage a Mixture-of-Experts (MoE) architecture to enhance computational efficiency. DeepSeek-R1 models specifically focus on advanced reasoning capabilities, developed using reinforcement learning to improve performance on complex math and coding tasks.
- The alleged use of banned chips may involve "distillation," a technique where a new model learns from an older, more powerful one. A U.S. official suggested the model trained on Blackwell chips likely relied on distilling knowledge from leading U.S. models from firms like OpenAI, Google, and Anthropic.
- For CTOs building multi-agent systems, open-source frameworks like Microsoft's AutoGen and CrewAI are gaining traction for orchestrating complex workflows. AutoGen focuses on creating "conversable" agents for flexible collaboration, while CrewAI uses a role-based abstraction (defining an agent's Role, Goal, and Backstory) to simplify development and reduce unpredictable behavior.
- Architecturally, multi-agent systems are moving beyond simple sequential task handoffs to more complex patterns. These include concurrent orchestration (multiple agents working on the same task simultaneously) and handoff orchestration (dynamic delegation between specialized agents), which are critical for building scalable and reliable consumer-facing agent products.
- As engineering leaders scale AI teams, a key challenge is moving beyond pilot projects, where up to 85% of AI initiatives fail due to poor governance or team readiness. The focus for CTOs is shifting from simply hiring for AI expertise to fostering "AI fluency": the ability of engineers to critically evaluate AI outputs and integrate them with human judgment.
- China's regulatory environment for AI is maturing rapidly, moving beyond high-level plans to specific rules. The "Generative AI Measures," effective since August 2023, govern all public-facing generative AI services. Additionally, the national standards body (TC260) has issued guidelines on the security of training data and models, requiring algorithm filing with the Cyberspace Administration of China (CAC) for services that can influence public opinion.
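
The two orchestration patterns named in the list above, concurrent and handoff, can be illustrated with a minimal Python sketch. It uses only the standard-library `asyncio` module rather than AutoGen or CrewAI, and every agent function, the keyword-based `route` rule, and the sample tasks are hypothetical stand-ins for real model calls, not any framework's actual API.

```python
import asyncio
from typing import Awaitable, Callable, List

# An "agent" here is just an async function from a task string to a result string.
# All agents below are hypothetical placeholders for real model-backed agents.
Agent = Callable[[str], Awaitable[str]]


async def summarizer(task: str) -> str:
    await asyncio.sleep(0)  # stands in for a model call
    return f"summary of {task}"


async def critic(task: str) -> str:
    await asyncio.sleep(0)  # stands in for a model call
    return f"critique of {task}"


async def billing_agent(task: str) -> str:
    return f"billing handled: {task}"


async def support_agent(task: str) -> str:
    return f"support handled: {task}"


async def run_concurrent(task: str, agents: List[Agent]) -> List[str]:
    """Concurrent orchestration: every agent works on the same task at once."""
    # asyncio.gather returns results in the same order as the agents passed in.
    return list(await asyncio.gather(*(agent(task) for agent in agents)))


def route(task: str) -> Agent:
    """Toy router: a keyword rule standing in for a model-driven delegation decision."""
    return billing_agent if "invoice" in task.lower() else support_agent


async def run_handoff(task: str) -> str:
    """Handoff orchestration: pick a specialist dynamically, then delegate to it."""
    specialist = route(task)
    return await specialist(task)


async def main() -> None:
    print(await run_concurrent("Q3 report", [summarizer, critic]))
    print(await run_handoff("Invoice 12 is wrong"))


if __name__ == "__main__":
    asyncio.run(main())
```

In a production system the router would more plausibly be a model or a policy layer than a keyword check, and each agent would carry its own tools and state; the sketch only shows how the two control flows differ.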
