DeepSeek launches V4 model

- DeepSeek on April 24 released preview versions of DeepSeek-V4-Pro and DeepSeek-V4-Flash, its new open-weight AI models, with a 1 million-token context window and support across web, app, API, and Hugging Face.
- The flagship V4-Pro has 1.6 trillion total parameters with 49 billion active per query, while V4-Flash has 284 billion total and 13 billion active, reflecting a mixture-of-experts design.
- The launch lands as Washington warns allies about alleged Chinese AI distillation and as DeepSeek shifts V4 toward Huawei chips, sharpening the U.S.-China AI split. (cnbc.com)

DeepSeek on April 24 released preview versions of DeepSeek-V4-Pro and DeepSeek-V4-Flash, its new open-weight artificial intelligence models with a 1 million-token context window. (huggingface.co) (cnbc.com)

A token is a chunk of text, and a context window is the amount a model can keep in working memory during one session. DeepSeek says both V4 models can hold up to 1 million tokens at once, which is meant for long coding jobs, tool use, and document-heavy tasks. (huggingface.co)

The company released two versions instead of one. V4-Pro has 1.6 trillion total parameters with 49 billion active per query, while V4-Flash has 284 billion total parameters with 13 billion active per query. (huggingface.co 1) (huggingface.co 2)

That “active per query” detail comes from a mixture-of-experts design, which works like routing a problem to a smaller set of specialists instead of waking up the whole model every time. DeepSeek and Hugging Face say the setup is aimed at lowering compute costs while keeping performance high. (huggingface.co 1) (huggingface.co 2)

DeepSeek says V4 was built for long-running “agent” workloads, where a model calls tools, reads results, and keeps going through many steps. Hugging Face wrote that at 1 million tokens V4-Pro uses 27% of the single-token inference floating-point operations of DeepSeek-V3.2 and 10% of the key-value cache memory, while V4-Flash drops to 10% and 7%. (huggingface.co)

The release is also a hardware story. Reuters reported that DeepSeek adapted V4 for Huawei chips, a shift that ties the model more closely to China’s push for a domestic artificial intelligence stack as U.S. export controls squeeze access to top Nvidia processors. (money.usnews.com 1) (money.usnews.com 2)

The timing also collides with a new U.S. diplomatic campaign. Reuters reported on April 25 that the State Department ordered posts worldwide to raise concerns with foreign governments about alleged Chinese efforts, including by DeepSeek, to extract or distill United States artificial intelligence intellectual property. (cnbc.com) (money.usnews.com)

DeepSeek’s public materials show the new models are already replacing older routes. Its API documentation says `deepseek-chat` and `deepseek-reasoner` will be deprecated on July 24, 2026, and currently map to non-thinking and thinking modes of V4-Flash. (api-docs.deepseek.com) (api-docs.deepseek.com)

Pricing is part of the pitch too. DeepSeek’s API docs list a limited-time 75% discount for V4-Pro through May 5, 2026, while the company positions Flash as the lower-cost option for high-volume use. (api-docs.deepseek.com) (api-docs.deepseek.com)

For developers, the immediate question is less whether V4 exists than where it fits. DeepSeek has put the models on Hugging Face, in its own API, and in its chat products, while governments and companies now have to weigh open access, lower cost, hardware independence, and the security warnings arriving at the same time. (huggingface.co) (api-docs.deepseek.com) (cnbc.com)
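For readers who want to see what the “active per query” figures amount to, the arithmetic can be sketched in a few lines of Python. This is only an illustration built from the parameter counts reported above; the names `MODELS` and `active_share` are made up for this example, and the share of parameters in use is a rough proxy for cost, not a measurement of actual compute.

```python
# Parameter counts in billions, taken from the figures reported in this article.
# MODELS and active_share are illustrative names, not part of any DeepSeek tooling.
MODELS = {
    "DeepSeek-V4-Pro": {"total_b": 1600.0, "active_b": 49.0},
    "DeepSeek-V4-Flash": {"total_b": 284.0, "active_b": 13.0},
}


def active_share(total_b: float, active_b: float) -> float:
    """Return the fraction of a model's parameters that are active per query."""
    return active_b / total_b


if __name__ == "__main__":
    for name, params in MODELS.items():
        share = active_share(params["total_b"], params["active_b"])
        print(f"{name}: {share:.1%} of parameters active per query")
```

On these numbers, roughly 3% of V4-Pro's parameters and under 5% of V4-Flash's are in use for any one query, which is the sense in which a mixture-of-experts model avoids waking up the whole network.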
