DeepSeek V4 open-sources 1M-token
- DeepSeek on April 24 released DeepSeek-V4 as open weights on Hugging Face and as an API model family, with DeepSeek-V4-Pro and DeepSeek-V4-Flash both advertising 1 million-token context windows. - The flagship Pro checkpoint lists 1.6 trillion total parameters with 49 billion active, while DeepSeek says 1 million-token inference uses 27% of V3.2’s FLOPs and 10% of its KV cache. - The launch also resets DeepSeek’s API lineup, mapping older chat and reasoner endpoints to V4-Flash and undercutting rivals on long-context pricing. (api-docs.deepseek.com)
A language model’s context window is its short-term memory: how much text it can keep in view before it starts forgetting earlier material. DeepSeek says its new V4 models can hold 1 million tokens at once. (huggingface.co 1) (huggingface.co 2) DeepSeek released DeepSeek-V4 on April 24, publishing open weights for DeepSeek-V4-Pro and DeepSeek-V4-Flash on Hugging Face and listing both models in its API docs. The company says both support a 1 million-token context window. (huggingface.co 1) (huggingface.co 2) (api-docs.deepseek.com) The bigger model, DeepSeek-V4-Pro, is a mixture-of-experts system with 1.6 trillion total parameters and 49 billion active at a time. The smaller DeepSeek-V4-Flash lists 284 billion total parameters and 13 billion active. (huggingface.co) Mixture-of-experts models work like a company that routes each task to a smaller specialist team instead of waking up the whole organization. That design lets DeepSeek publish very large models without activating all parameters for every token. (huggingface.co) The practical problem V4 is trying to solve is cost, not just capacity. At very long context lengths, every new token has to look back over a huge history, which drives up compute and memory use. (huggingface.co) DeepSeek says V4-Pro at 1 million tokens needs 27% of the single-token inference FLOPs of DeepSeek-V3.2 and 10% of the KV cache memory. DeepSeek-V4-Flash cuts that to 10% of the FLOPs and 7% of the KV cache, according to the model card. (huggingface.co 1) (huggingface.co 2) KV cache is the model’s running notebook: the saved internal state it reuses while generating text. If that notebook gets too large, long coding sessions, tool calls, and document-heavy agent runs become expensive or impossible to keep alive on the same hardware. (huggingface.co) DeepSeek’s pricing page ties the technical pitch to a commercial one. It lists V4-Flash at $0.14 per 1 million input tokens on a cache miss and $0.28 per 1 million output tokens, while V4-Pro is temporarily discounted to $0.435 input and $0.87 output through May 5, 2026. (api-docs.deepseek.com) The release also changes DeepSeek’s product lineup. Its docs say the older `deepseek-chat` and `deepseek-reasoner` names will be deprecated on July 24, 2026, and now map to the non-thinking and thinking modes of `deepseek-v4-flash`. (api-docs.deepseek.com) (api-docs.deepseek.com) That means the V4 launch is not just a research drop. It is also an API migration, a price cut, and an open-weights release aimed at developers building coding agents and long-document workflows around a million-token memory budget. (huggingface.co) (api-docs.deepseek.com)