DeepSeek V4: 1.6T total, 49B active
- DeepSeek released its V4 preview on April 24, adding DeepSeek-V4-Pro and DeepSeek-V4-Flash across open weights, chat apps, and its API.
- The flagship Pro model uses 1.6 trillion total parameters but activates 49 billion per token, with a one-million-token context window.
- The release also starts an API migration away from legacy model names by July 24, 2026. (api-docs.deepseek.com)
Large language models predict the next token; DeepSeek’s new V4 preview is built to keep doing that across documents as long as one million tokens. (openlm.ai) (fe-static.deepseek.com)

DeepSeek published the V4 preview on April 24, 2026 and split it into two Mixture-of-Experts models: V4-Pro and V4-Flash. In a Mixture-of-Experts system, only part of the model is used for each token instead of all parameters firing at once. (fe-static.deepseek.com) (openlm.ai)

The flagship DeepSeek-V4-Pro has 1.6 trillion total parameters with 49 billion activated per token, while V4-Flash has about 285 billion total with 13 billion activated. Both are listed with a one-million-token context length. (fe-static.deepseek.com) (openlm.ai)

Context length is the amount of text a model can keep in working memory during one session. DeepSeek says V4-Pro cuts single-token inference compute at one million tokens to 27% of DeepSeek-V3.2 and reduces key-value cache use to 10% of V3.2. (openlm.ai)

DeepSeek says it got there with a hybrid attention design that mixes two compression methods, Compressed Sparse Attention and Heavily Compressed Attention. The model card also lists manifold-constrained hyper-connections and the Muon optimizer as core changes in the new series. (fe-static.deepseek.com) (openlm.ai)

Training is the stage where a model absorbs patterns from data before users ever touch it. DeepSeek says V4 was pre-trained on more than 32 trillion tokens, then post-trained in two steps: separate expert tuning first, on-policy distillation into one unified model second. (openlm.ai) (huggingface.co)

The company is not only posting weights; it is also changing how developers call the models. DeepSeek’s API changelog says `deepseek-v4-pro` and `deepseek-v4-flash` are live now, and the legacy names `deepseek-chat` and `deepseek-reasoner` will be discontinued on July 24, 2026. (api-docs.deepseek.com)

Pricing changed with the launch as well.
DeepSeek lists V4-Flash at $0.14 per 1 million input tokens on cache miss and V4-Pro at $0.435 during a temporary 75% discount that runs until May 5, 2026, 15:59 Coordinated Universal Time. (api-docs.deepseek.com)

DeepSeek is pitching V4 as an open-weight model family for coding, reasoning, and agent tasks that can work over much longer inputs without the usual memory costs. The practical test now is whether developers rebuild around the new names, new pricing, and that one-million-token promise. (openlm.ai) (api-docs.deepseek.com)
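
For developers weighing that rebuild, the migration deadline and launch pricing reported above can be turned into a quick local check. The Python sketch below is illustrative only: the helper names (`model_status`, `input_cost_usd`, `pro_rate_is_discounted`) are hypothetical and not part of any DeepSeek SDK, it makes no API calls, and it assumes the quoted rates apply uniformly to every input token on a cache miss. The model names, dates, and prices are the ones cited in this article.

```python
from datetime import date, datetime, timezone

# Figures taken from the article; all helper names below are hypothetical.
LEGACY_MODELS = {"deepseek-chat", "deepseek-reasoner"}
V4_MODELS = {"deepseek-v4-pro", "deepseek-v4-flash"}
LEGACY_CUTOFF = date(2026, 7, 24)

# USD per 1M input tokens on cache miss, as quoted at launch.
# The Pro figure is the temporarily discounted rate.
INPUT_USD_PER_MILLION = {"deepseek-v4-flash": 0.14, "deepseek-v4-pro": 0.435}
PRO_DISCOUNT_ENDS = datetime(2026, 5, 5, 15, 59, tzinfo=timezone.utc)


def model_status(name: str, today: date) -> str:
    """Classify a model name relative to the announced migration."""
    if name in V4_MODELS:
        return "current"
    if name in LEGACY_MODELS:
        days_left = (LEGACY_CUTOFF - today).days
        if days_left > 0:
            return f"legacy: {days_left} days until cutoff"
        return "legacy: discontinued"
    return "unknown"


def input_cost_usd(name: str, input_tokens: int) -> float:
    """Estimate cache-miss input cost at the launch-time rates."""
    return INPUT_USD_PER_MILLION[name] * input_tokens / 1_000_000


def pro_rate_is_discounted(now: datetime) -> bool:
    """True while the quoted V4-Pro rate is still within the discount window."""
    return now < PRO_DISCOUNT_ENDS


if __name__ == "__main__":
    launch_day = date(2026, 4, 24)
    print(model_status("deepseek-chat", launch_day))
    # Cost of filling the full one-million-token context once, input only.
    for model in sorted(V4_MODELS):
        print(model, round(input_cost_usd(model, 1_000_000), 3))
```

At these rates, a single prompt that fills the whole one-million-token window would cost roughly $0.14 in input tokens on V4-Flash and $0.435 on V4-Pro while the discount lasts. Output tokens and cache hits are priced separately and are not modeled here.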