DeepSeek‑V4 open‑sourced with 1M context
- DeepSeek released four open-weight DeepSeek‑V4 checkpoints on April 24, 2026: the Pro and Flash chat models plus their base variants. It said both flagship versions support a 1 million-token context window.
- The larger DeepSeek‑V4‑Pro has 1.6 trillion total parameters, with 49 billion active per token. DeepSeek says that at a 1 million-token context, V4‑Pro needs 27% of DeepSeek‑V3.2’s single-token inference FLOPs and 10% of its KV cache.
- The release adds MIT-licensed weights and API access, pushing long-document and agent workflows into cheaper open models. (huggingface.co) (api-docs.deepseek.com)

Large language models read text in chunks called tokens, and the context window is the amount they can keep in view at once. DeepSeek says its new V4 models can hold 1 million tokens in a single prompt. (huggingface.co) (api-docs.deepseek.com)

DeepSeek published the DeepSeek‑V4 collection on Hugging Face on April 24, 2026, with four open-weight releases: DeepSeek‑V4‑Pro, DeepSeek‑V4‑Flash, and their base model variants. The model card says both Pro and Flash support a 1 million-token context length. (huggingface.co 1) (huggingface.co 2)

The bigger model, DeepSeek‑V4‑Pro, is a mixture-of-experts system with 1.6 trillion total parameters and 49 billion activated per token. DeepSeek‑V4‑Flash is smaller, at 284 billion total parameters with 13 billion activated per token. (huggingface.co)

A context window that large is only useful if it runs cheaply enough to fit on real hardware. DeepSeek says V4 uses a hybrid attention design that cuts the cost of scanning long histories by compressing and sparsifying what the model attends to. (huggingface.co 1) (huggingface.co 2)

In its model card, DeepSeek says that at a 1 million-token context, V4‑Pro needs 27% of the single-token inference FLOPs of DeepSeek‑V3.2 and 10% of the KV cache. It says V4‑Flash cuts those figures further, to 10% of the FLOPs and 7% of the KV cache. (huggingface.co 1) (huggingface.co 2)

DeepSeek also says it pre-trained the V4 series on more than 32 trillion tokens, then used supervised fine-tuning, reinforcement learning with Group Relative Policy Optimization, and on-policy distillation in post-training. Those are the steps it says were used to combine domain-specific specialists into one model family. (huggingface.co)

The release is not just a research PDF. DeepSeek’s pricing page lists live API endpoints for deepseek‑v4‑flash and deepseek‑v4‑pro, each with a 1 million-token context and a maximum output of 384,000 tokens.
(api-docs.deepseek.com)

That same pricing page shows why developers noticed the launch. DeepSeek lists deepseek‑v4‑flash at $0.14 per 1 million input tokens on a cache miss and deepseek‑v4‑pro at $0.435 during a limited-time discount that runs until May 5, 2026, 15:59 UTC. (api-docs.deepseek.com)

Hugging Face’s write-up said the benchmark numbers are competitive rather than state of the art, but argued the bigger story is that V4 is built for long-running agent systems that keep appending tool results and conversation history. That is the workload where memory cost and cache size usually break first. (huggingface.co)

The practical pitch is simple: feed in a codebase, a long legal file, or a stack of documents without chopping them into many smaller prompts. DeepSeek’s release puts that long-context bet into open weights, not just a closed API. (huggingface.co)
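
For a rough sense of scale, the parameter counts and list prices quoted above can be turned into a back-of-the-envelope calculation. The Python sketch below is illustrative only: the `ModelSpec` class and its method names are hypothetical, the model names are written with plain ASCII hyphens, and it assumes the $0.435 Pro figure uses the same unit as the Flash price (USD per 1 million input tokens on a cache miss). It ignores output tokens and cache hits, which are priced separately.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical container for the figures quoted in the article."""
    name: str
    total_params_b: float    # total parameters, in billions
    active_params_b: float   # parameters activated per token, in billions
    input_usd_per_m: float   # USD per 1M input tokens (cache miss, assumed)

    def active_fraction(self) -> float:
        # Share of the mixture-of-experts weights used for each token.
        return self.active_params_b / self.total_params_b

    def input_cost_usd(self, prompt_tokens: int) -> float:
        # Input-side cost only; output tokens are not included.
        return prompt_tokens / TOKENS_PER_MILLION * self.input_usd_per_m


# Figures as reported in the article; the Pro price is the limited-time discount.
FLASH = ModelSpec("deepseek-v4-flash", 284, 13, 0.14)
PRO = ModelSpec("deepseek-v4-pro", 1600, 49, 0.435)

if __name__ == "__main__":
    full_context = 1_000_000  # a prompt that fills the advertised window
    for spec in (FLASH, PRO):
        print(
            f"{spec.name}: {spec.active_fraction():.1%} of parameters active, "
            f"${spec.input_cost_usd(full_context):.3f} to read {full_context:,} tokens"
        )
```

Under those assumptions, a prompt that fills the whole 1 million-token window costs $0.14 to read on Flash and $0.435 on Pro at the discounted rate, and each model activates only a few percent of its weights per token (roughly 4.6% for Flash and 3.1% for Pro).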