DeepSeek v4 expands to 1M context

- DeepSeek launched DeepSeek-V4 Preview on April 24, with V4-Pro and V4-Flash, open weights, API access, and a claimed 1M-token default context window. - The headline detail is practical, not just big — DeepSeek says 1M context is standard everywhere, with sparse attention cutting long-context memory costs sharply. - That matters because long context is moving from demo feature to usable workflow — for agents, giant codebases, and multi-document analysis.

DeepSeek just made a very specific bet about where AI usage is going next. Not just smarter models. Longer working memory. On April 24, it released DeepSeek-V4 Preview — two open-weight models called V4-Pro and V4-Flash — and said 1 million tokens is now the default context length across its official services. That sounds like a spec-sheet flex. But the real story is more practical. A giant context window only matters if people can afford to use it, and if the model stays useful instead of getting slow, forgetful, or wildly expensive. DeepSeek’s pitch is that V4 is the first version where million-token context is supposed to be routine, not exotic. Models: DeepSeek-V4-Pro and DeepSeek-V4-Flash. V4-Pro is the larger flagship, listed at 1.6T total parameters with 49B active parameters, while V4-Flash is the cheaper, faster model at 284B total and 13B active. Both are available through DeepSeek’s API and as open weights, and both support 1M context plus thinking and non-thinking modes. So why is 1M tokens a big deal? A million tokens is enough to hold absurdly large inputs in one shot — roughly multiple books, long code repositories, or huge piles of documents. That changes the shape of the task. Instead of chunking everything into little pieces and hoping retrieval works, you can often keep the whole mess in view at once. That is especially useful for coding agents, transcript analysis, and when context gets fragmented. ### Haven’t other models already gone big on context? Yes — but the catch is that “supports a huge context window” and “is pleasant to use at that size” are not the same thing. Long context usually explodes compute and memory costs because attention scales badly as sequences get longer. DeepSeek says V4 uses token-wise compression plus DeepSeek Sparse Attention to cut those costs enough to make 1M context practical by default. That’s the core claim here. ### Why does DeepSeek keep talking about agents? Because agents are where long context stops being a benchmark trick. An agent doing real work has to remember instructions, tool outputs, prior errors, file contents, and the state of a task that may run for a long time. DeepSeek explicitly says V4 is optimized for agent capabilities, integrated with tools in the Claude Code ecosystem, and already used for tasks as a worker, not just a chatbot. ### Is this also an ecosystem play? Very much so. DeepSeek kept the same base URL, added compatibility with OpenAI-style Chat Completions and an Anthropic-format API, and even documents how to point Anthropic’s SDK at DeepSeek’s endpoint. That lowers switching costs. Developers do not have to rebuild everything from scratch to test V4. Older endpoints are on the way out. DeepSeek says `deepseek-chat` and `deepseek-reasoner` will be fully retired on July 24, 2026, and currently route to V4-Flash modes for compatibility. So this is not a side experiment. It is the new default direction of the product line. ### So DeepSeek touted “1M” on a launch page. Plenty of AI launches chase the biggest number. The interesting part is that it paired that number with open weights, API compatibility, and an explicit claim that million-token context should be cheap and normal. If that holds up in real workloads, long context stops being a luxury feature and starts becoming basic infrastructure.

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.