DeepSeek previews V4 Flash, 1/10 API price
- DeepSeek replaced its older chat and reasoner endpoints with deepseek-v4-flash compatibility routes, while previewing deepseek-v4-pro as its higher-end coding model on April 24. - DeepSeek priced V4 Flash at $0.14 input and $0.28 output per million tokens, with cache-hit input pricing cut to one-tenth. - Anthropic is also pushing parallel coding agents inside Claude Code, tightening pressure on model pricing. (api-docs.deepseek.com)
DeepSeek’s new V4 release is a pricing story as much as a model story: its Flash model now sits behind the old chat and reasoner endpoints. (api-docs.deepseek.com 1) (api-docs.deepseek.com 2) DeepSeek said on April 24 that DeepSeek-V4-Flash is a smaller, faster model whose reasoning is close to V4-Pro and that it can match V4-Pro on simpler agent tasks. (api-docs.deepseek.com) In plain terms, application programming interface pricing is what developers pay each time software sends text to a model and gets text back. DeepSeek’s posted rate for V4 Flash is $0.14 per million input tokens and $0.28 per million output tokens. (api-docs.deepseek.com) The company’s pricing page also says cache-hit input pricing across its models was cut to one-tenth of launch price effective April 26 at 12:15 Coordinated Universal Time. Reused prompt prefixes are the part developers can avoid paying full price for on repeated calls. (api-docs.deepseek.com) DeepSeek’s docs say the legacy model names `deepseek-chat` and `deepseek-reasoner` will be deprecated on July 24, 2026, and now map to non-thinking and thinking modes of V4 Flash. That means existing integrations can keep the same base URL while swapping model names later. (api-docs.deepseek.com 1) (api-docs.deepseek.com 2) V4-Pro is the more expensive tier. DeepSeek’s pricing table lists it at $1.74 per million input tokens and $3.48 per million output tokens, with a 75% discount extended until May 31, 2026 at 15:59 Coordinated Universal Time. (api-docs.deepseek.com 1) (api-docs.deepseek.com 2) The release is also about long prompts. DeepSeek says both V4 models support a one-million-token context window and up to 384,000 output tokens, which are the limits that matter for large repositories, long documents, and multi-step coding jobs. (api-docs.deepseek.com) Anthropic is pushing the same coding market from a different angle. Its Claude Code product page says engineers now manage multiple agents in parallel, and its February Opus 4.6 launch said Claude Code users can assemble agent teams to work on tasks together. (anthropic.com) (anthropic.com) Anthropic has not published an official product page for “Bugcrawl” in the sources I could verify today, but recent reporting and product coverage describe a cloud review feature that sends multiple agents to scan code changes for bugs. One report tied that preview to a Claude Code command called `/ultrareview`. (testingcatalog.com) (tessl.io) (pasqualepillitteri.it) Put together, the two companies are selling different parts of the same workflow: DeepSeek is cutting the meter on high-volume calls, and Anthropic is adding more autonomous review inside the repository. (api-docs.deepseek.com) (anthropic.com) The next test is whether developers switch production traffic, not just demos. DeepSeek has made migration cheap on paper, and Anthropic is betting teams will pay more for agents that can search larger codebases on their own. (api-docs.deepseek.com) (anthropic.com)