DeepSeek uploads 1.6T open weights
- DeepSeek put DeepSeek-V4-Pro and V4-Flash on Hugging Face on April 24, making a 1.6T-parameter MIT-licensed model downloadable as open weights. (huggingface.co)
- The headline spec is 1.6T total parameters with 49B active, a 1M-token context window, and mixed FP4/FP8 releases built for deployment. (huggingface.co)
- That matters because frontier-ish long-context models are usually closed; this one is inspectable, portable, and cheap enough to pressure API incumbents. (huggingface.co)
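The gap between 1.6T total and 49B active parameters comes from Mixture-of-Experts routing. A small gating network scores the experts for each token, and only the top few actually run. The sketch below is a generic softmax top-k router in pure Python. It is not DeepSeek's actual gating function, expert count, or choice of k. The eight toy experts and the router scores are hypothetical values chosen for illustration.

```python
import math
from typing import Callable, List, Tuple


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(x: float, experts: List[Callable[[float], float]],
              gate_logits: List[float], k: int) -> Tuple[float, int]:
    """Run only the routed experts; return the mixed output and how many experts fired."""
    routes = top_k_route(gate_logits, k)
    out = sum(w * experts[i](x) for i, w in routes)
    return out, len(routes)


# Toy setup (hypothetical): 8 experts, each scaling its input by a different factor.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
# Hypothetical router scores for a single token.
gate_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

y, fired = moe_layer(1.0, experts, gate_logits, k=2)
print(f"experts fired: {fired} of {len(experts)}, output: {y:.3f}")
```

Only two of the eight toy experts do any work for this token. Applied to the headline spec, 49B active out of 1.6T total means roughly 3% of V4-Pro's weights are used per token.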
Open-weight AI models usually force a tradeoff: you get openness, but you give up scale, context length, or real-world usefulness. DeepSeek is trying to […] to Hugging Face, with V4-Pro carrying 1.6 trillion total parameters under an MIT license and a one-million-token context window. (huggingface.co)

### What actually got released?

DeepSeek released two Mixture-of-Experts checkpoints, V4-Pro and V4-Flash, plus base variants. The big one is V4-Pro at 1.6T total parameters with 49B activated. The smaller one is V4-Flash at 284B total with 13B activated. Both landed on Hugging Face as downloadable weights, not just API endpoints. (huggingface.co)

### Why does “open weights” matter here?

Because this is not just a demo or a research teaser. MIT licensing means people can download, fine-tune, serve, modify, and even […]. (huggingface.co) That is a much looser deal than the “source available” licenses a lot of big models hide behind. It also means outsiders can inspect behavior, benchmark it independently, and run it without asking a vendor for permission. (huggingface.co)

### Is it really a 1.6T model?

Yes, but it is a Mixture-of-Experts model, so it does not fire all 1.6 trillion parameters on every token. The model card says 49B parameters are activated during inference. That is the trick that makes these giant systems usable at all. Think of it less like one monolithic brain and more like a switchboard routing each token to a smaller set of specialists. (huggingface.co)

### Why is the 1M-token context a big deal?

A million […] context length turns into a cost and memory problem. Hugging Face’s writeup captures the key point: DeepSeek built V4 to make long traces less painful, cutting single-token inference FLOPs to 27% of V3.2’s at 1M context and KV-cache use to 10%. In plain English, longer sessions become more practical instead of collapsing under their own memory bill. (huggingface.co)

### So is this the best model now?

[…] more modest than the hype cycle.
The benchmark story is “competitive, but not SOTA” for the release overall, while V4-Pro-Max is pitched as the strongest open-source option rather than the strongest model full stop. That distinction matters: closed frontier systems still look ahead on some reasoning and agentic tasks. (huggingface.co)

### What about pricing?

The official DeepSeek API docs show V4-Pro on a temporary promotional price through May 31, 20[…] third-party providers are already undercutting one another hard. OpenRouter shows $0.435 per million input tokens and $0.87 per million output tokens, while DeepSeek’s own docs note a 75% discount window on V4-Pro. Basically, the model is not just open; it is arriving in a price war. (api-docs.deepseek.com)

### What’s the catch?

Open weights do not mean easy local use. […] releases rely on mixed FP4 and FP8 formats. That helps, but self-hosting the full Pro model is still a serious infrastructure project. Most developers will touch it through hosted inference first, then decide whether the control is worth the hardware pain. (huggingface.co)

### Bottom line?

The news is not just that DeepSeek uploaded a giant model. The bigger shift is that one of […] under MIT terms, with long-context engineering aimed at real agent workloads. If that holds up in practice, the pressure lands on two fronts at once: closed labs have less cover for keeping weights shut, and API vendors have less room to keep prices high. (huggingface.co)
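The pricing and self-hosting answers above both come down to back-of-envelope arithmetic. This is a minimal sketch built on several assumptions. It uses the OpenRouter rates quoted above ($0.435 and $0.87 per million input and output tokens) and ignores caching or promotional discounts. It counts raw weight storage only. The helper names `api_cost_usd` and `weight_storage_tb` are hypothetical. The storage bounds assume every weight sits at either 4 or 8 bits, which may not hold for every tensor in the actual release.

```python
# Hypothetical helpers for rough estimates. The default rates are the OpenRouter
# figures quoted in the article and will change. Storage math ignores quantization
# scale factors, KV cache, activations, and serving overhead.

def api_cost_usd(input_tokens: int, output_tokens: int,
                 usd_per_m_input: float = 0.435,
                 usd_per_m_output: float = 0.87) -> float:
    """Cost of one request at per-million-token rates (no caching or promo discounts)."""
    return (input_tokens / 1_000_000) * usd_per_m_input + (output_tokens / 1_000_000) * usd_per_m_output


def weight_storage_tb(total_params: float, bits_per_param: float) -> float:
    """Raw weight bytes in terabytes (1 TB = 1e12 bytes) at a uniform bit width."""
    return total_params * bits_per_param / 8 / 1e12


V4_PRO_TOTAL_PARAMS = 1.6e12

# Assuming each weight is stored at either 4 or 8 bits, mixed FP4/FP8 storage
# falls between these two uniform-precision bounds.
fp8_tb = weight_storage_tb(V4_PRO_TOTAL_PARAMS, 8)
fp4_tb = weight_storage_tb(V4_PRO_TOTAL_PARAMS, 4)
print(f"V4-Pro raw weights: {fp4_tb:.1f} to {fp8_tb:.1f} TB")

# One hypothetical request that fills the 1M-token context and generates 10k tokens.
print(f"single request: ${api_cost_usd(1_000_000, 10_000):.4f}")
```

Under these assumptions, the full V4-Pro weights alone need roughly 0.8 to 1.6 TB of storage before any KV cache or serving overhead. A single API request that fills the entire 1M-token window costs well under a dollar at the quoted rates. That gap is the practical case for starting with hosted inference.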