JetBrains releases Mellum2 12B MoE
- JetBrains on June 1 released Mellum2, an open-weight 12B-parameter mixture-of-experts language model aimed at software engineering and low-latency text-and-code workloads. (huggingface.co)
- The key number is 2.5 billion active parameters per token: JetBrains says Mellum2 delivers competitive results while running more than 2x faster than similar-sized models. (huggingface.co)
- JetBrains published base, Instruct, Thinking and SFT-stage checkpoints on Hugging Face, alongside a technical report on arXiv and Hugging Face. (huggingface.co)

JetBrains released Mellum2 on June 1, adding an open-weight 12 billion-parameter mixture-of-experts model to its developer tooling stack. The company said the model is trained from scratch on natural language and code and is designed for low-latency, high-throughput inference on software engineering tasks. (huggingface.co)

JetBrains published the release through Hugging Face under an Apache 2.0 license and linked a technical report describing the architecture, training setup and evaluations.

The new model extends JetBrains’ earlier Mellum line, which started as a code-completion model, into broader text-and-code uses including code generation, editing, debugging, tool use and conversational programming assistance, according to the technical report. (huggingface.co)

JetBrains said Mellum2 is intended for workloads such as routing, retrieval-augmented generation, summarization, sub-agents, coding features and private deployments.

### Why is the “12B A2.5B” label central to this release?

Mellum2 has 12 billion total parameters but activates 2.5 billion parameters per token, JetBrains said. The company’s model cards and technical report describe a 64-expert architecture with 8 experts activated per token, a design meant to lower inference cost while keeping model capacity higher than a dense model with similar active compute. (huggingface.co)

JetBrains said that setup lets Mellum2 run at the per-token compute of a 2.5B dense model while remaining competitive with open-weight baselines in the 4B-to-14B range. In its Hugging Face launch post, the company said the model achieves more than 2x faster inference than similar-sized models. (huggingface.co)

### What exactly did JetBrains publish?

JetBrains released multiple checkpoints rather than a single model endpoint.
The published family includes a base model, an Instruct model for direct answers, a Thinking model that emits reasoning traces, and supervised fine-tuning-stage research artifacts including Thinking SFT, according to the model cards. (huggingface.co)

The Instruct checkpoint was produced from the base model through supervised fine-tuning followed by reinforcement learning with verifiable rewards, or RLVR, on math, executable coding, tool use, instruction following, reasoning and knowledge tasks, JetBrains said. The Thinking variant follows a similar path but is tuned to produce explicit reasoning inside `<think>` blocks before the final answer. (huggingface.co)

### What does the technical report add beyond the launch post?

The technical report, published May 28 and surfaced on Hugging Face on June 1, gives the most detailed account of the training recipe. JetBrains said pre-training covered about 10.6 trillion tokens in a three-phase curriculum that shifted from broad web data toward more curated code and math data. (huggingface.co)

The same report says the base model was later extended to a 128K context window with layer-selective YaRN; the released model cards list the equivalent figure, a 131,072-token context length. The architecture also combines grouped-query attention, sliding-window attention on most layers and a multi-token prediction head that JetBrains says can support speculative decoding. (huggingface.co)

### Where does Mellum2 fit in JetBrains’ broader AI strategy?

JetBrains already markets Mellum as a family of small language models for coding tasks, available both as open-source foundation models and as production-optimized variants integrated into JetBrains IDEs. (huggingface.co) The company has described Mellum as part of its effort to build AI-assisted developer tools focused on speed, accuracy and cost efficiency.
Hugging Face’s Transformers documentation now includes Mellum model support with examples for loading `JetBrains/Mellum2-12B-A2.5B-Base`, which gives developers a direct path to test the release in standard open-source tooling. JetBrains also published vLLM serving examples in the model cards for both the Instruct and Thinking variants. (huggingface.co)

### What comes next for developers who want to evaluate it?

JetBrains pointed users on June 1 to a Hugging Face collection for the Mellum2 family and to the full technical report for benchmark details, evaluation methodology and training decisions. The released checkpoints are available now under Apache 2.0, and the model cards include deployment examples for vLLM as well as guidance on when to use the Instruct versus Thinking variants. (jetbrains.com) (huggingface.co 1) (huggingface.co 2)
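
For developers comparing the Instruct and Thinking variants, one practical difference is that Thinking output carries a reasoning trace that usually needs to be separated from the final answer before it is shown to a user or scored. The Python sketch below illustrates one way to do that. It is not taken from JetBrains' model cards: the helper names `split_thinking_output` and `ThinkingOutput` are hypothetical, the sample string is invented rather than real model output, and the code assumes the trace is wrapped in a literal `<think>` and `</think>` pair.

```python
import re
from typing import NamedTuple


class ThinkingOutput(NamedTuple):
    """Hypothetical container for a parsed Thinking-style response."""
    reasoning: str  # text found inside <think>...</think>; empty if none
    answer: str     # everything outside the reasoning block(s)


# Assumption: the reasoning trace is delimited by literal <think>...</think> tags.
_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking_output(text: str) -> ThinkingOutput:
    """Separate reasoning traces from the final answer in generated text."""
    # Collect every reasoning block, trimming surrounding whitespace.
    reasoning_parts = [part.strip() for part in _THINK_BLOCK.findall(text)]
    # Remove the blocks so only the user-facing answer remains.
    answer = _THINK_BLOCK.sub("", text).strip()
    return ThinkingOutput("\n\n".join(reasoning_parts), answer)


# Invented sample text, used only to exercise the parser.
sample = "<think>2 + 2 is 4.</think>\nThe answer is 4."
result = split_thinking_output(sample)
print("reasoning:", result.reasoning)
print("answer:", result.answer)
```

Text without a `<think>` block passes through unchanged as the answer, so the same helper could sit in front of either variant in an evaluation harness.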