Japan’s new LLMs released

Japan’s National Institute of Informatics released two open-source LLMs, LLM-jp-4 in 8B and 32B-A3B sizes, trained on about 12 trillion tokens of curated data. The institute reports that they outperform GPT-4o and Qwen3-8B on Japanese and English benchmarks. It says larger models are planned for fiscal 2026, signalling more locally optimized open models for bilingual use cases. (x.com)

Japan’s National Institute of Informatics (NII) published two open-source large language models on April 3, 2026: LLM-jp-4 8B and LLM-jp-4 32B-A3B. (nii.ac.jp) The smaller checkpoint is a model of about 8.6 billion parameters built on the Llama-2 architecture. (nii.ac.jp) The second is a mixture-of-experts (MoE) design with about 32 billion total parameters, of which roughly 3.8 billion are active per token; it has 128 experts, eight of them active at a time. (nii.ac.jp)

Both models were trained from scratch on a curated corpus assembled from publicly available web data, government and parliamentary documents, and synthetic LLM-generated content. The institute summarizes the training data as about 12 trillion tokens across pretraining and intermediate training (midtraining). (nii.ac.jp) NII says it tuned how much of each language subcorpus the models saw: the full corpus it built totals roughly 19.5 trillion tokens, of which about 10.5 trillion were used for initial pretraining and about 1.2 trillion for midtraining. Those two figures sum to 11.7 trillion, which is the "about 12 trillion" quoted above. (nii.ac.jp)

A practical difference between the two releases is how MoE works in the 32B-A3B model: think of many small “expert” subnetworks sitting behind a gate that routes each input token to only a few of them. This gives the model a large stored capacity while keeping per-token compute similar to that of a much smaller dense model. The NII paper and release give the expert counts and active-expert numbers mentioned above. (nii.ac.jp)

NII reports that both checkpoints handle up to about 65,000 tokens of input and output, and that on the Japanese and English MT-Bench the new models match or beat some leading closed and open models. For example, the Japanese MT-Bench scores cited are 7.54 for the 8B model and 7.82 for the 32B-A3B model, versus 7.29 for GPT-4o and 7.14 for Qwen3-8B in NII’s evaluation.
(nii.ac.jp)

The artifacts are public: model files, tuning data and training corpus scripts are published on the LLM-jp release page, and the checkpoints are hosted on Hugging Face, so you can download and run them without a commercial API.

For an early-stage startup engineer in San Francisco, this release matters in two concrete ways. First, an 8- to 9-billion-parameter Llama-2-based model is small enough to serve on modest GPU clusters or via quantized GGUF builds, so you can prototype bilingual features without heavy API bills. Second, the MoE variant hints at a middle path: high representational power with lower steady-state inference cost, although it requires a specialized runtime and memory orchestration to route experts efficiently. (nii.ac.jp)

If you’re picking a career niche, these models create demand for three practical skills at startups: (1) model engineering (fine-tuning, instruction-tuning and safety evaluation for local languages), (2) ML systems (efficient serving, quantization and MoE runtime engineering), and (3) product ML (designing bilingual flows that use long contexts and document-level understanding). No single path is objectively better; each trades breadth for depth and operational ownership for leverage from platform teams.

The institute also says larger LLM-jp-4 variants are in development and will be released during fiscal 2026, and the release notes that training used AIST’s ABCI 3.0 cluster for compute. (nii.ac.jp) You can access the full release, models and data from the NII release page and the Hugging Face model pages.
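To make the expert-routing idea concrete, here is a minimal sketch of top-k gating in plain Python. It is an illustration under stated assumptions, not NII’s implementation: only the expert count (128) and the number of active experts (8) come from the release, while the toy hidden size, the random gate weights, the stand-in experts and the choice to apply softmax over the selected experts are all hypothetical. Production routers typically add details such as load balancing that are omitted here.

```python
import math
import random

NUM_EXPERTS = 128   # expert count reported for LLM-jp-4 32B-A3B
TOP_K = 8           # experts active per token, as reported
HIDDEN = 16         # toy hidden size (assumption, chosen to keep the demo small)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(token, gate_weights, top_k=TOP_K):
    """Return [(expert_index, weight), ...] for the top_k highest-scoring experts."""
    # One gate logit per expert: dot product of the token with that expert's gate row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Normalize over the selected experts only, so their weights sum to 1
    # (an assumption; other routers normalize before selecting).
    weights = softmax([logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(token, gate_weights, experts, top_k=TOP_K):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    out = [0.0] * len(token)
    for idx, weight in route(token, gate_weights, top_k):
        expert_out = experts[idx](token)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


def make_toy_expert(rng, hidden=HIDDEN):
    """Hypothetical stand-in for an expert: a per-dimension scale, not a real FFN."""
    scale = [rng.uniform(0.5, 1.5) for _ in range(hidden)]
    return lambda token: [s * x for s, x in zip(scale, token)]


rng = random.Random(0)
gate_weights = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(NUM_EXPERTS)]
experts = [make_toy_expert(rng) for _ in range(NUM_EXPERTS)]
token = [rng.gauss(0, 1) for _ in range(HIDDEN)]

selected = route(token, gate_weights)
output = moe_layer(token, gate_weights, experts)
print(f"experts run for this token: {len(selected)} of {NUM_EXPERTS}")
```

The point of the sketch is the loop in `moe_layer`: all 128 experts must be held in memory, but only eight are evaluated for any given token, which is why stored capacity and per-token compute can diverge.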
