Nucleus‑Image open‑source model
Nucleus AI published Nucleus‑Image, a 17B‑parameter sparse mixture‑of‑experts diffusion model that runs with about 2B active parameters and reportedly matches top models such as Imagen 4 with pre‑training alone. (x.com) The team said the model is fully open‑source on Hugging Face under an Apache 2.0 licence. (x.com)
Text-to-image systems turn written prompts into pictures by removing noise step by step, like developing a photo from static. Nucleus AI said its new Nucleus-Image model uses that approach with a sparse design that activates about 2 billion of 17 billion parameters on each pass. (huggingface.co) (withnucleus.ai)

The model card on Hugging Face says Nucleus-Image is a 32-layer diffusion transformer with 64 routed experts and one shared expert, and 29 of the 32 blocks use mixture-of-experts layers instead of one dense network. The same card says the release includes full model weights, training code, and the dataset under an Apache 2.0 licence. (huggingface.co)

A mixture-of-experts model works like a switchboard: each input is sent to a small subset of specialized submodels instead of the whole network. Nucleus AI said its “Expert-Choice” router picks which experts handle each step, which is how the model keeps active compute near 2 billion parameters instead of 17 billion. (arxiv.org) (withnucleus.ai)

Nucleus AI’s paper says the model matches or exceeds Qwen-Image, GPT Image 1, Seedream 3.0, and Imagen 4 on GenEval, DPG-Bench, and OneIG-Bench. The Hugging Face model card and paper both say those scores come from pre-training only, with no reinforcement learning, direct preference optimization, or human preference tuning. (arxiv.org) (huggingface.co)

Google describes Imagen 4 as its latest image-generation line and “our best text-to-image model yet,” which makes it a notable comparison point for any open release claiming similar benchmark performance. Google made the Imagen 4 family generally available in the Gemini application programming interface and Google AI Studio in August 2025. (deepmind.google) (developers.googleblog.com)

The paper says Nucleus-Image was trained on 1.5 billion image-text pairs spanning 700 million unique images after filtering, deduplication, quality scoring, and caption curation. Training moved from 256-pixel images to 512-pixel and then 1024-pixel images, a staged process the authors say improved stability. (arxiv.org)

The model also changes how text is handled during generation. Instead of pushing text tokens through the full image transformer each step, the paper says it caches text key-value projections across denoising steps, which reduces repeated work during inference. (huggingface.co) (arxiv.org)

Open-weight image models have often shipped with narrower licences or without training data and code, which limits reuse and auditing. Nucleus AI is positioning this release differently by publishing weights, code, and dataset together on public repositories. (huggingface.co) (arxiv.org)

The immediate test is whether outside developers can reproduce the benchmark numbers and run the system at the costs Nucleus AI describes. The files are already live on Hugging Face, and the paper is now public on arXiv. (huggingface.co) (arxiv.org)
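
For readers curious how sparse routing keeps most parameters idle, the sketch below illustrates the general expert-choice idea as it is commonly described in the research literature: each expert picks a fixed number of the inputs that score highest for it, so every expert does the same amount of work. This is not Nucleus AI's code. The function names `softmax` and `expert_choice_route`, the `capacity` parameter, and the toy scores are all assumptions made for illustration, and the released model's router may differ in its details.

```python
import math


def softmax(row):
    """Turn one token's raw expert scores into probabilities."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [value / total for value in exps]


def expert_choice_route(scores, capacity):
    """Hypothetical helper: each expert selects its `capacity` best tokens.

    scores[t][e] is the raw affinity of token t for expert e.
    Returns a dict mapping expert index to a sorted list of token indices.
    """
    probs = [softmax(row) for row in scores]
    num_tokens = len(probs)
    num_experts = len(probs[0])
    assignments = {}
    for expert in range(num_experts):
        # Rank every token by how strongly it prefers this expert.
        ranked = sorted(
            range(num_tokens),
            key=lambda token: probs[token][expert],
            reverse=True,
        )
        assignments[expert] = sorted(ranked[:capacity])
    return assignments


# Toy example (made-up numbers): 4 tokens, 2 experts, 2 tokens per expert.
toy_scores = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.7, 0.3],
    [0.4, 0.6],
]
routing = expert_choice_route(toy_scores, capacity=2)
print(routing)
```

In this toy case each expert processes half of the tokens, so only part of the network runs for any given input. The same principle, at much larger scale, is how a model with 17 billion total parameters can use roughly 2 billion per pass.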