Nucleus open-sources image model
Nucleus AI released Nucleus-Image, a 17B-parameter sparse MoE diffusion model (about 2B active), and published it on Hugging Face. The company claims the model reaches parity with several leading image models from pre-training alone. (x.com)
Image generators start with visual noise and gradually turn it into a picture from a text prompt. Nucleus AI said on April 14 it is open-sourcing a model called Nucleus-Image that pairs that approach with a sparse “mixture of experts” design. (huggingface.co)
In a mixture-of-experts model, the system keeps many specialist sub-models but activates only a few for each step, like routing a task to the right team instead of calling the whole company. Nucleus AI said Nucleus-Image has 17 billion total parameters but uses about 2 billion per forward pass. (withnucleus.ai)
The company published the weights on Hugging Face under an Apache 2.0 license, along with training code and a dataset, and described it as the first fully open-source mixture-of-experts diffusion model at this performance tier. The model repository was updated on April 15 and lists a 51.7-gigabyte release. (huggingface.co 1) (huggingface.co 2)
Nucleus AI’s paper, posted to arXiv on April 14, says the model matches or exceeds leading image systems on GenEval, DPG-Bench, and OneIG-Bench while using a smaller active compute budget. The paper says those scores come from pre-training alone, without reinforcement learning, direct preference optimization, or human preference tuning after the base training run. (arxiv.org 1) (arxiv.org 2)
That claim lands in a market where many of the strongest image models are available only through closed application programming interfaces or web products. By releasing weights, code, and data together, Nucleus AI is giving outside researchers a package they can inspect, modify, and run themselves. (huggingface.co 1) (huggingface.co 2)
The model uses 64 specialist experts plus one shared expert, with what Nucleus AI calls “Expert-Choice” routing to decide which parts of the network handle each token. The company says that design is meant to push more capacity into the model without paying the full inference cost of activating all 17 billion parameters every time. (withnucleus.ai) (arxiv.org)
Nucleus AI also published example code showing the model running through the Diffusers library with a text key-value cache, a speedup trick that reuses text-side computations across denoising steps. A related pull request to Hugging Face Diffusers describing NucleusMoE-Image was opened about three weeks before the paper appeared. (huggingface.co) (github.com)
The release does not settle how the model will compare in day-to-day use against closed rivals on prompt following, safety filters, or commercial reliability. It does put a full, large image model on the open market with enough technical detail for others to test the company’s efficiency claims for themselves. (arxiv.org) (huggingface.co)
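For readers who want to see the routing idea in code, the following is a minimal sketch of expert-choice routing with one shared expert, written in plain Python. It is not Nucleus AI's implementation: the function names, the toy sizes (16 tokens, 4 routed experts, 2 tokens per expert), the softmax gating, and the use of a single linear map as each "expert" are all assumptions made for illustration. The article's sources do not describe the model's actual capacity or gating details.

```python
import math
import random

def matvec(vec, mat):
    """Multiply a row vector (length d_in) by a matrix (d_in x d_out)."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def expert_choice_layer(tokens, router_w, expert_ws, shared_w, capacity):
    """Hypothetical layer: each routed expert picks its own top `capacity` tokens."""
    # Router affinity of each token for each expert, normalized per token.
    scores = [softmax(matvec(t, router_w)) for t in tokens]
    # The shared expert processes every token.
    out = [matvec(t, shared_w) for t in tokens]
    picks = []
    for e, w in enumerate(expert_ws):
        # Expert choice: the expert ranks all tokens and keeps the best ones.
        ranked = sorted(range(len(tokens)), key=lambda i: scores[i][e], reverse=True)
        chosen = ranked[:capacity]
        for i in chosen:
            update = matvec(tokens[i], w)
            out[i] = [o + scores[i][e] * u for o, u in zip(out[i], update)]
        picks.append(chosen)
    return out, picks

def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n_tokens, dim, n_experts, capacity = 16, 8, 4, 2  # toy sizes, not the model's
tokens = random_matrix(rng, n_tokens, dim)
router_w = random_matrix(rng, dim, n_experts)
expert_ws = [random_matrix(rng, dim, dim) for _ in range(n_experts)]
shared_w = random_matrix(rng, dim, dim)

out, picks = expert_choice_layer(tokens, router_w, expert_ws, shared_w, capacity)
for e, chosen in enumerate(picks):
    print(f"expert {e} chose tokens {chosen}")
```

In this sketch every routed expert does a fixed amount of work per layer, because it always processes exactly `capacity` tokens. A given token may be picked by several experts or by none, in which case only the shared expert touches it. That fixed per-expert budget is one way a sparse model can hold many parameters while running only a fraction of them on each pass.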