Mamba‑3 SSM Released

Together.ai dropped Mamba‑3, an open‑source state‑space model that claims better long‑sequence language modeling and lower latency than prior Transformer decoders on 16K contexts. The inference‑first SSM design is pitched for real‑time reasoning agents and long‑context apps. (blockchain.news)
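The latency pitch rests on a memory argument. A Transformer decoder keeps a key/value cache that grows with every token, and each decode step attends over all of it. A recurrent SSM carries a fixed-size state instead. The sketch below compares the two footprints. The function names and every dimension in it are hypothetical, chosen only for illustration, and are not the configurations of Mamba‑3 or of any model benchmarked against it.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    # Keys and values (factor 2) are stored for every past token in every layer,
    # so this grows linearly with context length.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def ssm_state_bytes(n_layers, n_heads, head_dim, state_dim, bytes_per_elem=2):
    # A recurrent SSM keeps one (head_dim x state_dim) state per head per layer,
    # independent of how many tokens have already been processed.
    return n_layers * n_heads * head_dim * state_dim * bytes_per_elem


# Hypothetical ~1B-scale dimensions, for illustration only (fp16/bf16 = 2 bytes).
LAYERS, HEADS, HEAD_DIM, STATE_DIM = 24, 16, 64, 128

for ctx in (1_024, 4_096, 16_384):
    kv = kv_cache_bytes(LAYERS, HEADS, HEAD_DIM, ctx)
    ssm = ssm_state_bytes(LAYERS, HEADS, HEAD_DIM, STATE_DIM)
    print(f"context {ctx:>6}: KV cache {kv / 2**20:7.1f} MiB, SSM state {ssm / 2**20:5.1f} MiB")
```

With these made‑up dimensions the KV cache grows from 96 MiB at 1K tokens to 1536 MiB at 16K tokens, while the SSM state stays at 6 MiB. Real comparisons also depend on grouped‑query attention, the layer mix, and kernel efficiency, so this shows only the scaling trend, not Mamba‑3's measured speedups.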

Mamba‑3’s paper and reference code were published on March 17, 2026, and list Aakash Lahoti, Kevin Y. Li, Berlin Chen, Caitlin Wang, Aviv Bick, J. Zico Kolter, Tri Dao, and Albert Gu as authors. (together.ai) The architecture introduces three explicit innovations: “exponential‑trapezoidal” discretization, complex‑valued state updates for richer state tracking, and a multi‑input multi‑output (MIMO) SSM formulation. (arxiv.org)

In 1.5B‑parameter experiments with the single‑input single‑output (SISO) variant, the authors report that Mamba‑3 outperformed Mamba‑2, Gated DeltaNet, and Llama‑3.2‑1B on combined prefill+decode latency. (together.ai) Independent coverage cites roughly a 4% language‑modeling improvement over Transformer baselines and decode speedups of up to about 7× in some long‑sequence scenarios. (winbuzzer.com) Across state‑size ablations, Mamba‑3 matches Mamba‑2’s perplexity with approximately half the state size, which the paper highlights as an efficiency gain. (arxiv.org)

The release includes optimized inference kernels built with Triton, TileLang, and CuTe, and ships a readable PyTorch reference implementation named “mamba3‑minimal” alongside the high‑performance code. (together.ai) The main codebase lives under the state‑spaces GitHub organization, with LICENSE files present in the repository, and the work appears as an ICLR 2026 oral on OpenReview. (github.com)
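For intuition about the three named ideas, here is a toy single‑channel recurrence in plain Python. It is an independent sketch, not code from mamba3‑minimal, and the update forms are assumptions about how such ideas generally look rather than the paper's exact equations. It includes a trapezoid‑style blend of the previous and current inputs, a complex eigenvalue whose imaginary part rotates the state, and a rank‑r (MIMO‑style) state update. All names (`ssm_scan`, `mimo_update`, `lam`, `theta`) are hypothetical.

```python
import cmath


def ssm_scan(xs, dts, a, theta, b=1.0, c=1.0, lam=0.5):
    """Run a single-channel diagonal SSM over real inputs xs (illustrative only).

    Hypothetical parameters, not Mamba-3's actual parameterization:
      xs, dts : inputs x_t and positive step sizes Delta_t
      a       : real part of the eigenvalue (a < 0 gives decay)
      theta   : imaginary part; theta != 0 makes the state rotate
      b, c    : input/output projections (fixed here for brevity)
      lam     : trapezoid weight; lam = 1 reduces to h = alpha*h + dt*b*x
    """
    h = 0j
    prev_bx = 0j  # b * x_{t-1}; zero before the first token
    ys = []
    for x, dt in zip(xs, dts):
        alpha = cmath.exp(dt * complex(a, theta))  # per-step decay and rotation
        bx = b * x
        # Blend the previous and current inputs instead of using only the current one.
        h = alpha * h + (1 - lam) * dt * alpha * prev_bx + lam * dt * bx
        ys.append((c * h).real)  # read out the real part of c * h
        prev_bx = bx
    return ys


def mimo_update(state, alpha, dt, B, X):
    """Rank-r update H <- alpha*H + dt * B @ X^T on plain lists (illustrative only).

    state is N x P, B is N x r, X is P x r. With r = 1 this is the usual SISO
    outer-product update; r > 1 performs r multiply-adds per state element
    instead of one, i.e. more arithmetic per byte of state read.
    """
    n, p, r = len(state), len(state[0]), len(B[0])
    return [
        [alpha * state[i][j] + dt * sum(B[i][k] * X[j][k] for k in range(r))
         for j in range(p)]
        for i in range(n)
    ]


impulse, steps = [1.0, 0.0, 0.0], [1.0, 1.0, 1.0]
print(ssm_scan(impulse, steps, a=-0.5, theta=0.0, lam=1.0))     # real decay: output stays positive
print(ssm_scan(impulse, steps, a=0.0, theta=cmath.pi, lam=1.0))  # rotation by pi: sign alternates
```

With `lam=1` the scan reduces to a plain exponential‑Euler step, which matches the simplified recurrence usually written for Mamba‑2. A real decay factor in (0, 1) can only shrink the state, whereas the rotating complex state flips sign each step. That is a toy version of the parity‑style state tracking that complex‑valued updates are meant to enable.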
