NVIDIA releases Dynamo 1.0
NVIDIA announced Dynamo 1.0, which it describes as a production inference operating system for AI factories that is already broadly adopted, and pitches it as a unified runtime for moving models from training to inference. The push reinforces lock-in to CUDA and NVIDIA's software stack as vendors race to own production inference tooling. (nvidianews.nvidia.com)
NVIDIA unveiled Dynamo 1.0 as a production-ready, open-source inference “operating system” at GTC on March 16, 2026 and said the release is available to developers worldwide. (investor.nvidia.com) NVIDIA says the major cloud providers (Amazon Web Services, Microsoft Azure, Google Cloud and Oracle Cloud Infrastructure) have already integrated Dynamo, and it also lists Alibaba Cloud, CoreWeave, Together AI and Nebius as cloud partners. (investor.nvidia.com) The company named Perplexity, PayPal, Pinterest and Cursor as production adopters, and counted inference endpoint providers Baseten, Deep Infra and Fireworks among early users. (stockwatch.com)
NVIDIA's announcement and its developer blog report up to a 7× increase in inference throughput when Dynamo runs on Blackwell hardware in specific benchmark tests. Third-party reporting attributes the headline figure to SemiAnalysis runs on GB200 NVL72 systems using the DeepSeek R1-0528 workload. (developer.nvidia.com)
The Dynamo 1.0 feature set includes native optimizations for NVIDIA TensorRT-LLM plus integrations with open inference frameworks (vLLM, SGLang, LangChain, llm-d, LMCache). It also introduces disaggregated prefill/decode pipelines, embedding caches, ModelExpress startup improvements and Kubernetes management for multi-node inference. (nvidianews.nvidia.com) In the announcement, NVIDIA credits Together AI for inference research contributions and positions Dynamo alongside the Blackwell platform. Independent coverage flagged that open-sourcing Dynamo shifts value toward ecosystem control and can materially change the ROI calculus for Blackwell purchases. (investor.nvidia.com)