NVIDIA Dynamo 1.0 Live

NVIDIA announced Dynamo 1.0 as a production inference OS, claiming multi‑node serving, Kubernetes scaling, and up to 7x speedups on Blackwell GPUs. The release targets hyperscalers and large enterprises and aims to enable horizontally scaled, agentic workloads in production (developer blog).

Dynamo 1.0 is published as an open‑source project in the ai‑dynamo/dynamo GitHub repo (github.com). The repository shows thousands of commits, along with prebuilt containers and deploy recipes meant for datacenter scale. The runtime natively supports the popular inference engines SGLang, NVIDIA TensorRT‑LLM and vLLM, so existing model binaries can run with Dynamo’s dispatcher (developer.nvidia.com). NVIDIA’s platform notes list LangChain, llm‑d and LMCache as ecosystem partners (nvidianews.nvidia.com).

Cloud and hosting integrations for platform deployment include Amazon Web Services, Microsoft Azure, Google Cloud and Oracle Cloud Infrastructure (nvidianews.nvidia.com). In its adoption list, NVIDIA names early production users such as Cursor and Perplexity, plus endpoint providers Baseten, Deep Infra and Fireworks (nvidianews.nvidia.com).

Dynamo introduces ModelExpress, which speeds up replica startup through checkpoint restore and by streaming weights over NVLink/NIXL. It also adds disaggregated encode/prefill/decode pipelines, an embedding cache and multimodal KV routing to reduce repeated GPU work (developer.nvidia.com). Third‑party benchmark details published by NVIDIA cite SemiAnalysis InferenceX runs using DeepSeek R1‑0528 (FP4, 1k/1k interactivity) on GB200 NVL72 hardware, reporting roughly 50 tokens/sec per user for the tested workload and configuration (developer.nvidia.com).

Operational features include a Grove API for topology‑aware scheduling (mentioning GB300 NVL72), a pip‑installable KV Block Manager with object‑storage integration, and layered fault detection plus request cancellation/migration, all aimed at resilient multi‑node inference deployments (developer.nvidia.com).
