NVIDIA + Azure: production agents

NVIDIA and Microsoft outlined deep co-engineering work to support production agents, including NVIDIA Nemotron models in Microsoft Foundry and new Azure infrastructure built on NVIDIA Vera Rubin, aimed at multi-step agent workloads. The effort bundles model, infrastructure, and orchestration primitives to help enterprises run agentic systems at scale. It signals a move from prototype tooling toward integrated stacks for reliable agent orchestration. (x.com)

Most artificial intelligence agents still break the moment they have to do more than answer one prompt. A production agent has to remember context, call tools, wait for other systems, and keep going without losing the thread. (microsoft.com) Microsoft and NVIDIA spent March 16, 2026, talking about that exact problem at NVIDIA’s GPU Technology Conference (GTC). Their answer was not one new model or one new server, but a stack that ties models, cloud infrastructure, and agent operations together. (microsoft.com)

Microsoft calls its layer Foundry, which it describes as the operating system for building and running artificial intelligence at enterprise scale. In plain English, it is the control desk where a company picks models, connects data, deploys agents, and watches whether those agents are behaving. (microsoft.com)

NVIDIA’s piece is Nemotron, a family of open models built for specialized agents. NVIDIA says the weights, training data, and recipes are open, which gives companies more room to inspect and tune the models before putting them into production. (developer.nvidia.com) Microsoft said on March 16 that NVIDIA Nemotron 3 Super NIM is now available in Microsoft Foundry. Microsoft described it as an open reasoning model aimed at long-context, multi-step agent workflows rather than simple chat replies. (techcommunity.microsoft.com) That “NIM” label matters because it means the model is packaged as a microservice instead of a loose set of files. The point is to let a company plug the model into a larger system the way it would plug in a database or payment service. (nvidia.com)

The hardware side matters because agents do not just generate one answer and stop. NVIDIA says its Vera Rubin platform is built for multi-step problem solving and long-context workloads, with a focus on faster inference and lower cost per token than the Blackwell generation. (nvidia.com) Microsoft said Azure is the first hyperscale cloud to power on NVIDIA Vera Rubin NVL72 systems. The company framed that as infrastructure for inference-heavy reasoning workloads, which is the expensive part once a business moves from demos to constant agent use. (microsoft.com) NVIDIA used even plainer language in its own announcement. It said Vera Rubin will let Azure run “more powerful models and agents at massive scale” for “hundreds of millions of people,” which tells you this is about serving live systems, not lab benchmarks. (nvidianews.nvidia.com)

Put together, the pitch is simple: Foundry is the control tower, Nemotron is one of the reasoning engines, and Vera Rubin is the machine room underneath. Microsoft and NVIDIA are trying to sell enterprises one connected path from model choice to deployment to monitoring instead of a pile of separate tools. (microsoft.com)
