NVIDIA opens API keys for 100+ models
- NVIDIA is letting developers generate API keys on Build to call its hosted NIM model catalog through an OpenAI-compatible endpoint, without spinning up GPUs first. - New users get 1,000 inference credits, and NVIDIA’s catalog now spans 100-plus models across Llama, DeepSeek, Gemma, Kimi, gpt-oss, speech, and retrieval. - This matters because NVIDIA is moving from chip supplier to default inference layer for app builders testing multi-model agents.
NVIDIA just made a very specific part of AI development easier — trying a lot of models quickly without first choosing a cloud, renting GPUs, or wiring up a custom stack. On Build, its developer portal, you can now generate an API key and hit a hosted endpoint that speaks the same basic language as the OpenAI API. That sounds small, but it closes a real gap. A lot of builders want to compare models, route tasks between them, or prototype agents fast. The annoying part was always infrastructure. ### What actually opened up? The new entry point is Build’s hosted NIM API experience. You join the NVIDIA Developer Program, generate a key, and call NVIDIA-hosted models through a serverless endpoint instead of deploying anything yourself. NVIDIA is pitching it as free API access for development, with DGX Cloud underneath and a path to self-hosting later if you outgrow the trial. ### Why does “OpenAI-compatible” matter? Because it kills integration friction. A lot of developer tools, agent frameworks, and internal wrappers already assume the OpenAI chat-completions shape. If NVIDIA exposes compatible endpoints, builders can often swap the base URL and API key rather than rewrite their app. That turns NVIDIA from “place where models live” into “drop-in backend for software that already exists.” ### How big is the catalog? Big enough that this is not just a demo shelf anymore. NVIDIA’s AI model catalog now spans 100-plus models and families across general LLMs, multimodal models, speech, embeddings, rerankers, and domain-specific systems. The public model pages highlight families like DeepSeek, Gemma, Kimi, Llama, and OpenAI’s gpt-oss, which tells you NVIDIA is trying to be the neutral access layer for a broad chunk of the open-model world. ### What do developers get for free? The starter allotment is 1,000 API credits on sign-up. NVIDIA forum guidance also says the trial can be expanded up to 5,000 credits, with more available in some cases through a 90-day AI Enterprise path. So this is generous enough for benchmarking, prompt testing, and prototype apps — but not “free production inference forever.” The catch is right there in the credit system. ### Is this new-new, or newly noticed? More the second. NIM itself has been around, and NVIDIA opened free developer access to hosted endpoints and downloadable microservices earlier. What changed is that the hosted catalog has quietly become broad, practical, and easy enough to plug into existing tooling that people are suddenly treating it like a real alternative API layer. Basically, the product matured into something worth paying attention to. ### Why is NVIDIA doing this? Because the company wants to own more of the AI stack than just the chips. If developers discover, test, evaluate, and eventually deploy models through NVIDIA infrastructure, NVIDIA captures the workflow before a team ever commits to a cloud architecture. It is the classic platform move — make experimentation cheap, hosted APIs and self-hosted microservices. ### Who benefits most? Startups, researchers, and tool builders. If you are building an agent that needs one model for coding, another for retrieval, and another for speech, the expensive part is usually not the prompts — it is the glue. NVIDIA is reducing the glue cost. And because the endpoint is compatible with familiar SDK patterns, it also lowers the odds that a prototype dies in setup hell. ### Bottom line? This is NVIDIA turning inference access into a front door. The free credits are the bait, but the bigger play is to become the default place developers test open models before they decide where serious workloads will run.