Together AI ships Nemotron 3 Nano

- Together AI said on April 28 it added NVIDIA’s Nemotron 3 Nano Omni on day one, giving developers API access to the new multimodal model. - NVIDIA says the model handles video, images, audio, documents and text in one system, with up to 9x higher throughput than peers. - The launch extends NVIDIA’s Nemotron push into multimodal agents and enterprise inference marketplaces. (together.ai)

Together AI said on April 28 that NVIDIA’s new Nemotron 3 Nano Omni model is available on its platform the same day NVIDIA launched it. (together.ai) (blogs.nvidia.com) The model is built to take in video, images, audio, documents, charts, graphical interfaces and text, then produce text responses from that combined context. NVIDIA describes it as an “omni-modal reasoning model” for agent systems. (blogs.nvidia.com) (build.nvidia.com) In plain terms, it is meant to replace the usual stack of separate vision, speech and language models with one model that acts as the “eyes and ears” inside a larger software agent. That cuts the handoffs where context is often lost. (developer.nvidia.com) (together.ai) NVIDIA said Nemotron 3 Nano Omni tops six leaderboards for document intelligence, video understanding and audio understanding, and delivers up to 9x higher throughput than other open omni models with similar interactivity. Together is pitching that speed as a fit for production inference. (blogs.nvidia.com) (together.ai) The “Nano” label refers to efficiency, not a tiny use case. NVIDIA’s model card lists it as `nemotron-3-nano-omni-30b-a3b-reasoning`, indicating a 30 billion-parameter model with about 3 billion active parameters at a time. (build.nvidia.com) (api.together.ai) That matters for companies building agents that need to watch screens, read documents, listen to calls or meetings, and respond quickly enough to stay useful. Together’s launch post points to dedicated inference as the route for developers who want to scale those workloads. (together.ai 1) (together.ai 2) The launch also shows how NVIDIA is distributing new Nemotron models through cloud inference partners instead of only through its own stack. Together framed the release as “day 0” availability, meaning customers could start using the model immediately rather than waiting for a later integration. (together.ai) (developer.nvidia.com) NVIDIA’s research report says the model adds native audio support and uses token-reduction techniques to handle long audio-video and document tasks more efficiently. Those are the kinds of workloads that can make multimodal agents expensive and slow. (research.nvidia.com) (developer.nvidia.com) For Together AI, the announcement is less about inventing a new foundation model than about becoming an early distribution point for one NVIDIA wants developers to deploy now. The pitch is simple: one open model, one API endpoint, and fewer moving parts in the agent loop. (together.ai) (blogs.nvidia.com)

Together AI ships Nemotron 3 Nano

Get your own daily briefing