DeepInfra raises $107M Series B
- DeepInfra closed a $107 million Series B round with Nvidia among investors to scale inference-serving infrastructure across cloud providers and enterprise customers.
- The raise underscores investor interest in storage, caching and serving layers that keep agentic workloads responsive under concurrency and cost control at scale.
- For collaboration teams this signals funding momentum behind inference plumbing rather than only new model features. (startupfortune.com)

Inference infrastructure is the layer that actually runs AI models after training: the part that turns a model into a product people can hit all day, at low latency, without the bill exploding. That sounds less glamorous than model labs, but it is where a lot of the real operational pain now lives. On May 4, 2026, DeepInfra said it raised a $107 million Series B to expand that layer globally as demand shifts from training runs to production inference. The round was co-led by 500 Global and Georges Harik, with NVIDIA, Samsung Next, Supermicro, Felicis, A.Capital Ventures, Crescent Cove, Peak6, and Upper90 also participating.

### What does DeepInfra actually do?

DeepInfra sells cloud infrastructure for running AI models, especially open models, in production. The company’s pitch is simple: developers and enterprises want API access to big models and custom deployments, but they do not want to spend months stitching together GPUs, routing, caching, scaling logic, and cost controls themselves. DeepInfra says it now supports more than 190 open models and processes nearly five trillion tokens per week.

### Why is inference the bottleneck now?

Training still gets the headlines because it is flashy and expensive. But once a model is good enough, the hard problem shifts. You have to serve millions of requests, keep latency low, manage bursts, and avoid wasting GPU time on idle capacity. Agentic workloads make this worse: one user action can trigger chains of model calls, retrieval steps, and tool use. So the constraint is no longer just “can you train a frontier model?” It is “can you run useful AI all day at a price customers will tolerate?” DeepInfra framed the raise around exactly that shift from training to production-scale inference.

### Why does the investor list matter?

Because it is not just financial capital. NVIDIA and Supermicro sitting in the round tells you the hardware side of the stack sees value in specialized inference clouds.
Samsung Next adds another strategic name. And 500 Global plus Georges Harik co-leading suggests this is not being treated like a niche infra bet; it is being treated like a platform bet on where AI demand is heading. Bloomberg’s interview framing was basically the same: DeepInfra is trying to attack AI compute bottlenecks, not just ride hype around model releases.

### What’s the strongest signal in the announcement?

Probably not the funding amount by itself. It is the operating numbers around it. DeepInfra says token volume is up 25x since its Series A, and the company says revenue has tripled since early 2026. Those numbers suggest this is not a pure “build first, hope later” infrastructure round. It looks more like investors saw real demand from customers already pushing production traffic through the system.

### Why are open models part of this story?

Because open models change where value can accrue. If more companies are willing to build on open-weight or openly available models instead of closed APIs, the serving layer becomes more important. Someone still has to host, optimize, route, and secure those models. DeepInfra is positioning itself as that layer, the place where open models become reliable products instead of research demos.

### So what changed with this round?

The new money is meant to expand DeepInfra’s global capacity and inference cloud footprint. In plain English, that means more infrastructure, more regions, and more ability to absorb enterprise and agent-driven workloads without falling over or charging absurd margins. The company is betting that the next phase of AI spending will reward the firms that make model usage cheap, fast, and boring (boring in the good way).

### What’s the catch?

Inference is a brutal business. Customers care about price, speed, reliability, and model availability all at once. Hardware supply can tighten. Margins can compress fast.
And if the big cloud vendors or model providers bundle better serving economics directly into their own platforms, independent inference clouds get squeezed. DeepInfra’s growth numbers are impressive, but this market will reward execution more than narrative.

### Bottom line?

This round matters because it is a vote for the plumbing. AI’s next fight is not only about who builds the smartest model. It is also about who can serve useful models at production scale, under real traffic, for a sane cost, and DeepInfra just raised a lot of money to try to be that company.