DeepInfra raises $107M Series B
- DeepInfra said on May 4 it raised a $107 million Series B, co-led by 500 Global and Georges Harik, to expand its AI inference cloud.
- The company says it now processes nearly 5 trillion tokens a week, has grown token volume 25x since Series A, and supports 190-plus models.
- The bigger shift is from training to inference, where enterprises now care most about cost, latency, and open-model flexibility.

AI infrastructure is moving into a different phase. Training giant frontier models still matters, but the real commercial pain point now is inference: serving those models cheaply, quickly, and at production scale every day. That is the gap DeepInfra is trying to fill. On May 4, the Palo Alto company said it raised a $107 million Series B to expand its inference cloud and global capacity, with the round co-led by 500 Global and Georges Harik and backed by names including NVIDIA, Samsung Next, and Supermicro. (deepinfra.com)

### What does DeepInfra actually sell?

DeepInfra is not another model lab. It is a cloud platform for running models after they are already trained, the part where customers send prompts, generate tokens, and pay for reliable throughput. The company pitches itself as purpose-built for high-throughput inference, especially for open-source and agentic AI workloads, rather than as a general-purpose cloud. (financialcontent.com)

### Why is inference suddenly the hot layer?

Because inference is where usage turns into a recurring bill. Once companies move from demos to products, they stop obsessing only over benchmark scores and start caring about token costs, latency, uptime, and how fast they can sw[…] (financialcontent.com) […] broader market logic: training is episodic and concentrated, but inference is continuous and operational. (deepinfra.com)

### What are the numbers that matter here?

The headline number is $107 million. But the more revealing number is usage: DeepInfra says it is now processing nearly 5 trillion tokens per week. It also says token volume has grown 25x since its Series A, revenue has tripled in 2026, and the platform supports more than 190 open-source models. Those are the stats me[…] (deepinfra.com) […]ry already in motion. (financialcontent.com)

### Why do NVIDIA and Samsung Next matter?

Because those names make the round feel more strategic than symbolic. NVIDIA is the center of gravity in AI hardware, Samsung Next has been active around AI infrastructure, and Supermicro sits close to the server layer. None of that […] (financialcontent.com) […] can decide whether an inference platform is actually cheaper and faster or just says it is. (deepinfra.com)

### Why lean so hard on open-source models?

Because open models have become good enough for a lot of real workloads, and they give customers more control. A company using an inference cloud built around open models can switch providers more easily, tune for price-performance, and avoid being locked into a single proprietary API. DeepInfra’s pitch is basically that enterprise[…] (deepinfra.com) […]r own GPU fleet. (financialcontent.com)

### What is the hard part of this business?

Serving tokens sounds simple, but it is really a scheduling and systems problem. You need GPUs in the right places, software that keeps them busy, and latency low enough that apps feel instant under load. The analogy is less “hosti[…] (financialcontent.com) […]much in inference. (deepinfra.com)

### So what changed with this round?

The new money gives DeepInfra room to buy more capacity, expand geographically, and push further into production workloads just as inference becomes the main economic engine of AI. The company says total funding is now above $133 million. That still leaves it competing against hyperscalers and well-funded inference startups[…] (deepinfra.com) […] enough to back directly. (deepinfra.com)

### Bottom line

This raise matters less as a venture headline than as a market signal. The AI stack is maturing, and the bottleneck is shifting from who can train the biggest model to who can serve useful models at the lowest reliable cost. DeepInfra is betting that inference, not training, is where the next infrastructure winners get made. (deepinfra.com)