Modal sits amid inference momentum
Modal operates in the same inference and developer‑tooling market where Gimlet Labs is pitching fixes for inference bottlenecks, and that momentum tightens the timing of Modal's infrastructure decisions. (techcrunch.com)
Gimlet Labs closed an $80 million Series A led by Menlo Ventures on March 23, 2026, pitching what it calls a “multi‑silicon inference cloud” to run AI workloads across diverse hardware. (techcrunch.com) Gimlet says its orchestration can slice a model so that each portion runs on the best‑suited chip, speeding inference 3×–10× at comparable cost, and the company launched publicly in October with eight‑figure revenues. (techcrunch.com) Gimlet lists partnerships with NVIDIA, AMD, Intel, ARM, Cerebras and d‑Matrix, and delivers its product either as software or through its Gimlet Cloud, which is aimed at large model labs and data centers. (techcrunch.com)
Modal announced an $87 million Series B led by Lux Capital on Sept. 29, 2025, taking its post‑money valuation to $1.1 billion and bringing total capital raised to $111 million. The company said it already has thousands of customers and sub‑second container startup times. (modal.com) TechCrunch reported on Feb. 11, 2026 that Modal was in talks to raise at roughly a $2.5 billion valuation, with an annualized revenue run rate near $50 million and General Catalyst in discussions to lead the round, while CEO Erik Bernhardsson said those talks were not active fundraising. (techcrunch.com)
Modal runs a globally distributed, autoscaling GPU worker pool sourced from AWS, GCP, Azure and OCI. It says the pool has scaled to well over 20,000 concurrent GPUs and has launched more than four million cloud instances in the last couple of years. (modal.com) Modal’s documentation shows first‑class CUDA support, references NVIDIA data‑center GPU classes including the A10G, L40S, A100 and B200, and describes serverless GPU primitives for inference, training and sandboxing on its platform. (modal.com)