Gimlet raises $80M for multi‑silicon inference

Gimlet Labs closed an $80M round to build a multi-silicon inference cloud that runs across NVIDIA, AMD, Intel, ARM, Cerebras and d-Matrix chips, a sign that customers are experimenting with heterogeneous hardware stacks. Investors and press coverage framed the approach as a fix for inference bottlenecks and as a challenge to single-vendor narratives. (techcrunch.com) (menlovc.com)

The Series A was led by Menlo Ventures, with follow-on participation from Eclipse, Factory, Prosperity7 and Triatomic. The company's earlier seed round was led by Factory and attracted angel backers including Intel's Lip-Bu Tan and Sequoia's Bill Coughran. (gimletlabs.ai)

Gimlet was founded by Zain Asgar (co-founder and CEO, Stanford adjunct and former GPU architect) together with Michelle Nguyen, Omid Azizi, Natalie Serrino and James Bartlett. The group previously built Pixie and took it through to an acquisition. The product, branded as a "multi-silicon inference cloud", dynamically slices agentic inference workloads and orchestrates them across NVIDIA, AMD, Intel, ARM, Cerebras and d-Matrix hardware. The company reports 3–10× speedups on frontier models with more than 1T parameters and says its customer base has tripled and now includes a top-three frontier lab and a top-three hyperscaler. (gimletlabs.ai)

Gimlet has already announced a technology partnership with d-Matrix that, according to the two vendors, produces up to 10× speedups and major power-efficiency gains for certain frontier workloads. d-Matrix highlights SRAM-centric acceleration as its contribution to the stack. (storagenewsletter.com)

The company runs its software in purpose-built multi-silicon data centers that it manages for low-latency agent workloads. It also offers the stack as deployable software and as an API for customers that want to run Gimlet in their own facilities. (finance.yahoo.com)

Menlo's writeup and Gimlet's own blog frame the opportunity as an "Inference Speed Wars" problem. Their argument is that phases such as prefill, decode, dense attention and sparse attention have different compute, memory and network profiles, and that matching each phase to the silicon best suited to it is how to reclaim efficiency at scale. (menlovc.com)
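The phase-matching argument can be illustrated with a toy cost model. The sketch below is not Gimlet's scheduler; it is a minimal roofline-style estimate in Python, in which every device name, throughput figure and workload size is a made-up assumption chosen only to show the idea. It relies on the commonly cited observation that prefill tends to be compute-bound while decode tends to be limited by memory bandwidth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical accelerator; both figures are illustrative, not vendor specs."""
    name: str
    peak_tflops: float   # peak compute, TFLOP/s
    mem_bw_tbps: float   # memory bandwidth, TB/s


@dataclass(frozen=True)
class Phase:
    """Hypothetical inference phase described by its work and memory traffic."""
    name: str
    tflop: float         # total compute, TFLOP
    tbytes: float        # total memory traffic, TB


def estimated_time(phase: Phase, device: Device) -> float:
    """Roofline-style estimate: the slower of compute time and memory time."""
    compute_s = phase.tflop / device.peak_tflops
    memory_s = phase.tbytes / device.mem_bw_tbps
    return max(compute_s, memory_s)


def assign(phases: list[Phase], devices: list[Device]) -> dict[str, str]:
    """Map each phase to the device with the lowest estimated time."""
    return {
        p.name: min(devices, key=lambda d: estimated_time(p, d)).name
        for p in phases
    }


# Assumed hardware: one compute-rich part, one bandwidth-rich part.
DEVICES = [
    Device("compute_heavy", peak_tflops=1000.0, mem_bw_tbps=2.0),
    Device("bandwidth_heavy", peak_tflops=200.0, mem_bw_tbps=10.0),
]

# Assumed workloads: prefill does a lot of math per byte, decode very little.
PHASES = [
    Phase("prefill", tflop=500.0, tbytes=0.5),
    Phase("decode", tflop=2.0, tbytes=1.0),
]

if __name__ == "__main__":
    for phase_name, device_name in assign(PHASES, DEVICES).items():
        print(f"{phase_name} -> {device_name}")
```

With these assumed numbers, prefill lands on the compute-rich device and decode on the bandwidth-rich one. A production system would also have to account for network transfer between devices, batching and queueing, which this sketch ignores.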
