Hebbia’s citation claim

Hebbia’s CTO said their mini model matches or exceeds competitors on citation recall at a lower cost, aligning with the growing trend of tiered routing across OpenAI, Anthropic, and Google for agent systems. The claim underscores a move toward smaller, task‑tuned models for citation‑sensitive enterprise search. (x.com)

OpenAI released GPT‑5.4 mini and nano on March 17, 2026 as smaller, lower‑latency variants intended for high‑volume and agentic workloads. (9to5mac.com) OpenAI published per‑call pricing that places mini at roughly $0.75 per million input tokens and $4.50 per million output tokens, with nano at about $0.20/$1.25 — positioning both as budget layers for subagents and short‑turn tasks. (testingcatalog.com) Early benchmarks reported by multiple outlets show GPT‑5.4 mini closing substantial gaps with earlier small models (examples: 54.4% on SWE‑Bench Pro for mini versus 45.7% for the previous GPT‑5 mini), alongside claimed 2x‑plus inference speedups versus larger variants. (testingcatalog.com) Hebbia, which markets the Matrix agent platform for finance and law, has been scaling its engineering organization under CTO Aabhas Sharma since his October 2025 appointment after roles at Found and Uber. (finanicalcontent.com) Hebbia is a Series B startup that raised $130 million at about a $700 million valuation in July 2024 and sells document‑centric, citation‑driven search and agent workflows to asset managers, banks and law firms. (techcrunch.com) Cloud partners are framing mini/nano as the fast execution tier for tiered routing in agent stacks, with Microsoft Azure documentation recommending smaller models for low‑latency subagents and short classification/extraction jobs. (techcommunity.microsoft.com)

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.