Cloudflare’s unified AI layer

Cloudflare expanded its AI Gateway into a unified inference layer that lets developers call 50+ models from more than a dozen providers while centralizing logs and billing for image, video and speech models. It also rolled out AI Search as a “search primitive for your agents,” allowing dynamic search instances, file uploads and hybrid retrieval with relevance boosting. (x.com) (blog.cloudflare.com)

Cloudflare has turned its AI Gateway into a single layer for calling outside artificial intelligence models, handling routing, logs and billing in one place. (blog.cloudflare.com) Cloudflare said on April 16 that developers can reach 70-plus models from more than 12 providers through one application programming interface, with a one-line switch between Cloudflare-hosted and third-party models inside Workers. The company named providers including OpenAI, Anthropic, Google, Alibaba Cloud, AssemblyAI, Runway and Vidu. (blog.cloudflare.com) The same rollout extends beyond text. Cloudflare said the catalog now includes image, video and speech models, and its unified billing system lets customers buy Cloudflare credits and use them across supported third-party providers without passing separate provider keys. (blog.cloudflare.com) (developers.cloudflare.com) An inference layer is the software between an app and the model that answers a prompt. Cloudflare’s pitch is that developers can swap models, retry failed requests, cache responses and watch usage from one control plane instead of wiring each provider separately. (blog.cloudflare.com) (developers.cloudflare.com) That matters for agents, which often make several model calls to finish one task. Cloudflare said a slow or failed provider can ripple through a chain of calls, which is why it is bundling routing and reliability features with model access. (blog.cloudflare.com) Cloudflare paired that release with AI Search, a managed retrieval system for agents that need to fetch documents before answering. The company said AI Search, formerly AutoRAG, can create search instances at runtime, ingest uploaded files directly and search from Workers, the Agents software development kit or the Wrangler command-line tool. (blog.cloudflare.com) The search system combines semantic matching, which looks for similar meaning, with keyword matching based on BM25, a standard ranking method in search engines. Cloudflare said the two run in parallel and the results are fused, and developers can boost rankings with document metadata or query across multiple instances in one call. (blog.cloudflare.com) Cloudflare’s search docs frame the product as retrieval-augmented generation, or feeding a model selected documents instead of making it answer from memory alone. The company said AI Search also plugs into its broader stack, including Workers AI, AI Gateway, R2 storage and Vectorize. (developers.cloudflare.com) This builds on Cloudflare’s earlier push into AI operations software. When AI Gateway reached general availability on May 22, 2024, Cloudflare said the service had already proxied more than 500 million requests and was meant to give customers one place to manage model traffic across providers. (blog.cloudflare.com) The new version makes that argument more literal: one endpoint for model calls, one bill for many providers, and one search layer for the documents agents need to look up before they answer. (blog.cloudflare.com 1) (blog.cloudflare.com 2)

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.