Open-Source AI Proxy Aims to Cut LLM Costs by 50%

An open-source AI proxy named Plano has been released, designed to reduce LLM operational costs. The tool reportedly uses a 1.5-billion-parameter router model to send each user prompt to the most cost-effective LLM that can handle it. The system also adds a layer for orchestration and observability, addressing key MLOps challenges.

- Plano is built by contributors to the Envoy Proxy, a widely adopted open-source edge and service proxy designed for cloud-native applications, signaling an architecture focused on high performance and scalability.
- The practice of using a smaller, specialized model to direct prompts is known as LLM routing; this can be implemented by fine-tuning a classifier model to assess query complexity or by using semantic search to match a prompt's embedding vector to the most appropriate LLM.
- Benchmarks for similar intelligent routing systems show they can reduce operational costs by up to 85% while maintaining 95% of the performance quality of premium models like GPT-4.
- The "observability" layer addresses LLMOps challenges by centralizing the logging of all requests, responses, and token usage, which is critical for monitoring model drift, performance latency, and cost attribution in production environments.
- This type of proxy functions as an AI gateway or control plane, abstracting the complexity of using multiple model providers (like OpenAI, Anthropic, or Google) behind a single, consistent API endpoint.
- Architecturally, Plano is designed to be deployed as a sidecar proxy, running alongside application services to intercept and manage traffic without requiring changes to the core application code.
- The concept competes with managed services from major cloud providers, such as Amazon Bedrock's Intelligent Prompt Routing, which also automatically routes requests to the most suitable model to balance performance and cost.
- The orchestration function allows developers to define and manage complex workflows or chains of agentic interactions, where different steps might be handled by different LLMs or tools, all coordinated through the proxy.
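
To make the routing and observability ideas above concrete, here is a minimal Python sketch of semantic routing with per-model cost logging. It is not Plano's implementation or API. The tier names, the per-1K-token prices, the example prompts, and the helpers `embed`, `route`, and `UsageLog` are all hypothetical, and a toy bag-of-words vector stands in for the learned embedding a real router would use.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelTier:
    name: str                  # hypothetical model name, not a real product
    usd_per_1k_tokens: float   # hypothetical price, not real provider pricing
    examples: tuple[str, ...]  # sample prompts this tier is meant to handle


TIERS = (
    ModelTier("small-cheap-model", 0.0005, (
        "what is the capital of france",
        "translate hello to spanish",
        "summarize this short paragraph",
    )),
    ModelTier("large-premium-model", 0.03, (
        "prove this theorem step by step",
        "design a distributed database architecture with tradeoffs",
        "debug this concurrency race condition in my code",
    )),
)


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def route(prompt: str) -> ModelTier:
    """Pick the tier whose example prompts are most similar to the prompt."""
    query = embed(prompt)

    def best_score(tier: ModelTier) -> float:
        return max(cosine(query, embed(e)) for e in tier.examples)

    # On a tie (including no overlap at all), prefer the cheaper tier.
    return max(TIERS, key=lambda t: (best_score(t), -t.usd_per_1k_tokens))


@dataclass
class UsageLog:
    """Central record of routed requests, used for cost attribution."""
    records: list = field(default_factory=list)

    def record(self, tier: ModelTier, tokens: int) -> float:
        cost = tokens / 1000 * tier.usd_per_1k_tokens
        self.records.append(
            {"model": tier.name, "tokens": tokens, "cost_usd": cost})
        return cost

    def cost_by_model(self) -> dict:
        totals: dict = {}
        for r in self.records:
            totals[r["model"]] = totals.get(r["model"], 0.0) + r["cost_usd"]
        return totals


if __name__ == "__main__":
    log = UsageLog()
    for prompt in ("translate good morning to spanish",
                   "debug a race condition in my code"):
        tier = route(prompt)
        # Whitespace word count is a rough stand-in for a real tokenizer.
        log.record(tier, tokens=len(prompt.split()))
        print(prompt, "->", tier.name)
    print(log.cost_by_model())
```

The tie-break toward the cheaper tier is one possible design choice: when the router has no signal, it defaults to the low-cost model. A production system might instead escalate uncertain prompts to the stronger model to protect quality.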
