Developers route Claude, GPT through proxies
- Developers on May 19 discussed routing Claude, GPT and Gemini traffic through internal AI proxies to centralize logs, token tracking and guardrails.
- Kong’s documentation says its AI Gateway can “log all requests, track costs across teams, enforce rate limits, or apply security policies.”
- Azure, LiteLLM and MLflow all publish AI gateway documentation describing multi-model routing, usage tracking and centralized controls.

Developers on Tuesday described a familiar enterprise pattern spreading into day-to-day AI work: putting Claude, GPT and Gemini behind an internal proxy before employees or tools reach the model provider directly. The setup is designed to give teams one control point for logging, budget tracking, model routing and policy enforcement across multiple vendors.

The discussion surfaced in posts on X and aligns with a growing set of vendor and open-source products now marketed as AI gateways. A May 19 post cited by the briefing pointed to proxies as a way to track tokens and apply guardrails across Claude and other models. The post itself was not fully retrievable through web access, but the architecture it referenced matches current documentation from Microsoft, Kong, LiteLLM and MLflow. Those products describe a gateway layer that sits between applications and model APIs, then records usage, applies limits and standardizes access across providers.

### Why are developers putting a proxy in front of Claude, GPT and Gemini?

Microsoft says its AI gateway is meant to “secure, scale, monitor, and govern” AI models, agents and tools, including APIs using OpenAI and Anthropic schemas. The company says the layer can authenticate access, load balance across endpoints, monitor and log interactions, and manage token usage and quotas across applications. (learn.microsoft.com)

MLflow describes the same idea in platform terms. Its AI Gateway documentation says companies can manage multiple LLM providers through a single secure endpoint, with request and response logging for audit trails, usage tracking, budget alerts and content guardrails.

### What problem does the proxy solve for engineering teams?

Kong says its AI Gateway can proxy requests from AI command-line tools to model providers and give teams “centralized control over AI traffic.” The company says that includes logging all requests, tracking costs across teams, enforcing rate limits and applying security policies and guardrails. (developer.konghq.com)

LiteLLM markets a similar proxy for organizations using many models at once. Its documentation says the gateway can call more than 100 LLMs through a unified interface, track spend, set budgets per user or team, and add access control, logging, alerting and metrics. (docs.litellm.ai)

### Does this let companies choose models centrally?

MLflow says its gateway supports traffic splitting for A/B testing and failover chains across providers. (mlflow.org) That means a platform team can decide when to send a request to one model, mirror it to another for evaluation, or fall back if a provider fails.

Microsoft says its gateway can manage models deployed in Microsoft Foundry or non-Microsoft environments such as Amazon Bedrock, while Kong says its tooling can sit in front of Claude Code, Codex CLI and Gemini CLI. (learn.microsoft.com) Those vendor descriptions suggest the proxy is becoming a way to separate internal application logic from the underlying model vendor.

### Where do guardrails and audit logs fit in?

MLflow says gateways can enforce content policies that block or sanitize requests and responses, while Kong says teams can apply security policies and guardrails at the gateway layer. (mlflow.org) Microsoft frames the same function as governance and observability across AI endpoints. (learn.microsoft.com)

That matters for companies trying to standardize AI use across departments. A gateway can create one audit trail for prompts, responses, token consumption and access patterns, instead of leaving each product team to wire those controls directly into Anthropic, OpenAI or Google integrations.
That is an inference drawn from the documented features those vendors list.

### What comes next for this setup?

Kong’s published guides already cover routing Claude Code, Codex CLI and Gemini CLI through its gateway, and Microsoft says AI gateway capabilities can be integrated into Microsoft Foundry. (learn.microsoft.com) LiteLLM and MLflow both publish setup paths for unified endpoints, budgets and guardrails, giving engineering teams multiple ways to adopt the architecture now.
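
For readers who want to see the shape of the pattern, the control point the vendors describe can be sketched in a few dozen lines of Python. This is an illustration only, not code from Kong, Microsoft, LiteLLM or MLflow: the `Gateway` class, the stand-in providers, the team name and the blocked term are all hypothetical, and token counts are approximated by word count rather than taken from a real model API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A provider is any callable that takes a prompt and returns (text, tokens_used).
# A real gateway would call a vendor API here; these stand-ins are hypothetical.
Provider = Callable[[str], Tuple[str, int]]


class BudgetExceeded(Exception):
    """Raised when a team has used up its token budget."""


@dataclass
class Gateway:
    providers: List[Tuple[str, Provider]]   # ordered failover chain
    budgets: Dict[str, int]                 # team -> token budget
    blocked_terms: List[str] = field(default_factory=list)
    used: Dict[str, int] = field(default_factory=dict)
    audit_log: List[dict] = field(default_factory=list)

    def complete(self, team: str, prompt: str) -> str:
        # Guardrail: reject prompts that contain a blocked term.
        lowered = prompt.lower()
        if any(term.lower() in lowered for term in self.blocked_terms):
            self.audit_log.append({"team": team, "status": "blocked"})
            raise ValueError("prompt rejected by content policy")

        # Budget check happens before any provider is called.
        if self.used.get(team, 0) >= self.budgets.get(team, 0):
            self.audit_log.append({"team": team, "status": "over_budget"})
            raise BudgetExceeded(team)

        # Failover: try each provider in order until one succeeds.
        last_error = None
        for name, call in self.providers:
            try:
                text, tokens = call(prompt)
            except Exception as exc:
                last_error = exc
                continue
            # Record usage and write one audit entry per successful call.
            self.used[team] = self.used.get(team, 0) + tokens
            self.audit_log.append(
                {"team": team, "provider": name, "tokens": tokens, "status": "ok"}
            )
            return text
        raise RuntimeError("all providers failed") from last_error


def flaky_primary(prompt: str) -> Tuple[str, int]:
    # Hypothetical provider that is always down, to exercise the failover path.
    raise TimeoutError("primary unavailable")


def steady_backup(prompt: str) -> Tuple[str, int]:
    # Hypothetical provider that echoes the prompt; word count stands in for tokens.
    return f"echo: {prompt}", len(prompt.split())


gateway = Gateway(
    providers=[("primary", flaky_primary), ("backup", steady_backup)],
    budgets={"search-team": 5},
    blocked_terms=["password"],
)
reply = gateway.complete("search-team", "summarize this release note")
print(reply)
print(gateway.audit_log)
```

In this sketch the application only ever calls `gateway.complete`, so swapping or reordering providers, tightening a budget or adding a blocked term happens in one place, which is the separation between application logic and model vendor that the vendor documentation points toward.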