Chutes Open‑Source Inference Stack

- Chutes open‑sourced a secure AI inference stack, including an SDK, API server, vLLM/SGLang forks and an E2EE proxy. - The project emphasises auditable code paths and enterprise privacy rather than relying only on privacy policy statements. - Enterprises can use the stack to run auditable inference traces and stronger observable toolchains for agentic or regulated workloads. (x.com)

Running an artificial intelligence model means sending prompts into software that turns text into predictions. Chutes has now open-sourced more of that serving layer, including its software development kit, API server, forks of vLLM and SGLang, and an end-to-end encryption proxy. (github.com 1) (github.com 2) Inference is the part after training: the system that receives a request, loads a model on a graphics processor, and returns an answer. Chutes’ public repositories describe that stack as a command-line and development kit, an API backend, and serving components built around the open-source vLLM and SGLang engines. (github.com 1) (github.com 2) (docs.vllm.ai) vLLM is one of the main open-source engines for running large language models, and its documentation says it focuses on high-throughput serving, continuous batching, and an OpenAI-compatible API. Chutes’ GitHub organization shows public forks of both vLLM and SGLang updated in recent days. (docs.vllm.ai) (github.com) The privacy claim here is not just a contract term or a policy page. Chutes’ e2ee-proxy repository says the proxy intercepts OpenAI-compatible requests, encrypts them with ML-KEM-768 and ChaCha20-Poly1305, and forwards them so that only the model instance can decrypt the payload. (github.com 1) (github.com 2) That matters for companies using models on sensitive work, where the question is not only who hosts the model but what can be inspected after the fact. Chutes also publishes documentation for Trusted Execution Environment verification that says workloads run inside Intel Trust Domain Extensions confidential virtual machines with graphics processors attached by passthrough and hardware-enforced memory isolation. (github.com) (chutes.ai) The company’s broader pitch is infrastructure for open-source models rather than closed model APIs. Its website says Chutes is a decentralized compute provider for deploying and running open-source models, and its platform page lists more than 1,100 available graphics processors across hardware including NVIDIA H200, B200, H100 and A6000 systems. (chutes.ai) (chutes.ai) For enterprise buyers, “auditable” means the software path can be inspected in code instead of accepted as a black box. Chutes’ repositories make public the pieces that handle requests, routing and encryption, which gives security teams something concrete to review before putting agents or regulated workflows on top. (github.com) (github.com) (github.com) Chutes is still a small open-source footprint by GitHub numbers: its main SDK repository showed 86 stars and 31 forks, while the API server showed 23 stars and 16 forks when checked Sunday. But the company is tying that code release to a larger argument that privacy promises for model serving should be visible in software, not just stated in policy. (github.com) (github.com)

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.