OpenSquilla saves 60-80% tokens

- OpenSquilla released its open-source AI agent runtime on May 14, 2026, publishing an Apache 2.0-licensed, local-first system built around routing and memory. - OpenSquilla says its routing, caching and prompt controls cut token spend by 60% to 80%, with one published test showing 222,848 cached tokens. - Developers can clone the GitHub repository now, and the project’s quick-start flow points users to a local control panel.

OpenSquilla went public on May 14 with an Apache 2.0-licensed AI agent runtime that pitches one core benefit: lower token bills. The project’s GitHub repository and website describe it as a local-first, self-hostable system that combines model routing, persistent memory, sandboxing, built-in search and local embeddings in one agent loop. OpenSquilla says those controls reduce token spend by 60% to 80% compared with a flat single-model setup. The software arrives as a Python-based runtime with a web interface, command-line tooling and support for multiple model providers. The quick-start instructions tell users to clone the repository with Git LFS, run an installer, complete an onboarding flow and then start a gateway that serves a control panel on a loopback address. The repository is public on GitHub under the Apache-2.0 license. (github.com) ### Where does the 60% to 80% savings claim come from? OpenSquilla’s own materials tie the savings claim to several mechanisms working together rather than one compression feature. The website says the system uses “smart routing,” reasoning-depth tiers, adaptive prompts and on-demand skill loading so simpler requests do not consume premium-model or heavy-reasoning tokens. A sponsored write-up published on May 14 by TestingCatalog said OpenSquilla benchmarked three prompts through its gateway and processed 279,762 tokens at a total session cost of $0.0094. (github.com) TestingCatalog reported that 222,848 of those tokens were served from cache, or about 80% of input tokens, which it attributed to context reuse across turns. The same report said OpenSquilla’s own benchmarks put the combined savings versus a flat configuration at 60% to 80%. (opensquilla.ai) ### What is actually local-first about it? The project’s website says OpenSquilla runs a local web control panel at `127.0.0.1:18790` by default and uses bundled ONNX inference for embeddings on CPU. That means memory embeddings can stay on the user’s machine unless the operator chooses an external provider such as OpenAI or Ollama. GitHub documentation for the repository says the system includes local embeddings, durable sessions and a local web UI. (testingcatalog.com) The release notes page also describes the package as a Python agent runtime with MCP-native tools, local memory and multi-channel messaging. ### How does the routing system decide when to spend more? OpenSquilla says its router uses a mix of hand-crafted and semantic signals to judge task complexity. (opensquilla.ai) The website lists message length, language, code blocks and keywords as factors, combined with embedding-based features, to decide whether a query should go to a cheaper or more capable model. TestingCatalog reported that the classifier logs gate decisions per query and that deep reasoning is turned off for lightweight tasks. (github.com) That setup, according to the report, is meant to avoid paying reasoning-token charges on trivial prompts. ### What role does memory compression play? OpenSquilla describes its memory system as a four-tier structure: working, episodic, semantic and raw memory. (opensquilla.ai) The website says retrieval combines vector search with full-text search, while “hot memory promotion” surfaces frequently recalled items and temporal decay pushes older items down unless they are marked evergreen. The practical effect, based on the company’s description, is that the agent does not need to reload the same full context on every turn. (testingcatalog.com) TestingCatalog’s May 14 report linked the high cache share in its sample run to that reuse of prior context. ### What can developers use today? The repository’s README says OpenSquilla already supports a shared agent loop across web UI, CLI and chat channels, with a pluggable provider layer for OpenRouter, OpenAI, Anthropic, Ollama, DeepSeek, Gemini, Qwen/DashScope and other providers. (opensquilla.ai) The website says the runtime supports more than 10 channels and requires Python 3.12 or newer for source installs. (testingcatalog.com) As of May 14, the GitHub project showed recent commits, a public repository and installation scripts for macOS, Linux and Windows. The quick-start flow directs users to run `opensquilla onboard`, then launch the gateway and open the local control panel in a browser. (github.com 1) (github.com 2)

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.