agentevals for agentic AI
Solo.io launched agentevals, a framework for evaluating agentic AI systems (tools that act autonomously on your behalf) that helps organizations test performance and safety before deployment. That matters because small IT teams are increasingly tempted to automate routine tasks with agents but need guardrails to prevent misconfiguration or data exposure. (thenewstack.io)
Solo.io announced the agentevals project at KubeCon + CloudNativeCon Europe on March 25, 2026. (solo.io) The framework is designed to integrate with Solo.io’s Gloo platform and Envoy proxy to simulate multi-step infrastructure workflows such as configuring microservices, updating routing policies, or troubleshooting Kubernetes clusters. (thenewstack.io)
agentevals uses OpenTelemetry to capture and correlate distributed agent invocations, then scores the resulting traces against “golden” eval sets with an extensible evaluation engine. It supports both offline (recorded traces) and online (live streaming) modes. (solo.io) The project’s canonical repository (agentevals-dev/agentevals) is published under an Apache-2.0 license, includes a CLI and an embedded web UI, and advertises zero-code OTLP integration that accepts Jaeger JSON and OTLP trace formats. (github.com)
Solo.io also said it will contribute its agentregistry project to the Cloud Native Computing Foundation to centralize governance for agent artifacts. The agentevals ecosystem includes a community evaluators catalog plus an MCP server for running evaluations from conversations. (solo.io) (github.com) Solo.io’s hands-on blog lists prerequisites such as a Kubernetes cluster and kagent for some integrations, and the agentevals quick start offers a pip-installable CLI (pip install agentevals-cli) for running evaluations against sample traces. (solo.io) (github.com)
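To make the idea of scoring a recorded trace against a "golden" eval set more concrete, here is a minimal Python sketch. It is not the agentevals API: the span layout (`start_ns`, `tool`), the tool names, and the two metrics are all assumptions chosen for illustration, and real OTLP or Jaeger spans carry many more fields. The sketch extracts the ordered tool calls from a trace and checks how much of the expected sequence appears, in order.

```python
from typing import Dict, List, Optional

# Hypothetical, simplified span record (an assumption, not an OTLP schema):
#   start_ns: span start time, tool: tool name or None for non-tool spans.
Span = Dict[str, Optional[object]]


def tool_trajectory(spans: List[Span]) -> List[str]:
    """Return tool names in start-time order, skipping spans without a tool."""
    ordered = sorted(spans, key=lambda s: s["start_ns"])
    return [s["tool"] for s in ordered if s.get("tool")]


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def score_trajectory(actual: List[str], golden: List[str]) -> Dict[str, object]:
    """Compare an observed tool sequence with the expected (golden) one."""
    if not golden:
        return {"exact_match": not actual, "in_order_recall": 1.0}
    return {
        "exact_match": actual == golden,
        # Fraction of golden steps that appear in the trace in the right order.
        "in_order_recall": lcs_length(actual, golden) / len(golden),
    }


# Illustrative recorded trace; spans arrive out of order, as they often do.
recorded = [
    {"start_ns": 30, "tool": "apply_route_policy"},
    {"start_ns": 10, "tool": "get_pods"},
    {"start_ns": 20, "tool": None},  # e.g. a model call, not a tool call
    {"start_ns": 25, "tool": "describe_service"},
]
golden = ["get_pods", "apply_route_policy"]

result = score_trajectory(tool_trajectory(recorded), golden)
print(result)  # {'exact_match': False, 'in_order_recall': 1.0}
```

In this example the agent made one extra call (`describe_service`), so the exact match fails while the in-order recall stays at 1.0. Reporting both numbers separates "did the required steps happen in order" from "did the agent do anything unexpected", which is the kind of distinction a pluggable evaluator would typically expose.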