Nebius Adds Human-in-the-Loop for Enterprise Agents

Cloud provider Nebius is integrating Toloka’s Tendem platform to allow its AI agents to escalate complex or high-stakes tasks to human experts. This human-in-the-loop system is designed to boost the reliability of AI agents in critical enterprise workflows, providing a fallback when the AI's confidence is low or a human decision is required.

The integration of Toloka's Tendem platform into Nebius's ecosystem treats human expertise as a high-latency, high-accuracy API call. It is meant to address the "reliability ceiling" developers hit when moving AI agents from demos to production, where edge cases and high-stakes decisions require more than prompt engineering. The system uses the Model Context Protocol (MCP) to let an AI agent programmatically escalate a task, when its confidence is low, to a network of over 10,000 vetted human experts across more than 20 domains.

Nebius, an AI-native cloud provider headquartered in Amsterdam, was formed after a divestment from the Russian tech giant Yandex. The company operates a full-stack platform for AI workloads, including its own hardware and proprietary software, and is a preferred cloud service provider in the Nvidia Partner Network. Adding a human-in-the-loop system signals that Nebius wants to move beyond infrastructure alone and offer a complete, enterprise-ready stack for building, deploying, and governing AI agents.

Toloka, which shares a history with Yandex and is now a separate entity in which Nebius holds a stake, brings over a decade of experience in building human intelligence infrastructure. Its platform is engineered for large-scale data labeling and human-in-the-loop workflows, which are used by leading AI labs such as Anthropic. Benchmarks for the Tendem platform show 53% faster task completion than human-only work and a 21.3% improvement in quality over traditional freelance platforms on complex tasks.

Embedding human oversight directly into the agentic workflow is becoming a critical theme in enterprise AI, and competitors are tackling the reliability challenge in different ways. Glean, for instance, focuses on grounding its AI in an enterprise knowledge graph and uses user feedback such as upvotes and downvotes to validate performance.
Hebbia, which targets high-stakes industries like finance and law, emphasizes "auditable intelligence." Its platform is built on a "citation-first" principle: every piece of information its AI agents generate is hyperlinked directly to the precise source material, so every output can be verified. Cohere's strategy involves building "oversight as a feature," with the goal of safely reducing the need for human supervision over time. In specific enterprise deployments, such as with its partner Ensemble, it has implemented a "human-in-the-loop safety framework" to ensure reliability in complex, regulated environments like healthcare revenue cycle management.
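
The escalation pattern described for the Nebius integration, where an agent hands a task to a human when its confidence is low or the decision is high-stakes, can be illustrated with a minimal sketch. This is not the Tendem or MCP API: the names `AgentResult`, `Resolution`, `resolve`, `stub_agent`, and `stub_human`, along with the 0.8 threshold, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentResult:
    answer: str
    confidence: float  # self-reported score in [0, 1]; hypothetical field


@dataclass
class Resolution:
    answer: str
    source: str  # "agent" or "human"


def resolve(
    task: str,
    agent: Callable[[str], AgentResult],
    ask_human: Callable[[str, AgentResult], str],
    threshold: float = 0.8,  # assumed cutoff, not a documented value
    high_stakes: bool = False,
) -> Resolution:
    """Answer with the agent unless the task calls for a human."""
    draft = agent(task)
    if high_stakes or draft.confidence < threshold:
        # Slow, high-accuracy path: pass the task and the draft to a person.
        return Resolution(ask_human(task, draft), "human")
    return Resolution(draft.answer, "agent")


# Stand-ins for a real model and a real expert queue (both hypothetical).
def stub_agent(task: str) -> AgentResult:
    confidence = 0.95 if "routine" in task else 0.40
    return AgentResult(answer=f"draft answer for: {task}", confidence=confidence)


def stub_human(task: str, draft: AgentResult) -> str:
    return f"expert-reviewed answer for: {task}"
```

With these stubs, a task containing the word "routine" stays with the agent, while any other task, or any task flagged `high_stakes=True`, is routed to the human callback. In a real deployment the human call would be asynchronous and far slower than the model call, which is why the article frames it as a high-latency request.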
