Stanford runs 1M+ agent sandboxes monthly

A Stanford CS project reported running more than one million monthly sandboxes for AI agent training, cutting infrastructure engineering time by months with sub‑100ms GPU provisioning, custom images and stateful environments. The setup is presented as a bridge from research to production for autonomous problem‑solving agents. (x.com)

Training an artificial intelligence agent often means giving it a disposable computer to test code, open files, and make mistakes without touching a real system. Stanford’s Computer Science department said one such setup is now running more than 1 million sandboxes a month for agent training. (daytona.io)

The Stanford case study names PhD researcher Etash Guha and professor Ludwig Schmidt’s group, which is tied to the DataComp project, as users of the system. The page says the team estimated it would take at least four months to build comparable sandbox infrastructure in-house. (daytona.io, arxiv.org, etash.me)

A sandbox is an isolated runtime, essentially a sealed-off computer environment for code execution. Daytona, the platform in Stanford’s case study, describes its sandboxes as separate environments with their own kernel, filesystem, network stack, and allocated compute resources. (daytona.io)

That isolation matters because agent benchmarks increasingly ask models to do multi-step work instead of answering one prompt. Stanford’s MLAgentBench, for example, evaluates agents on end-to-end machine learning experimentation tasks where they read files, run experiments, and analyze results inside interactive environments. (github.com, hai.stanford.edu)

The Stanford case study says each sandbox in this workflow needed custom system configurations, dependencies, and software tools, and some runs lasted up to three hours. It also says slow setup would leave graphics processing units idle during environment creation, raising the cost of training. (daytona.io)

To avoid that, the page says Stanford used programmatic image building for custom environments and stateful sandboxes that can persist work across sessions. Daytona’s documentation says snapshots can package dependencies and settings into reusable templates, and archived sandboxes can preserve filesystem state in object storage. (daytona.io)

The speed claim in the Stanford material is specific: “3x faster sandbox provisioning,” with the company elsewhere advertising sub-90 millisecond environment creation. The Stanford page also says the team uses the managed runtime “daily across every stage” of its machine learning pipeline. (daytona.io)

The wider backdrop is a rush to build infrastructure for agents that write and run code, browse tools, and carry state across longer tasks. Daytona’s own product pages now pitch sandboxes as “full composable computers” for agents, while other benchmark papers have shifted from static question answering to interactive environments. (daytona.io, arxiv.org, github.com)

The Stanford example is also a reminder that the bottleneck in agent research is no longer only the model. Once a lab starts running thousands of long-lived, customized environments, the plumbing around the model becomes part of the experiment. (daytona.io, github.com)
