Karpathy Open-Sources 'Autoresearch' Agent
Andrej Karpathy just released 'Autoresearch,' a simple 630-line Python tool that lets an AI agent run machine learning experiments on a single GPU without supervision. The agent designs, executes, and iterates on experiments that take about five minutes each, deciding on its own what to try next. It's a powerful proof-of-concept for how agentic workflows can accelerate R&D, even with modest hardware.
The project's design intentionally shifts the human's role from writing Python code to orchestrating research through high-level instructions in a Markdown file. This `program.md` file guides the AI agent, which then directly modifies the `train.py` script containing the GPT model, optimizer, and training loop. This setup embodies Karpathy's concept of "agentic engineering," where humans manage AI agents that perform the bulk of the coding.
'Autoresearch' is built upon Karpathy's "nanochat" LLM training core, stripped down to its essentials so that it fits within the context window of modern LLMs. This minimalism is a key feature: the agent can hold all of the code it is modifying in context at once. The system is designed for a single NVIDIA GPU, making it accessible beyond large-scale research labs.
Each experiment runs for a fixed duration of five minutes, allowing for rapid, iterative testing: about 12 experiments per hour, so potentially 100 or more overnight. The agent's success is measured by validation bits per byte (`val_bpb`), where lower is better. If a code change improves this score, it's committed to a git branch; otherwise, it's discarded. The fixed time budget keeps experiments directly comparable on a given machine regardless of what the agent changes, so the score reflects purely the quality of the change.
The project saw immediate validation when Shopify CEO Tobi Lutke applied it to an internal model. Overnight, the agent ran 37 experiments and achieved a 19% improvement in the validation score. Lutke noted he learned more from observing the agent's reasoning than from months of following ML research.
This work is part of a broader shift towards agentic workflows, where AI systems can plan, act, and adapt to achieve goals with limited human supervision. These workflows differ from simple automation in their ability to make decisions and learn from feedback, and they are transforming processes in fields from customer support to complex scientific research.
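The keep-or-discard loop described above (fixed five-minute runs scored by `val_bpb`, with improvements committed and everything else thrown away) can be sketched in a few lines. This is a minimal illustration, not Karpathy's code: `run_search`, `Trial`, `bits_per_byte`, and `experiments_per_night` are hypothetical names, the `evaluate` callable stands in for "edit `train.py`, train for five minutes, read the score," and the scores in the demo are made-up numbers. The bits-per-byte helper uses the standard definition (total cross-entropy in bits divided by the number of bytes of text), which may differ in detail from how the project computes it.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Standard conversion: summed negative log-likelihood (nats) -> bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)


def experiments_per_night(hours: float, minutes_per_run: float = 5.0) -> int:
    """How many fixed-length runs fit in a time window, ignoring any overhead."""
    return int(hours * 60 // minutes_per_run)


@dataclass
class Trial:
    description: str  # what the agent changed (hypothetical free-text label)
    val_bpb: float    # validation bits per byte after the run; lower is better
    kept: bool        # True if the change beat the best score so far


def run_search(
    baseline_bpb: float,
    candidates: List[str],
    evaluate: Callable[[str], float],
) -> Tuple[float, List[Trial]]:
    """Greedy keep-or-discard loop over candidate changes."""
    best = baseline_bpb
    history: List[Trial] = []
    for description in candidates:
        # Stand-in for: apply the change, train for the fixed budget, measure.
        score = evaluate(description)
        kept = score < best
        if kept:
            best = score  # stand-in for committing the change to a git branch
        # A change that does not improve the score is simply not carried forward.
        history.append(Trial(description, score, kept))
    return best, history


if __name__ == "__main__":
    # Made-up scores purely for illustration.
    fake_scores = {"raise learning rate": 1.02, "add warmup": 0.97, "wider MLP": 0.99}
    best, history = run_search(1.00, list(fake_scores), fake_scores.__getitem__)
    for trial in history:
        status = "keep" if trial.kept else "discard"
        print(f"{trial.description}: {trial.val_bpb:.2f} ({status})")
    print("best val_bpb:", best)
    print("runs in 8 hours:", experiments_per_night(8))
```

In this sketch each candidate is compared against the best score so far, not against the original baseline, so later changes have to beat earlier wins. At five minutes per run, an eight-hour window fits 96 runs before any overhead.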
Karpathy, a co-founder of OpenAI and former Director of AI at Tesla, has a history of impactful work in deep learning and computer vision. He led the Autopilot vision team at Tesla and pioneered deep learning education at Stanford, a track record that lends significant credibility to this new open-source contribution.