Karpathy Open-Sources 'Autoresearch' for ML Experimentation
Andrej Karpathy has open-sourced “Autoresearch,” a 630-line Python tool enabling AI agents to autonomously run ML experiments on a single GPU. This aligns with industry trends toward agentic research loops and self-improving ML systems. Mastering experiment automation and data pipeline orchestration is becoming a key differentiator for engineers.
Andrej Karpathy, formerly Director of AI at Tesla and a founding member of OpenAI, recently released "Autoresearch," a lean Python tool that lets AI agents run machine learning experiments autonomously on a single GPU. Karpathy has a history of influential work in AI, including creating CS231n, Stanford's first deep learning course.
Autoresearch automates the iterative cycle of ML experimentation. The AI agent modifies the training code, runs a short experiment (5 minutes), evaluates the result, and decides whether to keep the change. The result is an autonomous loop in which the agent acts like a junior researcher, improving the model one change at a time. The human provides high-level instructions in a Markdown file, while the agent edits a Python training script.
The system uses validation bits per byte (BPB) as its primary metric, and only changes that improve it are committed. A lower BPB indicates a more accurate model. In the initial runs Karpathy demonstrated, the agent reduced validation BPB from 1.0 to 0.97 through autonomous code iteration. Researchers could potentially wake up to hundreds of completed experiments.
Autoresearch is a stripped-down version of the training core of Karpathy's nanochat LLM project. Karpathy noted that code tweaks discovered by the agent were integrated back into the broader nanochat framework, which suggests the tool can find optimizations that carry over to larger-scale systems. This fits the broader trend toward self-improving AI systems.
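The keep-or-discard loop described above can be illustrated with a minimal Python sketch. This is not Autoresearch's actual code. A small config dictionary stands in for the training script, and `toy_evaluate`, `toy_propose`, and `keep_or_revert_loop` are hypothetical names with made-up scoring, used only to show the control flow: propose a change, score it, and keep it only if the lower-is-better metric improves. The `bits_per_byte` helper shows the standard conversion from a summed negative log-likelihood in nats to bits per byte, assuming the loss is summed over the validation text.

```python
import math
import random


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (nats) to bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_nll_nats / (math.log(2) * total_bytes)


def toy_evaluate(config: dict) -> float:
    """Hypothetical stand-in for a short training run.

    Returns a made-up, lower-is-better score that is smallest near lr = 1e-3.
    """
    return 0.9 + 0.1 * (math.log10(config["lr"]) + 3.0) ** 2


def toy_propose(config: dict, rng: random.Random) -> dict:
    """Hypothetical stand-in for the agent's edit: nudge one hyperparameter."""
    return {**config, "lr": config["lr"] * math.exp(rng.uniform(-0.5, 0.5))}


def keep_or_revert_loop(config, evaluate, propose, n_experiments, seed=0):
    """Run n_experiments proposals, keeping only those that lower the score."""
    rng = random.Random(seed)
    best_config, best_score = config, evaluate(config)
    history = [best_score]  # best score after each experiment
    for _ in range(n_experiments):
        candidate = propose(best_config, rng)
        score = evaluate(candidate)
        if score < best_score:
            # Improvement: keep the change (the analogue of committing it).
            best_config, best_score = candidate, score
        # Otherwise the candidate is discarded and the previous best stays.
        history.append(best_score)
    return best_config, best_score, history


if __name__ == "__main__":
    start = {"lr": 1e-2}
    best, score, history = keep_or_revert_loop(start, toy_evaluate, toy_propose, 50)
    print(f"baseline {history[0]:.4f} -> best {score:.4f} at lr={best['lr']:.2e}")
```

Because a candidate is accepted only when its score is strictly lower, the recorded best score can never get worse from one experiment to the next. In a real setup the evaluation step would be a time-boxed training run and the proposal step an agent's code edit, both of which are far noisier than this toy.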