Karpathy Open-Sources 'Autoresearch' for Solo ML

Andrej Karpathy has open-sourced 'Autoresearch', a 630-line Python tool that lets AI agents autonomously run machine learning experiments. It is designed to run on a single GPU, so indie developers can automate tasks such as hyperparameter tuning and model selection without large-scale cloud infrastructure.

Andrej Karpathy is a founding member of OpenAI and the former Director of AI at Tesla, where he led the computer vision team for Autopilot. His work also includes creating Stanford's first deep learning course, CS231n, which has been a foundational resource for many developers in the field.

The tool operates on a simple, continuous loop: the AI agent reads high-level instructions from a human-written Markdown file, programmatically modifies a Python training script, and then runs a fixed five-minute experiment. This fixed duration allows for roughly 12 directly comparable experiments per hour, or nearly 100 in an overnight session. Progress is tracked using git commits. The agent commits its code modifications only if the final validation metric, bits-per-byte (BPB), is lower than the previous best score, so only beneficial changes are retained. In early demonstrations, Karpathy showed the agent successfully reducing validation loss on its own.

This project differs from traditional Automated ML (AutoML) tools, which often focus on hyperparameter sweeps. 'Autoresearch' allows the agent to make arbitrary code modifications to the model architecture, optimizer, or training loop, going beyond simple parameter tuning.

Shopify CEO Tobi Lütke tested the framework on an internal model and reported a 19% improvement in validation scores after the agent ran 37 experiments overnight. The resulting 0.8B parameter model outperformed the previous 1.6B version it was designed to replace.

Karpathy describes this approach as "agentic engineering," a paradigm where the developer's role shifts from writing code to orchestrating AI agents that perform the implementation and iteration. The project itself is a minimalist distillation of his `nanochat` LLM training core, designed to be accessible to researchers without massive compute budgets.
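The keep-only-if-better loop and the throughput arithmetic described above can be illustrated with a minimal sketch. This is not Autoresearch's actual code: the names `KeepIfBetter`, `run_session`, and `experiments_per_session` are hypothetical, an in-memory list stands in for git commits, the scores are made up, and the 8-hour overnight session is an assumption.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

EXPERIMENT_MINUTES = 5  # fixed per-run budget stated in the article


def experiments_per_session(hours: float, minutes_per_run: int = EXPERIMENT_MINUTES) -> int:
    """Number of fixed-length runs that fit in a session of the given length."""
    return int(hours * 60 // minutes_per_run)


@dataclass
class KeepIfBetter:
    """Hypothetical stand-in for the commit-on-improvement rule.

    The real tool records progress as git commits; here an in-memory
    list of change ids plays that role.
    """
    best_bpb: float = math.inf
    kept: List[str] = field(default_factory=list)

    def consider(self, change_id: str, val_bpb: float) -> bool:
        # Keep the change only if validation bits-per-byte strictly improves.
        if val_bpb < self.best_bpb:
            self.best_bpb = val_bpb
            self.kept.append(change_id)
            return True
        return False  # otherwise the change is discarded


def run_session(
    proposals: Iterable[str],
    evaluate: Callable[[str], float],
    tracker: KeepIfBetter,
) -> KeepIfBetter:
    """Evaluate each proposed change in order and keep only the improvements."""
    for change_id in proposals:
        tracker.consider(change_id, evaluate(change_id))
    return tracker


if __name__ == "__main__":
    # Made-up validation BPB values, purely for illustration.
    fake_scores = {"baseline": 1.00, "wider-mlp": 0.97, "bad-lr": 1.05, "new-sched": 0.95}
    tracker = run_session(fake_scores, fake_scores.get, KeepIfBetter())
    print(tracker.kept)  # ['baseline', 'wider-mlp', 'new-sched']
    print(experiments_per_session(1), experiments_per_session(8))  # 12 96
```

Because every run gets the same five-minute budget, the scores are directly comparable, which is what makes a simple "lower than the previous best" test sufficient. Under the assumed 8-hour night, the arithmetic gives 96 runs, matching the article's "nearly 100".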
