Karpathy Open-Sources 'Autoresearch' Tool
Andrej Karpathy just open-sourced 'Autoresearch,' a 630-line Python tool that lets AI agents autonomously run machine learning experiments. The tool is designed for single GPUs, making it a powerful playground for hands-on optimization and automated research without massive compute resources.
Andrej Karpathy's release of 'Autoresearch' follows his February 2024 departure from OpenAI, where he was a founding member. After leaving, he said he intended to focus on personal projects, a move that echoes his earlier departure from Tesla in 2022.
The new tool is a minimalist 630-line Python framework derived from his 'nanochat' project, designed to let an AI agent conduct machine learning research autonomously. A human provides high-level instructions in a Markdown file, and the agent uses them to modify a Python training script. Each experiment is strictly capped at five minutes on a single GPU. The agent commits a code change only if it improves the model's performance, measured as a lower bits-per-byte (BPB) score on a validation dataset. In initial tests, Karpathy showed the agent reducing validation loss on its own.
This project embodies Karpathy's concept of "agentic engineering," in which the developer's role shifts from writing code to orchestrating AI agents that perform the low-level work. It is part of a series of minimalist, educational projects he has released, including 'nanoGPT' and 'llm.c', which focus on training language models from scratch with simple, readable code.
The tool has already demonstrated practical value. Shopify CEO Tobi Lütke reported using the 'Autoresearch' loop to improve a model's performance by 19%, an example of agent-driven optimization yielding significant gains without massive computational resources.
'Autoresearch' is positioned as an experimental tool, not production software, but it gives practitioners a place for hands-on learning and automated research. With each run capped at five minutes, it can complete approximately 100 experiments in an eight-hour overnight session.
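The keep-if-better rule and the experiment budget described above can be illustrated with a short Python sketch. This is not code from 'Autoresearch' itself: the names `ResearchLoop`, `consider`, `bits_per_byte`, and `experiments_per_session` are hypothetical, and the BPB values in the usage lines are made-up placeholders, not measured results. The sketch assumes BPB is derived from a summed negative log-likelihood in nats, which is a common convention but may differ from the tool's own calculation.

```python
import math
from dataclasses import dataclass, field

TIME_BUDGET_S = 5 * 60  # per-experiment cap stated in the article


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (nats) to bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)


def experiments_per_session(hours: float, budget_s: int = TIME_BUDGET_S) -> int:
    """Upper bound on runs per session, ignoring agent and startup overhead."""
    return int(hours * 3600 // budget_s)


@dataclass
class ResearchLoop:
    """Hypothetical stand-in for the accept/reject step of the agent loop."""
    best_bpb: float
    history: list = field(default_factory=list)

    def consider(self, label: str, candidate_bpb: float) -> bool:
        """Keep a change only if it strictly lowers validation BPB."""
        kept = candidate_bpb < self.best_bpb
        if kept:
            self.best_bpb = candidate_bpb  # the point where a real loop would commit
        self.history.append((label, candidate_bpb, kept))
        return kept


# Placeholder values for illustration only; not real experiment results.
loop = ResearchLoop(best_bpb=1.00)
for label, bpb in [("wider MLP", 0.98), ("higher LR", 1.03), ("longer warmup", 0.97)]:
    loop.consider(label, bpb)

print(loop.best_bpb, experiments_per_session(8))
```

With a strict five-minute cap, an eight-hour session allows at most 96 runs, which is consistent with the article's figure of approximately 100. In a real setup the rejected change would be reverted before the next attempt, so every experiment starts from the best known script.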