Andrej Karpathy Releases 'Autoresearch' for AI Agents
Andrej Karpathy has open-sourced "Autoresearch," a roughly 630-line Python tool that lets AI agents run ML experiments autonomously overnight on a single GPU. The agent evaluates each result and proposes the next experiment, automating the research loop for small teams or solo engineers. By making that loop cheap to run, the tool could broaden access to model prototyping and optimization.
Andrej Karpathy's career has consistently focused on making AI accessible, from co-creating Stanford's popular deep learning course, CS231n, to his work at OpenAI, which he joined as a founding member. His tenure as Director of AI at Tesla, where he led the computer vision team for Autopilot, gave him deep experience with production machine learning systems.
"Autoresearch" is a practical embodiment of Karpathy's vision for "agentic" AI workflows. An AI agent modifies a single Python training script, `train.py`, then runs a fixed five-minute experiment to measure the effect. If the validation loss improves, the agent commits the change to a git repository; if not, it reverts the change and tries a different approach.
The strict five-minute limit on each experiment is a key design choice. At roughly 12 experiments per hour, it allows a high number of iterations, potentially over 100 in an overnight session, and it keeps comparisons between model configurations fair, because every candidate gets the same training budget on the same hardware. In one of Karpathy's own runs, an agent achieved 29 improvements over 276 experiments, a keep rate of roughly one in ten.
The tool's minimalism is intentional: the entire codebase is around 630 lines, small enough to fit within the context window of modern LLMs, so the agent can reason about all of the code it is modifying. Karpathy has already integrated optimizations discovered by "Autoresearch" back into his larger `nanochat` framework, which suggests the findings carry over to larger-scale systems.
The project taps into a larger trend of using autonomous agents for scientific research, where AI is not just a tool for analysis but an active collaborator that generates hypotheses and designs experiments. The approach is also being explored in fields such as drug discovery to accelerate new findings.
For the San Francisco startup ecosystem, where over 900 AI startups have raised nearly $90 billion in funding, tools like "Autoresearch" are particularly relevant. They lower the barrier to entry for sophisticated model development, enabling small teams to compete with larger, better-funded research labs. The AI sector already accounts for 30% of new office leases in the city, a sign of a robust and growing landscape for such innovations.
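
The commit-or-revert loop described above can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the "Autoresearch" repository: the names `keep_or_revert`, `SearchLog`, `run_experiment`, `commit`, and `revert` are hypothetical, and the callbacks stand in for the real work of editing `train.py`, training for the fixed five-minute budget, and calling git.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SearchLog:
    """Record of which experiments were kept (hypothetical helper type)."""
    best_loss: float
    kept: List[int] = field(default_factory=list)
    reverted: List[int] = field(default_factory=list)


def keep_or_revert(
    baseline_loss: float,
    n_experiments: int,
    run_experiment: Callable[[int], float],
    commit: Callable[[int], None],
    revert: Callable[[int], None],
) -> SearchLog:
    """Greedy loop: keep a change only if validation loss strictly improves.

    run_experiment(i) is assumed to apply the agent's i-th modification,
    train for the fixed time budget, and return the validation loss.
    commit(i) and revert(i) are assumed to wrap the version-control steps.
    """
    log = SearchLog(best_loss=baseline_loss)
    for i in range(n_experiments):
        loss = run_experiment(i)
        if loss < log.best_loss:
            commit(i)              # improvement: keep the change
            log.best_loss = loss
            log.kept.append(i)
        else:
            revert(i)              # no improvement: discard the change
            log.reverted.append(i)
    return log


if __name__ == "__main__":
    # Made-up validation losses standing in for five real experiments.
    fake_losses = [1.02, 0.97, 0.99, 0.95, 0.96]
    log = keep_or_revert(
        baseline_loss=1.00,
        n_experiments=len(fake_losses),
        run_experiment=lambda i: fake_losses[i],
        commit=lambda i: None,
        revert=lambda i: None,
    )
    print(log.kept, log.best_loss)  # [1, 3] 0.95
```

Because each candidate is compared only against the best loss so far, the loop is a simple greedy search: most attempts are discarded, which matches the low keep rate reported for Karpathy's own run.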