Explainer · Updated March 21, 2026


What Is Autoresearch? How Karpathy's Five-Minute Research Loop Works

A clear explanation of what autoresearch is, who built it, how the loop works, and how to run it.

Published March 21, 2026 · 3 min read · Definition + setup

The short definition

Autoresearch is Andrej Karpathy's experiment in letting an AI agent edit a real language-model training file, run a short experiment, evaluate the result, and repeat the process. The repo is deliberately compact: one main file to change, one fixed time budget, and one metric that decides whether the new idea survives.

That narrowness is the point. Instead of asking an agent to manage a sprawling research stack, autoresearch gives it a small loop it can understand, stress, and improve.

Who created it, and when did it appear?

The repository lives at karpathy/autoresearch on GitHub. In the README, Karpathy frames it as a step toward autonomous research organizations and dates the repo's introduction to March 2026.

That matters because autoresearch is not just a loose label for an idea. It is a runnable repo, tied directly to Karpathy, and built around repeated experimentation on a small LLM training setup.

How does the loop actually work?

The repo reduces the system to three important files:

  • prepare.py handles constants, data preparation, tokenizer training, and runtime utilities.
  • train.py is the single training file the agent is allowed to modify.
  • program.md is the human-written instruction layer that tells the agent how the research organization should behave.

From there, the loop is straightforward:

  1. The agent studies the instruction context in program.md.
  2. It proposes a change in train.py.
  3. It runs a time-boxed training experiment.
  4. It checks whether the result improved on the target metric.
  5. It keeps the better idea or discards it and tries again.
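The five steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the repo's code: the real agent edits train.py directly, and every name here (run_experiment, research_loop, the candidate dict) is hypothetical.

```python
import random

def run_experiment(candidate, budget_s):
    """Stand-in for a time-boxed training run that returns val_bpb.

    A real run would train until the budget expires and evaluate;
    here we fake a score so the loop structure is visible.
    """
    return random.uniform(1.0, 2.0)

def research_loop(n_trials, budget_s=5 * 60):
    best, best_bpb = None, float("inf")
    for _ in range(n_trials):
        # "Propose a change" -- a hypothetical hyperparameter tweak.
        candidate = {"lr": random.choice([3e-4, 6e-4, 1e-3])}
        bpb = run_experiment(candidate, budget_s)  # time-boxed experiment
        if bpb < best_bpb:                         # lower val_bpb is better
            best, best_bpb = candidate, bpb        # keep the better idea
        # otherwise discard it and try again
    return best, best_bpb
```

The survival rule is the whole mechanism: a change persists only if it beats the current best under the same budget.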

This is what makes the project feel different from ordinary automation scripts. It is not only running jobs. It is iterating on research choices.

Why does the five-minute budget matter so much?

The fixed wall-clock budget is one of the strongest design choices in the repo. Every experiment gets the same five-minute training window. That means ideas are judged under the same constraint instead of winning simply because they ran longer.

In practice, that does two useful things:

  • It keeps overnight experimentation dense, because the agent can fit many trials into a limited window.
  • It makes short-run comparisons cleaner across architecture or hyperparameter changes.
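The wall-clock cap can be implemented with a loop that checks elapsed time after every step. This is an assumed pattern, not the repo's exact code:

```python
import time

BUDGET_S = 5 * 60  # every experiment gets the same five-minute window

def train_until_budget(step_fn, budget_s=BUDGET_S):
    """Run training steps until the fixed wall-clock budget elapses.

    Because the budget is wall-clock rather than a step count, a slower
    candidate simply fits fewer steps -- ideas are judged under the same
    constraint instead of winning by running longer.
    """
    start = time.monotonic()
    steps = 0
    while time.monotonic() - start < budget_s:
        step_fn()  # one optimizer step in a real training loop
        steps += 1
    return steps
```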

The README also notes that the main evaluation metric is validation bits per byte (val_bpb), where lower is better.
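Bits per byte normalizes cross-entropy loss by the raw byte length of the evaluated text, which makes it comparable across tokenizers. A minimal sketch of the conversion, assuming you track total validation loss in nats and the total byte count:

```python
import math

def val_bpb(total_loss_nats: float, total_bytes: int) -> float:
    """Bits per byte = total cross-entropy (converted to bits) / total bytes."""
    return total_loss_nats / math.log(2) / total_bytes

# e.g. an average loss of 1.5 nats/token over 1000 tokens of text
# that occupies 4000 raw bytes:
print(val_bpb(1.5 * 1000, 4000))  # roughly 0.54 bits per byte
```

Lower is better: a smaller val_bpb means the model compresses the validation text more effectively.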

What makes autoresearch different from a normal training repo?

A standard training repository usually assumes a human researcher is doing the thinking while the code handles execution. Autoresearch moves part of that judgment into the loop itself.

The project is still intentionally constrained:

  • one GPU
  • one main file to edit
  • one simple metric
  • one short budget per run

That constraint is not a weakness. It is what makes the experiment inspectable. You can review the diffs, understand why a change happened, and decide whether the loop is learning real research instincts or just wandering.

Can you run it yourself?

Yes, as long as your environment matches the repo's expectations closely enough. The README lists:

  • a single NVIDIA GPU
  • Python 3.10+
  • uv
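Before running the quick-start commands, you can sanity-check those three requirements. This preflight helper is not part of the repo, just a best-effort sketch (it checks for the NVIDIA driver via nvidia-smi on PATH rather than querying the GPU directly):

```python
import shutil
import sys

def preflight():
    """Best-effort check of the repo's stated requirements.

    Returns a dict of booleans: Python 3.10+, the uv tool on PATH,
    and an NVIDIA driver (detected via nvidia-smi on PATH).
    """
    return {
        "python_3_10_plus": sys.version_info >= (3, 10),
        "uv_installed": shutil.which("uv") is not None,
        "nvidia_driver": shutil.which("nvidia-smi") is not None,
    }

if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{check}: {'ok' if ok else 'missing'}")
```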

The quick-start commands are:

uv sync
uv run prepare.py
uv run train.py

After the manual setup works, the autonomous loop can take over. Karpathy also notes that smaller machines may need different defaults and points people toward lighter forks or smaller datasets.

What should a new reader take away?

Autoresearch is best understood as an agent-native research loop rather than a generic AI headline. The interesting part is not just that an agent writes code. The interesting part is that the code writing sits inside a narrow experiment cycle with evaluation, rejection, and survival.

That is why the repo attracted attention so quickly in March 2026. It turns a broad idea, autonomous research, into something compact enough to run, inspect, and argue about.

Where to learn more

Start with the original sources: the karpathy/autoresearch repository on GitHub and its README.

If you want the shortest mental model, use this: an agent edits, trains, evaluates, and repeats inside a tightly controlled sandbox.

Next step

Go deeper in the repo for code, issues, forks, and the latest changes to the research loop.