What if an agent could improve a model overnight while you sleep, committing only the changes that worked? That's autoresearch, a term coined by karpathy/autoresearch.
This list collects tools, codebases, and papers in the same spirit: AI agents that drive scientific progress autonomously, from ML training loops to open-ended discovery.
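At its core, the loop described above is greedy hill climbing: propose a change, evaluate it, and keep it only if the metric improves. The Python sketch below illustrates that loop under deliberately toy assumptions. `Config`, `evaluate`, and `propose` are hypothetical stand-ins: a random perturbation replaces an agent's code edit, and a synthetic loss replaces a real training run. It is not code from karpathy/autoresearch.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:
    # Hypothetical knobs standing in for "the training code" an agent would edit.
    lr: float = 1e-2
    width: int = 64


def evaluate(cfg: Config) -> float:
    """Toy stand-in for a short training run; returns a validation loss (lower is better)."""
    # Synthetic loss surface with its minimum at lr=3e-3, width=256.
    return (cfg.lr - 3e-3) ** 2 * 1e4 + ((cfg.width - 256) / 256) ** 2


def propose(cfg: Config, rng: random.Random) -> Config:
    """Toy stand-in for an agent's proposed edit: perturb one knob at random."""
    if rng.random() < 0.5:
        return replace(cfg, lr=cfg.lr * rng.choice([0.5, 0.8, 1.25, 2.0]))
    return replace(cfg, width=max(8, cfg.width + rng.choice([-32, 32, 64])))


def autoresearch_loop(steps: int, seed: int = 0) -> tuple[Config, float, list[float]]:
    rng = random.Random(seed)
    best = Config()
    best_loss = evaluate(best)
    history = [best_loss]  # loss of the kept ("committed") version after each step
    for _ in range(steps):
        candidate = propose(best, rng)
        loss = evaluate(candidate)
        if loss < best_loss:
            # Keep ("commit") only changes that improve the metric.
            best, best_loss = candidate, loss
        # Otherwise the candidate is discarded, the analogue of reverting the edit.
        history.append(best_loss)
    return best, best_loss, history


if __name__ == "__main__":
    cfg, loss, hist = autoresearch_loop(steps=200)
    print(cfg, round(loss, 4))
```

Because rejected candidates are simply dropped, the kept loss can never get worse from one step to the next; a real setup would also need to guard against noisy evaluations, which this toy ignores.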
- karpathy/autoresearch — reference implementation in which agents repeatedly propose code changes to a small GPT training setup (nanochat), train it, and keep only the changes that improve results; community forks exist for macOS, MLX, and Windows/RTX
- allenai/autodiscovery — MCTS-based hypothesis search and verification for open-ended scientific discovery, guided by Bayesian surprise (AllenAI, NeurIPS 2025) [PDF]
- SakanaAI/AI-Scientist-v2 — end-to-end system that uses agentic tree search to generate ideas, run experiments, and write papers; produced the first AI-generated paper accepted at a peer-reviewed workshop (SakanaAI) [PDF] [Blog]
- SakanaAI/AI-Scientist — covers the full research loop, from ideation and coding to experimentation and simulated peer review, across ML subfields (SakanaAI) [PDF] [Blog]
- NoviScl/Automated-AI-Researcher — builds an automated executor that implements LLM-generated ideas and runs large-scale GPU experiments, then uses evolutionary search and RL to learn from the results (Stanford) [PDF]
- NoviScl/AI-Researcher — paper series benchmarking LLM research ideation against expert NLP researchers: first comparing LLM-generated and human-written ideas in blind review [PDF], then having researchers execute those ideas as full projects [PDF] (Stanford)
- google-deepmind/simply — minimal JAX codebase for frontier LLM research, designed for both humans and agents to propose ideas, run experiments, and iterate autonomously (Google DeepMind)
- google-deepmind/superhuman — Aletheia, a Gemini-powered math research agent that autonomously solved several open Erdős conjectures and generated complete research papers without human intervention (Google DeepMind) [PDF] [Blog]
- bogoconic1/Qgentic-AI — automated ML competition stack for Kaggle in which Researcher and Developer agents explore data, generate and run code, and iteratively refine solutions
- ltjed/freephdlabor — multiagent framework that automates the full research lifecycle, from hypothesis generation to paper writing, with agents customizable per domain [PDF]
- K-Dense-AI/claude-scientific-skills — collection of Agent Skills that give agents access to scientific databases, analysis tools, and domain workflows
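The allenai/autodiscovery entry mentions Bayesian surprise. One standard formulation models the belief that a hypothesis holds as a Beta distribution and scores new evidence by the KL divergence from prior to posterior belief, KL(posterior || prior). The Python sketch below assumes that formulation with simple supporting/refuting counts. The function names are hypothetical, and this is neither the repository's implementation nor necessarily its exact surprise criterion.

```python
import math


def digamma(x: float) -> float:
    """Digamma psi(x) for x > 0, via recurrence plus the asymptotic series."""
    result = 0.0
    # Shift x upward using psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_kl(a1: float, b1: float, a2: float, b2: float) -> float:
    """KL( Beta(a1, b1) || Beta(a2, b2) ) in nats (closed form)."""
    return (
        log_beta(a2, b2)
        - log_beta(a1, b1)
        + (a1 - a2) * digamma(a1)
        + (b1 - b2) * digamma(b1)
        + (a2 - a1 + b2 - b1) * digamma(a1 + b1)
    )


def bayesian_surprise(
    prior: tuple[float, float], supporting: int, refuting: int
) -> tuple[tuple[float, float], float]:
    """Hypothetical helper: conjugate Beta update, then surprise = KL(posterior || prior)."""
    a, b = prior
    posterior = (a + supporting, b + refuting)
    return posterior, beta_kl(posterior[0], posterior[1], a, b)


if __name__ == "__main__":
    prior = (2.0, 2.0)  # weak prior belief centered on 0.5
    for supporting, refuting in [(1, 1), (9, 1)]:
        post, surprise = bayesian_surprise(prior, supporting, refuting)
        print(post, round(surprise, 3))
```

Balanced evidence barely moves a Beta(2, 2) prior, so it scores low, while lopsided evidence shifts the belief substantially and scores higher.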
Contributions are welcome via pull request.