A multi-agent system that automatically reads ML research papers, analyses their code, samples datasets, runs experiments, and writes a structured report to Notion — triggered by n8n and managed through a web UI.
- n8n sends a paper (title + arXiv URL) to the API via webhook
- You approve it in the web UI, choosing a model tier (Haiku / Sonnet)
- The pipeline runs automatically:
  - Downloads and reads the PDF, extracts key claims
  - Clones the GitHub repo (if found) and analyses the code
  - Searches HuggingFace Hub for datasets and downloads samples
  - Generates and runs a Python experiment script in Docker (with self-healing retries)
  - Writes a structured research memo to Notion
- You get a Notion page with summary, verdict, per-claim results, and experiment logs
```
n8n webhook → FastAPI → SQLite queue → LangGraph pipeline
                             │
     ┌───────────────────────┼───────────────────────┐
     ▼                       ▼                       ▼
paper_reader            code_analyst            data_agent
(PDF + Claude)      (git clone + Claude)    (HF Hub + Claude)
     └───────────────────────┼───────────────────────┘
                             ▼
                     experiment_runner
                (Claude → Docker → verdict)
                             │
                             ▼
                       report_writer
                         (→ Notion)
```
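The flow above can be sketched in plain Python. This is illustrative only: the real pipeline is a LangGraph `StateGraph` (see `orchestrator/graph.py`), and the agent signatures and state keys below are hypothetical stand-ins, not the project's actual API.

```python
# Each agent reads the shared state and returns an enriched copy.
# The real agents call Claude; these stubs just mark their stage.
def paper_reader(state):      return {**state, "claims": ["claim A"]}
def code_analyst(state):      return {**state, "code_notes": "uses PyTorch"}
def data_agent(state):        return {**state, "dataset": "sampled rows"}
def experiment_runner(state): return {**state, "verdict": "partial"}
def report_writer(state):     return {**state, "report": "notion page"}

def run(state):
    # fan-out: the three analysts each contribute to the shared state
    for agent in (paper_reader, code_analyst, data_agent):
        state = agent(state)
    # join: experiment_runner sees all three results, then the report is written
    return report_writer(experiment_runner(state))
```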
| Layer | Tech |
|---|---|
| Orchestration | LangGraph (StateGraph + SQLite checkpointer) |
| LLM | Claude Haiku 4.5 / Sonnet 4.6 (Anthropic) |
| API | FastAPI + uvicorn |
| UI | Vanilla JS single-page app |
| Experiment sandbox | Docker (CPU-only, self-healing retry loop) |
| Report destination | Notion API |
| Automation trigger | n8n webhook |
| Package manager | uv |
Requirements: Python 3.11+, Docker, uv
```
git clone <repo>
cd agentic_research
uv sync
```

Create a `.env` file:

```
ANTHROPIC_API_KEY=sk-ant-...
NOTION_API_KEY=secret_...
NOTION_RESEARCH_DB_ID=...
NOTION_DAILY_PAGE_PARENT_ID=...   # optional
HF_TOKEN=hf_...                   # optional, for gated datasets
STORAGE_ROOT=sandbox
DB_PATH=db/research_queue.db
```

Start the server:

```
uv run uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
```

Open http://localhost:8000 for the queue UI.
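With the server running, n8n posts papers to `POST /ingest`. A minimal manual submission can be sketched with the standard library; note the `title` / `arxiv_url` field names are an assumption here, so check `api/routes/ingest.py` for the actual request schema.

```python
import json
from urllib import request

API = "http://localhost:8000"  # the FastAPI server started above

def build_request(title: str, arxiv_url: str) -> request.Request:
    # Field names are an assumption; the real schema lives in api/routes/ingest.py.
    body = json.dumps({"title": title, "arxiv_url": arxiv_url}).encode()
    return request.Request(
        f"{API}/ingest",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def submit(title: str, arxiv_url: str) -> dict:
    # Sends the webhook payload and returns the parsed JSON response.
    with request.urlopen(build_request(title, arxiv_url)) as resp:
        return json.loads(resp.read())
```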
```
agents/
    paper_reader.py          # PDF download, text extraction, Claude analysis
    code_analyst.py          # git clone, file selection, code analysis
    data_agent.py            # HuggingFace dataset search and sampling
    experiment_runner.py     # script generation, Docker execution, verdict
    report_writer.py         # Notion page creation, queue update, cleanup
api/
    main.py                  # FastAPI app, startup cleanup, cache-size endpoint
    routes/
        ingest.py            # POST /ingest — n8n webhook receiver
        queue.py             # queue management, pipeline runner, log streaming
orchestrator/
    graph.py                 # LangGraph StateGraph definition and routing
    state.py                 # PaperResearchState dataclass
notion/
    page_builder.py          # Notion block builders, research memo layout
    db_manager.py            # Notion DB row and page creation
    client.py                # Notion API client wrapper
tools/
    pdf_tools.py             # PDF download and text extraction
    git_tools.py             # GitHub URL detection, repo cloning
    dataset_tools.py         # HuggingFace search and dataset sampling
    claude_utils.py          # Anthropic API wrapper with retry logic
runners/
    local_docker_runner.py   # Docker sandbox execution
    base_runner.py           # RunResult dataclass
db/
    queue_manager.py         # SQLite queue CRUD
utils/
    logger.py                # Structured progress logger
    cleanup.py               # Post-run and age-based cache cleanup
ui/index.html                # Queue management UI (3-tab, live logs, cancel)
config.py                    # Pydantic settings, loaded from .env
```
- Never-raises agents — every agent catches all exceptions, appends to `state.errors`, and returns a degraded-but-valid state. The pipeline always reaches `report_writer`.
- Self-healing experiments — up to 3 attempts: generate → syntax check → Docker run → fix with error context → retry.
- Cost tracking — every Claude call's token cost is accumulated on state and surfaced in the Notion report and UI.
- Cache cleanup — repos are deleted immediately after use (the largest artifact); PDFs and datasets are kept for 7 days, then purged on server startup.
- Claim verdicts — the experiment interpretation call returns per-claim `verified` / `partial` / `failed` / `not_tested` results at no extra cost, rendered as a scannable table in Notion.
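The self-healing loop can be sketched as follows. This is a simplified stand-in: `generate_script` and `fix_script` are hypothetical placeholders for the Claude calls in `agents/experiment_runner.py`, and the real runner executes the script inside a Docker sandbox rather than locally.

```python
import subprocess
import sys

MAX_ATTEMPTS = 3

def generate_script(context: str) -> str:
    # Stand-in for the Claude call that drafts the experiment script.
    return "print('experiment ok')"

def fix_script(script: str, error: str) -> str:
    # Stand-in for the Claude call that repairs the script given the error text.
    return script

def run_experiment(context: str) -> tuple[bool, str]:
    script = generate_script(context)
    for attempt in range(MAX_ATTEMPTS):
        try:
            compile(script, "<experiment>", "exec")       # 1. syntax check
        except SyntaxError as exc:
            script = fix_script(script, str(exc))         # repair and retry
            continue
        # 2. run (the real runner uses a Docker sandbox, not the host Python)
        proc = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True, text=True, timeout=600,
        )
        if proc.returncode == 0:
            return True, proc.stdout
        script = fix_script(script, proc.stderr)          # 3. fix with error context
    return False, f"gave up after {MAX_ATTEMPTS} attempts"
```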
| Variable | Required | Description |
|---|---|---|
| `ANTHROPIC_API_KEY` | Yes | Anthropic API key |
| `NOTION_API_KEY` | Yes | Notion integration token |
| `NOTION_RESEARCH_DB_ID` | Yes | Notion database ID for the research log |
| `NOTION_DAILY_PAGE_PARENT_ID` | No | Parent page for daily digest links |
| `HF_TOKEN` | No | HuggingFace token (gated datasets) |
| `RUNNER_BACKEND` | No | `docker` (default) or `daytona` |
| `EXPERIMENT_TIMEOUT_SECONDS` | No | Docker run timeout in seconds (default 600) |
| `MAX_DATASET_SAMPLE_MB` | No | Dataset sample size limit in MB (default 100) |
| `STORAGE_ROOT` | No | Cache directory (default `sandbox`) |
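`config.py` loads these via Pydantic settings; a dependency-free sketch of the equivalent loading logic, with required keys raising on absence and the defaults from the table above:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    # Covers the required keys plus the defaulted ones from the table above;
    # the real field set is defined in config.py.
    anthropic_api_key: str
    notion_api_key: str
    notion_research_db_id: str
    runner_backend: str = "docker"
    experiment_timeout_seconds: int = 600
    max_dataset_sample_mb: int = 100
    storage_root: str = "sandbox"

def load_settings(env=os.environ) -> Settings:
    return Settings(
        anthropic_api_key=env["ANTHROPIC_API_KEY"],        # KeyError if missing
        notion_api_key=env["NOTION_API_KEY"],
        notion_research_db_id=env["NOTION_RESEARCH_DB_ID"],
        runner_backend=env.get("RUNNER_BACKEND", "docker"),
        experiment_timeout_seconds=int(env.get("EXPERIMENT_TIMEOUT_SECONDS", "600")),
        max_dataset_sample_mb=int(env.get("MAX_DATASET_SAMPLE_MB", "100")),
        storage_root=env.get("STORAGE_ROOT", "sandbox"),
    )
```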
