Describe a bug, point it at a repo — patch finds the faulty code, fixes it, and proves the fix with your tests.
patch is a small, autonomous SWE-agent. Give it a repository and a bug
description; it explores the code, makes a minimal edit, runs the test suite, and
iterates until the tests pass — then hands you a clean git diff. It works on a
throwaway copy of your repo, so your code is never touched.
Fixing a bug isn't text generation — it requires understanding the code: finding the right file, making a change that doesn't break everything else, and verifying it. An LLM that just prints a plausible-looking snippet can't tell you whether it actually works.
- Locates the faulty code (
list_files,search,read_file). - Edits it with a minimal, exact change (
edit_file/write_file). - Runs the tests in an isolated copy of the repo (
run_tests). - Iterates: reads the failure output, fixes, and tries again — bounded by
MAX_STEPSso it always stops. - Returns a verified patch: a unified diff plus a summary of what changed and why, only counted as solved when the tests actually pass.
An agent that delivers test-verified fixes — not just suggestions — is the difference between "looks right" and "is right." Because every fix is gated on the test suite, you can trust the diff or throw it away with confidence.
make benchmark runs patch against a bundled toy repo (benchmark/toy_repo) with
8 seeded bugs, each shipped with a test that fails at the start. For every
bug, patch gets a fresh copy + the bug description, and we check whether its test
goes green. Fix rate = bugs fixed / 8.
The chart above is a schematic placeholder — running make benchmark with your
key overwrites it with real per-bug results (assets/benchmark_results.json). The
benchmark needs a key because it runs the agent; the test suite itself doesn't.
The seeded bugs cover off-by-one (chunk), wrong base case (factorial), operator
/ ordering mistakes (clamp, fizzbuzz), a wrong constant (temperature), and
subtle semantics (dedupe order, word_count whitespace, case-insensitive
palindrome).
bug description + repo
│
▼
┌─────────────┐ copies repo to a temp dir, git baseline commit
│ Workspace │ (original repo is never modified)
└──────┬──────┘
▼
┌─────────────────────────── agent loop (bounded by MAX_STEPS) ──────────────┐
│ LLM emits ONE JSON action -> a tool runs -> observation goes back │
│ │
│ list_files · read_file · search · edit_file · write_file · run_tests │
│ │
│ locate ──> edit ──> run_tests ──(red)──> read error ──> edit ──> ... │
│ └────────────────(green)──────────> finish │
└─────────────────────────────────────────────────────────────────────────────┘
▼
final unified diff + summary + solved? (tests pass)
| Piece | Choice | Why |
|---|---|---|
| Agent loop | explicit JSON-action loop, capped at MAX_STEPS | inspectable, always terminates |
| Tools (ACI) | list/read/search/edit/write/run_tests | the minimal surface to fix a bug |
| Isolation | temp copy + git baseline diff | never mutate the real repo |
| LLM | Anthropic Claude (primary) · OpenAI (fallback) | bring your own key |
| Safety | path-escape guard, test timeout, soft tool errors | degrades, never crashes |
The LLM is reached only through a tiny complete() interface, so the entire loop
is testable without an API key — the suite scripts a fake model and runs the real
tools, workspace, and diff end-to-end.
Bug: factorial(0) returns 0 and factorial(5) returns 0.
--- a/toykit/math_ops.py
+++ b/toykit/math_ops.py
@@ def factorial(n):
if n < 0:
raise ValueError("factorial is undefined for negative numbers")
- result = 0
+ result = 1
for i in range(1, n + 1):
result *= i
return result[1] read_file — locate factorial
[2] edit_file — initialise the accumulator to 1
[3] run_tests -> Tests PASSED.
[4] finish — factorial must start its product at 1, not 0
Result: SOLVED (tests pass) in 4 steps.
Screenshot placeholder — add a capture of patch running here.
assets/screenshot.png
# 1. Clone & install
git clone https://github.com/alexandremourin/patch.git
cd patch
make install # or: python3.11 -m venv .venv && .venv/bin/pip install -r requirements.txt
# 2. Add YOUR key
cp .env.example .env
# edit .env: ANTHROPIC_API_KEY=sk-ant-... (OPENAI_API_KEY also works)
# 3. Fix a bug in any repo (patch works on a temp copy; your repo is safe)
.venv/bin/python -m patch.cli /path/to/repo "describe the bug" --test "python -m pytest -q"
# or: make run REPO=/path/to/repo TASK="factorial(0) should be 1"
# 4. Benchmark the fix rate, and run the tests
make benchmark # needs your key
make test # runs WITHOUT a key (the LLM is mocked)Without a key the CLI prints a clear message telling you to add one.
patch was inspired by princeton-nlp/SWE-agent — studied, then rebuilt as a small, focused, inspectable agent. What's mine:
- A minimal, bounded agent loop instead of a large framework: one JSON action per
step, a fixed
MAX_STEPScap, and a hard test timeout, so runs always terminate at predictable cost. - A tight agent-computer interface (six tools) with soft failures — a bad edit, missing file, unknown tool, or hanging test returns an error the agent can read and recover from, never a crash.
- Safe isolation: the agent works on a temp copy with a git baseline; the final
patch is a real
git diff, and the original repo is guaranteed untouched (with a path-escape guard on every tool). - A reproducible fix-rate benchmark the original doesn't ship: 8 seeded bugs with failing tests and a one-command harness that reports how many the agent repairs.
- A fully mockable design so the whole loop — locate, edit, run tests, diff — is covered by tests that need no API key.
MIT © 2026 Alexandre Mourin.
Educational project. Inspired by princeton-nlp/SWE-agent; the agent loop, tools, workspace isolation, benchmark, and CLI here are original work.
