patch

Describe a bug, point it at a repo — patch finds the faulty code, fixes it, and proves the fix with your tests.

patch is a small, autonomous SWE-agent. Give it a repository and a bug description; it explores the code, makes a minimal edit, runs the test suite, and iterates until the tests pass — then hands you a clean git diff. It works on a throwaway copy of your repo, so your code is never touched.

The problem

Fixing a bug isn't text generation — it requires understanding the code: finding the right file, making a change that doesn't break everything else, and verifying it. An LLM that just prints a plausible-looking snippet can't tell you whether it actually works.

What patch does

1. Locates the faulty code (list_files, search, read_file).
2. Edits it with a minimal, exact change (edit_file / write_file).
3. Runs the tests in an isolated copy of the repo (run_tests).
4. Iterates: reads the failure output, fixes, and tries again, bounded by MAX_STEPS so it always stops.
5. Returns a verified patch: a unified diff plus a summary of what changed and why, counted as solved only when the tests actually pass.
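
The steps above can be pictured as a small dispatcher: the model emits one JSON action, a tool runs, and any failure comes back as text the agent can read. This is a minimal sketch, not patch's actual code. The action shape `{"tool": ..., "args": {...}}`, the `dispatch` helper, and the toy `read_file` are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Callable, Dict


def read_file(root: Path, path: str) -> str:
    """Toy tool: return the text of a file under root."""
    return (root / path).read_text()


# Hypothetical registry; patch exposes six tools, only one is sketched here.
TOOLS: Dict[str, Callable[..., str]] = {"read_file": read_file}


def dispatch(root: Path, raw_action: str) -> str:
    """Run one JSON action and always return an observation string."""
    try:
        action = json.loads(raw_action)
    except json.JSONDecodeError as exc:
        return f"ERROR: action is not valid JSON ({exc.msg})"
    if not isinstance(action, dict):
        return "ERROR: action must be a JSON object"
    name = action.get("tool")
    if not isinstance(name, str) or name not in TOOLS:
        return f"ERROR: unknown tool {name!r}"
    try:
        return TOOLS[name](root, **action.get("args", {}))
    except (OSError, TypeError) as exc:
        # Soft failure: the agent reads this and can try something else.
        return f"ERROR: {type(exc).__name__}: {exc}"
```

Returning an error string instead of raising keeps the loop alive: a missing file or a misspelled tool name costs one step rather than ending the run.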

Why it matters

Delivering test-verified fixes, not just suggestions, is the difference between "looks right" and "is right." Because every fix is gated on the test suite, you know exactly what a diff has been checked against, and you can accept it or throw it away with confidence.

Fix-rate benchmark

make benchmark runs patch against a bundled toy repo (benchmark/toy_repo) with 8 seeded bugs, each shipped with a test that fails at the start. For every bug, patch gets a fresh copy + the bug description, and we check whether its test goes green. Fix rate = bugs fixed / 8.
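
The fix-rate arithmetic is simple enough to sketch. The snippet below assumes per-bug results are available as a mapping from bug name to pass/fail; that shape and the `fix_rate` helper are assumptions, and the actual layout of assets/benchmark_results.json may differ. The sample outcomes are placeholders, not measured results.

```python
from typing import Dict


def fix_rate(results: Dict[str, bool]) -> float:
    """Fraction of seeded bugs whose test went green."""
    if not results:
        raise ValueError("no benchmark results")
    return sum(results.values()) / len(results)


# Placeholder outcomes for illustration only; not real benchmark data.
example = {"bug_1": True, "bug_2": True, "bug_3": False, "bug_4": True}
print(f"fix rate: {fix_rate(example):.0%}")  # prints "fix rate: 75%"
```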

The chart above is a schematic placeholder — running make benchmark with your key overwrites it with real per-bug results (assets/benchmark_results.json). The benchmark needs a key because it runs the agent; the test suite itself doesn't.

The seeded bugs cover off-by-one (chunk), wrong base case (factorial), operator / ordering mistakes (clamp, fizzbuzz), a wrong constant (temperature), and subtle semantics (dedupe order, word_count whitespace, case-insensitive palindrome).

Architecture

  bug description + repo
          │
          ▼
   ┌─────────────┐   copies repo to a temp dir, git baseline commit
   │  Workspace  │   (original repo is never modified)
   └──────┬──────┘
          ▼
   ┌─────────────────────────── agent loop (bounded by MAX_STEPS) ──────────────┐
   │  LLM emits ONE JSON action  ->  a tool runs  ->  observation goes back       │
   │                                                                             │
   │   list_files · read_file · search · edit_file · write_file · run_tests       │
   │                                                                             │
   │   locate ──> edit ──> run_tests ──(red)──> read error ──> edit ──> ...       │
   │                          └────────────────(green)──────────> finish         │
   └─────────────────────────────────────────────────────────────────────────────┘
          ▼
   final unified diff  +  summary  +  solved? (tests pass)
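
The Workspace step in the diagram can be approximated with the standard library alone. This is a sketch under assumptions: patch itself takes a git baseline commit and emits a git diff, whereas this version swaps in `difflib` so it runs without git. `make_workspace` and `diff_file` are hypothetical names.

```python
import difflib
import shutil
import tempfile
from pathlib import Path


def make_workspace(repo: Path) -> Path:
    """Copy repo into a fresh temp dir and return the copy's path."""
    workdir = Path(tempfile.mkdtemp(prefix="patch-ws-"))
    copy = workdir / "repo"
    shutil.copytree(repo, copy)
    return copy


def diff_file(original: Path, workspace: Path, rel: str) -> str:
    """Unified diff of one file between the original repo and the copy."""
    before = (original / rel).read_text().splitlines(keepends=True)
    after = (workspace / rel).read_text().splitlines(keepends=True)
    return "".join(difflib.unified_diff(before, after, f"a/{rel}", f"b/{rel}"))
```

All edits happen inside the copy, so the original directory only ever needs to be read.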

Piece	Choice	Why
Agent loop	explicit JSON-action loop, capped at MAX_STEPS	inspectable, always terminates
Tools (ACI)	list/read/search/edit/write/run_tests	the minimal surface to fix a bug
Isolation	temp copy + git baseline diff	never mutate the real repo
LLM	Anthropic Claude (primary) · OpenAI (fallback)	bring your own key
Safety	path-escape guard, test timeout, soft tool errors	degrades, never crashes
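
A path-escape guard like the one in the Safety row can be sketched in a few lines. The `safe_path` helper and `PathEscapeError` are hypothetical names; the real guard in patch may be implemented differently.

```python
from pathlib import Path


class PathEscapeError(ValueError):
    """Raised when a tool path resolves outside the workspace."""


def safe_path(root: Path, user_path: str) -> Path:
    """Resolve user_path under root, rejecting anything that lands outside."""
    root = root.resolve()
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise PathEscapeError(f"{user_path!r} escapes the workspace")
    return candidate
```

Resolving before comparing is what catches `..` segments and absolute paths; a plain string-prefix check would miss both.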

The LLM is reached only through a tiny complete() interface, so the entire loop is testable without an API key — the suite scripts a fake model and runs the real tools, workspace, and diff end-to-end.
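
To show why a single `complete()` seam makes the loop testable, here is a sketch of a scripted fake model driving a bounded loop. The signature `complete(prompt: str) -> str`, the `ScriptedModel` and `run_agent` names, the action shape, and the value of `MAX_STEPS` are all assumptions for illustration, not patch's actual definitions.

```python
import json
from typing import Callable, Dict, List, Protocol

MAX_STEPS = 5  # illustrative cap; the real default is not stated in this README


class Model(Protocol):
    def complete(self, prompt: str) -> str: ...


class ScriptedModel:
    """Fake model that replays canned JSON actions, repeating the last one."""

    def __init__(self, actions: List[dict]) -> None:
        self._replies = [json.dumps(a) for a in actions]
        self.calls = 0

    def complete(self, prompt: str) -> str:
        reply = self._replies[min(self.calls, len(self._replies) - 1)]
        self.calls += 1
        return reply


def run_agent(model: Model, tools: Dict[str, Callable[[dict], str]], task: str) -> dict:
    """One JSON action per step; stop on 'finish' or after MAX_STEPS."""
    transcript = task
    for step in range(1, MAX_STEPS + 1):
        action = json.loads(model.complete(transcript))
        if action["tool"] == "finish":
            return {"finished": True, "steps": step, "summary": action.get("summary", "")}
        observation = tools[action["tool"]](action.get("args", {}))
        transcript += f"\n[{step}] {action['tool']} -> {observation}"
    return {"finished": False, "steps": MAX_STEPS, "summary": "step budget exhausted"}
```

A model that never emits `finish` still returns after `MAX_STEPS` calls, which is the termination property the loop is built around.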

Example: a fix patch produces

Bug: factorial(0) returns 0 and factorial(5) returns 0.

--- a/toykit/math_ops.py
+++ b/toykit/math_ops.py
@@ def factorial(n):
     if n < 0:
         raise ValueError("factorial is undefined for negative numbers")
-    result = 0
+    result = 1
     for i in range(1, n + 1):
         result *= i
     return result

[1] read_file — locate factorial
[2] edit_file — initialise the accumulator to 1
[3] run_tests -> Tests PASSED.
[4] finish    — factorial must start its product at 1, not 0
Result: SOLVED (tests pass) in 4 steps.
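
The kind of check that gates this fix can be sketched as two assertions. The function body is the patched version from the diff above; the test names and layout are assumptions, since the toy repo's actual test file is not shown here.

```python
def factorial(n):
    if n < 0:
        raise ValueError("factorial is undefined for negative numbers")
    result = 1  # the fix: an accumulator of 0 would zero out every product
    for i in range(1, n + 1):
        result *= i
    return result


def test_factorial_base_case():
    assert factorial(0) == 1


def test_factorial_of_five():
    assert factorial(5) == 120
```

With `result = 0`, both assertions fail, which matches the bug report: every product collapses to zero.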

Screenshot placeholder — add a capture of patch running here. assets/screenshot.png

Run it yourself

# 1. Clone & install
git clone https://github.com/alexandremourin/patch.git
cd patch
make install        # or: python3.11 -m venv .venv && .venv/bin/pip install -r requirements.txt

# 2. Add YOUR key
cp .env.example .env
#   edit .env: ANTHROPIC_API_KEY=sk-ant-...   (OPENAI_API_KEY also works)

# 3. Fix a bug in any repo (patch works on a temp copy; your repo is safe)
.venv/bin/python -m patch.cli /path/to/repo "describe the bug" --test "python -m pytest -q"
#   or: make run REPO=/path/to/repo TASK="factorial(0) should be 1"

# 4. Benchmark the fix rate, and run the tests
make benchmark      # needs your key
make test           # runs WITHOUT a key (the LLM is mocked)

Without a key the CLI prints a clear message telling you to add one.

Based on princeton-nlp/SWE-agent — what I changed / added

patch was inspired by princeton-nlp/SWE-agent — studied, then rebuilt as a small, focused, inspectable agent. What's mine:

A minimal, bounded agent loop instead of a large framework: one JSON action per step, a fixed MAX_STEPS cap, and a hard test timeout, so runs always terminate at predictable cost.
A tight agent-computer interface (six tools) with soft failures — a bad edit, missing file, unknown tool, or hanging test returns an error the agent can read and recover from, never a crash.
Safe isolation: the agent works on a temp copy with a git baseline; the final patch is a real git diff, and the original repo is guaranteed untouched (with a path-escape guard on every tool).
A reproducible fix-rate benchmark the original doesn't ship: 8 seeded bugs with failing tests and a one-command harness that reports how many the agent repairs.
A fully mockable design so the whole loop — locate, edit, run tests, diff — is covered by tests that need no API key.
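
The hard test timeout and soft failures listed above can be sketched with `subprocess.run`. The `run_tests` signature and the exact wording of the failure and timeout messages are assumptions; only the "Tests PASSED." string comes from the example trace earlier in this README.

```python
import subprocess
import sys
from typing import List


def run_tests(cmd: List[str], cwd: str, timeout_s: float) -> str:
    """Run the test command; report pass, fail, or timeout as text."""
    try:
        proc = subprocess.run(
            cmd, cwd=cwd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # A hanging test becomes an observation, not a stuck agent.
        return f"Tests TIMED OUT after {timeout_s}s."
    except OSError as exc:
        return f"Tests could not start: {exc}"
    if proc.returncode == 0:
        return "Tests PASSED."
    # Keep only the tail so a noisy failure does not flood the next prompt.
    return "Tests FAILED.\n" + (proc.stdout + proc.stderr)[-2000:]


if __name__ == "__main__":
    print(run_tests([sys.executable, "-c", "pass"], ".", 30))
```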

License

Educational project. Inspired by princeton-nlp/SWE-agent; the agent loop, tools, workspace isolation, benchmark, and CLI here are original work.

