alexandremourin/patch

patch

Describe a bug, point it at a repo — patch finds the faulty code, fixes it, and proves the fix with your tests.

patch is a small, autonomous SWE-agent. Give it a repository and a bug description; it explores the code, makes a minimal edit, runs the test suite, and iterates until the tests pass or the MAX_STEPS budget runs out, then hands you a clean git diff. It works on a throwaway copy of your repo, so your code is never touched.

The problem

Fixing a bug isn't text generation — it requires understanding the code: finding the right file, making a change that doesn't break everything else, and verifying it. An LLM that just prints a plausible-looking snippet can't tell you whether it actually works.

What patch does

  • Locates the faulty code (list_files, search, read_file).
  • Edits it with a minimal, exact change (edit_file / write_file).
  • Runs the tests in an isolated copy of the repo (run_tests).
  • Iterates: reads the failure output, fixes, and tries again — bounded by MAX_STEPS so it always stops.
  • Returns a patch: a unified diff plus a summary of what changed and why. A run counts as solved only when the tests actually pass.
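
The loop in the list above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `run_agent`, the `tools` registry, the signature of the `complete` callable, the action shape `{"tool": ..., "args": ...}`, and the default `MAX_STEPS = 20` are all assumptions made for the example.

```python
import json
from typing import Callable, Dict, List

MAX_STEPS = 20  # assumed value; the README only says the loop is capped


def run_agent(
    complete: Callable[[List[str]], str],   # hypothetical LLM interface: history -> JSON text
    tools: Dict[str, Callable[..., str]],   # hypothetical tool registry: name -> function
    task: str,
    max_steps: int = MAX_STEPS,
) -> dict:
    """Execute one JSON action per step until `finish` or the step cap."""
    history = [f"TASK: {task}"]
    for step in range(1, max_steps + 1):
        raw = complete(history)
        try:
            action = json.loads(raw)
            name = action["tool"]
            args = action.get("args", {})
            if not isinstance(name, str) or not isinstance(args, dict):
                raise TypeError("tool must be a string and args an object")
        except (json.JSONDecodeError, KeyError, TypeError, AttributeError) as exc:
            # A malformed action costs a step but does not end the run.
            history.append(f"ERROR: invalid action ({exc})")
            continue
        if name == "finish":
            return {"finished": True, "steps": step, "summary": args.get("summary", "")}
        tool = tools.get(name)
        if tool is None:
            observation = f"ERROR: unknown tool {name!r}"
        else:
            try:
                observation = tool(**args)
            except Exception as exc:  # soft failure: report it back to the model
                observation = f"ERROR: {exc}"
        history.append(f"{raw}\n-> {observation}")
    return {"finished": False, "steps": max_steps, "summary": "step budget exhausted"}
```

Because every branch either returns or consumes a step, the function cannot run longer than `max_steps` model calls. Whether a finished run counts as solved would still be decided by the test result, which this sketch leaves out.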

patch fix-rate on the seeded-bug benchmark

Why it matters

An agent that delivers test-verified fixes — not just suggestions — is the difference between "looks right" and "is right." Because every fix is gated on the test suite, you can trust the diff or throw it away with confidence.


Fix-rate benchmark

make benchmark runs patch against a bundled toy repo (benchmark/toy_repo) with 8 seeded bugs, each shipped with a test that fails at the start. For every bug, patch gets a fresh copy + the bug description, and we check whether its test goes green. Fix rate = bugs fixed / 8.
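
The fix-rate arithmetic is simple enough to state in a few lines. The sketch below assumes results arrive as a mapping from bug name to a pass/fail flag; `fix_rate` and that shape are hypothetical and may not match what assets/benchmark_results.json actually stores.

```python
from typing import Mapping


def fix_rate(results: Mapping[str, bool]) -> float:
    """Return the fraction of bugs whose test went green."""
    if not results:
        raise ValueError("no benchmark results to score")
    # True counts as 1 and False as 0, so the sum is the number of fixed bugs.
    return sum(bool(passed) for passed in results.values()) / len(results)


if __name__ == "__main__":
    # Made-up outcomes for illustration only; these are not measured results.
    placeholder = {f"bug_{i}": i <= 6 for i in range(1, 9)}
    print(f"fix rate: {fix_rate(placeholder):.0%}")
```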

The chart above is a schematic placeholder — running make benchmark with your key overwrites it with real per-bug results (assets/benchmark_results.json). The benchmark needs a key because it runs the agent; the test suite itself doesn't.

The seeded bugs cover off-by-one (chunk), wrong base case (factorial), operator / ordering mistakes (clamp, fizzbuzz), a wrong constant (temperature), and subtle semantics (dedupe order, word_count whitespace, case-insensitive palindrome).


Architecture

  bug description + repo
          │
          ▼
   ┌─────────────┐   copies repo to a temp dir, git baseline commit
   │  Workspace  │   (original repo is never modified)
   └──────┬──────┘
          ▼
   ┌─────────────────────────── agent loop (bounded by MAX_STEPS) ──────────────┐
   │  LLM emits ONE JSON action  ->  a tool runs  ->  observation goes back       │
   │                                                                             │
   │   list_files · read_file · search · edit_file · write_file · run_tests       │
   │                                                                             │
   │   locate ──> edit ──> run_tests ──(red)──> read error ──> edit ──> ...       │
   │                          └────────────────(green)──────────> finish         │
   └─────────────────────────────────────────────────────────────────────────────┘
          ▼
   final unified diff  +  summary  +  solved? (tests pass)
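
The Workspace box in the diagram could be sketched like this. It is an assumption-laden illustration rather than the project's code: `make_workspace`, `final_diff`, the `_git` helper, and the placeholder commit identity are hypothetical names, and the sketch requires a `git` executable on PATH.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def _git(cwd: Path, *args: str) -> str:
    """Run git in `cwd` with a throwaway identity and return its stdout."""
    result = subprocess.run(
        ["git", "-c", "user.name=patch", "-c", "user.email=patch@example.invalid", *args],
        cwd=cwd, capture_output=True, text=True, check=True,
    )
    return result.stdout


def make_workspace(repo: Path) -> Path:
    """Copy `repo` into a temp dir and record a baseline commit there."""
    work = Path(tempfile.mkdtemp(prefix="patch-")) / "repo"
    # Skip the original .git so the copy gets its own clean history.
    shutil.copytree(repo, work, ignore=shutil.ignore_patterns(".git"))
    _git(work, "init", "-q")
    _git(work, "add", "-A")
    _git(work, "commit", "-q", "--allow-empty", "-m", "baseline")
    return work


def final_diff(work: Path) -> str:
    """Return a unified diff of everything changed since the baseline."""
    _git(work, "add", "-A")  # stage first so newly created files appear too
    return _git(work, "diff", "--cached", "HEAD")
```

All edits and test runs would then target the returned directory, so the diff is computed against the copy's baseline and the source repository is only ever read.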

Piece        Choice                                             Why
-----------  -------------------------------------------------  --------------------------------
Agent loop   explicit JSON-action loop, capped at MAX_STEPS     inspectable, always terminates
Tools (ACI)  list/read/search/edit/write/run_tests              the minimal surface to fix a bug
Isolation    temp copy + git baseline diff                      never mutate the real repo
LLM          Anthropic Claude (primary) · OpenAI (fallback)     bring your own key
Safety       path-escape guard, test timeout, soft tool errors  degrades, never crashes
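
A path-escape guard of the kind listed under Safety might look like the following. This is a sketch under assumptions: `resolve_in_workspace` and `PathEscapeError` are hypothetical names, and the real guard may differ in how it handles symlinks or reports the error.

```python
from pathlib import Path


class PathEscapeError(ValueError):
    """Raised when a tool path would resolve outside the workspace."""


def resolve_in_workspace(root: Path, user_path: str) -> Path:
    """Resolve `user_path` against `root`, refusing anything outside `root`."""
    root = root.resolve()
    # Joining with an absolute path discards `root`, and resolve() collapses
    # ".." segments and symlinks, so both tricks are visible in the check below.
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise PathEscapeError(f"path escapes workspace: {user_path}")
    return candidate
```

A tool wrapper could catch `PathEscapeError` and return its message as the observation, which fits the "soft tool errors" idea: the model sees the refusal and can pick a different path.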

The LLM is reached only through a tiny complete() interface, so the entire loop is testable without an API key — the suite scripts a fake model and runs the real tools, workspace, and diff end-to-end.
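
A scripted fake model along these lines is one way such a test double could work. The `complete(prompt) -> str` signature, the `Model` protocol, and the `ScriptedModel` class are assumptions for illustration; the README only states that a `complete()` interface exists.

```python
import json
from typing import List, Protocol


class Model(Protocol):
    """Assumed shape of the LLM interface: one prompt in, one reply out."""

    def complete(self, prompt: str) -> str: ...


class ScriptedModel:
    """Test double that replays a fixed list of JSON actions in order."""

    def __init__(self, actions: List[dict]) -> None:
        self._replies = [json.dumps(action) for action in actions]
        self.prompts: List[str] = []  # recorded so tests can inspect what was sent

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        if not self._replies:
            # Once the script runs out, finish so a test can never hang.
            return json.dumps({"tool": "finish", "args": {"summary": "script exhausted"}})
        return self._replies.pop(0)
```

A test would hand a `ScriptedModel` to the loop in place of the real client and then assert on the resulting diff, which exercises the real tools without any network call.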

Example: a fix patch produces

Bug: factorial(0) returns 0 and factorial(5) returns 0.

--- a/toykit/math_ops.py
+++ b/toykit/math_ops.py
@@ def factorial(n):
     if n < 0:
         raise ValueError("factorial is undefined for negative numbers")
-    result = 0
+    result = 1
     for i in range(1, n + 1):
         result *= i
     return result
[1] read_file — locate factorial
[2] edit_file — initialise the accumulator to 1
[3] run_tests -> Tests PASSED.
[4] finish    — factorial must start its product at 1, not 0
Result: SOLVED (tests pass) in 4 steps.
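
Step [2] in the trace above can be read as an exact search-and-replace. The sketch below shows one way such an edit could be applied and rejected when ambiguous; `apply_exact_edit` is a hypothetical helper, and the actual edit_file tool may take different arguments.

```python
def apply_exact_edit(source: str, old: str, new: str) -> str:
    """Replace `old` with `new`, but only if `old` occurs exactly once."""
    if not old:
        raise ValueError("the text to replace must not be empty")
    count = source.count(old)
    if count != 1:
        # Zero matches means a stale or mistyped edit; several means it is ambiguous.
        raise ValueError(f"expected exactly one match for the old text, found {count}")
    return source.replace(old, new, 1)


BUGGY = '''def factorial(n):
    if n < 0:
        raise ValueError("factorial is undefined for negative numbers")
    result = 0
    for i in range(1, n + 1):
        result *= i
    return result
'''

FIXED = apply_exact_edit(BUGGY, "result = 0", "result = 1")
```

Requiring a unique match keeps the change minimal: an edit that would touch more than one place fails with a readable error instead of silently rewriting the wrong line.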

Screenshot placeholder — add a capture of patch running here. assets/screenshot.png


Run it yourself

# 1. Clone & install
git clone https://github.com/alexandremourin/patch.git
cd patch
make install        # or: python3.11 -m venv .venv && .venv/bin/pip install -r requirements.txt

# 2. Add YOUR key
cp .env.example .env
#   edit .env: ANTHROPIC_API_KEY=sk-ant-...   (OPENAI_API_KEY also works)

# 3. Fix a bug in any repo (patch works on a temp copy; your repo is safe)
.venv/bin/python -m patch.cli /path/to/repo "describe the bug" --test "python -m pytest -q"
#   or: make run REPO=/path/to/repo TASK="factorial(0) should be 1"

# 4. Benchmark the fix rate, and run the tests
make benchmark      # needs your key
make test           # runs WITHOUT a key (the LLM is mocked)

Without a key the CLI prints a clear message telling you to add one.


Based on princeton-nlp/SWE-agent — what I changed / added

patch was inspired by princeton-nlp/SWE-agent — studied, then rebuilt as a small, focused, inspectable agent. What's mine:

  • A minimal, bounded agent loop instead of a large framework: one JSON action per step, a fixed MAX_STEPS cap, and a hard test timeout, so runs always terminate at predictable cost.
  • A tight agent-computer interface (six tools) with soft failures — a bad edit, missing file, unknown tool, or hanging test returns an error the agent can read and recover from, never a crash.
  • Safe isolation: the agent works on a temp copy with a git baseline; the final patch is a real git diff, and the original repo is guaranteed untouched (with a path-escape guard on every tool).
  • A reproducible fix-rate benchmark the original doesn't ship: 8 seeded bugs with failing tests and a one-command harness that reports how many the agent repairs.
  • A fully mockable design so the whole loop — locate, edit, run tests, diff — is covered by tests that need no API key.
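
The hard test timeout and soft-failure behaviour described in the list above could be combined in a run_tests tool roughly like this. It is a sketch, not the project's implementation: the function signature, the 120-second default, and the message wording other than "Tests PASSED." are assumptions.

```python
import shlex
import subprocess
from typing import Tuple


def run_tests(command: str, cwd: str, timeout_s: float = 120.0) -> Tuple[bool, str]:
    """Run the test command and return (passed, text for the agent to read)."""
    try:
        argv = shlex.split(command)
        if not argv:
            return False, "Test command is empty."
        proc = subprocess.run(
            argv, cwd=cwd, capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return False, f"Tests TIMED OUT after {timeout_s:g}s."
    except (OSError, ValueError) as exc:
        # Missing executable, bad working directory, or unbalanced quotes.
        return False, f"Could not start test command: {exc}"
    passed = proc.returncode == 0
    status = "Tests PASSED." if passed else "Tests FAILED."
    return passed, f"{status}\n{proc.stdout}{proc.stderr}"
```

Every failure mode comes back as an ordinary return value, so the caller never needs its own exception handling and the agent always receives something it can read.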

License

MIT © 2026 Alexandre Mourin.

Educational project. Inspired by princeton-nlp/SWE-agent; the agent loop, tools, workspace isolation, benchmark, and CLI here are original work.
