A Python adaptation of RelRepair, a retrieval-augmented approach to automated program repair. Implements two repair stages, BaseRepair (no retrieval baseline) and SigRepair (signature-based RAG), evaluated on the BugsInPy benchmark. Runs entirely locally via Ollama; no cloud API required.
| Stage | Strategy | Status |
|---|---|---|
| BaseRepair | Direct LLM repair with no retrieval | Done |
| SigRepair | Query rewriting + signature retrieval (nomic-embed-text) | Done |
| SnipRepair | Snippet retrieval over full code bodies | Planned |
- Python 3.9 (required for test suite compatibility; see Setup below)
- Ollama running locally
ollama pull qwen3-coder:30b # repair (LLM)
ollama pull nomic-embed-text # embeddings (SigRepair)The model can be changed in pyrelrepair/config.py (ollama_model field).
Python 3.9 is required because some BugsInPy projects (e.g. tqdm) use nose 1.3.7, which relies on the imp module removed in Python 3.12.
# Install Python 3.9 via pyenv if needed
pyenv install 3.9.18
pyenv local 3.9.18
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txtBugsInPy contains real bugs from popular Python projects (black, tqdm, pandas, keras, etc.). The benchmark only ships metadata; the setup script clones each project at its buggy commit.
First, clone the BugsInPy metadata repository:
git clone https://github.com/soarsmu/BugsInPy data/bugsinpyThen set up the bugs you want to evaluate:
# Set up specific bugs
python scripts/bugsinpy_setup.py black:1 black:2 black:3 tqdm:1 tqdm:2 tqdm:3
# Set up all bugs in a project
python scripts/bugsinpy_setup.py --all blackThis creates data/bugsinpy_checked/<project>/<id>/ with:
project/: full source tree at the buggy commitmeta.json: fault line, test file, error messagetest_output.txt: captured failing test output
Single-file competitive programming bugs. Useful for fast BaseRepair benchmarking across models; SigRepair is not a good fit because ConDefects provides only single files rather than full project codebases to retrieve from.
git clone https://github.com/appmlk/ConDefects data/condefects
# For stdin/stdout test cases, also download and extract Test.zip from the ConDefects repopython scripts/run_baserepair.py --n 6| Flag | Default | Description |
|---|---|---|
--n N |
6 | Number of bugs to evaluate |
--data PATH |
data/bugsinpy_checked |
Path to preprocessed bugs |
--model M1 M2 ... |
config default | One or more Ollama models (prints comparison table) |
--debug |
off | Save full prompts and responses to debug.log |
--verbose / -v |
off | Enable DEBUG-level console output |
python scripts/run_sigrepair.py --n 6Same flags as BaseRepair. Requires nomic-embed-text to be available in Ollama; the script checks this on startup.
python scripts/run_pipeline.py --n 6Runs BaseRepair first; escalates to SigRepair only for bugs BaseRepair did not fix.
python scripts/run_condefects.py --base-only --n 20
# Compare multiple models
python scripts/run_condefects.py --base-only --n 20 --model qwen3-coder:30b qwen2.5-coder:3bResults are saved to results/<stage>_<model>_<timestamp>/:
<bug_id>.json: patch candidates, test pass/fail counts, token usagesummary.json: aggregate repair rate and token totalsdebug.log: full prompt/response log (with--debug)
For ConDefects, all models share a single run directory:
results/condefects_<stage>_<timestamp>/<model>/
Single model (BugsInPy):
=== BaseRepair Results (BugsInPy) ===
Model : qwen3-coder:30b
Bugs evaluated : 6
Repaired : 2/6 (33.3%)
Tokens (prompt): 7,313
Tokens (output): 19,126
Results saved : results/baserepair_qwen3-coder-30b_20260504_180632/
Multi-model comparison (ConDefects):
=== BaseRepair Comparison (ConDefects) ===
Model Base patches Tests pass Prompt tok Output tok
------------------------------------------------------------------------------------------
gemma2:9b 4/5 1/4 3,970 1,857
qwen2.5:7b 5/5 2/5 3,730 1,617
codellama:13b 4/5 0/4 4,244 1,447
deepseek-coder-v2:16b 5/5 2/5 4,074 1,947
Results saved: results/condefects_baserepair_20260504_163500/
Key settings in pyrelrepair/config.py:
| Field | Default | Description |
|---|---|---|
ollama_model |
qwen3-coder:30b |
LLM used for repair |
embed_model |
nomic-embed-text |
Embedding model for SigRepair |
llm_timeout |
300 |
Seconds per LLM request |
base_num_patches |
1 |
Patches generated per BaseRepair run |
sig_num_iterations |
3 |
SigRepair iterations (paper uses 20) |
sig_top_k |
25 |
Signatures retrieved per iteration |
Set sig_num_iterations = 20 for final evaluation runs.
Each iteration:
- Query rewrite: the LLM generates 2 root causes and 5 candidate function names from the buggy function
- Retrieval: root causes + function names are embedded with
nomic-embed-text; top-25 most similar function signatures from the project are retrieved by cosine similarity - Patch generation: same prompt as BaseRepair, augmented with the retrieved signatures
- Validation: patch is tested with
pytest; pipeline stops immediately if it passes - Deduplication: patches identical to ones already tested are skipped
- Liu, et al. (2025). RelRepair. arXiv:2509.16701. https://arxiv.org/abs/2509.16701
- Widyasari, et al. (2020). BugsInPy. FSE 2020.