Commit 28b88e7

Merge branch 'main' into chess-arena
2 parents 7420ca0 + 7fb9f18 commit 28b88e7

77 files changed

Lines changed: 7962 additions & 251 deletions

Some content is hidden: large commits hide part of the diff by default, so not all 77 changed files are shown here.

.env.example

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+# CodeClash Environment Variables
+# Copy this file to .env and fill in your values
+
+# Required: GitHub token with repo access for cloning game repositories
+GITHUB_TOKEN=your_github_token_here
+
+# Optional: LLM Provider API Keys (configure the ones you plan to use)
+OPENAI_API_KEY=
+ANTHROPIC_API_KEY=
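
The `.env.example` added above marks `GITHUB_TOKEN` as required and the provider keys as optional. The diff does not show how CodeClash loads this file, so the following is only a minimal, stdlib-only sketch of the idea, assuming plain `KEY=VALUE` lines with `#` comments. The helpers `parse_env_text`, `apply_env`, and `load_env` are hypothetical and not part of CodeClash; a dedicated library such as python-dotenv handles quoting and other edge cases this sketch ignores.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines (hypothetical helper, for illustration only).

    Blank lines and lines starting with '#' are skipped. Quoting, `export`
    prefixes, inline comments, and multi-line values are NOT handled.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def apply_env(values: dict[str, str], required: tuple[str, ...] = ("GITHUB_TOKEN",)) -> None:
    """Copy values into os.environ without overriding variables that are already set,
    then fail fast if any required variable is missing or empty."""
    for key, value in values.items():
        os.environ.setdefault(key, value)
    missing = [key for key in required if not os.environ.get(key)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")


def load_env(path: Path, required: tuple[str, ...] = ("GITHUB_TOKEN",)) -> dict[str, str]:
    """Read a .env file from disk and apply it (hypothetical convenience wrapper)."""
    values = parse_env_text(path.read_text())
    apply_env(values, required)
    return values
```

Called once at startup, e.g. `load_env(Path(".env"))`, this fails immediately with a readable message when `GITHUB_TOKEN` is missing or empty. The unedited placeholder `your_github_token_here` would still pass, so a stricter variant could reject it as well.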

CONTRIBUTING.md

Lines changed: 191 additions & 3 deletions
@@ -1,10 +1,198 @@
-# Contributing Guide
+# Contributing to CodeClash

 Thanks for your interest in contributing to CodeClash!

-We're actively working on expanding the coverage of CodeClash in terms of models, arenas, and evaluation techniques.
-We're also excited about your ideas!
+We're actively working on expanding the coverage of CodeClash in terms of models, arenas, and evaluation techniques. We'd love your help!
+
+## Ideas and Discussions

 We have a [living document](https://docs.google.com/document/d/17-Jcexy1KDAbxXILH-GlHrFwGTpLG5yml-0OMFfgnZU/edit?usp=sharing) where we track ideas and contributions we're excited about.

 Have suggestions? Please open an issue, and let's discuss!
+
+## Development Setup
+
+### Prerequisites
+
+- Python 3.11+
+- [uv](https://docs.astral.sh/uv/) - Fast Python package manager
+- Docker - For running games in containers
+- Git
+
+### Getting Started
+
+```bash
+# Clone the repository
+git clone https://github.com/CodeClash-ai/CodeClash.git
+cd CodeClash
+
+# Install uv (if you haven't already)
+curl -LsSf https://astral.sh/uv/install.sh | sh
+
+# Install dependencies with dev extras
+uv sync --extra dev
+
+# Install pre-commit hooks
+uv run pre-commit install
+
+# Set up environment variables
+cp .env.example .env
+# Edit .env with your GITHUB_TOKEN and any LLM API keys
+```
+
+### Running Tests
+
+```bash
+# Run all tests
+uv run pytest
+
+# Run with coverage
+uv run pytest --cov=codeclash
+
+# Run tests in parallel
+uv run pytest -n auto
+
+# Run a specific test file
+uv run pytest tests/test_integration.py
+```
+
+### Code Quality
+
+We use [ruff](https://docs.astral.sh/ruff/) for linting and formatting:
+
+```bash
+# Check for linting issues
+uv run ruff check .
+
+# Auto-fix linting issues
+uv run ruff check . --fix
+
+# Format code
+uv run ruff format .
+
+# Check formatting without changing files
+uv run ruff format . --check
+```
+
+Pre-commit hooks will run these checks automatically before each commit.
+
+### Documentation
+
+We use [MkDocs Material](https://squidfunk.github.io/mkdocs-material/) for documentation:
+
+```bash
+# Install docs dependencies
+uv sync --extra docs
+
+# Preview docs locally (with hot reload)
+uv run mkdocs serve
+
+# Build static docs
+uv run mkdocs build
+```
+
+Documentation lives in the `docs/` directory.
+
+## Project Structure
+
+```
+CodeClash/
+├── codeclash/
+│ ├── agents/ # AI agent implementations (MiniSWEAgent, etc.)
+│ ├── arenas/ # Game arena implementations
+│ ├── analysis/ # Post-tournament analysis tools
+│ ├── tournaments/ # Tournament orchestration
+│ ├── viewer/ # Web-based results viewer
+│ └── utils/ # Shared utilities
+├── configs/ # Tournament configuration files
+├── docs/ # Documentation (MkDocs)
+├── tests/ # Test suite
+└── main.py # Main entry point
+```
+
+## Types of Contributions
+
+### Adding a New Arena
+
+1. Create a new file in `codeclash/arenas/`
+2. Extend the `CodeArena` abstract class
+3. Implement required methods: `execute_round()`, `validate_code()`, `get_results()`
+4. Add documentation in `docs/reference/arenas/`
+5. Add example configs in `configs/`
+
+### Adding a New Agent Type
+
+1. Create a new file in `codeclash/agents/`
+2. Extend the `Player` abstract class
+3. Implement the `run()` method for code improvement logic
+4. Add documentation in `docs/reference/player/`
+
+### Improving Analysis Tools
+
+Analysis tools live in `codeclash/analysis/`. We're particularly interested in:
+
+- New metrics for evaluating agent performance
+- Better visualization of tournament results
+- Statistical analysis improvements
+
+### Bug Fixes and Improvements
+
+- Bug fixes are always welcome!
+- Performance improvements
+- Documentation improvements
+- Test coverage improvements
+
+## Pull Request Process
+
+1. Fork the repository
+2. Create a feature branch (`git checkout -b feature/amazing-feature`)
+3. Make your changes
+4. Run tests and linting (`uv run pytest && uv run ruff check .`)
+5. Commit your changes with a descriptive message
+6. Push to your fork
+7. Open a Pull Request
+
+### PR Guidelines
+
+- Keep PRs focused on a single change
+- Add tests for new functionality
+- Update documentation as needed
+- Follow existing code style (enforced by ruff)
+
+## Common Development Tasks
+
+| Task | Command |
+|------|---------|
+| Install dependencies | `uv sync --extra dev` |
+| Run tests | `uv run pytest` |
+| Lint code | `uv run ruff check .` |
+| Format code | `uv run ruff format .` |
+| Preview docs | `uv run mkdocs serve` |
+| Build wheel | `uv build --wheel` |
+| Build wheel + sdist | `uv build` |
+| Run a tournament | `uv run python main.py <config>` |
+| View results | `uv run python scripts/run_viewer.py` |
+
+### Building Distributions
+
+To build a distributable wheel package:
+
+```bash
+# Build wheel only
+uv build --wheel
+
+# Build both wheel and source distribution
+uv build
+
+# Build with clean output directory
+uv build --wheel --clear
+```
+
+Built artifacts will be placed in the `dist/` directory by default.
+
+## Contact
+
+- **John Yang**: [johnby@stanford.edu](mailto:johnby@stanford.edu)
+- **Kilian Lieret**: [kl5675@princeton.edu](mailto:kl5675@princeton.edu)
+
+Feel free to reach out with questions or ideas!
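
The arena and agent checklists in the new CONTRIBUTING.md name the base classes (`CodeArena`, `Player`) and the methods to implement, but not their signatures. The sketch below only illustrates the overall shape: `CodeArena` and `Player` here are hypothetical local stand-in ABCs, and every parameter and return type, as well as the toy `CoinFlipArena` and `NoOpPlayer` subclasses, is an assumption. Check the real classes in `codeclash/arenas/` and `codeclash/agents/` before subclassing.

```python
import random
from abc import ABC, abstractmethod


class CodeArena(ABC):
    """Hypothetical stand-in for CodeClash's CodeArena; real signatures may differ."""

    @abstractmethod
    def validate_code(self, code: str) -> tuple[bool, str]:
        """Return (is_valid, error_message) for one player's submission."""

    @abstractmethod
    def execute_round(self, submissions: dict[str, str]) -> None:
        """Run one round using the submitted programs, keyed by player name."""

    @abstractmethod
    def get_results(self) -> dict[str, int]:
        """Return per-player scores for the most recent round."""


class Player(ABC):
    """Hypothetical stand-in for CodeClash's Player base class."""

    def __init__(self, name: str, code: str) -> None:
        self.name = name
        self.code = code

    @abstractmethod
    def run(self) -> None:
        """Improve self.code between rounds (the code-improvement step)."""


class CoinFlipArena(CodeArena):
    """Toy arena: each valid submission scores one point per winning coin flip;
    invalid submissions score 0 without being run."""

    def __init__(self, sims: int = 10, seed: int = 0) -> None:
        self.sims = sims
        self.rng = random.Random(seed)  # seeded so results are reproducible
        self.scores: dict[str, int] = {}

    def validate_code(self, code: str) -> tuple[bool, str]:
        # Toy rule: anything Python can compile counts as valid.
        try:
            compile(code, "<submission>", "exec")
        except SyntaxError as exc:
            return False, str(exc)
        return True, ""

    def execute_round(self, submissions: dict[str, str]) -> None:
        self.scores = {}
        for name, code in submissions.items():
            valid, _ = self.validate_code(code)
            wins = sum(self.rng.random() < 0.5 for _ in range(self.sims))
            self.scores[name] = wins if valid else 0

    def get_results(self) -> dict[str, int]:
        return dict(self.scores)


class NoOpPlayer(Player):
    """Toy player whose 'improvement' step leaves its code unchanged."""

    def run(self) -> None:
        pass
```

Keeping `validate_code()` separate from `execute_round()` lets an arena reject a broken submission before any games run, which is why the toy arena scores invalid code as 0 instead of crashing.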

README.md

Lines changed: 39 additions & 8 deletions
@@ -5,8 +5,10 @@
 </p>

 <div align="center">
-<a href="https://www.python.org/"><img alt="Build" src="https://img.shields.io/badge/Python-3.10+-1f425f.svg?color=purple"></a>
-<a href="https://copyright.princeton.edu/policy"><img alt="License" src="https://img.shields.io/badge/License-MIT-blue"></a> <a href="https://arxiv.org/abs/2511.00839"><img src="https://img.shields.io/badge/arXiv-2511.00839-b31b1b.svg"></a>
+<a href="https://www.python.org/"><img alt="Build" src="https://img.shields.io/badge/Python-3.11+-1f425f.svg?color=purple"></a>
+<a href="https://copyright.princeton.edu/policy"><img alt="License" src="https://img.shields.io/badge/License-MIT-blue"></a>
+<a href="https://arxiv.org/abs/2511.00839"><img src="https://img.shields.io/badge/arXiv-2511.00839-b31b1b.svg"></a>
+<a href="https://github.com/astral-sh/uv"><img src="https://img.shields.io/badge/uv-package%20manager-blueviolet"></a>
 </div>

 <hr />
@@ -28,21 +30,50 @@ Check out our [arXiv paper](https://arxiv.org/abs/2511.00839) and [website](http

 ## 🏎️ Quick Start

-To start, follow these steps to set up CodeClash and run a test battle:
+### Prerequisites
+
+- **Python 3.11+**
+- **[uv](https://docs.astral.sh/uv/)** - Fast Python package manager
+- **Docker** - For running games in containers
+- **Git**
+
+### Installation
+
 ```bash
-$ git clone git@github.com:CodeClash-ai/CodeClash.git
-$ cd CodeClash
-$ pip install -e '.[dev]'
-$ python main.py configs/test/battlesnake.yaml
+# Clone the repository
+git clone https://github.com/CodeClash-ai/CodeClash.git
+cd CodeClash
+
+# Install uv (if you haven't already)
+curl -LsSf https://astral.sh/uv/install.sh | sh
+
+# Install dependencies and create virtual environment
+uv sync --extra dev
+
+# Set up your environment variables
+cp .env.example .env # Then edit .env with your GITHUB_TOKEN
+
+# Run a test battle
+uv run python main.py configs/test/battlesnake.yaml
 ```

 > [!TIP]
 > CodeClash requires Docker to create execution environments. CodeClash was developed and tested on Ubuntu 22.04.4 LTS.
+> The same instructions should work for Mac. If not, check out [#81](https://github.com/CodeClash-ai/CodeClash/issues/81) for an alternative solution.
+
+<details>
+<summary>Alternative: Using pip (not recommended)</summary>
+
+```bash
+pip install -e '.[dev]'
+python main.py configs/test/battlesnake.yaml
+```
+</details>

 Once this works, you should be set up to run a real tournament!
 To run *Claude Sonnet 4.5* against *o3* in a *BattleSnake* tournament with *5 rounds* and *1000 competition simulations* per round, run:
 ```bash
-$ python main.py configs/examples/BattleSnake__claude-sonnet-4-5-20250929__o3__r5__s1000.yaml
+uv run python main.py configs/examples/BattleSnake__claude-sonnet-4-5-20250929__o3__r5__s1000.yaml
 ```

 ## ⚔️ How It Works
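
The README pairs the example config `BattleSnake__claude-sonnet-4-5-20250929__o3__r5__s1000.yaml` with the description "Claude Sonnet 4.5 against o3 in a BattleSnake tournament with 5 rounds and 1000 competition simulations per round". Reading the filename as `<Arena>__<model>__<model>__r<rounds>__s<sims>` is an inference from this single example, not a documented convention, and the `parse_config_name` helper below is hypothetical; the authoritative settings live inside the YAML file itself.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ConfigName:
    arena: str
    players: tuple[str, ...]
    rounds: int
    sims_per_round: int


def parse_config_name(path: str) -> ConfigName:
    """Split an example-config filename on '__' (assumed naming scheme, hypothetical helper)."""
    parts = Path(path).stem.split("__")
    # Expect at least: arena, one or more players, r<rounds>, s<sims>.
    if len(parts) < 4 or not parts[-2].startswith("r") or not parts[-1].startswith("s"):
        raise ValueError(f"Unrecognized config name: {path}")
    arena, *players = parts[:-2]
    return ConfigName(
        arena=arena,
        players=tuple(players),
        rounds=int(parts[-2][1:]),
        sims_per_round=int(parts[-1][1:]),
    )
```

Names that do not follow the pattern, such as `configs/test/battlesnake.yaml`, raise `ValueError` instead of being guessed at.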

codeclash/agents/minisweagent.py

Lines changed: 3 additions & 1 deletion
@@ -16,7 +16,9 @@
 from codeclash.utils.environment import copy_to_container

 os.environ["MSWEA_MODEL_RETRY_STOP_AFTER_ATTEMPT"] = "90"
-os.environ["LITELLM_MODEL_REGISTRY_PATH"] = str((REPO_DIR / "configs" / "litellm_custom_model_config.yaml").resolve())
+os.environ["LITELLM_MODEL_REGISTRY_PATH"] = str(
+    (REPO_DIR / "configs" / "mini" / "litellm_custom_model_config.yaml").resolve()
+)


 class ClashAgent(DefaultAgent):
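
This hunk only moves the LiteLLM custom-model registry from `configs/` to `configs/mini/`; the environment variable name stays the same. After a move like this, checking that the file exists before exporting the path is one way to catch a stale location early. The sketch below is hypothetical (`registry_path` and `export_registry_path` are not in the diff) and takes the repository root as a parameter instead of importing CodeClash's `REPO_DIR`.

```python
import os
from pathlib import Path


def registry_path(repo_dir: Path) -> Path:
    """Resolved location of the custom-model registry after this commit's move."""
    return (repo_dir / "configs" / "mini" / "litellm_custom_model_config.yaml").resolve()


def export_registry_path(repo_dir: Path) -> str:
    """Hypothetical fail-fast variant: only export the path if the file exists."""
    path = registry_path(repo_dir)
    if not path.is_file():
        raise FileNotFoundError(f"LiteLLM model registry not found at {path}")
    os.environ["LITELLM_MODEL_REGISTRY_PATH"] = str(path)
    return str(path)
```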

codeclash/analysis/llm_as_judge/action_categories_by_model.py

Lines changed: 2 additions & 2 deletions
@@ -132,7 +132,7 @@
 gap = 0.3
 y_positions = []
 model_positions = []
-for i, model in enumerate(models):
+for i in range(len(models)):
     base = i * (2 + gap)
     y_positions.extend([base, base + 1])
     model_positions.append(base + 0.5)
@@ -197,7 +197,7 @@
 )

 # Add round labels on the left side of the plot
-for i, (y_late, y_early) in enumerate(zip(y_positions[::2], y_positions[1::2])):
+for y_late, y_early in zip(y_positions[::2], y_positions[1::2]):
     ax.text(-0.2, y_late, "round ≥8", fontsize=11, ha="right", va="center", color="gray", fontproperties=FONT_BOLD)
     ax.text(-0.2, y_early, "round ≤7", fontsize=11, ha="right", va="center", color="gray", fontproperties=FONT_BOLD)

codeclash/analysis/llm_as_judge/big_questions.py

Lines changed: 1 addition & 2 deletions
@@ -6,7 +6,7 @@
 import re
 from concurrent.futures import ThreadPoolExecutor, as_completed
 from pathlib import Path
-from typing import Literal
+from typing import Any, Literal

 import jinja2
 import yaml
@@ -23,7 +23,6 @@
     stop_after_attempt,
     wait_exponential,
 )
-from typing_extensions import Any

 from codeclash.analysis.llm_as_judge.utils import FileLock, Instance, InstanceBatch, get_instances
 from codeclash.utils.log import get_logger

codeclash/analysis/llm_as_judge/visualize.ipynb.py

Lines changed: 2 additions & 2 deletions
@@ -280,7 +280,7 @@
 gap = 0.3
 y_positions = []
 model_positions = []
-for i, model in enumerate(models):
+for i in range(len(models)):
     base = i * (2 + gap)
     y_positions.extend([base, base + 1])
     model_positions.append(base + 0.5)
@@ -340,7 +340,7 @@
 left = [left[i] + values[i] for i in range(len(y_positions))]

 # Add round labels on the right side of the plot
-for i, (y_late, y_early) in enumerate(zip(y_positions[::2], y_positions[1::2])):
+for y_late, y_early in zip(y_positions[::2], y_positions[1::2]):
     ax.text(-2, y_late, "round ≥8", fontsize=9, ha="right", va="center", color="gray")
     ax.text(-2, y_early, "round ≤7", fontsize=9, ha="right", va="center", color="gray")


codeclash/analysis/matrix.py

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@

 from codeclash.agents.dummy_agent import Dummy
 from codeclash.agents.utils import GameContext
-from codeclash.arenas import get_game
+from codeclash.arenas import get_arena
 from codeclash.constants import DIR_WORK
 from codeclash.tournaments.utils.git_utils import filter_git_diff
 from codeclash.utils.atomic_write import atomic_write
@@ -109,7 +109,7 @@ def _initialize_game_pool(self):
 config = self.config.copy()
 config["game"]["sims_per_round"] = self.n_repetitions

-game = get_game(
+game = get_arena(
     config,
     tournament_id=tournament_id,
     local_output_dir=self.pvp_output_dir / "matrix_eval" / f"worker_{i}",
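
One detail in the `_initialize_game_pool` context lines is worth knowing when editing this code: `dict.copy()` is shallow, so `config["game"]["sims_per_round"] = ...` writes through to the nested `game` dict that `self.config` also references. Whether that matters here depends on how `self.config` is used elsewhere, which the diff does not show. The standalone sketch below uses plain dicts (the keys are illustrative, not CodeClash objects) to show the difference and the `copy.deepcopy` alternative.

```python
import copy

base_config = {"game": {"name": "BattleSnake", "sims_per_round": 1000}}

# Shallow copy: the top-level dict is new, but config["game"] is the same object.
shallow = base_config.copy()
shallow["game"]["sims_per_round"] = 10
print(base_config["game"]["sims_per_round"])  # 10: the original was modified too

# Deep copy: nested dicts are duplicated, so the original stays untouched.
base_config["game"]["sims_per_round"] = 1000
deep = copy.deepcopy(base_config)
deep["game"]["sims_per_round"] = 10
print(base_config["game"]["sims_per_round"])  # 1000
```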
