Commit 8aac195

Add initial release blog
1 parent bc2e10c commit 8aac195

4 files changed

Lines changed: 65 additions & 3 deletions

pages/insights/20251103_release.md

Lines changed: 47 additions & 2 deletions
@@ -5,7 +5,52 @@ authors: John Yang, Kilian Lieret
 
 Existing coding benchmarks evaluate Language Models (LMs) on *tasks*.
 
-By "task", we're referring to implementation requests with well-defined input-output specifications.
-For instance, here's
+Implement a function, fix a bug, write a test.
+
+We tell models what to do, they give it a shot, and we evaluate correctness with unit tests.
+
+This approach has driven impressive progress in LMs' code generation capabilities over the past few years.
+
+However, as LM scores have skyrocketed on evaluations like [HumanEval](https://github.com/openai/human-eval) and [SWE-bench](https://www.swebench.com/), such improvement also beckons the question: Is the future of code evals just making harder tasks?
+
+Our answer is founded in a simple question: Why do we write code?
+
+To achieve *goals*!
+
+Software developers aren't just incessantly solving tickets with no aim.
+We code to improve user retention, increase revenue, reduce costs, achieve higher customer satisfaction - the list is endless.
+
+Towards these goals, we decompose objectives into steps, prioritize them, and must strategically decide which solutions to pursue.
+
+And it's a continuous, often competitive loop. Propose changes, deploy them, analyze real-world feedback (e.g., metrics, user behavior, A/B test results), then do it all again.
+From this perspective, tasks are but small, isolated pieces tied together by an overarching goal.
+
+So we posit - perhaps the next frontier in code evaluation is not harder tasks, but **goal-oriented software engineering**.
+
+To formalize this, we're excited to share **CodeClash**!
+
+Multiple LM systems compete to build the best codebase for achieving a high-level objective over the course of a multi-round tournament.
+These codebases implement solutions that compete in a code arena.
 
 <img src="/static/images/insights/20251104_release/banner.png" class="img-insight" />
+<div style="text-align:center;">
+<span class="subtext">Picture Credit to <a href="https://abehou.github.io/">Abe Hou</a></span>
+</div>
+
+Crucially, LMs do not play directly.
+Instead, they iteratively refine code that competes as their proxy.
+
+CodeClash enables us to examine models as long-running, continually improving developers:
+
+- Objectives are open-ended (win, survive, or maximize reward)
+- Arenas are diverse so solutions and interfaces differ dramatically
+- Competition rewards adaptive strategies rather than one-off correctness.
+
+If you're curious about models using code as the modality to learn, adapt, and improve over time, CodeClash is the playground for you.
+
+Thanks for reading! Check out our [paper](https://arxiv.org/abs/2511.00839) for the full story. And if you're ready to dive in, here's a quick video to show you how to set up the repository and run your first CodeClash tournament!
+
+<div style="position: relative; padding-bottom: 65.01809408926417%; height: 0;">
+<iframe src="https://www.loom.com/embed/a04ea3ecc8d64cfd918b12f6f1775017" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;">
+</iframe>
+</div>
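
Note on the added post: the loop it describes (players refine a codebase between rounds, and the codebases, not the players, compete in an arena) can be illustrated with a toy sketch. This is not CodeClash's actual API. `Player`, `run_arena`, `run_tournament`, `keep`, and `adapt` are hypothetical names, a "codebase" is reduced to a single integer, and the "arena" simply rewards the larger value.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Player:
    """Hypothetical stand-in for an LM system that maintains a codebase."""
    name: str
    codebase: int  # toy proxy for the code the player maintains
    refine: Callable[[int, List[str]], int]  # (codebase, past winners) -> new codebase
    wins: int = 0


def run_arena(codebases: Dict[str, int]) -> str:
    """Toy arena: the codebases compete, not the players.

    The highest value wins; ties go to the alphabetically first name.
    """
    return max(sorted(codebases), key=codebases.get)


def run_tournament(players: List[Player], rounds: int) -> List[str]:
    """Alternate an edit phase and a competition phase for `rounds` rounds."""
    history: List[str] = []
    for _ in range(rounds):
        # Edit phase: each player updates its codebase using past results.
        for p in players:
            p.codebase = p.refine(p.codebase, history)
        # Competition phase: the refined codebases play as proxies.
        winner = run_arena({p.name: p.codebase for p in players})
        history.append(winner)
        for p in players:
            if p.name == winner:
                p.wins += 1
    return history


def keep(codebase: int, history: List[str]) -> int:
    """One-off strategy: never changes the codebase."""
    return codebase


def adapt(codebase: int, history: List[str]) -> int:
    """Adaptive strategy: improve whenever the previous round was not won."""
    if not history or history[-1] != "adaptive":
        return codebase + 2
    return codebase


if __name__ == "__main__":
    players = [Player("static", 3, keep), Player("adaptive", 0, adapt)]
    print(run_tournament(players, rounds=3))
```

In this toy setup the static player takes the first round and the adaptive player takes the next two, which mirrors the post's point that competition rewards adaptation over one-off correctness.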

static/css/homepage.css

Lines changed: 6 additions & 0 deletions
@@ -76,6 +76,12 @@
 margin-right: 0.1em;
 display: inline-block;
 transform: translateY(-0.12em);
+filter: none;
+}
+
+:root[data-theme="dark"] .hfeature img {
+/* Force SVG icons to render white in dark mode */
+filter: brightness(0) invert(1) !important;
 }
 
 .hfeature p {

static/css/layout.css

Lines changed: 11 additions & 0 deletions
@@ -273,6 +273,17 @@ nav {
 padding-bottom: 3em;
 }
 
+.insight-page a {
+color: var(--fg);
+text-decoration: none;
+border-bottom: 1px solid var(--fg);
+}
+
+.insight-page a:hover {
+color: var(--accent);
+border-bottom: 1px solid var(--accent);
+}
+
 /* Arenas grid */
 .arenas-container {
 padding: 0 1rem 1rem 1rem;

templates/team.html

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,6 @@ <h1>Team</h1>
 <img src="/static/images/logos/clash_r.svg" alt="Clash Red">
 </div>
 
-<div class="team-container">
 <div class="team-grid">
 {% for contributor in data.contributors %}
 <a href="{{ contributor.link }}" class="team-card">
@@ -26,6 +25,7 @@ <h1>Team</h1>
 {% endfor %}
 </div>
 
+<div class="team-container">
 <div style="display:flex; flex-direction:column;">
 <h3 style="margin: 1.5em 0 1em 0">Get in Touch</h3>
 <span>
