Built on the Axelrod Python library — an open-source framework for research into the iterated Prisoner's Dilemma. The tournament infrastructure, strategy implementations, and game mechanics used in this example are all provided by that project. The seed strategies in this example were adapted from strategies in the Axelrod library. See the `NOTICE` file in this directory for license details.
This example uses OpenEvolve to evolve a strategy for the iterated Prisoner's Dilemma, inspired by Robert Axelrod's famous tournaments from the 1980s. The goal is to discover a strategy that maximizes its average score when competing against the full suite of Axelrod library strategies.
In the Prisoner's Dilemma, two players simultaneously choose to either Cooperate (C) or Defect (D). The payoff matrix rewards mutual cooperation, but each player is individually tempted to defect.
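With the conventional payoff values (which the Axelrod library also uses by default: temptation T=5, reward R=3, punishment P=1, sucker S=0), the matrix for (your score, opponent's score) looks like this:

| | Opponent cooperates | Opponent defects |
|---|---|---|
| You cooperate | 3, 3 | 0, 5 |
| You defect | 5, 0 | 1, 1 |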
In the iterated version, players play many rounds against each other. Strategies that build cooperation tend to do well over time, but must also protect against exploitation by purely selfish strategies.
The evolved strategy plays a one-vs-all tournament against every strategy in the `axelrod.short_run_time_strategies` collection (over 100 strategies). It competes against each one in a separate match and is scored on its median score across all matchups.
The evaluator (`evaluator.py`) works as follows:
- Loads the evolved strategy as a class named `Evolvo`
- Constructs a tournament where `Evolvo` plays every strategy in `axl.short_run_time_strategies`
- Plays all matches (1 repetition per pair, no round-robin among opponents)
- Returns the `median_score` as the primary metric (`combined_score`); see the sketch below
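A minimal sketch of that evaluation loop, assuming 200-turn matches (the actual `evaluator.py` also collects the artifacts listed below and handles errors):

```python
import statistics

import axelrod as axl

def median_score(evolved_cls, turns=200):
    """Play the evolved strategy once against every short-run-time opponent."""
    scores = []
    for opponent_cls in axl.short_run_time_strategies:
        match = axl.Match((evolved_cls(), opponent_cls()), turns=turns)
        match.play()
        # final_score_per_turn() returns (our average, opponent's average)
        scores.append(match.final_score_per_turn()[0])
    return statistics.median(scores)
```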
Additional artifacts are tracked for each evaluation:
- `Cooperation_rating`: How often the strategy cooperates overall
- `Wins`: Number of outright wins
- `CC_rate` / `CD_rate` / `DC_rate` / `DD_rate`: Frequencies of each outcome pair
- `CC_to_C` / `CD_to_C` / `DC_to_C` / `DD_to_C`: Conditional cooperation rates after each prior outcome
These artifacts are fed back into the LLM prompt (via `include_artifacts: true`) so the LLM can see exactly how the strategy is behaving and make informed improvements.
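For illustration, the conditional rates can be derived from a match's joint history along these lines (a hedged sketch; the real evaluator's bookkeeping may differ):

```python
from collections import Counter

from axelrod.action import Action

C = Action.C

def conditional_coop_rates(joint_history):
    """joint_history: list of (my_action, opp_action) tuples, one per round.
    Returns rates like {'CC_to_C': 0.9}: how often we cooperated on the
    round immediately after each joint outcome."""
    seen, cooperated = Counter(), Counter()
    for prev, (mine, _) in zip(joint_history, joint_history[1:]):
        seen[prev] += 1
        if mine == C:
            cooperated[prev] += 1
    return {f"{a.name}{b.name}_to_C": cooperated[(a, b)] / seen[(a, b)]
            for (a, b) in seen}
```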
Rather than starting from a single naive strategy, this example seeds 4 diverse starting strategies—one per island—covering a range of well-known approaches. Each was adapted from the Axelrod library, with an `EVOLVE-BLOCK` wrapping the class body so OpenEvolve can mutate the strategy.
| File | Original Strategy | Key Behavior |
|---|---|---|
| `seed1.py` | Tit-for-Tat | Cooperate first; mirror the opponent's last move |
| `seed2.py` | ForgetfulFoolMeOnce(0.05) | Forgive one defection; retaliate forever on a second; occasionally forget with 5% probability |
| `seed3.py` | OriginalGradual | Escalate punishment with each defection; enter a calming state to de-escalate |
| `seed4.py` | SecondByBorufsen | Track opponent patterns; switch to defect mode against random or defective opponents; detect mutual-defect cycles and alternating echoes |
These seeds were chosen to be near-top-level performers that are straightforward to implement, while remaining diverse enough to explore different regions of strategy space.
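For reference, a seed file looks roughly like this (a sketch assuming OpenEvolve's `EVOLVE-BLOCK` comment markers; the real seeds implement the behaviors described in the table above):

```python
import axelrod as axl
from axelrod.action import Action

C, D = Action.C, Action.D

# EVOLVE-BLOCK-START
class Evolvo(axl.Player):
    """Tit-for-Tat seed: cooperate first, then mirror the opponent's last move."""

    name = "Evolvo"
    classifier = {
        "memory_depth": 1,
        "stochastic": False,
        "long_run_time": False,
        "inspects_source": False,
        "manipulates_source": False,
        "manipulates_state": False,
    }

    def strategy(self, opponent: axl.Player) -> Action:
        if not self.history:
            return C
        return opponent.history[-1]
# EVOLVE-BLOCK-END
```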
The example uses a custom runner (`run_evolve.py`) rather than the standard `openevolve-run.py`. This allows each seed to be evaluated and placed directly onto a specific island before evolution begins:
```python
for i, seed_path in enumerate(initial_seeds):
    with open(seed_path, 'r') as f:
        code = f.read()
    metrics = await openevolve.evaluator.evaluate_program(code, initial_program_id)
    program = Program(id=..., code=code, metrics=metrics, ...)
    openevolve.database.add(program, target_island=i)
```

This ensures each island starts with a strong, distinct strategy—promoting diversity throughout the evolutionary process and preventing premature convergence to a single approach.
Install the dependency and run from the repository root:
```bash
pip install axelrod
python examples/axelrod/run_evolve.py
```

The evolution is configured in `config.yaml`:
```yaml
max_iterations: 1000
checkpoint_interval: 10

llm:
  primary_model: "gemini-2.5-flash-lite"  # Fast, cheap model for most mutations
  primary_model_weight: 0.8
  secondary_model: "gemini-2.5-flash"     # Stronger model for refinement
  secondary_model_weight: 0.2
  temperature: 0.7
  max_tokens: 16000

prompt:
  num_top_programs: 3
  num_diverse_programs: 2
  include_artifacts: true  # Feed cooperation stats back to LLM
  system_message: "You're building a program that runs iterated prisoner's dilemma
    in a tournament with many other strategies, similar to Axelrod's famous tournaments.
    Your task is to improve the play strategy for maximum profit."

database:
  population_size: 50
  num_islands: 4
  elite_selection_ratio: 0.2
  exploitation_ratio: 0.7
  similarity_threshold: 0.99

evaluator:
  enable_artifacts: true
  timeout: 60
  parallel_evaluations: 3
```

Key configuration decisions:
- 4 islands to match the 4 seed strategies, maintaining diversity throughout evolution
- Artifacts enabled so the LLM can reason about cooperation patterns and win/loss rates
- Similarity threshold of 0.99 to discourage near-duplicate strategies from accumulating in the population
The best evolved strategy (`best_program.py`) is named `AdaptiveCycleBreakerTFT`. It is a sophisticated refinement of the `SecondByBorufsen` seed, with several key changes:
Mode Switching (every 15 turns instead of 25):
- Evaluates opponent behavior more frequently, allowing faster adaptation
- Switches to permanent Defect mode if the opponent cooperated fewer than 3 times, or if less than 83% of their cooperations were in response to the player's own cooperation (stricter than the original 70% threshold)
Cycle-Breaking in Normal Mode (triggers at streak ≥ 2 instead of 3):
- If both players have mutually defected for 2+ consecutive turns, cooperate once to break the cycle
- If an alternating D/C echo pattern persists for 2+ turns, flag the next defect to become a cooperation
Tit-for-Tat as Default:
- When none of the special conditions apply, simply mirror the opponent's last move
The strategy is fully deterministic (`stochastic: False`) and has a declared `memory_depth` of 15, matching its re-evaluation period.
```python
class AdaptiveCycleBreakerTFT(Player):
    def strategy(self, opponent: Player) -> Action:
        # (Abridged: bookkeeping for opp_coops, cc_counts,
        # mutual_defect_streak, and echo_streak is omitted here.)
        turn = len(self.history) + 1

        # Re-evaluate mode every 15 turns
        if turn > 1 and turn % 15 == 0:
            if self.opp_coops < 3 or self.cc_counts / self.opp_coops < 0.83:
                self.mode = "Defect"
            else:
                self.mode = "Normal"

        if self.mode == "Defect":
            return D

        # Break mutual defection cycles
        if self.mutual_defect_streak >= 2:
            return C

        # Flag alternating echo patterns
        if self.echo_streak >= 2:
            self.flip_next_defect = True

        # Tit-for-tat (with optional flip)
        return C if opponent.history[-1] == C else (C if self.flip_next_defect else D)
```

After evolution completes, you can run a full round-robin validation tournament:
```bash
python examples/axelrod/validation.py
```

This runs `AdaptiveCycleBreakerTFT` against all `short_run_time_strategies` in a proper round-robin tournament (all vs. all) with 5 repetitions, giving a more robust estimate of its ranking.
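Under the hood, such a validation run amounts to something like the following (a sketch; `validation.py` may use different turn counts and reporting, and the import path is assumed):

```python
import axelrod as axl

from best_program import AdaptiveCycleBreakerTFT  # assumed import path

players = [cls() for cls in axl.short_run_time_strategies]
players.append(AdaptiveCycleBreakerTFT())

tournament = axl.Tournament(players, turns=200, repetitions=5)
results = tournament.play()

# Position in the overall ranking (1 = best), assuming the class
# declares name = "AdaptiveCycleBreakerTFT"
print(results.ranked_names.index("AdaptiveCycleBreakerTFT") + 1)
```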
- **Diversity through seeding**: Starting with 4 distinct strategies on separate islands prevented premature convergence and gave the LLM a broader set of patterns to recombine and refine.
- **Artifact-driven learning**: Feeding cooperation statistics (`CC_rate`, `DC_rate`, etc.) back to the LLM allowed it to reason about why a strategy was failing—e.g., too cooperative against defectors, or failing to re-establish cooperation after mutual defection.
- **Convergence toward known principles**: The evolved strategy independently rediscovered key insights from decades of game theory research—the value of proportional retaliation, the need to detect and escape mutual defection traps, and the importance of distinguishing random from deterministically defective opponents.
- **Parameter refinement**: The evolved strategy tightened the `SecondByBorufsen` parameters (25→15 turn evaluation period, 70%→83% cooperation threshold, streak threshold 3→2), suggesting the LLM learned to be more responsive and less forgiving than the original.