PZ Risk

A PettingZoo multi-agent reinforcement learning environment for the classic board game Risk. PZ Risk models a fully competitive, turn-based strategy game on a graph-based world map, supporting 2, 4, or 6 agents.

📄 Associated paper: NeurIPS 2022 (included in this repository as NIPS2022.pdf)


Table of Contents

  • Overview
  • Installation
  • Environments
  • Game Mechanics
  • Maps
  • Agents
  • Wrappers
  • Usage
  • Project Structure
  • Citation


Overview

PZ Risk implements the board game Risk as a multi-agent environment following the PettingZoo AEC (Agent-Environment-Cycle) API. The game is played on a graph where each node represents a territory and edges represent adjacency (attack routes). Players take turns reinforcing their territories, attacking neighbors, and fortifying positions; the last player with territories remaining wins.

Key features:

  • Multi-agent: 2, 4, or 6 competitive agents
  • Graph-based board: built with NetworkX
  • PettingZoo & Gym compatible: environments are registered with Gym and created via gym.make(...)
  • Multiple maps: classic world map and smaller test configurations
  • Built-in agents: Random, Greedy, Value-based, and Model-based
  • Observation wrappers: vector and graph-based observations
  • Reward wrappers: sparse and dense reward shaping

Installation

Prerequisites

  • Python 3.7+
  • pip

Install from source

```shell
git clone https://github.com/mahi97/pz_risk.git
cd pz_risk
pip install -r pz_risk/requirements.txt
pip install -e .
```

Dependencies

| Package    | Purpose                              |
| ---------- | ------------------------------------ |
| gym        | Environment registration and API     |
| pettingzoo | Multi-agent AEC environment base     |
| networkx   | Graph representation of the board    |
| numpy      | Numerical operations                 |
| matplotlib | Rendering                            |
| scipy      | Scientific utilities                 |

Environments

PZ Risk registers the following Gym environments:

| Environment ID   | Players | Board     |
| ---------------- | ------- | --------- |
| Risk-Normal-2-v0 | 2       | World map |
| Risk-Normal-4-v0 | 4       | World map |
| Risk-Normal-6-v0 | 6       | World map |

```python
import gym
import pz_risk.envs  # registers environments

env = gym.make('Risk-Normal-6-v0')
```

Game Mechanics

Game States

Each turn progresses through an ordered sequence of states:

| State     | Description                                                        |
| --------- | ------------------------------------------------------------------ |
| StartTurn | Begin a new turn; calculate reinforcement units                    |
| Card      | Optionally trade in a set of 3 matching cards for bonus units      |
| Reinforce | Place earned units onto owned territories (one at a time)          |
| Attack    | Optionally attack adjacent enemy territories                       |
| Move      | Move surviving units after a successful attack                     |
| Fortify   | Optionally move units between connected friendly territories       |
| EndTurn   | Game over (triggered when one player controls all territories)     |
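The repository defines these phases in pz_risk/core/gamestate.py as a GameState enum. A minimal sketch, assuming the member names match the table above (the ordinal values here are illustrative):

```python
from enum import Enum

class GameState(Enum):
    """Turn phases from the table above (values are illustrative)."""
    StartTurn = 0
    Card = 1
    Reinforce = 2
    Attack = 3
    Move = 4
    Fortify = 5
    EndTurn = 6
```

Agents can branch on the current phase, e.g. `if state == GameState.Attack: ...`, since the legal action space changes with each phase.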

Action Spaces

Action spaces are state-dependent:

| Game State | Action Space                              | Description                                        |
| ---------- | ----------------------------------------- | -------------------------------------------------- |
| Reinforce  | Discrete(n_nodes)                         | Index of territory to place 1 unit on              |
| Attack     | MultiDiscrete([2, n_edges])               | [skip, edge_index]; first entry 0 to attack, 1 to skip |
| Move       | Discrete(100)                             | Number of units to move into captured territory    |
| Fortify    | MultiDiscrete([2, n_nodes, n_nodes, 100]) | [skip, src, dst, units]                            |
| Card       | Discrete(2)                               | 0 = skip, 1 = trade in best matching set           |
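The state-dependent spaces above can be sketched as a plain-Python validity check. This is illustrative only: `action_in_space` is not the repository's API, the world map has 42 territories, and the edge count here is a placeholder assumption:

```python
N_NODES = 42   # territories on the classic world map
N_EDGES = 82   # adjacency count; placeholder assumption for illustration

def action_in_space(state, action):
    """Check an action against the state-dependent spaces in the table above."""
    if state == 'Reinforce':
        return isinstance(action, int) and 0 <= action < N_NODES
    if state == 'Attack':
        skip, edge = action
        return skip in (0, 1) and 0 <= edge < N_EDGES
    if state == 'Move':
        return isinstance(action, int) and 0 <= action < 100
    if state == 'Fortify':
        skip, src, dst, units = action
        return (skip in (0, 1) and 0 <= src < N_NODES
                and 0 <= dst < N_NODES and 0 <= units < 100)
    if state == 'Card':
        return action in (0, 1)
    return False
```

Note that passing these shape checks does not make an action legal in the game; use the AssertInvalidActionsWrapper (see Wrappers) to catch actions that are invalid for the current board state.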

Observation Space

The raw observation returned by observe(agent) is the Board object, giving access to the full game graph and all state. Use the provided wrappers to convert this into a structured format suitable for learning algorithms.
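For intuition, a flat vector observation of the kind VectorObservationWrapper produces might concatenate a one-hot owner encoding and a unit count per territory. The encoding below is purely illustrative; the wrapper's real layout may differ:

```python
def flatten_board(owners, units, n_players):
    """One-hot owner per territory, followed by its unit count."""
    vec = []
    for owner, unit_count in zip(owners, units):
        vec += [1.0 if p == owner else 0.0 for p in range(n_players)]
        vec.append(float(unit_count))
    return vec

# 3 territories, 2 players: 3 * (2 one-hot + 1 count) = 9 features
obs = flatten_board(owners=[0, 1, 1], units=[3, 1, 5], n_players=2)
```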

Cards

Cards are dealt to players when they successfully conquer a territory. Card types are Infantry, Cavalry, Artillery, and Wild. Trading in three matching cards (or one of each type) awards bonus placement units:

| Set          | Bonus Units |
| ------------ | ----------- |
| 3× Infantry  | 4           |
| 3× Cavalry   | 6           |
| 3× Artillery | 8           |
| Wild set     | 10          |

If a player holds 5 or more cards, they are forced to trade.
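The trade-in rules above can be sketched as a small lookup. The function name is illustrative (not the repository's API), and it assumes that a one-of-each-type set is scored like the 10-unit wild set:

```python
CARD_BONUS = {'Infantry': 4, 'Cavalry': 6, 'Artillery': 8}

def trade_bonus(cards):
    """Bonus units for trading a set of three cards, per the table above."""
    kinds = set(cards)
    if 'Wild' in kinds or len(kinds) == 3:  # wild set, or one of each type
        return 10
    if len(kinds) == 1:                     # three matching cards
        return CARD_BONUS[cards[0]]
    return 0                                # not a valid set
```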


Maps

Maps are stored as JSON files under pz_risk/maps/. Each map defines nodes (territories), edges (adjacencies), group membership, and metadata.

| Map Name | Nodes | Description                   |
| -------- | ----- | ----------------------------- |
| world    | 42    | Classic Risk world map        |
| 8node    | 8     | Small 8-territory test map    |
| 6node    | 6     | Small 6-territory test map    |
| 4node    | 4     | Minimal 4-territory test map  |

Custom maps can be added by creating a JSON file and calling register_map(name, filepath).
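For intuition, the kind of structure a map encodes (nodes, adjacency edges, group membership) can be sketched in plain Python. This toy 4-territory layout and its field names are illustrative, not the actual JSON schema of the repository's map files:

```python
# Toy 4-territory map: a cycle of territories split into two groups
toy_map = {
    'nodes': {0: 'A', 1: 'B', 2: 'C', 3: 'D'},
    'edges': [(0, 1), (1, 2), (2, 3), (3, 0)],     # undirected adjacency
    'groups': {'group-1': [0, 1], 'group-2': [2, 3]},
}

def neighbors(node, edges):
    """Attack routes out of a territory (edges are undirected)."""
    return sorted({b for a, b in edges if a == node} |
                  {a for a, b in edges if b == node})
```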


Agents

Built-in agents are located in pz_risk/agents/:

| Agent  | Class       | Description                                              |
| ------ | ----------- | -------------------------------------------------------- |
| Random | RandomAgent | Uniformly samples from valid actions                     |
| Greedy | GreedyAgent | Selects the action with the highest immediate advantage  |
| Value  | –           | Uses a hand-crafted value function                       |
| Model  | –           | Model-based agent for planning                           |

All agents implement the BaseAgent interface with reset() and act(state) methods.
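A minimal sketch of that interface, with a toy subclass (the exact signatures in pz_risk/agents/base.py may differ):

```python
class BaseAgent:
    """Interface implemented by the built-in agents (sketch; details assumed)."""

    def reset(self):
        """Clear any per-episode state before a new game."""

    def act(self, state):
        """Map the observed state to an action; must be overridden."""
        raise NotImplementedError

class SkipAgent(BaseAgent):
    """Toy agent that always returns 0 (skip in optional phases). Illustrative only."""

    def act(self, state):
        return 0
```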


Wrappers

PZ Risk provides several observation and reward wrappers in pz_risk/wrappers/:

| Wrapper                     | Description                                                       |
| --------------------------- | ----------------------------------------------------------------- |
| AssertInvalidActionsWrapper | Raises an error if an invalid action is submitted                 |
| VectorObservationWrapper    | Converts the board state to a flat numpy vector                   |
| GraphObservationWrapper     | Converts the board state to a graph observation for GNN-based agents |
| SparseRewardWrapper         | Provides +1 reward only when the game ends (win/lose)             |
| DenseRewardWrapper          | Provides per-step shaped rewards based on territory control       |

Wrappers can be composed:

```python
from pz_risk.wrappers import VectorObservationWrapper, SparseRewardWrapper

env = gym.make('Risk-Normal-6-v0')
env = VectorObservationWrapper(env)
env = SparseRewardWrapper(env)
```

Usage

Basic Usage

```python
import gym
import pz_risk.envs  # required to register environments

env = gym.make('Risk-Normal-6-v0')
env.reset()

for agent in env.agent_iter():
    obs, reward, done, info = env.last()
    if done:
        action = None
    else:
        action = env.unwrapped.sample()  # random valid action
    env.step(action)

env.close()
```

Manual Play

Launch an interactive game where one agent is human-controlled (click-based) and the rest are random:

```shell
python manual.py --env Risk-Normal-6-v0 --num_agents 6 --num_manual 1
```

Benchmarking

Measure environment throughput (reset time, rendering FPS, agent-step FPS):

```shell
python benchmark.py --env-name Risk-Normal-6-v0 --num_resets 200 --num_frames 5000
```

Training with PPO

A PPO training script using the GraphObservationWrapper is included:

```shell
cd pz_risk
python train.py
```

A value-decomposition network (DVN) variant is also available via train_v.py.


Project Structure

```
pz_risk/
├── agents/             # Built-in agent implementations
│   ├── base.py
│   ├── greedy.py
│   ├── model.py
│   ├── random.py
│   ├── sampling.py
│   └── value.py
├── core/               # Core game logic
│   ├── board.py        # Board, map loading, game step logic
│   ├── card.py         # Card types and scoring
│   ├── gamestate.py    # GameState enum
│   └── player.py       # Player state
├── envs/               # Gym-registered environment classes
│   └── normal.py
├── maps/               # Map definitions (JSON)
│   ├── world.json
│   ├── 4node.json
│   ├── 6node.json
│   └── 8node.json
├── training/           # PPO and DVN training infrastructure
├── wrappers/           # Observation and reward wrappers
├── risk_env.py         # Main RiskEnv (AECEnv) implementation
├── register.py         # Gym environment registration helper
├── utils.py            # Utility functions (dice rolling, etc.)
├── benchmark.py        # Throughput benchmarking script
├── manual.py           # Interactive manual play script
├── train.py            # PPO training entry point
└── train_v.py          # DVN training entry point
```

Citation

If you use PZ Risk in your research, please cite:

```bibtex
@inproceedings{pzrisk2022,
  title     = {PZ Risk: A Multi-Agent Reinforcement Learning Environment for the Game of Risk},
  booktitle = {NeurIPS 2022},
  year      = {2022}
}
```

See NIPS2022.pdf in this repository for the full paper.
