JAX-based generals.io environment with Gymnasium and PettingZoo wrappers for RL. Game logic is adapted from generals-bots by Matej Straka (MIT License).
```
├── pyproject.toml
├── generals/                       # Game engine (no package; run with PYTHONPATH=. or uv run)
│   ├── core/                       # Env, game state, grid, actions, observations
│   ├── agents/                     # RandomAgent, ExpanderAgent, etc.
│   ├── gui/                        # Pygame GUI and replay
│   └── remote/                     # Generals.io client
├── wrappers/                       # RL API wrappers
│   ├── gymnasium.py                # Single-agent Gymnasium env
│   └── pettingzoo.py               # Two-agent PettingZoo Parallel env
├── policies/                       # PyTorch policy networks
│   ├── layers.py                   # ConvNeXtBlock, ConvGRUCell, LayerNorm2d
│   ├── action_utils.py             # Masking, sampling, log-prob utilities
│   └── recurrent_unet_convnext.py  # RecurrentUNetPolicy
├── examples/                       # Scripts and demos
├── tests/                          # Unit and smoke tests
├── checkpoints/                    # BC model checkpoints (.pt)
└── remote_agents/                  # Deploy agents to generals.io
    ├── manek.py                    # Recurrent BC agent
    └── manek_transformer.py        # Transformer BC agent
```
There are no `__init__.py` files; this repo is not released as a package. Run scripts from the repo root with `uv run` or `PYTHONPATH=. python ...`.
Requires Python 3.11+ and `uv`.

```bash
uv venv
uv sync
```

Single-agent play against a scripted opponent via the Gymnasium wrapper:

```python
from wrappers.gymnasium import GeneralsGymnasiumEnv
from generals.agents.expander_agent import ExpanderAgent

env = GeneralsGymnasiumEnv(
    grid_dims=(10, 10),
    truncation=500,
    opponent=ExpanderAgent(),
)
obs, info = env.reset(seed=42)
for _ in range(100):
    action = env.action_space.sample()  # replace with your policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
```

Two-agent play via the PettingZoo Parallel wrapper:

```python
from wrappers.pettingzoo import GeneralsPettingZooEnv
env = GeneralsPettingZooEnv(grid_dims=(10, 10), truncation=500)
observations, infos = env.reset(seed=42)
while env.agents:
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
```

- Observations: `(14, H, W)` float32 tensor with fog of war. Channels: armies, generals, cities, mountains, neutral/owned/opponent/fog cells, counts, timestep.
- Actions: `MultiDiscrete([2, H, W, 4, 2])`, i.e. `[pass, row, col, direction, split]`.
- Action masking: `info["valid_action_mask"]` is an `(H, W, 4)` boolean array of legal moves (see the sketch after this list).
The `policies/` directory contains a Recurrent U-Net with ConvNeXt blocks and a ConvGRU bottleneck, designed for behaviour cloning and PPO training.
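For intuition, a generic ConvGRU update looks like the sketch below. This is the standard formulation, not necessarily the repo's `ConvGRUCell`: gates are 2-D convolutions instead of matrix multiplies, so the hidden state keeps its spatial layout.

```python
import torch
import torch.nn as nn

class SimpleConvGRUCell(nn.Module):
    """Illustrative ConvGRU cell; see policies/layers.py for the real one."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.gates = nn.Conv2d(2 * channels, 2 * channels, kernel_size, padding=pad)
        self.cand = nn.Conv2d(2 * channels, channels, kernel_size, padding=pad)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        zr = torch.sigmoid(self.gates(torch.cat([x, h], dim=1)))
        z, r = zr.chunk(2, dim=1)                         # update / reset gates
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde                  # gated state update
```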
A single forward pass:

```python
import torch

from policies import RecurrentUNetPolicy

model = RecurrentUNetPolicy()   # default: 14 input channels, 64/128/256 U-Net
B, H, W = 4, 10, 10             # illustrative batch size and grid dims
obs = torch.randn(B, 14, H, W)  # batched observations
logits, value, h_next = model.forward(obs)
# logits: (B, H, W, 9), value: (B, 1), hidden: (B, 256, H/4, W/4)
```

Stepping an environment while carrying the recurrent state:

```python
import numpy as np
import torch

from policies import RecurrentUNetPolicy, reset_hidden_if_done

# Continues the Gymnasium example above: `env`, `obs`, `info` already exist.
model = RecurrentUNetPolicy()
h = None # ConvGRU hidden state; None auto-initialises to zeros
obs_t = torch.from_numpy(obs).unsqueeze(0).float()
mask_t = torch.from_numpy(info["valid_action_mask"].copy()).unsqueeze(0)
action, h, extras = model.act(obs_t, mask_t, h_prev=h)
# action: (1,5) int64 — [pass, row, col, env_dir, split]
# extras: dict with flat_idx, log_prob, value
action_np = action.squeeze(0).numpy().astype(np.int32)
obs, reward, terminated, truncated, info = env.step(action_np)
# At episode boundaries, zero stale hidden states:
done = torch.tensor([terminated or truncated])
h = reset_hidden_if_done(h, done)
```

`log_prob_of_action` recomputes log-probabilities of stored actions under new logits:

```python
from policies.action_utils import log_prob_of_action
lp = log_prob_of_action(logits, valid_mask, flat_idx)  # (B,)
```
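Since the policy is designed for PPO training, a clipped surrogate update might use this utility as sketched below. This is not the repo's training loop; `obs_batch`, `valid_mask`, `flat_idx`, `lp_old`, and `advantages` are assumed rollout tensors, and the recurrent hidden state is ignored for brevity:

```python
import torch

# Recompute log-probs under the current policy weights.
logits, value, _ = model.forward(obs_batch)                # obs_batch: (B, 14, H, W)
lp_new = log_prob_of_action(logits, valid_mask, flat_idx)  # (B,)

# Clipped PPO surrogate (epsilon = 0.2); lp_old saved at rollout time.
ratio = torch.exp(lp_new - lp_old)
clipped = torch.clamp(ratio, 1 - 0.2, 1 + 0.2)
policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
```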
Run the tests:

```bash
uv run python -m pytest tests/ -v
```

Deploy BC agents to a generals.io lobby:

```bash
# Set USER_ID and LOBBY_ID in .env
uv run python remote_agents/manek.py              # Recurrent BC
uv run python remote_agents/manek_transformer.py  # Transformer BC
```

Checkpoints live in `checkpoints/`; override with the `CHECKPOINT_PATH` env var.
Game logic is based on generals-bots by Matej Straka, MIT License.