aadityabuilds/manek

Generals

JAX-based generals.io environment with Gymnasium and PettingZoo wrappers for RL. Game logic is adapted from generals-bots by Matej Straka (MIT License).

Project structure

├── pyproject.toml
├── generals/               # Game engine (no package; run with PYTHONPATH=. or uv run)
│   ├── core/               # Env, game state, grid, actions, observations
│   ├── agents/             # RandomAgent, ExpanderAgent, etc.
│   ├── gui/                # Pygame GUI and replay
│   └── remote/             # Generals.io client
├── wrappers/               # RL API wrappers
│   ├── gymnasium.py        # Single-agent Gymnasium env
│   └── pettingzoo.py       # Two-agent PettingZoo Parallel env
├── policies/               # PyTorch policy networks
│   ├── layers.py           # ConvNeXtBlock, ConvGRUCell, LayerNorm2d
│   ├── action_utils.py     # Masking, sampling, log-prob utilities
│   └── recurrent_unet_convnext.py  # RecurrentUNetPolicy
├── examples/               # Scripts and demos
├── tests/                  # Unit and smoke tests
├── checkpoints/            # BC model checkpoints (.pt)
└── remote_agents/          # Deploy agents to generals.io
    ├── manek.py            # Recurrent BC agent
    └── manek_transformer.py # Transformer BC agent

No __init__.py files — this repo is not released as a package. Run scripts from the repo root with uv run or PYTHONPATH=. python ....

Setup

Requires Python 3.11+ and uv.

uv venv
uv sync

Usage

Gymnasium (single-agent; opponent is a bot)

from wrappers.gymnasium import GeneralsGymnasiumEnv
from generals.agents.expander_agent import ExpanderAgent

env = GeneralsGymnasiumEnv(
    grid_dims=(10, 10),
    truncation=500,
    opponent=ExpanderAgent(),
)

obs, info = env.reset(seed=42)

for _ in range(100):
    action = env.action_space.sample()  # replace with your policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break

PettingZoo Parallel (two learning agents)

from wrappers.pettingzoo import GeneralsPettingZooEnv

env = GeneralsPettingZooEnv(grid_dims=(10, 10), truncation=500)
observations, infos = env.reset(seed=42)

while env.agents:
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)

Observations & actions

  • Observations: (14, H, W) float32 tensor with fog-of-war. Channels: armies, generals, cities, mountains, neutral/owned/opponent/fog cells, counts, timestep.
  • Actions: MultiDiscrete([2, H, W, 4, 2]): [pass, row, col, direction, split].
  • Action masking: info["valid_action_mask"] is (H, W, 4) boolean for legal moves.
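To show how the (H, W, 4) mask and the MultiDiscrete layout above fit together, here is a minimal numpy sketch that samples a uniformly random legal move; the helper name is our own, not part of the repo's API:

```python
import numpy as np

def sample_masked_action(valid_mask: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Pick a uniformly random legal move from an (H, W, 4) boolean mask.

    Returns a MultiDiscrete action [pass, row, col, direction, split].
    Illustrative helper only, not part of the repo's API.
    """
    legal = np.argwhere(valid_mask)           # (N, 3) rows of [row, col, direction]
    if len(legal) == 0:
        return np.array([1, 0, 0, 0, 0])      # no legal move: pass
    row, col, direction = legal[rng.integers(len(legal))]
    split = int(rng.integers(2))              # 0 = move all armies, 1 = split
    return np.array([0, row, col, direction, split])

rng = np.random.default_rng(0)
mask = np.zeros((10, 10, 4), dtype=bool)
mask[3, 4, 2] = True                          # exactly one legal move
print(sample_masked_action(mask, rng))        # [0 3 4 2 s] with s in {0, 1}
```

The same pattern works as a drop-in replacement for env.action_space.sample() in the Gymnasium loop above.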

Policy network

The policies/ directory contains a Recurrent U-Net with ConvNeXt blocks and a ConvGRU bottleneck, designed for behaviour-cloning and PPO training.

Forward pass (training)

import torch
from policies import RecurrentUNetPolicy

B, H, W = 8, 10, 10                         # batch size and grid dims (placeholders)
model = RecurrentUNetPolicy()               # default: 14 input channels, 64/128/256 U-Net
obs = torch.randn(B, 14, H, W)              # batched observations
logits, value, h_next = model.forward(obs)  # logits (B,H,W,9), value (B,1), hidden (B,256,H/4,W/4)

Action sampling (inference / rollouts)

import numpy as np
import torch

from policies import RecurrentUNetPolicy, reset_hidden_if_done

model = RecurrentUNetPolicy()
h = None  # ConvGRU hidden state; None auto-initialises to zeros

# obs and info come from env.reset() / env.step() as in the examples above
obs_t = torch.from_numpy(obs).unsqueeze(0).float()
mask_t = torch.from_numpy(info["valid_action_mask"].copy()).unsqueeze(0)

action, h, extras = model.act(obs_t, mask_t, h_prev=h)
# action: (1,5) int64 — [pass, row, col, env_dir, split]
# extras: dict with flat_idx, log_prob, value

action_np = action.squeeze(0).numpy().astype(np.int32)
obs, reward, terminated, truncated, info = env.step(action_np)

# At episode boundaries, zero stale hidden states:
done = torch.tensor([terminated or truncated])
h = reset_hidden_if_done(h, done)
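The point of the reset is to stop a new episode from inheriting stale recurrent memory. A minimal numpy sketch of the presumed semantics (zero the hidden state for batch entries whose episode ended; this is not the repo's implementation):

```python
import numpy as np

def reset_hidden_if_done_np(h, done):
    """Zero hidden-state entries where done is True.

    h: (B, C, H, W) array or None; done: (B,) boolean array.
    Illustrative sketch of the assumed semantics, not the repo's code.
    """
    if h is None:
        return None
    keep = (~done).astype(h.dtype).reshape(-1, 1, 1, 1)  # broadcast over (C, H, W)
    return h * keep

h = np.ones((2, 4, 3, 3))
done = np.array([True, False])
h = reset_hidden_if_done_np(h, done)
print(h[0].sum(), h[1].sum())  # 0.0 36.0 — first entry zeroed, second untouched
```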

Log-probability (for BC / RL losses)

from policies.action_utils import log_prob_of_action

lp = log_prob_of_action(logits, valid_mask, flat_idx)  # (B,)
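For intuition, a masked log-probability of this kind is typically computed by setting illegal logits to -inf, log-softmaxing over the flattened action space, and gathering the chosen flat index. A standalone numpy sketch of that computation (the repo's actual signature and flattening order live in policies/action_utils.py):

```python
import numpy as np

def log_prob_of_action_np(logits, valid, flat_idx):
    """Log-probability of chosen actions under a masked softmax.

    logits, valid: (B, N), with valid a boolean mask; flat_idx: (B,) indices.
    Illustrative only; the repo's version works on its own flattened layout.
    """
    masked = np.where(valid, logits, -np.inf)
    masked = masked - masked.max(axis=1, keepdims=True)   # numerical stability
    log_z = np.log(np.exp(masked).sum(axis=1, keepdims=True))
    log_probs = masked - log_z                            # masked log-softmax
    return log_probs[np.arange(len(flat_idx)), flat_idx]

logits = np.zeros((1, 4))
valid = np.array([[True, True, False, False]])
lp = log_prob_of_action_np(logits, valid, np.array([0]))
print(lp)  # log(1/2): two equally likely legal actions
```

A behaviour-cloning loss over a batch is then just the negated mean of these log-probabilities.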

Tests

uv run python -m pytest tests/ -v

Remote agents (generals.io)

Deploy BC agents to a generals.io lobby:

# Set USER_ID and LOBBY_ID in .env
uv run python remote_agents/manek.py              # Recurrent BC
uv run python remote_agents/manek_transformer.py  # Transformer BC

Checkpoints live in checkpoints/. Override with CHECKPOINT_PATH env var.

Acknowledgements

Game logic is based on generals-bots by Matej Straka, MIT License.

About

RL-trained #1 generals.io player
