This is the official PyTorch implementation of DMAWM (Learning Disentangled Multi-Agent World Model for Decentralized Control), which has been accepted at ICML 2026. This repository aims to help others reproduce the results in the paper.
World models enable agents to learn policies through latent imagination, but applying them to multi-agent reinforcement learning introduces a key difficulty: modeling joint dynamics in latent space can create spurious correlations between agents' latent states. Such correlations cannot be reproduced during fully decentralized execution, where each agent must act using only its own local information.
DMAWM addresses this challenge under the centralized training and decentralized execution (CTDE) paradigm. It learns decentralized policies in latent space with independent agent modules that maintain factorized latent states from local observations, together with a shared environment module that models multi-agent interactions during imagination.
During real-environment interaction, each agent module updates its internal state independently and selects actions from its own latent state. During imagination, the shared environment module replaces the real environment: it uses a transformer-based interaction predictor to generate joint stochastic latent states from the agents' deterministic states, and its joint prior is aligned with the factorized posterior produced by the agent modules. This alignment explicitly disentangles individual agent states while still capturing dependencies between agents.
As a result, imagined rollouts respect the information constraints of decentralized execution instead of relying on latent-state communication or observation-space rollouts. Policies and value functions are trained on these imagined trajectories, and the learned policies can be deployed in a fully decentralized manner.
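The components described above can be summarized, for agent $i \in \{1,\dots,N\}$ at step $t$, roughly as follows. The notation is ours and is only a sketch of the description, not the paper's exact formulation: $h_t^i$ is the deterministic state, $z_t^i$ the stochastic state, $o_t^i$ the local observation, and $a_t^i$ the action. The Dreamer-style conditioning is an assumption, and details such as KL balancing or stop-gradients are omitted.

$$
\begin{aligned}
h_t^i &= f_\theta\big(h_{t-1}^i,\, z_{t-1}^i,\, a_{t-1}^i\big) && \text{(agent module, independent deterministic update)}\\
z_t^i &\sim q_\theta\big(z_t^i \mid h_t^i,\, o_t^i\big) && \text{(factorized posterior from local observations)}\\
a_t^i &\sim \pi\big(a_t^i \mid h_t^i,\, z_t^i\big) && \text{(decentralized policy)}\\
z_t^{1:N} &\sim p_\psi\big(z_t^{1:N} \mid h_t^{1:N}\big) && \text{(joint prior from the transformer interaction predictor)}\\
\mathcal{L}_{\text{align}} &= \mathrm{KL}\Big(\textstyle\prod_{i=1}^{N} q_\theta\big(z_t^i \mid h_t^i, o_t^i\big) \,\Big\|\, p_\psi\big(z_t^{1:N} \mid h_t^{1:N}\big)\Big) && \text{(prior/posterior alignment)}
\end{aligned}
$$

In the real environment, $z_t^i$ comes from the factorized posterior; during imagination no observations are available, so the stochastic states are sampled jointly from $p_\psi$, while each policy still conditions only on its own $(h_t^i, z_t^i)$.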
First, create and activate a Conda virtual environment with Python 3.11:
```bash
conda create -n dmawm python=3.11
conda activate dmawm
```

For StarCraft Multi-Agent Challenge (SMAC) environments, follow the installation instructions for StarCraft II version 2.4.10 (Linux) provided in the pymarl repository. Make sure the `SC2PATH` environment variable is set to your StarCraft II installation directory (e.g., `~/pymarl/3rdparty/StarCraftII`).
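For example, with the pymarl directory layout mentioned above, `SC2PATH` can be set in the current shell as follows. This is a minimal sketch: the path is only the example from the text, so substitute your own installation directory.

```bash
# Example location taken from the pymarl layout; replace with your StarCraft II directory
export SC2PATH="$HOME/pymarl/3rdparty/StarCraftII"

# Warn early if the directory is missing, rather than failing later inside the environment
if [ ! -d "$SC2PATH" ]; then
  echo "warning: SC2PATH directory not found: $SC2PATH" >&2
fi
```

To make the setting persistent, add the `export` line to your shell startup file (e.g., `~/.bashrc`).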
For SMACv2, download the additional maps from the SMACv2 releases page and place them in the $SC2PATH/Maps/SMAC_Maps directory.
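One way to do this after extracting the downloaded archive is sketched below. The source location `MAPS_SRC` (and its default `~/Downloads/SMAC_Maps`) is a hypothetical placeholder for wherever you unpacked the maps; only the destination `$SC2PATH/Maps/SMAC_Maps` comes from the instructions above.

```bash
# Hypothetical location of the extracted SMACv2 maps; override MAPS_SRC to match your download
MAPS_SRC="${MAPS_SRC:-$HOME/Downloads/SMAC_Maps}"
# Fall back to the example SC2PATH if it has not been exported yet
SC2PATH="${SC2PATH:-$HOME/pymarl/3rdparty/StarCraftII}"
MAPS_DST="$SC2PATH/Maps/SMAC_Maps"

mkdir -p "$MAPS_DST"
if [ -d "$MAPS_SRC" ]; then
  # Copy every .SC2Map file found under MAPS_SRC into the flat SMAC_Maps directory
  find "$MAPS_SRC" -name '*.SC2Map' -exec cp {} "$MAPS_DST"/ \;
else
  echo "warning: $MAPS_SRC not found; download and extract the SMACv2 maps first" >&2
fi

# List the installed maps to confirm the copy
ls "$MAPS_DST"
```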
Then, install the dependencies:
```bash
pip install -r requirements.txt
```

To reproduce the results on the SMACv2 `protoss_5_vs_5` map, run the following command:

```bash
CUDA_VISIBLE_DEVICES=0 python main.py \
  --name train \
  --trainer dreamer \
  --env smacv2 \
  --env_args.map_name protoss_5_vs_5 \
  --env_args.use_absorbing_state True \
  --env_args.trailing_absorbing_state_length 2 \
  --train.num_env_steps 405000 \
  --train.imagination_steps 4 \
  --use_eval True \
  --replay.capacity 250000 \
  --critic.use_critic_transformer True \
  --train.share_critics True \
  --train.share_actors True \
  --seed 0
```

Scripts for reproducing the experimental results are available in `scripts/dreamer/`:
- SMAC: `smac_2c_vs_64zg.sh`, `smac_2s_vs_1sc.sh`, `smac_3s5z.sh`, `smac_corridor.sh`, `smac_mmm2.sh`
- SMACv2: `smacv2_protoss_5_vs_5.sh`, `smacv2_terran_5_vs_5.sh`, `smacv2_zerg_5_vs_5.sh`
- Melting Pot: `meltingpot_commons_harvest__open.sh`, `meltingpot_coop_mining.sh`
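To run the SMACv2 `protoss_5_vs_5` command for several seeds back to back, a small wrapper like the following may help. It is a sketch, not part of the repository: the function name `launch_seeds`, the `DRY_RUN` switch, and the per-seed `--name` values are our own additions, while every other flag is copied from the single-seed command.

```bash
# Hypothetical helper (not in the repo): runs the protoss_5_vs_5 command once per seed.
# Set DRY_RUN=1 to print the commands instead of executing them.
launch_seeds() {
  local seed
  for seed in "$@"; do
    # Same flags as the single-seed command; only --name and --seed change per run
    local cmd=(python main.py
      --name "train_seed${seed}"
      --trainer dreamer
      --env smacv2
      --env_args.map_name protoss_5_vs_5
      --env_args.use_absorbing_state True
      --env_args.trailing_absorbing_state_length 2
      --train.num_env_steps 405000
      --train.imagination_steps 4
      --use_eval True
      --replay.capacity 250000
      --critic.use_critic_transformer True
      --train.share_critics True
      --train.share_actors True
      --seed "$seed")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "CUDA_VISIBLE_DEVICES=0 ${cmd[*]}"
    else
      # Runs are sequential on GPU 0; stop at the first failing seed
      CUDA_VISIBLE_DEVICES=0 "${cmd[@]}" || return 1
    fi
  done
}
```

For example, `DRY_RUN=1 launch_seeds 0 1 2` prints the three commands, and `launch_seeds 0 1 2` runs them one after another. Running seeds sequentially keeps GPU memory use predictable; to run them in parallel, you would instead launch each command in the background on a different `CUDA_VISIBLE_DEVICES` value.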
Configuration files are located in the configs/ directory. You can modify these files directly or override settings using command-line arguments.
By default, experiments are tracked using Weights & Biases (wandb). To set up your wandb account, run the following command in your terminal and follow the prompts:
```bash
wandb login
```

If you are running on a machine without wandb access, disable wandb logging by adding this argument to the training command:

```bash
--logging.use_wandb False
```

Logs, saved configs, and checkpoints are written to `logs/{env}/{task}/{name}/{timestamp}`.
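To locate the most recent run for a given configuration, a helper along these lines may be convenient. It is a sketch under two assumptions: that for the SMACv2 command above `{env}`, `{task}`, and `{name}` resolve to `smacv2`, `protoss_5_vs_5`, and `train`, and that each run gets its own timestamp directory. The function name `latest_run` is hypothetical. It sorts by modification time (`ls -t`), so it does not depend on the exact timestamp format, and it returns the most recently modified run directory, which is normally the newest run.

```bash
# Hypothetical helper: print the newest run directory under logs/{env}/{task}/{name}
latest_run() {
  local env="$1" task="$2" name="$3"
  local base="logs/${env}/${task}/${name}"
  local latest
  # -t: newest first by modification time; -d: list the directories themselves, not their contents
  latest=$(ls -1td "${base}"/*/ 2>/dev/null | head -n 1)
  if [ -z "$latest" ]; then
    echo "no runs found under ${base}" >&2
    return 1
  fi
  # Strip the trailing slash added by the */ glob
  echo "${latest%/}"
}
```

For example, `ls "$(latest_run smacv2 protoss_5_vs_5 train)"` lists the saved config and checkpoints of the latest `protoss_5_vs_5` run.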
If you use DMAWM in your research, please cite:
```bibtex
@inproceedings{xue2026learning,
  title     = {Learning Disentangled Multi-Agent World Model for Decentralized Control},
  author    = {Di Xue and Jing Jiang and Shaowei Zhang and Wenhao Guo and Lei Yuan and Zongzhang Zhang and Yang Yu},
  booktitle = {International Conference on Machine Learning},
  year      = {2026}
}
```

This work has benefited from the following open-source projects:
This project is licensed under the MIT License. See LICENSE for details.
