Learning Disentangled Multi-Agent World Model for Decentralized Control (DMAWM)

This is the official PyTorch implementation of DMAWM (Learning Disentangled Multi-Agent World Model for Decentralized Control), accepted at ICML 2026. The repository is intended to help others reproduce the results in the paper.

Overview

World models enable agents to learn policies through latent imagination, but applying them to multi-agent reinforcement learning introduces a key difficulty: modeling joint dynamics in latent space can create spurious correlations between agents' latent states. These correlations are unavailable during fully decentralized execution, where each agent must act only from its own local information.

DMAWM addresses this challenge under the centralized training and decentralized execution (CTDE) paradigm. It learns decentralized policies in latent space with independent agent modules that maintain factorized latent states from local observations, together with a shared environment module that models multi-agent interactions during imagination.

[Figure: illustration of the agent and environment models]

During real-environment interaction, each agent module updates its internal state independently and selects actions from its own latent state. During imagination, the shared environment module replaces the real environment: it uses a transformer-based interaction predictor to generate joint stochastic latent states from the agents' deterministic states, and its joint prior is aligned with the factorized posterior produced by the agent modules. This alignment explicitly disentangles individual agent states while still capturing dependencies between agents.
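
The alignment between the joint prior and the factorized posterior can be pictured with a toy calculation. The following is a minimal sketch, not the repository's implementation: it assumes categorical latents, a KL term computed per agent and summed, and the KL(posterior || prior) direction, none of which is stated in this README. All function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_kl(q, p, eps=1e-8):
    """KL(q || p) between two categorical distributions given as probability lists."""
    return sum(qi * (math.log(qi + eps) - math.log(pi + eps)) for qi, pi in zip(q, p))


def alignment_loss(posterior_logits, prior_logits):
    """Sum over agents of KL(posterior_i || prior_i).

    posterior_logits[i]: agent i's posterior logits (assumed to come from local information only).
    prior_logits[i]: the environment module's prior logits for agent i
                     (assumed to be predicted from all agents' deterministic states).
    """
    if len(posterior_logits) != len(prior_logits):
        raise ValueError("posterior and prior must cover the same agents")
    return sum(
        categorical_kl(softmax(q), softmax(p))
        for q, p in zip(posterior_logits, prior_logits)
    )


# Two agents, three latent classes each (made-up numbers for illustration).
posterior = [[2.0, 0.0, -1.0], [0.5, 0.5, 0.0]]
prior_matched = [[2.0, 0.0, -1.0], [0.5, 0.5, 0.0]]
prior_off = [[0.0, 0.0, 0.0], [1.0, -1.0, 0.0]]

loss_matched = alignment_loss(posterior, prior_matched)  # prior agrees with posterior
loss_off = alignment_loss(posterior, prior_off)          # prior disagrees with posterior
```

In this toy setting the loss vanishes when the prior reproduces each agent's posterior and grows as the two diverge, which is the sense in which the prior is pulled toward the factorized posterior.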

As a result, imagined rollouts respect the information constraints of decentralized execution instead of relying on latent-state communication or observation-space rollouts. Policies and value functions are trained on these imagined trajectories, and the learned policies can be deployed in a fully decentralized manner.

Usage

Installation

First, create and activate a Conda virtual environment with Python 3.11:

conda create -n dmawm python=3.11
conda activate dmawm

For StarCraft Multi-Agent Challenge (SMAC) environments, follow the installation instructions in the pymarl repository for StarCraft II version 4.10 (the SC2.4.10 Linux package). Ensure that the SC2PATH environment variable points to your StarCraft II installation directory (e.g., ~/pymarl/3rdparty/StarCraftII).

For SMACv2, download the additional maps from the SMACv2 releases page and place them in the $SC2PATH/Maps/SMAC_Maps directory.
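
Before installing the Python dependencies, it can help to confirm that the StarCraft II setup matches the layout described above. The helper below is a hypothetical sketch, not part of this repository: it only checks that SC2PATH is set, points to a directory, and contains Maps/SMAC_Maps.

```python
import os
from pathlib import Path


def check_sc2_install(env=None):
    """Return a list of human-readable problems; an empty list means the layout looks fine."""
    env = os.environ if env is None else env
    problems = []

    sc2path = env.get("SC2PATH")
    if not sc2path:
        problems.append("SC2PATH is not set")
        return problems

    root = Path(sc2path).expanduser()
    if not root.is_dir():
        problems.append(f"SC2PATH does not point to a directory: {root}")
        return problems

    # SMAC and SMACv2 maps are expected under $SC2PATH/Maps/SMAC_Maps.
    maps_dir = root / "Maps" / "SMAC_Maps"
    if not maps_dir.is_dir():
        problems.append(f"missing map directory: {maps_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_sc2_install():
        print("WARNING:", problem)
```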

Then, install the dependencies:

pip install -r requirements.txt

Running Experiments

To reproduce the results on the SMACv2 protoss_5_vs_5 map, execute the following command:

CUDA_VISIBLE_DEVICES=0 python main.py \
    --name train \
    --trainer dreamer \
    --env smacv2 \
    --env_args.map_name protoss_5_vs_5 \
    --env_args.use_absorbing_state True \
    --env_args.trailing_absorbing_state_length 2 \
    --train.num_env_steps 405000 \
    --train.imagination_steps 4 \
    --use_eval True \
    --replay.capacity 250000 \
    --critic.use_critic_transformer True \
    --train.share_critics True \
    --train.share_actors True \
    --seed 0
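
To repeat this run over several seeds, the same command can be assembled programmatically. This is a minimal sketch that reuses only the flags shown above; `BASE_ARGS` and `build_command` are hypothetical names, and the choice of three seeds is an arbitrary example. It prints the commands rather than launching them.

```python
import shlex

# Flags copied from the protoss_5_vs_5 command above, minus --seed.
BASE_ARGS = [
    "--name", "train",
    "--trainer", "dreamer",
    "--env", "smacv2",
    "--env_args.map_name", "protoss_5_vs_5",
    "--env_args.use_absorbing_state", "True",
    "--env_args.trailing_absorbing_state_length", "2",
    "--train.num_env_steps", "405000",
    "--train.imagination_steps", "4",
    "--use_eval", "True",
    "--replay.capacity", "250000",
    "--critic.use_critic_transformer", "True",
    "--train.share_critics", "True",
    "--train.share_actors", "True",
]


def build_command(seed, python="python"):
    """Return the argument list for one training run with the given seed."""
    return [python, "main.py", *BASE_ARGS, "--seed", str(seed)]


for seed in range(3):
    # Print a shell-safe command line; set CUDA_VISIBLE_DEVICES separately when launching.
    print(shlex.join(build_command(seed)))
```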

Scripts for reproducing experimental results are available in scripts/dreamer/:

  • SMAC: smac_2c_vs_64zg.sh, smac_2s_vs_1sc.sh, smac_3s5z.sh, smac_corridor.sh, smac_mmm2.sh
  • SMACv2: smacv2_protoss_5_vs_5.sh, smacv2_terran_5_vs_5.sh, smacv2_zerg_5_vs_5.sh
  • Melting Pot: meltingpot_commons_harvest__open.sh, meltingpot_coop_mining.sh

Configuration

Configuration files are located in the configs/ directory. You can modify these files directly or override settings using command-line arguments.
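
The dotted flags in the training command (for example `--train.num_env_steps 405000`) suggest that each override addresses a nested key in the configuration. The sketch below illustrates that idea with a plain dictionary. It is an assumption about the general mechanism, not the repository's actual parser, and `apply_overrides` and `parse_value` are hypothetical names.

```python
import ast


def parse_value(text):
    """Interpret 'True', '2', '0.5' as Python literals; leave anything else as a string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(config, argv):
    """Apply '--a.b.c value' pairs from argv to a nested dict, in place."""
    if len(argv) % 2 != 0:
        raise ValueError("overrides must come in '--key value' pairs")
    for flag, raw in zip(argv[0::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected a '--key' flag, got {flag!r}")
        keys = flag[2:].split(".")
        node = config
        for key in keys[:-1]:
            # Create intermediate sections that do not exist yet.
            node = node.setdefault(key, {})
        node[keys[-1]] = parse_value(raw)
    return config


# Made-up defaults standing in for a file from configs/.
config = {"train": {"num_env_steps": 100000, "share_actors": False}}
apply_overrides(
    config,
    ["--train.num_env_steps", "405000",
     "--env_args.map_name", "protoss_5_vs_5",
     "--train.share_actors", "True"],
)
```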

Logging

By default, experiments are tracked using Weights & Biases (wandb). To set up your wandb account, run the following command in your terminal and follow the prompts:

wandb login

If you are running on a machine without wandb access, disable wandb logging by adding this argument to the training command:

--logging.use_wandb False

Logs, saved configs, and checkpoints are written under logs/{env}/{task}/{name}/{timestamp}.
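
To locate a run's outputs from a script, the path pattern above can be reproduced as follows. This is a sketch under assumptions: the timestamp format and the mapping of {task} to the map name are guesses, so check an actual run directory before relying on them. `run_dir` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path


def run_dir(env, task, name, now=None, root="logs"):
    """Build logs/{env}/{task}/{name}/{timestamp} for a run."""
    now = datetime.now() if now is None else now
    # Assumed timestamp format; the repository may use a different one.
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return Path(root) / env / task / name / stamp


example = run_dir("smacv2", "protoss_5_vs_5", "train", now=datetime(2026, 1, 2, 3, 4, 5))
print(example.as_posix())
```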

Citation

If you use DMAWM in your research, please cite:

@inproceedings{xue2026learning,
  title     = {Learning Disentangled Multi-Agent World Model for Decentralized Control},
  author    = {Di Xue and Jing Jiang and Shaowei Zhang and Wenhao Guo and Lei Yuan and Zongzhang Zhang and Yang Yu},
  booktitle = {International Conference on Machine Learning},
  year      = {2026}
}

Acknowledgements

This work benefits from the following open-source projects:

License

This project is licensed under the MIT License. See LICENSE for details.
