
DreamerV3 from Scratch

A clean, educational implementation of DreamerV3 (Hafner et al., Nature 2025) — the state-of-the-art model-based reinforcement learning algorithm that learns a world model and trains agents entirely "in imagination."

What is DreamerV3?

Traditional RL algorithms (DQN, PPO) learn by trial and error in the real environment. DreamerV3 takes a fundamentally different approach:

  1. Learn a World Model — An RSSM (Recurrent State-Space Model) that predicts how the environment works
  2. Dream — Generate imagined trajectories using the learned model
  3. Learn in Imagination — Train the actor-critic on dreamed experience, not real interaction

This is dramatically more sample-efficient: instead of needing millions of real environment steps, the agent can generate unlimited imagined experience from a compact world model.

Key Innovations Implemented

| Innovation | What it does | Why it matters |
| --- | --- | --- |
| RSSM | GRU (deterministic) + Categorical latent (stochastic) | Captures both temporal dependencies and uncertainty |
| Symlog Transform | `sign(x) * ln(\|x\| + 1)` | Handles reward scales from -100 to +100k |
| Twohot Encoding | Soft distribution over 255 bins | Richer gradient signal than scalar regression |
| Free Bits | `max(1, KL)` per variable | Prevents KL collapse (prior becoming useless) |
| Percentile Normalization | `scale = max(1, P95 - P5)` | Stable returns regardless of reward magnitude |
| Unimix | `0.99 * softmax + 0.01 * uniform` | Prevents categorical probability collapse |
| Straight-Through | `one_hot + probs - sg(probs)` | Gradients through discrete sampling |
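
These transforms are only a few lines each. Below is a minimal sketch of symlog, its inverse symexp, and two-hot encoding, consistent with the table above; the `torch.linspace(-20, 20, 255)` bin range in the usage example follows the paper's defaults and is an assumption about this repo (see `dreamer/utils.py` for the actual code).

```python
import torch

def symlog(x):
    # sign(x) * ln(|x| + 1): compresses large magnitudes, near-identity around zero
    return torch.sign(x) * torch.log(torch.abs(x) + 1)

def symexp(x):
    # Inverse of symlog; decodes network outputs back to the original scale
    return torch.sign(x) * (torch.exp(torch.abs(x)) - 1)

def twohot(x, bins):
    # Encode each scalar as weights on its two nearest bins; weights sum to 1,
    # and the expectation over bin values recovers x exactly
    x = x.clamp(bins[0], bins[-1])
    hi = torch.searchsorted(bins, x).clamp(1, len(bins) - 1)
    lo = hi - 1
    w_hi = (x - bins[lo]) / (bins[hi] - bins[lo])  # fraction given to the upper bin
    out = torch.zeros(*x.shape, len(bins))
    out.scatter_(-1, lo.unsqueeze(-1), (1 - w_hi).unsqueeze(-1))
    out.scatter_(-1, hi.unsqueeze(-1), w_hi.unsqueeze(-1))
    return out

# Example: encode a reward in symlog space over 255 bins
# twohot(symlog(torch.tensor([2.5])), torch.linspace(-20, 20, 255))
```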

Project Structure

```
dreamer-from-scratch/
├── dreamer/
│   ├── utils.py                  # symlog, twohot, percentile normalization
│   ├── networks.py               # Encoder, Decoder, RewardHead, ContinueHead, Actor, Critic
│   ├── rssm.py                   # RSSM world model (GRU + categorical latent)
│   ├── replay_buffer.py          # Sequence replay buffer
│   └── agent.py                  # DreamerAgent (world model + actor-critic)
├── 01_world_model_demo.py        # Train world model only, visualize predictions
├── 02_dreamer_cartpole.py        # Full DreamerV3 on CartPole-v1 (discrete)
├── 03_dreamer_pendulum.py        # Full DreamerV3 on Pendulum-v1 (continuous)
├── 04_visualize.py               # Latent t-SNE, imagined vs real trajectories
├── 05_dreamer_acrobot.py         # DreamerV3 on Acrobot-v1 (sparse reward)
├── 06_dreamer_lunar_lander.py    # DreamerV3 on LunarLander-v3 (complex dynamics)
└── train_all.py                  # Run all experiments
```

Quick Start

```bash
# Install dependencies
pip install torch gymnasium matplotlib numpy scikit-learn

# For LunarLander (optional); quotes keep zsh from expanding the brackets
pip install "gymnasium[box2d]"  # may need: brew install swig

# Run world model demo
python 01_world_model_demo.py

# Train on CartPole (discrete actions)
python 02_dreamer_cartpole.py

# Train on Pendulum (continuous actions)
python 03_dreamer_pendulum.py

# Train on harder environments
python 05_dreamer_acrobot.py
python 06_dreamer_lunar_lander.py

# Generate visualizations
python 04_visualize.py
```

How It Works

1. World Model (RSSM)

The RSSM maintains two types of state:

```
Deterministic:  h_t = GRU(h_{t-1}, [z_{t-1}, a_{t-1}])     # Memory
Stochastic:     z_t ~ Categorical(16 vars × 16 classes)    # Uncertainty

Posterior:      q(z_t | h_t, encoder(obs_t))               # Uses real observation
Prior:          p(z_t | h_t)                               # Prediction only (for dreaming!)
```
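
A minimal sketch of one posterior step, including unimix and the straight-through estimator; `gru` (an `nn.GRUCell`) and `post_mlp` are placeholder names for the modules defined in `rssm.py`, not its exact API:

```python
import torch
import torch.nn.functional as F

def rssm_posterior_step(h, z, a, obs_embed, gru, post_mlp,
                        n_vars=16, n_classes=16):
    # Deterministic path: advance the GRU memory with the last latent and action
    h = gru(torch.cat([z, a], dim=-1), h)

    # Posterior logits from memory + encoded observation
    logits = post_mlp(torch.cat([h, obs_embed], dim=-1))
    logits = logits.view(-1, n_vars, n_classes)

    # Unimix: 99% softmax + 1% uniform keeps every class reachable
    probs = 0.99 * F.softmax(logits, dim=-1) + 0.01 / n_classes

    # Straight-through: one-hot sample on the forward pass,
    # softmax gradient on the backward pass
    idx = torch.multinomial(probs.flatten(0, 1), 1).squeeze(-1)
    sample = F.one_hot(idx, n_classes).view(-1, n_vars, n_classes).float()
    z = sample + probs - probs.detach()
    return h, z.flatten(1)  # (batch, n_vars * n_classes) latent
```

The prior step is identical except that its logits come from `h` alone, which is exactly what lets the model dream without observations.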

2. World Model Training

```
Loss = Decoder (symlog MSE) + Reward (twohot CE) + Continue (BCE) + 0.5 * max(1, KL)
```
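
The KL term with free bits can be sketched as below, following the per-variable clipping described in the innovations table (the paper further splits this KL into dynamics and representation terms with stop-gradients, a refinement omitted from this sketch):

```python
import torch
import torch.distributions as td

def kl_free_bits(post_logits, prior_logits, free=1.0):
    # KL between posterior and prior for each categorical latent variable,
    # clipped below `free` nats so an already-small KL yields no gradient
    post = td.Categorical(logits=post_logits)    # logits: (batch, n_vars, n_classes)
    prior = td.Categorical(logits=prior_logits)
    kl = td.kl_divergence(post, prior)           # (batch, n_vars)
    return kl.clamp(min=free).sum(-1)            # free bits applied per variable
```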

3. Actor-Critic in Imagination

```python
# Start from real states (posterior), then roll forward entirely in latent space
for t in range(horizon):
    action = actor(latent)                        # policy chooses an action
    latent = world_model.imagine(latent, action)  # dream the next state (no real env!)
    rewards.append(reward_head(latent))

# Compute lambda-returns and update the actor to maximize them.
# Gradient flows: actor → action → world model → reward → returns
```
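
A minimal sketch of the λ-return recursion and the percentile normalization from the innovations table; λ = 0.95 and the tensor shapes in the comments follow the paper's defaults and are assumptions about this repo:

```python
import torch

def lambda_returns(rewards, values, continues, lam=0.95):
    # Bootstrapped lambda-return, computed backwards over the imagined horizon.
    # rewards/continues: (horizon, batch); values: (horizon + 1, batch).
    # `continues` already folds in the discount: gamma * continue_prob.
    ret = values[-1]
    out = []
    for t in reversed(range(len(rewards))):
        ret = rewards[t] + continues[t] * ((1 - lam) * values[t + 1] + lam * ret)
        out.append(ret)
    return torch.stack(out[::-1])

def normalize_returns(returns):
    # Percentile normalization: scale by the 5th-95th percentile range,
    # but never divide by less than 1 so small returns are not amplified
    lo, hi = torch.quantile(returns, 0.05), torch.quantile(returns, 0.95)
    return returns / (hi - lo).clamp(min=1.0)
```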

Environments

| Environment | Type | Difficulty | Key Challenge |
| --- | --- | --- | --- |
| CartPole-v1 | Discrete | Easy | Balance a pole (constant +1 reward) |
| Pendulum-v1 | Continuous | Medium | Swing up + balance (dense negative reward) |
| Acrobot-v1 | Discrete | Hard | Swing up a double pendulum (sparse -1 reward) |
| LunarLander-v3 | Discrete | Hard | Land a spacecraft (complex multi-phase reward) |

Blog Post

For a detailed walkthrough of this implementation, check out: DreamerV3 World Model RL: PyTorch Implementation

References

  - Hafner, D., Pasukonis, J., Ba, J., & Lillicrap, T. (2025). Mastering diverse control tasks through world models. Nature.

License

MIT
