This repository contains an implementation of Proximal Policy Optimization (PPO) based on the original PPO paper for continuous control tasks in MuJoCo environments. The implementation has been tested on HalfCheetah, Swimmer, Hopper, and Walker2d and compared with A2C and Vanilla Policy Gradient (VPG).
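The key difference between PPO and the A2C/VPG baselines is PPO's clipped surrogate objective. As a rough illustration (a minimal NumPy sketch of the objective from the PPO paper, not this repo's actual training code; the function name and signature are mine):

```python
import numpy as np

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped surrogate loss from the PPO paper (illustrative sketch).

    Takes per-sample log-probs under the new and old policies plus
    advantage estimates, and returns the negated clipped objective
    (negated so it can be minimized by gradient descent).
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space
    ratio = np.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    # Clipping the ratio removes the incentive to move the policy
    # far from the old one in a single update
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.minimum(unclipped, clipped).mean()
```

Taking the elementwise minimum of the unclipped and clipped terms makes the bound pessimistic, which is what keeps updates conservative.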
- Install UV (if you don't have it already):

  ```sh
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Install dependencies:

  ```sh
  uv sync
  ```

- If you want to use W&B to track training progress: generate a W&B API key, create a `.env` file, and add your `WANDB_API_KEY`:

  ```
  WANDB_API_KEY=<YOUR_API_KEY>
  ```
Train an agent:

```sh
uv run python train.py --config config/<MUJOCO_ENV_NAME>/<a2c|ppo|vpg>.yaml
```

To train without W&B logging:

```sh
uv run python train.py --config config/<MUJOCO_ENV_NAME>/<a2c|ppo|vpg>.yaml --disable-wandb
```

To run a hyperparameter sweep with W&B:

```sh
wandb sweep config/<MUJOCO_ENV_NAME>/<a2c|ppo|vpg>.yaml
wandb agent <AGENT_NAME>
```

To simulate a trained agent and record videos:

```sh
uv run python simulate.py --config config/<MUJOCO_ENV_NAME>/<a2c|ppo|vpg>.yaml --video-dir videos --episodes <NUM_EPISODES>
```

I log returns smoothed over the last 100 episodes during training. Below are the learning curves of PPO, A2C, and Vanilla PG, averaged over 3 random seeds.
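The smoothing over the last 100 episodes amounts to a moving average with a fixed window. A minimal sketch (an illustrative helper, not the repo's actual logging code):

```python
from collections import deque

class ReturnTracker:
    """Running average of episode returns over the last `window` episodes.

    Illustrative helper: a bounded deque drops the oldest return
    automatically once the window is full.
    """

    def __init__(self, window=100):
        self.returns = deque(maxlen=window)

    def add(self, episode_return):
        # Appending to a full deque evicts the oldest entry
        self.returns.append(episode_return)

    def smoothed(self):
        # Average over however many episodes have been seen so far
        return sum(self.returns) / len(self.returns) if self.returns else 0.0
```

Using `deque(maxlen=...)` keeps the tracker O(1) per episode with no manual index bookkeeping.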
Legend: 🟩 PPO | 🟦 A2C | 🟧 Vanilla PG
Comparison of PPO, A2C, and Vanilla PG on different MuJoCo environments: average return over the last 100 episodes, trained for 1 million timesteps.







