
Commit 1b13827

Overcooked port
1 parent 6dfb5f7 commit 1b13827

111 files changed

Lines changed: 2064 additions & 0 deletions


config/overcooked.ini

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
```ini
[base]
env_name = overcooked

[vec]
total_agents = 8192

[env]
num_agents = 2
layout = 0
grid_size = 100
reward_dish_served_whole_team = 1.0
reward_dish_served_agent = 0.0
reward_pot_started = 0.15
reward_ingredient_added = 0.15
reward_ingredient_picked = 0.05
reward_plate_picked = 0.05
reward_soup_plated = 0.20
reward_wrong_dish_served = 0.0
reward_step_penalty = 0.0

[train]
total_timesteps = 100_000_000
learning_rate = 0.01
minibatch_size = 32768
gamma = 0.99
ent_coef = 0.02
gae_lambda = 0.97
clip_coef = 0.15
anneal_lr = 1
```
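The INI values above can be loaded with Python's standard `configparser`. The sketch below makes no claim about how PufferLib itself parses `config/overcooked.ini` (that code is not part of this diff). `INI_TEXT`, `parse_value`, and `load_config` are hypothetical names, and only a subset of the keys is inlined so the example stays self-contained.

```python
import configparser

# Hypothetical excerpt of config/overcooked.ini, inlined for a self-contained example.
INI_TEXT = """
[base]
env_name = overcooked

[env]
num_agents = 2
layout = 0
reward_dish_served_whole_team = 1.0
reward_pot_started = 0.15

[train]
total_timesteps = 100_000_000
learning_rate = 0.01
anneal_lr = 1
"""


def parse_value(raw: str):
    """Hypothetical converter: try int, then float, otherwise keep the string."""
    for cast in (int, float):
        try:
            return cast(raw)  # int() and float() accept '_' digit separators
        except ValueError:
            pass
    return raw


def load_config(text: str) -> dict:
    """Return {section: {key: typed value}} for every section in the INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        section: {key: parse_value(val) for key, val in parser[section].items()}
        for section in parser.sections()
    }


config = load_config(INI_TEXT)
print(config["train"]["total_timesteps"])  # 100000000
print(config["env"]["reward_pot_started"])  # 0.15
```

Python's `int()` accepts underscore digit separators, which is why `100_000_000` converts without extra handling.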

ocean/overcooked/README.md

Lines changed: 247 additions & 0 deletions
@@ -0,0 +1,247 @@

# Overcooked Environment

A multi-agent cooking coordination environment where agents cooperate to prepare and serve onion soup. Based on the popular Overcooked video game, this environment tests agents' ability to coordinate, divide labor, and work together efficiently.

## File Structure

```
overcooked/
├── overcooked.h          # Main entry point (init, reset, step, close)
├── overcooked_types.h    # Constants, enums, and struct definitions
├── overcooked_items.h    # Item and cooking pot management
├── overcooked_obs.h      # Observation computation
├── overcooked_logic.h    # Game logic (interaction, movement, cooking)
├── overcooked_render.h   # Rendering and texture management
├── binding.c             # Python bindings
└── overcooked.py         # Python environment wrapper
```


## Observation Space

**39-dimensional vector per agent** (*see [compute_observations](overcooked_obs.h#L81)*)

### Player Features (34 dims)
- **Orientation** (4): One-hot encoding of facing direction — [overcooked_obs.h:101-103](overcooked_obs.h#L101-L103)
- **Held Object** (4): One-hot encoding (onion, plated_soup, plate, empty) — [overcooked_obs.h:105-116](overcooked_obs.h#L105-L116)
- **Proximity Features** (12): Normalized (dx, dy) to the nearest instance of each of the following — [overcooked_obs.h:118-167](overcooked_obs.h#L118-L167):
  - Onion source (ingredient box)
  - Dish source (plate box)
  - Plated soup on counter
  - Serving area
  - Empty counter
  - Pot (stove)
- **Nearest Soup Ingredients** (2): Onion/tomato counts in the nearest plated soup or held soup (normalized) — [overcooked_obs.h:169-179](overcooked_obs.h#L169-L179)
- **Pot Soup Ingredients** (2): Onion/tomato counts in the nearest pot (normalized) — [overcooked_obs.h:181-202](overcooked_obs.h#L181-L202)
- **Pot Existence** (1): Binary flag for a reachable pot — [overcooked_obs.h:205](overcooked_obs.h#L205)
- **Pot State** (4): Binary flags (empty, full, cooking, ready) — [overcooked_obs.h:207-215](overcooked_obs.h#L207-L215)
- **Cooking Time** (1): Remaining cook time (normalized) — [overcooked_obs.h:217-223](overcooked_obs.h#L217-L223)
- **Wall Detection** (4): Binary flags for walls/obstacles (up, down, left, right) — [overcooked_obs.h:225-235](overcooked_obs.h#L225-L235)
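The proximity features can be illustrated with a small grid search. This is a sketch under stated assumptions, not the logic in `overcooked_obs.h`: the layout is a list of one-letter rows with `.` for floor, "nearest" means smallest Manhattan distance, dx is normalized by grid width and dy by grid height, and a missing tile yields `(0.0, 0.0)`. `nearest_offset` is a hypothetical helper.

```python
from typing import Optional, Sequence, Tuple

# cramped_room transcribed from the layout diagram; '.' marks floor cells.
CRAMPED_ROOM = ["WCPCW", "I...I", "C...C", "C...C", "WDCSW"]


def nearest_offset(grid: Sequence[str], pos: Tuple[int, int], tile: str) -> Tuple[float, float]:
    """Normalized (dx, dy) from pos=(x, y) to the closest cell holding `tile`.

    Sketch assumptions: closest = smallest Manhattan distance (ties go to the
    first cell in row-major order), dx / width and dy / height, and
    (0.0, 0.0) when the tile does not exist in the grid.
    """
    height, width = len(grid), len(grid[0])
    x0, y0 = pos
    best: Optional[Tuple[int, int]] = None
    best_dist = -1
    for y, row in enumerate(grid):
        for x, cell in enumerate(row):
            if cell != tile:
                continue
            dist = abs(x - x0) + abs(y - y0)
            if best is None or dist < best_dist:
                best, best_dist = (x - x0, y - y0), dist
    if best is None:
        return (0.0, 0.0)
    return (best[0] / width, best[1] / height)


# Agent at (1, 2) in cramped_room: the pot is one column right and two rows up.
print(nearest_offset(CRAMPED_ROOM, (1, 2), "P"))  # (0.2, -0.4)
```

Repeating this for the six tile types listed above gives 6 × 2 = 12 values, matching the size of the proximity group.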

### Spatial Features (4 dims)
- **Teammate Relative Position** (2): Normalized (dx, dy) to other agent — [overcooked_obs.h:237-248](overcooked_obs.h#L237-L248)
- **Absolute Position** (2): Normalized (x, y) coordinates — [overcooked_obs.h:250-252](overcooked_obs.h#L250-L252)

### Context (1 dim)
- **Reward** (1): Current step reward — [overcooked_obs.h:255](overcooked_obs.h#L255)
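Summing the group sizes above gives the 39 dimensions. The sketch below turns the documented groups into slices of a flat vector, assuming they are packed in the order listed. `binding.c` in this commit sets `OBS_SIZE` to 43, so treat these offsets as illustrative rather than as the exact memory layout; `FEATURES` and `feature_slices` are hypothetical names.

```python
# (name, size) pairs in the order documented above; the packing order is an assumption.
FEATURES = [
    ("orientation", 4),
    ("held_object", 4),
    ("proximity", 12),
    ("nearest_soup_ingredients", 2),
    ("pot_soup_ingredients", 2),
    ("pot_exists", 1),
    ("pot_state", 4),
    ("cooking_time", 1),
    ("wall_detection", 4),
    ("teammate_relative_pos", 2),
    ("absolute_pos", 2),
    ("reward", 1),
]


def feature_slices(features):
    """Map each feature name to a slice of the flat per-agent observation."""
    slices, offset = {}, 0
    for name, size in features:
        slices[name] = slice(offset, offset + size)
        offset += size
    return slices, offset


SLICES, OBS_DIM = feature_slices(FEATURES)
print(OBS_DIM)              # 39
print(SLICES["pot_state"])  # slice(25, 29, None)
```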

## Action Space

**6 discrete actions** (*see [c_step](overcooked.h#L77)*)
- 0: No-op — [ACTION_NOOP](overcooked_types.h#L43)
- 1: Move up — [ACTION_UP](overcooked_types.h#L44)
- 2: Move down — [ACTION_DOWN](overcooked_types.h#L45)
- 3: Move left — [ACTION_LEFT](overcooked_types.h#L46)
- 4: Move right — [ACTION_RIGHT](overcooked_types.h#L47)
- 5: Interact (pick up/place items, use equipment) — [ACTION_INTERACT](overcooked_types.h#L48)
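A minimal sketch of how the six discrete actions could map onto grid moves, assuming x is the column, y is the row (growing downward, as in the layout diagrams), and agents may step only onto floor cells. Orientation changes and agent collisions are not modeled; `Action`, `DELTAS`, and `try_move` are hypothetical names.

```python
from enum import IntEnum


class Action(IntEnum):
    """Discrete actions; values mirror the ACTION_* constants listed above."""
    NOOP = 0
    UP = 1
    DOWN = 2
    LEFT = 3
    RIGHT = 4
    INTERACT = 5


# Assumed convention: x is the column, y is the row, and y grows downward.
DELTAS = {
    Action.UP: (0, -1),
    Action.DOWN: (0, 1),
    Action.LEFT: (-1, 0),
    Action.RIGHT: (1, 0),
}

CRAMPED_ROOM = ["WCPCW", "I...I", "C...C", "C...C", "WDCSW"]  # '.' = floor


def try_move(grid, pos, action):
    """Return the new (x, y). Moves succeed only onto floor cells ('.');
    NOOP and INTERACT leave the position unchanged."""
    dx, dy = DELTAS.get(Action(action), (0, 0))
    x, y = pos[0] + dx, pos[1] + dy
    if 0 <= y < len(grid) and 0 <= x < len(grid[0]) and grid[y][x] == ".":
        return (x, y)
    return pos


print(try_move(CRAMPED_ROOM, (1, 2), Action.UP))    # (1, 1)
print(try_move(CRAMPED_ROOM, (1, 2), Action.LEFT))  # (1, 2): a counter blocks the move
```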

## Reward System

*See [evaluate_dish_served](overcooked_logic.h#L229) and [handle_interaction](overcooked_logic.h#L106)*

### Main Rewards
- **Correct dish served** (3 onions): +1.0 (shared), +0.0 (server bonus) — [overcooked_logic.h:237-241](overcooked_logic.h#L237-L241)
- **Wrong dish served** (incorrect recipe): +0.0 (shared) — [overcooked_logic.h:252-258](overcooked_logic.h#L252-L258)
- **Step penalty**: 0.0 — [overcooked.h:80](overcooked.h#L80)

### Intermediate Rewards
- **Pick up ingredient**: +0.05 — [overcooked_logic.h:221](overcooked_logic.h#L221)
- **Pick up plate**: +0.05 (`reward_plate_picked` in `config/overcooked.ini`)
- **Add onion to pot**: +0.15 — [overcooked_logic.h:133](overcooked_logic.h#L133)
- **Start cooking** (3 onions in pot): +0.15 — [overcooked_logic.h:145-147](overcooked_logic.h#L145-L147)
- **Plate cooked soup**: +0.20 — [overcooked_logic.h:159](overcooked_logic.h#L159)
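To see how the shaped rewards add up, the sketch below tallies rewards for a list of events using the default coefficients from `config/overcooked.ini`. The event names and the `step_rewards` helper are hypothetical, and subtracting `step_penalty` once per call is an assumption; `overcooked_logic.h` remains the authority on when each reward fires.

```python
# Default coefficients from the [env] section of config/overcooked.ini.
REWARDS = {
    "dish_served_whole_team": 1.0,
    "dish_served_agent": 0.0,
    "pot_started": 0.15,
    "ingredient_added": 0.15,
    "ingredient_picked": 0.05,
    "plate_picked": 0.05,
    "soup_plated": 0.20,
    "wrong_dish_served": 0.0,
    "step_penalty": 0.0,
}


def step_rewards(events, num_agents=2, cfg=REWARDS):
    """Hypothetical reward tally for one environment.

    `events` is a list of (agent_id, event) pairs. Serving a dish pays every
    agent; any other event pays only the agent that triggered it. Applying
    `step_penalty` once per call is an assumption of this sketch.
    """
    rewards = [-cfg["step_penalty"]] * num_agents
    for agent, event in events:
        if event == "correct_dish_served":
            rewards = [r + cfg["dish_served_whole_team"] for r in rewards]
            rewards[agent] += cfg["dish_served_agent"]
        elif event == "wrong_dish_served":
            rewards = [r + cfg["wrong_dish_served"] for r in rewards]
        else:
            rewards[agent] += cfg[event]  # e.g. "ingredient_picked", "soup_plated"
    return rewards


# Agent 0 cooks and serves one 3-onion soup alone.
solo = [(0, "ingredient_picked"), (0, "ingredient_added")] * 3 + [
    (0, "pot_started"), (0, "plate_picked"), (0, "soup_plated"), (0, "correct_dish_served"),
]
print([round(r, 6) for r in step_rewards(solo)])  # [2.0, 1.0]
```

With these defaults, the agent doing all the work collects 2.0 for one soup while its teammate receives only the shared 1.0, so intermediate rewards make up half of the solo agent's return in this example.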

## Recipe

The correct recipe requires **exactly 3 onions** in the soup. Agents must:
1. Pick up onions from ingredient boxes
2. Add 3 onions to a pot
3. Start cooking (interact with the pot while empty-handed)
4. Wait for the soup to cook (20 steps)
5. Pick up a plate from a plate box
6. Plate the cooked soup (interact with the pot while holding a plate)
7. Deliver the plated soup to a serving area
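The recipe steps imply a small per-pot state machine: fill, start, cook for `COOKING_TIME` steps, then plate. A minimal sketch, assuming an empty-handed interact may start cooking any non-empty pot (which is one way a wrong dish could arise) and that correctness is checked only when the soup is served; the `Pot` class and `is_correct` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

COOKING_TIME = 20     # steps (see Game Constants)
MAX_INGREDIENTS = 3   # ingredients per pot (see Game Constants)


@dataclass
class Pot:
    ingredients: List[str] = field(default_factory=list)
    cook_timer: int = 0      # steps left while cooking
    cooking: bool = False
    ready: bool = False

    def add(self, item: str) -> bool:
        """Interact while holding an ingredient: add it unless the pot is full or busy."""
        if self.cooking or self.ready or len(self.ingredients) >= MAX_INGREDIENTS:
            return False
        self.ingredients.append(item)
        return True

    def start(self) -> bool:
        """Empty-handed interact: start cooking whatever is in the pot."""
        if self.cooking or self.ready or not self.ingredients:
            return False
        self.cooking, self.cook_timer = True, COOKING_TIME
        return True

    def tick(self) -> None:
        """Advance one environment step; the soup is ready after COOKING_TIME ticks."""
        if self.cooking:
            self.cook_timer -= 1
            if self.cook_timer == 0:
                self.cooking, self.ready = False, True

    def plate(self) -> Optional[List[str]]:
        """Interact while holding a plate: take the finished soup and empty the pot."""
        if not self.ready:
            return None
        soup, self.ingredients = self.ingredients, []
        self.ready = False
        return soup


def is_correct(soup: Optional[List[str]]) -> bool:
    """The recipe is exactly three onions."""
    return soup is not None and soup == ["onion"] * 3


pot = Pot()
for _ in range(3):
    pot.add("onion")
pot.start()
for _ in range(COOKING_TIME):
    pot.tick()
print(is_correct(pot.plate()))  # True
```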

## Configuration

*See [Overcooked class](overcooked.py#L14)*

```python
from pufferlib.ocean.overcooked.overcooked import Overcooked

env = Overcooked(
    num_envs=1,             # Number of parallel environments
    layout="cramped_room",  # Layout name (see Available Layouts)
    num_agents=2,           # Agents per environment
    render_mode=None,       # Set to enable rendering
    log_interval=128,       # Steps between log aggregation
    grid_size=32,           # Render tile size in pixels

    # Reward configuration (defaults from config/overcooked.ini)
    reward_dish_served_whole_team=1.0,  # Shared reward for correct dish
    reward_dish_served_agent=0.0,       # Bonus for serving agent
    reward_pot_started=0.15,            # Starting correct recipe
    reward_ingredient_added=0.15,       # Adding onion to pot
    reward_ingredient_picked=0.05,      # Picking up ingredient
    reward_soup_plated=0.20,            # Plating cooked soup
    reward_wrong_dish_served=0.0,       # Serving incorrect dish
    reward_step_penalty=0.0,            # Per-step penalty
)
```

## Game Constants

- **Cooking time**: 20 steps — [COOKING_TIME](overcooked_types.h#L39)
- **Max ingredients per pot**: 3 — [MAX_INGREDIENTS](overcooked_types.h#L40)
- **Max episode steps**: 400 (default)
- **Max dynamic items**: 20 — [overcooked.h:19](overcooked.h#L19)

## Available Layouts

*See [LAYOUTS](overcooked_types.h#L244-L259)*

Spawn positions are given as (x, y) cells, zero-indexed from the top-left corner of each diagram.

### cramped_room (5x5)

```
+---+---+---+---+---+
| W | C | P | C | W |   W = Wall
+---+---+---+---+---+   C = Counter
| I |   |   |   | I |   P = Pot (Stove)
+---+---+---+---+---+   I = Ingredient Box (Onions)
| C |   |   |   | C |   D = Dish/Plate Box
+---+---+---+---+---+   S = Serving Area
| C |   |   |   | C |
+---+---+---+---+---+
| W | D | C | S | W |
+---+---+---+---+---+
```
Spawns: (1,2) and (3,2)
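The boxed diagrams can be turned back into compact tile rows, which makes it easy to check spawn points. A sketch that parses the cramped_room diagram exactly as printed above (legend text included) and confirms that both spawns, read as (x, y), are floor cells. `parse_diagram` is a hypothetical helper; it is not the format used by `LAYOUTS` in `overcooked_types.h`.

```python
from typing import List

DIAGRAM = """
+---+---+---+---+---+
| W | C | P | C | W |   W = Wall
+---+---+---+---+---+   C = Counter
| I |   |   |   | I |   P = Pot (Stove)
+---+---+---+---+---+   I = Ingredient Box (Onions)
| C |   |   |   | C |   D = Dish/Plate Box
+---+---+---+---+---+   S = Serving Area
| C |   |   |   | C |
+---+---+---+---+---+
| W | D | C | S | W |
+---+---+---+---+---+
"""


def parse_diagram(text: str) -> List[str]:
    """Convert a boxed layout diagram into rows of one-letter tiles ('.' = floor)."""
    rows = []
    for line in text.strip().splitlines():
        if not line.startswith("|"):
            continue  # skip +---+ border lines
        line = line[: line.rindex("|") + 1]  # drop legend text after the grid
        cells = line.strip("|").split("|")
        rows.append("".join(cell.strip() or "." for cell in cells))
    return rows


grid = parse_diagram(DIAGRAM)
spawns = [(1, 2), (3, 2)]  # (x, y)
print(grid)  # ['WCPCW', 'I...I', 'C...C', 'C...C', 'WDCSW']
print(all(grid[y][x] == "." for x, y in spawns))  # True
```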

### asymmetric_advantages (9x5)

```
+---+---+---+---+---+---+---+---+---+
| W | C | W | W | W | W | W | C | W |
+---+---+---+---+---+---+---+---+---+
| I |   | C | S | W | I | C |   | S |
+---+---+---+---+---+---+---+---+---+
| C |   |   |   | P |   |   |   | C |
+---+---+---+---+---+---+---+---+---+
| C |   |   |   | P |   |   |   | C |
+---+---+---+---+---+---+---+---+---+
| W | C | C | D | W | D | C | C | W |
+---+---+---+---+---+---+---+---+---+
```
Spawns: (1,2) and (7,2)

### forced_coordination (5x5)

```
+---+---+---+---+---+
| W | C | W | P | W |   W = Wall
+---+---+---+---+---+   C = Counter
| I |   | C |   | P |   P = Pot (Stove)
+---+---+---+---+---+   I = Ingredient Box (Onions)
| I |   | C |   | C |   D = Dish/Plate Box
+---+---+---+---+---+   S = Serving Area
| D |   | C |   | C |
+---+---+---+---+---+
| W | C | W | S | W |
+---+---+---+---+---+
```
Spawns: (1,2) and (3,2)

A challenging layout in which a column of counters splits the kitchen into two halves with no walkable passage between them. The left half holds the ingredient and plate boxes and the right half holds the pots and the serving area, so agents must pass onions and plates to each other across the center counters.
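A breadth-first search over floor cells confirms the split in forced_coordination: the two spawn points share no reachable cell, and only one side touches the ingredient and plate boxes. This is a sketch; the grid is transcribed from the diagram above with `.` for floor, and `reachable` and `adjacent_tiles` are hypothetical helpers.

```python
from collections import deque

FORCED_COORDINATION = [
    "WCWPW",
    "I.C.P",
    "I.C.C",
    "D.C.C",
    "WCWSW",
]
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def in_bounds(grid, x, y):
    return 0 <= y < len(grid) and 0 <= x < len(grid[0])


def reachable(grid, start):
    """Floor cells reachable from start=(x, y) by 4-neighbour moves (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for dx, dy in STEPS:
            nx, ny = x + dx, y + dy
            if in_bounds(grid, nx, ny) and grid[ny][nx] == "." and (nx, ny) not in seen:
                seen.add((nx, ny))
                queue.append((nx, ny))
    return seen


def adjacent_tiles(grid, cells):
    """Non-floor tile letters next to any of the given cells."""
    tiles = set()
    for x, y in cells:
        for dx, dy in STEPS:
            nx, ny = x + dx, y + dy
            if in_bounds(grid, nx, ny) and grid[ny][nx] != ".":
                tiles.add(grid[ny][nx])
    return tiles


left = reachable(FORCED_COORDINATION, (1, 2))
right = reachable(FORCED_COORDINATION, (3, 2))
print(left & right)                                        # set()
print(sorted(adjacent_tiles(FORCED_COORDINATION, left)))   # ['C', 'D', 'I']
print(sorted(adjacent_tiles(FORCED_COORDINATION, right)))  # ['C', 'P', 'S']
```

The only tiles both halves share are the center counters (`C`), which is why items have to be handed across them.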

### coordination_ring (5x5)

```
+---+---+---+---+---+
| W | C | C | P | W |   W = Wall
+---+---+---+---+---+   C = Counter
| C |   |   |   | P |   P = Pot (Stove)
+---+---+---+---+---+   I = Ingredient Box (Onions)
| D |   | C |   | C |   D = Dish/Plate Box
+---+---+---+---+---+   S = Serving Area
| I |   |   |   | C |
+---+---+---+---+---+
| W | I | S | C | W |
+---+---+---+---+---+
```
Spawns: (1,2) and (3,2)

Ring-shaped layout with a center counter obstacle. Agents must navigate around the center to coordinate ingredient pickup and soup delivery.

### counter_circuit (8x5)

```
+---+---+---+---+---+---+---+---+
| W | C | C | P | P | C | C | W |
+---+---+---+---+---+---+---+---+
| C |   |   |   |   |   |   | C |
+---+---+---+---+---+---+---+---+
| D |   | C | C | C | C |   | S |
+---+---+---+---+---+---+---+---+
| C |   |   |   |   |   |   | C |
+---+---+---+---+---+---+---+---+
| W | C | C | I | I | C | C | W |
+---+---+---+---+---+---+---+---+
```
Spawns: (1,1) and (6,3)

Circuit-shaped layout with a center counter island. Agents must coordinate around the obstacle to efficiently transport ingredients and serve dishes. Features dual pots and dual ingredient boxes for parallel cooking.

## Logging Metrics

*See [Log struct](overcooked_types.h#L65-L78)*

| Metric | Description |
|--------|-------------|
| perf | Normalized performance (correct dishes served) |
| score | Raw score (correct dishes served) |
| episode_return | Sum of rewards over episode |
| episode_length | Number of steps in episode |
| dishes_served | Total dishes served (correct + wrong) |
| correct_dishes | Number of 3-onion dishes served |
| wrong_dishes | Number of incorrect dishes served |
| ingredients_picked | Total ingredients picked up |
| pots_started | Number of cooking sessions started |
| items_dropped | Number of items placed on counters |
| agent_collisions | Number of agent collision attempts |

## Agent Reset Mechanism

If an agent goes 512 steps without receiving a reward, it is automatically reset to its starting position with no held item. This prevents agents from getting stuck — [c_step](overcooked.h#L114-L133)
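The reset rule amounts to a per-agent counter. A minimal sketch, assuming any nonzero reward counts as "receiving a reward" and that the counter restarts after a reset; `Agent` and `apply_idle_reset` are hypothetical, and the actual check lives in `c_step`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

IDLE_RESET_STEPS = 512  # steps without reward before a reset (from the text above)


@dataclass
class Agent:
    pos: Tuple[int, int]
    spawn: Tuple[int, int]
    held: Optional[str] = None
    steps_since_reward: int = 0


def apply_idle_reset(agent: Agent, reward: float) -> bool:
    """Update the no-reward counter and reset the agent when it hits the limit.

    Returns True if a reset happened. Treating any nonzero reward as
    "receiving a reward" is an assumption of this sketch.
    """
    if reward != 0.0:
        agent.steps_since_reward = 0
        return False
    agent.steps_since_reward += 1
    if agent.steps_since_reward >= IDLE_RESET_STEPS:
        agent.pos, agent.held = agent.spawn, None  # back to spawn, empty-handed
        agent.steps_since_reward = 0
        return True
    return False


agent = Agent(pos=(3, 3), spawn=(1, 2), held="onion")
for _ in range(IDLE_RESET_STEPS):
    apply_idle_reset(agent, 0.0)
print(agent.pos, agent.held)  # (1, 2) None
```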

## Building

```bash
# Build the environment
python setup.py build_overcooked --inplace

# Run standalone test
python pufferlib/ocean/overcooked/overcooked.py

# Run standalone demo with specific layout
./overcooked cramped_room
./overcooked asymmetric_advantages
./overcooked forced_coordination
./overcooked coordination_ring
./overcooked counter_circuit
```

ocean/overcooked/binding.c

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
```c
#include "overcooked.h"

#define OBS_SIZE 43               // Floats in each agent's observation
#define NUM_ATNS 1                // One discrete action head per agent
#define ACT_SIZES {6}             // ...with 6 choices (see Action Space)
#define OBS_TENSOR_T FloatTensor

#define Env Overcooked
#include "vecenv.h"

// Copy constructor arguments from the Python kwargs into the env, then initialize it.
void my_init(Env* env, Dict* kwargs) {
    env->layout_id = (LayoutType)dict_get(kwargs, "layout")->value;
    env->num_agents = (int)dict_get(kwargs, "num_agents")->value;
    env->grid_size = (int)dict_get(kwargs, "grid_size")->value;
    env->observation_size = OBS_SIZE;

    // Reward shaping coefficients ([env] section of config/overcooked.ini)
    env->rewards_config.dish_served_whole_team = dict_get(kwargs, "reward_dish_served_whole_team")->value;
    env->rewards_config.dish_served_agent = dict_get(kwargs, "reward_dish_served_agent")->value;
    env->rewards_config.pot_started = dict_get(kwargs, "reward_pot_started")->value;
    env->rewards_config.ingredient_added = dict_get(kwargs, "reward_ingredient_added")->value;
    env->rewards_config.ingredient_picked = dict_get(kwargs, "reward_ingredient_picked")->value;
    env->rewards_config.plate_picked = dict_get(kwargs, "reward_plate_picked")->value;
    env->rewards_config.soup_plated = dict_get(kwargs, "reward_soup_plated")->value;
    env->rewards_config.wrong_dish_served = dict_get(kwargs, "reward_wrong_dish_served")->value;
    env->rewards_config.step_penalty = dict_get(kwargs, "reward_step_penalty")->value;

    init(env);
}

// Copy the logged episode statistics into the output dict returned to Python.
void my_log(Log* log, Dict* out) {
    dict_set(out, "perf", log->perf);
    dict_set(out, "score", log->score);
    dict_set(out, "episode_return", log->episode_return);
    dict_set(out, "episode_length", log->episode_length);
    dict_set(out, "dishes_served", log->dishes_served);
    dict_set(out, "correct_dishes", log->correct_dishes);
    dict_set(out, "wrong_dishes", log->wrong_dishes);
    dict_set(out, "ingredients_picked", log->ingredients_picked);
    dict_set(out, "pots_started", log->pots_started);
    dict_set(out, "items_dropped", log->items_dropped);
    dict_set(out, "agent_collisions", log->agent_collisions);
}
```
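`my_init` reads every parameter with `dict_get(...)->value` and no presence check, so the Python side has to pass all twelve keys. A minimal Python sketch (hypothetical helper names) for validating a kwargs dict before it reaches the binding. The README's Configuration example does not list `reward_plate_picked`, so unless the Python wrapper supplies a default, that key would be reported as missing.

```python
# Keys read by my_init() in binding.c, in the order it reads them.
REQUIRED_KWARGS = (
    "layout", "num_agents", "grid_size",
    "reward_dish_served_whole_team", "reward_dish_served_agent",
    "reward_pot_started", "reward_ingredient_added",
    "reward_ingredient_picked", "reward_plate_picked",
    "reward_soup_plated", "reward_wrong_dish_served",
    "reward_step_penalty",
)


def missing_kwargs(kwargs: dict) -> list:
    """Return the keys my_init() looks up that are absent from kwargs."""
    return [key for key in REQUIRED_KWARGS if key not in kwargs]


# Hypothetical kwargs assembled from the [env] section of config/overcooked.ini.
env_kwargs = {
    "num_agents": 2, "layout": 0, "grid_size": 100,
    "reward_dish_served_whole_team": 1.0, "reward_dish_served_agent": 0.0,
    "reward_pot_started": 0.15, "reward_ingredient_added": 0.15,
    "reward_ingredient_picked": 0.05, "reward_plate_picked": 0.05,
    "reward_soup_plated": 0.20, "reward_wrong_dish_served": 0.0,
    "reward_step_penalty": 0.0,
}
print(missing_kwargs(env_kwargs))  # []
```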
