Commit 9a5249c

OSRS environments: inferno, zulrah, NH PvP with shared combat engine
three OSRS encounters sharing a common combat, pathfinding, and rendering engine. all game logic in pure C, platform-independent.

inferno: 69-wave PvE challenge. 8-head discrete action space (79 logits), 1017-dim obs. prayer switching, gear switching, barrage AoE, blowpipe spec, pillar safespotting. NPC types: nibbler, bat, blob (prayer reader + splits), meleer (dig mechanic), ranger, mager (resurrects dead NPCs), jad (50/50 prayer flick), zuk (shield mechanic). working RL training on Metal at wave ~24 avg, wave 40+ best runs.

zulrah: solo boss encounter with 4 rotation patterns, 3 forms, venom clouds, snakelings, gear tier system. reward shaping for training.

NH PvP: no-honour PvP with 24 scripted opponents, PFSP support, full combat (gear switches, prayers, eating, specs, movement).

shared engine (osrs/):
  osrs_combat_shared.h: tbow scaling, barrage AoE, blowpipe spec, hit formulas
  osrs_encounter.h: pathfinding (BFS), chase logic, potion effects, spec helpers
  osrs_collision.h: line of sight through blockers, collision maps
  osrs_items.h: full equipment database with all stats
  osrs_pathfinding.h: BFS pathfinding with collision awareness

visual debug binary (make visual in ocean/osrs/): 3D raylib viewer with NPC models, animations, projectiles, spell effects. replay recording + deterministic playback. debug overlay (D key) shows per-NPC attack timers, LOS, prayer state. human play mode (H key). asset export scripts for models, animations, sprites from OSRS cache.

binary assets gitignored: regenerate via scripts/ from an OSRS cache download.
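
The commit message mentions BFS pathfinding with collision awareness. The C source is not shown on this page, so the following is only a minimal Python sketch of that general technique, not the engine's code. The 8-way step set, the corner-cutting rule, and the names `bfs_path` and `blocked` are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical 8-way step set; the real engine's neighbour order is not shown in this commit.
STEPS = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def bfs_path(start, goal, blocked, width, height):
    """Return a shortest list of tiles from start to goal, or None if unreachable."""
    if start == goal:
        return [start]
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            if nxt in blocked or nxt in parent:
                continue
            # Assumed rule: a diagonal step may not cut across a blocked corner.
            if dx and dy and ((x + dx, y) in blocked or (x, y + dy) in blocked):
                continue
            parent[nxt] = (x, y)
            if nxt == goal:
                # Walk the parent links back to the start.
                path = [nxt]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


# A wall in column 2 with a single gap at (2, 4) forces a detour.
wall = {(2, y) for y in range(4)}
route = bfs_path((0, 0), (4, 0), wall, width=5, height=5)
```

Because BFS expands tiles in order of step count, the first time the goal is reached the path is a shortest one under the chosen step set.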
1 parent 4e0c951 commit 9a5249c

94 files changed

Lines changed: 43881 additions & 2403 deletions

config/ocean/osrs_inferno.ini

Lines changed: 186 additions & 0 deletions
@@ -0,0 +1,186 @@
+# Metal osrs_inferno config.
+# long episodes (300-2000+ ticks), 7 action heads (76 logits), 380+76 obs.
+# vf_coef must stay low (< 0.15): fused decoder amplifies value gradients
+# into policy logits via MinGRU scan backward. replay_ratio < 1.0 to avoid
+# stale target drift.
+
+[base]
+env_name = osrs_inferno
+score_metric = episode_return
+
+[env]
+start_wave = 0.0
+mask_in_obs = 1.0
+
+[vec]
+total_agents = 2048
+num_buffers = 4
+num_threads = 4
+
+[policy]
+hidden_size = 256
+num_layers = 2
+
+[train]
+# anchor from sweep trial #33 (score 74.6, wave 17+, prayer 60%)
+total_timesteps = 400000000
+horizon = 32
+min_lr_ratio = 0.003872
+learning_rate = 0.003069
+beta1 = 0.95
+eps = 0.000004
+ent_coef = 0.0017
+gamma = 0.998319
+gae_lambda = 0.8
+vtrace_rho_clip = 2.243133
+vtrace_c_clip = 1.971016
+prio_alpha = 0.0
+prio_beta0 = 0.275787
+clip_coef = 0.611932
+vf_coef = 0.063963
+vf_clip_coef = 0.404894
+max_grad_norm = 0.997781
+replay_ratio = 0.790328
+minibatch_size = 4096
+ns_iters = 5
+weight_decay = 0.089232
+
+[sweep]
+min_sps = 50000
+max_suggestion_cost = 3600
+metric = episode_return
+metric_distribution = linear
+
+[sweep.train.horizon]
+distribution = uniform_pow2
+min = 32
+max = 256
+scale = auto
+
+[sweep.train.learning_rate]
+distribution = log_normal
+min = 0.0003
+max = 0.01
+scale = 0.5
+
+[sweep.train.ent_coef]
+distribution = log_normal
+min = 0.001
+max = 0.03
+scale = auto
+
+[sweep.train.gamma]
+distribution = logit_normal
+min = 0.99
+max = 0.9999
+scale = auto
+
+[sweep.train.min_lr_ratio]
+distribution = uniform
+min = 0.0
+max = 0.3
+scale = auto
+
+[sweep.train.beta1]
+distribution = uniform
+min = 0.8
+max = 0.99
+scale = auto
+
+[sweep.train.eps]
+distribution = log_normal
+min = 1e-6
+max = 1e-4
+scale = auto
+
+[sweep.train.gae_lambda]
+distribution = logit_normal
+min = 0.5
+max = 0.999
+scale = auto
+
+[sweep.train.vtrace_rho_clip]
+distribution = uniform
+min = 1.0
+max = 3.0
+scale = auto
+
+[sweep.train.vtrace_c_clip]
+distribution = uniform
+min = 1.0
+max = 2.5
+scale = auto
+
+[sweep.train.prio_alpha]
+distribution = logit_normal
+min = 0.0
+max = 0.8
+scale = auto
+
+[sweep.train.prio_beta0]
+distribution = logit_normal
+min = 0.01
+max = 0.8
+scale = auto
+
+[sweep.train.clip_coef]
+distribution = uniform
+min = 0.2
+max = 1.0
+scale = auto
+
+[sweep.train.vf_coef]
+distribution = log_normal
+min = 0.005
+max = 0.5
+scale = auto
+
+[sweep.train.vf_clip_coef]
+distribution = uniform
+min = 0.1
+max = 2.0
+scale = auto
+
+[sweep.train.max_grad_norm]
+distribution = uniform
+min = 0.5
+max = 3.0
+scale = auto
+
+[sweep.train.replay_ratio]
+distribution = uniform
+min = 0.1
+max = 1.0
+scale = auto
+
+[sweep.train.weight_decay]
+distribution = log_normal
+min = 0.001
+max = 0.3
+scale = auto
+
+[sweep.train.minibatch_size]
+distribution = uniform_pow2
+min = 2048
+max = 8192
+scale = auto
+
+[sweep.train.num_buffers]
+distribution = uniform_pow2
+min = 1
+max = 4
+scale = auto
+
+[sweep.policy.hidden_size]
+distribution = uniform_pow2
+min = 128
+max = 512
+scale = auto
+
+[sweep.policy.num_layers]
+distribution = uniform
+min = 2
+max = 5.0
+scale = auto
+
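
The header comment of this config states two limits: `vf_coef` below 0.15 and `replay_ratio` below 1.0. One way to guard them when editing the file by hand is a small check with Python's standard `configparser`. This is a sketch under assumptions: the function name `check_train_limits` and the inline `SAMPLE` excerpt are hypothetical and are not part of the commit.

```python
import configparser

# Hypothetical excerpt mirroring two [train] values from osrs_inferno.ini.
SAMPLE = """
[train]
vf_coef = 0.063963
replay_ratio = 0.790328
"""


def check_train_limits(ini_text, vf_coef_max=0.15, replay_ratio_max=1.0):
    """Return a list of messages for [train] values that break the stated limits."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    train = parser["train"]
    problems = []
    vf_coef = train.getfloat("vf_coef")
    if vf_coef >= vf_coef_max:
        problems.append(f"vf_coef={vf_coef} is not below {vf_coef_max}")
    replay_ratio = train.getfloat("replay_ratio")
    if replay_ratio >= replay_ratio_max:
        problems.append(f"replay_ratio={replay_ratio} is not below {replay_ratio_max}")
    return problems


problems = check_train_limits(SAMPLE)
```

The sweep ranges in the same file reach `vf_coef = 0.5` and `replay_ratio = 1.0`, both outside the limits in the header comment, so a check like this may also be worth running on values suggested by a sweep.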

config/ocean/osrs_pvp.ini

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
+[base]
+env_name = osrs_pvp
+score_metric = score
+
+[env]
+start_wave = 0.0
+mask_in_obs = 1.0
+
+[vec]
+total_agents = 2048
+num_buffers = 4
+num_threads = 4
+
+[policy]
+hidden_size = 256
+num_layers = 2
+
+[train]
+total_timesteps = 200000000
+horizon = 32
+learning_rate = 0.003
+gamma = 0.998
+ent_coef = 0.001
+clip_coef = 0.6
+vf_coef = 0.1
+replay_ratio = 0.5
+minibatch_size = 4096
+weight_decay = 0.05
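
As a rough reading of the `[train]` and `[vec]` values above, the discount factor can be turned into an effective horizon of 1 / (1 - gamma) ticks, and the rollout size can be estimated from the agent count and horizon. The second part assumes the trainer collects `total_agents * horizon` steps per rollout, which is the usual on-policy layout but is not confirmed by this commit. The function names are hypothetical.

```python
def effective_horizon(gamma):
    """Approximate number of ticks over which discounted reward still matters."""
    return 1.0 / (1.0 - gamma)


def rollout_shape(total_agents, horizon, minibatch_size):
    """Return (steps per rollout, minibatches per rollout).

    Assumes one rollout is total_agents * horizon steps; the real trainer may differ.
    """
    batch = total_agents * horizon
    return batch, batch // minibatch_size


ticks = effective_horizon(0.998)          # about 500 ticks for gamma = 0.998
batch, minibatches = rollout_shape(2048, 32, 4096)
```

With these values a rollout would hold 65536 steps split into 16 minibatches, and credit would reach roughly 500 ticks back.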

config/ocean/osrs_zulrah.ini

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
+[base]
+env_name = osrs_zulrah
+score_metric = score
+
+[env]
+start_wave = 0.0
+mask_in_obs = 1.0
+
+[vec]
+total_agents = 2048
+num_buffers = 4
+num_threads = 4
+
+[policy]
+hidden_size = 256
+num_layers = 2
+
+[train]
+total_timesteps = 200000000
+horizon = 32
+learning_rate = 0.003
+gamma = 0.998
+ent_coef = 0.001
+clip_coef = 0.6
+vf_coef = 0.1
+replay_ratio = 0.5
+minibatch_size = 4096
+weight_decay = 0.05

ocean/osrs/Makefile

Lines changed: 43 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,43 @@
+# OSRS PvP C Environment Makefile
+#
+# standalone targets (no PufferLib dependency):
+#   make          - headless benchmark binary
+#   make visual   - headed raylib viewer with human input
+#   make debug    - debug build with sanitizers
+#
+# PufferLib training uses setup.py build_osrs instead.
+
+CC = clang
+CFLAGS = -Wall -Wextra -O3 -ffast-math -flto -fPIC -std=c11
+DEBUG_FLAGS = -Wall -Wextra -g -O0 -fPIC -std=c11 -DDEBUG
+LDFLAGS = -lm
+
+TARGET = osrs_pvp
+DEMO_SRC = osrs_pvp.c
+HEADERS = osrs_pvp.h
+
+# Raylib (for visual target). download from https://github.com/raysan5/raylib/releases
+RAYLIB_DIR = raylib-5.5_macos
+UNAME_S := $(shell uname -s)
+ifeq ($(UNAME_S),Darwin)
+    RAYLIB_FLAGS = -I$(RAYLIB_DIR)/include $(RAYLIB_DIR)/lib/libraylib.a \
+        -framework Cocoa -framework OpenGL -framework IOKit -framework CoreVideo
+else
+    RAYLIB_FLAGS = -I$(RAYLIB_DIR)/include -L$(RAYLIB_DIR)/lib -lraylib -lGL -lpthread -ldl -lrt
+endif
+
+.PHONY: all clean debug visual
+
+all: $(TARGET)
+
+$(TARGET): $(DEMO_SRC) $(HEADERS)
+	$(CC) $(CFLAGS) -o $@ $(DEMO_SRC) $(LDFLAGS)
+
+visual: $(DEMO_SRC) $(HEADERS) osrs_pvp_render.h osrs_pvp_gui.h
+	$(CC) $(CFLAGS) -DOSRS_PVP_VISUAL $(RAYLIB_FLAGS) -o $(TARGET)_visual $(DEMO_SRC) $(LDFLAGS)
+
+debug: $(DEMO_SRC) $(HEADERS)
+	$(CC) $(DEBUG_FLAGS) -o $(TARGET)_debug $(DEMO_SRC) $(LDFLAGS)
+
+clean:
+	rm -f $(TARGET) $(TARGET)_debug $(TARGET)_visual *.o

ocean/osrs/README.md

Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,52 @@
+# OSRS PvP Environment
+
+C implementation of Old School RuneScape NH PvP for reinforcement learning.
+~1.1M env steps/sec standalone, ~235K+ training SPS on Metal.
+
+## Build and train
+
+```bash
+python setup.py build_osrs_pvp --inplace --force
+python train_pvp.py --no-wandb --total-timesteps 50000000
+
+# zulrah (separate build, overwrites _C.so)
+python setup.py build_osrs_zulrah --inplace --force
+python train_zulrah.py --no-wandb --total-timesteps 500000000
+```
+
+Two bindings: `binding.c` (metal vecenv.h) and `ocean_binding.c` (PufferLib env_binding.h).
+
+## Data assets
+
+Not in git. Exported from the OSRS game cache:
+
+1. Download a modern cache from https://archive.openrs2.org/ ("flat file" export)
+2. `cd pufferlib/ocean/osrs_pvp && ./scripts/export_all.sh /path/to/cache`
+
+Pure Python, no deps.
+
+## Spaces
+
+**Obs:** 373 = 334 features + 39 action mask, normalized in C.
+
+**Actions:** MultiDiscrete `[9, 13, 6, 2, 5, 2, 2]`: loadout, combat, prayer, food, potion, karambwan, veng.
+
+**Timing:** tick N actions apply at tick N+1 (OSRS-accurate async).
+
+## Opponents
+
+28 scripted policies from trivial (`true_random`) to boss (`nightmare_nh`: onetick + 50% action reading). Curriculum mixes and PFSP supported.
+
+## Encounters
+
+Vtable interface (`osrs_encounter.h`). Current: NH PvP, Zulrah (81 obs, 6 heads, 3 forms, venom, clouds, collision).
+
+## Files
+
+Core env: `osrs_types/items + osrs_pvp_gear/combat/collision/pathfinding/movement/observations/actions/opponents/api.h`
+
+Visual: `osrs_pvp_render/gui/anim/models/terrain/objects/effects/human_input.h`
+
+Encounters: `encounters/encounter_nh_pvp.h`, `encounters/encounter_zulrah.h`
+
+Data: `data/` (gitignored binaries + C model headers), `scripts/` (cache exporters)
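
The README's Spaces section gives a MultiDiscrete action space of `[9, 13, 6, 2, 5, 2, 2]` and a 39-entry action mask inside a 373-dim observation. The head sizes sum to 39, which suggests one mask entry per logit. The sketch below splits such a flat mask per head and samples a legal action. The layout (mask at the tail of the observation, heads in the listed order, nonzero meaning legal) is an assumption, and `split_mask` and `sample_masked` are hypothetical names, not functions from the repository.

```python
import random

HEAD_SIZES = [9, 13, 6, 2, 5, 2, 2]
HEAD_NAMES = ["loadout", "combat", "prayer", "food", "potion", "karambwan", "veng"]
NUM_FEATURES = 334  # from the README: 373 = 334 features + 39 action mask


def split_mask(flat_mask, head_sizes=HEAD_SIZES):
    """Cut a flat mask into one slice per action head (assumed head-by-head layout)."""
    if len(flat_mask) != sum(head_sizes):
        raise ValueError("mask length does not match the action heads")
    slices, offset = [], 0
    for size in head_sizes:
        slices.append(flat_mask[offset:offset + size])
        offset += size
    return slices


def sample_masked(obs, rng):
    """Pick one legal index per head, uniformly; fall back to 0 if a head has none."""
    flat_mask = obs[NUM_FEATURES:]  # assumes the mask sits at the tail of the obs
    action = []
    for head in split_mask(flat_mask):
        legal = [i for i, ok in enumerate(head) if ok]
        action.append(rng.choice(legal) if legal else 0)
    return action


# Example: every choice is legal.
obs = [0.0] * NUM_FEATURES + [1.0] * sum(HEAD_SIZES)
action = sample_masked(obs, random.Random(0))
```

A trained policy would apply the same per-head slices to its logits (for example by setting illegal logits to a large negative value) rather than sampling uniformly.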
