
Commit 04f6e41

feat: refine README structure, expand GOAP sample, add tick timing percentiles
1 parent e9b2aa4 commit 04f6e41

16 files changed

Lines changed: 833 additions & 291 deletions

README.md

Lines changed: 53 additions & 48 deletions
@@ -18,6 +18,41 @@ The architecture evolved in a legacy 3D MMORPG sandbox. It progressed from a mon

> **Scope.** This is a cleaned extraction from a private working repo, published as an architecture reference. Live runtime config, environment assets, and operational glue are intentionally omitted. See [docs/samples/](docs/samples/) for real session output, or [`docs/walkthrough.md`](docs/walkthrough.md) for a step-by-step trace of one tick from perception to motor output.

+ ## Reading the Code
+
+ Start with any rule module in [`brain/rules/`](src/brain/rules/) to see how conditions and score functions are written. Then inspect [`brain/goap/planner.py`](src/brain/goap/planner.py) for A\* search with Monte Carlo robustness gating, [`brain/learning/encounters.py`](src/brain/learning/encounters.py) for Bayesian posteriors and Thompson Sampling, [`brain/world/model.py`](src/brain/world/model.py) for derived world intelligence, and [`brain/runner/loop.py`](src/brain/runner/loop.py) for the 10 Hz execution path.
+
+ For a step-by-step trace of one tick from perception to motor output, see [`docs/walkthrough.md`](docs/walkthrough.md). For architecture details beyond the README, see [`docs/architecture.md`](docs/architecture.md). For design rationale, [`docs/design-decisions.md`](docs/design-decisions.md). For the full evolutionary arc, [`docs/evolution.md`](docs/evolution.md).
+
+ <details>
+ <summary><strong>Project Structure</strong></summary>
+
+ ```
+ src/
+   core/                  Cross-cutting primitives (types, constants, exceptions, features)
+   runtime/               Agent wiring and session lifecycle
+   perception/            Environment state reading (snapshot contract, pointer traversal)
+   brain/                 Decision stack: priority rules, utility scoring, GOAP planner
+   brain/state/           Typed sub-state dataclasses (combat, pet, camp, inventory, ...)
+   brain/runner/          10 Hz tick loop, lifecycle management, level-up handling
+   brain/world/           World model, entity tracking, anomaly detection
+   brain/goap/            GOAP planner, world state, actions, goals, spawn predictor
+   brain/learning/        Encounter history, spatial memory, scorecard, weight gradient
+   brain/scoring/         Target scoring, utility curves, weight learner
+   brain/rules/           Priority rules across 4 modules (survival, combat, maintenance, nav)
+   routines/              State machine behaviors (enter/tick/exit)
+   routines/strategies/   Swappable combat strategy implementations
+   nav/                   JPS/A* pathfinding, DDA line-of-sight, waypoint graphs, zone graph
+   nav/terrain/           1-unit heightmaps from zone geometry, obstacle detection
+   motor/                 Action interface (movement, targeting, casting, stance)
+   eq/                    Environment data parsers (geometry, spells, zone models)
+   simulator/             Offline scenario runner for testing decisions without live environment
+   util/                  Structured logging, event schemas, forensics, invariant checking
+   docs/                  Architecture, design decisions, evolution history, retrospective
+ ```
+
+ </details>
+

---

## Architecture
@@ -89,7 +124,7 @@ flowchart TB

No module imports upward. The dependency graph is a DAG, and each layer is independently understandable.

- The **brain thread** runs at 10 Hz: read state, evaluate the decision stack, tick the active routine, issue motor commands. A single cycle runs in well under 100ms. A secondary thread handles observability output and runtime control signals. Thread safety between them comes from immutable perception snapshots: frozen `GameState` dataclasses produced each tick, never modified after creation. No locks, no races.
+ The **brain thread** runs at 10 Hz: read state, evaluate the decision stack, tick the active routine, issue motor commands. A single cycle completes in well under 100ms (p99: 0.5ms in headless simulation). A secondary thread handles observability output and runtime control signals. Thread safety between them comes from immutable perception snapshots: frozen `GameState` dataclasses produced each tick, never modified after creation. No locks, no races.
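A minimal sketch of the snapshot-per-tick pattern (illustrative field and function names, not the repo's actual API; the real loop lives in `brain/runner/loop.py`):

```python
# Sketch only: a frozen dataclass makes each perception snapshot immutable,
# so a secondary thread can read the most recent snapshot without locks.
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class GameState:
    tick: int
    hp_pct: float
    target_id: int | None

TICK_PERIOD = 0.1  # 10 Hz

def run_brain(read_state, decide, act):
    tick = 0
    while True:
        start = time.perf_counter()
        snapshot = read_state(tick)   # fresh frozen GameState every tick
        command = decide(snapshot)    # priority rules -> utility scores -> GOAP boost
        act(command)                  # issue motor commands
        tick += 1
        time.sleep(max(0.0, TICK_PERIOD - (time.perf_counter() - start)))
```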

### Perception

@@ -103,7 +138,7 @@ The brain runs a three-layer decision stack. Each layer adds capability; the lay

**Utility scoring** operates within the safety envelope. Non-emergency rules produce float scores reflecting "how valuable is this action right now?" Five selection phases are configurable at runtime: Phase 0 ignores scores (conservative baseline), Phase 1 logs divergences without changing behavior (observation mode), Phase 2 uses scores within priority tiers, Phase 3 uses weighted cross-tier comparison, Phase 4 uses declarative consideration-based scoring with weighted geometric mean. This escalation path allows the scoring system to be validated before it influences decisions.
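One way to read "consideration-based scoring with weighted geometric mean" is the sketch below; the names and weights are illustrative, not the repo's API (the actual curves live in `brain/scoring/`):

```python
import math

def weighted_geometric_mean(considerations: dict[str, float],
                            weights: dict[str, float]) -> float:
    """Combine normalized consideration scores in (0, 1]. Unlike an arithmetic
    mean, a near-zero consideration drags the whole score toward zero,
    effectively vetoing the action."""
    total_w = sum(weights.values())
    log_sum = sum(weights[name] * math.log(max(score, 1e-6))
                  for name, score in considerations.items())
    return math.exp(log_sum / total_w)

# Example: an attack action scored from three considerations.
# The tiny mana_reserve score suppresses the overall utility sharply.
score = weighted_geometric_mean(
    {"target_value": 0.8, "own_hp": 0.9, "mana_reserve": 0.05},
    {"target_value": 1.0, "own_hp": 2.0, "mana_reserve": 1.0},
)
```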

- **GOAP planning** generates multi-step action sequences toward explicit goals: survive, gain XP, manage resources. The planner uses A\* on the goal state space, not the terrain, with preconditions, effects, and learned cost functions per action. Plans run 3–8 steps and are generated once per routine completion or on plan invalidation (budget: <50ms). Candidate plans are evaluated via Monte Carlo rollouts: action effects are sampled stochastically from learned posterior distributions to estimate expected plan value under uncertainty, so a plan that performs well across noisy outcomes is preferred over one that looks optimal only under point estimates. The relationship to the priority system is explicit: GOAP proposes, priorities dispose. Each tick, the current plan step's routine receives a score boost in the utility selection phase; emergency rules evaluate first and invalidate the plan if any fires. Spawn prediction (Poisson process from defeat timestamps) feeds both the planner's positioning decisions and the wander routine's directional bias.
+ **GOAP planning** generates multi-step action sequences toward explicit goals: survive, gain XP, manage resources. The planner uses A\* on the goal state space with learned cost functions per action, producing 3–8 step plans within a 50ms budget. Candidate plans pass a Monte Carlo robustness gate: action effects are sampled stochastically from learned posterior distributions, and plans that fail under noisy outcomes are rejected. The relationship to the priority system is explicit: GOAP proposes, priorities dispose. Emergency rules evaluate first each tick and invalidate the active plan if any fires. See [`docs/architecture.md`](docs/architecture.md#goap-planner) and [`docs/samples/goap-planner.md`](docs/samples/goap-planner.md) for the full planning pipeline.
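A sketch of what a Monte Carlo robustness gate can look like: sample each action's effect from its learned posterior and accept the plan only if enough rollouts stay safe. The names, the normal-posterior approximation, and the thresholds are assumptions for illustration, not the repo's implementation:

```python
import random

def plan_passes_gate(plan, posteriors, rollouts: int = 50,
                     min_success_rate: float = 0.8) -> bool:
    """Simulate the plan under sampled outcomes; accept only if enough
    rollouts reach the end without crossing an assumed safety floor."""
    successes = 0
    for _ in range(rollouts):
        hp = 1.0
        ok = True
        for action in plan:
            # Sample this action's HP cost from its learned posterior
            # (here approximated as a normal with learned mean/stddev).
            mu, sigma = posteriors[action]
            hp -= max(0.0, random.gauss(mu, sigma))
            if hp <= 0.3:   # assumed safety floor
                ok = False
                break
        successes += ok
    return successes / rollouts >= min_success_rate

# Example: a 3-step plan with per-action (mean, stddev) HP-cost posteriors.
posteriors = {"pull": (0.05, 0.02), "fight": (0.25, 0.10), "loot": (0.0, 0.0)}
print(plan_passes_gate(["pull", "fight", "loot"], posteriors))
```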

### Routines

@@ -212,16 +247,16 @@ Each session produces 6 output files: 4 log files (one per tier threshold), a st

All samples in [docs/samples/](docs/samples/) are real output from live sessions, not hand-written examples:

- | Sample | What it shows |
- | ---------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- |
- | [Session tiers](docs/samples/session-tiers.md) | One session viewed through all 4 log tiers: EVENT arc, INFO routine flow, VERBOSE rule cascade, DEBUG motor commands |
- | [Decision trace](docs/samples/decision-trace.md) | 18 decision receipts showing WANDER → ACQUIRE → PULL (locked) → IN_COMBAT with tick timing and rule evaluation |
- | [GOAP plan](docs/samples/goap-planner.md) | Plan generation, step-by-step cost accuracy (estimated vs actual), and plan completion |
- | [Forensics buffer](docs/samples/forensics-ring-buffer.md) | 300-tick ring buffer dump: skeleton aggro interrupts spell memorization, FLEE fires within one tick |
- | [Learned encounter data](docs/samples/learned-encounter-data.md) | Cross-session improvement: grade B → A, fight duration 29.5s → 15.9s, auto-tuned parameters drifting from defaults |
- | [Fight event](docs/samples/structured-fight-event.md) | Structured `fight_end` event with all 20 fields and embedded world snapshot |
- | [Convergence](docs/samples/convergence.md) | 10-session headless run: fight duration drops 53% as encounter history accumulates and cost functions self-correct |
- | [Ablation results](docs/samples/ablation-results.md) | Learning vs. defaults: 97% GOAP cost error reduction, 25x danger discrimination, weight tuning stability |
+ | Sample | What it shows |
+ | --- | --- |
+ | [Session tiers](docs/samples/session-tiers.md) | One session through all 4 log tiers |
+ | [Decision trace](docs/samples/decision-trace.md) | 18 receipts across a WANDER → ACQUIRE → PULL → IN_COMBAT transition |
+ | [GOAP plan](docs/samples/goap-planner.md) | Goal evaluation, A\* search, MC robustness gate, cost self-correction |
+ | [Forensics buffer](docs/samples/forensics-ring-buffer.md) | 300-tick ring buffer dump after skeleton aggro interrupts memorization |
+ | [Learned encounter data](docs/samples/learned-encounter-data.md) | Cross-session improvement: grade B → A, fight duration 29.5s → 15.9s |
+ | [Fight event](docs/samples/structured-fight-event.md) | Structured `fight_end` event with all 20 fields |
+ | [Convergence](docs/samples/convergence.md) | 10-session run: 53% fight duration reduction as posteriors tighten |
+ | [Ablation results](docs/samples/ablation-results.md) | Learning vs. defaults: cost error, danger discrimination, weight stability |

---

@@ -250,44 +285,14 @@ Convergence mode preserves learning state across sessions. Fight duration drops

See [docs/samples/convergence.md](docs/samples/convergence.md) for the full output and explanation.

- ---
-
- ## Reading the Code
-
- After the top-level overview, read by subsystem. Start with any rule module in [`brain/rules/`](src/brain/rules/) to see how conditions and score functions are written. Then inspect [`brain/goap/planner.py`](src/brain/goap/planner.py) for goal-directed sequencing, [`brain/world/model.py`](src/brain/world/model.py) for derived world intelligence, and [`brain/runner/loop.py`](src/brain/runner/loop.py) for the 10 Hz execution path. Combat strategies live in [`routines/strategies/`](src/routines/strategies/).
-
- For a step-by-step trace of one tick from perception to motor output, see [`docs/walkthrough.md`](docs/walkthrough.md). For architecture details beyond the README, see [`docs/architecture.md`](docs/architecture.md). For design rationale, [`docs/design-decisions.md`](docs/design-decisions.md). For the full evolutionary arc, [`docs/evolution.md`](docs/evolution.md).
-
- <details>
- <summary><strong>Project Structure</strong></summary>
-
- ```
- src/
-   core/                  Cross-cutting primitives (types, constants, exceptions, features)
-   runtime/               Agent wiring and session lifecycle
-   perception/            Environment state reading (snapshot contract, pointer traversal)
-   brain/                 Decision stack: priority rules, utility scoring, GOAP planner
-   brain/state/           Typed sub-state dataclasses (combat, pet, camp, inventory, ...)
-   brain/runner/          10 Hz tick loop, lifecycle management, level-up handling
-   brain/world/           World model, entity tracking, anomaly detection
-   brain/goap/            GOAP planner, world state, actions, goals, spawn predictor
-   brain/learning/        Encounter history, spatial memory, scorecard, weight gradient
-   brain/scoring/         Target scoring, utility curves, weight learner
-   brain/rules/           Priority rules across 4 modules (survival, combat, maintenance, nav)
-   routines/              State machine behaviors (enter/tick/exit)
-   routines/strategies/   Swappable combat strategy implementations
-   nav/                   JPS/A* pathfinding, DDA line-of-sight, waypoint graphs, zone graph
-   nav/terrain/           1-unit heightmaps from zone geometry, obstacle detection
-   motor/                 Action interface (movement, targeting, casting, stance)
-   eq/                    Environment data parsers (geometry, spells, zone models)
-   simulator/             Offline scenario runner for testing decisions without live environment
-   util/                  Structured logging, event schemas, forensics, invariant checking
-   docs/                  Architecture, design decisions, evolution history, retrospective
- ```
+ Tick timing across scenarios:

- </details>
+ | Scenario | p50 | p95 | p99 | max |
+ |---|---|---|---|---|
+ | camp_session (1,280 ticks) | 0.1 ms | 0.1 ms | 0.5 ms | 0.7 ms |
+ | survival_stress (220 ticks) | 0.1 ms | 0.2 ms | 0.3 ms | 0.3 ms |
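Percentiles like these can be computed from raw per-tick durations with the standard library alone; a generic sketch, not the repo's instrumentation:

```python
from statistics import quantiles

def tick_percentiles(durations_ms: list[float]) -> dict[str, float]:
    """Summarize per-tick wall-clock durations (milliseconds)."""
    cuts = quantiles(durations_ms, n=100, method="inclusive")  # cuts[k-1] ~ p(k)
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(durations_ms),
    }
```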

- Built with Python 3.14, zero runtime dependencies, and the standard library. See [docs/testing.md](docs/testing.md) for the test strategy and coverage philosophy.
+ See [docs/testing.md](docs/testing.md) for the test strategy and coverage philosophy.

---

docs/design-decisions.md

Lines changed: 1 addition & 3 deletions
@@ -2,9 +2,7 @@

# Design Decisions

- Rationale for major architectural choices and rejected alternatives.
-
- This document records the rationale behind each major architectural choice in Compass. Decisions are grouped by subsystem.
+ Rationale for major architectural choices and rejected alternatives, grouped by subsystem.

---

docs/retrospective.md

Lines changed: 1 addition & 12 deletions
@@ -62,18 +62,7 @@ This architecture was developed and validated against Classic EverQuest: a persi

## Architecture Evolution

- | Stage | Milestone |
- | -------------------- | -------------------------------------------------------- |
- | Monolith | Main loop with if/else chains |
- | Pipeline | Extraction into perception / brain / routines / motor |
- | Priority rules | Priority rule engine, routine state machines |
- | Observability | 4-tier logging, structured JSONL events |
- | Persistent learning | Spatial memory, encounter history, scorecard auto-tuning |
- | Utility scoring | Utility scoring with phase-gated selection |
- | Learning loops | Finite-difference weight tuning, Thompson Sampling, threshold tuning |
- | GOAP | Goal planner with Monte Carlo evaluation, spawn prediction |
-
- The architecture has been [additive since the pipeline](evolution.md#the-invariant-each-stage-is-additive). Nothing was replaced. Everything composes.
+ The architecture progressed through six stages, each solving a specific failure mode of the previous one: monolith, pipeline, priority rules, utility scoring, learning loops, and GOAP. See [Evolution](evolution.md) for the full stage-by-stage history. The architecture has been [additive since the pipeline](evolution.md#the-invariant-each-stage-is-additive). Nothing was replaced. Everything composes.

---
