The architecture evolved in a legacy 3D MMORPG sandbox.
> **Scope.** This is a cleaned extraction from a private working repo, published as an architecture reference. Live runtime config, environment assets, and operational glue are intentionally omitted. See [docs/samples/](docs/samples/) for real session output, or [`docs/walkthrough.md`](docs/walkthrough.md) for a step-by-step trace of one tick from perception to motor output.
## Reading the Code
Start with any rule module in [`brain/rules/`](src/brain/rules/) to see how conditions and score functions are written. Then inspect [`brain/goap/planner.py`](src/brain/goap/planner.py) for A\* search with Monte Carlo robustness gating, [`brain/learning/encounters.py`](src/brain/learning/encounters.py) for Bayesian posteriors and Thompson Sampling, [`brain/world/model.py`](src/brain/world/model.py) for derived world intelligence, and [`brain/runner/loop.py`](src/brain/runner/loop.py) for the 10 Hz execution path.
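As context for the learning module, the Beta-Bernoulli form of Thompson Sampling fits in a few lines. This is a minimal sketch; the enemy names, posterior layout, and function names below are illustrative, not the repo's actual API:

```python
import random

# Hypothetical per-enemy win-rate posteriors, Beta(alpha, beta).
# Names and layout are illustrative, not the repo's actual API.
posteriors = {
    "wolf":   {"alpha": 9.0, "beta": 3.0},   # plenty of data, ~75% win rate
    "bandit": {"alpha": 2.0, "beta": 2.0},   # little data, high uncertainty
}

def thompson_pick(posteriors):
    """Sample one win-rate estimate per option and pick the best sample.

    Uncertain options occasionally sample high, so they keep being
    explored without any explicit exploration schedule.
    """
    samples = {
        name: random.betavariate(p["alpha"], p["beta"])
        for name, p in posteriors.items()
    }
    return max(samples, key=samples.get)

def update(posteriors, name, won):
    """Conjugate Beta-Bernoulli update once the encounter resolves."""
    key = "alpha" if won else "beta"
    posteriors[name][key] += 1.0

target = thompson_pick(posteriors)
update(posteriors, target, won=True)
```

The conjugate update is what keeps this cheap: learning is two float increments per encounter, with no replay buffer or gradient step.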
For a step-by-step trace of one tick from perception to motor output, see [`docs/walkthrough.md`](docs/walkthrough.md). For architecture details beyond the README, see [`docs/architecture.md`](docs/architecture.md). For design rationale, [`docs/design-decisions.md`](docs/design-decisions.md). For the full evolutionary arc, [`docs/evolution.md`](docs/evolution.md).
No module imports upward. The dependency graph is a DAG, and each layer is independently understandable.
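The no-upward-imports invariant is mechanically checkable. A minimal sketch using the standard library's `graphlib`, with a hypothetical layer map standing in for the real module graph:

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical layer -> layers-it-may-import map. A real check would
# extract this from the source tree instead of hand-writing it.
deps = {
    "runner":     {"brain", "routines"},
    "brain":      {"world", "perception"},
    "routines":   {"world"},
    "world":      {"perception"},
    "perception": set(),
}

try:
    # static_order() raises CycleError if the layer graph is not a DAG.
    order = list(TopologicalSorter(deps).static_order())
except CycleError as err:
    raise SystemExit(f"cycle detected: {err.args[1]}")
```

A check like this can run in CI, so an upward import fails the build instead of surfacing as a runtime surprise.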
The **brain thread** runs at 10 Hz: read state, evaluate the decision stack, tick the active routine, issue motor commands. A single cycle completes in well under 100ms (p99: 0.5ms in headless simulation). A secondary thread handles observability output and runtime control signals. Thread safety between them comes from immutable perception snapshots: frozen `GameState` dataclasses produced each tick, never modified after creation. No locks, no races.
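The lock-free handoff hinges on snapshots being deeply immutable. A minimal sketch of the idea, with a hypothetical, heavily trimmed `GameState`:

```python
import dataclasses
from dataclasses import dataclass

# Hypothetical, heavily trimmed snapshot; the real GameState carries
# far more fields.
@dataclass(frozen=True)
class GameState:
    tick: int
    hp: float
    enemies: tuple = ()  # tuple rather than list, so the snapshot is deeply immutable

def brain_tick(state: GameState) -> str:
    # Read-only decision: the snapshot is never mutated after creation,
    # so the observability thread can read it concurrently without locks.
    return "flee" if state.hp < 0.3 else "engage"

snap = GameState(tick=42, hp=0.85, enemies=("wolf",))
try:
    snap.hp = 0.1  # any write raises FrozenInstanceError
except dataclasses.FrozenInstanceError:
    pass
```

Because each tick produces a fresh frozen object, the writer only ever swaps a single reference, and readers can hold old snapshots indefinitely without seeing partial updates.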
### Perception
The brain runs a three-layer decision stack.
**Utility scoring** operates within the safety envelope. Non-emergency rules produce float scores reflecting "how valuable is this action right now?" Five selection phases are configurable at runtime: Phase 0 ignores scores (conservative baseline), Phase 1 logs divergences without changing behavior (observation mode), Phase 2 uses scores within priority tiers, Phase 3 uses weighted cross-tier comparison, Phase 4 uses declarative consideration-based scoring with weighted geometric mean. This escalation path allows the scoring system to be validated before it influences decisions.
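Phase 4's weighted geometric mean has a useful property a weighted average lacks: any consideration scoring zero vetoes the action outright. A minimal sketch, with illustrative consideration values and weights rather than the repo's actual ones:

```python
import math

def score(considerations):
    """Weighted geometric mean of consideration values in [0, 1].

    Computed in log space for numerical stability. A value of 0 on any
    consideration collapses the whole score to 0 (a hard veto), which a
    weighted arithmetic mean would not guarantee.
    """
    total_weight = sum(w for _, w in considerations)
    log_sum = 0.0
    for value, weight in considerations:
        if value <= 0.0:
            return 0.0
        log_sum += weight * math.log(value)
    return math.exp(log_sum / total_weight)

# Illustrative (value, weight) considerations for an "attack" action.
attack = [
    (0.9, 2.0),  # target in range
    (0.6, 1.0),  # own hp comfortable
    (0.8, 1.0),  # resource available
]
```

The geometric mean also penalizes unbalanced actions: one very low consideration drags the score down harder than it would an arithmetic average.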
**GOAP planning** generates multi-step action sequences toward explicit goals: survive, gain XP, manage resources. The planner uses A\* on the goal state space with learned cost functions per action, producing 3–8 step plans within a 50ms budget. Candidate plans pass a Monte Carlo robustness gate: action effects are sampled stochastically from learned posterior distributions, and plans that fail under noisy outcomes are rejected. The relationship to the priority system is explicit: GOAP proposes, priorities dispose. Emergency rules evaluate first each tick and invalidate the active plan if any fires. See [`docs/architecture.md`](docs/architecture.md#goap-planner) and [`docs/samples/goap-planner.md`](docs/samples/goap-planner.md) for the full planning pipeline.
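The shape of such a robustness gate can be sketched as follows; the posterior parameters, action names, and thresholds are illustrative, not the repo's actual values:

```python
import random

def rollout_value(plan, posteriors, rng):
    """Sample one stochastic outcome of executing the whole plan."""
    return sum(rng.gauss(*posteriors[action]) for action in plan)

def robustness_gate(plan, posteriors, n=200, threshold=0.0, min_pass=0.8, seed=0):
    """Accept a plan only if most noisy rollouts still clear the threshold.

    A plan that looks optimal under point estimates but collapses when
    outcomes are sampled from the learned distributions fails the gate.
    """
    rng = random.Random(seed)
    passes = sum(rollout_value(plan, posteriors, rng) > threshold for _ in range(n))
    return passes / n >= min_pass

# Illustrative learned effect posteriors: action -> (mean value, std dev).
posteriors = {
    "move_to_camp":     (1.0, 0.2),
    "loot":             (0.5, 0.1),
    "engage_unscouted": (-1.0, 3.0),  # negative expected value, huge variance
}

safe_plan  = ["move_to_camp", "loot"]
risky_plan = ["move_to_camp", "engage_unscouted"]
```

Here the safe plan clears the gate while the high-variance plan does not, even though a point-estimate planner might rank them closely.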
### Routines
All samples in [docs/samples/](docs/samples/) are real output from live sessions, not hand-written examples:
| Sample | What it shows |
| --- | --- |
| [Session tiers](docs/samples/session-tiers.md) | One session viewed through all 4 log tiers: EVENT arc, INFO routine flow, VERBOSE rule cascade, DEBUG motor commands |
Convergence mode preserves learning state across sessions.
See [docs/samples/convergence.md](docs/samples/convergence.md) for the full output and explanation.
---
| camp_session (1 280 ticks) | 0.1 ms | 0.1 ms | 0.5 ms | 0.7 ms |
| survival_stress (220 ticks) | 0.1 ms | 0.2 ms | 0.3 ms | 0.3 ms |
See [docs/testing.md](docs/testing.md) for the test strategy and coverage philosophy.
| Subsystem | Responsibility |
| --- | --- |
| GOAP | Goal planner with Monte Carlo evaluation, spawn prediction |
The architecture progressed through six stages, each solving a specific failure mode of the previous one: monolith, pipeline, priority rules, utility scoring, learning loops, and GOAP. See [Evolution](evolution.md) for the full stage-by-stage history. The architecture has been [additive since the pipeline](evolution.md#the-invariant-each-stage-is-additive). Nothing was replaced. Everything composes.