Streaming-mode CPU reduction (multicast-only default, L2-5) by packethog · Pull Request #10 · packethog/order_book_server

packethog · 2026-05-18T21:19:40Z

Summary

Reduces order-book-server CPU in streaming ingest mode so it no longer starves the co-located Hyperliquid validator (the original "excessive streaming CPU → validator impact → negative feedback loop" problem). Four independent perf levers plus a default behavior change that makes the publisher multicast-only.

L2-1 — Px::num_digits() f64 log10().floor() → integer u64::ilog10() (also fixes a latent off-by-one for u64 values just below large powers of ten).
L2-3 — WebSocket/TOB-L2 path only. In multicast-only mode, skip the 6 unused WebSocket l2Book aggregation variants (the n_sig_figs/mantissa bucket combos) and compute only the single unbucketed L2 snapshot the TOB quote publisher needs (1 instead of 7 per coin). This does NOT touch the DZ-DoB feed (--dob-group): depth-of-book still streams full per-order L4 deltas (every OrderAdd/Cancel/Execute) and full per-instrument resting-order snapshots from a separate path (dob_tap / clone_coin_orders), unchanged. The dropped variants have zero consumers when there are no WebSocket l2Book subscribers.
L2-4 — pre-size L2 level output vectors to skip realloc growth (bounded for bucketed variants so a deep book can't retain huge capacity).
L2-5 — streaming TOB L2 snapshot emitted once per finalized block instead of once per file-read chunk, with a stuck-stream backstop and recovery-path emission so corrected BBOs are never withheld.
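The L2-1 lever can be illustrated with a minimal sketch. The free functions below are hypothetical stand-ins, since the actual body of `Px::num_digits()` is not shown in this description; the sketch only shows why an integer `ilog10` avoids the rounding that affects `u64` values just below large powers of ten once they pass through `f64`.

```rust
/// Digit count via integer log10 (hypothetical stand-in for `Px::num_digits`).
/// `checked_ilog10` returns `None` for 0, which is treated here as one digit.
fn num_digits_int(v: u64) -> u32 {
    v.checked_ilog10().map_or(1, |d| d + 1)
}

/// Assumed shape of the float-based approach being replaced.
/// The `u64 -> f64` cast can round `v` up to the next power of ten
/// before `log10` is ever called, which is where an off-by-one can come from.
fn num_digits_f64(v: u64) -> u32 {
    if v == 0 {
        return 1;
    }
    (v as f64).log10().floor() as u32 + 1
}

/// 10^18 - 1 has 18 digits, but it is not representable in an f64:
/// the cast rounds it to exactly 10^18.
fn cast_rounds_up_to_power_of_ten() -> bool {
    let v: u64 = 999_999_999_999_999_999;
    (v as f64) == 1e18
}
```

With these definitions, `num_digits_int(999_999_999_999_999_999)` is 18, while the float path starts from a value that has already been rounded to 10^18.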

Behavior change (action required for existing deployments)

--enable-websocket is now off by default. The publisher is multicast-only unless the flag is passed. Deployments that relied on the default WS server must add --enable-websocket. Passing it restores byte-for-byte pre-L2-5 streaming cadence (the durable rollback contract, enforced by discriminating tests + goldens).
Output-config validation: a no-output config is rejected, and --dob-group requires --multicast-group (the instrument registry is only bootstrapped in multicast mode). Enforced at both the CLI and the run_websocket_server library boundary.
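The two validation rules above can be sketched as a single check. The struct, field names, and error variants are assumptions made for illustration; the real CLI and `run_websocket_server` types are not shown in this description.

```rust
/// Hypothetical subset of the output-related settings.
#[derive(Debug, Default, Clone)]
struct OutputConfig {
    enable_websocket: bool,          // --enable-websocket (off by default)
    multicast_group: Option<String>, // --multicast-group
    dob_group: Option<String>,       // --dob-group
}

#[derive(Debug, PartialEq, Eq)]
enum ConfigError {
    NoOutput,
    DobRequiresMulticast,
}

/// Reject configurations that would publish nothing, or that enable DoB
/// without the multicast mode that bootstraps the instrument registry.
fn validate_outputs(cfg: &OutputConfig) -> Result<(), ConfigError> {
    if cfg.dob_group.is_some() && cfg.multicast_group.is_none() {
        return Err(ConfigError::DobRequiresMulticast);
    }
    if !cfg.enable_websocket && cfg.multicast_group.is_none() {
        return Err(ConfigError::NoOutput);
    }
    Ok(())
}
```

Calling the same function from both the CLI parser and the library entry point is one way to keep the two boundaries from drifting apart.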

Scope clarification: TOB-L2 vs DZ-DoB

Two independent multicast feeds:

TOB (--multicast-group): top-of-book quote feed, derived from compute_l2_snapshots then truncate(1). L2-1/L2-3/L2-4/L2-5 all act here.
DZ-DoB (--dob-group): full binary depth-of-book — per-order L4 delta stream + full per-instrument snapshots, built from dob_tap events and clone_coin_orders (the entire resting-order set, every level). No lever in this PR reduces DoB depth or level coverage.
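As a rough illustration of the TOB derivation (L2 levels, then `truncate(1)`) together with the L2-4 pre-sizing idea, here is a minimal sketch. `Level`, `collect_levels`, and `top_of_book` are hypothetical names, and the book is modeled as a plain `BTreeMap` of price to size, which is an assumption rather than the server's actual data structure.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Level {
    px: u64,
    sz: u64,
}

/// Collect up to `max_levels` levels in iterator order. The reservation is
/// bounded by `max_levels`, so a deep book cannot pin a large allocation.
fn collect_levels<'a, I>(side: I, max_levels: usize) -> Vec<Level>
where
    I: ExactSizeIterator<Item = (&'a u64, &'a u64)>,
{
    let mut out = Vec::with_capacity(side.len().min(max_levels));
    out.extend(side.take(max_levels).map(|(&px, &sz)| Level { px, sz }));
    out
}

/// Derive (best bid, best ask) by building the L2 view and truncating to one level.
fn top_of_book(
    bids: &BTreeMap<u64, u64>,
    asks: &BTreeMap<u64, u64>,
) -> (Option<Level>, Option<Level>) {
    let mut bid_levels = collect_levels(bids.iter().rev(), usize::MAX); // highest bid first
    let mut ask_levels = collect_levels(asks.iter(), usize::MAX); // lowest ask first
    bid_levels.truncate(1);
    ask_levels.truncate(1);
    (bid_levels.first().copied(), ask_levels.first().copied())
}
```

Passing `max_levels = 1` to `collect_levels` directly would avoid building the deeper levels at all, which is the kind of saving the multicast-only path is after.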

Hardening

~24 adversarial-review iterations. All substantive defects are resolved: the catch-up stale-quote leak, a publisher-authority supersede model, three classes of off-lock recovery TOCTOU race (height → intra-block mutation_seq → stream-finalized-height gating), per-diff L4 fan-out CPU waste in multicast-only mode, the stuck-stream backstop (dirty-epoch-age gate + mutation_seq content-dedup), caught_up/periodic-resend hygiene including broadcast-Lagged, and Correction resync-eligibility. The net result is a simplification: an over-engineered cached-supersede retry was removed in favor of the standard caught-up resync plus a race-free in-loop force-publish. One documented, irreducible best-effort tradeoff remains in the stuck-stream backstop (inherent to fire-and-forget UDP multicast during a stalled feed; recovered by periodic resync); it is not a ship-blocker.
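The off-lock recovery guards can be pictured as an optimistic version check: capture a version token under the lock, do the expensive work off-lock, then re-validate before applying. The sketch below is one plausible reading of the three gates named above, not the PR's actual code; `BookVersion`, `report_still_valid`, and the exact comparison rules are assumptions.

```rust
/// Version token captured under the lock before off-lock work begins (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BookVersion {
    height: u64,       // block height the book reflected
    mutation_seq: u64, // bumped on every real mutation, including recovery mutations
}

/// Re-validate an off-lock recovery report once the lock is re-acquired.
/// The report is accepted only if nothing it depended on has changed.
fn report_still_valid(
    captured: BookVersion,
    current: BookVersion,
    stream_finalized_height: u64,
) -> bool {
    // Gate 3 (assumed): only heights the stream has finalized are comparable.
    captured.height <= stream_finalized_height
        // Gate 1: the book must not have advanced to another block.
        && captured.height == current.height
        // Gate 2: no intra-block mutation may have happened in between.
        && captured.mutation_seq == current.mutation_seq
}
```

A rejected report is simply discarded and the recovery is retried against the newer state, which is what makes the check safe to run without holding the lock during the expensive part.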

Tests

Full lib suite green; block-mode goldens byte-identical throughout (block mode never regressed).
New coverage: ilog10 boundary, L2-5 stream-finalization suite, publisher decision matrices (snapshot_should_publish, next_pending_provisional, caught_up_after_publish), backstop dirty-epoch/dedup, recovery race + finalized-height guards, --enable-websocket byte-for-byte parity, no-output/DoB-only rejection (CLI + runner), dual-validator block-vs-stream parity.

Live canary validation (tyo-hl-node)

Cross-built (Docker linux/amd64 bookworm; ABI-matched: GLIBC_2.34, libssl.so.3) and deployed to the live validator host as the canary. Steady state: order-book ~1 core vs validator ~3–4 cores on a 16-core box, e2e ~0.3s, both services NRestarts=0 — the original CPU-starvation pathology is not in evidence. One operational tweak (systemd unit, not code) was needed: --dob-channel-bound 4096 → 65536 to absorb HL block-boundary OrderAdd bursts and eliminate dob_tap: channel full drops (0 drops over sustained windows after, incl. catch-up).
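The effect of the channel bound on burst drops can be reproduced with a small stand-alone sketch. It uses `std::sync::mpsc::sync_channel` purely as a stand-in; which channel type `dob_tap` actually uses is not stated here, and `drops_for_burst` is a hypothetical helper.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Push `burst` events into a bounded channel of capacity `bound` while the
/// consumer is not draining, and count how many are dropped as "channel full".
fn drops_for_burst(bound: usize, burst: usize) -> usize {
    // Keep the receiver alive so sends fail with Full rather than Disconnected.
    let (tx, _rx) = sync_channel::<u64>(bound);
    let mut dropped = 0;
    for i in 0..burst {
        match tx.try_send(i as u64) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) => dropped += 1,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    dropped
}
```

In this worst case, where nothing is consumed during the burst, any burst larger than the bound loses the excess, so the bound has to be sized for the largest block-boundary burst rather than the average rate.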

Caveats

No true A/B: the prior binary never ran healthy on the canary box during the session (it was crash-looping on an unrelated validator outage), so this is an absolute steady-state characterization, not a measured % reduction.
Deployed unmerged as a canary per operator direction; recommend a soak before merge / wider rollout.

Test plan

cargo test --workspace green; goldens unchanged
cargo clippy --workspace --all-targets (warn-level pedantic baseline only)
Soak on tyo-hl-node: journalctl -u orderbook-server clean, validator NRestarts=0, e2e sub-second, no dob_tap drops
Operator perf re-measure / A/B vs old build on a stable box
Confirm any other deployments add --enable-websocket if they need the WS server

🤖 Generated with Claude Code


gate the TcpListener::bind + axum::serve block on `enable_websocket`; when false (the default), the ws port is never bound and the task parks via `std::future::pending` so the hl_listen and multicast publisher tasks keep running. adds an integration test that verifies the port remains rebindable in the disabled case.
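A synchronous, std-only sketch of the gating idea follows. The real change gates an async `TcpListener::bind` + `axum::serve` block and parks the task with `std::future::pending`; this stand-in only shows the "never bind when disabled" part, and `maybe_bind_ws` is a hypothetical name.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};

/// Bind the WebSocket port only when the flag is set. When disabled, no
/// socket is created, so the port stays free for any other process.
fn maybe_bind_ws(enable_websocket: bool, addr: SocketAddr) -> io::Result<Option<TcpListener>> {
    if !enable_websocket {
        return Ok(None);
    }
    TcpListener::bind(addr).map(Some)
}
```

Returning `Option<TcpListener>` keeps the decision visible to the caller, which can then either serve on the listener or idle while the other tasks keep running.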


L2-5: in streaming + WS-disabled, TOB L2 snapshots now emit once per finalized block (when the book changed) instead of once per file-read chunk. Eliminates the per-chunk snapshot CPU multiplier. Streaming TOB goldens regenerated: the per-chunk path emitted ~27 duplicate quotes per block-time; the new path emits one authoritative quote per finalized dirty block. Every regenerated quote is byte-identical to the block-mode golden at the same source timestamp (verified). Block-mode goldens unchanged. finalize_streaming_for_test updated to mirror finalize_stream_block's authoritative dirty-emission so the regenerated goldens reflect production behavior including the final block.

8 tests covering the l2-5 behavioral contracts: one authoritative snapshot per finalized bbo-changing block; no snapshot for clean blocks; finalization in height order; backstop only after 250ms dirty interval; backstop provisional does not suppress later authoritative finalization; tolerated no-op update/remove stays clean; recovery emits authoritative snapshot immediately.
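The emission contracts listed above can be modeled as a small state machine. This is a sketch under stated assumptions, not the server's implementation: `StreamEmitter`, `Emit`, and the millisecond timestamps passed in by the caller are all hypothetical, and the 250ms constant is taken from the backstop interval mentioned in this PR.

```rust
const BACKSTOP_DIRTY_MS: u64 = 250;

#[derive(Debug, PartialEq, Eq)]
enum Emit {
    Authoritative, // emitted when a dirty block is finalized
    Provisional,   // emitted by the stuck-stream backstop
}

#[derive(Default)]
struct StreamEmitter {
    mutation_seq: u64,
    dirty_since_ms: Option<u64>,       // start of the current dirty epoch
    last_provisional_seq: Option<u64>, // content already sent provisionally
}

impl StreamEmitter {
    /// Record a real book mutation. Tolerated no-op updates/removes must not call this.
    fn on_mutation(&mut self, now_ms: u64) {
        self.mutation_seq += 1;
        if self.dirty_since_ms.is_none() {
            self.dirty_since_ms = Some(now_ms);
        }
    }

    /// Block finalized: emit once if the block changed the book, otherwise nothing.
    fn on_block_finalized(&mut self) -> Option<Emit> {
        self.dirty_since_ms.take().map(|_| Emit::Authoritative)
    }

    /// Backstop tick: emit only if the book has been dirty for at least 250ms
    /// and this exact content (by mutation_seq) was not already sent.
    fn on_backstop_tick(&mut self, now_ms: u64) -> Option<Emit> {
        let since = self.dirty_since_ms?;
        if now_ms.saturating_sub(since) < BACKSTOP_DIRTY_MS {
            return None;
        }
        if self.last_provisional_seq == Some(self.mutation_seq) {
            return None;
        }
        self.last_provisional_seq = Some(self.mutation_seq);
        Some(Emit::Provisional)
    }
}
```

Note that the backstop leaves `dirty_since_ms` set, so a provisional emission never suppresses the authoritative snapshot that follows when the block is eventually finalized.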


packethog added 30 commits May 15, 2026 13:19

384bcd4 draft: streaming-mode cpu reduction design spec
00c6b9e spec: revise streaming cpu design with --enable-websocket flag and finalization-driven snapshots
d4fd8ef plan: streaming cpu reduction implementation plan
ae3bb49 plan: address codex findings on plan v1 (recovery, backstop, dirty placement)
4b57d07 plan: address codex v2 findings (ws-gate, snapped gate, dedicated 5s backstop ticker)
cff3895 plan: align stuck-stream backstop interval with tob freshness threshold (250ms = catchup_threshold_ms/2)
f794ca1 test: add Px::num_digits boundary tests around 10^n thresholds
9e20066 perf: replace Px::num_digits f64 log10 with u64::ilog10
6100fa2 perf: pre-size L2 level output Vecs to skip realloc growth
f27a70e feat: add --enable-websocket cli flag (default off)
c9927b6 feat: thread enable_websocket through run_websocket_server with startup log
387b224 feat: thread enable_websocket into OrderBookListener and OrderBookState
1704c9b test: tidy websocket_disabled_test doc backticks and unwrap allow
5d9fb0e perf: skip 6 bucketed l2 variants and cap unbucketed to bbo when websocket disabled
f9eae5d test: assert tob bbo is identical across enable_websocket configs
acce669 fix: reject startup when no market-data output is configured
3f9f006 feat: add book_dirty flag to OrderBookState set only on real mutations
552f3bd feat: 5s stuck-stream snapshot backstop on dedicated 250ms ticker
161410d fix: emit authoritative tob snapshot after streaming recovery
8b49e3e test: add streaming-mode listener test constructor
e4e4696 refactor: expose dual-validator fixture snapshot capture for reuse
9075ef3 test: assert stream quote sequence is an ordered block subsequence with identical bbo (l2-5)
190383d docs: document --enable-websocket multicast-only default in README
eeed11c docs: changelog for streaming cpu reduction and ws default-off
274804d fix: authoritative tob snapshots bypass staleness suppression so stalled-stream provisional is always corrected
26c00f3 fix: narrow staleness bypass to corrections only and decouple recovery from stream dirty epoch
54fe8c8 fix: gate L2-5 finalization and recovery emission on !enable_websocket so --enable-websocket preserves pre-L2-5 streaming cadence

packethog added 30 commits May 18, 2026 10:20

068d101 websocket server: reject no-output config at the runner boundary, not just the cli
d641d65 websocket server: reject dob-only config at the runner boundary to match the cli
fcd31f1 publisher: own the provisional-supersede decision (resolves catch-up leak vs stranded-provisional dilemma)
99ec7b9 backstop: trigger on source age not local clock; tie supersede flag to actual send
64e9075 stream cpu: skip l4 fanout when websocket disabled; one-shot backstop per epoch
8916d43 gitignore: exclude .claude/*.lock harness artifacts
5790c04 publisher: recovery correction must not clear the provisional-supersede obligation
6523ca5 backstop: re-arm one-shot guard on post-provisional mutation so long stalls don't strand subscribers
ee981aa publisher: only enter caught-up state on a fresh publish, not a forced-stale supersede
7c9e044 publisher: clear caught-up on a forced-stale publish so periodic resend stays gated
269fb6f l2: don't reserve full source depth for bucketed variants (bounded snapshot memory)
651d830 recovery: reject off-lock report on same-height intra-block race via mutation seq
7dae2f4 recovery: only validate finalized stream heights; bump mutation seq on recovery mutations
0f7262c backstop: freshness-window emit (retry-while-fresh, silent-when-stale); drop one-shot state; don't cache provisional for resend
271b74f publisher: a provisional publish must not mark caught-up (no stale authoritative resend mid-stall)
793837e backstop: dedup provisional by mutation seq instead of a phase-skippable freshness window
77a137d docs: reconcile readme/changelog with final design (dob requires multicast; flag-based rollback)
9fa87c6 publisher: discharge supersede only on full local send; resend cached authoritative while obligation pending
edee645 fix: revert pending-driven resend (rolls back subscribers); gate backstop on dirty-epoch age not latest mutation
6de440d publisher: retry only the cached forced-stale supersede (S2), never the prior block (S1)
19d545e publisher: cache only published snapshots so a suppressed stale one can't poison the resend/retry cache
a1623d7 publisher: a fully-sent interval retry discharges the provisional obligation (no stuck pending)
fc77a81 publisher: drop racy cached-supersede resend; rely on in-loop force-publish + standard caught-up resync
618c460 publisher: drop caught-up on broadcast lag so stale cache is not rebroadcast as current
a23e3c9 publisher: corrections stay resync-eligible; broadcast lag also clears pending provisional
3951bce publisher: keep pending-provisional across broadcast lag (next authoritative supersedes; revert lag-clear)
31ac40f gitignore: exclude docker cross-build output dir
d57d8c0 docs: update ARCHITECTURE.md for multicast-only default, L2-5 finalization emission, supersede model, recovery guards
7693bc6 docs: add repo claude.md onboarding pointing at architecture.md
13dc3e8 binaries: default --dob-channel-bound to 65536 (burst-safe); reconcile docs

packethog wants to merge 64 commits into main from ss/blissful-grothendieck-f57f39
