
Streaming-mode CPU reduction (multicast-only default, L2-5)#10

Open
packethog wants to merge 64 commits into main from ss/blissful-grothendieck-f57f39

@packethog (owner) commented May 18, 2026

Summary

Reduces order-book-server CPU in streaming ingest mode so it no longer starves the co-located Hyperliquid validator (the original "excessive streaming CPU → validator impact → negative feedback loop" problem). Four independent perf levers plus a default behavior change that makes the publisher multicast-only.

  • L2-1: Px::num_digits() moves from f64 log10().floor() to integer u64::ilog10(), which also fixes a latent off-by-one for u64 values just below large powers of ten.
  • L2-3 (WebSocket/TOB-L2 path only): in multicast-only mode, skip the 6 unused WebSocket l2Book aggregation variants (the n_sig_figs/mantissa bucket combos) and compute only the single unbucketed L2 snapshot the TOB quote publisher needs (1 instead of 7 per coin). This does NOT touch the DZ-DoB feed (--dob-group): depth-of-book still streams full per-order L4 deltas (every OrderAdd/Cancel/Execute) and full per-instrument resting-order snapshots from a separate path (dob_tap / clone_coin_orders), unchanged. The dropped variants have zero consumers when there are no WebSocket l2Book subscribers.
  • L2-4: pre-size L2 level output vectors to skip realloc growth (bounded for bucketed variants so a deep book can't retain huge capacity).
  • L2-5: the streaming TOB L2 snapshot is emitted once per finalized block instead of once per file-read chunk, with a stuck-stream backstop and recovery-path emission so corrected BBOs are never withheld.
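To illustrate the L2-1 off-by-one, here is a minimal sketch contrasting the float and integer digit counts. The function names are hypothetical: the real method is Px::num_digits(), and its exact signature and zero handling are not shown in this PR. The problem is that converting a u64 close to a power of ten into f64 can round it up to that power. For example, 999_999_999_999_999_999 converts to exactly 1e18, so the float path can report 19 digits where the correct count is 18. u64::ilog10 requires Rust 1.67 or later and panics on 0, so the sketch guards that case.

```rust
/// Float-based digit count, roughly the shape of the pre-L2-1 approach (illustrative).
/// `n as f64` can round a large u64 up to the next power of ten, which makes
/// log10().floor() one too large for values just below that power.
fn num_digits_f64(n: u64) -> u32 {
    if n == 0 {
        return 1;
    }
    (n as f64).log10().floor() as u32 + 1
}

/// Integer digit count: exact for every u64.
/// `u64::ilog10` panics on 0, so zero is handled explicitly.
fn num_digits_int(n: u64) -> u32 {
    if n == 0 {
        1
    } else {
        n.ilog10() + 1
    }
}
```

The integer version also avoids a float conversion and a libm call on a hot path, which is what makes it a CPU lever as well as a correctness fix.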

Behavior change (action required for existing deployments)

  • --enable-websocket is now off by default. The publisher is multicast-only unless the flag is passed. Deployments that relied on the default WS server must add --enable-websocket. Passing it restores the byte-for-byte pre-L2-5 streaming cadence (the durable rollback contract, enforced by discriminating tests and goldens).
  • Output-config validation: a no-output config is rejected, and --dob-group requires --multicast-group (the instrument registry is only bootstrapped in multicast mode). Enforced at both the CLI and the run_websocket_server library boundary.
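The two validation rules above can be expressed as a small check that both the CLI and the library entry point could share. This is a sketch under assumptions: OutputConfig, ConfigError and validate are hypothetical names, and the three fields stand in for --enable-websocket, --multicast-group and --dob-group. The DoB-without-multicast check runs first, so a DoB-only config is reported for that specific reason rather than as a generic no-output error.

```rust
/// Hypothetical view of the output-related flags.
#[derive(Debug, Default)]
struct OutputConfig {
    enable_websocket: bool,          // --enable-websocket
    multicast_group: Option<String>, // --multicast-group (TOB feed)
    dob_group: Option<String>,       // --dob-group (DZ-DoB feed)
}

#[derive(Debug, PartialEq, Eq)]
enum ConfigError {
    /// No WebSocket server and no multicast feed: nothing would be published.
    NoOutput,
    /// The instrument registry is only bootstrapped in multicast mode.
    DobWithoutMulticast,
}

fn validate(cfg: &OutputConfig) -> Result<(), ConfigError> {
    if cfg.dob_group.is_some() && cfg.multicast_group.is_none() {
        return Err(ConfigError::DobWithoutMulticast);
    }
    if !cfg.enable_websocket && cfg.multicast_group.is_none() && cfg.dob_group.is_none() {
        return Err(ConfigError::NoOutput);
    }
    Ok(())
}
```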

Scope clarification: TOB-L2 vs DZ-DoB

Two independent multicast feeds:

  • TOB (--multicast-group): top-of-book quote feed, derived by running compute_l2_snapshots and then truncate(1). L2-1/L2-3/L2-4/L2-5 all act here.
  • DZ-DoB (--dob-group): full binary depth-of-book, consisting of a per-order L4 delta stream plus full per-instrument snapshots, built from dob_tap events and clone_coin_orders (the entire resting-order set, every level). No lever in this PR reduces DoB depth or level coverage.
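The TOB path described above, together with the L2-3 variant selection and the L2-4 pre-sizing, can be sketched as follows. Everything here is hypothetical: Level, Variant, variants_to_compute, presize_capacity, top_of_book and the cap of 128 are illustrative names and values. The specific bucketed combos are an assumption modeled on Hyperliquid's public l2Book nSigFigs/mantissa options; only the "1 instead of 7 per coin" count comes from this PR. The sketch assumes each snapshot side is already sorted best-first, so truncate(1) leaves the best bid and best ask.

```rust
/// One aggregated price level (hypothetical shape).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Level {
    px: u64,
    sz: u64,
}

/// l2Book aggregation variants (bucketed combos are an assumption).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Variant {
    Unbucketed,
    Bucketed { n_sig_figs: u32, mantissa: Option<u32> },
}

/// L2-3: without a WebSocket server there are no l2Book subscribers, so only
/// the unbucketed snapshot (the TOB publisher's input) is computed.
fn variants_to_compute(enable_websocket: bool) -> Vec<Variant> {
    let mut out = vec![Variant::Unbucketed];
    if enable_websocket {
        for n in 2..=5 {
            out.push(Variant::Bucketed { n_sig_figs: n, mantissa: None });
        }
        for m in [2, 5] {
            out.push(Variant::Bucketed { n_sig_figs: 5, mantissa: Some(m) });
        }
    }
    out
}

/// L2-4: capacity to reserve for a level vector. Bucketed output is capped
/// (hypothetical cap) so a very deep book cannot pin a huge allocation.
const BUCKETED_PRESIZE_CAP: usize = 128;

fn presize_capacity(distinct_levels: usize, variant: Variant) -> usize {
    match variant {
        Variant::Unbucketed => distinct_levels,
        Variant::Bucketed { .. } => distinct_levels.min(BUCKETED_PRESIZE_CAP),
    }
}

/// TOB: keep only the best level on each side of the unbucketed snapshot.
fn top_of_book(mut bids: Vec<Level>, mut asks: Vec<Level>) -> (Vec<Level>, Vec<Level>) {
    bids.truncate(1);
    asks.truncate(1);
    (bids, asks)
}
```

The cap applies only to bucketed variants because their output can be much shorter than the number of distinct prices. Reserving the full count there would waste memory, while the unbucketed output always has exactly one entry per distinct level.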

Hardening

~24 adversarial-review iterations. All substantive defects are resolved: the catch-up stale-quote leak; a publisher-authority supersede model; three classes of off-lock recovery TOCTOU race (addressed in turn by height, intra-block mutation_seq, and stream-finalized-height gating); per-diff L4 fan-out CPU waste in multicast-only mode; the stuck-stream backstop (dirty-epoch-age gate plus mutation_seq content-dedup); caught_up/periodic-resend hygiene, including broadcast Lagged; and Correction resync eligibility. One change is a net simplification: an over-engineered cached-supersede retry was removed in favor of the standard caught-up resync plus a race-free in-loop force-publish. One documented, irreducible best-effort tradeoff remains in the stuck-stream backstop. It is inherent to fire-and-forget UDP multicast during a stalled feed and is recovered by the periodic resync, so it is not a ship-blocker.
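The stuck-stream backstop's two gates (dirty-epoch age and mutation_seq content-dedup) can be sketched as a small state machine. The 250ms threshold comes from the commit message ("backstop only after 250ms dirty interval"). Backstop and its methods are hypothetical names. The sketch also assumes that a dirty epoch starts at the first unfinalized book mutation and ends when the block is finalized, and that mutation_seq increases with every book change. It models only the decision to send a provisional quote, not the authoritative publish that follows finalization.

```rust
use std::time::{Duration, Instant};

/// Threshold from the commit message; how it is configured in the real server is not shown.
const BACKSTOP_DIRTY_AGE: Duration = Duration::from_millis(250);

struct Backstop {
    /// Start of the current dirty epoch (None = nothing unfinalized).
    dirty_since: Option<Instant>,
    /// mutation_seq of the last provisional publish, used for content-dedup.
    last_published_seq: Option<u64>,
}

impl Backstop {
    fn new() -> Self {
        Backstop { dirty_since: None, last_published_seq: None }
    }

    /// A book mutation arrived; open a dirty epoch if one is not already open.
    fn on_mutation(&mut self, now: Instant) {
        if self.dirty_since.is_none() {
            self.dirty_since = Some(now);
        }
    }

    /// The block was finalized (and published authoritatively); close the epoch.
    fn on_finalized(&mut self) {
        self.dirty_since = None;
    }

    /// Publish a provisional quote only if the epoch is old enough and the
    /// book content (identified by mutation_seq) has not already been sent.
    fn should_publish_provisional(&mut self, now: Instant, mutation_seq: u64) -> bool {
        let Some(since) = self.dirty_since else { return false };
        if now.duration_since(since) < BACKSTOP_DIRTY_AGE {
            return false;
        }
        if self.last_published_seq == Some(mutation_seq) {
            return false;
        }
        self.last_published_seq = Some(mutation_seq);
        true
    }
}
```

The dedup gate is what keeps a stalled feed from re-sending the same provisional quote on every tick. The age gate keeps a healthy stream, which finalizes well inside the threshold, from ever taking the backstop path.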

Tests

  • Full lib suite green; block-mode goldens byte-identical throughout (block mode never regressed).
  • New coverage: ilog10 boundary, L2-5 stream-finalization suite, publisher decision matrices (snapshot_should_publish, next_pending_provisional, caught_up_after_publish), backstop dirty-epoch/dedup, recovery race + finalized-height guards, --enable-websocket byte-for-byte parity, no-output/DoB-only rejection (CLI + runner), dual-validator block-vs-stream parity.

Live canary validation (tyo-hl-node)

Cross-built (Docker linux/amd64 bookworm; ABI-matched: GLIBC_2.34, libssl.so.3) and deployed to the live validator host as the canary. Steady state: order-book ~1 core vs validator ~3–4 cores on a 16-core box, e2e ~0.3s, both services NRestarts=0. The original CPU-starvation pathology is not in evidence. One operational tweak (systemd unit, not code) was needed: raising --dob-channel-bound from 4096 to 65536 to absorb HL block-boundary OrderAdd bursts and eliminate the "dob_tap: channel full" drops (0 drops over sustained windows afterwards, including catch-up).
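Why a larger --dob-channel-bound removes the drops can be seen with a bounded channel and a non-blocking send that drops on full. This is a sketch under assumptions: it uses std::sync::mpsc::sync_channel as a stand-in for whatever channel dob_tap actually uses, assumes drop-on-full semantics (suggested by the "channel full" log), and models the worst case where the consumer does not drain at all during a burst. drops_for_burst is a hypothetical helper, and the burst size of 10,000 used below is illustrative, not a measured value.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Count how many events of a burst are dropped when a bounded channel of
/// capacity `bound` is filled without the consumer draining it.
fn drops_for_burst(bound: usize, burst: usize) -> usize {
    let (tx, rx) = sync_channel::<u64>(bound);
    let mut dropped = 0;
    for event in 0..burst as u64 {
        match tx.try_send(event) {
            Ok(()) => {}
            // Buffer full: the event is lost (what a "channel full" drop means).
            Err(TrySendError::Full(_)) => dropped += 1,
            Err(TrySendError::Disconnected(_)) => unreachable!("receiver is still alive"),
        }
    }
    drop(rx);
    dropped
}
```

In practice the consumer drains concurrently, so the bound only needs to cover the peak backlog of a block-boundary burst, not the whole burst.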

Caveats

  • No true A/B: the prior binary never ran healthy on the canary box during the session (it was crash-looping on an unrelated validator outage), so this is an absolute steady-state characterization, not a measured % reduction.
  • Deployed unmerged as a canary per operator direction; recommend a soak before merge / wider rollout.

Test plan

  • cargo test --workspace green; goldens unchanged
  • cargo clippy --workspace --all-targets (warn-level pedantic baseline only)
  • Soak on tyo-hl-node: journalctl -u orderbook-server clean, validator NRestarts=0, e2e sub-second, no dob_tap drops
  • Operator perf re-measure / A/B vs old build on a stable box
  • Confirm any other deployments add --enable-websocket if they need the WS server

🤖 Generated with Claude Code

packethog added 30 commits May 15, 2026 13:19
gate the TcpListener::bind + axum::serve block on `enable_websocket`;
when false (the default), the ws port is never bound and the task
parks via `std::future::pending` so the hl_listen and multicast
publisher tasks keep running. adds an integration test that verifies
the port remains rebindable in the disabled case.
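The gating described in this commit can be sketched with the standard library's blocking TcpListener in place of the server's async listener and axum::serve, which are not reproduced here. maybe_bind_ws is a hypothetical name. The point is that with the flag off the function returns before any bind, so the port stays free for other processes. The async-task detail (parking the disabled task on std::future::pending so the hl_listen and multicast publisher tasks keep running) is not modeled.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};

/// Bind the WebSocket port only when `--enable-websocket` was passed.
/// `Ok(None)` means the port was never bound (the new default).
fn maybe_bind_ws(enable_websocket: bool, addr: SocketAddr) -> io::Result<Option<TcpListener>> {
    if !enable_websocket {
        return Ok(None);
    }
    TcpListener::bind(addr).map(Some)
}
```

This mirrors the integration test's idea: in the disabled case, something else can still bind the same address.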
L2-5: in streaming + WS-disabled, TOB L2 snapshots now emit once per
finalized block (when the book changed) instead of once per file-read
chunk. Eliminates the per-chunk snapshot CPU multiplier.

Streaming TOB goldens regenerated: the per-chunk path emitted ~27
duplicate quotes per block-time; the new path emits one authoritative
quote per finalized dirty block. Every regenerated quote is
byte-identical to the block-mode golden at the same source timestamp
(verified). Block-mode goldens unchanged. finalize_streaming_for_test
updated to mirror finalize_stream_block's authoritative dirty-emission
so the regenerated goldens reflect production behavior including the
final block.
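The cadence change in this commit (from one snapshot per file-read chunk to at most one per finalized dirty block, emitted in height order) can be sketched as follows. StreamFinalizer, on_chunk and finalize_up_to are hypothetical names that loosely echo finalize_stream_block. A plain "changed" flag stands in for the real BBO-change test, the emitted list stands in for the publisher, and the backstop and recovery paths are omitted.

```rust
use std::collections::BTreeMap;

/// Many file-read chunks may touch one block, but at most one
/// authoritative snapshot is emitted per finalized block.
#[derive(Default)]
struct StreamFinalizer {
    /// block height -> whether any chunk of that block changed the book
    pending: BTreeMap<u64, bool>,
    /// heights that received an authoritative snapshot (publisher stand-in)
    emitted: Vec<u64>,
}

impl StreamFinalizer {
    /// Called per file-read chunk: records dirtiness, never publishes.
    fn on_chunk(&mut self, height: u64, changed_book: bool) {
        *self.pending.entry(height).or_insert(false) |= changed_book;
    }

    /// Called once every block up to `finalized_height` is complete.
    /// BTreeMap iteration is ascending, so emission follows height order.
    fn finalize_up_to(&mut self, finalized_height: u64) {
        let still_open = match finalized_height.checked_add(1) {
            Some(next) => self.pending.split_off(&next),
            None => BTreeMap::new(),
        };
        let done = std::mem::replace(&mut self.pending, still_open);
        for (height, dirty) in done {
            if dirty {
                self.emitted.push(height);
            }
        }
    }
}
```

Clean blocks produce no snapshot, and repeated chunks for the same height collapse into one decision. These are the two properties the per-chunk path lacked.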
8 tests covering the l2-5 behavioral contracts: one authoritative snapshot
per finalized bbo-changing block; no snapshot for clean blocks; finalization
in height order; backstop only after 250ms dirty interval; backstop provisional
does not suppress later authoritative finalization; tolerated no-op update/remove
stays clean; recovery emits authoritative snapshot immediately.
…t so --enable-websocket preserves pre-L2-5 streaming cadence
packethog added 30 commits May 18, 2026 10:20
…); drop one-shot state; don't cache provisional for resend
…ation emission, supersede model, recovery guards