Streaming-mode CPU reduction (multicast-only default, L2-5) #10
Open
packethog wants to merge 64 commits into
…nalization-driven snapshots
…ld (250ms = catchup_threshold_ms/2)
gate the TcpListener::bind + axum::serve block on `enable_websocket`; when false (the default), the ws port is never bound and the task parks via `std::future::pending` so the hl_listen and multicast publisher tasks keep running. adds an integration test that verifies the port remains rebindable in the disabled case.
L2-5: in streaming + WS-disabled, TOB L2 snapshots now emit once per finalized block (when the book changed) instead of once per file-read chunk. Eliminates the per-chunk snapshot CPU multiplier. Streaming TOB goldens regenerated: the per-chunk path emitted ~27 duplicate quotes per block-time; the new path emits one authoritative quote per finalized dirty block. Every regenerated quote is byte-identical to the block-mode golden at the same source timestamp (verified). Block-mode goldens unchanged. finalize_streaming_for_test updated to mirror finalize_stream_block's authoritative dirty-emission so the regenerated goldens reflect production behavior including the final block.
8 tests covering the l2-5 behavioral contracts: one authoritative snapshot per finalized bbo-changing block; no snapshot for clean blocks; finalization in height order; backstop only after 250ms dirty interval; backstop provisional does not suppress later authoritative finalization; tolerated no-op update/remove stays clean; recovery emits authoritative snapshot immediately.
…th identical bbo (l2-5)
…led-stream provisional is always corrected
…y from stream dirty epoch
…t so --enable-websocket preserves pre-L2-5 streaming cadence
…leak vs stranded-provisional dilemma)
…stalls don't strand subscribers
…d-stale supersede
…n recovery mutations
…); drop one-shot state; don't cache provisional for resend
…thoritative resend mid-stall)
…ble freshness window
…icast; flag-based rollback)
… authoritative while obligation pending
…stop on dirty-epoch age not latest mutation
…he prior block (S1)
…an't poison the resend/retry cache
…igation (no stuck pending)
…ublish + standard caught-up resync
…oadcast as current
…s pending provisional
…itative supersedes; revert lag-clear)
…ation emission, supersede model, recovery guards
Summary
Reduces order-book-server CPU in streaming ingest mode so it no longer starves the co-located Hyperliquid validator (the original "excessive streaming CPU → validator impact → negative feedback loop" problem). Four independent perf levers plus a default behavior change that makes the publisher multicast-only.
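One of the levers replaces a floating-point digit count in `Px::num_digits()` with integer `u64::ilog10()`. The sketch below shows one way an off-by-one of the kind described can arise, and how the integer form avoids it. The function names `num_digits_f64` and `num_digits_int` are hypothetical and are not taken from the PR. The assumption is that the old code converted the `u64` to `f64` before taking `log10().floor()`.

```rust
/// Hypothetical float-based digit count, in the spirit of `log10().floor()`.
/// The `u64 -> f64` cast rounds to the nearest representable double, so
/// values just below a large power of ten can round up to that power.
fn num_digits_f64(v: u64) -> u32 {
    if v == 0 {
        return 1;
    }
    (v as f64).log10().floor() as u32 + 1
}

/// Hypothetical integer digit count. `checked_ilog10` returns `None` for 0,
/// which is treated here as having one digit.
fn num_digits_int(v: u64) -> u32 {
    v.checked_ilog10().map_or(1, |d| d + 1)
}

fn main() {
    let below = 999_999_999_999_999_999u64; // 10^18 - 1: 18 digits
    let power = 1_000_000_000_000_000_000u64; // 10^18: 19 digits

    // Both values convert to the same f64, so a float-only digit count
    // cannot distinguish them even though their digit counts differ.
    assert_eq!(below as f64, power as f64);

    println!("float: {} vs {}", num_digits_f64(below), num_digits_f64(power));
    println!("int:   {} vs {}", num_digits_int(below), num_digits_int(power));
}
```

The integer version never leaves `u64`, so it involves no rounding and no floating-point library call.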
- `Px::num_digits()`: `f64` `log10().floor()` → integer `u64::ilog10()` (also fixes a latent off-by-one for `u64` values just below large powers of ten).
- Drop the `l2Book` aggregation variants (the `n_sig_figs`/`mantissa` bucket combos) and compute only the single unbucketed L2 snapshot the TOB quote publisher needs (1 instead of 7 per coin). This does NOT touch the DZ-DoB feed (`--dob-group`): depth-of-book still streams full per-order L4 deltas (every OrderAdd/Cancel/Execute) and full per-instrument resting-order snapshots from a separate path (`dob_tap`/`clone_coin_orders`), unchanged. The dropped variants have zero consumers when there are no WebSocket `l2Book` subscribers.
Behavior change (action required for existing deployments)
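The flag rules in this section can be read as a small validation step that runs both at the CLI and at the library entry point. The following is a minimal sketch under stated assumptions, not the PR's code: the `OutputConfig` struct, the `ConfigError` enum, and `validate` are hypothetical names, and the "no output" rule is one reading of the no-output/DoB-only rejection listed under Tests.

```rust
/// Hypothetical subset of the server's output options; field names are
/// illustrative and not taken from the PR.
#[derive(Debug, Default, Clone)]
struct OutputConfig {
    enable_websocket: bool,          // `--enable-websocket`, off by default
    multicast_group: Option<String>, // `--multicast-group`
    dob_group: Option<String>,       // `--dob-group`
}

#[derive(Debug, PartialEq)]
enum ConfigError {
    /// `--dob-group` was given without `--multicast-group`.
    DobRequiresMulticast,
    /// Neither the WebSocket server nor multicast is enabled.
    NoOutput,
}

/// Shared check that a CLI parser and a library runner could both call,
/// so the rule holds no matter which entry point is used.
fn validate(cfg: &OutputConfig) -> Result<(), ConfigError> {
    if cfg.dob_group.is_some() && cfg.multicast_group.is_none() {
        return Err(ConfigError::DobRequiresMulticast);
    }
    if !cfg.enable_websocket && cfg.multicast_group.is_none() {
        return Err(ConfigError::NoOutput);
    }
    Ok(())
}

fn main() {
    // Placeholder group address, for illustration only.
    let multicast_only = OutputConfig {
        multicast_group: Some("239.0.0.1:5000".to_string()),
        ..Default::default()
    };
    println!("{:?}", validate(&multicast_only));
    println!("{:?}", validate(&OutputConfig::default()));
}
```

Keeping the check in one function and calling it from both boundaries is one way to avoid the two sets of rules drifting apart.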
- `--enable-websocket` is now off by default. The publisher is multicast-only unless the flag is passed. Deployments that relied on the default WS server must add `--enable-websocket`. Passing it restores byte-for-byte pre-L2-5 streaming cadence (the durable rollback contract, enforced by discriminating tests + goldens).
- `--dob-group` requires `--multicast-group` (the instrument registry is only bootstrapped in multicast mode). Enforced at both the CLI and the `run_websocket_server` library boundary.
Scope clarification: TOB-L2 vs DZ-DoB
Two independent multicast feeds:
- TOB-L2 (`--multicast-group`): top-of-book quote feed, derived from `compute_l2_snapshots` then `truncate(1)`. L2-1/L2-3/L2-4/L2-5 all act here.
- DZ-DoB (`--dob-group`): full binary depth-of-book, i.e. a per-order L4 delta stream plus full per-instrument snapshots, built from `dob_tap` events and `clone_coin_orders` (the entire resting-order set, every level). No lever in this PR reduces DoB depth or level coverage.
Hardening
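One of the items hardened in this PR is the stuck-stream backstop, which is gated on dirty-epoch age and deduplicated by `mutation_seq`. The following is a rough sketch of how such a gate could look, not the PR's implementation: the `Backstop` struct and its method names are hypothetical, time is modeled as a plain `Duration` since start for testability, and the 250ms threshold is taken from the commit messages.

```rust
use std::time::Duration;

/// Dirty interval after which a provisional snapshot may be emitted.
const BACKSTOP_AGE: Duration = Duration::from_millis(250);

/// Hypothetical backstop state; names are illustrative.
#[derive(Debug, Default)]
struct Backstop {
    /// When the current dirty epoch began (first mutation since finalization).
    dirty_since: Option<Duration>,
    /// Bumped on every effective book mutation.
    mutation_seq: u64,
    /// `mutation_seq` covered by the last provisional publish.
    last_provisional_seq: u64,
}

impl Backstop {
    fn on_mutation(&mut self, now: Duration) {
        self.mutation_seq += 1;
        // Later mutations must not reset the start of the dirty epoch.
        if self.dirty_since.is_none() {
            self.dirty_since = Some(now);
        }
    }

    /// Fire only when the dirty epoch is old enough AND the content has
    /// changed since the last provisional publish.
    fn should_fire(&self, now: Duration) -> bool {
        match self.dirty_since {
            Some(start) => {
                now.saturating_sub(start) >= BACKSTOP_AGE
                    && self.mutation_seq != self.last_provisional_seq
            }
            None => false,
        }
    }

    /// A provisional publish records what was sent but leaves the epoch
    /// dirty, so a later authoritative finalization is still required.
    fn on_provisional_publish(&mut self) {
        self.last_provisional_seq = self.mutation_seq;
    }

    /// Authoritative finalization ends the dirty epoch.
    fn on_finalized(&mut self) {
        self.dirty_since = None;
    }
}

fn main() {
    let ms = Duration::from_millis;
    let mut b = Backstop::default();
    b.on_mutation(ms(0));
    b.on_mutation(ms(200)); // does not restart the 250ms clock
    println!("fire at 250ms: {}", b.should_fire(ms(250)));
}
```

Measuring age from the start of the dirty epoch rather than from the latest mutation means a steady trickle of updates cannot postpone the backstop indefinitely, and the sequence comparison keeps it from re-sending identical content during a stall.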
~24 adversarial-review iterations. All substantive defects resolved: the catch-up stale-quote leak, a publisher-authority supersede model, three classes of off-lock recovery TOCTOU race (height → intra-block `mutation_seq` → stream-finalized-height gating), per-diff L4 fan-out CPU waste in multicast-only mode, the stuck-stream backstop (dirty-epoch-age gate + `mutation_seq` content-dedup), `caught_up`/periodic-resend hygiene incl. broadcast-Lagged, and Correction resync-eligibility. Net simplification: an over-engineered cached-supersede retry was removed in favor of the standard caught-up resync + race-free in-loop force-publish. One documented irreducible best-effort tradeoff remains in the stuck-stream backstop (inherent to fire-and-forget UDP multicast during a stalled feed; recovered by periodic resync); it is not a ship-blocker.
Tests
…(`snapshot_should_publish`, `next_pending_provisional`, `caught_up_after_publish`), backstop dirty-epoch/dedup, recovery race + finalized-height guards, `--enable-websocket` byte-for-byte parity, no-output/DoB-only rejection (CLI + runner), dual-validator block-vs-stream parity.
Live canary validation (tyo-hl-node)
Cross-built (Docker `linux/amd64` bookworm; ABI-matched: `GLIBC_2.34`, `libssl.so.3`) and deployed to the live validator host as the canary. Steady state: order-book ~1 core vs validator ~3–4 cores on a 16-core box, e2e ~0.3s, both services `NRestarts=0`; the original CPU-starvation pathology is not in evidence. One operational tweak (systemd unit, not code) was needed: `--dob-channel-bound` 4096 → 65536 to absorb HL block-boundary `OrderAdd` bursts and eliminate `dob_tap: channel full` drops (0 drops over sustained windows after, incl. catch-up).
Caveats
Test plan
- `cargo test --workspace` green; goldens unchanged
- `cargo clippy --workspace --all-targets` (warn-level pedantic baseline only)
- `journalctl -u orderbook-server` clean, validator `NRestarts=0`, e2e sub-second, no `dob_tap` drops
- Existing deployments add `--enable-websocket` if they need the WS server
🤖 Generated with Claude Code