feat(server): wire cluster query coordinator + distributed cursor — query Phase 3 (SPEC-304)#46
Merged
Merged
Conversation
In non-GROUP-BY multi-node plans the cursor vertex was being appended after network-receiver (coordinator-side), meaning each node sent all matching records and the coordinator filtered them — wrong for keyset pagination. Move the cursor vertex into the multi-node branch so it precedes network-sender; Step 3b (single-node path) is guarded with `!multi_node` to avoid double-insertion. Add converter unit test asserting cursor→network-sender edge exists and network-receiver→cursor edge does not exist in multi-node plans. Also route handle_query_subscribe through coordinator.execute_distributed when self.coordinator is Some, falling back to run_dag_local otherwise. The linear_engine_for_tests opt-out retains highest priority. The coordinator field was previously declared but never read on the query path.
Construct a ClusterQueryCoordinator and call QueryService::with_coordinator only when --seed-nodes is provided (cluster_mode = !seed_list.is_empty()). Single-node startup leaves coordinator: None so existing behaviour is unchanged. Key wiring links: - completion_registry Arc<DashMap<String, oneshot::Sender<DagCompletePayload>>> is shared between the coordinator and the cluster dispatch loop so DagComplete frames from peer nodes resolve the coordinator's awaiting receivers. - Inbound routing task now routes DagExecute/DagComplete/DagData frames to the dispatch channel instead of discarding them (bin:343-356 previously dropped all non-heartbeat frames, which would have timed out every distributed query). - ClusterStateAdapter bridges Arc<ClusterState> to Arc<dyn ClusterService> for the coordinator (same pattern as SimClusterService in simulation tests). - build_services gains a cluster_params: Option<ClusterParams> parameter; when Some, constructs the coordinator with the real connection_registry and record_store_factory available inside the function.
Adds a 2-node partition fault-injection simulation test proving that the
global keyset cursor is correctly applied worker-side before the network
boundary.
Test setup: node-0 owns scores {10,30,50}, node-1 owns {20,40,60}.
Page 1 (no cursor, limit=3) returns [10,20,30]. A CursorData is encoded
from the last row (score=30, key="delta"). A partition is injected then
healed. Page 2 (with cursor, limit=3) returns [40,50,60] — strictly after
the cursor keyset position, in global sort order, with no overlap or
duplicates from page 1.
Coverage note (per spec AC4): this sim drives coordinator.execute_distributed
directly and does NOT exercise handle_query_subscribe routing or the prod
bin dispatch-loop wiring. The dispatch-loop to completion_registry link
is verified by source inspection per the Validation Checklist.
- Replaces is_some() + expect() with an if-let binding so the cursor string is captured without a panic path (clippy unnecessary_unwrap) - No behavior change: branch remains gated on !multi_node
Eliminates the duplicated predicate/sort-hash + cursor-config map block shared by the multi-node worker-side and single-node cursor paths.
Deploying topgun with
|
| Latest commit: |
4d31e09
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://00196dd2.topgun-f45.pages.dev |
| Branch Preview URL: | https://sf-442-wire-coordinator.topgun-f45.pages.dev |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
SPEC-304 (from TODO-442) — Query Engine Consolidation Phase 3: wire the cluster query
coordinator into production and complete the distributed merge path. Builds on SPEC-301, which
fixed distributed merge correctness (global sort + global limit) but deliberately left the
coordinator unwired.
Changes
bin/topgun_server.rs— construct and wireClusterQueryCoordinatorinto the prod assemblybehind a cluster-mode gate; share the completion_registry so DagComplete frames resolve the
coordinator's awaiting receivers; route cluster DAG frames to the coordinator dispatch loop.
dag/converter.rs— place the cursor vertex worker-side in multi-node DAG plans (soper-node keyset pagination positions are produced before the coordinator merge);
build_cursor_vertex_confighelper + if-let cleanup for the single-node branch.service/domain/query.rs— coordinator wiring surface.sim/cluster.rs—distributed_keyset_cursor_under_partitionfault-injection sim testexercising distributed cursor pagination under a network partition.
Verification
Reviewed via the SpecFlow impl-review gate. 4 Rust files (within the 5-file Language Profile).
CI gate is Rust (cargo test + sim + clippy
--all-targets --all-features -D warnings+ fmt);must be green before merge.