Phase 5: Distributed Optimization

Overview

Phase 5 introduced four optimizations that reduce network latency and enable complex multi-shard query patterns: shard-targeted query routing, distributed aggregation, broadcast joins, and inter-node data movement.

Optimized query routing based on partitioning keys.

Predicate Analysis: Detects filters on sharding keys (e.g., WHERE id = 100).
Targeted Dispatch: Routes query fragments only to the node that owns the matching shard, avoiding a cluster-wide broadcast.
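
The routing decision can be illustrated with a minimal sketch. It assumes hash-modulo placement on an integer column, which this summary does not confirm. The names `NUM_SHARDS`, `SHARD_KEY`, `extract_shard_key_value`, and `target_shards` are hypothetical, and a real planner would inspect a parsed predicate tree rather than apply a regular expression to SQL text.

```python
import re
from typing import List, Optional

NUM_SHARDS = 4     # assumed cluster size, for illustration only
SHARD_KEY = "id"   # assumed sharding column


def extract_shard_key_value(sql: str, key: str = SHARD_KEY) -> Optional[int]:
    """Return the integer from `WHERE ... key = <int>`, or None if unsure."""
    where = re.search(r"\bWHERE\b(.*)", sql, flags=re.IGNORECASE | re.DOTALL)
    if where is None:
        return None
    clause = where.group(1)
    # OR or NOT may select rows on other shards, so fall back to a broadcast.
    if re.search(r"\b(OR|NOT)\b", clause, flags=re.IGNORECASE):
        return None
    pattern = rf"\b{re.escape(key)}\s*=\s*(-?\d+)(?![\w.])"
    match = re.search(pattern, clause, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


def target_shards(sql: str) -> List[int]:
    """Return the shards that must receive this query fragment."""
    value = extract_shard_key_value(sql)
    if value is None:
        # No usable filter on the sharding key: every shard is a candidate.
        return list(range(NUM_SHARDS))
    # Assumed placement rule: shard = key value modulo shard count.
    return [value % NUM_SHARDS]
```

Under these assumptions, a query filtered on `id = 100` is dispatched to a single shard, while a query with no sharding-key filter still goes to all four.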

Implemented coordination for distributed analytics.

Partial Aggregation: Data nodes compute local counts and sums.
Global Merge: The coordinator identifies aggregate functions in the SELECT list and merges partial results from all shards into a final result set.
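
The two-step pattern can be sketched as follows for COUNT and SUM. This is an in-process illustration, not the project's implementation: `Partial`, `partial_aggregate`, and `global_merge` are hypothetical names, and each inner list stands in for the rows held by one data node.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Partial:
    """Per-shard partial state for COUNT and SUM."""
    count: int
    total: float


def partial_aggregate(values: Iterable[float]) -> Partial:
    """Runs on a data node: one pass over the local shard."""
    count = 0
    total = 0
    for value in values:
        count += 1
        total += value
    return Partial(count, total)


def global_merge(partials: List[Partial]) -> Partial:
    """Runs on the coordinator: counts add up, and so do sums."""
    return Partial(
        count=sum(p.count for p in partials),
        total=sum(p.total for p in partials),
    )


# Three shards, one of them empty.
shards = [[10, 20], [30], []]
merged = global_merge([partial_aggregate(rows) for rows in shards])
```

Only one small `Partial` per shard crosses the network, regardless of how many rows each shard scans.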

Developed a prototype for cross-shard JOINs.

Table Fetching: The coordinator retrieves the full contents of the smaller table from all shards.
Broadcasting: Pushes the gathered data to the ShuffleBuffer of every node in the cluster.
Local Execution: Rewrites the query so each node joins its local shard with the broadcasted buffer data.
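
The three steps above can be simulated in a single process. In this sketch the `Node` class, its fields, and the `orders`/`customers` tables are hypothetical, a plain list stands in for the ShuffleBuffer, and the local join is a simple hash join chosen for illustration.

```python
from typing import Dict, List, Tuple


class Node:
    """Simulated data node holding one shard of each table."""

    def __init__(self, orders: List[Tuple[int, int]],
                 customers: List[Tuple[int, str]]) -> None:
        self.orders = orders            # local shard of the large table
        self.customers = customers      # local shard of the small table
        self.shuffle_buffer: List[Tuple[int, str]] = []  # ShuffleBuffer stand-in

    def local_join(self) -> List[Tuple[int, int, str]]:
        """Join the local orders shard with the broadcast customer rows."""
        index: Dict[int, List[str]] = {}
        for cust_id, name in self.shuffle_buffer:
            index.setdefault(cust_id, []).append(name)
        return [
            (order_id, cust_id, name)
            for order_id, cust_id in self.orders
            for name in index.get(cust_id, [])
        ]


def broadcast_join(nodes: List[Node]) -> List[Tuple[int, int, str]]:
    # Table Fetching: gather the small table from every shard.
    gathered = [row for node in nodes for row in node.customers]
    # Broadcasting: push a full copy into each node's buffer.
    for node in nodes:
        node.shuffle_buffer = list(gathered)
    # Local Execution: each node joins its own shard, results are concatenated.
    results: List[Tuple[int, int, str]] = []
    for node in nodes:
        results.extend(node.local_join())
    return results
```

The `gathered` list is the point where the whole smaller table sits in coordinator memory at once.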

Enabled inter-node data movement.

BufferScanOperator: A physical operator that reads from in-memory shuffle buffers instead of heap files.
ClusterManager Buffering: Thread-safe staging area for data received via PushData RPCs.
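
A minimal sketch of how these two pieces might fit together is shown below. The class names come from this summary, but the method names (`push_data`, `snapshot`), the string buffer identifier, and the use of a single mutex are assumptions made for illustration.

```python
import threading
from collections import defaultdict
from typing import DefaultDict, Iterator, List, Tuple

Row = Tuple


class ClusterManager:
    """Staging area for rows received from other nodes (sketch)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._buffers: DefaultDict[str, List[Row]] = defaultdict(list)

    def push_data(self, buffer_id: str, rows: List[Row]) -> None:
        """Hypothetical handler for a PushData RPC; safe to call concurrently."""
        with self._lock:
            self._buffers[buffer_id].extend(rows)

    def snapshot(self, buffer_id: str) -> List[Row]:
        """Return a copy so a scan never observes a buffer mid-append."""
        with self._lock:
            return list(self._buffers.get(buffer_id, []))


class BufferScanOperator:
    """Physical operator that yields rows from a shuffle buffer."""

    def __init__(self, manager: ClusterManager, buffer_id: str) -> None:
        self._manager = manager
        self._buffer_id = buffer_id

    def __iter__(self) -> Iterator[Row]:
        return iter(self._manager.snapshot(self._buffer_id))
```

Copying the buffer under the lock keeps the scan simple at the cost of extra memory; an implementation concerned with large buffers might hand over ownership instead.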

Broadcast joins are highly effective when a small table is joined with a large one, but because the coordinator gathers the entire smaller table before broadcasting it, that table must fit within coordinator memory limits.
Merging aggregates at the coordinator is a bottleneck for very large clusters; future work could explore tree-based merging.
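
As a rough illustration of the tree-based idea, the sketch below merges (count, sum) partials level by level instead of in one step at the coordinator. It is not part of the implemented system: `merge`, `tree_merge`, and the `fan_in` parameter are hypothetical, and each group merge, run here in a single process, would execute on a separate intermediate node in a real cluster.

```python
from typing import List, Tuple

Partial = Tuple[int, float]  # (count, sum) from one shard or subtree


def merge(group: List[Partial]) -> Partial:
    """Combine a small group of partials into one."""
    return (sum(c for c, _ in group), sum(t for _, t in group))


def tree_merge(partials: List[Partial], fan_in: int = 2) -> Tuple[Partial, int]:
    """Merge partials in rounds; return the result and the number of rounds."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not partials:
        return (0, 0), 0
    level = list(partials)
    rounds = 0
    while len(level) > 1:
        # Each slice is merged independently, so one round can run in parallel.
        level = [merge(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
        rounds += 1
    return level[0], rounds
```

With a fixed fan-in, no single node handles more than `fan_in` inputs per round, and the number of rounds grows logarithmically with the number of shards.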

All scenarios, including distributed transactions (2PC) and join orchestration, have been verified with automated integration tests.