Skip to content

Commit a73926b

Browse files
committed
docs: update bloom filter architecture docs with 3-phase build process
1 parent 1a2874a commit a73926b

2 files changed

Lines changed: 25 additions & 21 deletions

File tree

docs/performance/SQLITE_COMPARISON.md

Lines changed: 22 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -45,34 +45,37 @@ We addressed the gaps via the following optimizations:
4545
Distributed shuffle joins send **all tuples** across the network to partitioned nodes, even when many will never match. This causes unnecessary network traffic and buffer memory usage.
4646

4747
### Solution: Bloom Filter Integration
48-
Implemented bloom filters to filter tuples at the source before network transmission:
49-
- **One-sided bloom filter**: Built from the left/build table, applied to filter the right/probe table
50-
- **Distributed construction**: Each data node constructs its local bloom during the left/build scan phase
51-
- **Coordinator coordination**: `BloomFilterPush` RPC broadcasts filter metadata to all nodes before the right/probe shuffle
48+
Implemented bloom filters to filter tuples at the source before network transmission using a 3-phase approach:
49+
- **Phase 1 (Local Build)**: Each data node scans its local left/build table partition, extracts join key values, and builds a local bloom filter
50+
- **Phase 2 (Bit Aggregation)**: Coordinator sends `BloomFilterBits` RPC to each data node; each responds with local bloom bits; coordinator OR-aggregates all bits into a single filter
51+
- **Phase 3 (Sender-Side Filter)**: Coordinator broadcasts aggregated filter via `BloomFilterPush` RPC; before sending right/probe tuples, `ShuffleFragment` handler checks `might_contain()` and skips tuples that will definitely not match
5252

5353
### Architecture
5454
```
55-
[Phase 1: Shuffle Left] [Phase 2: Shuffle Right]
56-
| |
57-
v v
58-
Build local bloom Apply bloom filter
59-
from join keys before buffering
60-
| |
61-
+---- BloomFilterPush ----->---+
62-
(filter metadata) |
63-
v
64-
Filtered tuples buffered
55+
Phase 1: Scan Left Phase 2: Aggregate Bits Phase 3: Filter Right
56+
| | |
57+
v v v
58+
Build local bloom <---> BloomFilterBits RPC <-------- Aggregate & Broadcast
59+
on each data node (OR-aggregate bits) via BloomFilterPush
60+
| | |
61+
| v v
62+
+-----------------> BloomFilterPush might_contain() check
63+
(metadata only) | before PushData
64+
|
65+
v
66+
Filtered tuples buffered
6567
```
6668

6769
### Key Components
6870
| Component | Location | Purpose |
6971
|-----------|----------|---------|
7072
| `BloomFilter` class | `include/common/bloom_filter.hpp` | MurmurHash3-based bloom filter |
71-
| `BloomFilterArgs` RPC | `include/network/rpc_message.hpp` | Serialization for network transfer |
72-
| `ClusterManager` storage | `include/common/cluster_manager.hpp` | Stores bloom filter per context |
73-
| `PushData` handler | `src/main.cpp` | Receives and buffers filtered tuples |
74-
| `ShuffleFragment` handler | `src/main.cpp` | Applies bloom filter before sending |
75-
| Coordinator | `src/distributed/distributed_executor.cpp` | Broadcasts filter after Phase 1 |
73+
| `BloomFilterBitsArgs` RPC | `include/network/rpc_message.hpp` | Local bloom bits from data nodes |
74+
| `BloomFilterArgs` RPC | `include/network/rpc_message.hpp` | Aggregated filter broadcast |
75+
| `ClusterManager` storage | `include/common/cluster_manager.hpp` | Stores local and aggregated bloom filters |
76+
| `BloomFilterBits` handler | `src/main.cpp` | Returns local bloom bits to coordinator |
77+
| `ShuffleFragment` handler | `src/main.cpp` | Builds local bloom during Phase 1 scan |
78+
| Coordinator | `src/distributed/distributed_executor.cpp` | Collects bits, aggregates, broadcasts filter |
7679

7780
### Test Coverage
7881
- 10 unit tests covering: BloomFilter class, BloomFilterArgs serialization, ClusterManager storage, filter application logic

docs/phases/PHASE_6_DISTRIBUTED_JOIN.md

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -28,8 +28,9 @@ Seamlessly integrated shuffle buffers into the Volcano execution model.
2828
### 5. Bloom Filter Optimization (`common/bloom_filter.hpp`)
2929
Added probabilistic filtering to reduce network traffic in shuffle joins.
3030
- **MurmurHash3-based BloomFilter**: Configurable false positive rate (default 1%) with optimal bit count and hash function calculation.
31-
- **Filter Construction**: Built during Phase 1 scan, stored in `ClusterManager` per context.
32-
- **Filter Application**: `PushData` handler checks `might_contain()` before buffering, skipping tuples that will definitely not match.
31+
- **Distributed Construction**: Each data node builds a local bloom filter from its left/build table partition during Phase 1 scan.
32+
- **Bit Aggregation**: Coordinator collects local bloom bits from all data nodes via `BloomFilterBits` RPC and OR-aggregates them into a single filter.
33+
- **Sender-Side Filtering**: Aggregated filter is broadcast via `BloomFilterPush` before Phase 2; `ShuffleFragment` handler applies `might_contain()` before sending `PushData`, skipping tuples that will definitely not match.
3334

3435
## Lessons Learned
3536
- Shuffle joins significantly reduce network traffic compared to broadcast joins for large-to-large table joins.

0 commit comments

Comments
 (0)