Commit 0991945

fix: skip bloom filtering when filter_data is empty
The bloom filter POC sends empty filter_data during BloomFilterPush, intending for data nodes to build filters locally. But get_bloom_filter() was returning BloomFilter(1), a filter with 64 bits all set to 0. Since might_contain() returns false when any bit is 0, this caused ALL rows to be filtered out during Phase 2 shuffle.

Fix has_bloom_filter() to return false when filter_data is empty, effectively disabling bloom filtering for the POC. This allows E2E JOIN tests to pass.
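To illustrate the failure mode described above, here is a minimal sketch of a bloom filter with a single 64-bit word. `ToyBloomFilter`, its hashing scheme, and the probe count are assumptions made for illustration; they are not the project's actual `BloomFilter` class. The point it shows is that a filter with no bits set rejects every key, so using it to filter rows drops all of them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the project's BloomFilter (illustrative only).
class ToyBloomFilter {
public:
    // `words` is the number of 64-bit words; must be greater than zero.
    explicit ToyBloomFilter(std::size_t words) : bits_(words, 0) {}

    // Set every bit that the key hashes to.
    void insert(const std::string& key) {
        for (std::size_t i = 0; i < kHashes; ++i) {
            const std::size_t bit = index(key, i);
            bits_[bit / 64] |= (std::uint64_t{1} << (bit % 64));
        }
    }

    // Returns false as soon as one probed bit is 0.
    [[nodiscard]] bool might_contain(const std::string& key) const {
        for (std::size_t i = 0; i < kHashes; ++i) {
            const std::size_t bit = index(key, i);
            if ((bits_[bit / 64] & (std::uint64_t{1} << (bit % 64))) == 0) {
                return false;
            }
        }
        return true;
    }

private:
    static constexpr std::size_t kHashes = 3;  // assumed number of probes

    // Derive the i-th probe position by salting the key before hashing.
    [[nodiscard]] std::size_t index(const std::string& key, std::size_t seed) const {
        const std::string salted = key + static_cast<char>('a' + seed);
        return std::hash<std::string>{}(salted) % (bits_.size() * 64);
    }

    std::vector<std::uint64_t> bits_;
};
```

With this sketch, `ToyBloomFilter(1)` answers `might_contain(key) == false` for any key until something is inserted, which mirrors why an unpopulated filter removed every row during the shuffle.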
1 parent e3ec112 commit 0991945

1 file changed

Lines changed: 7 additions & 1 deletion

include/common/cluster_manager.hpp

@@ -230,10 +230,16 @@ class ClusterManager {
 
     /**
      * @brief Check if a bloom filter exists for a context
+     * @note Returns false if filter_data is empty, so bloom filtering is skipped
      */
     [[nodiscard]] bool has_bloom_filter(const std::string& context_id) const {
         const std::scoped_lock<std::mutex> lock(mutex_);
-        return bloom_filters_.count(context_id) != 0U;
+        auto it = bloom_filters_.find(context_id);
+        if (it == bloom_filters_.end()) {
+            return false;
+        }
+        // Only consider bloom filter valid if it has actual filter data
+        return !it->second.filter_data.empty();
     }
 
     /**
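The new check can be exercised in isolation with a small sketch like the one below. `FilterRegistry`, `BloomFilterEntry`, the `set()` helper, and the byte-vector type of `filter_data` are assumptions for illustration; the commit only shows that the stored value has a `filter_data` member with `empty()`. The sketch covers the three cases the change distinguishes: no entry, an entry with empty data, and an entry with data.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical value type; the real type of filter_data is not shown in the diff.
struct BloomFilterEntry {
    std::vector<std::uint8_t> filter_data;
};

// Hypothetical reduced version of ClusterManager holding only the filter map.
class FilterRegistry {
public:
    // Illustrative helper for storing a filter; not part of the commit.
    void set(const std::string& context_id, std::vector<std::uint8_t> data) {
        const std::scoped_lock<std::mutex> lock(mutex_);
        bloom_filters_[context_id] = BloomFilterEntry{std::move(data)};
    }

    // Same logic as the patched method: present AND non-empty.
    [[nodiscard]] bool has_bloom_filter(const std::string& context_id) const {
        const std::scoped_lock<std::mutex> lock(mutex_);
        const auto it = bloom_filters_.find(context_id);
        if (it == bloom_filters_.end()) {
            return false;
        }
        return !it->second.filter_data.empty();
    }

private:
    mutable std::mutex mutex_;  // mutable so const methods can lock it
    std::unordered_map<std::string, BloomFilterEntry> bloom_filters_;
};
```

A caller that guards filtering on `has_bloom_filter()` would then skip the filter for a context whose push carried empty data, and apply it only once real filter bytes are stored.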
