Align grouped first_value/last_value FILTER handling with shared Some(true) semantics #22681
Open
kosiew wants to merge 3 commits into
…on functionality
- Updated `filter_to_validity` to public in `groups_accumulator/nulls.rs`.
- Modified `first_last.rs` to use shared `filter_to_validity` for grouped first/last.
- Enhanced handling to reject NULL filter rows even if the value bit is true.
- Added tests for:
  - First update path
  - Last update path
  - First merge path

- Destructured the `group_indices` loop by removing a redundant variable assignment.
- Added test helper `new_int64_first_last_group_acc(...)` for improved test setup.
- Added test helper `nullable_bool_filter(...)` to facilitate testing with nullable booleans.
- Replaced repeated test setup across tests with the newly created helpers for better maintainability.

- Changed `valid` to `validity` in `nullable_bool_filter`.
- Added `assert_group_acc_int64_result` function.
- Replaced repeated evaluate/downcast/assert blocks in three tests for better maintainability.
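The failure mode these commits address can be sketched without Arrow. The example below is a simplified, std-only Rust model, not DataFusion's code. `NullableBools` is a hypothetical stand-in for an Arrow `BooleanArray`, with separate value bits and a validity bitmap. `grouped_first_value` is a hypothetical, heavily simplified version of a grouped first-value update path. The data is made up for illustration. The sketch compares a check that reads only the value bit with one that lets only `Some(true)` through.

```rust
/// Hypothetical stand-in for an Arrow `BooleanArray`: `values` holds the
/// value bits and `validity` holds the null bitmap (`false` means NULL).
/// A NULL slot can still carry a `true` value bit.
struct NullableBools {
    values: Vec<bool>,
    validity: Vec<bool>,
}

impl NullableBools {
    /// Reads only the value bit, analogous to calling
    /// `BooleanArray::value(idx)` without checking validity.
    fn value_bit_passes(&self, idx: usize) -> bool {
        self.values[idx]
    }

    /// FILTER semantics: only `Some(true)` passes; `Some(false)` and
    /// `None` (NULL) both reject the row.
    fn some_true_passes(&self, idx: usize) -> bool {
        self.validity[idx] && self.values[idx]
    }
}

/// Hypothetical, simplified grouped `first_value` update: rows are visited
/// in order and the first row that passes `passes` fixes its group's value.
/// Groups with no passing rows stay `None` (SQL NULL).
fn grouped_first_value<F>(
    values: &[i64],
    group_indices: &[usize],
    filter: &NullableBools,
    total_num_groups: usize,
    passes: F,
) -> Vec<Option<i64>>
where
    F: Fn(&NullableBools, usize) -> bool,
{
    let mut firsts: Vec<Option<i64>> = vec![None; total_num_groups];
    for (row, &group) in group_indices.iter().enumerate() {
        if !passes(filter, row) {
            continue; // row rejected by FILTER
        }
        if firsts[group].is_none() {
            firsts[group] = Some(values[row]);
        }
    }
    firsts
}

fn main() {
    // Row 1 (group 1) has a NULL filter whose value bit is `true`;
    // row 2 (group 1) has filter `Some(false)`.
    let filter = NullableBools {
        values: vec![true, true, false],
        validity: vec![true, false, true],
    };
    let values = [5_i64, 9, 7];
    let group_indices = [0_usize, 1, 1];

    let value_bit_only =
        grouped_first_value(&values, &group_indices, &filter, 2, NullableBools::value_bit_passes);
    let some_true_only =
        grouped_first_value(&values, &group_indices, &filter, 2, NullableBools::some_true_passes);

    println!("value bit only: {:?}", value_bit_only); // [Some(5), Some(9)]
    println!("Some(true) only: {:?}", some_true_only); // [Some(5), None]
}
```

With the value-bit-only check, group 1 picks up the row whose filter is NULL. With the `Some(true)` check, group 1 has no passing rows and stays NULL (`None`). The real accumulator also deals with ordering requirements and merging partial state, which this sketch leaves out.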
This is the same bug as #22666; it looks like this PR covers that one too. It might be worth adding a sqllogictest on top, since the repro was SQL. Here's mine if you want it (without the fix, g=1 returns 10 instead of NULL):
Which issue does this PR close?
Rationale for this change
Grouped `first_value` and `last_value` used a local `BooleanArray::value(idx)` check when evaluating aggregate `FILTER` predicates. This reads only the value bit and does not account for NULL validity, which can incorrectly treat a NULL filter value with a `true` value bit as passing.

DataFusion's aggregate `FILTER` semantics require that only `Some(true)` passes; both `Some(false)` and `None` should reject the row. This change reuses the existing shared filter validity logic to ensure grouped `first_value` and `last_value` behave consistently with other aggregate implementations.

What changes are included in this PR?
- Made `filter_to_validity` publicly accessible from `datafusion-functions-aggregate-common` so it can be reused by grouped aggregate implementations.
- Updated `FirstLastGroupsAccumulator` to:
  - derive FILTER validity via the shared `filter_to_validity`.
  - no longer rely on the local `BooleanArray::value(idx)` check.
- Added targeted unit test helpers for grouped `first_value`/`last_value` accumulators.
- Added regression tests covering nullable FILTER predicates where the value bit is `true` but the slot is NULL (validity bit unset):
  - `test_first_group_acc_rejects_null_filter_with_true_value_bit`
  - `test_last_group_acc_rejects_null_filter_with_true_value_bit`
  - `test_first_group_acc_merge_rejects_null_filter_with_true_value_bit`

Are these changes tested?
Yes.
This PR adds the following unit tests:
- `test_first_group_acc_rejects_null_filter_with_true_value_bit`
- `test_last_group_acc_rejects_null_filter_with_true_value_bit`
- `test_first_group_acc_merge_rejects_null_filter_with_true_value_bit`

These tests verify that rows with a NULL FILTER value do not pass even when the underlying boolean value bit is `true`, and that the same semantics are preserved in the merge path.

Are there any user-facing changes?
No user-facing changes are intended.
This is a correctness fix that aligns grouped `first_value` and `last_value` aggregate FILTER evaluation with existing DataFusion aggregate semantics for nullable filter predicates.

LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.