fix groupconcat#24382
Conversation
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
aunjgr
left a comment
There was a problem hiding this comment.
Right direction (revert window-rewrite, use SORT+AGG with per-aggregate ORDER BY). Two correctness gaps not covered by the new tests.
Blocking
1. Multiple GROUP_CONCATs with different ORDER BY clauses are conflated (pkg/sql/plan/query_builder.go:3155-3175, pkg/sql/plan/having_binder.go:291).
The accumulator ctx.groupConcatOrderBys collects ORDER BY specs from all GROUP_CONCAT aggregates in the SELECT and emits a single composite SORT. For:
SELECT GROUP_CONCAT(x ORDER BY a),
GROUP_CONCAT(y ORDER BY b DESC)
FROM t GROUP BY g;the resulting SORT key is (g ASC, a, b DESC). Only the first aggregate's ordering is honored; the second sees rows ordered by (a, b DESC) rather than b DESC. Result: wrong concatenation order for the second aggregate.
The previous window-rewrite supported this correctly via independent window specs per aggregate. The new test file group_concat_orderby_test.go does not cover this shape.
Two options:
- Push ORDER BY into each
GROUP_CONCATaggregate so the runtime sorts per-aggregate inside its own state. - Reject (with a clear error) multiple GROUP_CONCATs with conflicting orderings, until per-aggregate sort is supported.
2. Shuffle/parallel AGG can re-order input across the SORT (pkg/sql/plan/shuffle.go:614-697, untouched).
Nothing in this PR prevents dontShuffle from returning false on the AGG node whose child is a SORT carrying groupConcatOrderBys. Hash-shuffle between SORT and AGG, or multi-CN merge, can interleave rows from different upstream producers within the same group, breaking per-group ordering. No OrderRequired flag on the AGG, no Shuffle=false enforcement, no merge-sort placement at the consumer.
Small-table unit tests won't hit this; production data above threshHoldForShuffleGroup will. The window-rewrite was immune because window operators preserve their own input ordering; the new SORT+AGG pattern relies on the AGG not reshuffling, which isn't guaranteed.
Test coverage gaps
group_concat_orderby_test.go only checks plan shape, not result correctness. Missing cases:
- Multiple GROUP_CONCATs with different ORDER BY (concern #1).
- DISTINCT inside GROUP_CONCAT.
- GROUP_CONCAT inside HAVING.
- Custom SEPARATOR + ORDER BY.
- Large-data shuffled execution (concern #2).
Verified
- Removal of window path is clean: no remaining references to
forceWindowsor the oldgetFramehelper. having_binder.go:248correctly returns the syntax error (the previous baremoerr.NewSyntaxErrorf(...)withoutreturnis fixed).BindContext.groupConcatOrderBysis planning-time only, not serialized; safe for plan cache and remote exec.- Updated
.resultfiles (function_group_concat.result:294,window/window.result:4166) — empty-table GROUP_CONCAT now returns one NULL row, matching MySQL semantics. New result is correct.
Minor
having_binder.go:250useslen(astExpr.Exprs)-1because the parser appends the separator as the last element — correct but reads as off-by-one. Worth a comment.
|
此修改会触发bug |
What type of PR is this?
Which issue(s) this PR fixes:
issue #24359
What this PR does / why we need it:
group_concat 改写为window函数的方式不对,导致执行结果错误.
修改为main上, 改写agg + order 的方式. 不再使用改写为window函数的方式.