Avoid rerunning no-op logical optimizer rules #22412
run benchmark sql_planner
🤖 Criterion benchmark running (GKE). Comparing optimizer-skip-unchanged-noop-rules (2e0e199) to 0da8961 (merge-base).
run benchmark sql_planner
🤖 Criterion benchmark completed (GKE).
…anged-noop-rules # Conflicts: # datafusion/optimizer/src/push_down_filter.rs
🤖 Criterion benchmark running (GKE). Comparing optimizer-skip-unchanged-noop-rules (cfebeca) to 0da8961 (merge-base).
- Move duplicated `transformed_if_changed` into `utils.rs` and use it from `decorrelate_predicate_subquery`, `eliminate_cross_join`, `optimize_unions`, and `push_down_join`.
- `eliminate_duplicated_expr`: track change inline via length comparison and skip the `Aggregate::try_new`/`Sort` rebuild when no duplicates are present, which avoids both the helper call and the schema recompute.
- `optimize_projections`: avoid cloning the whole `Window`/`TableScan` when nothing changes; OR `aggregate_changed` into the recursive result so the `Aggregate` branch reports `Transformed::yes` when FD pruning reduces group_expr/aggr_expr counts.
- `optimizer.rs`: extract the `Transformed::yes`/`no` consistency check into `assert_transformed_matches_plan` so the loop stays readable.
- `push_down_filter`: revert the simplify-predicate check to a length-based comparison (matches the actual semantics of `simplify_predicates`: it only changes content via merging, not reordering).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
run benchmark sql_planner
🤖 Criterion benchmark running (GKE). Comparing optimizer-skip-unchanged-noop-rules (29096cf) to d318324 (merge-base).
🤖 Criterion benchmark completed (GKE).
Which issue does this PR close?
Rationale for this change
The logical optimizer currently reruns every rule on each pass. If a rule already returned `Transformed::no` and no later rule has changed the plan since then, rerunning that rule does not add new information and adds planning cost.
What changes are included in this PR?
This tracks a cheap plan version inside `Optimizer::optimize`. The version increments whenever a rule reports `transformed = true`. If a rule previously returned `Transformed::no` for the current plan version, the optimizer skips rerunning it until some rule changes the plan.
Are these changes tested?
Existing tests
Are there any user-facing changes?
No
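Illustrative sketch
The plan-version scheme described in this PR can be sketched outside DataFusion as follows. This is a toy model, not the actual `Optimizer::optimize` code: the "plan" is just a `Vec<i32>`, and `Transformed`, `Rule`, `drop_zeros`, `dedup_adjacent`, `optimize`, and `noop_at` are hypothetical names defined here only for illustration. It assumes each rule honestly reports whether it changed the plan.

```rust
/// Toy stand-in for a "transformed" result: the new plan plus a changed flag.
struct Transformed<T> {
    data: T,
    transformed: bool,
}

/// Hypothetical rule type: consumes a plan and reports whether it changed it.
type Rule = fn(Vec<i32>) -> Transformed<Vec<i32>>;

/// Removes zeros; reports a change only if something was removed.
fn drop_zeros(plan: Vec<i32>) -> Transformed<Vec<i32>> {
    let before = plan.len();
    let data: Vec<i32> = plan.into_iter().filter(|x| *x != 0).collect();
    let transformed = data.len() != before;
    Transformed { data, transformed }
}

/// Removes adjacent duplicates; change is detected by length comparison.
fn dedup_adjacent(mut plan: Vec<i32>) -> Transformed<Vec<i32>> {
    let before = plan.len();
    plan.dedup();
    let transformed = plan.len() != before;
    Transformed { data: plan, transformed }
}

/// Runs `rules` for `max_passes` passes, skipping any rule that already
/// reported "no change" at the current plan version.
/// Returns the final plan and the number of rule invocations.
fn optimize(mut plan: Vec<i32>, rules: &[Rule], max_passes: usize) -> (Vec<i32>, usize) {
    // Bumped every time some rule reports a change.
    let mut version: u64 = 0;
    // Per rule: the plan version at which it last reported "no change".
    let mut noop_at: Vec<Option<u64>> = vec![None; rules.len()];
    let mut invocations = 0;

    for _ in 0..max_passes {
        for (i, rule) in rules.iter().enumerate() {
            if noop_at[i] == Some(version) {
                // The plan has not changed since this rule was a no-op.
                continue;
            }
            let result = rule(plan);
            invocations += 1;
            plan = result.data;
            if result.transformed {
                version += 1;
            } else {
                noop_at[i] = Some(version);
            }
        }
    }
    (plan, invocations)
}
```

With the rules `[drop_zeros, dedup_adjacent]`, the input `vec![1, 0, 1, 1, 2]`, and three passes, both rules change the plan in the first pass, both are no-ops in the second, and both are skipped in the third, so the sketch makes four rule calls where an unconditional loop would make six.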