perf: Reorder predicates in conjuncts via simple heuristic#22343
perf: Reorder predicates in conjuncts via simple heuristic#22343neilconway wants to merge 10 commits into
Conversation
DataFusion's vectorized AND evaluator already short-circuits the right-hand side when the LHS keeps few rows. Until now the order of conjuncts in a Filter was whatever the user wrote, so expensive predicates like LIKE and regex could run on the full batch even when a cheap comparison would have filtered most rows first. This change classifies each conjunct as cheap or expensive (LIKE, SIMILAR TO, regex operators, scalar functions, and subqueries are expensive; everything else is cheap) and does a stable partition that puts cheap predicates first. The helper reports whether any reorder actually happened so the caller skips rebuilding the conjunction when the input was already cheap-first. On ClickBench (hits_partitioned, 5 iterations) the reorder yields +13-16% on Q21 and +7-9% on Q22, the two queries that mix LIKE with a cheap `<>` predicate; other queries are unchanged within noise.
|
run benchmarks |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing neilc/perf-predicate-reorder (7284fe1) to dc80bd7 (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing neilc/perf-predicate-reorder (7284fe1) to dc80bd7 (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing neilc/perf-predicate-reorder (7284fe1) to dc80bd7 (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch — base (merge-base)
tpch — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
I agree on the general idea, but considering short-circuiting and also conjuncts' selectivity, this can actually backfire in practice, so a config knob seems important to have (I might have missed it, but I couldn't see it in the PR). Ideally |
|
@asolimando Thanks for the feedback!
It's possible to add a knob if we feel like there is a need for one, but I'd rather not add one reflexively. I made the definition of "cheap" vs. "expensive" very conservative partly in hopes of avoiding a config knob. Looking more closely at the previous rewriting logic, we actually have not been respecting the predicate order in the query text for a while:
I think per-UDF extensibility to express some notion of cost or selectivity could definitely make sense, although that's a much bigger task to take on. |
|
BTW I checked all of the ClickBench queries were we see minor regressions (40, 41, 38, 24, 12, 5, and 7), and none of them have predicates that will be reordered by this PR. So I suspect those regressions are just noise. |
|
Agreed that the regressions look like noise. But also the only real win seems |
I see consistent improvements on ClickBench Q21 (~10-13%) and Q22 (~5%), which are both cases where we now reorder Interestingly, this PR does not fire for Q73 in TPC-DS, so I'm not sure what is going on there 😊 I couldn't repro an improvement locally, so I guess it is just benchmark noise. To see improvements for a broader class of queries, we'd need to extend the heuristics to consider more criteria. |
|
One challenge is that "cheap" means different things depending on where the predicate is evaluated:
Perhaps this kind of reordering could be implemented as a runtime optimization inside |
That is exactly what #22144 does 😃. I think we could re-use pretty much the exact same machinery. It took a lot of iterations to arrive at the right metrics: you want to take into account time spent on compute no just selectivity, etc. Someone please correct me if I'm wrong but IIRC currently because of the tree structure we compute each side of a binary expression and apply the slice to the array, then compute the next side, etc. I wonder if an approach like apache/arrow-rs#9659 might be helpful to mitigate overheads from non-selective masks? |
I think it's still useful to be able to re-order "statically" as you might want to use statistics for that, which might be more stable then dynamic approaches, which are usually sensitive to the "shape" of the first part of the data, and the choice is usually not revisited (and even in that case, it might fluctuate, while in some cases the static order could be the optimal one). I think it's good to have multiple options, as long as downstream users can mix and match what works best for them, and they can "easily" correct course for problematic queries without the need of code changes. |
|
I think reordering predicates based on planning time info will suffer from bad statistics (as do all such plan time decisions). If we can figure out some way to make the decision more dynamic I think it will be a better design in the long run / harder to get icnorrectt |
I agree that doing predicate reordering dynamically has a lot of promise (as well as some potential concerns, like overhead and implementation complexity). But a simple static predicate reordering does not preclude also doing dynamic reordering; indeed, we need some initial predicate order to dynamically adjust later. We already reorder predicates today, in |
alamb
left a comment
There was a problem hiding this comment.
Sorry for not responding to this one with more detail until now. Thank you for the thoughtful comments and review @kosiew @2010YOUY01 and @asolimando
TLDR is if this PR had a config.optimizer.reorder_filters type config knob to turn this optimization off, I think it would be a good addition to DataFusion.
In general I agree that using static heuristics such as this will result in better plans most of the time.
My concern is that there will certainly be cases where an "expensive" predicate is actually very selective and should be done before "inexpensive" (but unselective ones) ones.
For example
WHERE col LIKE '....' AND col = 'bar' and col = 'baz' AND col = ...If the col LIKE '.... is super selective (selects a single row) and the others are not, then doing it first is probably the right thing to do.
We have had cases like this before in DataFusion and what we have done is leave an "escape" hatch in the form of a config parameter, so that if a user has some query where the old heuristic ("syntactic optimizer!") work better, they can avoid the new optimization
cc @adriangb
| /// | ||
| /// Returns `(predicates, changed)`. When `changed` is `false` the input was | ||
| /// already cheap-first and the caller can skip rebuilding the conjunction. | ||
| pub(crate) fn reorder_predicates(predicates: Vec<Expr>) -> (Vec<Expr>, bool) { |
There was a problem hiding this comment.
It could also potentially return Transformed<Vec<Expr>>which carries the "was this thing changed" flag alread
| return (predicates, false); | ||
| } | ||
|
|
||
| let mut cheap = Vec::with_capacity(predicates.len()); |
There was a problem hiding this comment.
You could probably save the allocation / sort in place using https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sort_by and a custom comparator
| /// comparisons, negations, casts), and consider anything outside this list to | ||
| /// be expensive. New/unrecognized expressions therefore default to being | ||
| /// expensive. | ||
| fn is_cheap_node(expr: &Expr) -> bool { |
There was a problem hiding this comment.
I would recommend:
- Make this a method on
Expr(so it is more visible and potentially easier to override for functions for example) - Return a
u8or something else that could represent different levels of cheapness rather than just a boolean
There was a problem hiding this comment.
Both of these changes could make sense, but I think this moves us in the direction of having a full-blown function cost model, which is the kind of API I was hoping to avoid committing to right now. IOW, if we start with something minimal and conservative like the current approach, we can always extend it to something much more ambitious in the future.
| | Expr::Cast(_) | ||
| | Expr::TryCast(_) | ||
| | Expr::InList(_) => true, | ||
| // BinaryExpr is cheap unless the operator is LIKE or regexp matching. |
There was a problem hiding this comment.
regexp functions also come to mind as being relatively expensive to evaluate
There was a problem hiding this comment.
@alamb regexp functions and regexp operators (as well as LIKE and SIMILAR TO) will be considered expensive in the current version of the PR.
@adriangb A CASE expression is considered cheap if every sub-expression in the CASE is cheap. Personally that seems pretty reasonable. Maybe you could find a machine-generated CASE expression that is so large or deeply nested that it is expensive to evaluate despite consisting only of cheap operations? We could also move CASE to be expensive if you'd prefer, I don't feel super strongly about it.
There was a problem hiding this comment.
@alamb regexp functions and regexp operators (as well as LIKE and SIMILAR TO) will be considered expensive in the current version of the PR.
I see -- they would be considered expensive because they are not explicitly considered cheap (along with all other functions (like atan and whatever)) That makes sense 🤔
I think it would be misleading if We could go further and have My broader concern is that it seems pretty fragile to guarantee that filter evaluation order matches query text order. I could see this very easily being violated by other optimizations, either present or future. For example, what about predicate pushdown? If the query has a list of predicates and some of them can be pushed down while others cannot, that would result in predicate evaluation order becoming inconsistent with the query text order. So we could also disable predicate pushdown when tldr -- If we actually want to guarantee that evaluation order matches query text order, considerably more work is required than just gating this PR in a boolean variable. Do we have any signal that this is a real pain point with users? |
…reorder # Conflicts: # datafusion/sqllogictest/test_files/tpch/plans/q16.slt.part
|
The CI failure looks like an instance of #22621 |
🤔 that is a good point
@neilconway you make great points. The scenario I am imagining in my head is
You have an excellent point that we don't have any real way to prevent this scenario from happening with other optimizer passes / changes now other than hoping we prevent it with code review 🤔 How do you think we should proceed? Heuristic reordering and then rely on stuff like @adriangb dynamic reordering to avoid the regression? |
|
I do agree with @neilconway that "we should guarantee" expression ordering is not a guarantee we are making now or should be making. |
Thank you for the pointer -- I reviewed that PR and it looks good to me |
|
re dynamic vs. heuristic based: my feeling is we'll want dynamic adjustment long term. Neil is right that you need to "seed" in some way, and heuristic based is one option. But if it ends up being not much benefit and considerable code complexity it's not worth the step. If it ends up being very little code and helpful in reasonable cases then it's worth keeping. I do think there's also other heuristics we could use e.g. size of the columns the filter references from stats (so e.g. a like expression on a small column is not penalized as much as a large column, and could beat out a large mathematical expression). |
|
One thing that came up from #22698: we should probably come up with some benchmarks for cases we think are interesting and adversarial for various implementations and use that to guide development. It seems there is not much signal in our standard queries for this. |
|
@alamb To me, this falls into the category of stuff that might break when you upgrade performance-sensitive apps that are built on a system with a declarative query language / query optimizer. I think it's untenable to promise that no user workloads will see performance regressions from new versions. The reordering done by this PR is intentionally very conservative / simple, so I would be surprised if we see widespread issues in the field arising from this change. If a user's workload is that sensitive to the exact predicate evaluation order, they might be better off encoding their filtering criteria as a custom UDF. Dynamic filter reordering would probably help most cases in practice (albeit it might make the actual runtime behavior more unpredictable). At some point in the future, we could also potentially ship either a cost-based optimizer (where users could annotate individual UDFs with cost estimates), and/or some facility for manually specifying properties of the evaluation order (e.g., "hints"). |
|
The cost based optimizer discussion is interesting, but I worry that for analytical systems runtime adaptivity is more impactful and easier to implement universally. Cost based optimizers are only as good as your estimates, which are going to get right if we don't have persistent stats, users write custom UDFs and increasingly AI is writing crazy complex queries. Cost based optimizers are essential for transactional systems where the average rows per query may be 1, but for the analytical workloads that people tend to run with DataFusion the average rows processed is in the tens if not hundreds of thousands (otherwise batch sizes of 8k and row groups of 1M would make no sense). This gives us an opportunity: if a 1s query is acceptable then running for 100ms / a small fraction of the query in a sub-optimal way while we gather stats but then making the other 900ms take 500ms is a big win. We'll see enough data that we can derive relatively high quality runtime statistics. AFAIK Ballista, Spark, others I don't remember off the top of my head have runtime adaptivity. They may also have a cost based optimizer though 😄. |
…r reordering - adaptive_filter.slt: results and EXPLAIN are identical with the flag on and off (reordering changes evaluation order only). - benchmarks/sql_benchmarks/adversarial_filter: a self-contained SQL benchmark suite (synthetic data generated inline via generate_series) of five equally-expensive regexp predicates with the selective one written last — where SQL order, the apache#22343 cost heuristic, and BinaryExpr pre-selection all leave the order wrong. Toggle with ADAPTIVE_FILTER_REORDERING: BENCH_NAME=adversarial_filter ADAPTIVE_FILTER_REORDERING=true \ cargo bench --bench sql Q01 (selective last): ~1.75x faster at 10M rows (more at higher ADV_ROWS). Q02 (selective first, control): neutral — confirms the win is an ordering fix and the adaptive path adds no overhead when it cannot help. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…reorder # Conflicts: # datafusion/sqllogictest/test_files/tpch/plans/q16.slt.part
…r reordering - adaptive_filter.slt: results and EXPLAIN are identical with the flag on and off (reordering changes evaluation order only). - benchmarks/sql_benchmarks/adversarial_filter: a self-contained SQL benchmark suite (synthetic data generated inline via generate_series) of five equally-expensive regexp predicates with the selective one written last — where SQL order, the apache#22343 cost heuristic, and BinaryExpr pre-selection all leave the order wrong. Toggle with ADAPTIVE_FILTER_REORDERING and run via the standard harness: BENCH_NAME=adversarial_filter ADAPTIVE_FILTER_REORDERING=true \ cargo bench --bench sql # or: ADAPTIVE_FILTER_REORDERING=true ./benchmarks/bench.sh run adversarial_filter Q01 (selective last): ~1.75x faster at 10M rows (more at higher ADV_ROWS). Q02 (selective first, control): neutral — confirms the win is an ordering fix and the adaptive path adds no overhead when it cannot help. - bench.sh: add an `adversarial_filter` run target (data generated inline). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r reordering - adaptive_filter.slt: results and EXPLAIN are identical with the flag on and off (reordering changes evaluation order only). - benchmarks/sql_benchmarks/adversarial_filter: a self-contained SQL benchmark suite (synthetic data generated inline via generate_series) of five equally-expensive regexp predicates with the selective one written last — where SQL order, the apache#22343 cost heuristic, and BinaryExpr pre-selection all leave the order wrong. Toggle with the standard config env var and run via the standard harness: DATAFUSION_EXECUTION_ADAPTIVE_FILTER_REORDERING=true \ ./benchmarks/bench.sh run adversarial_filter (The dfbench suites read that env var via SessionConfig::from_env; the SQL bench harness uses SessionContext::new(), so the suite's init SQL wires it in via env interpolation.) Q01 (selective last): ~1.75x faster at 10M rows (more at higher ADV_ROWS). Q02 (selective first, control): neutral — confirms the win is an ordering fix and the adaptive path adds no overhead when it cannot help. - bench.sh: add an `adversarial_filter` run target (data generated inline). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Which issue does this PR close?
Rationale for this change
If a filter consists of a mix of cheap and expensive predicates, evaluating the cheap predicates first can improve performance, because it reduces the number of rows that the expensive predicate must be evaluated on. This PR implements this idea, by reordering predicates in a conjunction to place "cheap" predicates first.
Predicates are assessed as "cheap" or "expensive" using an intentionally simple heuristic: "cheap" predicates are expressions that consist of only cheap operations like binary comparisons, negations, and casts, and "expensive" predicates are everything else (e.g.,
LIKE, regexp matching, subqueries, and function calls). Importantly, we use a stable sort when reordering predicates, which means that the original order of operations is preserved within these two classes.Arbitrarily more sophisticated schemes for predicting predicate evaluation cost (and selectivity) are possible, but a simple approach seems like a good place to start.
We avoid reordering predicates if the filter contains a volatile expression, to be safe. We could be a bit fancier and reorder conjuncts in the prefix of the filter list before the volatile expression, but we don't attempt to do that for now.
We don't reorder operands to
OR: I believe this would be worth doing if #22342 is implemented.On ClickBench, this improves performance by ~10-13% on Q21 and ~5% on Q22, in both cases by reordering simple comparisons to run before
LIKEpredicates.What changes are included in this PR?
reorder_predicateshelperreorder_predicatesas part of thePushDownFilterrewrite passreorder_predicatesAre these changes tested?
Yes. Added new unit tests for predicate reordering behavior, updated some expected
EXPLAINoutput.Are there any user-facing changes?
Yes. Users that expect their predicates to be evaluated in a strictly left-to-right manner might see changes in performance and/or behavior. Performance changes could be improvements or regressions. Behavioral changes are possible if the query includes fallible operations like certain casts or division by zero. Note that the SQL standard is clear that implementations are allowed to evaluate predicates in any order, so user queries that depend on an evaluation order are fundamentally fragile.