perf: Reorder predicates in conjuncts via simple heuristic by neilconway · Pull Request #22343 · apache/datafusion

neilconway · 2026-05-18T17:12:34Z

Which issue does this PR close?

Closes #11262: "Reorder boolean expressions (including filter predicates) according to evaluation cost / selectivity".

Rationale for this change

If a filter consists of a mix of cheap and expensive predicates, evaluating the cheap predicates first can improve performance because it reduces the number of rows on which the expensive predicates must be evaluated. This PR implements this idea by reordering the predicates in a conjunction so that "cheap" predicates come first.
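A back-of-the-envelope cost model shows why the order matters. It assumes that the second conjunct is only evaluated on the rows that pass the first, so the per-row cost of `A AND B` is `cost(A) + selectivity(A) * cost(B)`. The costs and selectivities below are made-up illustrative numbers, not measurements.

```rust
/// Expected per-row cost of `first AND second`, assuming `second` only runs
/// on the fraction of rows (`first_selectivity`) that `first` keeps.
fn conjunction_cost(first_cost: f64, first_selectivity: f64, second_cost: f64) -> f64 {
    first_cost + first_selectivity * second_cost
}

fn main() {
    // Hypothetical numbers: a comparison costs 1 unit and keeps 10% of rows;
    // a LIKE costs 20 units and keeps 50% of rows.
    let (cmp_cost, cmp_sel) = (1.0, 0.1);
    let (like_cost, like_sel) = (20.0, 0.5);

    let cheap_first = conjunction_cost(cmp_cost, cmp_sel, like_cost);
    let expensive_first = conjunction_cost(like_cost, like_sel, cmp_cost);
    println!("cheap first: {cheap_first}, expensive first: {expensive_first}");
}
```

With these numbers the cheap-first order costs 3 units per row versus 20.5 for the written order. A selective cheap predicate helps most, but the cheap-first order wins here for any selectivity because the comparison's own cost is small.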

Predicates are classified as "cheap" or "expensive" using an intentionally simple heuristic: "cheap" predicates consist only of cheap operations such as binary comparisons, negations, and casts; "expensive" predicates are everything else (e.g., LIKE, regexp matching, subqueries, and function calls). Importantly, we use a stable sort when reordering predicates, so the original relative order of the predicates is preserved within each of the two classes.
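A minimal sketch of the heuristic, using a hypothetical, heavily simplified expression enum rather than DataFusion's real `Expr` type (the variant names, `is_cheap`, and `reorder_conjuncts` are illustrative assumptions). It classifies each conjunct recursively and then relies on the stability of `sort_by_key` to keep the written order within each class.

```rust
/// Hypothetical, heavily simplified expression tree (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
    NotEq(Box<Expr>, Box<Expr>),
    Lt(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
    Cast(Box<Expr>),
    Like(Box<Expr>, String),
    ScalarFunction(String, Vec<Expr>),
}

/// "Cheap" means built only from columns, literals, comparisons, NOT, and casts.
fn is_cheap(expr: &Expr) -> bool {
    match expr {
        Expr::Column(_) | Expr::Literal(_) => true,
        Expr::Eq(l, r) | Expr::NotEq(l, r) | Expr::Lt(l, r) => is_cheap(l) && is_cheap(r),
        Expr::Not(e) | Expr::Cast(e) => is_cheap(e),
        // LIKE and function calls are expensive (the real rule also covers
        // regex matching and subqueries, which this toy enum omits).
        Expr::Like(..) | Expr::ScalarFunction(..) => false,
    }
}

/// Cheap conjuncts first; the stable sort keeps the original order within each class.
fn reorder_conjuncts(mut conjuncts: Vec<Expr>) -> Vec<Expr> {
    // `sort_by_key` is stable and `false < true`, so key on "is expensive".
    conjuncts.sort_by_key(|e| !is_cheap(e));
    conjuncts
}

fn main() {
    let col = |name: &str| Box::new(Expr::Column(name.to_string()));
    // Written order: expensive LIKE first, cheap comparison second.
    let filter = vec![
        Expr::Like(col("url"), "%google%".to_string()),
        Expr::NotEq(col("counter_id"), Box::new(Expr::Literal(0))),
    ];
    for conjunct in reorder_conjuncts(filter) {
        println!("{conjunct:?}");
    }
}
```

Keying the sort on a boolean, rather than on a numeric cost estimate, is what keeps the scheme predictable: two predicates in the same class never swap places.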

Much more sophisticated schemes for estimating predicate evaluation cost (and selectivity) are possible, but a simple approach seems like a good place to start.

To be safe, we don't reorder predicates if the filter contains a volatile expression. We could be a bit fancier and reorder the conjuncts in the prefix of the filter list that precedes the volatile expression, but we don't attempt that for now.
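The volatility guard amounts to an early return before the partition. A minimal sketch, assuming a hypothetical `Conjunct` descriptor whose `cheap` and `volatile` flags have already been computed (the real rule derives them from the expressions themselves):

```rust
/// Hypothetical conjunct descriptor; the real rule inspects the expressions.
#[derive(Debug, Clone, PartialEq)]
struct Conjunct {
    sql: &'static str,
    cheap: bool,
    volatile: bool,
}

/// Reorder cheap-first only when no conjunct is volatile; otherwise keep the
/// written order untouched.
fn maybe_reorder(conjuncts: Vec<Conjunct>) -> Vec<Conjunct> {
    if conjuncts.iter().any(|c| c.volatile) {
        return conjuncts;
    }
    // `partition` preserves iteration order within each half, so this is stable.
    let (mut cheap, expensive): (Vec<Conjunct>, Vec<Conjunct>) =
        conjuncts.into_iter().partition(|c| c.cheap);
    cheap.extend(expensive);
    cheap
}

fn main() {
    let filter = vec![
        Conjunct { sql: "random() < 0.5", cheap: false, volatile: true },
        Conjunct { sql: "x > 1", cheap: true, volatile: false },
    ];
    // Contains a volatile conjunct, so the written order is kept.
    for c in maybe_reorder(filter) {
        println!("{}", c.sql);
    }
}
```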

We don't reorder the operands of OR; I believe this would be worth doing if #22342 is implemented.

On ClickBench, this improves performance by ~10-13% on Q21 and ~5% on Q22, in both cases by reordering simple comparisons to run before LIKE predicates.

What changes are included in this PR?

Add a new reorder_predicates helper
Invoke reorder_predicates as part of the PushDownFilter rewrite pass
Add unit tests for reorder_predicates
Update expected query plans in SLT
Add migration guide note for change to predicate evaluation order

Are these changes tested?

Yes. Added new unit tests for the predicate reordering behavior and updated some expected EXPLAIN output.

Are there any user-facing changes?

Yes. Users who expect their predicates to be evaluated strictly left to right might see changes in performance and/or behavior. Performance changes could be improvements or regressions. Behavioral changes are possible if the query includes fallible operations such as certain casts or division by zero. Note that the SQL standard is clear that implementations may evaluate predicates in any order, so queries that depend on a particular evaluation order are fundamentally fragile.
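The fallible-operation caveat can be illustrated with a deliberately simplified row-at-a-time model (DataFusion evaluates record batches, not single rows). Here a guard such as `s ~ '^[0-9]+$'`, which is expensive under the classification above, protects a cheap but fallible `CAST(s AS INT) > 5`. Moving the cast first means it sees rows the guard would have rejected. The predicate functions are hypothetical stand-ins, not DataFusion APIs.

```rust
/// A predicate either returns a boolean or fails (like a failing CAST).
type Pred = fn(&str) -> Result<bool, String>;

/// Stand-in for an "expensive" guard such as `s ~ '^[0-9]+$'`.
fn is_numeric(s: &str) -> Result<bool, String> {
    Ok(!s.is_empty() && s.chars().all(|c| c.is_ascii_digit()))
}

/// Stand-in for a "cheap" but fallible `CAST(s AS INT) > 5`.
fn cast_gt_5(s: &str) -> Result<bool, String> {
    s.parse::<i64>()
        .map(|v| v > 5)
        .map_err(|e| format!("cannot cast {s:?} to INT: {e}"))
}

/// Row-at-a-time AND with short-circuiting: later predicates only see rows
/// that earlier predicates accepted, and the first error aborts the filter.
fn filter_rows(rows: &[&str], preds: &[Pred]) -> Result<Vec<String>, String> {
    let mut out = Vec::new();
    'rows: for &row in rows {
        for pred in preds {
            if !pred(row)? {
                continue 'rows;
            }
        }
        out.push(row.to_string());
    }
    Ok(out)
}

fn main() {
    let rows = ["7", "abc", "3"];
    let written: [Pred; 2] = [is_numeric, cast_gt_5];
    let cheap_first: [Pred; 2] = [cast_gt_5, is_numeric];
    // The written order filters "abc" out before the cast sees it.
    println!("{:?}", filter_rows(&rows, &written));
    // Cheap-first order casts "abc" and surfaces an error.
    println!("{:?}", filter_rows(&rows, &cheap_first));
}
```

A query that needs this kind of protection can avoid depending on conjunct order by expressing the guard inside a single expression, for example with a `CASE`, which is a different shape of predicate than a plain conjunction.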

DataFusion's vectorized AND evaluator already short-circuits the right-hand side when the left-hand side keeps few rows. Until now, however, the order of conjuncts in a Filter was whatever the user wrote, so expensive predicates like LIKE and regex matching could run on the full batch even when a cheap comparison would have filtered out most rows first.

This change classifies each conjunct as cheap or expensive (LIKE, SIMILAR TO, regex operators, scalar functions, and subqueries are expensive; everything else is cheap) and performs a stable partition that puts the cheap predicates first. The helper reports whether any reordering actually happened, so the caller can skip rebuilding the conjunction when the input was already in cheap-first order.

On ClickBench (hits_partitioned, 5 iterations), the reorder yields +13-16% on Q21 and +7-9% on Q22, the two queries that mix LIKE with a cheap `<>` predicate; other queries are unchanged within noise.
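The "did anything change?" check can be sketched independently of the expression types. Assuming a hypothetical `cheap_first_order` helper that receives each conjunct's cost class as a boolean, it computes the stable cheap-first permutation and reports whether any conjunct moved:

```rust
/// Hypothetical helper: given each conjunct's cost class (`true` = cheap),
/// return the cheap-first order as indices into the input, plus a flag saying
/// whether that order differs from the input order.
fn cheap_first_order(is_cheap: &[bool]) -> (Vec<usize>, bool) {
    let cheap = (0..is_cheap.len()).filter(|&i| is_cheap[i]);
    let expensive = (0..is_cheap.len()).filter(|&i| !is_cheap[i]);
    // Stable partition: each class keeps its original relative order.
    let order: Vec<usize> = cheap.chain(expensive).collect();
    // Unchanged iff every conjunct stays at its original position.
    let changed = order.iter().enumerate().any(|(pos, &idx)| pos != idx);
    (order, changed)
}

fn main() {
    // e.g. [col LIKE '%x%', other <> ''] is [expensive, cheap].
    let (order, changed) = cheap_first_order(&[false, true]);
    if changed {
        println!("rebuild conjunction in order {order:?}");
    } else {
        println!("already cheap-first; keep the original expression");
    }
}
```

Returning the flag alongside the order lets the caller leave an already cheap-first filter expression untouched instead of reconstructing an identical one.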

neilconway · 2026-05-18T17:19:07Z

run benchmarks

adriangbot · 2026-05-18T17:22:40Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4480099627-184-cfkxz 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing neilc/perf-predicate-reorder (7284fe1) to dc80bd7 (merge-base) diff using: tpcds
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-05-18T17:22:41Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4480099627-183-ppdg8 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing neilc/perf-predicate-reorder (7284fe1) to dc80bd7 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

adriangbot · 2026-05-18T17:22:42Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4480099627-185-c22sx 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing neilc/perf-predicate-reorder (7284fe1) to dc80bd7 (merge-base) diff using: tpch
Results will be posted here when complete

adriangbot · 2026-05-18T17:36:06Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and neilc_perf-predicate-reorder
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃   neilc_perf-predicate-reorder ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │ 38.37 / 39.92 ±1.76 / 43.12 ms │ 38.94 / 40.45 ±0.84 / 41.37 ms │ no change │
│ QQuery 2  │ 20.26 / 20.48 ±0.22 / 20.91 ms │ 20.45 / 20.62 ±0.16 / 20.91 ms │ no change │
│ QQuery 3  │ 32.98 / 34.99 ±2.20 / 39.02 ms │ 32.95 / 34.82 ±2.04 / 38.52 ms │ no change │
│ QQuery 4  │ 17.47 / 18.16 ±0.77 / 19.28 ms │ 17.34 / 17.62 ±0.27 / 18.12 ms │ no change │
│ QQuery 5  │ 42.03 / 43.02 ±1.12 / 45.14 ms │ 41.36 / 42.59 ±0.74 / 43.60 ms │ no change │
│ QQuery 6  │ 16.42 / 16.52 ±0.08 / 16.64 ms │ 16.47 / 17.15 ±0.87 / 18.87 ms │ no change │
│ QQuery 7  │ 46.80 / 47.63 ±1.07 / 49.75 ms │ 47.19 / 48.65 ±1.20 / 50.15 ms │ no change │
│ QQuery 8  │ 44.94 / 45.37 ±0.42 / 46.16 ms │ 45.02 / 45.25 ±0.19 / 45.50 ms │ no change │
│ QQuery 9  │ 49.14 / 50.67 ±1.07 / 52.24 ms │ 49.97 / 50.52 ±0.40 / 51.06 ms │ no change │
│ QQuery 10 │ 63.50 / 63.66 ±0.18 / 63.99 ms │ 63.44 / 63.59 ±0.15 / 63.86 ms │ no change │
│ QQuery 11 │ 13.30 / 13.63 ±0.20 / 13.90 ms │ 13.29 / 13.51 ±0.17 / 13.82 ms │ no change │
│ QQuery 12 │ 24.46 / 25.18 ±1.05 / 27.25 ms │ 24.62 / 25.21 ±1.01 / 27.22 ms │ no change │
│ QQuery 13 │ 33.62 / 35.39 ±1.88 / 39.04 ms │ 34.22 / 35.93 ±1.81 / 39.45 ms │ no change │
│ QQuery 14 │ 25.49 / 25.80 ±0.20 / 26.10 ms │ 25.57 / 26.03 ±0.61 / 27.23 ms │ no change │
│ QQuery 15 │ 31.59 / 32.07 ±0.49 / 32.77 ms │ 31.57 / 31.91 ±0.30 / 32.46 ms │ no change │
│ QQuery 16 │ 14.78 / 14.98 ±0.12 / 15.14 ms │ 14.99 / 15.24 ±0.21 / 15.57 ms │ no change │
│ QQuery 17 │ 73.97 / 74.76 ±0.45 / 75.17 ms │ 74.37 / 75.67 ±1.30 / 77.72 ms │ no change │
│ QQuery 18 │ 62.69 / 63.82 ±0.88 / 65.23 ms │ 62.24 / 65.09 ±3.19 / 71.24 ms │ no change │
│ QQuery 19 │ 35.24 / 35.71 ±0.54 / 36.75 ms │ 35.34 / 35.93 ±0.90 / 37.72 ms │ no change │
│ QQuery 20 │ 37.71 / 38.08 ±0.47 / 38.94 ms │ 37.90 / 38.27 ±0.48 / 39.20 ms │ no change │
│ QQuery 21 │ 56.23 / 56.73 ±0.38 / 57.38 ms │ 57.17 / 59.04 ±1.51 / 61.01 ms │ no change │
│ QQuery 22 │ 23.45 / 25.05 ±1.77 / 28.05 ms │ 23.50 / 23.92 ±0.39 / 24.43 ms │ no change │
└───────────┴────────────────────────────────┴────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                           ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                           │ 821.62ms │
│ Total Time (neilc_perf-predicate-reorder)   │ 827.01ms │
│ Average Time (HEAD)                         │  37.35ms │
│ Average Time (neilc_perf-predicate-reorder) │  37.59ms │
│ Queries Faster                              │        0 │
│ Queries Slower                              │        0 │
│ Queries with No Change                      │       22 │
│ Queries with Failure                        │        0 │
└─────────────────────────────────────────────┴──────────┘

Resource Usage

tpch — base (merge-base)

Metric	Value
Wall time	5.0s
Peak memory	5.5 GiB
Avg memory	5.0 GiB
CPU user	29.7s
CPU sys	2.1s
Peak spill	0 B

tpch — branch

Metric	Value
Wall time	5.0s
Peak memory	5.7 GiB
Avg memory	5.1 GiB
CPU user	29.8s
CPU sys	2.1s
Peak spill	0 B

adriangbot · 2026-05-18T17:37:43Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and neilc_perf-predicate-reorder
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃          neilc_perf-predicate-reorder ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │           6.46 / 7.00 ±0.81 / 8.61 ms │           6.59 / 7.05 ±0.82 / 8.69 ms │     no change │
│ QQuery 2  │        82.95 / 83.26 ±0.29 / 83.63 ms │        83.29 / 83.57 ±0.23 / 83.86 ms │     no change │
│ QQuery 3  │        29.76 / 30.04 ±0.27 / 30.52 ms │        29.23 / 29.56 ±0.19 / 29.83 ms │     no change │
│ QQuery 4  │     545.51 / 555.45 ±7.21 / 563.95 ms │     542.26 / 551.85 ±5.48 / 556.72 ms │     no change │
│ QQuery 5  │        53.62 / 54.17 ±0.47 / 54.99 ms │        52.92 / 53.52 ±0.48 / 54.34 ms │     no change │
│ QQuery 6  │        36.91 / 37.44 ±0.33 / 37.80 ms │        37.10 / 37.51 ±0.44 / 38.31 ms │     no change │
│ QQuery 7  │     109.89 / 110.73 ±0.70 / 111.74 ms │     108.84 / 109.97 ±1.03 / 111.82 ms │     no change │
│ QQuery 8  │        39.95 / 40.26 ±0.20 / 40.52 ms │        39.61 / 39.88 ±0.27 / 40.35 ms │     no change │
│ QQuery 9  │        53.20 / 55.41 ±1.43 / 57.13 ms │        54.87 / 56.04 ±0.94 / 56.87 ms │     no change │
│ QQuery 10 │        82.81 / 84.02 ±1.78 / 87.53 ms │        83.19 / 84.18 ±1.47 / 87.06 ms │     no change │
│ QQuery 11 │     349.81 / 355.08 ±4.01 / 361.81 ms │     343.89 / 348.48 ±2.95 / 352.78 ms │     no change │
│ QQuery 12 │        29.74 / 30.09 ±0.27 / 30.51 ms │        29.38 / 30.06 ±0.35 / 30.31 ms │     no change │
│ QQuery 13 │     129.77 / 130.38 ±0.71 / 131.77 ms │     129.16 / 129.86 ±0.53 / 130.50 ms │     no change │
│ QQuery 14 │     514.42 / 515.64 ±0.96 / 517.09 ms │     514.20 / 516.22 ±1.55 / 517.98 ms │     no change │
│ QQuery 15 │        64.67 / 65.34 ±0.46 / 65.95 ms │        63.95 / 64.85 ±0.91 / 66.11 ms │     no change │
│ QQuery 16 │           7.41 / 7.60 ±0.14 / 7.78 ms │           7.43 / 7.51 ±0.10 / 7.71 ms │     no change │
│ QQuery 17 │        83.17 / 83.90 ±0.93 / 85.74 ms │        82.46 / 83.55 ±1.39 / 86.25 ms │     no change │
│ QQuery 18 │     154.27 / 156.11 ±1.00 / 157.23 ms │     155.12 / 155.70 ±0.50 / 156.64 ms │     no change │
│ QQuery 19 │        42.21 / 42.97 ±1.06 / 45.06 ms │        42.29 / 42.49 ±0.20 / 42.86 ms │     no change │
│ QQuery 20 │        36.18 / 36.86 ±0.37 / 37.22 ms │        36.46 / 37.09 ±0.39 / 37.58 ms │     no change │
│ QQuery 21 │        18.99 / 19.12 ±0.11 / 19.26 ms │        18.47 / 18.82 ±0.28 / 19.14 ms │     no change │
│ QQuery 22 │        64.77 / 65.58 ±0.76 / 66.63 ms │        64.44 / 65.61 ±0.79 / 66.83 ms │     no change │
│ QQuery 23 │     497.38 / 502.15 ±3.41 / 505.65 ms │     497.01 / 500.31 ±2.61 / 503.87 ms │     no change │
│ QQuery 24 │     238.15 / 241.22 ±4.87 / 250.86 ms │     236.73 / 239.38 ±2.03 / 241.56 ms │     no change │
│ QQuery 25 │     115.62 / 118.26 ±2.77 / 123.33 ms │     114.54 / 115.64 ±0.99 / 117.38 ms │     no change │
│ QQuery 26 │        72.31 / 73.15 ±0.49 / 73.83 ms │        71.84 / 73.50 ±2.51 / 78.46 ms │     no change │
│ QQuery 27 │           7.32 / 7.51 ±0.13 / 7.70 ms │           7.45 / 7.55 ±0.12 / 7.76 ms │     no change │
│ QQuery 28 │        63.46 / 64.19 ±0.82 / 65.71 ms │        58.99 / 61.60 ±2.19 / 64.16 ms │     no change │
│ QQuery 29 │     100.71 / 101.12 ±0.29 / 101.58 ms │      99.84 / 101.54 ±1.43 / 104.11 ms │     no change │
│ QQuery 30 │        31.84 / 32.26 ±0.42 / 32.94 ms │        31.72 / 33.18 ±1.41 / 35.48 ms │     no change │
│ QQuery 31 │     114.51 / 116.31 ±1.56 / 118.52 ms │     114.53 / 115.03 ±0.33 / 115.38 ms │     no change │
│ QQuery 32 │        22.41 / 22.65 ±0.20 / 22.88 ms │        21.57 / 22.47 ±1.10 / 24.58 ms │     no change │
│ QQuery 33 │        40.20 / 41.04 ±0.64 / 41.91 ms │        39.86 / 40.28 ±0.25 / 40.62 ms │     no change │
│ QQuery 34 │        10.31 / 10.50 ±0.26 / 11.01 ms │        10.47 / 10.73 ±0.18 / 11.01 ms │     no change │
│ QQuery 35 │        83.09 / 83.72 ±0.55 / 84.67 ms │        82.56 / 83.96 ±1.97 / 87.88 ms │     no change │
│ QQuery 36 │           6.89 / 7.03 ±0.10 / 7.18 ms │           6.64 / 6.83 ±0.16 / 7.08 ms │     no change │
│ QQuery 37 │           7.77 / 7.96 ±0.16 / 8.16 ms │           7.42 / 7.61 ±0.13 / 7.80 ms │     no change │
│ QQuery 38 │        71.83 / 73.16 ±0.99 / 74.49 ms │        71.32 / 72.19 ±0.75 / 73.44 ms │     no change │
│ QQuery 39 │     104.30 / 104.97 ±0.80 / 106.47 ms │     103.35 / 104.38 ±1.31 / 106.92 ms │     no change │
│ QQuery 40 │        24.20 / 25.24 ±0.95 / 27.03 ms │        23.95 / 24.62 ±0.41 / 25.07 ms │     no change │
│ QQuery 41 │        15.12 / 15.51 ±0.44 / 16.32 ms │        14.79 / 14.92 ±0.08 / 14.98 ms │     no change │
│ QQuery 42 │        24.60 / 24.90 ±0.26 / 25.38 ms │        24.50 / 24.62 ±0.11 / 24.78 ms │     no change │
│ QQuery 43 │           5.58 / 5.68 ±0.11 / 5.89 ms │           5.44 / 5.55 ±0.11 / 5.77 ms │     no change │
│ QQuery 44 │        11.79 / 11.91 ±0.15 / 12.19 ms │        11.29 / 11.53 ±0.14 / 11.70 ms │     no change │
│ QQuery 45 │        44.09 / 44.44 ±0.39 / 45.11 ms │        43.10 / 43.84 ±0.53 / 44.61 ms │     no change │
│ QQuery 46 │        14.13 / 14.40 ±0.19 / 14.60 ms │        14.29 / 14.51 ±0.22 / 14.84 ms │     no change │
│ QQuery 47 │     246.92 / 250.95 ±2.78 / 254.26 ms │     247.66 / 249.87 ±1.67 / 252.28 ms │     no change │
│ QQuery 48 │     105.51 / 107.11 ±1.40 / 108.95 ms │     105.60 / 106.08 ±0.70 / 107.43 ms │     no change │
│ QQuery 49 │        81.35 / 81.88 ±0.48 / 82.77 ms │        81.78 / 83.00 ±1.19 / 84.99 ms │     no change │
│ QQuery 50 │        61.08 / 63.04 ±2.47 / 67.54 ms │        60.83 / 61.58 ±0.48 / 62.03 ms │     no change │
│ QQuery 51 │       91.92 / 95.87 ±2.68 / 100.23 ms │        92.21 / 95.20 ±2.12 / 98.40 ms │     no change │
│ QQuery 52 │        24.76 / 25.19 ±0.28 / 25.51 ms │        24.46 / 24.68 ±0.12 / 24.82 ms │     no change │
│ QQuery 53 │        31.34 / 31.49 ±0.14 / 31.67 ms │        30.85 / 31.13 ±0.15 / 31.28 ms │     no change │
│ QQuery 54 │        56.47 / 58.07 ±2.35 / 62.73 ms │        55.60 / 56.48 ±0.63 / 57.22 ms │     no change │
│ QQuery 55 │        24.10 / 24.37 ±0.24 / 24.75 ms │        23.99 / 24.93 ±1.19 / 27.24 ms │     no change │
│ QQuery 56 │        41.50 / 41.69 ±0.21 / 42.07 ms │        40.43 / 40.96 ±0.55 / 41.97 ms │     no change │
│ QQuery 57 │     182.97 / 187.00 ±4.47 / 195.47 ms │     182.33 / 184.30 ±1.42 / 185.96 ms │     no change │
│ QQuery 58 │     121.88 / 123.01 ±0.66 / 123.94 ms │     118.32 / 118.81 ±0.44 / 119.43 ms │     no change │
│ QQuery 59 │     119.94 / 121.35 ±1.30 / 123.60 ms │     120.05 / 121.08 ±0.54 / 121.58 ms │     no change │
│ QQuery 60 │        41.03 / 41.49 ±0.34 / 41.96 ms │        40.60 / 41.11 ±0.41 / 41.63 ms │     no change │
│ QQuery 61 │        14.59 / 14.84 ±0.16 / 15.03 ms │        14.25 / 14.42 ±0.15 / 14.69 ms │     no change │
│ QQuery 62 │        48.34 / 49.74 ±2.01 / 53.72 ms │        46.76 / 47.31 ±0.38 / 47.94 ms │     no change │
│ QQuery 63 │        31.51 / 31.66 ±0.13 / 31.86 ms │        31.18 / 31.37 ±0.12 / 31.56 ms │     no change │
│ QQuery 64 │     472.81 / 475.03 ±2.39 / 479.33 ms │     469.27 / 473.93 ±3.87 / 480.69 ms │     no change │
│ QQuery 65 │     148.12 / 150.45 ±2.50 / 155.04 ms │     145.47 / 148.09 ±1.63 / 150.05 ms │     no change │
│ QQuery 66 │        86.21 / 86.72 ±0.49 / 87.54 ms │        84.14 / 86.77 ±3.32 / 93.25 ms │     no change │
│ QQuery 67 │     263.65 / 268.45 ±5.14 / 276.94 ms │     265.24 / 268.40 ±3.41 / 274.38 ms │     no change │
│ QQuery 68 │        14.52 / 14.63 ±0.07 / 14.72 ms │        14.41 / 14.59 ±0.11 / 14.74 ms │     no change │
│ QQuery 69 │        78.63 / 80.31 ±2.61 / 85.47 ms │        78.16 / 78.78 ±0.71 / 80.15 ms │     no change │
│ QQuery 70 │     110.40 / 111.99 ±2.23 / 116.39 ms │     106.52 / 113.35 ±5.99 / 122.62 ms │     no change │
│ QQuery 71 │        36.58 / 36.65 ±0.08 / 36.81 ms │        36.06 / 36.89 ±1.20 / 39.27 ms │     no change │
│ QQuery 72 │ 2124.12 / 2213.61 ±52.37 / 2274.74 ms │ 2128.60 / 2246.57 ±86.51 / 2376.59 ms │     no change │
│ QQuery 73 │        10.16 / 12.84 ±4.76 / 22.33 ms │        10.19 / 10.34 ±0.14 / 10.56 ms │ +1.24x faster │
│ QQuery 74 │     199.80 / 201.22 ±1.44 / 203.76 ms │     191.46 / 194.95 ±3.51 / 201.50 ms │     no change │
│ QQuery 75 │     150.81 / 151.60 ±0.86 / 153.28 ms │     150.47 / 151.76 ±1.61 / 154.89 ms │     no change │
│ QQuery 76 │        36.12 / 36.64 ±0.31 / 37.09 ms │        35.69 / 36.18 ±0.26 / 36.50 ms │     no change │
│ QQuery 77 │        62.90 / 63.66 ±0.71 / 64.76 ms │        62.60 / 64.50 ±2.30 / 68.90 ms │     no change │
│ QQuery 78 │     195.26 / 197.04 ±1.08 / 198.30 ms │     194.61 / 196.69 ±2.28 / 200.53 ms │     no change │
│ QQuery 79 │        69.58 / 70.35 ±1.34 / 73.02 ms │        68.30 / 68.77 ±0.40 / 69.47 ms │     no change │
│ QQuery 80 │     102.98 / 105.49 ±2.91 / 111.13 ms │     102.14 / 103.77 ±1.11 / 104.90 ms │     no change │
│ QQuery 81 │        25.68 / 25.94 ±0.20 / 26.16 ms │        25.60 / 26.79 ±1.36 / 29.40 ms │     no change │
│ QQuery 82 │        17.16 / 17.64 ±0.56 / 18.75 ms │        17.46 / 17.77 ±0.20 / 18.10 ms │     no change │
│ QQuery 83 │        38.74 / 39.74 ±1.58 / 42.90 ms │        38.51 / 39.18 ±0.48 / 39.94 ms │     no change │
│ QQuery 84 │        44.42 / 44.79 ±0.28 / 45.24 ms │        44.45 / 45.42 ±1.34 / 48.06 ms │     no change │
│ QQuery 85 │     138.86 / 140.46 ±1.59 / 142.99 ms │     139.50 / 140.45 ±0.69 / 141.52 ms │     no change │
│ QQuery 86 │        26.00 / 26.16 ±0.20 / 26.53 ms │        25.83 / 26.26 ±0.48 / 27.18 ms │     no change │
│ QQuery 87 │        71.92 / 72.75 ±0.65 / 73.88 ms │        71.98 / 73.19 ±0.78 / 74.29 ms │     no change │
│ QQuery 88 │        66.48 / 67.90 ±2.33 / 72.54 ms │        66.38 / 67.13 ±0.41 / 67.54 ms │     no change │
│ QQuery 89 │        36.94 / 37.35 ±0.32 / 37.86 ms │        37.34 / 37.97 ±0.53 / 38.94 ms │     no change │
│ QQuery 90 │        18.66 / 20.03 ±2.36 / 24.75 ms │        18.37 / 18.74 ±0.25 / 18.99 ms │ +1.07x faster │
│ QQuery 91 │        53.43 / 54.01 ±0.37 / 54.49 ms │        53.32 / 54.21 ±0.86 / 55.81 ms │     no change │
│ QQuery 92 │        30.22 / 30.48 ±0.30 / 30.98 ms │        29.94 / 30.37 ±0.29 / 30.73 ms │     no change │
│ QQuery 93 │        51.27 / 51.95 ±0.47 / 52.58 ms │        51.59 / 51.84 ±0.20 / 52.21 ms │     no change │
│ QQuery 94 │        38.73 / 39.16 ±0.30 / 39.59 ms │        39.33 / 39.71 ±0.39 / 40.46 ms │     no change │
│ QQuery 95 │        86.08 / 86.97 ±1.13 / 89.09 ms │        85.89 / 87.13 ±1.08 / 89.05 ms │     no change │
│ QQuery 96 │        25.20 / 25.68 ±0.39 / 26.34 ms │        25.14 / 25.36 ±0.18 / 25.63 ms │     no change │
│ QQuery 97 │        47.21 / 47.48 ±0.18 / 47.76 ms │        46.99 / 47.62 ±0.48 / 48.38 ms │     no change │
│ QQuery 98 │        42.78 / 44.13 ±0.88 / 45.46 ms │        43.48 / 44.16 ±0.79 / 45.65 ms │     no change │
│ QQuery 99 │        72.18 / 72.46 ±0.23 / 72.77 ms │        70.81 / 71.74 ±0.65 / 72.53 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                           ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                           │ 10886.26ms │
│ Total Time (neilc_perf-predicate-reorder)   │ 10860.81ms │
│ Average Time (HEAD)                         │   109.96ms │
│ Average Time (neilc_perf-predicate-reorder) │   109.71ms │
│ Queries Faster                              │          2 │
│ Queries Slower                              │          0 │
│ Queries with No Change                      │         97 │
│ Queries with Failure                        │          0 │
└─────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric	Value
Wall time	55.0s
Peak memory	6.6 GiB
Avg memory	6.0 GiB
CPU user	244.9s
CPU sys	5.3s
Peak spill	0 B

tpcds — branch

Metric	Value
Wall time	55.0s
Peak memory	6.8 GiB
Avg memory	6.2 GiB
CPU user	241.2s
CPU sys	5.4s
Peak spill	0 B

adriangbot · 2026-05-18T17:41:58Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Details

Comparing HEAD and neilc_perf-predicate-reorder
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃          neilc_perf-predicate-reorder ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.22 / 4.74 ±6.91 / 18.56 ms │          1.25 / 4.81 ±6.93 / 18.67 ms │     no change │
│ QQuery 1  │        12.37 / 12.77 ±0.25 / 13.07 ms │        12.58 / 13.04 ±0.29 / 13.36 ms │     no change │
│ QQuery 2  │        37.17 / 37.55 ±0.38 / 38.07 ms │        35.95 / 36.20 ±0.38 / 36.95 ms │     no change │
│ QQuery 3  │        31.03 / 31.79 ±0.54 / 32.61 ms │        31.88 / 32.84 ±0.83 / 34.34 ms │     no change │
│ QQuery 4  │    222.56 / 236.76 ±12.87 / 259.32 ms │     240.83 / 245.21 ±3.72 / 249.89 ms │     no change │
│ QQuery 5  │     268.66 / 280.28 ±8.27 / 292.42 ms │     288.03 / 295.85 ±4.28 / 299.28 ms │  1.06x slower │
│ QQuery 6  │           6.28 / 6.71 ±0.29 / 7.05 ms │           6.14 / 6.63 ±0.55 / 7.63 ms │     no change │
│ QQuery 7  │        13.66 / 13.77 ±0.09 / 13.87 ms │        14.30 / 14.59 ±0.20 / 14.89 ms │  1.06x slower │
│ QQuery 8  │    321.47 / 333.23 ±12.39 / 350.16 ms │     321.88 / 331.11 ±9.96 / 344.81 ms │     no change │
│ QQuery 9  │     458.83 / 467.84 ±8.18 / 482.64 ms │     456.02 / 464.06 ±8.41 / 478.35 ms │     no change │
│ QQuery 10 │        69.71 / 71.87 ±2.04 / 75.62 ms │        70.19 / 72.66 ±1.52 / 74.97 ms │     no change │
│ QQuery 11 │        81.30 / 85.67 ±4.61 / 93.63 ms │        84.86 / 85.48 ±0.46 / 86.07 ms │     no change │
│ QQuery 12 │    263.44 / 273.85 ±11.80 / 296.32 ms │     285.50 / 290.38 ±5.13 / 299.42 ms │  1.06x slower │
│ QQuery 13 │    358.03 / 375.84 ±15.07 / 395.05 ms │     368.44 / 378.59 ±7.33 / 387.11 ms │     no change │
│ QQuery 14 │    276.98 / 287.98 ±11.44 / 307.13 ms │     283.64 / 288.24 ±4.80 / 296.63 ms │     no change │
│ QQuery 15 │    273.21 / 291.43 ±12.39 / 312.00 ms │    267.62 / 284.47 ±12.61 / 298.99 ms │     no change │
│ QQuery 16 │     618.62 / 637.72 ±9.77 / 646.04 ms │    609.40 / 637.13 ±19.89 / 657.85 ms │     no change │
│ QQuery 17 │     638.36 / 643.07 ±3.72 / 648.67 ms │    621.87 / 642.25 ±11.97 / 659.27 ms │     no change │
│ QQuery 18 │ 1311.64 / 1332.85 ±13.92 / 1350.76 ms │ 1280.47 / 1301.21 ±20.06 / 1331.61 ms │     no change │
│ QQuery 19 │        28.71 / 32.34 ±3.96 / 37.75 ms │        27.90 / 30.31 ±4.63 / 39.58 ms │ +1.07x faster │
│ QQuery 20 │    525.47 / 540.51 ±15.98 / 568.87 ms │     520.45 / 525.64 ±3.61 / 530.08 ms │     no change │
│ QQuery 21 │     600.54 / 602.49 ±1.28 / 604.28 ms │     523.74 / 530.97 ±6.28 / 542.27 ms │ +1.13x faster │
│ QQuery 22 │ 1067.19 / 1078.47 ±11.06 / 1098.24 ms │ 1003.29 / 1019.80 ±13.00 / 1038.00 ms │ +1.06x faster │
│ QQuery 23 │ 3300.58 / 3323.51 ±14.12 / 3342.32 ms │ 3201.35 / 3237.89 ±33.05 / 3291.06 ms │     no change │
│ QQuery 24 │        42.48 / 42.84 ±0.25 / 43.18 ms │        42.73 / 46.60 ±4.07 / 54.29 ms │  1.09x slower │
│ QQuery 25 │     112.77 / 118.77 ±6.76 / 131.94 ms │     111.16 / 115.54 ±2.72 / 118.42 ms │     no change │
│ QQuery 26 │        42.53 / 43.52 ±0.64 / 44.23 ms │        43.72 / 44.74 ±0.93 / 46.15 ms │     no change │
│ QQuery 27 │    667.92 / 684.46 ±10.62 / 700.94 ms │     682.06 / 687.57 ±2.89 / 690.26 ms │     no change │
│ QQuery 28 │ 3040.07 / 3074.66 ±22.96 / 3107.51 ms │ 3046.17 / 3085.16 ±28.67 / 3122.35 ms │     no change │
│ QQuery 29 │        42.87 / 52.04 ±7.54 / 60.33 ms │       41.55 / 59.10 ±13.93 / 80.53 ms │  1.14x slower │
│ QQuery 30 │     311.80 / 320.81 ±5.78 / 329.40 ms │     301.57 / 314.79 ±9.17 / 325.38 ms │     no change │
│ QQuery 31 │     288.02 / 297.61 ±8.23 / 309.39 ms │    288.80 / 301.87 ±11.18 / 317.31 ms │     no change │
│ QQuery 32 │   943.55 / 967.83 ±24.90 / 1005.96 ms │   975.64 / 987.07 ±10.39 / 1000.10 ms │     no change │
│ QQuery 33 │ 1503.31 / 1515.49 ±10.58 / 1530.55 ms │ 1414.56 / 1468.40 ±48.86 / 1550.63 ms │     no change │
│ QQuery 34 │ 1421.13 / 1504.85 ±50.78 / 1581.24 ms │ 1497.60 / 1518.62 ±18.77 / 1546.10 ms │     no change │
│ QQuery 35 │    302.24 / 326.70 ±37.12 / 400.16 ms │    297.34 / 306.06 ±10.18 / 325.93 ms │ +1.07x faster │
│ QQuery 36 │        71.84 / 77.86 ±5.97 / 86.60 ms │        67.74 / 72.53 ±5.42 / 82.60 ms │ +1.07x faster │
│ QQuery 37 │        37.34 / 38.34 ±1.13 / 40.37 ms │        36.42 / 38.71 ±2.87 / 44.15 ms │     no change │
│ QQuery 38 │        41.04 / 43.54 ±3.00 / 48.79 ms │        42.70 / 49.91 ±6.17 / 59.73 ms │  1.15x slower │
│ QQuery 39 │     144.06 / 154.30 ±7.01 / 161.17 ms │     143.14 / 156.98 ±9.25 / 172.22 ms │     no change │
│ QQuery 40 │        15.12 / 15.46 ±0.38 / 16.14 ms │        14.88 / 17.23 ±2.78 / 22.47 ms │  1.11x slower │
│ QQuery 41 │        14.41 / 17.27 ±5.19 / 27.65 ms │        14.51 / 18.83 ±4.97 / 26.98 ms │  1.09x slower │
│ QQuery 42 │        14.02 / 14.26 ±0.18 / 14.57 ms │        14.02 / 14.14 ±0.11 / 14.32 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                           ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                           │ 20313.62ms │
│ Total Time (neilc_perf-predicate-reorder)   │ 20073.22ms │
│ Average Time (HEAD)                         │   472.41ms │
│ Average Time (neilc_perf-predicate-reorder) │   466.82ms │
│ Queries Faster                              │          5 │
│ Queries Slower                              │          8 │
│ Queries with No Change                      │         30 │
│ Queries with Failure                        │          0 │
└─────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	105.0s
Peak memory	30.8 GiB
Avg memory	23.1 GiB
CPU user	1052.2s
CPU sys	69.3s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	105.0s
Peak memory	31.6 GiB
Avg memory	23.3 GiB
CPU user	1040.0s
CPU sys	69.7s
Peak spill	0 B

asolimando · 2026-05-18T17:48:30Z

I agree on the general idea, but considering short-circuiting and also conjuncts' selectivity, this can actually backfire in practice, so a config knob seems important to have (I might have missed it, but I couldn't see it in the PR).

Ideally reorder_predicates should be overridable, to give downstream systems a chance to change behavior if needed, and also to complement knowledge about UDFs that core DF can't have.

neilconway · 2026-05-18T19:11:48Z

@asolimando Thanks for the feedback!

I agree on the general idea, but considering short-circuiting and also conjuncts' selectivity, this can actually backfire in practice, so a config knob seems important to have (I might have missed it, but I couldn't see it in the PR).

It's possible to add a knob if we feel like there is a need for one, but I'd rather not add one reflexively. I made the definition of "cheap" vs. "expensive" very conservative partly in hopes of avoiding a config knob. Looking more closely at the previous rewriting logic, we actually have not been respecting the predicate order in the query text for a while: simplify_predicates already reordered predicates quite freely, and AFAIK no one has complained about that. Users that need to fix an evaluation order should very likely be using CASE ... WHEN anyway.

Ideally reorder_predicates should be overridable, to give downstream systems a chance to change behavior if needed, and also to complement knowledge about UDFs that core DF can't have.

I think per-UDF extensibility to express some notion of cost or selectivity could definitely make sense, although that's a much bigger task to take on.

neilconway · 2026-05-18T19:16:49Z

BTW I checked all of the ClickBench queries where we see minor regressions (40, 41, 38, 24, 12, 5, and 7), and none of them have predicates that will be reordered by this PR. So I suspect those regressions are just noise.

adriangb · 2026-05-18T20:08:37Z

Agreed that the regressions look like noise. But also, the only real win seems to be Q73 in TPC-DS? What is your intuition for where the win is coming from? I'm wondering if it just happens to hit a positive case that #22144 would already handle, or if it's completely unrelated (e.g. in a complex join key).

neilconway · 2026-05-19T01:34:54Z

@adriangb

But also, the only real win seems to be Q73 in TPC-DS? What is your intuition for where the win is coming from? I'm wondering if it just happens to hit a positive case that #22144 would already handle, or if it's completely unrelated (e.g. in a complex join key).

I see consistent improvements on ClickBench Q21 (~10-13%) and Q22 (~5%), which are both cases where we now reorder LIKE predicates after simple comparisons.

Interestingly, this PR does not fire for Q73 in TPC-DS, so I'm not sure what is going on there 😊 I couldn't repro an improvement locally, so I guess it is just benchmark noise.

To see improvements for a broader class of queries, we'd need to extend the heuristics to consider more criteria.

2010YOUY01 · 2026-05-19T02:38:34Z

One challenge is that "cheap" means different things depending on where the predicate is evaluated:

In FilterExec, for in-memory evaluation, I think this is a great default heuristic.
In Parquet decoding with late materialization, things can get trickier. For example, given (c1 LIKE '%foo%bar%') AND (c2 > 0) AND (c3 > 0), if the regex is very selective while the other predicates are not, and c2 / c3 are heavily compressed, we might want to decode and evaluate the regex conjunct first.

Perhaps this kind of reordering could be implemented as a runtime optimization inside FilterExec: for the first batch, track each conjunct's evaluation time and selectivity, then decide the order dynamically. One nice benefit of this approach is that we don't have to hardcode whether an expression is "expensive" or "cheap".
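The following is a minimal sketch of that runtime idea. It assumes each conjunct's per-row cost and pass fraction have already been measured on a first batch. `ConjunctStats`, `rank`, and `adaptive_order` are hypothetical names, not DataFusion APIs. The cost / (1 - pass fraction) rule is the classic ordering for independent filters, and it is not necessarily the metric #22144 settled on.

```rust
/// Statistics for one conjunct, measured on a sample batch (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct ConjunctStats {
    /// Average evaluation cost per input row (e.g. nanoseconds).
    cost_per_row: f64,
    /// Fraction of input rows that pass the conjunct, in 0.0..=1.0.
    pass_fraction: f64,
}

/// Cost paid per unit of rows removed; lower ranks should run earlier.
/// A conjunct that removes no rows gets an infinite rank and runs last.
fn rank(s: &ConjunctStats) -> f64 {
    let drop_fraction = 1.0 - s.pass_fraction;
    if drop_fraction <= 0.0 {
        f64::INFINITY
    } else {
        s.cost_per_row / drop_fraction
    }
}

/// Returns conjunct indices in the order they should be evaluated.
fn adaptive_order(stats: &[ConjunctStats]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..stats.len()).collect();
    // `sort_by` is stable, so ties keep their original (query-text) order.
    order.sort_by(|&a, &b| rank(&stats[a]).total_cmp(&rank(&stats[b])));
    order
}
```

In the toy usage, a very selective expensive conjunct can move ahead of a cheap conjunct that barely filters anything.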

adriangb · 2026-05-19T03:28:57Z

Perhaps this kind of reordering could be implemented as a runtime optimization inside FilterExec: for the first batch, track each conjunct's evaluation time and selectivity, then decide the order dynamically. One nice benefit of this approach is that we don't have to hardcode whether an expression is "expensive" or "cheap".

That is exactly what #22144 does 😃. I think we could reuse pretty much the exact same machinery. It took a lot of iterations to arrive at the right metrics: you want to take into account time spent on compute, not just selectivity, etc.

Someone please correct me if I'm wrong, but IIRC, because of the tree structure, we currently compute each side of a binary expression and apply the slice to the array, then compute the next side, etc. I wonder if an approach like apache/arrow-rs#9659 might help mitigate the overhead from non-selective masks?
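Here is a toy, row-at-a-time sketch that makes the "evaluate later conjuncts only on surviving rows" cost argument concrete. `filter_with_selection` is hypothetical. It ignores vectorization and mask-materialization overheads, which are exactly what the arrow-rs question above is about. It only counts how many predicate invocations each order needs.

```rust
/// Evaluates conjuncts left to right, running each one only on the rows that
/// survived every earlier conjunct. Returns the surviving row indices and the
/// total number of predicate invocations, a rough proxy for work done.
fn filter_with_selection<T>(
    rows: &[T],
    conjuncts: &[&dyn Fn(&T) -> bool],
) -> (Vec<usize>, usize) {
    // Start with every row selected.
    let mut selection: Vec<usize> = (0..rows.len()).collect();
    let mut invocations = 0;
    for conjunct in conjuncts {
        invocations += selection.len();
        // Keep only the row indices that also pass this conjunct.
        selection.retain(|&i| conjunct(&rows[i]));
    }
    (selection, invocations)
}
```

On 100 rows, `x < 10 AND x % 2 = 0` needs 110 invocations when the selective comparison runs first, and 150 when the order is swapped. The result is the same either way.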

asolimando · 2026-05-19T08:25:48Z

Perhaps this kind of reordering could be implemented as a runtime optimization inside FilterExec: for the first batch, track each conjunct's evaluation time and selectivity, then decide the order dynamically. One nice benefit of this approach is that we don't have to hardcode whether an expression is "expensive" or "cheap".

I think it's still useful to be able to reorder "statically", since you might want to use statistics for that. Statistics can be more stable than dynamic approaches, which are usually sensitive to the "shape" of the first part of the data and usually don't revisit their choice (and even when they do, the choice might fluctuate, whereas in some cases the static order is the optimal one).

I think it's good to have multiple options, as long as downstream users can mix and match what works best for them, and they can "easily" correct course for problematic queries without the need of code changes.

alamb · 2026-05-27T17:13:25Z

I think reordering predicates based on planning-time info will suffer from bad statistics (as do all such plan-time decisions). If we can figure out some way to make the decision more dynamic, I think it will be a better design in the long run / harder to get wrong.

neilconway · 2026-05-27T17:28:21Z

I think reordering predicates based on planning-time info will suffer from bad statistics (as do all such plan-time decisions). If we can figure out some way to make the decision more dynamic, I think it will be a better design in the long run / harder to get wrong.

I agree that doing predicate reordering dynamically has a lot of promise (as well as some potential concerns, like overhead and implementation complexity). But a simple static predicate reordering does not preclude also doing dynamic reordering; indeed, we need some initial predicate order to dynamically adjust later. We already reorder predicates today, in simplify_predicates; this is just a different and I'd say overall slightly better ordering heuristic.

alamb

Sorry for not responding to this one with more detail until now. Thank you for the thoughtful comments and review @kosiew @2010YOUY01 and @asolimando

TLDR is if this PR had a config.optimizer.reorder_filters type config knob to turn this optimization off, I think it would be a good addition to DataFusion.

In general I agree that using static heuristics such as this will result in better plans most of the time.

My concern is that there will certainly be cases where an "expensive" predicate is actually very selective and should be evaluated before "inexpensive" (but unselective) ones.

For example

WHERE col LIKE '....' AND col = 'bar' and col = 'baz' AND col = ...

If col LIKE '....' is super selective (selects a single row) and the others are not, then doing it first is probably the right thing to do.

We have had cases like this before in DataFusion, and what we have done is leave an "escape hatch" in the form of a config parameter, so that if a user has some query where the old heuristic ("syntactic optimizer!") works better, they can avoid the new optimization.
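Here is a back-of-the-envelope sketch of that trade-off. It assumes independent predicates with fixed per-row costs. The costs and pass fractions in the usage are made-up illustrations, not measurements, and `expected_cost_per_row` is a hypothetical helper.

```rust
/// Expected per-row cost of evaluating conjuncts in the given order, assuming
/// independent predicates. Each entry is (cost per row, fraction of rows kept).
/// A conjunct is only evaluated on rows that passed every earlier conjunct.
fn expected_cost_per_row(order: &[(f64, f64)]) -> f64 {
    let mut surviving = 1.0; // fraction of rows still alive
    let mut total = 0.0;
    for &(cost, pass_fraction) in order {
        total += surviving * cost;
        surviving *= pass_fraction;
    }
    total
}
```

Take three equality comparisons (cost 1, each keeping 99% of rows) and a LIKE (cost 20, keeping 0.01%). Running the LIKE first costs about 20.0003 per row, versus about 22.376 for cheap-first. If each comparison instead keeps 50% of rows, cheap-first drops to 4.25 and wins easily.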

cc @adriangb

alamb · 2026-05-28T17:31:07Z

+///
+/// Returns `(predicates, changed)`. When `changed` is `false` the input was
+/// already cheap-first and the caller can skip rebuilding the conjunction.
+pub(crate) fn reorder_predicates(predicates: Vec<Expr>) -> (Vec<Expr>, bool) {


It could also potentially return Transformed<Vec<Expr>>, which already carries the "was this thing changed" flag.
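Here is a sketch of that signature. It uses a simplified local stand-in for DataFusion's `Transformed`; the real `datafusion_common::tree_node::Transformed` also carries a `TreeNodeRecursion` field. The generic `is_cheap` callback is hypothetical. It replaces the PR's `is_cheap_node` classification so the example stays self-contained.

```rust
/// Simplified local stand-in for DataFusion's `Transformed`: the data plus a
/// flag saying whether it was changed.
#[derive(Debug)]
struct Transformed<T> {
    data: T,
    transformed: bool,
}

impl<T> Transformed<T> {
    fn yes(data: T) -> Self {
        Self { data, transformed: true }
    }
    fn no(data: T) -> Self {
        Self { data, transformed: false }
    }
}

/// Moves cheap predicates ahead of expensive ones, preserving relative order
/// within each class, and reports whether anything moved.
fn reorder_predicates<T>(
    predicates: Vec<T>,
    is_cheap: impl Fn(&T) -> bool,
) -> Transformed<Vec<T>> {
    // Already cheap-first if no expensive predicate is directly followed by a
    // cheap one.
    let already_ordered = predicates
        .windows(2)
        .all(|w| is_cheap(&w[0]) || !is_cheap(&w[1]));
    if already_ordered {
        return Transformed::no(predicates);
    }
    // `partition` keeps the original relative order inside each group.
    let (mut cheap, expensive): (Vec<T>, Vec<T>) =
        predicates.into_iter().partition(|p| is_cheap(p));
    cheap.extend(expensive);
    Transformed::yes(cheap)
}
```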

alamb · 2026-05-28T17:32:31Z

+        return (predicates, false);
+    }
+
+    let mut cheap = Vec::with_capacity(predicates.len());


You could probably save the allocation / sort in place using https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sort_by and a custom comparator
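Here is a sketch of the in-place variant. Rust's `slice::sort_by_key` is a stable sort, and `false < true` for `bool`. Keying on "is expensive" therefore moves cheap predicates to the front while preserving the original order within each class. The std stable sort may still allocate an internal scratch buffer, so this mainly avoids building the two intermediate `Vec`s. `reorder_in_place` is a hypothetical name.

```rust
/// Stable, in-place reorder that moves cheap predicates first. Returns `true`
/// if the order changed.
fn reorder_in_place<T>(predicates: &mut [T], is_cheap: impl Fn(&T) -> bool) -> bool {
    // The order changes only if some expensive predicate precedes a cheap one.
    let changed = predicates
        .windows(2)
        .any(|w| !is_cheap(&w[0]) && is_cheap(&w[1]));
    if changed {
        // Stable sort: key `false` (cheap) sorts before `true` (expensive), and
        // predicates with equal keys keep their original relative order.
        predicates.sort_by_key(|p| !is_cheap(p));
    }
    changed
}
```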

alamb · 2026-05-28T17:34:19Z

+/// comparisons, negations, casts), and consider anything outside this list to
+/// be expensive. New/unrecognized expressions therefore default to being
+/// expensive.
+fn is_cheap_node(expr: &Expr) -> bool {


I would recommend:

Make this a method on Expr (so it is more visible and potentially easier to override for functions for example)

Return a u8 or something else that could represent different levels of cheapness rather than just a boolean
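For illustration only, here is a sketch of what both suggestions might look like together. It uses a toy expression type rather than DataFusion's `Expr`. `CostTier`, `ToyExpr`, `cost_tier`, `order_by_cost`, and the per-UDF `declared_cost` override are all hypothetical. The `declared_cost` override echoes the earlier point about UDF knowledge that core DataFusion can't have.

```rust
/// Coarse cost levels; the derived `Ord` gives Cheap < Moderate < Expensive.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum CostTier {
    Cheap,
    Moderate,
    Expensive,
}

/// Toy expression type standing in for DataFusion's `Expr` (hypothetical).
enum ToyExpr {
    Column(&'static str),
    Literal(i64),
    Compare(Box<ToyExpr>, Box<ToyExpr>),
    Like(Box<ToyExpr>, &'static str),
    /// A user-defined function that may declare its own cost tier.
    Udf {
        declared_cost: Option<CostTier>,
        args: Vec<ToyExpr>,
    },
}

impl ToyExpr {
    /// An expression costs at least as much as its most expensive child.
    fn cost_tier(&self) -> CostTier {
        match self {
            ToyExpr::Column(_) | ToyExpr::Literal(_) => CostTier::Cheap,
            ToyExpr::Compare(l, r) => l.cost_tier().max(r.cost_tier()),
            ToyExpr::Like(input, _) => input.cost_tier().max(CostTier::Moderate),
            // UDFs without a declared cost default to the most expensive tier.
            ToyExpr::Udf { declared_cost, args } => args
                .iter()
                .map(|a| a.cost_tier())
                .fold(declared_cost.unwrap_or(CostTier::Expensive), |acc, t| acc.max(t)),
        }
    }
}

/// Orders predicates cheapest tier first; `sort_by_key` is stable, so
/// predicates in the same tier keep their original relative order.
fn order_by_cost(predicates: &mut [ToyExpr]) {
    predicates.sort_by_key(|p| p.cost_tier());
}
```

If only `Cheap` and `Expensive` are used, this degenerates to the PR's two-class behavior.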

Both of these changes could make sense, but I think this moves us in the direction of having a full-blown function cost model, which is the kind of API I was hoping to avoid committing to right now. IOW, if we start with something minimal and conservative like the current approach, we can always extend it to something much more ambitious in the future.

alamb · 2026-05-28T17:34:46Z

+        | Expr::Cast(_)
+        | Expr::TryCast(_)
+        | Expr::InList(_) => true,
+        // BinaryExpr is cheap unless the operator is LIKE or regexp matching.


regexp functions also come to mind as being relatively expensive to evaluate

case expressions too?

@alamb regexp functions and regexp operators (as well as LIKE and SIMILAR TO) will be considered expensive in the current version of the PR.

@adriangb A CASE expression is considered cheap if every sub-expression in the CASE is cheap. Personally that seems pretty reasonable. Maybe you could find a machine-generated CASE expression that is so large or deeply nested that it is expensive to evaluate despite consisting only of cheap operations? We could also move CASE to be expensive if you'd prefer, I don't feel super strongly about it.
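Here is a sketch of the "cheap only if every node is cheap" rule described above. Unlisted nodes (here, every function call) default to expensive. `MiniExpr`, `Op`, `children`, and `is_cheap` are toy stand-ins, and `is_cheap_node` is only loosely modeled on the PR's function of the same name. The set of cheap nodes is deliberately smaller than the PR's, which also lists, e.g., `TryCast` and `InList`.

```rust
/// Toy operators; only LIKE and regex matching are treated as expensive.
enum Op {
    Eq,
    Like,
    RegexMatch,
}

/// Toy expression tree (hypothetical, not DataFusion's `Expr`).
enum MiniExpr {
    Column(&'static str),
    Literal(i64),
    Not(Box<MiniExpr>),
    Cast(Box<MiniExpr>),
    Binary { op: Op, left: Box<MiniExpr>, right: Box<MiniExpr> },
    Case { when_then: Vec<(MiniExpr, MiniExpr)>, else_expr: Option<Box<MiniExpr>> },
    ScalarFunction { name: &'static str, args: Vec<MiniExpr> },
}

/// Classifies a single node, ignoring its children. Anything not explicitly
/// listed as cheap (here: every scalar function) defaults to expensive.
fn is_cheap_node(e: &MiniExpr) -> bool {
    match e {
        MiniExpr::Column(_)
        | MiniExpr::Literal(_)
        | MiniExpr::Not(_)
        | MiniExpr::Cast(_)
        | MiniExpr::Case { .. } => true,
        MiniExpr::Binary { op, .. } => !matches!(op, Op::Like | Op::RegexMatch),
        _ => false,
    }
}

/// Direct children of a node.
fn children(e: &MiniExpr) -> Vec<&MiniExpr> {
    match e {
        MiniExpr::Column(_) | MiniExpr::Literal(_) => vec![],
        MiniExpr::Not(inner) | MiniExpr::Cast(inner) => vec![&**inner],
        MiniExpr::Binary { left, right, .. } => vec![&**left, &**right],
        MiniExpr::Case { when_then, else_expr } => {
            let mut out: Vec<&MiniExpr> = when_then.iter().flat_map(|(w, t)| [w, t]).collect();
            out.extend(else_expr.as_deref());
            out
        }
        MiniExpr::ScalarFunction { args, .. } => args.iter().collect(),
    }
}

/// A predicate is cheap only if every node in its tree is cheap.
fn is_cheap(e: &MiniExpr) -> bool {
    is_cheap_node(e) && children(e).into_iter().all(is_cheap)
}
```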

@alamb regexp functions and regexp operators (as well as LIKE and SIMILAR TO) will be considered expensive in the current version of the PR.

I see -- they would be considered expensive because they are not explicitly considered cheap (along with all other functions, like atan and whatever). That makes sense 🤔

neilconway · 2026-05-31T17:00:02Z

@alamb

if this PR had a config.optimizer.reorder_filters type config knob to turn this optimization off, I think it would be a good addition to DataFusion.

I think it would be misleading if reorder_filters=false disabled only this optimization, because we reorder filters today (and that behavior is not configurable):

datafusion/datafusion/optimizer/src/simplify_expressions/simplify_predicates.rs

Lines 50 to 89 in 3e006c9

    
    // Group predicates by their column reference
    let mut column_predicates: BTreeMap<Column, Vec<Expr>> = BTreeMap::new();
    let mut other_predicates = Vec::new();
    for pred in predicates {
        match &pred {
            Expr::BinaryExpr(BinaryExpr {
                left,
                op:
                    Operator::Gt
                    | Operator::GtEq
                    | Operator::Lt
                    | Operator::LtEq
                    | Operator::Eq,
                right,
            }) => {
                if let (Some(col), Some(_)) =
                    (extract_column_from_expr(left), right.as_literal())
                {
                    column_predicates.entry(col).or_default().push(pred);
                } else if let (Some(_), Some(col)) =
                    (left.as_literal(), extract_column_from_expr(right))
                {
                    column_predicates.entry(col).or_default().push(pred);
                } else {
                    other_predicates.push(pred);
                }
            }
            _ => other_predicates.push(pred),
        }
    }
    // Process each column's predicates to remove redundancies
    let mut result = other_predicates;
    for (_, preds) in column_predicates {
        let simplified = simplify_column_predicates(preds)?;
        result.extend(simplified);
    }
    Ok(result)

We could go further and have reorder_filters=false either disable predicate simplification or ensure that any predicate simplification that is done respects the query text order.
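A stripped-down, std-only model of the grouping step quoted above shows why query-text order is already lost. Non-comparison predicates are emitted first, then the column groups in `BTreeMap` (column-name) order. In this model the LIKE ends up ahead of the comparisons. The `(Option<column>, text)` pairs are a hypothetical stand-in for `Expr`, `regroup` is a hypothetical name, and the `simplify_column_predicates` merging step is omitted.

```rust
use std::collections::BTreeMap;

/// Mimics the grouping in `simplify_predicates`: column-vs-literal comparisons
/// (`Some(column)`) are bucketed per column, everything else (`None`) is kept
/// aside. Each input is `(column, predicate_text)`.
fn regroup(preds: Vec<(Option<&'static str>, &'static str)>) -> Vec<&'static str> {
    let mut column_predicates: BTreeMap<&'static str, Vec<&'static str>> = BTreeMap::new();
    let mut other_predicates = Vec::new();
    for (column, text) in preds {
        match column {
            Some(col) => column_predicates.entry(col).or_default().push(text),
            None => other_predicates.push(text),
        }
    }
    // Non-comparison predicates come first, then each column's group in
    // column-name order (BTreeMap iterates its keys in sorted order).
    let mut result = other_predicates;
    for (_, group) in column_predicates {
        result.extend(group);
    }
    result
}
```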

My broader concern is that it seems pretty fragile to guarantee that filter evaluation order matches query text order. I could see this very easily being violated by other optimizations, either present or future. For example, what about predicate pushdown? If the query has a list of predicates and some of them can be pushed down while others cannot, that would result in predicate evaluation order becoming inconsistent with the query text order.

So we could also disable predicate pushdown when reorder_filters is false... but then what about CNF/DNF-style normalization? Offhand I'd expect that might also change predicate evaluation order, I'd need to check more thoroughly. I'd also expect that when we convert WHERE-clause join filters into ON clauses, we probably don't do that in a way that always respects the query text for all classes of predicates (e.g., non-equijoin filters, volatile expressions, etc).

tldr -- If we actually want to guarantee that evaluation order matches query text order, considerably more work is required than just gating this PR in a boolean variable. Do we have any signal that this is a real pain point with users?

neilconway · 2026-05-31T19:03:45Z

The CI failure looks like an instance of #22621

alamb · 2026-06-01T14:39:43Z

My broader concern is that it seems pretty fragile to guarantee that filter evaluation order matches query text order. I could see this very easily being violated by other optimizations, either present or future. For example, what about predicate pushdown? If the query has a list of predicates and some of them can be pushed down while others cannot, that would result in predicate evaluation order becoming inconsistent with the query text order.

🤔 that is a good point

tldr -- If we actually want to guarantee that evaluation order matches query text order, considerably more work is required than just gating this PR in a boolean variable. Do we have any signal that this is a real pain point with users?

@neilconway you make great points.

The scenario I am imagining in my head is

Someone upgrades to the version of DataFusion that has this feature
Their query performance gets worse because filter order is different due to this change
There is no way to work around the issue without changing the code

You have an excellent point that we don't have any real way to prevent this scenario from happening with other optimizer passes / changes now other than hoping we prevent it with code review 🤔

How do you think we should proceed? Heuristic reordering, and then rely on something like @adriangb's dynamic reordering to avoid the regression?

Adaptive (runtime, stats-based) conjunct reordering for FilterExec #22698

adriangb · 2026-06-01T14:43:04Z

I agree with @neilconway that guaranteeing expression evaluation order is not something we promise now, nor something we should promise.

alamb · 2026-06-01T15:00:57Z

The CI failure looks like an instance of #22621

Thank you for the pointer -- I reviewed that PR and it looks good to me

adriangb · 2026-06-01T15:08:52Z

re dynamic vs. heuristic-based: my feeling is we'll want dynamic adjustment long term. Neil is right that you need to "seed" it in some way, and heuristic-based is one option. But if it ends up providing little benefit for considerable code complexity, it's not worth the step. If it ends up being very little code and helpful in reasonable cases, then it's worth keeping. I also think there are other heuristics we could use, e.g. the size of the columns the filter references, taken from stats (so that a LIKE expression on a small column is not penalized as much as one on a large column, and could beat out a large mathematical expression).
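Here is a sketch of the column-size idea. It assumes an average value width is available from statistics. The `Predicate` enum, `estimated_cost`, and both cost constants are invented for illustration and are not calibrated against DataFusion.

```rust
/// Hypothetical predicate shapes for a size-aware cost estimate.
enum Predicate {
    /// LIKE scans its input, so cost grows with the referenced column's width.
    Like { avg_column_bytes: f64 },
    /// Arithmetic/comparison tree with a fixed per-operator cost.
    Arithmetic { operator_count: u32 },
}

// Made-up relative weights, for illustration only.
const LIKE_COST_PER_BYTE: f64 = 1.0;
const ARITH_COST_PER_OP: f64 = 2.0;

/// Rough per-row cost estimate using the average column width from stats.
fn estimated_cost(p: &Predicate) -> f64 {
    match p {
        Predicate::Like { avg_column_bytes } => LIKE_COST_PER_BYTE * *avg_column_bytes,
        Predicate::Arithmetic { operator_count } => {
            ARITH_COST_PER_OP * f64::from(*operator_count)
        }
    }
}
```

With these weights, a LIKE on a 4-byte column (cost 4) beats a 10-operator arithmetic expression (cost 20), while a LIKE on a 200-byte column (cost 200) does not.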

adriangb · 2026-06-01T15:09:47Z

One thing that came up from #22698: we should probably come up with some benchmarks for cases we think are interesting and adversarial for various implementations and use that to guide development. It seems there is not much signal in our standard queries for this.

neilconway · 2026-06-01T15:12:38Z

@alamb To me, this falls into the category of stuff that might break when you upgrade performance-sensitive apps that are built on a system with a declarative query language / query optimizer. I think it's untenable to promise that no user workloads will see performance regressions from new versions.

The reordering done by this PR is intentionally very conservative / simple, so I would be surprised if we see widespread issues in the field arising from this change. If a user's workload is that sensitive to the exact predicate evaluation order, they might be better off encoding their filtering criteria as a custom UDF.

Dynamic filter reordering would probably help most cases in practice (albeit it might make the actual runtime behavior more unpredictable). At some point in the future, we could also potentially ship either a cost-based optimizer (where users could annotate individual UDFs with cost estimates), and/or some facility for manually specifying properties of the evaluation order (e.g., "hints").

adriangb · 2026-06-01T15:36:36Z

The cost-based optimizer discussion is interesting, but I worry that for analytical systems runtime adaptivity is more impactful and easier to implement universally. Cost-based optimizers are only as good as their estimates, which are hard to get right when we don't have persistent stats, users write custom UDFs, and AI is increasingly writing crazy complex queries.

Cost based optimizers are essential for transactional systems where the average rows per query may be 1, but for the analytical workloads that people tend to run with DataFusion the average rows processed is in the tens if not hundreds of thousands (otherwise batch sizes of 8k and row groups of 1M would make no sense). This gives us an opportunity: if a 1s query is acceptable then running for 100ms / a small fraction of the query in a sub-optimal way while we gather stats but then making the other 900ms take 500ms is a big win. We'll see enough data that we can derive relatively high quality runtime statistics. AFAIK Ballista, Spark, others I don't remember off the top of my head have runtime adaptivity. They may also have a cost based optimizer though 😄.
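Spelled out with the illustrative numbers above (not a measurement), the sampling phase plus the reordered remainder comes to 600 ms instead of 1000 ms, a 40% reduction:

```latex
T_{\text{static}} = 1000\ \text{ms}, \qquad
T_{\text{adaptive}}
  = \underbrace{100\ \text{ms}}_{\text{sub-optimal order while sampling}}
  + \underbrace{500\ \text{ms}}_{\text{remaining 900 ms of work, reordered}}
  = 600\ \text{ms}
```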

…r reordering
- adaptive_filter.slt: results and EXPLAIN are identical with the flag on and off (reordering changes evaluation order only).
- benchmarks/sql_benchmarks/adversarial_filter: a self-contained SQL benchmark suite (synthetic data generated inline via generate_series) of five equally-expensive regexp predicates with the selective one written last, a case where SQL order, the apache#22343 cost heuristic, and BinaryExpr pre-selection all leave the order wrong. Toggle with ADAPTIVE_FILTER_REORDERING:
  BENCH_NAME=adversarial_filter ADAPTIVE_FILTER_REORDERING=true \
  cargo bench --bench sql
  Q01 (selective last): ~1.75x faster at 10M rows (more at higher ADV_ROWS).
  Q02 (selective first, control): neutral, which confirms the win is an ordering fix and the adaptive path adds no overhead when it cannot help.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…r reordering
- adaptive_filter.slt: results and EXPLAIN are identical with the flag on and off (reordering changes evaluation order only).
- benchmarks/sql_benchmarks/adversarial_filter: a self-contained SQL benchmark suite (synthetic data generated inline via generate_series) of five equally-expensive regexp predicates with the selective one written last, a case where SQL order, the apache#22343 cost heuristic, and BinaryExpr pre-selection all leave the order wrong. Toggle with ADAPTIVE_FILTER_REORDERING and run via the standard harness:
  BENCH_NAME=adversarial_filter ADAPTIVE_FILTER_REORDERING=true \
  cargo bench --bench sql
  # or: ADAPTIVE_FILTER_REORDERING=true ./benchmarks/bench.sh run adversarial_filter
  Q01 (selective last): ~1.75x faster at 10M rows (more at higher ADV_ROWS).
  Q02 (selective first, control): neutral, which confirms the win is an ordering fix and the adaptive path adds no overhead when it cannot help.
- bench.sh: add an `adversarial_filter` run target (data generated inline).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…r reordering
- adaptive_filter.slt: results and EXPLAIN are identical with the flag on and off (reordering changes evaluation order only).
- benchmarks/sql_benchmarks/adversarial_filter: a self-contained SQL benchmark suite (synthetic data generated inline via generate_series) of five equally-expensive regexp predicates with the selective one written last, a case where SQL order, the apache#22343 cost heuristic, and BinaryExpr pre-selection all leave the order wrong. Toggle with the standard config env var and run via the standard harness:
  DATAFUSION_EXECUTION_ADAPTIVE_FILTER_REORDERING=true \
  ./benchmarks/bench.sh run adversarial_filter
  (The dfbench suites read that env var via SessionConfig::from_env; the SQL bench harness uses SessionContext::new(), so the suite's init SQL wires it in via env interpolation.)
  Q01 (selective last): ~1.75x faster at 10M rows (more at higher ADV_ROWS).
  Q02 (selective first, control): neutral, which confirms the win is an ordering fix and the adaptive path adds no overhead when it cannot help.
- bench.sh: add an `adversarial_filter` run target (data generated inline).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions bot added the documentation (Improvements or additions to documentation), optimizer (Optimizer rules), and sqllogictest (SQL Logic Tests (.slt)) labels May 18, 2026

Trim unnecessary unit test

c7523fd

neilconway commented May 18, 2026

Comment thread on datafusion/optimizer/src/push_down_filter.rs (outdated)

Tweak text

7284fe1

prettier

40995f1

.

74b99e3

adriangb mentioned this pull request May 18, 2026

Reorder boolean expressions (including filter predicates) according to evaluation cost / selectivity #11262

Open

Update expected plan for TPC-H Q16

4a4528b

neilconway added 2 commits May 21, 2026 10:17

Merge remote-tracking branch 'origin/main' into neilc/perf-predicate-reorder

60aad85

Fix clippy

644ffb9

alamb reviewed May 28, 2026

Merge remote-tracking branch 'origin/main' into neilc/perf-predicate-reorder

# Conflicts:
# datafusion/sqllogictest/test_files/tpch/plans/q16.slt.part

535a45a

adriangb mentioned this pull request Jun 1, 2026

Adaptive (runtime, stats-based) conjunct reordering for FilterExec #22698

Draft

Merge remote-tracking branch 'origin/main' into neilc/perf-predicate-reorder

# Conflicts:
# datafusion/sqllogictest/test_files/tpch/plans/q16.slt.part

cfad0f7

neilconway commented May 18, 2026 (edited)

neilconway commented May 18, 2026

adriangbot commented May 18, 2026

adriangbot commented May 18, 2026

adriangbot commented May 18, 2026

adriangbot commented May 18, 2026

adriangbot commented May 18, 2026

adriangbot commented May 18, 2026

asolimando commented May 18, 2026

neilconway commented May 18, 2026 (edited)

neilconway commented May 18, 2026

adriangb commented May 18, 2026 (edited)

neilconway commented May 19, 2026 (edited)

2010YOUY01 commented May 19, 2026

adriangb commented May 19, 2026

asolimando commented May 19, 2026

alamb commented May 27, 2026

neilconway commented May 27, 2026 (edited)

alamb left a comment

neilconway commented May 31, 2026 (edited)

neilconway commented May 31, 2026

alamb commented Jun 1, 2026

adriangb commented Jun 1, 2026

alamb commented Jun 1, 2026

adriangb commented Jun 1, 2026
