
[fix](sort) keep heap TopN pruning for expression order keys#63902

Draft
Mryange wants to merge 1 commit into apache:master from Mryange:keep-heap-topn-prue

Mryange commented May 29, 2026

What problem does this PR solve?

Problem Summary:

For TopN queries with computed order keys, such as SELECT MURMUR_HASH3_32(number) AS n FROM numbers_20m ORDER BY n LIMIT 65535, Doris created a TopN runtime predicate but could not push it down to storage because the target was an expression instead of a slot/key column. However, heap sort treated the existence of any runtime predicate as a reason to disable its local TopN pruning, so this path lost local pruning without gaining effective scan-side pruning.
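To make "local TopN pruning" concrete, here is a minimal sketch of a heap-based filter for `ORDER BY key ASC LIMIT n`. The class `TopNHeap` and its members are hypothetical names for illustration and are not Doris's actual sorter; the counter only loosely mirrors what a profile field such as TopNFilterRows would report.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical illustration only; this is not Doris's heap sorter.
// Retains the `limit` smallest keys and counts rows rejected on arrival.
class TopNHeap {
public:
    explicit TopNHeap(std::size_t limit) : limit_(limit) {}

    // Returns true if the row was admitted, false if it was pruned.
    bool offer(std::int64_t key) {
        if (limit_ == 0) {
            ++filtered_rows_;
            return false;
        }
        if (heap_.size() < limit_) {
            heap_.push(key);
            return true;
        }
        // heap_.top() is the largest key currently retained. A row that is
        // not strictly smaller can never enter the result, so drop it early.
        if (key >= heap_.top()) {
            ++filtered_rows_;
            return false;
        }
        heap_.pop();  // evict the current worst row to make room
        heap_.push(key);
        return true;
    }

    std::size_t filtered_rows() const { return filtered_rows_; }

    // Drains the heap and returns the retained keys in ascending order.
    std::vector<std::int64_t> sorted_result() {
        std::vector<std::int64_t> out(heap_.size());
        for (std::size_t i = out.size(); i > 0; --i) {
            out[i - 1] = heap_.top();
            heap_.pop();
        }
        return out;
    }

private:
    std::size_t limit_;
    std::size_t filtered_rows_ = 0;
    std::priority_queue<std::int64_t> heap_;  // max-heap by default
};
```

Feeding the keys 5, 1, 9, 3, 7, 2, 8 into `TopNHeap(3)` retains 1, 2, 3 and prunes two rows (7 and 8) on arrival. When this filter is switched off and no scan-side predicate takes its place, every row has to go through the full sort path, which is the regression described above.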

This PR narrows that condition. Heap sort now disables local TopN pruning only when the runtime predicate targets are all slots. Expression order keys keep the local heap filter, which restores early row pruning in the sorter.
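The change in condition can be sketched as follows. The types `TargetKind` and `RuntimePredicateTarget` and both function names are hypothetical stand-ins, not the identifiers used in the Doris code base, and the handling of an empty target list is an assumption of this sketch.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical types for illustration; Doris's real classes differ.
enum class TargetKind { Slot, Expression };

struct RuntimePredicateTarget {
    TargetKind kind;
};

// Old behavior as described: the mere existence of a runtime predicate
// disabled local heap pruning.
bool disable_local_pruning_before(const std::vector<RuntimePredicateTarget>& targets) {
    return !targets.empty();
}

// New behavior as described: disable local pruning only when every target
// is a slot, i.e. when the predicate can actually be pushed down to storage.
// Assumption: with no targets at all, local pruning stays enabled.
bool disable_local_pruning_after(const std::vector<RuntimePredicateTarget>& targets) {
    return !targets.empty() &&
           std::all_of(targets.begin(), targets.end(),
                       [](const RuntimePredicateTarget& t) {
                           return t.kind == TargetKind::Slot;
                       });
}
```

For a single expression target such as `MURMUR_HASH3_32(number)`, the old check returns true (local pruning lost) while the new one returns false (local pruning kept). For slot-only targets both checks agree.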

In a local reproduction of function_order_by_number_limit_big, query time dropped from about 2.3 to 2.8 s to about 0.53 to 0.69 s, and the new profile shows TopNFilterRows: 19.555616M instead of 0.
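As a rough reading of those timings, the speedup can be bounded by pairing the extremes of the two ranges. The `Range` struct and `speedup` helper below are hypothetical and only do the arithmetic; they assume the quoted figures are the endpoints of the observed run times.

```cpp
// Hypothetical helper: bounds on speedup from two timing ranges (seconds).
struct Range {
    double low;
    double high;
};

// Conservative bound: fastest "before" run vs. slowest "after" run.
// Optimistic bound: slowest "before" run vs. fastest "after" run.
Range speedup(Range before, Range after) {
    return Range{before.low / after.high, before.high / after.low};
}
```

With the reported ranges, `speedup({2.3, 2.8}, {0.53, 0.69})` gives roughly 3.3x to 5.3x.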

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange
Contributor Author

Mryange commented May 29, 2026

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 31470 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e46b6093e0a831ecb5f49400b3c98e86cc79ee3f, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17651	4039	4034	4034
q2	
q3	10775	1369	824	824
q4	4699	473	356	356
q5	7825	2308	2110	2110
q6	380	177	140	140
q7	964	793	650	650
q8	9351	1777	1577	1577
q9	7045	4956	4977	4956
q10	6467	2255	1870	1870
q11	453	274	249	249
q12	703	432	314	314
q13	18195	3440	2819	2819
q14	268	264	234	234
q15	
q16	824	776	715	715
q17	1006	943	986	943
q18	7019	5925	5603	5603
q19	1186	1302	1147	1147
q20	499	410	254	254
q21	5913	2641	2362	2362
q22	443	357	313	313
Total cold run time: 101666 ms
Total hot run time: 31470 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4541	4452	4378	4378
q2	
q3	4554	4932	4360	4360
q4	2180	2219	1423	1423
q5	4506	4332	5079	4332
q6	258	199	152	152
q7	1997	1867	1611	1611
q8	2621	2223	2224	2223
q9	7971	7974	8101	7974
q10	4877	4757	4319	4319
q11	562	428	378	378
q12	759	771	550	550
q13	3378	3647	3008	3008
q14	302	311	282	282
q15	
q16	704	725	667	667
q17	1370	1369	1346	1346
q18	7973	7396	6848	6848
q19	1112	1120	1134	1120
q20	2223	2230	1924	1924
q21	5365	4727	4559	4559
q22	512	466	427	427
Total cold run time: 57765 ms
Total hot run time: 51881 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 172273 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e46b6093e0a831ecb5f49400b3c98e86cc79ee3f, data reload: false

query5	4308	679	529	529
query6	349	223	198	198
query7	4295	616	294	294
query8	323	241	235	235
query9	8818	4156	4136	4136
query10	464	353	304	304
query11	5768	2543	2204	2204
query12	190	137	129	129
query13	1311	664	458	458
query14	6142	5551	5245	5245
query14_1	4594	4584	4511	4511
query15	232	213	188	188
query16	1009	443	476	443
query17	1180	767	621	621
query18	2454	499	379	379
query19	220	221	169	169
query20	141	139	133	133
query21	222	154	126	126
query22	13655	13551	13484	13484
query23	17430	16561	16190	16190
query23_1	16352	16397	16390	16390
query24	7453	1803	1325	1325
query24_1	1351	1351	1339	1339
query25	578	484	421	421
query26	1313	310	175	175
query27	2702	569	346	346
query28	4416	2063	2034	2034
query29	1004	622	510	510
query30	304	237	197	197
query31	1141	1086	962	962
query32	89	76	76	76
query33	569	361	303	303
query34	1184	1160	681	681
query35	784	806	707	707
query36	1433	1416	1295	1295
query37	158	113	96	96
query38	3245	3216	3076	3076
query39	923	941	926	926
query39_1	888	888	877	877
query40	236	146	128	128
query41	67	65	61	61
query42	112	112	114	112
query43	349	344	304	304
query44	
query45	217	207	198	198
query46	1076	1200	759	759
query47	2404	2386	2287	2287
query48	399	415	292	292
query49	643	505	390	390
query50	993	354	258	258
query51	4337	4292	4273	4273
query52	114	112	126	112
query53	260	292	208	208
query54	323	282	261	261
query55	100	93	90	90
query56	304	313	310	310
query57	1451	1456	1314	1314
query58	309	278	267	267
query59	1649	1682	1459	1459
query60	330	329	316	316
query61	152	154	152	152
query62	706	646	600	600
query63	252	204	215	204
query64	2422	799	661	661
query65	
query66	1711	500	361	361
query67	29742	29663	29521	29521
query68	
query69	472	355	311	311
query70	1057	1033	995	995
query71	305	282	279	279
query72	2984	2724	2354	2354
query73	895	757	430	430
query74	5100	4955	4804	4804
query75	2701	2616	2272	2272
query76	2337	1165	872	872
query77	419	413	345	345
query78	12378	12397	11860	11860
query79	1503	1102	749	749
query80	642	548	454	454
query81	453	283	241	241
query82	1403	163	123	123
query83	360	278	257	257
query84	261	142	112	112
query85	901	541	463	463
query86	392	347	320	320
query87	3399	3404	3249	3249
query88	3721	2817	2803	2803
query89	453	395	345	345
query90	1920	190	188	188
query91	182	172	144	144
query92	82	80	77	77
query93	1514	1485	889	889
query94	551	352	329	329
query95	701	388	353	353
query96	1031	813	378	378
query97	2727	2762	2625	2625
query98	239	227	233	227
query99	1154	1165	1040	1040
Total cold run time: 254682 ms
Total hot run time: 172273 ms

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (13/13) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.81% (27361/38104)
Line Coverage 55.54% (293793/528958)
Region Coverage 52.34% (243593/465442)
Branch Coverage 53.84% (105365/195711)
