[improvement](mtmv) Optimize MTMV partition lineage check #63899

Open
seawinde wants to merge 1 commit into apache:master from seawinde:fix-mtmv-lineage-cache-perf

@seawinde
Member

### What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary: Complex partitioned async MTMV creation can spend excessive FE CPU in partition lineage analysis. The hot path repeatedly shuttles partition and checked expressions through the full plan lineage replacer, so wide UNION ALL, join, and aggregate plans multiply the same plan walks during CREATE MATERIALIZED VIEW analysis.

Root cause: In PartitionIncrementMaintainer.PartitionIncrementChecker.checkPartition(), each partition candidate and checked expression calls ExpressionUtils.shuttleExpressionWithLineage() separately. Each call traverses the plan through ExpressionLineageReplacer and rebuilds equivalent normalized expressions.
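
To make the cost concrete, here is a minimal sketch of the traversal count only. It is not the Doris code: `PlanNode`, `walk`, and the `visits` counter are hypothetical stand-ins, and the sketch assumes one call to the lineage replacer amounts to one full walk of the plan. Under that assumption, shuttling N expressions separately costs N walks, while a single batched call costs one.

```java
import java.util.ArrayList;
import java.util.List;

public class LineageWalkSketch {
    /** Hypothetical stand-in for a plan node; not a Doris class. */
    static final class PlanNode {
        final List<PlanNode> children = new ArrayList<>();

        PlanNode child(PlanNode c) {
            children.add(c);
            return this;
        }
    }

    /** Counts how many plan nodes have been visited since the last reset. */
    static int visits = 0;

    /** Models one full plan traversal that handles every given expression in a single pass. */
    static void walk(PlanNode node, List<String> expressions) {
        visits++;
        for (PlanNode c : node.children) {
            walk(c, expressions);
        }
    }

    /** One traversal per expression: the shape the root cause describes. */
    static int perExpression(PlanNode root, List<String> exprs) {
        visits = 0;
        for (String e : exprs) {
            walk(root, List.of(e));
        }
        return visits;
    }

    /** One traversal for the whole expression set: the batched shape. */
    static int batched(PlanNode root, List<String> exprs) {
        visits = 0;
        walk(root, exprs);
        return visits;
    }

    public static void main(String[] args) {
        // Four nodes in total: the root, two children, and one grandchild.
        PlanNode root = new PlanNode()
                .child(new PlanNode().child(new PlanNode()))
                .child(new PlanNode());
        List<String> exprs = List.of("dt", "region", "sum_amount");
        System.out.println(perExpression(root, exprs)); // prints 12
        System.out.println(batched(root, exprs));       // prints 4
    }
}
```

The gap grows with both plan size and expression count, which is why wide UNION ALL and aggregate plans are the ones that suffer.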

Change Summary:

| File | Change Description |
|------|-------------------|
| PartitionIncrementMaintainer.java | Batch lineage shuttle calls, cache lineage-visible named expressions by plan identity, cache normalized expressions, and reuse the normalization rewrite context during one partition increment check. |
| PartitionColumnTraceTest.java | Add a CTE plus UNION ALL plus wide aggregate lineage test to keep partition lineage behavior covered. |
| test_mtmv_partition_lineage_performance.groovy | Add a desensitized static SQL performance regression case for the complex partitioned MTMV shape. |

Design Rationale: The change keeps the existing ExpressionLineageReplacer semantics and limits caching to a single PartitionIncrementCheckContext. This avoids sharing mutable analysis state across optimizer contexts while removing repeated full plan walks for the same plan and expression set.
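
A minimal sketch of what "cache by plan identity, scoped to one check context" could look like. All names here (`Plan`, `CheckContext`, `namedExpressions`) are hypothetical, and the use of `IdentityHashMap` is an assumption about one reasonable way to key on identity, not a description of the actual patch.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class CheckContextCacheSketch {
    /** Hypothetical plan type with value-based equals, to show that identity keying ignores it. */
    static final class Plan {
        final String name;

        Plan(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Plan && ((Plan) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    /** Hypothetical per-check context: the cache is created with it and discarded with it. */
    static final class CheckContext {
        // IdentityHashMap compares keys with ==, so equal-but-distinct plans get separate entries.
        private final Map<Plan, List<String>> namedExpressionsByPlan = new IdentityHashMap<>();
        int computations = 0;

        List<String> namedExpressions(Plan plan, Function<Plan, List<String>> collector) {
            return namedExpressionsByPlan.computeIfAbsent(plan, p -> {
                computations++;
                return collector.apply(p);
            });
        }
    }

    public static void main(String[] args) {
        Plan scan = new Plan("scan");
        Plan sameShape = new Plan("scan"); // equals(scan), but a different object
        Function<Plan, List<String>> collector = p -> List.of(p.name + ".dt");

        CheckContext ctx = new CheckContext();
        ctx.namedExpressions(scan, collector);      // computed
        ctx.namedExpressions(scan, collector);      // cache hit
        ctx.namedExpressions(sameShape, collector); // different identity, computed again
        System.out.println(ctx.computations);                // prints 2
        System.out.println(new CheckContext().computations); // prints 0: a new context starts empty
    }
}
```

Because the map is a field of the context rather than a static or optimizer-wide structure, nothing cached during one check can leak into another.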

### Release note

Improve performance when creating complex partitioned async materialized views.

### Check List (For Author)

- Test: Unit Test / Manual test
    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.exploration.mv.PartitionColumnTraceTest
    - Manual test: git diff --check
    - Manual test: Tried ./run-regression-test.sh --run -d performance_p0 -s test_mtmv_partition_lineage_performance, but the local Doris FE was not running on 127.0.0.1:9030, so the regression could not execute SQL.
- Behavior changed: No
- Does this need documentation: No
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@seawinde
Member Author

run buildall

@seawinde changed the title from "[improvement](fe) Optimize MTMV partition lineage check" to "[improvement](mtmv) Optimize MTMV partition lineage check" on May 29, 2026
@hello-stephen
Contributor

TPC-H: Total hot run time: 31786 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 838ebc2f092a73702f1bea0d4781fdf2ef6689a6, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17587	4061	4076	4061
q2	
q3	10768	1394	823	823
q4	4685	474	344	344
q5	7601	2251	2070	2070
q6	238	179	136	136
q7	998	762	640	640
q8	9355	1764	1640	1640
q9	5137	4921	4916	4916
q10	6393	2201	1876	1876
q11	438	272	248	248
q12	638	434	301	301
q13	18109	3372	2779	2779
q14	272	261	249	249
q15	
q16	831	784	718	718
q17	996	938	990	938
q18	7048	5651	5586	5586
q19	1311	1248	1081	1081
q20	548	571	310	310
q21	6306	2775	2744	2744
q22	568	375	326	326
Total cold run time: 99827 ms
Total hot run time: 31786 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4832	4792	4893	4792
q2	
q3	4916	5381	4685	4685
q4	2086	2191	1369	1369
q5	4995	4664	4655	4655
q6	254	183	134	134
q7	1934	1764	1518	1518
q8	2378	2232	2151	2151
q9	7901	7338	7357	7338
q10	4741	4656	4313	4313
q11	559	382	353	353
q12	727	738	525	525
q13	3025	3431	2789	2789
q14	267	278	250	250
q15	
q16	681	702	614	614
q17	1266	1273	1250	1250
q18	7324	6785	6864	6785
q19	1152	1096	1088	1088
q20	2210	2209	1961	1961
q21	5249	4589	4435	4435
q22	506	460	430	430
Total cold run time: 57003 ms
Total hot run time: 51435 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 172293 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 838ebc2f092a73702f1bea0d4781fdf2ef6689a6, data reload: false

query5	4394	653	517	517
query6	330	224	203	203
query7	4225	578	321	321
query8	323	249	235	235
query9	8829	4158	4152	4152
query10	460	347	310	310
query11	5795	2419	2240	2240
query12	181	131	125	125
query13	1287	602	442	442
query14	6050	5504	5221	5221
query14_1	4539	4543	4533	4533
query15	216	209	189	189
query16	1022	466	417	417
query17	1154	741	615	615
query18	2705	505	376	376
query19	226	209	176	176
query20	144	137	131	131
query21	225	145	126	126
query22	13668	13600	13401	13401
query23	17425	16523	16217	16217
query23_1	16368	16431	16337	16337
query24	7465	1799	1309	1309
query24_1	1305	1311	1329	1311
query25	593	509	456	456
query26	1313	359	184	184
query27	2690	544	357	357
query28	4395	2048	2049	2048
query29	1045	654	539	539
query30	308	244	205	205
query31	1137	1094	967	967
query32	95	79	78	78
query33	552	372	356	356
query34	1177	1172	665	665
query35	801	811	707	707
query36	1420	1394	1269	1269
query37	156	106	90	90
query38	3222	3199	3114	3114
query39	940	924	911	911
query39_1	889	867	883	867
query40	234	145	125	125
query41	64	62	66	62
query42	113	112	111	111
query43	340	336	300	300
query44	
query45	212	203	200	200
query46	1067	1218	739	739
query47	2381	2376	2242	2242
query48	392	422	309	309
query49	641	497	384	384
query50	1052	357	257	257
query51	4374	4320	4182	4182
query52	112	106	96	96
query53	257	288	209	209
query54	309	281	278	278
query55	103	98	101	98
query56	301	330	316	316
query57	1440	1393	1365	1365
query58	309	266	274	266
query59	1660	1707	1441	1441
query60	317	325	311	311
query61	166	158	160	158
query62	703	644	589	589
query63	253	205	207	205
query64	2388	805	651	651
query65	
query66	1699	498	355	355
query67	29866	29705	29611	29611
query68	
query69	476	345	308	308
query70	1057	982	1022	982
query71	303	281	259	259
query72	2984	2700	2456	2456
query73	885	763	450	450
query74	5098	5004	4801	4801
query75	2680	2624	2264	2264
query76	2282	1154	788	788
query77	405	441	341	341
query78	12306	12462	11701	11701
query79	1482	1048	759	759
query80	700	533	462	462
query81	467	285	245	245
query82	1532	160	124	124
query83	341	281	259	259
query84	260	155	113	113
query85	884	532	455	455
query86	395	335	325	325
query87	3445	3351	3249	3249
query88	3685	2790	2764	2764
query89	449	406	354	354
query90	1866	193	176	176
query91	181	175	145	145
query92	84	75	76	75
query93	1515	1479	916	916
query94	564	341	333	333
query95	697	475	353	353
query96	1033	808	343	343
query97	2741	2728	2610	2610
query98	238	233	227	227
query99	1184	1158	1041	1041
Total cold run time: 254918 ms
Total hot run time: 172293 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 88.89% (56/63) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 85.71% (54/63) 🎉
Increment coverage report
Complete coverage report
