
[refact](udf) remove the udf cache expiration_time property #63897

Open: zhangstar333 wants to merge 4 commits into apache:master from zhangstar333:doris-master-udf


@zhangstar333 (Contributor)

What problem does this PR solve?

Problem Summary:
doc apache/doris-website#3845

CREATE FUNCTION print_12() RETURNS int
PROPERTIES (
    "file" = "file:///path/to/java-udf-demo-jar-with-dependencies.jar",
    "symbol" = "org.apache.doris.udf.Print",
    "always_nullable" = "true",
    "type" = "JAVA_UDF",
    "static_load" = "true", -- default value is false
    "expiration_time" = "60" -- default value is 360 minutes
);
Previously, Java UDFs could use static_load and expiration_time to control how long a jar stays cached in the BE. A background thread scanned the cached jars every ten minutes, checked each one's initialization time, and dropped it once it had expired. This could make long-running queries fail when the background thread removed a jar they were still using.
This PR removes expiration_time; the cached jar is now cleaned up immediately when the function is dropped.
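A minimal sketch of the lifecycle described above, assuming a per-function cache keyed by a function id. DropScopedLoaderCache, getOrCreate and onDropFunction are hypothetical names, not the Doris BE/JNI code: entries carry no timer, and a loader is evicted and closed only when its function is dropped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the post-PR lifecycle: no background
// sweeper and no expiration_time; an entry lives until its function is dropped.
final class DropScopedLoaderCache {
    private final ConcurrentHashMap<Long, URLClassLoader> loaders = new ConcurrentHashMap<>();

    // Returns the cached loader for a function id, creating it on first use.
    URLClassLoader getOrCreate(long functionId, String jarPath) {
        return loaders.computeIfAbsent(functionId,
                id -> new URLClassLoader(new URL[] {toUrl(jarPath)}));
    }

    // Invoked when DROP FUNCTION reaches the BE: evict and close right away.
    void onDropFunction(long functionId) {
        URLClassLoader loader = loaders.remove(functionId);
        if (loader == null) {
            return; // never loaded on this BE, nothing to clean up
        }
        try {
            loader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int size() {
        return loaders.size();
    }

    private static URL toUrl(String jarPath) {
        try {
            return Path.of(jarPath).toUri().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad jar path: " + jarPath, e);
        }
    }
}
```

Repeated lookups reuse one loader; after a drop, the next lookup builds a fresh one, which is what lets a re-created function pick up a new jar.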

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • [] No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen (Contributor)

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zhangstar333 (Contributor, Author)

/review

@github-actions (Bot) left a comment


Summary: I found two correctness issues in the new static UDF classloader cache behavior. The main blocker is a first-use race where one executor can close another executor's live URLClassLoader, preserving the NoClassDefFoundError class of failure this PR is trying to eliminate. There is also a regression for static-load UDFs loaded through the system classloader, where a null classLoader is valid but now treated as a cache miss.
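The two issues in the summary can be illustrated with a hypothetical simplified cache; StaticUdfCacheSketch and CachedUdf are illustrative stand-ins, not Doris classes, and this is one possible shape of a fix rather than the one adopted in the PR. ConcurrentHashMap.computeIfAbsent applies the builder at most once per key, which closes the get/build/put window, and a hit is decided by entry presence, so a null loader (system classloader) still counts as cached.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for UdfClassCache: classLoader is null when the UDF
// class is resolved through the system classloader (empty jar path).
final class CachedUdf {
    final URLClassLoader classLoader; // may legitimately be null

    CachedUdf(URLClassLoader classLoader) {
        this.classLoader = classLoader;
    }
}

final class StaticUdfCacheSketch {
    private final ConcurrentHashMap<String, CachedUdf> cache = new ConcurrentHashMap<>();
    final AtomicInteger builds = new AtomicInteger(); // counts constructions, for testing

    // computeIfAbsent runs the builder at most once per key; concurrent callers
    // for the same key wait and receive the same entry, so no executor builds
    // a duplicate loader that later gets closed under another executor.
    CachedUdf get(String key, String jarPath) {
        return cache.computeIfAbsent(key, k -> build(jarPath));
    }

    // A hit is decided by entry presence, not by classLoader != null,
    // so system-classloader UDFs are cached as well.
    boolean isCached(String key) {
        return cache.containsKey(key);
    }

    private CachedUdf build(String jarPath) {
        builds.incrementAndGet();
        if (jarPath == null || jarPath.isEmpty()) {
            return new CachedUdf(null); // system classloader: nothing to own or close
        }
        try {
            URL url = Path.of(jarPath).toUri().toURL();
            return new CachedUdf(new URLClassLoader(new URL[] {url}));
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad jar path: " + jarPath, e);
        }
    }
}
```

Wrapping the nullable loader in an entry object also sidesteps the fact that ConcurrentHashMap rejects null values.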

Critical checkpoints:

  • Goal/test: The PR aims to stop time-based UDF classloader eviction from breaking static-load Java UDFs. The goal is only partially met; concurrent first use can still close a live loader, and system-classloader static UDFs are not cached effectively. I did not find tests covering these concurrency/system-loader paths.
  • Scope: The change is focused, but the cache lifecycle semantics changed from synchronized ExpiringMap operations to ConcurrentHashMap replacement without atomic construction.
  • Concurrency: The modified static cache is shared by concurrent Java UDF executors and BE clean-cache tasks. The cache miss/build/put path is not atomic, which creates the live-loader close race noted inline.
  • Lifecycle: UdfClassCache.classLoader may intentionally be null for system-classloader UDFs; the new validity check does not preserve that lifecycle invariant.
  • Configuration/compatibility: expiration_time remains accepted and serialized but is now ignored; this is a user-visible semantic change and should be documented or removed in a coordinated way.
  • Parallel paths: DROP FUNCTION cleanup still exists through FE clean-cache tasks and BE JNI cleanup; static-load lookup is the affected path.
  • Testing: No new tests were included for concurrent static-load first use, DROP/reload lifecycle, or empty jarPath/system-classloader static UDFs.
  • Observability/performance: No additional observability is required for the core issue, but repeated rebuilding for system-classloader static UDFs is avoidable overhead.
  • Data/transaction/persistence: Not applicable to data visibility or transaction persistence.
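For the missing concurrency coverage, a test along these lines could count how many times the loader is built when many threads hit an empty cache at once. FirstUseRaceCheck is hypothetical and uses a plain Object in place of a classloader; the atomic path must report exactly one build, while the racy check-then-put path may report more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical harness for "concurrent static-load first use": all threads are
// released together against an empty cache and loader builds are counted.
final class FirstUseRaceCheck {
    static int buildsUnderContention(int threads, boolean atomic) {
        ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();
        AtomicInteger builds = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Object>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(() -> {
                    start.await(); // release all callers at the same moment
                    if (atomic) {
                        return cache.computeIfAbsent("udf", k -> {
                            builds.incrementAndGet();
                            return new Object();
                        });
                    }
                    // Racy check-then-act: several threads can miss and build.
                    Object existing = cache.get("udf");
                    if (existing != null) {
                        return existing;
                    }
                    builds.incrementAndGet();
                    Object built = new Object();
                    cache.put("udf", built); // may overwrite another thread's live value
                    return built;
                }));
            }
            start.countDown();
            for (Future<Object> f : results) {
                f.get(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException("race check failed", e);
        } finally {
            pool.shutdownNow();
        }
        return builds.get();
    }
}
```

The racy count is nondeterministic, so a test can only assert on the atomic path; running the racy variant in a loop may make duplicate builds show up.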

User focus: No additional user-provided review focus was specified.

@zhangstar333 (Contributor, Author)

run buildall

@hello-stephen (Contributor)

TPC-H: Total hot run time: 31603 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 599f7b7cd3e0ac61b8ab16bfd113d28c9cdd8a4e, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17594	4048	4037	4037
q2
q3	10803	1405	811	811
q4	4688	475	336	336
q5	7604	2287	2123	2123
q6	239	175	135	135
q7	941	805	646	646
q8	9376	1776	1636	1636
q9	5172	4978	4968	4968
q10	6417	2204	1881	1881
q11	435	271	244	244
q12	639	414	301	301
q13	18121	3414	2740	2740
q14	263	255	235	235
q15
q16	820	775	707	707
q17	970	880	939	880
q18	6949	5792	5480	5480
q19	1297	1310	1090	1090
q20	601	472	291	291
q21	6209	2887	2741	2741
q22	493	431	321	321
Total cold run time: 99631 ms
Total hot run time: 31603 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4848	4768	4806	4768
q2
q3	4914	5293	4622	4622
q4	2146	2194	1416	1416
q5	5056	4733	4659	4659
q6	231	183	127	127
q7	1915	1740	1546	1546
q8	2405	2176	2085	2085
q9	7945	7458	7483	7458
q10	4747	4671	4240	4240
q11	532	391	352	352
q12	726	740	529	529
q13	2996	3398	2830	2830
q14	286	290	263	263
q15
q16	677	699	623	623
q17	1276	1251	1247	1247
q18	7309	6745	6882	6745
q19	1120	1116	1112	1112
q20	2219	2220	1945	1945
q21	5328	4686	4545	4545
q22	517	461	386	386
Total cold run time: 57193 ms
Total hot run time: 51498 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 172206 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 599f7b7cd3e0ac61b8ab16bfd113d28c9cdd8a4e, data reload: false

query5	4318	668	534	534
query6	333	234	209	209
query7	4249	580	330	330
query8	331	240	220	220
query9	8801	4088	4097	4088
query10	445	366	331	331
query11	5819	2545	2258	2258
query12	183	134	127	127
query13	1286	623	474	474
query14	6162	5488	5184	5184
query14_1	4490	4507	4481	4481
query15	217	209	190	190
query16	999	453	439	439
query17	1085	754	627	627
query18	2532	518	392	392
query19	224	202	161	161
query20	138	127	134	127
query21	218	134	119	119
query22	13592	13598	13391	13391
query23	17282	16469	16142	16142
query23_1	16249	16320	16334	16320
query24	7619	1784	1318	1318
query24_1	1327	1314	1352	1314
query25	553	485	427	427
query26	1307	321	179	179
query27	2697	555	340	340
query28	4433	2030	2036	2030
query29	982	617	498	498
query30	308	245	207	207
query31	1133	1094	949	949
query32	99	79	77	77
query33	549	352	300	300
query34	1185	1175	669	669
query35	774	794	700	700
query36	1430	1463	1326	1326
query37	155	104	90	90
query38	3223	3171	3111	3111
query39	929	929	899	899
query39_1	890	888	882	882
query40	237	146	127	127
query41	67	63	62	62
query42	111	109	109	109
query43	332	337	299	299
query44	
query45	214	207	193	193
query46	1111	1203	768	768
query47	2386	2376	2280	2280
query48	394	441	278	278
query49	632	500	386	386
query50	984	338	257	257
query51	4479	4359	4247	4247
query52	105	113	94	94
query53	252	275	209	209
query54	327	275	269	269
query55	99	96	90	90
query56	302	308	312	308
query57	1433	1465	1332	1332
query58	304	273	270	270
query59	1635	1664	1436	1436
query60	324	331	312	312
query61	158	155	158	155
query62	705	671	593	593
query63	248	202	204	202
query64	2402	847	636	636
query65	
query66	1714	488	355	355
query67	29813	29043	29556	29043
query68	
query69	478	346	301	301
query70	1002	970	974	970
query71	307	275	275	275
query72	3054	2845	2648	2648
query73	807	782	477	477
query74	5199	4985	4801	4801
query75	2742	2638	2304	2304
query76	2317	1174	776	776
query77	422	432	345	345
query78	12397	12582	11927	11927
query79	1540	1034	806	806
query80	648	538	456	456
query81	459	282	241	241
query82	1524	152	119	119
query83	352	273	250	250
query84	294	141	115	115
query85	876	551	457	457
query86	411	339	316	316
query87	3450	3389	3256	3256
query88	3708	2771	2752	2752
query89	456	400	355	355
query90	1982	179	184	179
query91	183	167	143	143
query92	83	79	75	75
query93	1552	1439	949	949
query94	562	344	305	305
query95	652	388	359	359
query96	1076	867	332	332
query97	2774	2751	2668	2668
query98	237	227	237	227
query99	1161	1142	1024	1024
Total cold run time: 254974 ms
Total hot run time: 172206 ms
