8372584: [Linux]: Replace reading proc to get thread user CPU time with clock_gettime#621
Open
mmm-choi wants to merge 2 commits into
👋 Welcome back mmm-choi! A progress list of the required criteria for merging this PR into
❗ This change is not yet ready to be integrated.
This backport pull request has now been updated with issue from the original commit.
This is a backport of JDK-8372584 (858d2e43) to 25u.
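As the title says, the change swaps a `/proc` read for `clock_gettime` when fetching a thread's user CPU time. The following is a minimal sketch of both approaches, assuming Linux with glibc. It is not the HotSpot code: the function names are hypothetical, and the two constants are assumed to mirror the kernel's internal clock-id layout (including the `CPUCLOCK_VIRT` type bit), which userspace headers do not export.

```cpp
#include <pthread.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#include <cstdint>
#include <cstdio>
#include <cstring>

// Assumption: these values mirror the kernel's internal clock-id layout,
// where the low two bits select the clock type. Not exported to userspace.
constexpr clockid_t kClockTypeMask = 3;
constexpr clockid_t kCpuClockVirt = 1;  // user-mode time only

// Legacy-style path (hypothetical helper): open, read, and parse
// /proc/self/task/<tid>/stat. Returns nanoseconds, or -1 on failure.
int64_t user_time_via_proc() {
  char path[64];
  long tid = syscall(SYS_gettid);
  snprintf(path, sizeof(path), "/proc/self/task/%ld/stat", tid);
  FILE* f = fopen(path, "r");
  if (f == nullptr) return -1;
  char buf[2048];
  size_t n = fread(buf, 1, sizeof(buf) - 1, f);
  fclose(f);
  buf[n] = '\0';
  // The command name (field 2) may contain spaces, so skip past its ')'.
  const char* p = strrchr(buf, ')');
  if (p == nullptr) return -1;
  unsigned long utime_ticks = 0;  // field 14, measured in clock ticks
  if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu",
             &utime_ticks) != 1) {
    return -1;
  }
  long ticks_per_sec = sysconf(_SC_CLK_TCK);
  if (ticks_per_sec <= 0) return -1;
  return int64_t(utime_ticks) * (1000000000LL / ticks_per_sec);
}

// clock_gettime path (hypothetical helper): take the calling thread's CPU
// clock id and rewrite its type bits to select user time only.
// Returns nanoseconds, or -1 on failure.
int64_t user_time_via_clock_gettime() {
  clockid_t id;
  if (pthread_getcpuclockid(pthread_self(), &id) != 0) return -1;
  id = (id & ~kClockTypeMask) | kCpuClockVirt;
  timespec ts;
  if (clock_gettime(id, &ts) != 0) return -1;
  return int64_t(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}
```

Calling both helpers from the same thread after some CPU-bound work should give values in the same range. The `/proc` figure is reported in clock ticks, so it is coarser than the `clock_gettime` result.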
This is the second of three related backports and is the substantive change. It must be applied after JDK-8372625 (#620):
- `supports_fast_thread_cpu_time` dual-path logic. This is the prerequisite.
- `/proc` parsing with `clock_gettime`. This is the optimization.

Why this backport (this is an enhancement)
While it is classified as an enhancement, it addresses a long-standing and severe performance defect, JDK-8210452 (reported in 2018), where `getCurrentThreadUserTime()` is 30x to 400x slower than `getCurrentThreadCpuTime()` on Linux and degrades further under concurrency.

The legacy implementation opens, reads, and `sscanf`-parses `/proc/self/task/<tid>/stat` on every call. That is roughly 3 syscalls plus a full-page (4096-byte) kernel allocation per sample, of which over 99% is waste, all just to extract one integer. This change replaces that with a single `clock_gettime()` using the `CPUCLOCK_VIRT` clock-id bit, which is the de-facto kernel ABI glibc has relied on for over 20 years. There is more background at https://norlinder.nu/posts/User-CPU-Time-JVM/.

There is concrete downstream demand for this. The `/proc` overhead currently makes continuous self-instrumentation of per-thread CPU usage prohibitive for production users. The change is also a net code simplification (the diff removes more than it adds) and is fully integrated in mainline (JDK 26).

Not a clean backport
Manual resolution was required for one hunk. Mainline commit openjdk/jdk@80ab094 JDK-8347707 (`os::snprintf` standardization, not in 25u) had changed `snprintf` to `os::snprintf_checked` inside the `/proc`-reading function that this change deletes, so the deletion context did not match. I resolved it by taking mainline's side (the new `clock_gettime` body) in full. The resulting code, including the new `get_thread_clockid()` helper and the `ThreadMXBeanBench` microbenchmark, is byte-for-byte identical to mainline.

How the fix was validated and checked for regressions
I built two `release` images that are identical except for these three commits:
- `master` (the exact commit this stack is based on)
- `master` plus JDK-8372625, JDK-8372584, and JDK-8373557

I confirmed via `nm` on `libjvm.so` that the only difference is this code. The baseline exports the old `slow_thread_cpu_time` and no `os::Linux::thread_cpu_time(clockid_t)`, and the fixed build is the inverse. After resolution I also diffed the thread-CPU-time region of `os_linux.cpp` and `os_linux.hpp` against the mainline commit and confirmed it is identical, apart from retaining `_pthread_setname_np`, which is unrelated and absent from mainline only because of JDK-8368124, which is not in 25u.

Functional and regression testing: The thread-CPU-time test set passes on both this commit and the full 3-commit stack, on linux-x86_64 and linux-aarch64:
- `java/lang/management/ThreadMXBean/ThreadUserTime.java`
- `java/lang/management/ThreadMXBean/ThreadCpuTime.java`
- `java/lang/management/ThreadMXBean/VirtualThreads.java`
- `vmTestbase/nsk/monitoring/ThreadMXBean/GetThreadCpuTime` (10)
- `vmTestbase/nsk/monitoring/ThreadMXBean/isThreadCpuTimeSupported` (5)
- `vmTestbase/nsk/monitoring/ThreadMXBean/isCurrentThreadCpuTimeSupported` (5)
- `jdk/jfr/event/runtime/TestThreadCpuTimeEvent.java`
- `jdk/jfr/event/profiling/*` (CPU-time sampler)

These exercise every consumer of the modified path: JMX (`ThreadMXBean`), JVMTI (`nsk`), the JFR thread-CPU-time event, and the JFR CPU-time sampler. `ThreadUserTime.java` in particular asserts the key invariants the change must preserve: `getCurrentThreadUserTime()` returns `-1` before enabling, returns a non-negative value after enabling, and per-thread user time is monotonic and never exceeds the corresponding CPU (user+sys) time.

Performance, reproduced in 25u. Measured with the exact JMH microbenchmark added by this commit (`ThreadMXBeanBench`, `SampleTime`, `@Fork(10) @Warmup(2x5s) @Measurement(5x5s)`, single thread), compiled against JMH 1.37 and run as the same benchmarks jar on both images:

[Table: baseline (`/proc`) versus fixed (`clock_gettime`)]

(Baseline n=6,586,868 samples, fixed n=5,230,948.) The common case drops from about 19 us to under 1 us, which is a single syscall versus a `/proc` open+read+parse, and the worst-case sample drops from about 10 ms to about 1.5 ms. This matches the shape of the original PR's result (openjdk/jdk#28556) and confirms the tail-latency reduction that motivates the change.

Risk
This is a Linux-only behavioral change (proc-derived to `clock_gettime(CPUCLOCK_VIRT)` for user time), with no change to other platforms. The shared-file edits in this stack are comment-only. It was reviewed in tip and is covered by the existing regression test `ThreadUserTime.java` plus the added microbenchmark.

Testing
Built `release` on linux-x86_64 and linux-aarch64. The thread-CPU-time test set passes (see the tests listed above).

Progress
Issue
Reviewing
Using `git`
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk25u-dev.git pull/621/head:pull/621
$ git checkout pull/621
Update a local copy of the PR:
$ git checkout pull/621
$ git pull https://git.openjdk.org/jdk25u-dev.git pull/621/head

Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 621
View PR using the GUI difftool:
$ git pr show -t 621

Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk25u-dev/pull/621.diff
Using Webrev
Link to Webrev Comment