
8372584: [Linux]: Replace reading proc to get thread user CPU time with clock_gettime#621

Open
mmm-choi wants to merge 2 commits into openjdk:master from mmm-choi:JDK-8372584

mmm-choi commented Jun 17, 2026

This is a backport of JDK-8372584 (858d2e43) to 25u.

This is the second of three related backports and is the substantive change. It must be applied after JDK-8372625 (#620):

  1. JDK-8372625 (#620): Removes the now-unnecessary supports_fast_thread_cpu_time dual-path logic. This is the prerequisite.
  2. JDK-8372584 (this PR): Replaces /proc parsing with clock_gettime. This is the optimization.
  3. JDK-8373557 (#622): Removes comments made stale by this change.

Why this backport (this is an enhancement)

While it is classified as an enhancement, it addresses a long-standing and severe performance defect, JDK-8210452 (reported in 2018), where getCurrentThreadUserTime() is 30x to 400x slower than getCurrentThreadCpuTime() on Linux and degrades further under concurrency.

The legacy implementation opens, reads, and sscanf-parses /proc/self/task/<tid>/stat on every call. That is roughly 3 syscalls plus a full-page (4096-byte) kernel allocation per sample, over 99% of which goes unused, all to extract one integer. This change replaces that with a single clock_gettime() call on a clock id with the CPUCLOCK_VIRT bit set; this clock-id encoding is the de facto kernel ABI that glibc has relied on for over 20 years. There is more background at https://norlinder.nu/posts/User-CPU-Time-JVM/.
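The clock-id arithmetic can be illustrated with a minimal sketch. It assumes the Linux kernel's CPU-clock layout as I understand it: the bit-inverted thread id in the upper bits, a per-thread flag in bit 2, and the clock type in the low two bits. The class and method names below are hypothetical and exist only for illustration. The real helper is get_thread_clockid() in os_linux.cpp, which is C++ and is not reproduced here. Java is used only to show the bit manipulation; it makes no syscall.

```java
// Sketch of the assumed Linux CPU-clock id layout; not the HotSpot code.
class CpuClockIdSketch {
    // Constant names mirror the kernel's; values are assumptions stated in the lead-in.
    static final int CPUCLOCK_CLOCK_MASK = 3;      // low two bits select the clock type
    static final int CPUCLOCK_PERTHREAD_MASK = 4;  // bit 2 marks a per-thread clock
    static final int CPUCLOCK_PROF = 0;            // user + system time
    static final int CPUCLOCK_VIRT = 1;            // user time only
    static final int CPUCLOCK_SCHED = 2;           // scheduler runtime (total CPU time)

    /** Builds a per-thread CPU clock id for the given thread id and clock type. */
    static int threadCpuClock(int tid, int type) {
        return (~tid << 3) | CPUCLOCK_PERTHREAD_MASK | type;
    }

    /** Rewrites the type bits so the clock reports user time only. */
    static int toUserTimeClock(int clockId) {
        return (clockId & ~CPUCLOCK_CLOCK_MASK) | CPUCLOCK_VIRT;
    }

    /** Recovers the thread id encoded in a CPU clock id. */
    static int tidOf(int clockId) {
        return ~(clockId >> 3);
    }

    /** Extracts the clock type from a CPU clock id. */
    static int typeOf(int clockId) {
        return clockId & CPUCLOCK_CLOCK_MASK;
    }

    public static void main(String[] args) {
        int total = threadCpuClock(1234, CPUCLOCK_SCHED); // hypothetical tid
        int user = toUserTimeClock(total);
        System.out.println("total clock id: " + total + ", user clock id: " + user);
        System.out.println("tid: " + tidOf(user) + ", type: " + typeOf(user));
    }
}
```

Only the two type bits change between the total-CPU clock and the user-time clock, so the thread id and the per-thread flag survive the rewrite. That is why a single clock_gettime() on the modified id is enough.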

There is concrete downstream demand for this. The /proc overhead currently makes continuous self-instrumentation of per-thread CPU usage prohibitive for production users. The change is also a net code simplification (the diff removes more than it adds) and is fully integrated in mainline (JDK 26).
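For context, the kind of self-instrumentation described above might look like the following sketch. It uses only the standard java.lang.management API. The class name ThreadUserTimeSnapshot is hypothetical and is not part of this change. Each getThreadUserTime() call in the loop goes through the Linux path this PR modifies.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: one snapshot of per-thread user CPU time, in nanoseconds.
class ThreadUserTimeSnapshot {
    static Map<Long, Long> snapshot() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        Map<Long, Long> userNanosById = new LinkedHashMap<>();
        // Return an empty snapshot when the JVM cannot or will not measure.
        if (!bean.isThreadCpuTimeSupported() || !bean.isThreadCpuTimeEnabled()) {
            return userNanosById;
        }
        for (long id : bean.getAllThreadIds()) {
            long userNanos = bean.getThreadUserTime(id);
            if (userNanos >= 0) { // -1 means the thread has already terminated
                userNanosById.put(id, userNanos);
            }
        }
        return userNanosById;
    }

    public static void main(String[] args) {
        snapshot().forEach((id, ns) ->
                System.out.println("thread " + id + ": " + ns / 1_000_000 + " ms user"));
    }
}
```

A monitor that takes such a snapshot periodically pays the per-call cost once per live thread per sample, which is where the /proc overhead adds up.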

Not a clean backport

Manual resolution was required for one hunk. Mainline commit openjdk/jdk@80ab094 (JDK-8347707, os::snprintf standardization, not in 25u) had changed snprintf to os::snprintf_checked inside the /proc-reading function that this change deletes, so the deletion context did not match. I resolved it by taking mainline's side (the new clock_gettime body) in full. The resulting code, including the new get_thread_clockid() helper and the ThreadMXBeanBench microbenchmark, is byte-for-byte identical to mainline.

How the fix was validated and checked for regressions

I built two release images that are identical except for these three commits:

  • baseline is 25u master (the exact commit this stack is based on)
  • fixed is master plus JDK-8372625, JDK-8372584, and JDK-8373557

I confirmed via nm on libjvm.so that the only difference is this code. The baseline exports the old slow_thread_cpu_time and no os::Linux::thread_cpu_time(clockid_t), and the fixed build is the inverse. After resolution I also diffed the thread-CPU-time region of os_linux.cpp and os_linux.hpp against the mainline commit and confirmed it is identical, apart from retaining _pthread_setname_np, which is unrelated and absent from mainline only because of JDK-8368124, which is not in 25u.

Functional and regression testing: The thread-CPU-time test set passes on both this commit and the full 3-commit stack, on linux-x86_64 and linux-aarch64:

| Test | Result |
| --- | --- |
| java/lang/management/ThreadMXBean/ThreadUserTime.java | pass |
| java/lang/management/ThreadMXBean/ThreadCpuTime.java | pass |
| java/lang/management/ThreadMXBean/VirtualThreads.java | pass |
| vmTestbase/nsk/monitoring/ThreadMXBean/GetThreadCpuTime (10) | pass |
| vmTestbase/nsk/monitoring/ThreadMXBean/isThreadCpuTimeSupported (5) | pass |
| vmTestbase/nsk/monitoring/ThreadMXBean/isCurrentThreadCpuTimeSupported (5) | pass |
| jdk/jfr/event/runtime/TestThreadCpuTimeEvent.java | pass |
| jdk/jfr/event/profiling/* (CPU-time sampler) | pass |

These exercise every consumer of the modified path: JMX (ThreadMXBean), JVMTI (nsk), the JFR thread-CPU-time event, and the JFR CPU-time sampler. ThreadUserTime.java in particular asserts the key invariants the change must preserve: getCurrentThreadUserTime() returns -1 before thread CPU time measurement is enabled and a non-negative value afterwards, and per-thread user time is monotonic and never exceeds the corresponding CPU (user+sys) time.
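The invariants listed above can be sketched against the public ThreadMXBean API as follows. This is not the ThreadUserTime.java test itself. The class name, the message strings, and the 50 ms busy loop are all assumptions made for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical re-statement of the invariants; not the JDK regression test.
class UserTimeInvariantsSketch {
    static final String NOT_MINUS_ONE = "expected -1 while measurement is disabled";
    static final String NEGATIVE = "expected a non-negative value once enabled";
    static final String NOT_MONOTONIC = "user time went backwards";
    static final String USER_EXCEEDS_CPU = "user time exceeded total CPU time";

    /** Returns the violated invariants; empty if all hold or measurement is unsupported. */
    static List<String> violations() {
        List<String> failed = new ArrayList<>();
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            return failed;
        }
        boolean wasEnabled = bean.isThreadCpuTimeEnabled();
        try {
            bean.setThreadCpuTimeEnabled(false);
            if (bean.getCurrentThreadUserTime() != -1) failed.add(NOT_MINUS_ONE);

            bean.setThreadCpuTimeEnabled(true);
            long userBefore = bean.getCurrentThreadUserTime();
            if (userBefore < 0) failed.add(NEGATIVE);

            burnCpu(50); // give the thread some user time to accumulate
            long userAfter = bean.getCurrentThreadUserTime();
            long cpuAfter = bean.getCurrentThreadCpuTime(); // read second, so it can only be later
            if (userAfter < userBefore) failed.add(NOT_MONOTONIC);
            if (userAfter > cpuAfter) failed.add(USER_EXCEEDS_CPU);
        } finally {
            bean.setThreadCpuTimeEnabled(wasEnabled); // restore the original setting
        }
        return failed;
    }

    /** Spins for roughly the given wall-clock time and returns a value to keep the loop live. */
    static long burnCpu(long millis) {
        long deadline = System.nanoTime() + millis * 1_000_000L;
        long x = 0;
        while (System.nanoTime() < deadline) {
            x = x * 31 + 7;
        }
        return x;
    }

    public static void main(String[] args) {
        List<String> failed = violations();
        System.out.println(failed.isEmpty() ? "all invariants held" : "violations: " + failed);
    }
}
```

Reading user time before total CPU time keeps the comparison conservative, because the total is sampled slightly later than the value it is compared against.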

Performance, reproduced in 25u. Measured with the exact JMH microbenchmark added by this commit (ThreadMXBeanBench, SampleTime, @Fork(10) @Warmup(2x5s) @Measurement(5x5s), single thread), compiled against JMH 1.37 and run as the same benchmarks jar on both images:

| Metric | baseline (/proc) | this change (clock_gettime) | improvement |
| --- | --- | --- | --- |
| Mean | 0.019014 ms/op | 0.000775 ms/op | 24.5x |
| p50 | 0.018528 ms/op | 0.000760 ms/op | 24.4x |
| p90 | 0.019680 ms/op | 0.000770 ms/op | 25.6x |
| p99 | 0.025184 ms/op | 0.001090 ms/op | 23.1x |
| p99.9 | 0.039232 ms/op | 0.005656 ms/op | 6.9x |
| p99.99 | 0.607232 ms/op | 0.014432 ms/op | 42.1x |
| p100 (max) | 10.059776 ms/op | 1.470464 ms/op | 6.8x |

(Baseline n=6,586,868 samples; fixed n=5,230,948 samples.) The common case drops from about 19 µs to under 1 µs (a single syscall instead of a /proc open, read, and parse), and the worst-case sample drops from about 10 ms to about 1.5 ms. This matches the shape of the original PR's result (openjdk/jdk#28556) and confirms the tail-latency reduction that motivates the change.
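For readers without a JMH setup, the shape of a per-call latency distribution like the one above can be approximated with a hand-rolled sampler. This sketch is not ThreadMXBeanBench and is no substitute for it: it has no forks, no warmup control, and System.nanoTime() overhead is comparable to the sub-microsecond call being timed. The class name and the nearest-rank percentile method are my own choices, not something taken from the PR.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

// Hypothetical stand-in for a SampleTime-style measurement; rough numbers only.
class UserTimeLatencySketch {
    /** Times n calls and returns the per-call latencies in nanoseconds, sorted ascending. */
    static long[] sampleSortedNanos(int n) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            return new long[0]; // nothing to measure on this JVM
        }
        long[] latencies = new long[n];
        long sink = 0;
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            sink += bean.getCurrentThreadUserTime();
            latencies[i] = System.nanoTime() - start;
        }
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // keeps the measured call from being optimized away
        }
        Arrays.sort(latencies);
        return latencies;
    }

    /** Nearest-rank percentile of a non-empty ascending array, for p in (0, 100]. */
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        sampleSortedNanos(200_000); // crude warmup pass
        long[] sorted = sampleSortedNanos(1_000_000);
        if (sorted.length == 0) {
            System.out.println("thread CPU time is not supported here");
            return;
        }
        for (double p : new double[] {50, 90, 99, 99.9, 99.99, 100}) {
            System.out.println("p" + p + ": " + percentile(sorted, p) + " ns");
        }
    }
}
```

Running it on a baseline image and a fixed image should show the same direction of change as the table, though the absolute values will differ from the JMH results.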

Risk

This is a Linux-only behavioral change: thread user time now comes from clock_gettime() with CPUCLOCK_VIRT instead of being parsed from /proc. Other platforms are unchanged. The shared-file edits in this stack are comment-only. The change was reviewed in tip and is covered by the existing regression test ThreadUserTime.java plus the added microbenchmark.

Testing

Built release on linux-x86_64 and linux-aarch64. The thread-CPU-time test set passes (see the table above).



Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • JDK-8372584 needs maintainer approval

Issue

  • JDK-8372584: [Linux]: Replace reading proc to get thread user CPU time with clock_gettime (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk25u-dev.git pull/621/head:pull/621
$ git checkout pull/621

Update a local copy of the PR:
$ git checkout pull/621
$ git pull https://git.openjdk.org/jdk25u-dev.git pull/621/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 621

View PR using the GUI difftool:
$ git pr show -t 621

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk25u-dev/pull/621.diff

Using Webrev

Link to Webrev Comment

bridgekeeper Bot commented Jun 17, 2026

👋 Welcome back mmm-choi! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk Bot commented Jun 17, 2026

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk Bot changed the title from "Backport 858d2e434dd4eb8aa94784bb1cd115554eec5dff" to "8372584: [Linux]: Replace reading proc to get thread user CPU time with clock_gettime" Jun 17, 2026
openjdk Bot commented Jun 17, 2026

This backport pull request has now been updated with issue from the original commit.

openjdk Bot added the backport (Port of a pull request already in a different code base) and rfr (Pull request is ready for review) labels Jun 17, 2026
mlbridge Bot commented Jun 17, 2026

Webrevs


Labels

backport (Port of a pull request already in a different code base), rfr (Pull request is ready for review)

2 participants