[WIP]: fix(cosmos): HTTP/2 PING handler must observe child-stream read activity, not just parent-pipeline frames by jeet1995 · Pull Request #49398 · Azure/azure-sdk-for-java

jeet1995 · 2026-06-07T01:55:39Z

Description

Follow-up to #49095.

Root cause

Http2PingHandler is installed on the parent (TCP) channel. Its lastReadActivityNanos field was only updated from channelRead on the parent pipeline — which observes PING ACK / SETTINGS / GOAWAY but not HEADERS / DATA frames.

With reactor-netty's Http2MultiplexHandler in the parent pipeline, all application H2 traffic (request/response HEADERS + DATA) is demuxed onto per-stream child channels, so it never reaches the parent's channelRead. Result: a connection actively serving thousands of QPS still appeared "idle" to the PING handler, which then sent unnecessary PINGs every interval. Under load this added measurable latency overhead (visible as a P95 regression in benchmarking against v4.80.0 with thin-client enabled).

Fix

New package-private AttributeKey<AtomicLong> LAST_CHILD_STREAM_READ_ACTIVITY_NANOS seeded on the parent channel in Http2PingHandler.handlerAdded.
Http2PingCloseRewrapHandler (already installed in every child stream's pipeline) overrides channelReadComplete to stamp the parent attribute via accumulateAndGet(now, Math::max) — once per read cycle, monotonic, not per-frame.
Http2PingHandler.maybeSendPing now reads effectiveLastReadActivityNanos = max(lastReadActivityNanos, child attr) for both the idle-threshold check and the "outstanding PING + activity resumed" early-return: if child-stream reads arrive after an outstanding PING, clear pingOutstandingSinceNanos and reset consecutiveFailures (defense against middleboxes that drop PING-ACK but pass application H2 frames).
Names use lastReadActivity / "read activity" throughout — both signals are inbound reads, never writes; the naming documents the semantics precisely.

Tests

Two new unit tests in Http2PingHandlerTest:

childStreamReadActivity_suppressesIdlePing — stale parent timestamp + fresh child-stream attribute → no PING is sent.
childStreamReadActivityAfterOutstandingPing_resetsFailureState — outstanding PING + stale parent + fresh child-stream attribute → pingOutstandingSinceNanos cleared, consecutiveFailures reset, no new PING.

Both bypass real time via reflection on package-private fields and verify state transitions only.

Additionally validated end-to-end with Http2PingKeepaliveTest (Linux tc netem / iptables network-fault test) against a live thin-client Cosmos account in Docker: passed (1 test, 0 failures, 57.98s).

Why this isn't covered by the merged PR

PR #49095 installs the PING handler correctly and detects truly idle connections. It only mis-attributes activity when Http2MultiplexHandler is present in the pipeline — which is always the case for reactor-netty's HTTP/2 client. The defect manifests only under sustained load (where the connection is busy enough that the PING is pure overhead, not a probe).

Copilot

Pull request overview

This PR fixes Cosmos DB’s manual HTTP/2 keepalive so the parent-channel Http2PingHandler correctly treats child-stream HTTP/2 traffic (HEADERS/DATA) as read activity when determining idleness, avoiding unnecessary PINGs (and potential spurious failure accumulation) under sustained load with Http2MultiplexHandler.

Changes:

Track child-stream inbound read activity via a parent-channel AttributeKey<AtomicLong> updated from each child stream’s channelReadComplete.
Update the PING state machine to use an “effective last read activity” (max of parent-pipeline reads and child-stream reads), including clearing an outstanding PING when post-PING child-stream traffic proves the connection is healthy.
Add unit tests validating child-stream activity suppresses idle PINGs and resets outstanding-PING failure state.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/Http2PingHandler.java	Adds child-stream read activity tracking and updates idle/outstanding-PING logic to use effective read activity.
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/Http2PingCloseRewrapHandler.java	Stamps parent-channel child-read activity once per child read cycle via `channelReadComplete`.
sdk/cosmos/azure-cosmos/CHANGELOG.md	Documents the behavioral fix and its impact under load.
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/http/Http2PingHandlerTest.java	Adds unit tests covering the child-stream activity behavior and state transitions.

jeet1995 · 2026-06-07T02:09:44Z

@sdkReviewAgent

…ity, not just parent-pipeline frames Follow-up to Azure#49095. Root cause: Http2PingHandler is installed on the parent (TCP) channel and only updated lastReadActivityNanos from parent-pipeline channelRead -- which sees PING ACK / SETTINGS / GOAWAY but NOT HEADERS / DATA frames. With reactor-netty's Http2MultiplexHandler in the parent pipeline, all application H2 traffic is demuxed onto per-stream child channels, so a connection serving thousands of QPS still looked idle to the PING handler, causing unnecessary PINGs and measurable P95 overhead under load. Fix: - New package-private AttributeKey<AtomicLong> LAST_CHILD_STREAM_READ_ACTIVITY_NANOS seeded on the parent channel in handlerAdded. - Http2PingCloseRewrapHandler (already installed in every child stream's pipeline) overrides channelReadComplete to stamp the parent attribute via accumulateAndGet(now, Math::max) -- once per read cycle, monotonic. - Http2PingHandler.maybeSendPing now uses effectiveLastReadActivityNanos = max(field, child attr) for both the idle-threshold check AND the outstanding-PING early-return: if child-stream reads arrived after an outstanding PING, clear pingOutstandingSinceNanos and reset consecutiveFailures (defense against middleboxes that drop PING-ACK but pass application H2 frames). - Names use lastReadActivity / read activity -- both signals are inbound reads, never writes. Tests: two new unit tests in Http2PingHandlerTest covering the suppression path and the outstanding-PING reset path; both bypass real time via reflection. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jeet1995 · 2026-06-07T02:55:53Z

/azp run java - cosmos - tests

azure-pipelines · 2026-06-07T02:56:17Z

Azure Pipelines successfully started running 1 pipeline(s).

jeet1995 · 2026-06-07T03:15:33Z

@sdkReviewAgent

@sharable

…equest The previous .doOnRequest() chained onto the HTTP/2 HttpClient builder wrapped every outgoing request's Mono in a Reactor operator. That paid per-request costs even when the rewrap handler was already installed: * MonoPeekTerminal wrap + per-request subscriber allocation (InternalMonoOperator.subscribe +2.60% in alloc profile) * per-request BiConsumer lambda invocation (Http2Pool\$\...run +1.16% in alloc profile) * per-request channel pipeline String-key walk via pipeline.get(...) Long-run async-profiler diff vs v4.80.0 on D2 attributed the PR's -7.6% QPS / +14.5% mean / +166% tail-max regression primarily to a +10.2% allocation rate -- with the operators above dominating the delta. Move the install into the existing .doOnConnected() block, which already runs once per HTTP/2 child stream (the customHeaderCleaner install right above it relies on the same contract). The rewrap handler is @sharable, so installing once per stream is correct and amortizes the install + pipeline-walk away from the hot path. The ch.parent() != null guard is kept so the install only targets child streams, never the parent TCP channel (which uses Http2ParentChannelExceptionHandler instead). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

xinlian12 · 2026-06-08T05:51:29Z

                    // stack maps that to GATEWAY_HTTP2_PING_TIMEOUT_CHANNEL_CLOSED so
                    // ClientRetryPolicy can suppress region mark-down.
                    //
+                    // doOnConnected fires once per child stream channel (the existing


🔴 Blocking — Lifecycle Mismatch: doOnConnected does not fire for H2 child streams — handler is never installed

This comment claims doOnConnected fires once per child stream channel, but it contradicts the comment 87 lines earlier in the same file (lines 152–154):

"For H2, doOnConnected fires on the parent TCP channel when the connection is first established (State.CONFIGURED). Child stream channels emit STREAM_CONFIGURED, which does not trigger this callback."

In reactor-netty, ClientTransportDoOn.onStateChange matches only State.CONFIGURED. H2 child streams emit HttpClientState.STREAM_CONFIGURED — a completely different enum value that never reaches this callback. When doOnConnected fires on the parent TCP channel, connection.channel() is the parent, and ch.parent() at line 251 is null. The guard ch.parent() != null evaluates to false, and Http2PingCloseRewrapHandler is never installed on any child stream.

Two failures result:

The core fix is nullified. Http2PingCloseRewrapHandler.channelReadComplete never runs on child streams → LAST_CHILD_STREAM_READ_ACTIVITY_NANOS is never updated beyond its seed value → effectiveLastReadActivityNanos() returns only the stale parent-pipeline timestamp → connections serving thousands of QPS still appear "idle" → spurious PINGs continue on every interval. This is exactly the bug the PR is supposed to fix.

Existing rewrap feature regresses. Http2PingCloseRewrapHandler.channelInactive never fires on child streams → PING-timeout closes produce bare PrematureCloseException instead of Http2PingTimeoutChannelClosedException → ClientRetryPolicy cannot suppress region mark-down (regression from PR Add HTTP/2 PING for broken connection detection. #49095).

The claim that "the existing customHeaderCleaner install above relies on the same contract" is also incorrect — customHeaderCleaner installs on the parent pipeline (after reactor.left.httpCodec, which is the H2 frame codec on the parent) and works correctly precisely because it runs on the parent.

Suggested fix: Revert commit 26f1615f914 and keep the handler installation in doOnRequest. The per-request overhead is a single HashMap.get() + addFirst() call for a @Sharable singleton — negligible compared to the HTTP request itself. If the goal is to avoid per-request overhead, use reactor.netty.http.client.HttpClient.observe(ConnectionObserver) with a custom observer that matches both CONFIGURED and STREAM_CONFIGURED.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

xinlian12 · 2026-06-08T05:51:29Z

    }

+    @Test(groups = "unit")
+    public void childStreamReadActivity_suppressesIdlePing() throws Exception {


🟡 Recommendation — Test Gap: New tests bypass Http2PingCloseRewrapHandler.channelReadComplete entirely

Both new tests (childStreamReadActivity_suppressesIdlePing and childStreamReadActivityAfterOutstandingPing_resetsFailureState) set the LAST_CHILD_STREAM_READ_ACTIVITY_NANOS attribute directly via AtomicLong.set(). They never exercise the actual production code path in Http2PingCloseRewrapHandler.channelReadComplete() — the channelReadComplete → parent.attr(...).get().accumulateAndGet(System.nanoTime(), Math::max) path.

This means:

If channelReadComplete used the wrong attribute key, set(0) instead of accumulateAndGet(now, max), or accidentally omitted the super.channelReadComplete(ctx) call, all tests would still pass.

The monotonic guarantee (Math::max) for concurrent child streams is never verified.

Combined with the doOnConnected wiring issue above, the entire child-stream-to-parent signaling path has zero automated verification.

Consider adding a test that constructs a parent EmbeddedChannel (with Http2PingHandler) and a child EmbeddedChannel (with Http2PingCloseRewrapHandler.INSTANCE), fires channelReadComplete on the child, and asserts the parent's LAST_CHILD_STREAM_READ_ACTIVITY_NANOS attribute advanced.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

xinlian12 · 2026-06-08T05:51:29Z

+     */
+    private long effectiveLastReadActivityNanos(ChannelHandlerContext ctx) {
+        AtomicLong childReadActivity = ctx.channel().attr(LAST_CHILD_STREAM_READ_ACTIVITY_NANOS).get();
+        long childNanos = childReadActivity != null ? childReadActivity.get() : 0L;


🟢 Suggestion — Defensive fallback 0L is not a valid nanoTime sentinel

When childReadActivity is null (defensive path — should never happen since handlerAdded seeds it), the fallback 0L is not a valid System.nanoTime() value. The Java spec allows nanoTime() to return negative values (arbitrary origin). If the JVM's nanoTime origin is negative:

lastReadActivityNanos could be e.g. -500_000_000

Math.max(-500_000_000, 0L) = 0L — a timestamp in the "future" relative to the current clock

idleNanos = now - 0L would be negative → the connection would never appear idle → PINGs would never fire on that connection

The null path is essentially unreachable in practice, and the fix is trivial:

long childNanos = childReadActivity != null ? childReadActivity.get() : lastReadActivityNanos;

This makes the null case a pure no-op (falls back to parent-only behavior) rather than introducing a foreign sentinel.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

xinlian12 · 2026-06-08T05:51:29Z

+        // activity fix to prevent rapid-fire PING storms after write failure) does not
+        // block the second send within the same pingInterval.
+        setField(handler, "lastReadActivityNanos", longAgo);
+        setField(handler, "lastPingSendNanos", longAgo);


🟡 Recommendation — Test Gap: lastPingSendNanos send-throttle is tested by bypassing it, never by triggering it

The lastPingSendNanos field exists to prevent rapid-fire PING storms after write failure. This test proves the multi-tick failure path, but explicitly rewinds lastPingSendNanos to longAgo to bypass the throttle on the second tick (the comment at line 231-233 even acknowledges this).

No test ever asserts the converse: that a second call to maybeSendPing within the same pingIntervalNanos window (without rewinding lastPingSendNanos) does NOT send a second PING.

If lastPingSendNanos were accidentally removed from the idleAnchor computation at line 264 (long idleAnchor = Math.max(effectiveLastReadActivityNanos(ctx), lastPingSendNanos)), the PING-storm defense would disappear and all tests would still pass.

Consider adding a writeFailure_throttlesRapidResend test:

First tick: write fails, consecutiveFailures = 1, lastPingSendNanos = now

Rewind lastReadActivityNanos and child attr to stale, but do not rewind lastPingSendNanos

Second tick: assert pingsSent and consecutiveFailures did not advance (throttle held)

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

xinlian12 · 2026-06-08T05:51:45Z

✅ Review complete (50:38)

Posted 4 inline comment(s).

_{Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage}

Copilot AI review requested due to automatic review settings June 7, 2026 01:55

jeet1995 requested review from a team and kirankumarkolli as code owners June 7, 2026 01:55

Copilot started reviewing on behalf of jeet1995 June 7, 2026 01:55 View session

github-actions Bot added the Cosmos label Jun 7, 2026

Copilot AI reviewed Jun 7, 2026

View reviewed changes

jeet1995 force-pushed the users/jeet1995/http2-ping-child-stream-activity branch from c85d48c to 55ed09e Compare June 7, 2026 02:31

jeet1995 mentioned this pull request Jun 7, 2026

Gate HTTP/2 PING failure-based connection close exception rewraps to PING enablement flag. #49403

Merged

jeet1995 changed the title ~~fix(cosmos): HTTP/2 PING handler must observe child-stream read activity, not just parent-pipeline frames~~ [WIP]: fix(cosmos): HTTP/2 PING handler must observe child-stream read activity, not just parent-pipeline frames Jun 7, 2026

xinlian12 reviewed Jun 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP]: fix(cosmos): HTTP/2 PING handler must observe child-stream read activity, not just parent-pipeline frames#49398

[WIP]: fix(cosmos): HTTP/2 PING handler must observe child-stream read activity, not just parent-pipeline frames#49398
jeet1995 wants to merge 2 commits into
Azure:mainfrom
jeet1995:users/jeet1995/http2-ping-child-stream-activity

jeet1995 commented Jun 7, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jeet1995 commented Jun 7, 2026

Uh oh!

jeet1995 commented Jun 7, 2026

Uh oh!

azure-pipelines Bot commented Jun 7, 2026

Uh oh!

jeet1995 commented Jun 7, 2026

Uh oh!

xinlian12 Jun 8, 2026

Uh oh!

xinlian12 Jun 8, 2026

Uh oh!

xinlian12 Jun 8, 2026

Uh oh!

xinlian12 Jun 8, 2026

Uh oh!

xinlian12 commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jeet1995 commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Root cause

Fix

Tests

Why this isn't covered by the merged PR

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jeet1995 commented Jun 7, 2026

Uh oh!

jeet1995 commented Jun 7, 2026

Uh oh!

azure-pipelines Bot commented Jun 7, 2026

Uh oh!

jeet1995 commented Jun 7, 2026

Uh oh!

xinlian12 Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

xinlian12 Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

xinlian12 Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

xinlian12 Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

xinlian12 commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jeet1995 commented Jun 7, 2026 •

edited

Loading