Skip to content

[WIP] Default thin-client to enabled and add proxy connectivity-probe gate#49437

Open
jeet1995 wants to merge 6 commits into
Azure:mainfrom
jeet1995:jeet1995/thin-client-probe-flow
Open

[WIP] Default thin-client to enabled and add proxy connectivity-probe gate#49437
jeet1995 wants to merge 6 commits into
Azure:mainfrom
jeet1995:jeet1995/thin-client-probe-flow

Conversation

@jeet1995

Copy link
Copy Markdown
Member

Summary

Adds an EndpointOrchestrator that fans out POST /connectivity-probe to every thin-client regional endpoint after each topology refresh. The SDK only routes data-plane traffic through thin-client (Gateway V2) when all regional probes return HTTP 200 across N consecutive refresh cycles; otherwise traffic falls back to Gateway V1 at the next refresh boundary. No mid-flight fallback.

COSMOS.THINCLIENT_ENABLED now defaults to true. The new probe gate makes that safe by closing thin-client routing automatically if the proxy fleet is unreachable.

Gating caveats

  • Direct mode: probe is not wired at all.
  • HTTP/2 required: probe is not wired unless Http2ConnectionConfig is configured and effectively enabled.
  • Metadata / QueryPlan / AllVersionsAndDeletes: continue to route through Compute Gateway (Gateway V1) via the existing useThinClientStoreModel predicate.
  • Init-safe: probe wiring + trigger are guarded with try/catch and fire-and-forget, so a probe issue can never trip CosmosClient initialization or fail a topology refresh.
  • Close-safe: EndpointOrchestrator implements Closeable and is closed from GlobalEndpointManager.close(); no further probes are issued after client shutdown.

Configuration

System property Default Notes
COSMOS.THINCLIENT_ENABLED true (was false) Master opt-out.
COSMOS.THINCLIENT_PROBE_ENABLED true Per-cycle bypass; orchestrator stays optimistic when off.
COSMOS.THINCLIENT_PROBE_FAILURE_THRESHOLD 2 Consecutive RED cycles before flipping unhealthy.
COSMOS.THINCLIENT_PROBE_PATH /connectivity-probe

Tests

  • 8 unit tests for EndpointOrchestrator (hysteresis, RED/GREEN flips, no-op gates).
  • 9 Configs tests for the new properties (parse, fallback, invalid input).
  • 5 new ThinClientProbeWiringTests for GEM integration (probe fires on refresh, healthy default, threshold flip, region discovery via LocationCache).
  • All 44 unit tests pass with mvn -Punit on azure-cosmos-tests.
  • Existing ThinClientE2ETest continues to pass against a live multi-region thin-client account.

Changelog

Single entry added under 4.81.0-beta.1 -> Other Changes.

…robe gate

Adds an EndpointOrchestrator that fans out POST /connectivity-probe to
every thin-client regional endpoint after each topology refresh. SDK
only routes data-plane traffic to thin-client (Gateway V2) when all
regional probes succeed across N consecutive refresh cycles
(configurable via COSMOS.THINCLIENT_PROBE_FAILURE_THRESHOLD, default 2);
otherwise traffic falls back to Gateway V1 at the next refresh
boundary. No mid-flight fallback.

Caveats:
- Probe wiring is skipped entirely for Direct mode and when HTTP/2 is
  not configured; controlled by RxDocumentClientImpl.useThinClient.
- QueryPlan, metadata reads, and AllVersionsAndDeletes change feed
  continue to route through Compute Gateway (Gateway V1).
- Probe failures are absorbed inside the orchestrator and the trigger
  is fire-and-forget on the GEM scheduler, so probe issues can never
  trip CosmosClient initialization or fail a topology refresh.
- EndpointOrchestrator implements Closeable and is closed from
  GlobalEndpointManager.close() so no further probes are issued after
  client shutdown.

THINCLIENT_ENABLED now defaults to true; opt out via
COSMOS.THINCLIENT_ENABLED=false or COSMOS.THINCLIENT_PROBE_ENABLED=false.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 10, 2026 01:32
@jeet1995 jeet1995 requested review from a team and kirankumarkolli as code owners June 10, 2026 01:32

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes thin-client (Gateway V2) routing default-on and adds an HTTP/2 connectivity-probe gate that periodically validates thin-client regional endpoints after topology refreshes, falling back to Gateway V1 when the proxy fleet is deemed unhealthy.

Changes:

  • Default COSMOS.THINCLIENT_ENABLED to true and introduce new probe-related configuration knobs in Configs.
  • Add EndpointOrchestrator and wire it into GlobalEndpointManager refresh flows; gate thin-client routing via isProxyProbeHealthy().
  • Add unit/integration-style tests covering orchestrator behavior, config parsing, and GEM wiring.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxDocumentClientImpl.java Wires thin-client HttpClient into GEM during init; adds probe-health condition to thin-client routing predicate.
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/routing/LocationCache.java Exposes thin-client regional endpoints from the latest topology snapshot for probe fan-out.
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/GlobalEndpointManager.java Hosts and triggers the probe orchestrator after refreshes; exposes probe health/diagnostics; closes orchestrator on shutdown.
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/EndpointOrchestrator.java New component that executes per-region POST /connectivity-probe and maintains hysteresis-based health state.
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/Configs.java Flips thin-client default to enabled and adds probe enable/threshold/path configuration accessors.
sdk/cosmos/azure-cosmos/CHANGELOG.md Documents the thin-client default flip and the new probe gating behavior.
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/ThinClientProbeWiringTests.java Verifies GEM wiring, probe triggering on refresh, and endpoint discovery via LocationCache.
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/EndpointOrchestratorTests.java Unit tests for orchestrator hysteresis, error handling, feature flag no-op, and request targeting.
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/ConfigsTests.java Adds coverage for new probe properties and updates thin-client default expectation.

Comment thread sdk/cosmos/azure-cosmos/CHANGELOG.md Outdated
* Fixed `UnsupportedOperationException` when using `readManyByPartitionKeys` for empty pages. - See [PR 49311](https://github.com/Azure/azure-sdk-for-java/pull/49311)

#### Other Changes
* Defaulted `COSMOS.THINCLIENT_ENABLED=true` and added an HTTP/2 connectivity-probe (`EndpointOrchestrator`) that gates thin-client (Gateway V2) data-plane routing on per-region probe health; thin-client only activates when probes are green for all regional endpoints across N consecutive topology refresh cycles, otherwise traffic falls back to Gateway V1.
Comment on lines +447 to +474
private void triggerThinClientProbeCycle() {
try {
EndpointOrchestrator orchestrator = this.thinClientProbeOrchestrator.get();
if (orchestrator == null) {
return;
}
if (!this.hasThinClientReadLocations.get()) {
return;
}
Set<URI> endpoints = this.locationCache.getThinClientRegionalEndpoints();
if (endpoints.isEmpty()) {
return;
}
// Fire-and-forget: probe runs out-of-band on the global endpoint manager
// scheduler. Failures are absorbed inside runProbeCycle and reflected in the
// orchestrator's internal state, which is consulted at the next routing decision.
// We additionally guard against any synchronous throw here so a probe issue
// can never trip CosmosClient initialization or a topology refresh.
orchestrator
.runProbeCycle(endpoints)
.subscribeOn(CosmosSchedulers.GLOBAL_ENDPOINT_MANAGER_BOUNDED_ELASTIC)
.subscribe(
healthy -> {
if (logger.isDebugEnabled()) {
logger.debug("Thin-client probe cycle completed; proxyHealthy={}", healthy);
}
},
t -> logger.debug("Thin-client probe cycle subscription error", t));
Comment on lines +104 to +112
// Now swap to a green client and run another cycle on a fresh orchestrator that already saw a red.
Map<URI, Integer> greenByEndpoint = new HashMap<>();
greenByEndpoint.put(REGION_EAST, 200);
EndpointOrchestrator greenOrchestrator = new EndpointOrchestrator(mockClient(greenByEndpoint, new AtomicInteger(), false));

// Drive greenOrchestrator into the unhealthy state manually by replaying a red first.
Map<URI, Integer> redOnly = new HashMap<>();
redOnly.put(REGION_EAST, 503);
EndpointOrchestrator combo = new EndpointOrchestrator(toggleClient(REGION_EAST, 503, 200));
# Conflicts:
#	sdk/cosmos/azure-cosmos/CHANGELOG.md
@jeet1995 jeet1995 changed the title [Cosmos] Default thin-client to enabled and add HTTP/2 connectivity-probe gate [WIP] Default thin-client to enabled and add proxy connectivity-probe gate Jun 10, 2026
@jeet1995

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - tests

@jeet1995

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - spark

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - kafka

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995

Copy link
Copy Markdown
Member Author

@sdkReviewAgent

@jeet1995 jeet1995 left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Thin Client Probe Flow

Great work on wiring up the thin-client connectivity probe. I've reviewed the design and implementation, specifically looking for Reactor/Netty lifecycles, concurrency, and API contracts.

Here are a few important findings:

*1. [Resource Lifecycle / Memory Leak] Dangling Subscription in \EndpointOrchestrator*
File: \sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/EndpointOrchestrator.java, Line 204
By using .map\ and a fire-and-forget .subscribe()\ on
esponse.body(), the buffer draining process escapes the parent \Mono\ lifecycle. If the proxy trickles data or hangs on sending the body, this background task will consume resources indefinitely without backpressure or timeout because it's detached from the \send's \perProbeTimeout.
Recommendation: Use .flatMap\ instead to properly chain the asynchronous draining, and apply a timeout to the body draining phase:
\\java
.flatMap(response -> {
int status = response.statusCode();
boolean ok = status == 200;
if (!ok) {
logger.debug("Thin-client probe to {} returned status {}", regionalEndpoint, status);
}

// Drain body so reactor-netty releases the underlying buffer, tied to the Reactor lifecycle
return response.body()
    .doOnNext(buf -> {
        if (buf != null) {
            buf.release();
        }
    })
    .ignoreElement()
    .timeout(this.perProbeTimeout) // Prevent slow-draining bodies from hanging the cycle indefinitely
    .doFinally(s -> safeClose(response))
    .thenReturn(new ProbeResult(regionalEndpoint, ok, "status:" + status))
    .onErrorResume(t -> Mono.just(new ProbeResult(regionalEndpoint, ok, "status:" + status)));

})
\\

2. [Resource Lifecycle] Uncancelled In-Flight Probe Cycles
File: \sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/GlobalEndpointManager.java, Line 466
The \orchestrator.runProbeCycle(endpoints).subscribe(...)\ returns a \Disposable\ which is not stored or tracked. If \GlobalEndpointManager.close()\ is called while a probe cycle is in-flight, the cycle will not be cancelled.
Recommendation: Store the \Disposable\ in an \AtomicReference\ and explicitly .dispose()\ it during \close(), and also before starting a new cycle to prevent overlapping probes if topology refreshes rapidly.

3. [Documentation / Consistency] Optimistic vs Pessimistic Startup
File: \sdk/cosmos/azure-cosmos/CHANGELOG.md, Line 27
The changelog wording implies a pessimistic startup constraint: "only activates when probes are green... across N consecutive topology refresh cycles". However, \EndpointOrchestrator\ implements an optimistic startup (\proxyHealthy\ defaults to \ rue\ and trips to \ alse\ after N \RED\ cycles), and only requires 1 \GREEN\ cycle to restore health.
Recommendation: Update the changelog to accurately reflect the optimistic startup and fallback logic so customers understand that thin-client activates immediately on SDK init.

Nit: In \LocationCache.getThinClientRegionalEndpoints(), reading \ his.locationInfo\ outside a synchronized block is technically a stale read if another thread updates it, though it's safe here because \GlobalEndpointManager\ calls it from within the write lock.

if (orchestrator == null) {
return;
}
if (!this.hasThinClientReadLocations.get()) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Correctness: hasThinClientReadLocations and getThinClientRegionalEndpoints() can disagree, bypassing probe safety net

if (!this.hasThinClientReadLocations.get()) {
    return;
}
Set<URI> endpoints = this.locationCache.getThinClientRegionalEndpoints();
if (endpoints.isEmpty()) {
    return;
}

hasThinClientReadLocations is set from the raw databaseAccount.getThinClientReadableLocations() (line 493), while getThinClientRegionalEndpoints() reads from LocationCache.availableReadRegionalRoutingContextsByRegionName — which requires region-name matching between thin-client and gateway locations. If the names don't match (e.g., normalization difference), the thin-client endpoint is silently dropped from the LocationCache map.

When they disagree: hasThinClientReadLocations=true → routing gate passes in useThinClientStoreModel(). But endpoints.isEmpty()=true → no probe fires → proxyHealthy stays at its optimistic default true. The probe safety net is completely bypassed.

Failure scenario: Service returns a thin-client region name that doesn't match the normalized gateway region key. The LocationCache.updateLocationCache() code that sets RegionalRoutingContext.setThinclientRegionalEndpoint() catches and logs the NPE (line 1055-1062) but the endpoint is lost.

Suggestion: Either derive hasThinClientReadLocations from locationCache.getThinClientRegionalEndpoints().isEmpty() (single source of truth), or add a safeguard in triggerThinClientProbeCycle() that marks the probe unhealthy when hasThinClientReadLocations=true but endpoints are empty.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

private boolean useThinClientStoreModel(RxDocumentServiceRequest request) {
if (!useThinClient
|| !this.globalEndpointManager.hasThinClientReadLocations()
|| !this.globalEndpointManager.isProxyProbeHealthy()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Test Coverage: No test proves the routing fallback when probe is unhealthy

|| !this.globalEndpointManager.isProxyProbeHealthy()

This is the most critical behavioral change in the PR — the probe health gates whether data-plane traffic routes to thin-client or falls back to Gateway V1. But no test verifies the routing consequence: that when isProxyProbeHealthy() returns false, useThinClientStoreModel() returns false and getStoreProxy() returns gatewayProxy instead of thinProxy.

The existing tests verify the probe state machine (health flag flips) and the GEM integration (probe fires on refresh), but stop short of proving the downstream routing behavior. If someone later refactors the condition order in useThinClientStoreModel() or accidentally removes this check, no test would catch it.

Suggestion: Add a test (e.g., in a new ThinClientRoutingGateTests or extending ThinClientProbeWiringTests) that:

  1. Wires a GEM with a stubbed orchestrator returning all-503.
  2. Drives the probe to unhealthy (cross threshold).
  3. Asserts useThinClientStoreModel(documentPointReadRequest) returns false.
  4. Restores probe to healthy and asserts it returns true again.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

*/
public Set<URI> getThinClientRegionalEndpoints() {
UnmodifiableMap<String, RegionalRoutingContext> byRegion =
this.locationInfo.availableReadRegionalRoutingContextsByRegionName;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Suggestion — Coverage Gap: Only read-region thin-client endpoints are probed

UnmodifiableMap<String, RegionalRoutingContext> byRegion =
    this.locationInfo.availableReadRegionalRoutingContextsByRegionName;

This method collects thin-client endpoints exclusively from the read regional routing contexts. However, useThinClientStoreModel() routes writes (point operations, batch) through thin-client too. If a write-only region has a separate thin-client endpoint (stored in availableWriteRegionalRoutingContextsByRegionName), that endpoint is never probed.

Consequence: If the write-region thin-client proxy goes down but read-region probes pass, isProxyHealthy() stays true and write traffic continues routing to the dead proxy.

This may be dormant in practice if Cosmos DB always returns the same regions for both read and write thin-client locations. But the gap exists conceptually, and adding write-region endpoints to the probe set would be a straightforward defense-in-depth improvement.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

}

private static HttpResponse stubResponse(int status) {
ByteBuf empty = Unpooled.EMPTY_BUFFER;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Suggestion — Test Quality: Unpooled.EMPTY_BUFFER singleton causes silent body-drain failures

ByteBuf empty = Unpooled.EMPTY_BUFFER;
return new HttpResponse() {
    ...
    @Override public Mono<ByteBuf> body() { return Mono.just(empty); }
    ...
};

Unpooled.EMPTY_BUFFER is a global singleton with refCnt=1. The production code calls buf.release() on it. After the first probe endpoint drains the body, refCnt drops to 0. Every subsequent buf.release() (for additional probe endpoints in the same test, or across tests) throws IllegalReferenceCountException, which is silently swallowed by the t -> {} error handler.

This means the body-drain path is exercised correctly only for the first probe in the first test — all subsequent body drains silently fail. The tests still pass because assertions only check the health flag, not body-drain behavior.

In production, ReactorNettyHttpResponse.body() on an empty response returns Mono.empty() (via ByteBufFlux.fromInbound().aggregate() on an empty stream), so onNext never fires and buf.release() is never called.

Suggestion: Return Mono.empty() from body() to match real HTTP/2 empty-body behavior:

@Override public Mono<ByteBuf> body() { return Mono.empty(); }

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

this.lastFailedEndpoints.set(Collections.unmodifiableSet(failedEndpoints));

if (cycleGreen) {
int prior = this.consecutiveFailures.getAndSet(0);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Correctness: Asymmetric recovery can cause oscillation under flapping

int prior = this.consecutiveFailures.getAndSet(0);
...
this.proxyHealthy.set(true);

The probe requires N consecutive RED cycles (default 2) to flip to unhealthy, but only 1 GREEN cycle to immediately restore health. This asymmetry creates an oscillation risk if a region is flapping (intermittently returning 200/503).

Scenario: With threshold=2, a flapping region produces: RED → RED → flip unhealthy → GREEN → restore healthy → RED → RED → flip unhealthy → GREEN → restore… The routing target flip-flops at every refresh boundary that happens to land on a GREEN probe.

The Rust Cosmos SDK addresses this with jittered failback — after marking unhealthy, it transitions through a ProbeCandidate state with a recovery cooldown before fully restoring health.

Suggestion: Consider either:

  • Requiring M consecutive GREEN cycles to restore (matching the RED threshold), or
  • Adding a minimum cooldown duration before re-enabling (e.g., if (Instant.now().isAfter(lastFailureAt + cooldown)))

This would prevent rapid routing oscillation while still allowing recovery.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

logger.debug("Thin-client probe to {} returned status {}", regionalEndpoint, status);
}
// Drain body so reactor-netty releases the underlying buffer.
response.body()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Resource Handling: Detached body drain with inert cleanup

response.body()
    .doFinally(s -> safeClose(response))
    .subscribe(buf -> { if (buf != null) buf.release(); }, t -> { });
return new ProbeResult(regionalEndpoint, ok, "status:" + status);

This creates a detached subscription inside .map() — the ProbeResult is returned to the outer Mono immediately, while the body drain runs independently as an orphaned reactive chain. Two issues:

  1. safeClose(response) is a no-op. HttpResponse.close() is an empty method (line 108-110 of HttpResponse.java), and ReactorNettyHttpResponse does not override it. The doFinally provides zero cleanup — it only gives a false sense of safety.

  2. SDK convention is to compose body consumption into the main chain. Across the codebase, body consumption is always part of the reactive pipeline: ClientTelemetry.java:232 uses .flatMap(HttpResponse::bodyAsString), HttpClientUtils.java:42 uses httpResponse.bodyAsString(), ErrorUtils.java:20 chains bodyAsString(). This detached subscribe diverges from that pattern.

For an empty probe response, this is likely benign in practice. But if the error handler t -> {} fires (e.g., connection reset mid-body), the error is silently swallowed AND safeClose does nothing — the HTTP/2 stream cleanup is entirely deferred to reactor-netty internals.

Suggestion: Compose the body drain into the returned Mono:

return this.httpClient
    .send(request, this.perProbeTimeout)
    .flatMap(response -> {
        int status = response.statusCode();
        boolean ok = status == 200;
        return response.bodyAsString()
            .defaultIfEmpty("")
            .map(ignored -> new ProbeResult(regionalEndpoint, ok, "status:" + status));
    })

This matches SDK conventions and ensures the body is drained before the ProbeResult is emitted.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

@xinlian12

Copy link
Copy Markdown
Member

Review complete (49:43)

Posted 6 inline comment(s).

Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage

@jeet1995

Copy link
Copy Markdown
Member Author

Deep review findings for PR #49437:

  1. [Thread safety / failover] GlobalEndpointManager.java:465 + EndpointOrchestrator.java:249 — stale probe cycles can overwrite newer health. Severity: Blocking. triggerThinClientProbeCycle() starts an independent fire-and-forget subscription after each refresh, while applyCycleResult() unconditionally mutates proxyHealthy and consecutiveFailures. Force refreshes from ClientRetryPolicy.refreshLocation(...)->refreshLocationAsync(null, forceRefresh) can overlap, so a slow older RED can mark the proxy unhealthy after a newer GREEN, or a slow older GREEN can reset failures after a newer RED. This directly risks eager fallback to Gateway V1 or missed fallback from thin-client. Fix with a monotonic probe generation, serialized/switch-latest probe execution, or stale-result rejection; add tests where cycle 1 is delayed and cycle 2 completes first. Source grounding: Java Concurrency in Practice, §2.2.3 “Race Conditions”.

  2. [Resource lifecycle / Reactor] GlobalEndpointManager.java:465-474 + EndpointOrchestrator.java:155-164 — probe subscriptions are not retained or cancelled on close. Severity: Blocking. GEM stores/disposes the background refresh Disposable, but not the probe Disposable; EndpointOrchestrator.close() only flips a flag and explicitly allows in-flight probes to keep running and update state. That leaves request work alive after client close and can race with RxDocumentClientImpl closing the shared HttpClient. Fix by retaining probe disposables in a Disposables.composite()/swap() and disposing them from close(), or by making the probe lifecycle owned by a serialized stream. Source grounding: Project Reactor Reference, “Cancelling a subscribe() with Its Disposable”.

  3. [Reactor / ByteBuf lifecycle] EndpointOrchestrator.java:195-207 — response-body drain is detached from the probe Mono. Severity: Blocking. The code returns the probe result from send(...).map(...) and starts response.body() with a nested subscribe() whose Disposable is discarded. A probe can mark GREEN before the body is drained/closed, and a hung/delayed body cannot be cancelled by the outer probe or GEM close. Compose the drain into the returned Mono instead, e.g. flatMap(response -> response.body().doOnNext(release).thenReturn(result).doFinally(close)), and test with a delayed or Mono.never() body. Source grounding: Project Reactor Reference, “Using Resources and the Finally Block” (doFinally/using), and Reactor Netty HTTP Client Reference, §4 “Consuming Data”.

  4. [Failover gating / default-on behavior] Configs.java:54,66,136, EndpointOrchestrator.java:68, RxDocumentClientImpl.java:9003-9018 — default-on thin-client can route before any successful probe. Severity: Blocking/Watch depending intended contract. COSMOS.THINCLIENT_ENABLED now defaults true, proxyHealthy starts true, and the first probe is async/fire-and-forget. With the default RED threshold of 2 and background refresh interval of 5 minutes, an unreachable proxy can still receive data-plane traffic before any 200 probe result, and potentially until enough RED cycles complete. If the PR contract is “only use thin-client after successful probes,” start the gate as unknown/unhealthy until first GREEN or block the initial refresh on the probe; otherwise update the PR description/tests to make the optimistic window explicit.

  5. [NPE / routing fallback] LocationCache.java:1055-1056, GlobalEndpointManager.java:491-493, ThinClientStoreModel.java:101-103, RxGatewayStoreModel.java:539-544 — thin-client eligibility can be true while the selected routing context has no thin endpoint. Severity: Watch. hasThinClientReadLocations is based on the raw service list, but LocationCache attaches thin endpoints only when the thin-client location name matches an existing gateway-location key. If that attach is skipped, useThinClientStoreModel() can still route to ThinClientStoreModel; getRootUri() then returns null and the base URI path dereferences it. Fix by basing eligibility/probe endpoints on resolved contexts with non-null thin endpoints, and add a mismatch test such as gateway "East US" with thin-client "eastus" or an unmatched thin-client location.

@jeet1995

Copy link
Copy Markdown
Member Author

Review: thin-client connectivity-probe gate

Deep review focused on Reactor correctness, dangling workers, NPE/IllegalStateException, and missed/eager failover. The defensive coding here is solid — try/catch guards, null-handling, and fire-and-forget isolation mean I found no NPE/ISE/crash bugs in the new paths. The real risks are in the failover timing and correctness semantics, which matter more now that THINCLIENT_ENABLED defaults to true. Two are worth blocking on.


🔴 HIGH-1 — Default-on + optimistic start + 5-min probe cadence + no mid-flight fallback = minutes-long outage window

Configs.java flips the default to true. The gate starts optimistic (proxyHealthy=true) and only flips after failureThreshold (default 2) consecutive RED cycles. Probe cycles fire at topology-refresh boundaries, and the background refresh interval default is 5 min (getUnavailableLocationsExpirationTimeInSeconds()*1000).

When the proxy fleet is dead but the account/regions are healthy, nothing forces a topology refresh (503 paths in ClientRetryPolicy retry across regions; they don't force a refresh). Consequences:

  • Worst-case time-to-fallback ≈ 2 × 5 min ≈ 10 min, during which all Document traffic routes to the dead proxy with no mid-flight fallback.
  • Startup race: the first data-plane requests evaluate isProxyProbeHealthy() == true and route to thin-client before the first async probe cycle (~5s) completes.

Forced refreshes (410/Gone, region failover via ClientRetryPolicy.refreshLocation) can accelerate the flip, so it isn't always 10 min — but that can't be relied on for the "proxy down, DB up" case.

Suggestion: a startup UNKNOWN state that gates thin-client until the first GREEN (bounded timeout), and/or a dedicated faster probe timer decoupled from topology refresh, and/or a lower first-flip threshold.

🔴 HIGH-2 — Hysteresis state machine is neither single-flight nor ordered → eager and missed failover

triggerThinClientProbeCycle() fire-and-forgets a new runProbeCycle on every refresh with no guard against a prior cycle still running. applyCycleResult mutates shared consecutiveFailures / proxyHealthy on completion, and Reactor imposes no ordering across independent subscribe() flows:

  • Eager failover: under a forced-refresh storm, N overlapping RED cycles increment consecutiveFailures by N in quick succession — crossing the threshold faster than "N consecutive refreshes" intends.
  • Missed/flapping failover: a slow stale GREEN completing after a newer RED runs getAndSet(0) + proxyHealthy.set(true), wiping a legitimate RED.

Suggestion: an AtomicBoolean cycleInProgress to skip overlapping triggers, or a monotonic cycle-generation id so stale completions are dropped before applying state.


🟡 MEDIUM — All-or-nothing cycle health is coarse

cycleGreen = successCount == attemptedEndpoints.size(). A single slow/bad region (one 5s probe timeout) marks the whole cycle RED, and the global isProxyProbeHealthy() gate then pulls healthy regions off thin-client too. Consider per-endpoint health gating only the selected endpoint.

⚪ Minor

  • Body-drain anti-pattern: the detached .subscribe() inside .map drains correctly, but safeClose(response) is effectively dead code — ReactorNettyHttpResponse doesn't override close() (base is a no-op). Prefer folding the drain into the chain (.flatMap(r -> r.body()...thenReturn(result))) and using ReferenceCountUtil.safeRelease.
  • Close doesn't cancel in-flight probes: runProbeCycle checks closed.get() at assembly, not subscription, and the fire-and-forget Disposable isn't tracked — a small dangling-work / post-close-mutation window remains. Re-check closed at subscribe, or takeUntilOther(closeSignal).
  • onErrorResume double-apply: if .map(applyCycleResult) ever threw mid-mutation, the fallback calls applyCycleResult again. applyCycleResult is effectively non-throwing, so this is theoretical only.
  • CAS allocation nit: compareAndSet(null, new EndpointOrchestrator(...)) constructs even on CAS failure (once in practice — negligible).
  • Probe request shape: sends only x-ms-thinclient-proxy-operation-type: ConnectivityProbe + empty body, vs. real requests carrying RNTBD framing. Routing is by endpoint host so this is plausibly fine, but worth confirming against the proxy contract / an integration test so probes can't be false-RED.

Good: correct wiring-before-init() ordering; the useThinClient short-circuit keeps non-thin clients zero-overhead; robust null-guards in LocationCache.getThinClientRegionalEndpoints() and applyCycleResult; fast-fail thin-client connect timeout.

@jeet1995 jeet1995 left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Submitting these findings as inline comments for visibility.

logger.debug("Thin-client probe to {} returned status {}", regionalEndpoint, status);
}
// Drain body so reactor-netty releases the underlying buffer.
response.body()

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Resource Lifecycle / Memory Leak] — Detached reactive chain (Dangling subscription).

By using .map and a fire-and-forget .subscribe() on response.body(), the buffer draining process escapes the parent Mono lifecycle. If the proxy trickles data or hangs on sending the body, this background task will consume resources indefinitely without backpressure or timeout because it is detached from the sends perProbeTimeout.

Recommendation: Use .flatMap instead to properly chain the asynchronous draining:

.flatMap(response -> {
    int status = response.statusCode();
    boolean ok = status == 200;
    if (!ok) {
        logger.debug("Thin-client probe to {} returned status {}", regionalEndpoint, status);
    }
    
    return response.body()
        .doOnNext(buf -> {
            if (buf != null) buf.release();
        })
        .ignoreElement()
        .timeout(this.perProbeTimeout)
        .doFinally(s -> safeClose(response))
        .thenReturn(new ProbeResult(regionalEndpoint, ok, "status:" + status))
        .onErrorResume(t -> Mono.just(new ProbeResult(regionalEndpoint, ok, "status:" + status)));
})

Comment thread sdk/cosmos/azure-cosmos/CHANGELOG.md Outdated

#### Other Changes
* Added HTTP/2 PING keepalive (default ON) for Gateway service endpoints to detect silently-broken connections. - See [PR 49095](https://github.com/Azure/azure-sdk-for-java/pull/49095)
* Defaulted `COSMOS.THINCLIENT_ENABLED=true` and added an HTTP/2 connectivity-probe (`EndpointOrchestrator`) that gates thin-client (Gateway V2) data-plane routing on per-region probe health; thin-client only activates when probes are green for all regional endpoints across N consecutive topology refresh cycles, otherwise traffic falls back to Gateway V1.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Documentation / Consistency]

The wording here implies a pessimistic startup constraint: "only activates when probes are green... across N consecutive topology refresh cycles".

However, EndpointOrchestrator implements an optimistic startup (proxyHealthy defaults to true and trips to false after N RED cycles), and only requires 1 GREEN cycle to restore health.

Should update the changelog to accurately reflect the optimistic startup and fallback logic so customers understand that thin-client activates immediately.

// We additionally guard against any synchronous throw here so a probe issue
// can never trip CosmosClient initialization or a topology refresh.
orchestrator
.runProbeCycle(endpoints)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Resource Lifecycle] — Uncancelled In-Flight Probe Cycles.

The orchestrator.runProbeCycle(endpoints).subscribe(...) returns a Disposable which is not stored or tracked. If GlobalEndpointManager.close() is called while a probe cycle is in-flight, the cycle will not be cancelled.

Recommendation: Store the Disposable in an AtomicReference<Disposable> and explicitly .dispose() it during close(), and also before starting a new cycle to prevent overlapping probes if topology refreshes rapidly.

…in lifecycle, cancellable in-flight probes

- EndpointOrchestrator: fold body-drain into probe Mono via flatMap+then(perProbeTimeout) so a slow/hanging response body cannot leak resources outside the cycle budget (Copilot #1, deep-review #3, jeet HIGH-2 minor).

- EndpointOrchestrator: add single-flight CAS (cycleInProgress) plus monotonic cycle id; closed-check inside applyCycleResult drops late results so a post-close cycle cannot mutate health state (deep-review #1+#2, jeet HIGH-2).

- EndpointOrchestrator: re-evaluate closed/feature-flag/endpoints at subscription time via Mono.defer so GEM.close() cancellation is honored before any HTTP I/O is issued.

- GlobalEndpointManager: retain probe Disposable in AtomicReference; close() now disposes the in-flight probe subscription so probe work cannot outlive the GEM/CosmosClient (Copilot #2, deep-review #2).

- CHANGELOG: moved entry to unreleased 4.82.0-beta.1, reworded to honestly describe optimistic startup, N=2 RED-to-fallback hysteresis, and Direct-mode/metadata exclusion (Copilot #3, deep-review #4).

Tests: 45 unit tests pass (EndpointOrchestratorTests + ConfigsTests + ThinClientProbeWiringTests).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jeet1995

Copy link
Copy Markdown
Member Author

Addressed review feedback in d2ec2f8

Thanks for the deep reviews. Pushed fixes for the convergent blocking issues:

1. Body-drain leak (Copilot #1, deep-review #3)
The detached response.body().subscribe(...) inside .map { ... } is gone. Body drain is now folded into the probe chain via .flatMap { body().doOnNext(release).then(Mono.just(result)).timeout(perProbeTimeout) }, so a slow/hanging body honors the per-probe timeout and cannot leak past the cycle budget.

2. Cancellable in-flight probes on close (Copilot #2, deep-review #2)
GlobalEndpointManager now keeps the probe-cycle subscription in an AtomicReference<Disposable>. close() disposes it and triggerThinClientProbeCycle() swap-and-disposes the prior one. Probe work can no longer outlive CosmosClient.close().

3. Stale / overlapping cycles (deep-review #1, jeet HIGH-2)
EndpointOrchestrator now has an AtomicBoolean cycleInProgress single-flight CAS + monotonic cycleIdSeq. Concurrent topology-refresh triggers no longer accumulate consecutiveFailures faster than one per completed cycle. applyCycleResult also short-circuits when closed is set, so a late-arriving result from an in-flight cycle cannot mutate health state after close. runProbeCycle is wrapped in Mono.defer so the closed / flag / endpoints checks re-evaluate at subscription time, not assembly time.

4. CHANGELOG honesty (Copilot #3, deep-review #4)
Moved the entry into the unreleased 4.82.0-beta.1 section (it was stranded in the now-frozen 4.81.0 block after the upstream merge bumped the version) and reworded to honestly describe: optimistic startup, N=2 consecutive RED cycles → flip to GW v1, single GREEN restores, Direct-mode and metadata/queryplan are excluded by construction.

Tests: 45 unit tests pass (EndpointOrchestratorTests + ConfigsTests + ThinClientProbeWiringTests). Will add explicit single-flight + close-mid-cycle + body-drain-timeout tests in a follow-up.

Deferred (separate PR): the resolved-context NPE risk in LocationCache.hasThinClientReadLocations vs ThinClientStoreModel.getRootUri() (deep-review #5) is a pre-existing latent issue not introduced by this PR; tracking separately so this PR stays scoped to the probe feature.

- LocationCache.getThinClientRegionalEndpoints now walks both read and write region endpoint maps so single-master write-region failures still flip the probe gate.
- EndpointOrchestrator.forceUnhealthy(reason) provides a non-HTTP path to flip the gate; GlobalEndpointManager calls it when topology says thin-client is eligible but no regional endpoint resolves.
- Symmetric hysteresis: new COSMOS.THINCLIENT_PROBE_RECOVERY_THRESHOLD (default 1) so operators can require N consecutive GREEN cycles before flipping back to proxy.
- Extracted RxDocumentClientImpl.useThinClientStoreModel(...) body into package-private static shouldUseThinClientStoreModel for direct unit testability; added ThinClientRoutingGateTests covering 9 routing paths.
- EndpointOrchestratorTests.stubResponse now returns Mono.empty() to avoid Unpooled.EMPTY_BUFFER refCnt underflow across multiple probe calls.
- Removed unused locals; added recoveryThresholdRequiresMultipleGreenCycles, forceUnhealthy_flipsGateToRedWithoutRunningProbe, forceUnhealthy_onClosedOrchestrator_isNoOp tests.

All 57 unit tests in the touched files pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jeet1995

Copy link
Copy Markdown
Member Author

Second batch of fixes pushed in 3f1b1be. Per-comment mapping:

Reviewer Comment Fix
@xinlian12 LocationCache.java:150 — getThinClientRegionalEndpoints only walked read regions, so single-master write-region failures wouldn't flip the probe gate Fix A — collectThinClientEndpoints helper now walks both read+write maps
@xinlian12 GlobalEndpointManager.java:463 — silent return when thin-client eligible but no endpoint resolves Fix B — new EndpointOrchestrator.forceUnhealthy(reason) public method; GEM calls it instead of bailing
@xinlian12 EndpointOrchestrator.java:299 — asymmetric recovery (single GREEN flips back after N REDs) Fix C — new COSMOS.THINCLIENT_PROBE_RECOVERY_THRESHOLD (default 1, symmetric with failure threshold)
@xinlian12 EndpointOrchestratorTests.java:239 — stubResponse used Unpooled.EMPTY_BUFFER singleton causing refCnt underflow across cycles Fix D — body now returns Mono.empty()
@xinlian12 RxDocumentClientImpl.java:9006 — routing-gate logic untestable from outside Fix F — extracted package-private static shouldUseThinClientStoreModel(boolean,boolean,boolean,RxDocumentServiceRequest); added ThinClientRoutingGateTests with 9 tests covering all-true, probe-unhealthy fallback, flag-off, no-read-locations, non-Document, query, batch, AllVersionsAndDeletes CF (→ gateway), incremental CF (→ proxy)
Copilot EndpointOrchestratorTests.java:112 — unused locals Fix E — removed greenByEndpoint/greenOrchestrator/redOnly

Plus three new EndpointOrchestratorTests: recoveryThresholdRequiresMultipleGreenCycles, forceUnhealthy_flipsGateToRedWithoutRunningProbe, forceUnhealthy_onClosedOrchestrator_isNoOp.

Validated: mvn -Punit verify against the four touched test classes — 57/57 pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jeet1995

Copy link
Copy Markdown
Member Author

Third batch addressed in 66fca70 — quick summary:

Comment File Status
3385316064 EndpointOrchestrator.java body-drain Already addressed in 3f1b1be. Current code (lines 258-271) drains via
esponse.body().doOnNext(...).then(Mono.just(result)).timeout(...).doFinally(...).onErrorResume(...) — fully chained into the returned Mono, no dangling .subscribe(). The suggested .flatMap(b -> b.ignoreElement().thenReturn(result)) is functionally equivalent; sticking with the current shape to minimize churn.
3385316067 CHANGELOG wording Fixed in 66fca70. Clarified that the recovery threshold is configurable (COSMOS.THINCLIENT_PROBE_RECOVERY_THRESHOLD, default 1) and pointed out the symmetric-hysteresis tuning knob.
3385316069 GlobalEndpointManager probe Disposable Already addressed in 3f1b1be. Disposable is stored in thinClientProbeDisposable (AtomicReference<Disposable>, line 54), swapped with prior-disposable cleanup on every trigger (line 490), and disposed in close() (line 206) so it cannot outlive the client.

…cleanup, fix gwV2Cto and ThinClient user-agent assertions

- GlobalEndpointManager: convert thin-client probe trigger to a Mono<Void>
  chained into the topology-refresh reactor pipeline (replaces fire-and-forget
  subscribe). Removes thinClientProbeDisposable field and its close() handling
  since cancellation now propagates through the outer subscription.
- EndpointProbeClient/EndpointProbeClientTests/ThinClientProbeWiringTests:
  replace inline FQNs with imports (java.io.Closeable, java.util.List,
  java.net.ConnectException, com.azure.cosmos.implementation.http.HttpHeaders).
- ClientConfigDiagnosticsTest: compute gwV2Cto dynamically from
  Configs.isThinClientEnabled() so assertions remain valid after the default
  flip to true.
- ConfigsTests: update default-threshold assertions from 2 to 1 to match
  DEFAULT_THINCLIENT_PROBE_FAILURE_THRESHOLD=1.
- UserAgentContainerTest.UserAgentIntegration: expect '|F4' suffix because
  the ThinClient feature flag (1 << 2) is now included by default.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jeet1995

Copy link
Copy Markdown
Member Author

Pushed 8a602fbc04b addressing the latest review batch:

Round 4 / earlier feedback (re-confirmed):

  1. ✅ CHANGELOG: concise, focused on default Gateway V2 enablement.
  2. ✅ LocationCache: ALL-or-NOTHING — probe only when every thin-client endpoint resolves; partial maps flip the gate red.
  3. DEFAULT_THINCLIENT_PROBE_FAILURE_THRESHOLD = 1.
  4. ✅ Renamed EndpointOrchestratorEndpointProbeClient; lean DiagnosticsSnapshot (last-state + lastUpdatedAt only).

Round 5 (new):
5. ✅ ClientConfigDiagnosticsTest: assertions for gwV2Cto now derive the value dynamically from Configs.isThinClientEnabled() so they remain valid after the default flip. UserAgentContainerTest.UserAgentIntegration: updated to expect the |F4 suffix because the ThinClient user-agent flag is now included by default.

Round 6 (new):
6. ✅ GlobalEndpointManager: probe is now part of the reactor chain. triggerThinClientProbeCycle() (fire-and-forget .subscribe(...)) replaced with runThinClientProbeCycleMono(): Mono<Void>; the three trigger sites (forceRefresh path, refreshLocationPrivateAsync prefix, and the inner shouldRefreshEndpoints branch) all chain via .flatMap(...)/.then(Mono.defer(...)). thinClientProbeDisposable field and its close() handling removed — cancellation propagates through the outer subscription disposed by backgroundRefreshDisposable.dispose(). runProbeCycle already absorbs per-probe errors and has a per-probe timeout, so chaining is safe.

Round 7 (new):
7. ✅ Removed inline FQNs in favor of imports:

  • EndpointProbeClient: java.io.Closeable, java.util.List
  • EndpointProbeClientTests: java.net.ConnectException
  • ThinClientProbeWiringTests: com.azure.cosmos.implementation.http.HttpHeaders

Verification:

  • mvn -o install -pl :azure-cosmos -am -DskipTests=true → SUCCESS
  • mvn -o verify -Punit -Dit.test=EndpointProbeClientTests,ConfigsTests,ThinClientProbeWiringTests,ThinClientRoutingGateTests,ClientConfigDiagnosticsTest,UserAgentContainerTest → BUILD SUCCESS (all tests green)

@jeet1995

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - tests

@jeet1995

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - spark

@jeet1995

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - kafka

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants