feat: Async Refresh for Regional Access Boundaries by vverman · Pull Request #1880 · googleapis/google-auth-library-java

vverman · 2026-01-25T21:35:27Z

Contains changes for the Regional Access Boundary (RAB) feature, previously called Trust Boundaries.

The following are salient changes:

Calls to refresh RAB are now all async and in a separate thread.
Logic for refreshing RAB now exists in its own class for cleaner maintenance.
Self-signed JWTs are now within RAB scope.
Changes to how we trigger RAB refresh and deal with refresh errors.

oauth2_http/java/com/google/auth/http/HttpCredentialsAdapter.java

oauth2_http/java/com/google/auth/oauth2/GoogleCredentials.java

oauth2_http/java/com/google/auth/oauth2/RABManager.java

oauth2_http/java/com/google/auth/oauth2/RegionalAccessBoundaryManager.java

oauth2_http/java/com/google/auth/oauth2/RegionalAccessBoundary.java

oauth2_http/java/com/google/auth/oauth2/ImpersonatedCredentials.java

oauth2_http/java/com/google/auth/oauth2/OAuth2Utils.java

oauth2_http/java/com/google/auth/oauth2/RegionalAccessBoundaryProvider.java

nbayati · 2026-03-12T20:40:27Z

Potential minor issue worth checking out:
The use of private static mutable clock fields in RegionalAccessBoundary.java and RegionalAccessBoundaryManager.java is problematic for two main reasons:

Test Isolation: Since the clock is static, it is shared across all instances in the JVM. If one test mocks the clock using setClockForTest, it will inadvertently affect all other tests running in parallel or sequentially. This makes the test suite brittle and prone to non-deterministic failures.
Architectural Inconsistency: The rest of the library (such as OAuth2Credentials) uses instance-level clocks. Making these new classes use static clocks deviates from the established pattern and limits production flexibility (e.g., if a user wanted to customize the clock for a specific set of credentials).
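
To illustrate the instance-level pattern suggested above, here is a minimal sketch. `CachedBoundary` and its constructor are hypothetical names and not the PR's actual classes, and a `LongSupplier` stands in for the library's clock type. The point is only that each object carries its own time source, so a fake clock in one test cannot leak into another.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for a cached boundary entry; not the PR's real class.
final class CachedBoundary {
  // Per-instance time source in milliseconds. Nothing here is static,
  // so replacing the clock for one instance never affects another.
  private final LongSupplier clock;
  private final long expiresAtMillis;

  CachedBoundary(LongSupplier clock, long ttlMillis) {
    this.clock = clock;
    this.expiresAtMillis = clock.getAsLong() + ttlMillis;
  }

  // True once the instance's own clock reaches the expiry time.
  boolean isExpired() {
    return clock.getAsLong() >= expiresAtMillis;
  }
}
```

A test can pass `someAtomicLong::get` as the clock and advance it at will, while production code would pass something like `System::currentTimeMillis`.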

lqiu96

Thanks for creating the follow-up testing issue. The parameter is not ideal, but given our time commitments, I am fine with it.

Took another look and I think this looks good. Could someone from AION take another pass to double check this?

nbayati · 2026-03-18T19:15:51Z

Discussed with Pranav offline. To avoid adding the disableRabRefreshForTest flag and to keep the test suite stable, we've decided to keep GOOGLE_AUTH_TRUST_BOUNDARY_ENABLE_EXPERIMENT in this PR. We'll track the removal of that env var and the corresponding test refactor in issue #1898 to keep this PR focused.

lsirac · 2026-03-19T02:05:29Z

Blocking issues:

getRequestMetadata(URI) is not fail-open and can crash on RAB lookup. refreshRegionalAccessBoundaryIfExpired(...) is called with no try/catch block.
Scheduling failures permanently block RAB refreshes. In RegionalAccessBoundaryManager.java, you acquire the lock through refreshFuture.compareAndSet(null, future) before calling executor.execute() or Thread.start(). That lock is only released inside the finally block of the refreshTask itself. If executor.execute(...) throws a RejectedExecutionException (or thread creation fails), the task never runs, the finally block is never reached, and RAB will never refresh again for that credential.
Stale x-allowed-locations headers can survive serialization round-trips: Serialization resets the live RAB manager but preserves the old x-allowed-locations inside cached OAuth request metadata. After deserialization, we start from the serialized metadata, see no current live RAB to overwrite it with, and we keep sending the old header until access token refresh. It's safer to send no header at all vs. a stale header.
Static mutable test hooks break test isolation: clock and maxRetryElapsedTimeMillis are declared as private static in RegionalAccessBoundary.java and RegionalAccessBoundaryManager.java and mutated via @VisibleForTesting setters. This will cause race conditions when test suites run in parallel. Use the established library pattern where Clock is an instance-level transient field (e.g. like in OAuth2Credentials).
There is inconsistent host scoping: The PR skips RAB injection only for .rep.googleapis.com endpoints. If a RAB is already cached, addRegionalAccessBoundaryToRequestMetadata attaches the header to all hosts.
Impersonated credentials ignore iamEndpointOverride on the RAB path. Is this intentional?
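
For the scheduling-failure item above, one possible shape of the fix is sketched below. `RefreshScheduler`, `tryScheduleRefresh`, and `isRefreshInFlight` are hypothetical names, not the PR's code. The sketch assumes the "lock" is an `AtomicReference` slot claimed with `compareAndSet(null, future)` and shows it being released when `executor.execute(...)` throws before the task ever starts.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; names are illustrative only.
final class RefreshScheduler {
  private final AtomicReference<CompletableFuture<Void>> refreshFuture =
      new AtomicReference<>();

  // Returns true if a refresh was handed to the executor, false if one is
  // already in flight or if scheduling itself failed.
  boolean tryScheduleRefresh(Executor executor, Runnable lookup) {
    CompletableFuture<Void> future = new CompletableFuture<>();
    if (!refreshFuture.compareAndSet(null, future)) {
      return false; // another refresh already holds the slot
    }
    Runnable task = () -> {
      try {
        lookup.run();
        future.complete(null);
      } catch (RuntimeException e) {
        future.completeExceptionally(e);
      } finally {
        refreshFuture.compareAndSet(future, null); // normal release path
      }
    };
    try {
      executor.execute(task);
      return true;
    } catch (RuntimeException e) {
      // Covers RejectedExecutionException: the task never ran, so its
      // finally block will never release the slot. Release it here.
      refreshFuture.compareAndSet(future, null);
      future.completeExceptionally(e);
      return false;
    }
  }

  boolean isRefreshInFlight() {
    return refreshFuture.get() != null;
  }
}
```

With this shape, a rejected submission leaves the slot empty, so a later call can try again instead of being blocked forever.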

RAB lookup should be best-effort and non-blocking, and it should not accidentally become synchronous depending on the executor the caller passes in. Let's also make sure we cover all of these with tests.
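
A rough sketch of what "best-effort" could mean on the metadata path follows. `FailOpenMetadata` and `withBoundary` are hypothetical helpers; only the `x-allowed-locations` header name comes from this thread. Any failure while reading the cached boundary is swallowed, and a header already present in the base metadata is dropped rather than forwarded, on the assumption that no header is safer than a stale one.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper; not part of the library.
final class FailOpenMetadata {
  static final String HEADER = "x-allowed-locations";

  static Map<String, List<String>> withBoundary(
      Map<String, List<String>> base, Supplier<String> cachedBoundary) {
    Map<String, List<String>> result = new HashMap<>(base);
    result.remove(HEADER); // never forward a value of unknown freshness
    try {
      String value = cachedBoundary.get();
      if (value != null && !value.isEmpty()) {
        result.put(HEADER, Collections.singletonList(value));
      }
    } catch (RuntimeException e) {
      // Fail open: the request proceeds without the header.
    }
    return Collections.unmodifiableMap(result);
  }
}
```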

…thout a try catch block. 2. Lock acquiral for refreshFuture.compareAndSet(null, future) now fixed. 3. Oauth2Credentials isn't caching RAB which was earlier leading to serialization issues.

vverman · 2026-03-20T06:03:22Z

Thanks for the catches, addressed first 4 of Leo's comments with unit testing.

Regarding the rest of the 3 points

The RAB header being sent to regional endpoints shouldn't be an issue as per the group's discussion. IIUC, this isn't a blocking issue.
I believe it is an open question whether an IAM-overridden RAB endpoint is even possible. Once that is answered, I can implement accordingly.
The Google Credentials already has a synchronous requestMetadata which doesn't accept an executor. The async getRequestMetadata is the one which accepts a user provided executor which is a pool used to execute requests asynchronously. Here I feel we should respect the user's decision and use that pool for our async RAB refresh as well.

vverman · 2026-03-20T06:33:42Z

Env vars are back in. @nbayati, would appreciate a look!

…m upon deserialization.

nbayati · 2026-03-24T22:29:11Z

Thanks for the catches, addressed first 4 of Leo's comments with unit testing.

Regarding the rest of the 3 points

The RAB header being sent to regional endpoints shouldn't be an issue as per the group's discussion. IIUC, this isn't a blocking issue.

IAM & STS endpoints are excluded from RAB scope.

The Google Credentials already has a synchronous requestMetadata which doesn't accept an executor. The async getRequestMetadata is the one which accepts a user provided executor which is a pool used to execute requests asynchronously. Here I feel we should respect the user's decision and use that pool for our async RAB refresh as well.

Regarding 1, it seems you've fixed this in the "Using per-instance clocks." commit, so I think we're covered here.

Regarding 3, I see the point you are making about respecting the user’s choice for their own async request. However, I want to share a different perspective on side-effects and resource isolation. The RAB lookup is an internal library feature that happens under the hood. The user did not explicitly ask for it when they requested a token. If the user passes an Executor (per the async path), they are expecting that executor to handle their async request. They do not expect the library to piggyback on it for hidden I/O operations (like the RAB lookup network call).

If a user happens to pass a synchronous executor, their main API request will block and wait for a hidden RAB network trip that they didn't even ask for. I think since the RAB lookups are transparent to the users and are essentially happening behind the scenes, their resource consumption should be transparent too. If we use the unmanaged new Thread() fallback (or create a dedicated internal static pool), it ensures that the library never accidentally blocks a user's threads for internal calls.
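
The blocking concern can be seen with a small probe. `ExecutorThreadProbe` is a hypothetical helper written for this illustration. It reports which thread an `Executor` actually uses: a direct executor such as `Runnable::run` runs the task on the calling thread, so any network call inside that task would block the caller.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical probe; illustrates executor behavior only.
final class ExecutorThreadProbe {
  // Returns the name of the thread on which the executor ran the task.
  static String threadUsedBy(Executor executor) {
    AtomicReference<String> name = new AtomicReference<>();
    CountDownLatch done = new CountDownLatch(1);
    executor.execute(() -> {
      name.set(Thread.currentThread().getName());
      done.countDown();
    });
    try {
      done.await(); // returns immediately if the task already ran inline
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for the task", e);
    }
    return name.get();
  }
}
```

Calling `threadUsedBy(Runnable::run)` returns the caller's own thread name, whereas `threadUsedBy(r -> new Thread(r, "rab-probe").start())` returns `rab-probe`.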

vverman · 2026-03-25T05:19:33Z

If the user passes an Executor (per the async path), they are expecting that executor to handle their async request. They do not expect the library to piggyback on it for hidden I/O operations (like the RAB lookup network call).

IIUC: The executor is a thread pool that the user passes to the auth library to do background work. The expectation is that the lib won't spin up its own threads (which is an expensive operation). Using the caller's executor ensures that the library operates inside the sandbox the user defined for us.

If a user happens to pass a synchronous executor, their main API request will block and wait for a hidden RAB network trip that they didn't even ask for. I think since the RAB lookups are transparent to the users and are essentially happening behind the scenes, their resource consumption should be transparent too. If we use the unmanaged new Thread() fallback (or create a dedicated internal static pool), it ensures that the library never accidentally blocks a user's threads for internal calls.

IIUC: the getRequestMetadata overload in question accepts a RequestMetadataCallback, whose Javadoc says:

* The callback that receives the result of the asynchronous {@link
 * Credentials#getRequestMetadata(java.net.URI, java.util.concurrent.Executor,
 * RequestMetadataCallback)}. Exactly one method should be called.
 *

Which means it is intended to be used as an async method. While the user could pass in a synchronous executor, I believe we shouldn't consider that the expected flow.

TBF, I was previously doing it the way you are suggesting, but @lqiu96 and I had a discussion about user choice with directExecutors and ended up changing it.

vverman · 2026-03-27T21:03:38Z

The executor is no longer used for async RAB refresh; we now start a new Thread instead.

lqiu96 · 2026-03-31T15:33:26Z

oauth2_http/java/com/google/auth/oauth2/RegionalAccessBoundaryManager.java

+    if (cooldownState.compareAndSet(currentCooldownState, next)) {
+      LoggingUtils.log(
+          LOGGER_PROVIDER,
+          Level.INFO,


Thoughts on this: If we want RAB to be transparent to the user, perhaps we should aim for a lower log level?

Perhaps either Debug or Warn? Info may be confusing as users may not have any idea what RAB is.

Changed it to fine so it is still available when the users want to debug.
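
As a quick illustration of the effect, assuming the library's logging sits on `java.util.logging` as the `Level.INFO` in the diff suggests: a logger configured at `INFO` does not emit `FINE` records, but one configured at `FINE` does. `LogLevelDemo` and the logger name are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demo; the logger name is made up for illustration.
final class LogLevelDemo {
  // True if a record at `message` level passes a logger set to `configured`.
  static boolean wouldEmit(Level configured, Level message) {
    Logger logger = Logger.getLogger("rab.loglevel.demo");
    logger.setLevel(configured);
    return logger.isLoggable(message);
  }
}
```

Here `wouldEmit(Level.INFO, Level.FINE)` is false and `wouldEmit(Level.FINE, Level.FINE)` is true. Note that handlers apply their own level as well, so a user who wants the messages on the console would also need to lower the handler's level.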

oauth2_http/java/com/google/auth/oauth2/RegionalAccessBoundaryManager.java

oauth2_http/java/com/google/auth/oauth2/RegionalAccessBoundary.java

lsirac · 2026-04-01T03:17:28Z

Some more things:

We're spawning raw threads via new Thread(). Across a bunch of cred instances this is unbounded. It would be better to have a private executor / pool.
You made clock and maxRetryElapsedTimeMillis instance fields, but environmentProvider in RegionalAccessBoundary.java remains a static mutable field mutated via a @VisibleForTesting setter.
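
One way the private pool suggestion could look is sketched below. `BoundaryRefreshExecutor`, the thread-name prefix, and the pool and queue sizes are all assumptions chosen for illustration, not values from the PR. The pool is shared across credential instances, capped in size, and uses daemon threads so it never keeps the JVM alive.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder for a shared, bounded refresh pool.
final class BoundaryRefreshExecutor {
  private static final int MAX_THREADS = 2;     // assumed cap
  private static final int QUEUE_CAPACITY = 64; // assumed cap

  private static final ExecutorService SHARED = create();

  private static ExecutorService create() {
    AtomicInteger counter = new AtomicInteger();
    ThreadFactory daemonFactory = r -> {
      Thread t = new Thread(r, "rab-refresh-" + counter.incrementAndGet());
      t.setDaemon(true); // background work must not block JVM shutdown
      return t;
    };
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        MAX_THREADS, MAX_THREADS, 30L, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>(QUEUE_CAPACITY), daemonFactory,
        new ThreadPoolExecutor.AbortPolicy()); // rejects when the queue is full
    pool.allowCoreThreadTimeOut(true); // idle threads exit after 30 seconds
    return pool;
  }

  static ExecutorService get() {
    return SHARED;
  }
}
```

Because `AbortPolicy` throws `RejectedExecutionException` once the queue is full, callers would still need to handle rejection at submission time.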

Added async logic for RAB refresh/ Now self-signed JWT are in RAB sco…

1b160a8

…pe. Updated tests.

vverman requested review from a team January 25, 2026 21:35

product-auto-label bot added the size: xl Pull request size is extra large. label Jan 25, 2026

Lint fixes.

525ae6d

vverman requested review from lqiu96 and nbayati January 25, 2026 21:49

Url for RAB to include only GDU. Only 500, 502, 503 and 504 lookup er…

b7058ea

…rors are now retryable.
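
The retry rule named in this commit message can be expressed as a small predicate. `RetryableStatus` is a hypothetical helper, not the PR's code; only the four status codes come from the commit message.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper encoding the status codes listed in the commit message.
final class RetryableStatus {
  private static final Set<Integer> RETRYABLE =
      Collections.unmodifiableSet(new HashSet<>(Arrays.asList(500, 502, 503, 504)));

  // True only for the four server-side codes treated as retryable.
  static boolean isRetryable(int statusCode) {
    return RETRYABLE.contains(statusCode);
  }
}
```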

lqiu96 changed the base branch from feat-tb-sa to feat/agentic-identities-cloudrun February 3, 2026 19:16

lqiu96 requested a review from a team as a code owner February 3, 2026 19:16