feat: Add client secret expiry tracking and automatic renew for DCR#4194
feat: Add client secret expiry tracking and automatic renew for DCR#4194Sanskarzz wants to merge 9 commits intostacklok:mainfrom
Conversation
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4194 +/- ##
==========================================
- Coverage 68.77% 68.66% -0.11%
==========================================
Files 473 474 +1
Lines 47919 48115 +196
==========================================
+ Hits 32955 33040 +85
- Misses 12299 12342 +43
- Partials 2665 2733 +68 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
294c25f to
a104b56
Compare
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
PR size has been reduced below the XL threshold. Thank you for splitting this up!
|
✅ PR size has been reduced below the XL threshold. The size review has been dismissed and this PR can now proceed with normal review. Thank you for splitting this up! |
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
amirejaz
left a comment
There was a problem hiding this comment.
Review: PR #4194 — Client Secret Expiry and Renewal (RFC 7592)
The overall design is sound and addresses a real DCR lifecycle gap. Found a few spec compliance issues, a double-renewal race, and some convention violations that should be addressed before merge.
Summary of findings
- Must fix: secret accumulation in the secrets store on renewal, double PUT to RFC 7592 endpoint, hardcoded
ClientNameliteral, wrongCallbackPortin renewal request body - Should fix: hand-written mock (use generated one),
slog.Infofor success/diagnostic paths,strings.NewReader(string([]byte))allocation, wrong//nolint:gosecrule number, in-memory state updated beforeSaveState - Nice to have: injectable HTTP client for testability, full response body drain, testify consistency
| } | ||
|
|
||
| // Use the rotated registration_access_token if provided; fall back to existing. | ||
| newRegToken := updateResp.RegistrationAccessToken |
There was a problem hiding this comment.
[Spec / Secret leak] When the server rotates the registration_access_token (newRegToken != ""), persistClientCredentials calls GenerateUniqueSecretNameWithPrefix which mints a new OAUTH_REG_TOKEN_* key. The old key is never deleted, so each renewal accumulates a stale secret in the store.
Same issue applies to OAUTH_CLIENT_SECRET_* keys — every call to persistClientCredentials with a non-empty clientSecret generates a new key.
Fix: before writing the new secret, delete the old key if CachedRegTokenRef / CachedClientSecretRef is already set (or reuse the existing key name rather than generating a fresh one).
| slog.Debug("Using cached DCR client credentials", "client_id", clientID) | ||
|
|
||
| // Proactively renew the client secret if it is expiring soon (RFC 7592) | ||
| if h.isSecretExpiredOrExpiringSoon() { |
There was a problem hiding this comment.
[Spec / Double renewal] tryRestoreFromCachedTokens (below, ~line 270) and resolveClientCredentials (here) both call renewClientSecret independently. When tryRestoreFromCachedTokens is on the call path it calls renewClientSecret first, then calls resolveClientCredentials, which evaluates isSecretExpiredOrExpiringSoon() again — but h.config.CachedSecretExpiry is never updated in memory by persistClientCredentials (only the persisted config is updated). So the second check still fires and a second PUT is sent to the RFC 7592 endpoint.
Fix: after a successful renewal, update h.config.CachedSecretExpiry in memory so the second isSecretExpiredOrExpiringSoon() call returns false.
|
|
||
| // Proactively renew the client secret if it is expiring soon (RFC 7592) | ||
| if h.isSecretExpiredOrExpiringSoon() { | ||
| slog.Info("Cached client secret is expiring soon, attempting renewal", |
There was a problem hiding this comment.
[Convention] slog.Info fires on every token resolution when the secret is expiring. Per .claude/rules/go-style.md "silent success — no output at INFO or above for successful operations". This should be slog.Debug (or slog.Warn only when the secret is already past expiry, not just approaching it).
| slog.Info("Cached client secret is expiring soon, attempting renewal", | |
| slog.Debug("Cached client secret is expiring soon, attempting renewal", |
| // Check if the cached client secret is expired before attempting token refresh. | ||
| // If it has fully expired and renewal also fails we must force a fresh OAuth flow. | ||
| if h.isSecretExpiredOrExpiringSoon() { | ||
| slog.Info("Cached client secret is expiring or expired; attempting renewal before token restore", |
There was a problem hiding this comment.
[Convention] Same slog.Info → slog.Debug/slog.Warn issue as above. Diagnostic messages that fire on every tryRestoreFromCachedTokens call should not be at INFO level.
| slog.Info("Cached client secret is expiring or expired; attempting renewal before token restore", | |
| slog.Debug("Cached client secret is expiring or expired; attempting renewal before token restore", |
| // that were provided during the initial registration. | ||
| updateReq := clientUpdateRequest{ | ||
| ClientID: h.config.CachedClientID, | ||
| ClientName: "ToolHive MCP Client", |
There was a problem hiding this comment.
[Spec compliance] Two problems with this request body:
-
ClientNameis a hardcoded string literal instead ofoauth.ToolHiveMCPClientName. If the constant changes, the value sent during registration and the value sent during renewal will diverge — violating RFC 7592's requirement for consistent metadata. -
h.config.CallbackPortis the configured default port, not the port that was actually registered during DCR (which may have been reassigned bynetworking.FindOrUsePort). If the port differs, strict RFC 7592 servers will reject the request because theredirect_uriwon't match.
Fix: use oauth.ToolHiveMCPClientName for the name, and persist the registered callback port (similar to how CachedRegClientURI is now persisted) so it can be echoed back accurately.
| return fmt.Errorf("failed to persist renewed client secret: %w", err) | ||
| } | ||
|
|
||
| slog.Info("Successfully renewed client secret via RFC 7592", |
There was a problem hiding this comment.
[Convention] slog.Info for a successful operation. Per .claude/rules/go-style.md "silent success" rule, this should be slog.Debug.
| slog.Info("Successfully renewed client secret via RFC 7592", | |
| slog.Debug("Successfully renewed client secret via RFC 7592", |
|
|
||
| // validateRegistrationClientURI validates that the registration_client_uri is | ||
| // a valid HTTPS URL (or localhost for development). | ||
| func validateRegistrationClientURI(registrationClientURI string) error { |
There was a problem hiding this comment.
[Edge case] validateRegistrationClientURI validates scheme and host but permits any path, including /. A registration_client_uri of https://example.com/ passes validation but would send a PUT to the server root — almost certainly a misconfiguration.
RFC 7592 §2.2 implies the URI must be the unique management endpoint for a specific client registration. Add a check:
if parsedURL.Path == "" || parsedURL.Path == "/" {
return fmt.Errorf("registration_client_uri must include a non-root path: %s", registrationClientURI)
}|
|
||
| // mockSecretProvider is a simple in-memory secret store for tests. | ||
| // It implements the full secrets.Provider interface. | ||
| type mockSecretProvider struct { |
There was a problem hiding this comment.
[Convention] Per .claude/rules/testing.md: "Use go.uber.org/mock (gomock) framework — never hand-write mocks."
A generated mock for secrets.Provider already exists at pkg/secrets/mocks/mock_provider.go (run task gen to regenerate). Please replace this hand-written mockSecretProvider with the generated mock to stay consistent with project conventions and avoid silent interface drift if secrets.Provider gains new methods.
|
|
||
| err := h.renewClientSecret(context.Background()) | ||
| if err == nil { | ||
| t.Fatal("expected error, got nil") |
There was a problem hiding this comment.
[Convention] This test uses t.Fatal / t.Errorf while the rest of the test file uses testify require / assert. Per .claude/rules/testing.md, prefer require.NoError / assert.Contains throughout for consistency.
| t.Fatal("expected error, got nil") | |
| require.Error(t, err) |
| regAccessToken, regClientURI string, | ||
| tokenEndpointAuthMethod string, | ||
| ) error { | ||
| r.Config.RemoteAuthConfig.CachedClientID = clientID |
There was a problem hiding this comment.
[Convention / Correctness] r.Config.RemoteAuthConfig.CachedClientID is updated in memory before the subsequent StoreSecretInManagerWithProvider and SaveState calls. If either of those fails, the in-memory config reflects the new clientID but the durable state is inconsistent with it.
Per .claude/rules/go-style.md: write to durable storage before updating in-memory state. Move all r.Config.RemoteAuthConfig.* assignments to after SaveState returns successfully.
Summary
Implements client secret expiry tracking and automatic renewal for Dynamic Client Registration (DCR)
Fixes: #3631
ToolHive already stored the
client_idandclient_secretfrom DCR responses, but discarded theregistration_access_token,registration_client_uri, andclient_secret_expires_atfields. Without these, there was no way to detect or renew an expired client secret, causing silent authentication failures for long-running workloads on providers like Keycloak that issue expiring secrets.This PR implements the full RFC 7591 / RFC 7592 secret lifecycle in three phases:
Note
All the phases are added in the same PR.
Changes
pkg/auth/remote/persisting_token_source.goExtended
ClientCredentialsPersisterfunction type signature:pkg/auth/remote/config.goCachedRegClientURIfield to storeregistration_client_urias plain textCachedTokenEndpointAuthMethodfield to store the authentication method used during DCRClearCachedClientCredentials()to also clear the new fieldspkg/auth/discovery/discovery.goOAuthFlowConfig(used as the threading vehicle fromhandleDynamicRegistrationthrough to the result)OAuthFlowResultwithSecretExpiry,RegistrationAccessToken,RegistrationClientURI, andTokenEndpointAuthMethodhandleDynamicRegistration()to capture all four fromDynamicClientRegistrationResponse:ClientSecretExpiresAt > 0→time.Unix(...)(zero if the field is0, meaning never expires)RegistrationAccessTokenandRegistrationClientURIcopied as-isTokenEndpointAuthMethodcaptured from response (defaults toclient_secret_postin request)newOAuthFlow()to populate the new fields inOAuthFlowResultpkg/auth/remote/handler.gowrapWithPersistence()to pass all 6 arguments toclientCredentialsPersister"time"importresolveClientCredentials()now proactively callsrenewClientSecret()when the secret is expiring within 24 h; renewal failures are soft-logged and execution continuestryRestoreFromCachedTokens()now checks expiry before attempting token refresh:pkg/runner/runner.goUpdated
SetClientCredentialsPersistercallback to match the new 6-argument signature and persist:CachedSecretExpiry— stored directly in configregistrationAccessToken— stored securely in the secret manager, reference saved toCachedRegTokenRefregistrationClientURI— stored as plain text inCachedRegClientURItokenEndpointAuthMethod— stored directly in config asCachedTokenEndpointAuthMethodpkg/auth/remote/secret_renewal.go(new file)Implements RFC 7592 §2.2 client secret renewal:
isSecretExpiredOrExpiringSoon()— returns true when the secret is withinsecretExpiryBuffer(24 h) of expiry or already past it; false for zero expiry (never expires)renewClientSecret(ctx)— sends an authenticated HTTPPUTtoregistration_client_uriper RFC 7592 §2.2:registrationAccessTokenfrom the secret managerregistration_client_uri(must be HTTPS or localhost)client_secret,client_secret_expires_at, and optionally a rotatedregistration_access_tokenclientCredentialsPersistervalidateRegistrationClientURI(uri)— validates the URI is HTTPS (or localhost for development)pkg/auth/remote/secret_renewal_test.go(new file)16 new tests covering:
TestIsSecretExpiredOrExpiringSoon(5 cases)TestValidateRegistrationClientURI(6 cases)TestRenewClientSecret_MissingConfig(3 cases)TestRenewClientSecret_SuccessTestRenewClientSecret_ServerErrorTestRenewClientSecret_NoPersisterTestRenewClientSecret_ZeroExpiryInResponseclient_secret_expires_at: 0→time.Time{}(never expires)Breaking Changes
Warning
ClientCredentialsPersisterfunction type signature changed.Any code outside this repository that implements or calls
ClientCredentialsPersistermust be updated to use the new 6-parameter signature. Within this repository, the only call site ispkg/runner/runner.go, which is updated in this PR.Backward Compatibility
client_secret_expires_at = 0)CachedSecretExpiryistime.Time{}.isSecretExpiredOrExpiringSoon()returnsfalse. No renewal is attempted.registration_access_token/ URI)renewClientSecret()returns an immediate error. Caller logs a warning and continues with the existing secret.RFC References
client_secret_expires_at,registration_access_token,registration_client_uriin registration responseTest Results
Type of change
Test plan
I have done E2E testing with Keycloak.