feat: Add client secret expiry tracking and automatic renew for DCR #4194

Open
Sanskarzz wants to merge 9 commits into stacklok:main from Sanskarzz:autosecretDCR

@Sanskarzz Sanskarzz commented Mar 17, 2026

Summary

Implements client secret expiry tracking and automatic renewal for Dynamic Client Registration (DCR).

Fixes: #3631

ToolHive already stored the client_id and client_secret from DCR responses, but discarded the registration_access_token, registration_client_uri, and client_secret_expires_at fields. Without these, there was no way to detect or renew an expired client secret, causing silent authentication failures for long-running workloads on providers like Keycloak that issue expiring secrets.

This PR implements the full RFC 7591 / RFC 7592 secret lifecycle in three phases:

  1. Store all DCR response metadata through the persistence pipeline
  2. Detect expiry proactively (24 h buffer) and on session restore
  3. Renew the secret via RFC 7592 §2.2 before it expires

Note: All three phases are implemented in this PR.


Changes

pkg/auth/remote/persisting_token_source.go

Extended ClientCredentialsPersister function type signature:

// Before
type ClientCredentialsPersister func(clientID, clientSecret string) error

// After
type ClientCredentialsPersister func(
    clientID string,
    clientSecret string,
    secretExpiry time.Time,           // zero = never expires (RFC 7591 §3.2.1)
    registrationAccessToken string,   // for RFC 7592 management operations
    registrationClientURI string,     // endpoint for RFC 7592 updates
    tokenEndpointAuthMethod string,   // "client_secret_post" or "client_secret_basic" (RFC 7591 §2)
) error

pkg/auth/remote/config.go

  • Added CachedRegClientURI field to store registration_client_uri as plain text
  • Added CachedTokenEndpointAuthMethod field to store the authentication method used during DCR
  • Updated ClearCachedClientCredentials() to also clear the new fields

pkg/auth/discovery/discovery.go

  • Added DCR renewal metadata fields to OAuthFlowConfig (used as the threading vehicle from handleDynamicRegistration through to the result)
  • Extended OAuthFlowResult with SecretExpiry, RegistrationAccessToken, RegistrationClientURI, and TokenEndpointAuthMethod
  • Updated handleDynamicRegistration() to capture all four from DynamicClientRegistrationResponse:
    • ClientSecretExpiresAt > 0 → time.Unix(...) (zero if the field is 0, meaning never expires)
    • RegistrationAccessToken and RegistrationClientURI copied as-is
    • TokenEndpointAuthMethod captured from response (defaults to client_secret_post in request)
  • Updated newOAuthFlow() to populate the new fields in OAuthFlowResult
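
The expiry mapping described for handleDynamicRegistration() can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: expiryFromUnix is a hypothetical helper name, and treating negative values the same as 0 is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// expiryFromUnix (hypothetical name) converts the client_secret_expires_at
// value from a DCR response, given in seconds since the Unix epoch, into a
// time.Time. RFC 7591 §3.2.1 defines 0 as "never expires", which is
// represented here by the zero time.Time. Treating negative values the same
// way is an assumption made for this sketch.
func expiryFromUnix(expiresAt int64) time.Time {
	if expiresAt <= 0 {
		return time.Time{}
	}
	return time.Unix(expiresAt, 0).UTC()
}

func main() {
	fmt.Println(expiryFromUnix(0).IsZero())                      // true
	fmt.Println(expiryFromUnix(1704067200).Format(time.RFC3339)) // 2024-01-01T00:00:00Z
}
```

Keeping the "never expires" case as the zero time.Time means callers can test it with IsZero() instead of comparing against a sentinel integer.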

pkg/auth/remote/handler.go

  • Updated wrapWithPersistence() to pass all 6 arguments to clientCredentialsPersister
  • Added "time" import
  • resolveClientCredentials() now proactively calls renewClientSecret() when the secret is expiring within 24 h; renewal failures are soft-logged and execution continues
  • tryRestoreFromCachedTokens() now checks expiry before attempting token refresh:
    • Within 24 h buffer + renewal fails → warning, continue with existing secret
    • Past expiry + renewal fails → hard error, forces a fresh OAuth flow
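
The soft-versus-hard failure policy described for tryRestoreFromCachedTokens() might look something like the sketch below. All names here (ensureUsableSecret, demoNow, errRenew) are hypothetical, the clock is passed in to keep the example deterministic, and treating the exact 24 h boundary as "expiring soon" is an assumption rather than something the description states.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const secretExpiryBuffer = 24 * time.Hour

var (
	neverExpires time.Time // zero value: client_secret_expires_at was 0
	demoNow      = time.Date(2026, 3, 22, 0, 0, 0, 0, time.UTC)
	errRenew     = errors.New("renewal endpoint unavailable")
)

// expiringSoon reports whether expiry is within the buffer or already past.
// A zero expiry means the secret never expires.
func expiringSoon(expiry, now time.Time) bool {
	if expiry.IsZero() {
		return false
	}
	return !now.Before(expiry.Add(-secretExpiryBuffer))
}

// ensureUsableSecret attempts a renewal when needed. A failed renewal is a
// warning while the secret is still valid and an error once it has expired.
func ensureUsableSecret(expiry, now time.Time, renew func() error) (string, error) {
	if !expiringSoon(expiry, now) {
		return "", nil
	}
	renewErr := renew()
	if renewErr == nil {
		return "", nil
	}
	if !now.Before(expiry) {
		return "", fmt.Errorf("client secret expired and renewal failed: %w", renewErr)
	}
	return "renewal failed, continuing with existing secret: " + renewErr.Error(), nil
}

func main() {
	failing := func() error { return errRenew }
	cases := []struct {
		name   string
		expiry time.Time
	}{
		{"never expires", neverExpires},
		{"far future", demoNow.Add(10 * secretExpiryBuffer)},
		{"within buffer", demoNow.Add(secretExpiryBuffer / 2)},
		{"past expiry", demoNow.Add(-secretExpiryBuffer)},
	}
	for _, c := range cases {
		warning, err := ensureUsableSecret(c.expiry, demoNow, failing)
		fmt.Printf("%-14s warning=%q err=%v\n", c.name, warning, err)
	}
}
```

Only the last case returns an error, which is the signal for the caller to fall back to a fresh OAuth flow.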

pkg/runner/runner.go

Updated SetClientCredentialsPersister callback to match the new 6-argument signature and persist:

  • CachedSecretExpiry — stored directly in config
  • registrationAccessToken — stored securely in the secret manager, reference saved to CachedRegTokenRef
  • registrationClientURI — stored as plain text in CachedRegClientURI
  • tokenEndpointAuthMethod — stored directly in config as CachedTokenEndpointAuthMethod

pkg/auth/remote/secret_renewal.go (new file)

Implements RFC 7592 §2.2 client secret renewal:

  • isSecretExpiredOrExpiringSoon() — returns true when the secret is within secretExpiryBuffer (24 h) of expiry or already past it; false for zero expiry (never expires)
  • renewClientSecret(ctx) — sends an authenticated HTTP PUT to registration_client_uri per RFC 7592 §2.2:
    • Retrieves registrationAccessToken from the secret manager
    • Validates registration_client_uri (must be HTTPS or localhost)
    • Sends current client metadata as the request body (required by RFC 7592)
    • Parses the client information response (RFC 7592 §3), extracts new client_secret, client_secret_expires_at, and optionally a rotated registration_access_token
    • Persists everything via clientCredentialsPersister
  • validateRegistrationClientURI(uri) — validates the URI is HTTPS (or localhost for development)
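
For readers unfamiliar with RFC 7592, the update request outlined above can be sketched as below. This is an illustrative approximation, not the contents of secret_renewal.go: the struct types, the renew and demoRenewal functions, the test server, and the token values are all hypothetical, and only a subset of the client metadata is shown.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// clientMetadata is a hypothetical subset of the metadata echoed in the PUT body.
type clientMetadata struct {
	ClientID                string   `json:"client_id"`
	ClientName              string   `json:"client_name,omitempty"`
	RedirectURIs            []string `json:"redirect_uris,omitempty"`
	TokenEndpointAuthMethod string   `json:"token_endpoint_auth_method,omitempty"`
}

// clientInfo is a hypothetical subset of the client information response.
type clientInfo struct {
	ClientID                string `json:"client_id"`
	ClientSecret            string `json:"client_secret,omitempty"`
	ClientSecretExpiresAt   int64  `json:"client_secret_expires_at"`
	RegistrationAccessToken string `json:"registration_access_token,omitempty"`
}

// renew sends an authenticated PUT to the registration client URI and decodes
// the response. The registration access token is used as a Bearer token.
func renew(ctx context.Context, hc *http.Client, uri, regToken string, meta clientMetadata) (*clientInfo, error) {
	body, err := json.Marshal(meta)
	if err != nil {
		return nil, fmt.Errorf("encode client metadata: %w", err)
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPut, uri, bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Authorization", "Bearer "+regToken)
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Accept", "application/json")

	resp, err := hc.Do(req)
	if err != nil {
		return nil, fmt.Errorf("send request: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		_, _ = io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
		return nil, fmt.Errorf("renewal rejected: HTTP %d", resp.StatusCode)
	}
	var info clientInfo
	if err := json.NewDecoder(io.LimitReader(resp.Body, 1<<20)).Decode(&info); err != nil {
		return nil, fmt.Errorf("decode response: %w", err)
	}
	return &info, nil
}

// demoRenewal runs renew against a local stand-in for an RFC 7592 server.
func demoRenewal() (*clientInfo, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPut || r.Header.Get("Authorization") != "Bearer old-reg-token" {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(clientInfo{
			ClientID: "abc", ClientSecret: "new-secret", RegistrationAccessToken: "new-reg-token",
		})
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	meta := clientMetadata{ClientID: "abc", TokenEndpointAuthMethod: "client_secret_post"}
	return renew(ctx, srv.Client(), srv.URL+"/register/abc", "old-reg-token", meta)
}

func main() {
	info, err := demoRenewal()
	if err != nil {
		fmt.Println("renewal failed:", err)
		return
	}
	fmt.Println(info.ClientSecret, info.ClientSecretExpiresAt == 0, info.RegistrationAccessToken != "")
}
```

Passing the *http.Client in as a parameter is a design choice of this sketch; it lets a test point the request at a local server without touching global state.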

pkg/auth/remote/secret_renewal_test.go (new file)

16 new tests covering:

| Test | Coverage |
| --- | --- |
| TestIsSecretExpiredOrExpiringSoon (5 cases) | Zero, far-future, within-buffer, past-expiry, at-boundary |
| TestValidateRegistrationClientURI (6 cases) | Empty, HTTPS, HTTP non-localhost rejected, localhost/127.0.0.1 allowed, invalid URL |
| TestRenewClientSecret_MissingConfig (3 cases) | No URI, no token ref, no secret provider |
| TestRenewClientSecret_Success | Happy path with mock RFC 7592 server; token rotation |
| TestRenewClientSecret_ServerError | HTTP 401 from server |
| TestRenewClientSecret_NoPersister | Server OK but no persister configured |
| TestRenewClientSecret_ZeroExpiryInResponse | client_secret_expires_at: 0 → time.Time{} (never expires) |

Breaking Changes

Warning: the ClientCredentialsPersister function type signature changed.

Any code outside this repository that implements or calls ClientCredentialsPersister must be updated to use the new 6-parameter signature. Within this repository, the only call site is pkg/runner/runner.go, which is updated in this PR.


Backward Compatibility

| Scenario | Behaviour |
| --- | --- |
| Provider does not issue expiring secrets (client_secret_expires_at = 0) | CachedSecretExpiry is time.Time{}. isSecretExpiredOrExpiringSoon() returns false. No renewal is attempted. |
| Provider does not support RFC 7592 (no registration_access_token / URI) | renewClientSecret() returns an immediate error. Caller logs a warning and continues with the existing secret. |
| Provider does issue expiring secrets and supports RFC 7592 | Secret renewed automatically within 24 h of expiry. |
| Renewal fails but secret is still within its validity window | Soft warning logged, existing secret used. |
| Renewal fails and secret is past its expiry | Hard error returned; caller performs a fresh OAuth flow. |

RFC References

  • RFC 7591 §3.2.1: client_secret_expires_at, registration_access_token, registration_client_uri in registration response
  • RFC 7592 §2.2: Client Update Request (PUT with Bearer token)
  • RFC 7592 §3: Client Information Response (response format for read/update)

Test Results

ok  github.com/stacklok/toolhive/pkg/auth               PASS
ok  github.com/stacklok/toolhive/pkg/auth/awssts        PASS
ok  github.com/stacklok/toolhive/pkg/auth/discovery     PASS
ok  github.com/stacklok/toolhive/pkg/auth/oauth         PASS
ok  github.com/stacklok/toolhive/pkg/auth/remote        PASS  (16 new tests)
ok  github.com/stacklok/toolhive/pkg/auth/secrets       PASS
ok  github.com/stacklok/toolhive/pkg/auth/tokenexchange PASS
ok  github.com/stacklok/toolhive/pkg/auth/upstreamswap  PASS
ok  github.com/stacklok/toolhive/pkg/auth/upstreamtoken PASS

Type of change

  • Bug fix
  • [*] New feature
  • Refactoring (no behavior change)
  • Dependency update
  • Documentation
  • Other (describe):

Test plan

I have done E2E testing with Keycloak.

sanskar@sanskar-HP-Laptop-15s-du1xxx:~/opensource/stacklok/toolhive$ ./bin/thv logs keycloak-e2e-detached --proxy | grep -E "renew|RFC 7592|expired"
{"time":"2026-03-22T03:19:13+05:30","level":"DEBUG","msg":"DCR response includes registration access token for RFC 7592 operations"}
{"time":"2026-03-22T03:20:32+05:30","level":"DEBUG","msg":"Stored DCR registration access token for RFC 7592 operations"}
{"time":"2026-03-22T03:24:40+05:30","level":"DEBUG","msg":"DCR response includes registration access token for RFC 7592 operations"}
sanskar@sanskar-HP-Laptop-15s-du1xxx:~/opensource/stacklok/toolhive$ 
{"time":"2026-03-22T03:46:24+05:30","level":"INFO","msg":"Cached client secret is expiring or expired; attempting renewal before token restore","expiry":"2024-01-01T00:00:00Z"}
{"time":"2026-03-22T03:46:24+05:30","level":"DEBUG","msg":"Attempting RFC 7592 client secret renewal","registration_client_uri":"http://localhost:8080/realms/mcp-test/clients-registrations/openid-connect/8327e637-eef9-4f2e-93fb-768e5ed6b4a1"}
{"time":"2026-03-22T03:46:24+05:30","level":"INFO","msg":"Successfully renewed client secret via RFC 7592","client_id":"8327e637-eef9-4f2e-93fb-768e5ed6b4a1","new_expiry_zero":true,"reg_token_rotated":true}

- [*] Unit tests (`task test`)
- [*] E2E tests (`task test-e2e`)
- [*] Linting (`task lint-fix`)
- [*] Manual testing (describe below) Pending

Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
@github-actions github-actions bot added the size/L Large PR: 600-999 lines changed label Mar 17, 2026

codecov bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 51.16279% with 105 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.66%. Comparing base (0108b11) to head (0e0a520).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/runner/runner.go | 0.00% | 62 Missing ⚠️ |
| pkg/auth/remote/secret_renewal.go | 86.13% | 7 Missing and 7 partials ⚠️ |
| pkg/auth/discovery/discovery.go | 0.00% | 12 Missing ⚠️ |
| pkg/auth/remote/handler.go | 64.28% | 8 Missing and 2 partials ⚠️ |
| pkg/auth/remote/persisting_token_source.go | 0.00% | 3 Missing ⚠️ |
| pkg/state/runconfig.go | 33.33% | 2 Missing ⚠️ |
| pkg/workloads/types/types.go | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4194      +/-   ##
==========================================
- Coverage   68.77%   68.66%   -0.11%     
==========================================
  Files         473      474       +1     
  Lines       47919    48115     +196     
==========================================
+ Hits        32955    33040      +85     
- Misses      12299    12342      +43     
- Partials     2665     2733      +68     

@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 17, 2026

@github-actions github-actions bot left a comment

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Mar 18, 2026
@github-actions github-actions bot dismissed their stale review March 18, 2026 21:28

PR size has been reduced below the XL threshold. Thank you for splitting this up!

@github-actions

✅ PR size has been reduced below the XL threshold. The size review has been dismissed and this PR can now proceed with normal review. Thank you for splitting this up!

Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 21, 2026

@github-actions github-actions bot left a comment

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

@Sanskarzz Sanskarzz marked this pull request as ready for review March 21, 2026 23:49
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>

@amirejaz amirejaz left a comment

Review: PR #4194 — Client Secret Expiry and Renewal (RFC 7592)

The overall design is sound and addresses a real DCR lifecycle gap. Found a few spec compliance issues, a double-renewal race, and some convention violations that should be addressed before merge.

Summary of findings

  • Must fix: secret accumulation in the secrets store on renewal, double PUT to RFC 7592 endpoint, hardcoded ClientName literal, wrong CallbackPort in renewal request body
  • Should fix: hand-written mock (use generated one), slog.Info for success/diagnostic paths, strings.NewReader(string([]byte)) allocation, wrong //nolint:gosec rule number, in-memory state updated before SaveState
  • Nice to have: injectable HTTP client for testability, full response body drain, testify consistency

}

// Use the rotated registration_access_token if provided; fall back to existing.
newRegToken := updateResp.RegistrationAccessToken

[Spec / Secret leak] When the server rotates the registration_access_token (newRegToken != ""), persistClientCredentials calls GenerateUniqueSecretNameWithPrefix which mints a new OAUTH_REG_TOKEN_* key. The old key is never deleted, so each renewal accumulates a stale secret in the store.

Same issue applies to OAUTH_CLIENT_SECRET_* keys — every call to persistClientCredentials with a non-empty clientSecret generates a new key.

Fix: before writing the new secret, delete the old key if CachedRegTokenRef / CachedClientSecretRef is already set (or reuse the existing key name rather than generating a fresh one).
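
One possible shape for that fix is sketched below. It is only an illustration under stated assumptions: secretStore, memStore, and replaceSecret are hypothetical names, not the ToolHive secrets API, and the sketch writes the new key before deleting the old one so that a failed write never leaves the client without a secret.

```go
package main

import "fmt"

// secretStore is a hypothetical, minimal view of a secrets backend.
type secretStore interface {
	Set(name, value string) error
	Delete(name string) error
}

// memStore is an in-memory stand-in used only for this example.
type memStore map[string]string

func (m memStore) Set(name, value string) error { m[name] = value; return nil }
func (m memStore) Delete(name string) error     { delete(m, name); return nil }

// replaceSecret stores value under newRef and then removes oldRef, so a
// rotation never leaves more than one live key behind. It returns the
// reference that should be cached afterwards.
func replaceSecret(s secretStore, oldRef, newRef, value string) (string, error) {
	if err := s.Set(newRef, value); err != nil {
		return oldRef, fmt.Errorf("store new secret: %w", err)
	}
	if oldRef != "" && oldRef != newRef {
		if err := s.Delete(oldRef); err != nil {
			return newRef, fmt.Errorf("delete stale secret %q: %w", oldRef, err)
		}
	}
	return newRef, nil
}

func main() {
	store := memStore{}
	ref := ""
	for i := 1; i <= 3; i++ {
		next := fmt.Sprintf("OAUTH_REG_TOKEN_%d", i)
		var err error
		if ref, err = replaceSecret(store, ref, next, fmt.Sprintf("token-%d", i)); err != nil {
			fmt.Println("rotation failed:", err)
			return
		}
	}
	fmt.Println(len(store), ref) // 1 OAUTH_REG_TOKEN_3
}
```

Reusing the existing key name, the alternative mentioned above, would avoid the delete entirely at the cost of overwriting the old value in place.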

slog.Debug("Using cached DCR client credentials", "client_id", clientID)

// Proactively renew the client secret if it is expiring soon (RFC 7592)
if h.isSecretExpiredOrExpiringSoon() {

[Spec / Double renewal] tryRestoreFromCachedTokens (below, ~line 270) and resolveClientCredentials (here) both call renewClientSecret independently. When tryRestoreFromCachedTokens is on the call path it calls renewClientSecret first, then calls resolveClientCredentials, which evaluates isSecretExpiredOrExpiringSoon() again — but h.config.CachedSecretExpiry is never updated in memory by persistClientCredentials (only the persisted config is updated). So the second check still fires and a second PUT is sent to the RFC 7592 endpoint.

Fix: after a successful renewal, update h.config.CachedSecretExpiry in memory so the second isSecretExpiredOrExpiringSoon() call returns false.
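
A simplified model of that fix is shown below. The handler type, its fields, and the 30-day replacement expiry are all hypothetical stand-ins chosen for illustration; the counter takes the place of the real PUT request.

```go
package main

import (
	"fmt"
	"time"
)

const secretExpiryBuffer = 24 * time.Hour

// handler is a hypothetical, stripped-down stand-in for the real handler.
type handler struct {
	cachedSecretExpiry time.Time // stands in for h.config.CachedSecretExpiry
	now                time.Time // fixed clock to keep the example deterministic
	putCount           int       // number of simulated RFC 7592 PUT requests
}

func (h *handler) expiringSoon() bool {
	if h.cachedSecretExpiry.IsZero() {
		return false
	}
	return !h.now.Before(h.cachedSecretExpiry.Add(-secretExpiryBuffer))
}

// renew simulates a successful renewal and then refreshes the in-memory
// expiry, which is the step the review comment asks for.
func (h *handler) renew() {
	h.putCount++
	newExpiry := h.now.Add(30 * 24 * time.Hour) // pretend value from the server
	// Durable persistence would happen here (omitted in this sketch).
	h.cachedSecretExpiry = newExpiry
}

func (h *handler) resolveClientCredentials() {
	if h.expiringSoon() {
		h.renew()
	}
}

func (h *handler) tryRestoreFromCachedTokens() {
	if h.expiringSoon() {
		h.renew()
	}
	h.resolveClientCredentials() // second check now sees the refreshed expiry
}

func newDemoHandler() *handler {
	now := time.Date(2026, 3, 22, 0, 0, 0, 0, time.UTC)
	return &handler{cachedSecretExpiry: now.Add(time.Hour), now: now}
}

func main() {
	h := newDemoHandler()
	h.tryRestoreFromCachedTokens()
	fmt.Println("PUT requests sent:", h.putCount) // PUT requests sent: 1
}
```

Without the assignment at the end of renew, the same run would count two requests.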


// Proactively renew the client secret if it is expiring soon (RFC 7592)
if h.isSecretExpiredOrExpiringSoon() {
slog.Info("Cached client secret is expiring soon, attempting renewal",

[Convention] slog.Info fires on every token resolution when the secret is expiring. Per .claude/rules/go-style.md "silent success — no output at INFO or above for successful operations". This should be slog.Debug (or slog.Warn only when the secret is already past expiry, not just approaching it).

Suggested change
- slog.Info("Cached client secret is expiring soon, attempting renewal",
+ slog.Debug("Cached client secret is expiring soon, attempting renewal",

// Check if the cached client secret is expired before attempting token refresh.
// If it has fully expired and renewal also fails we must force a fresh OAuth flow.
if h.isSecretExpiredOrExpiringSoon() {
slog.Info("Cached client secret is expiring or expired; attempting renewal before token restore",

[Convention] Same slog.Info → slog.Debug/slog.Warn issue as above. Diagnostic messages that fire on every tryRestoreFromCachedTokens call should not be at INFO level.

Suggested change
- slog.Info("Cached client secret is expiring or expired; attempting renewal before token restore",
+ slog.Debug("Cached client secret is expiring or expired; attempting renewal before token restore",

// that were provided during the initial registration.
updateReq := clientUpdateRequest{
ClientID: h.config.CachedClientID,
ClientName: "ToolHive MCP Client",

[Spec compliance] Two problems with this request body:

  1. ClientName is a hardcoded string literal instead of oauth.ToolHiveMCPClientName. If the constant changes, the value sent during registration and the value sent during renewal will diverge — violating RFC 7592's requirement for consistent metadata.

  2. h.config.CallbackPort is the configured default port, not the port that was actually registered during DCR (which may have been reassigned by networking.FindOrUsePort). If the port differs, strict RFC 7592 servers will reject the request because the redirect_uri won't match.

Fix: use oauth.ToolHiveMCPClientName for the name, and persist the registered callback port (similar to how CachedRegClientURI is now persisted) so it can be echoed back accurately.

return fmt.Errorf("failed to persist renewed client secret: %w", err)
}

slog.Info("Successfully renewed client secret via RFC 7592",

[Convention] slog.Info for a successful operation. Per .claude/rules/go-style.md "silent success" rule, this should be slog.Debug.

Suggested change
- slog.Info("Successfully renewed client secret via RFC 7592",
+ slog.Debug("Successfully renewed client secret via RFC 7592",


// validateRegistrationClientURI validates that the registration_client_uri is
// a valid HTTPS URL (or localhost for development).
func validateRegistrationClientURI(registrationClientURI string) error {

[Edge case] validateRegistrationClientURI validates scheme and host but permits any path, including /. A registration_client_uri of https://example.com/ passes validation but would send a PUT to the server root — almost certainly a misconfiguration.

RFC 7592 §2.2 implies the URI must be the unique management endpoint for a specific client registration. Add a check:

if parsedURL.Path == "" || parsedURL.Path == "/" {
    return fmt.Errorf("registration_client_uri must include a non-root path: %s", registrationClientURI)
}
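
Put together with the existing scheme and host rules, a validator with this extra check could look roughly like the sketch below. The function name validateURI and the error messages are hypothetical, and the loopback allow-list (localhost and 127.0.0.1 only) is an assumption based on the PR description.

```go
package main

import (
	"fmt"
	"net/url"
)

// validateURI (hypothetical name) accepts HTTPS URLs, or plain HTTP on
// localhost / 127.0.0.1 for development, and rejects an empty or root path.
func validateURI(raw string) error {
	if raw == "" {
		return fmt.Errorf("registration_client_uri is empty")
	}
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("invalid registration_client_uri: %w", err)
	}
	host := u.Hostname()
	if host == "" {
		return fmt.Errorf("registration_client_uri has no host: %s", raw)
	}
	switch u.Scheme {
	case "https":
		// Always acceptable.
	case "http":
		if host != "localhost" && host != "127.0.0.1" {
			return fmt.Errorf("registration_client_uri must use HTTPS: %s", raw)
		}
	default:
		return fmt.Errorf("unsupported scheme %q in registration_client_uri", u.Scheme)
	}
	if u.Path == "" || u.Path == "/" {
		return fmt.Errorf("registration_client_uri must include a non-root path: %s", raw)
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"https://example.com/register/abc",
		"https://example.com/",
		"http://example.com/register/abc",
		"http://localhost:8080/realms/mcp-test/clients-registrations/openid-connect/abc",
	} {
		fmt.Printf("%s -> %v\n", raw, validateURI(raw))
	}
}
```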


// mockSecretProvider is a simple in-memory secret store for tests.
// It implements the full secrets.Provider interface.
type mockSecretProvider struct {

[Convention] Per .claude/rules/testing.md: "Use go.uber.org/mock (gomock) framework — never hand-write mocks."

A generated mock for secrets.Provider already exists at pkg/secrets/mocks/mock_provider.go (run task gen to regenerate). Please replace this hand-written mockSecretProvider with the generated mock to stay consistent with project conventions and avoid silent interface drift if secrets.Provider gains new methods.


err := h.renewClientSecret(context.Background())
if err == nil {
t.Fatal("expected error, got nil")

[Convention] This test uses t.Fatal / t.Errorf while the rest of the test file uses testify require / assert. Per .claude/rules/testing.md, prefer require.NoError / assert.Contains throughout for consistency.

Suggested change
- t.Fatal("expected error, got nil")
+ require.Error(t, err)

Comment thread pkg/runner/runner.go
regAccessToken, regClientURI string,
tokenEndpointAuthMethod string,
) error {
r.Config.RemoteAuthConfig.CachedClientID = clientID

[Convention / Correctness] r.Config.RemoteAuthConfig.CachedClientID is updated in memory before the subsequent StoreSecretInManagerWithProvider and SaveState calls. If either of those fails, the in-memory config reflects the new clientID but the durable state is inconsistent with it.

Per .claude/rules/go-style.md: write to durable storage before updating in-memory state. Move all r.Config.RemoteAuthConfig.* assignments to after SaveState returns successfully.
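
The ordering being asked for can be sketched as follows. The authConfig type, the persistThenCommit function, and the save callback are hypothetical simplifications, not the runner's actual types; the callback stands in for the secret-manager write and SaveState.

```go
package main

import (
	"errors"
	"fmt"
)

// authConfig is a hypothetical, reduced stand-in for the cached auth config.
type authConfig struct {
	CachedClientID     string
	CachedRegClientURI string
}

var errSave = errors.New("save failed")

// persistThenCommit writes next to durable storage first and only then
// replaces the in-memory config. On failure, live is left untouched.
func persistThenCommit(live *authConfig, next authConfig, save func(authConfig) error) error {
	if err := save(next); err != nil {
		return fmt.Errorf("failed to save state: %w", err)
	}
	*live = next
	return nil
}

func main() {
	live := authConfig{CachedClientID: "old-client"}
	next := authConfig{
		CachedClientID:     "new-client",
		CachedRegClientURI: "https://idp.example/register/new-client",
	}

	err := persistThenCommit(&live, next, func(authConfig) error { return errSave })
	fmt.Println(err != nil, live.CachedClientID) // true old-client

	err = persistThenCommit(&live, next, func(authConfig) error { return nil })
	fmt.Println(err, live.CachedClientID) // <nil> new-client
}
```

Building the new values in a separate struct and swapping it in at the end keeps the in-memory and durable views consistent even when a write fails halfway.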

Labels

size/XL Extra large PR: 1000+ lines changed

Development

Successfully merging this pull request may close these issues.

Implement Client Secret Expiry and Renewal for Dynamic Client Registration

2 participants