refactor(api): thread context through OAuth token-fetch and retry backoff (KLA-448)#44
Merged
Merged
Conversation
…koff (KLA-448) Companion to KLA-447. Bugbot findings #7 and #8 on PR #42 surfaced two related context-leak defects: 1. TokenCache.fetchToken used http.NewRequest (no context) and the httpClient's 30s Timeout. A caller passing a 100ms ctx still waited up to 30s for OAuth token exchange. 2. retryTransport.backoff used time.Sleep — uninterruptible — so a ctx that fired mid-backoff didn't short-circuit the retry loop. Both forced jc doctor to wrap its probe in a goroutine + select with leak-on-timeout. Bad pattern; would be unacceptable in jc mcp serve (long-running) or any future caller needing tight timeouts. Changes: - internal/api/oauth.go: Token(ctx) and fetchToken(ctx) accept context, use http.NewRequestWithContext. Nil-context defense for cobra-test patterns that construct commands without RunE. - internal/api/client.go: bearerAuthTransport.RoundTrip passes req.Context() to Token() — propagates the request deadline all the way through the OAuth hop. - internal/api/retry.go: retrySleepFn signature is now func(ctx context.Context, d time.Duration) error so backoff can return ctx.Err() early. The default impl is a select on timer + ctx.Done() — well-behaved by construction. SetRetrySleepFn API changes signature too; the 4 test callers (mcp/apps_test, api internal tests) updated. - internal/cmd/auth.go + setup.go: cmd.Context() / wiz.cmd.Context() threaded through to tc.Token() in the auth login / status / setup flows. - internal/cmd/doctor.go: runAPIProbe drops the goroutine + select + result-channel wrapper. Just calls client.ListAll(ctx, ...) directly. classifyProbeError already handles context errors → "timeout" classification. Net ~30 lines removed from the probe. Tests (4 new in context_propagation_test.go): - TokenCache.Token honors a 100ms context against a slow-responder (returns < 2s, pre-fix would wait 30s) - TokenCache.Token honors context.Cancel - retryTransport backoff respects ctx cancellation (returns < 1s when ctx fires during the first 1s backoff) - retry-loop errors are errors.Is-friendly for DeadlineExceeded so doctor's classifier handles them correctly All 200+ existing tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jrennichjc
approved these changes
Jun 6, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
KLA-448 — companion to KLA-447. Bugbot findings #7 and #8 on PR #42 surfaced two related context-leak defects in the api package's HTTP path:
TokenCache.fetchTokenusedhttp.NewRequest(no context) and the httpClient's 30s Timeout. A caller passing a 100ms ctx still waited up to 30s for OAuth token exchange.retryTransport.backoffusedtime.Sleep— uninterruptible — so a ctx that fired mid-backoff didn't short-circuit the retry loop.Both forced `jc doctor` to wrap its probe in a goroutine + select with leak-on-timeout. Bad pattern; unacceptable in `jc mcp serve` (long-running) or any future caller needing tight timeouts.
Changes
Diff: 12 files, +263 / -98. ~30 lines removed from doctor's probe; gained 4 focused regression tests proving the new context-cancel paths work.
Tests (4 new in `context_propagation_test.go`)
All 200+ existing tests still pass.
Tying the loop back to KLA-439
After KLA-447 + KLA-448, doctor is two thin adapters:
The 13 Bugbot findings on PR #42 traced to two structural problems: precedence duplication (fixed in PR #43) and context-leak workarounds (fixed here). Both are gone by construction.
Test plan
🤖 Generated with Claude Code
Note
Medium Risk
Touches shared HTTP transport and OAuth for all service-account traffic; behavior change is intentional (faster cancel) but any missed
Token(ctx)call site could break compilation or runtime. Doctor probe simplification is lower risk given new regression tests.Overview
Fixes KLA-448 by threading caller context through the API client’s OAuth and retry paths so deadlines and cancellation are honored end-to-end (e.g.
jc doctor --probe-timeout).OAuth:
TokenCache.Token/fetchTokennow takecontext.Contextand usehttp.NewRequestWithContext;bearerAuthTransportpassesreq.Context()intoToken(). Retries:retrySleepFnandbackoffwait withselecton the context instead of blockingtime.Sleep, and the retry loop returnsctx.Err()when backoff is interrupted.Call sites pass
cmd.Context()for service-account flows in auth and setup.runAPIProbedrops the goroutine/channel timeout wrapper and callsclient.ListAll(ctx, …)directly, with a small tweak when the parent context expires before the probe timeout.Adds
context_propagation_test.goand updates existing tests/mocks for the newretrySleepFnsignature.Reviewed by Cursor Bugbot for commit ac67bc1. Bugbot is set up for automated code reviews on this repo. Configure here.