Skip to content

refactor(api): thread context through OAuth token-fetch and retry backoff (KLA-448)#44

Merged
jklaassenjc merged 1 commit into
mainfrom
juergen/kla-448-context-threading
Jun 6, 2026
Merged

refactor(api): thread context through OAuth token-fetch and retry backoff (KLA-448)#44
jklaassenjc merged 1 commit into
mainfrom
juergen/kla-448-context-threading

Conversation

@jklaassenjc
Copy link
Copy Markdown
Collaborator

@jklaassenjc jklaassenjc commented Jun 5, 2026

Summary

KLA-448 — companion to KLA-447. Bugbot findings #7 and #8 on PR #42 surfaced two related context-leak defects in the api package's HTTP path:

  1. TokenCache.fetchToken used http.NewRequest (no context) and the httpClient's 30s Timeout. A caller passing a 100ms ctx still waited up to 30s for OAuth token exchange.
  2. retryTransport.backoff used time.Sleep — uninterruptible — so a ctx that fired mid-backoff didn't short-circuit the retry loop.

Both forced `jc doctor` to wrap its probe in a goroutine + select with leak-on-timeout. Bad pattern; unacceptable in `jc mcp serve` (long-running) or any future caller needing tight timeouts.

Changes

File Change
internal/api/oauth.go `Token(ctx)` and `fetchToken(ctx)` accept context, use `http.NewRequestWithContext`. Nil-context defense for cobra-test patterns.
internal/api/client.go `bearerAuthTransport.RoundTrip` passes `req.Context()` to `Token()` — propagates the deadline through OAuth.
internal/api/retry.go `retrySleepFn` signature is now `func(ctx, d) error` so backoff returns `ctx.Err()` early. Default impl is a select on timer + ctx.Done() — well-behaved by construction.
internal/cmd/auth.go + setup.go `cmd.Context()` threaded through to `tc.Token()` in auth login / status / setup flows.
internal/cmd/doctor.go `runAPIProbe` drops the goroutine + select + result-channel wrapper. Just calls `client.ListAll(ctx, ...)` directly. Net ~30 lines removed.

Diff: 12 files, +263 / -98. ~30 lines removed from doctor's probe; gained 4 focused regression tests proving the new context-cancel paths work.

Tests (4 new in `context_propagation_test.go`)

  • `TestTokenCache_TokenRespectsContextTimeout` — 100ms ctx against a 2s slow-responder, asserts Token returns within ~2s (pre-fix: would wait 30s)
  • `TestTokenCache_TokenRespectsContextCancellation` — explicit cancel path, errors.Is(err, context.Canceled)
  • `TestRetryTransport_BackoffRespectsContext` — ctx fires during the first 1s backoff, retry loop short-circuits (returns < 1s)
  • `TestRetryTransport_PreservesContextErrorInChain` — errors.Is(err, DeadlineExceeded) — doctor's classifier depends on this

All 200+ existing tests still pass.

Tying the loop back to KLA-439

After KLA-447 + KLA-448, doctor is two thin adapters:

The 13 Bugbot findings on PR #42 traced to two structural problems: precedence duplication (fixed in PR #43) and context-leak workarounds (fixed here). Both are gone by construction.

Test plan

  • `go test ./...` full suite green (200+ tests)
  • `go vet ./...` clean
  • 4 new context-propagation tests pin the new behavior
  • Manual: `jc doctor --probe-timeout 100ms` returns within ~200ms against real JumpCloud (no more 2s retry-loop drag)
  • On merge: tag `1.18.2` via auto-release (patch label)

🤖 Generated with Claude Code


Note

Medium Risk
Touches shared HTTP transport and OAuth for all service-account traffic; behavior change is intentional (faster cancel) but any missed Token(ctx) call site could break compilation or runtime. Doctor probe simplification is lower risk given new regression tests.

Overview
Fixes KLA-448 by threading caller context through the API client’s OAuth and retry paths so deadlines and cancellation are honored end-to-end (e.g. jc doctor --probe-timeout).

OAuth: TokenCache.Token / fetchToken now take context.Context and use http.NewRequestWithContext; bearerAuthTransport passes req.Context() into Token(). Retries: retrySleepFn and backoff wait with select on the context instead of blocking time.Sleep, and the retry loop returns ctx.Err() when backoff is interrupted.

Call sites pass cmd.Context() for service-account flows in auth and setup. runAPIProbe drops the goroutine/channel timeout wrapper and calls client.ListAll(ctx, …) directly, with a small tweak when the parent context expires before the probe timeout.

Adds context_propagation_test.go and updates existing tests/mocks for the new retrySleepFn signature.

Reviewed by Cursor Bugbot for commit ac67bc1. Bugbot is set up for automated code reviews on this repo. Configure here.

…koff (KLA-448)

Companion to KLA-447. Bugbot findings #7 and #8 on PR #42 surfaced
two related context-leak defects:

  1. TokenCache.fetchToken used http.NewRequest (no context) and the
     httpClient's 30s Timeout. A caller passing a 100ms ctx still
     waited up to 30s for OAuth token exchange.
  2. retryTransport.backoff used time.Sleep — uninterruptible — so a
     ctx that fired mid-backoff didn't short-circuit the retry loop.

Both forced jc doctor to wrap its probe in a goroutine + select with
leak-on-timeout. Bad pattern; would be unacceptable in jc mcp serve
(long-running) or any future caller needing tight timeouts.

Changes:

- internal/api/oauth.go: Token(ctx) and fetchToken(ctx) accept
  context, use http.NewRequestWithContext. Nil-context defense for
  cobra-test patterns that construct commands without RunE.
- internal/api/client.go: bearerAuthTransport.RoundTrip passes
  req.Context() to Token() — propagates the request deadline all the
  way through the OAuth hop.
- internal/api/retry.go: retrySleepFn signature is now
  func(ctx context.Context, d time.Duration) error so backoff can
  return ctx.Err() early. The default impl is a select on timer +
  ctx.Done() — well-behaved by construction. SetRetrySleepFn API
  changes signature too; the 4 test callers (mcp/apps_test, api
  internal tests) updated.
- internal/cmd/auth.go + setup.go: cmd.Context() / wiz.cmd.Context()
  threaded through to tc.Token() in the auth login / status / setup
  flows.
- internal/cmd/doctor.go: runAPIProbe drops the goroutine + select
  + result-channel wrapper. Just calls client.ListAll(ctx, ...)
  directly. classifyProbeError already handles context errors →
  "timeout" classification. Net ~30 lines removed from the probe.

Tests (4 new in context_propagation_test.go):
- TokenCache.Token honors a 100ms context against a slow-responder
  (returns < 2s, pre-fix would wait 30s)
- TokenCache.Token honors context.Cancel
- retryTransport backoff respects ctx cancellation (returns < 1s
  when ctx fires during the first 1s backoff)
- retry-loop errors are errors.Is-friendly for DeadlineExceeded
  so doctor's classifier handles them correctly

All 200+ existing tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jklaassenjc jklaassenjc merged commit d17322b into main Jun 6, 2026
7 of 8 checks passed
@jklaassenjc jklaassenjc deleted the juergen/kla-448-context-threading branch June 6, 2026 12:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Development

Successfully merging this pull request may close these issues.

3 participants