feat(cmd): jc doctor — no-auth diagnostic command (KLA-439)#42
Conversation
Borrowed from jamf-cli's `doctor` command. Every jc-cli support
thread starts with "what does the CLI actually see?" — `jc doctor`
absorbs half of them.
The command runs without requiring valid auth — that's the whole
point. Reports:
- Build: version, Go runtime, OS/arch
- Profile: active + source (JC_PROFILE env vs config) + all available
- Config: resolved file path + file/dir permissions
- Auth: API key source resolved in priority order:
--api-key flag > JC_API_KEY env > profile config > keychain ref
Fingerprint = last 4 chars masked with **** (matches TTY step-up
convention; raw secrets never appear in output).
- API: V1 + V2 base URLs + probe (`ok` / `auth_failed` / `unreachable`)
— distinguishes "the key is wrong" from "the host is unreachable",
the whole reason this command exists. Probe is GET /v2/usergroups?limit=1
(HEAD on JumpCloud's API root returns 404 and would be indistinguishable
from a real outage).
- LLM: jc ask provider + key source + fingerprint
- MCP: step-up + signing config + webhook URL + pubkey fingerprint
Output: JSON by default (script-friendly, exits 0 even on probe
failure so callers parse the result); --output human renders grouped
sections. --no-probe for offline triage; --probe-timeout customizable.
Tests:
- fingerprint masking edge cases (empty, ≤4 chars, normal)
- collectBuild: version + go + os/arch shape
- collectProfile: JC_PROFILE env override vs config
- collectConfig: existing file with mode reporting
- collectAuth: JC_API_KEY env wins, keychain://, plaintext profile, unset
- collectLLM: JC_ASK_API_KEY env override
- runAPIProbe: 200 (ok), 401 (auth_failed), unreachable host
- printDoctorJSON: round-trip parseable
- printDoctorText: all sections present + no-probe noted
- TestPrintDoctorText_NeverPrintsRawSecrets: load-bearing security
contract that raw API/LLM secrets never appear in either output
format. Seeds config with distinctive plaintext, asserts literal
absence in both text and JSON.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Bugbot KLA-439) Two Bugbot findings on PR #42: 1. **Service-account probe false failure (Medium).** The original probe hand-rolled an HTTP request with only `x-api-key` header. A valid service_account profile (OAuth2 client credentials) would then report `auth_failed` even though every other jc command worked. Fix: route through `api.NewV2Client()` which honors the active profile's auth method. Also returns a new `no_credentials` probe status when the client can't be constructed (no auth configured at all) — distinct from `unreachable` since we never asked the network. 2. **Flag/env auth precedence mislabeled (Low).** When both `--api-key` flag and `JC_API_KEY` env were set to the *same* value, the original code attributed to env because it compared resolved == env. Cobra precedence treats the flag as the override. Fix: peek the root command's `--api-key` persistent flag .Changed bit and prefer the flag attribution when set. Also: a service_account profile (no api_key field) now reports "service_account (OAuth)" instead of the misleading "(unset)". Refactor: split `runAPIProbe` into a network half (real client call) and a `classifyProbeError` half that turns the (err) result into the status / status_code / error triple. Tests classify error shapes directly without mocking the HTTP client; end-to-end probe behavior is covered by the api package's own tests. New tests: - TestCollectAuth_FlagBeatsEnvEvenWhenEqual - TestCollectAuth_ServiceAccountReportsOAuth - TestClassifyProbeError_{Success,AuthFailed,OtherHTTP,TransportFailure} Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…env (Bugbot KLA-439) Third Bugbot finding on PR #42 (Medium). When the active profile uses service_account auth and JC_API_KEY happens to be set in the env (e.g. left over from a different profile or session), the original code labeled auth as "JC_API_KEY env" — but api.NewClient() short-circuits to OAuth Bearer when AuthMethod() == service_account with valid client_id + client_secret, so the reported source didn't match what jc actually uses. Fix: collectAuth now checks service_account auth FIRST and, when client credentials are present, returns "service_account (OAuth)" with a client_id fingerprint — without ever consulting JC_API_KEY, the flag, or the keychain. Matches api.NewClient()'s precedence. A service_account profile *without* client credentials now reports "service_account (no client credentials)" — distinct from the old "(unset)" so the operator knows it's a misconfig, not just missing auth. Refactor: extracted collectOrgID so both the short-circuit path and the api_key path report org ID consistently. Tests: - TestCollectAuth_ServiceAccountWinsOverStrayEnvKey (locks the fix: asserts source is OAuth even when JC_API_KEY=stray is in env, and asserts the stray env value is NOT leaked into the fingerprint) - TestCollectAuth_ServiceAccountMissingClientCreds (companion: bad config surfaces a distinct status) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ugbot KLA-439) Two Bugbot findings on PR #42 (both Medium): 1. **Global output formats ignored.** Only "json" was honored; "yaml", "table", "csv", "ndjson", "human" all fell through to text. Users with JC_OUTPUT=yaml or `jc doctor -o yaml` got sectioned plain text instead of YAML. Fix: - "yaml" routes to a new printDoctorYAML (round-trips through JSON so the YAML keys match the snake_case JSON keys instead of lowercased Go field names) - "human" / "text" / "" → text (explicit, no surprise) - tabular formats (table/csv/ndjson) don't fit a hierarchical report; surface a stderr note and downgrade to text rather than silently rendering the wrong thing 2. **OAuth failures labeled "unreachable".** Service-account profiles fail their token exchange BEFORE any API call, with plain `error` strings (no *api.APIError involved). classifyProbeError fell through to "unreachable", so operators with bad client credentials saw "network failure" suggestions instead of "check your client ID/secret." Fix: extend classifyProbeError to substring-match the distinct phrases internal/api/oauth.go emits ("invalid client credentials", "client credentials lack permission"). Documented the coupling so a future oauth.go refactor fails the doctor tests loudly rather than silently drifting. New tests: - TestClassifyProbeError_OAuthInvalidClient (HTTP 401 from token endpoint) - TestClassifyProbeError_OAuthInsufficientScope (HTTP 403) - TestPrintDoctorYAML_RoundTrip (asserts all section keys present + the load-bearing "no raw secrets" contract still holds in YAML) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…KLA-439) Fourth Bugbot finding on PR #42 (Medium). Mirror api.NewClient()'s full precedence — including its silent fallback. When AuthMethod is service_account but client_id/client_secret are missing AND an api_key resolves from flag/env/keychain/profile, NewClient() drops through to NewClientWithKey(). Previously the doctor reported `method: service_account` with the api-key source, which was internally inconsistent (every other jc command was using x-api-key). Fix: detect the fallback at the top of collectAuth and re-label the method as `api_key (service_account fallback)`. The operator now sees both that their service_account config is broken AND that jc is actually using an API key — two actionable facts in one line. Refactored the service_account branch into an explicit switch with all three outcomes (OAuth happy path / no-creds dead end / silent fallback) so the precedence mirror to api.NewClient() is auditable in one place. The "no client credentials" branch I added in the previous Bugbot fix is preserved unchanged for the case where neither auth path works. Test: - TestCollectAuth_ServiceAccountFallsBackToAPIKey (configures service_account + api_key on the same profile, asserts method reads the fallback label and source still describes where the api_key came from) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ecret (Bugbot KLA-439) Fifth Bugbot finding on PR #42 (Medium). When a service_account profile's client_secret is a keychain:// reference but keychain resolution fails (locked, deleted, permission denied), config.ClientSecret() returns "" and prints a stderr warning. My doctor saw the empty return and reported "service_account (no client credentials)" — same status as "never configured." An operator following that hint would re-enter their client_secret instead of fixing the keychain. Fix: peek at the raw `profiles.<name>.client_secret` viper value. If client_id is present AND client_secret starts with "keychain://" AND config.ClientSecret() returned "", that's a keychain failure, not a missing config. Report "service_account (client_secret keychain unavailable: <ref>)" so the operator sees both the actual cause and which keychain entry to fix. The client_id fingerprint is still surfaced so they can confirm they're looking at the right service account. Test: - TestCollectAuth_ServiceAccountKeychainSecretMissing — sets client_secret to a keychain:// ref that won't resolve, asserts the source includes "keychain unavailable" and does NOT include "no client credentials" (the misleading message). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-439) Sixth Bugbot finding on PR #42 (Medium). Side effect of finding #5's fix: when a service_account profile's client_secret keychain ref fails to resolve AND an api_key is available (env / flag / profile), api.NewClient() drops to the api_key path. My #5-era code returned early on the keychain-unavailable branch and kept method as service_account — the Auth section disagreed with the probe and every other jc command. Fix: only return early on the keychain-failure branch when there's no api_key fallback. With a fallback, fall through to the api_key resolution but re-label method to surface both facts: api_key (service_account fallback, client_secret keychain unavailable) so the operator sees the fallback AND the cause in one line. Restructure: the service_account branch now follows api.NewClient()'s exact decision tree: 1. oauthAvailable (both creds resolved) → OAuth 2. !hasAPIKey AND keychainFailed → "keychain unavailable" message 3. !hasAPIKey → "no client credentials" message 4. otherwise → silent fallback to api_key (with or without keychain note) Test: - TestCollectAuth_ServiceAccountKeychainFailureFallsBackToAPIKey — configures keychain client_secret + api_key on the same profile, asserts method contains both "api_key" and "keychain unavailable", and the source/fingerprint describe the api_key fallback honestly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…bot KLA-439) Seventh Bugbot finding on PR #42 (Medium). For service_account profiles, the OAuth token-fetch hop runs in TokenCache.fetchToken() which uses its own http.Client with a 30s timeout AND doesn't accept a context. The probe's --probe-timeout couldn't reach it, so a hung OAuth endpoint could make `jc doctor --probe-timeout 100ms` block for 30+ seconds. Manually observed during E2E: a 100ms timeout returned in ~2s with status "unreachable" because the api client's retry path didn't cancel cleanly either. Fix: run client.ListAll in a goroutine and select on ctx.Done(). The probe itself now returns within --probe-timeout regardless of what the upstream does. New probe status "timeout" with an error message that names the cause (OAuth token-fetch or upstream HTTP not honoring the context) and points the operator at --no-probe. The goroutine leaks when the timeout fires — `jc doctor` exits as soon as we print, so the goroutine dies with the process. Not worth threading a cancellation primitive through the api package for a one-shot CLI. Test: - TestRunAPIProbe_RespectsTimeoutEvenWhenUnderlyingDoesnt — 50ms timeout against a real api client; asserts the probe returns within 5s (the full upstream retry path would take 30s+). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e (Bugbot KLA-439) Eighth Bugbot finding on PR #42 (Medium). When --probe-timeout fires AND the upstream HTTP client honors the context, ListAll returns context.DeadlineExceeded BEFORE runAPIProbe's select reaches ctx.Done(). The result landed in classifyProbeError, which had no context-error branch and fell through to "unreachable" — the better-behaved the upstream, the worse my classification. Fix: classifyProbeError now has a top-priority context-error branch. Any error that matches context.DeadlineExceeded or context.Canceled (including wrapped variants) returns "timeout". This also catches Ctrl-C cancellation cleanly. The fast-path runs before the APIError and OAuth marker checks. Three new tests: - TestClassifyProbeError_ContextDeadline - TestClassifyProbeError_ContextCanceled - TestClassifyProbeError_WrappedDeadline (pins that we use errors.Is for unwrap, not == comparison) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closing note — stopping the Bugbot loop hereBugbot has flagged 8 findings on this PR, every one substantive. After patching #8 ( Class 1 (findings #1, #3, #4, #5, #6): Class 2 (findings #7, #8): the api package's OAuth token-fetch and retry transport don't honor context cleanly, so the probe has to defensively wrap and classify around it. The root fix is to thread context through the api package. Both have been filed as follow-ups:
The doctor implementation as it stands at Review focus suggestion: the goal of |
Ninth Bugbot finding on PR #42 (Low). collectOrgID was reading os.Getenv("JC_ORG_ID") directly, missing the top-level viper "org_id" key that config.OrgID() honors first. An operator who set `org_id: xxx` at the top level of config.yaml (not under profiles) would see no org ID in doctor while every other jc command still sent it. Fix: read viper "org_id" first (which picks up JC_ORG_ID env via BindEnv, the top-level config-file key, or a flag bound to "org_id"), then fall back to the profile config. Attribute the source by peeking at JC_ORG_ID env — when it matches the resolved value it's the env binding; otherwise top-level config. Cannot distinguish flag vs config without a pflag.Changed peek, and there's no --org-id flag today. Tests: - TestCollectOrgID_TopLevelViperKey (top-level beats profile-level) - TestCollectOrgID_EnvBeatsTopLevel (env attribution when both match) Outside the KLA-447 follow-up scope: this is org_id, not auth. Single surgical fix completes the precedence-mirror for this field exactly — only three paths total (top-level / env / profile), all covered now. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion (Bugbot KLA-439) Tenth and eleventh Bugbot findings on PR #42 (both Medium). **#10: --org flag source ignored.** root.go's PersistentPreRunE calls config.OverrideActiveProfile when --org is set, but collectProfile only knew about JC_PROFILE env / config / "default". An operator running `jc doctor --org staging` saw the right ActiveProfile but the source said "config (or default)" — the report lied about how the profile was selected. Fix: collectProfile takes a flagOrgSet bool; plumb the root command's --org .Changed bit through (same pattern as flagAPIKeySet for the api-key flag). **#11: probe-timeout vs parent-context attribution.** When the parent context's deadline fires first (global --timeout, signal cancel), my ctx.Done() branch always blamed --probe-timeout. Fix: peek at parentCtx.Err() in the timeout branch — if non-nil, the parent fired first, so the error message points at the global --timeout instead. After these two, every field doctor reports has a precedence attribution that matches what jc actually used. Same surgical class as finding #9 — bounded, no compounding cases possible because the field enumerations are exhausted. Tests: - TestCollectProfile_OrgFlagOverridesAll - TestRunAPIProbe_ParentContextDeadlineBlamedCorrectly Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ot KLA-439) Twelfth Bugbot finding on PR #42 (Medium). Compound of finding #10. My fix to #10 detected --org via flag.Changed, but Changed is true even for an empty value: `--org=` (literal empty assignment), or `--org "$UNSET_VAR"` (shell expansion to empty). Root's PersistentPreRunE actually checks viper.GetString("org") != "" before calling OverrideActiveProfile, so with empty --org the profile stays unchanged but my doctor reported source as "--org flag" — a lie. Fix: at the flag-set decision point in newDoctorCmd, AND the .Changed bit with a non-empty viper value: flagOrgSet = f.Changed && strings.TrimSpace(viper.GetString("org")) != "" Same precedence mirror as root.go's PreRun. The fall-through then uses JC_PROFILE env / config / "default" as before. Test: - TestCollectProfile_EmptyOrgFlagFallsThrough — passes flagOrgSet=false (simulating the corrected caller decision), asserts source is NOT "--org flag". Per my closing commitment on this PR: if Bugbot finds a 13th, it goes to KLA-447 regardless of severity. This batch covers every field and every empty-flag-value edge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 37216dc. Configure here.
| } else { | ||
| as.Method = "api_key (service_account fallback)" | ||
| } | ||
| } |
There was a problem hiding this comment.
Keychain ref counts as API key
Medium Severity
For service_account profiles, hasAPIKey treats a non-empty profiles.*.api_key value (including an unresolvable keychain:// reference) as a usable API-key fallback. api.NewClient() only falls back when config.APIKey() resolves to a real key, so doctor can label method as an api_key fallback while the probe and other commands hit no_credentials.
Reviewed by Cursor Bugbot for commit 37216dc. Configure here.
There was a problem hiding this comment.
Real finding, exact auth-precedence-mirror class (siblings of #1, #3, #4, #5, #6, #12) — hasAPIKey peeks at the raw profiles.*.api_key value but doesn't resolve keychain refs the way config.APIKey() does at runtime, so an unresolvable keychain:// looks like a fallback to doctor while api.NewClient() returns ErrNoAPIKey.
Per the closing commitment on this PR, this one does not get an in-PR patch. It's deferred to KLA-447 — the architectural refactor that pulls auth resolution into the api package and makes doctor a thin presentation layer on top. That refactor eliminates this entire class by construction.
Adding this specific case (hasAPIKey keychain-ref edge) to KLA-447's regression-test list so the refactor proves it.


Summary
KLA-439 — borrowed from jamf-cli's `doctor` command via the 1.17.0 competitive analysis. Every jc-cli support thread starts with "what does the CLI actually see?" — `jc doctor` absorbs half of them.
The command runs without requiring valid auth — that's the whole point. It reports what jc actually resolved at runtime, distinguishes connectivity from auth failures, and never prints raw secrets.
What it reports
Why a real-endpoint probe instead of HEAD root
HEAD on JumpCloud's `/api/v2` returns 404 — indistinguishable from a real outage. Probe uses `GET /v2/usergroups?limit=1` instead, which returns 200 with valid auth, 401/403 with wrong/missing auth, network error if unreachable. Three clean states for triage.
Example output
```
$ jc doctor --output human
▸ Build
Version: 1.17.3
Platform: darwin/arm64
Go: go1.26.3
▸ Profile
Active: default (config (or default))
Available: default, staging
▸ Config
Path: /Users/jklaassen/.config/jc/config.yaml
File mode: 0600
Dir mode: 0700
▸ Auth
Method: api_key
Source: keychain (jc/default)
Key: ****efb2
Org ID: 5ec71e8e96bfda0611fc6c5b (profile config)
▸ API
V1: https://console.jumpcloud.com/api
V2: https://console.jumpcloud.com/api/v2
Probe: ok (HTTP 200, 663ms)
▸ LLM (jc ask)
Provider: anthropic
API key: ****ZgAA
Source: ask.api_key config
▸ MCP
Step-up: disabled (auto)
Signing: disabled
```
Security contract
`TestPrintDoctorText_NeverPrintsRawSecrets` is the load-bearing test: seeds config with distinctive plaintext secrets, asserts they don't appear in either text or JSON output. Whoever changes the output formatter has to keep that test green.
Test coverage
15 tests:
Docs
Test plan
🤖 Generated with Claude Code
Note
Low Risk
New read-only diagnostic path with extensive tests; it reads config/keychain and may call the API but does not change how other commands authenticate.
Overview
Adds
jc doctor, a no-auth diagnostic under the Setup & Config group, so operators can see what the CLI resolved (profile, config path, auth source, LLM/MCP settings) when other commands fail.The report mirrors real auth precedence—
--org/--api-keyattribution, service-account OAuth vs silent api_key fallback, keychain failures—and fingerprints secrets (****+ last 4). An optional API probe usesapi.NewV2ClientandGET /v2/usergroups?limit=1, classifyingok,auth_failed,unreachable,timeout, etc., with--no-probeand--probe-timeout. Output supports JSON (default via global--output), YAML, and human text.README and QUICKSTART add Diagnostics / Troubleshooting sections pointing at
jc doctor.Reviewed by Cursor Bugbot for commit 37216dc. Bugbot is set up for automated code reviews on this repo. Configure here.