fix(rebuild): reuse OpenShell gateway credential when host env is empty#3918
fix(rebuild): reuse OpenShell gateway credential when host env is empty#3918ssam18 wants to merge 8 commits into
Conversation
Rebuild preflight aborted when NVIDIA_API_KEY or any other remote provider credential was missing from the shell even though onboard had already stored it in the OpenShell gateway. This made the channel-add auto-rebuild flow fail for users who closed the shell between onboard and the channel change. Fall back to providerExistsInGateway when the env var is empty so the gateway-stored credential is reused, and add a defensive skip in setupInference so a missing host env cannot overwrite the stored value with empty. Signed-off-by: Samaresh Kumar Singh <ssam3003@gmail.com>
📝 WalkthroughWalkthroughRebuild and onboarding now consult the OpenShell gateway for provider credentials. When an expected env credential is missing, code checks gateway registration (except for mutable-endpoint providers), reuses gateway-stored credentials when present, and avoids upserting empty credentials during setupInference(). ChangesGateway Provider Credential Reuse
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/lib/actions/sandbox/rebuild.ts (1)
371-402: Please run the rebuild channel lifecycle E2E on this path.This change is in the rebuild destroy/recreate control flow, so I’d still run
gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=channels-stop-start-e2eonce before merge to catch any resume-time regression that only shows up with cached credentials and channel state. As per coding guidelines,src/lib/actions/sandbox/rebuild.tschanges should exercisechannels-stop-start-e2ebecause this file controls disabled channel resolution used during onboard and rebuild.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/actions/sandbox/rebuild.ts` around lines 371 - 402, Run the rebuild channel lifecycle E2E (channels-stop-start-e2e) against this branch before merging: execute `gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=channels-stop-start-e2e` and verify the rebuild/destroy→recreate flow in src/lib/actions/sandbox/rebuild.ts (especially the logic around rebuildCredentialEnv and providerExistsInGateway/gateway provider path) to catch resume-time regressions with cached credentials and channel state; only merge after the job passes and you confirm the rebuild behavior is correct.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/actions/sandbox/rebuild.ts`:
- Around line 371-397: The preflight currently treats any falsy return from
providerExistsInGateway(...) as "credential not found", which masks gateway/CLI
errors; change providerExistsInGateway (or add a new wrapper) to surface errors
(tri-state: registered / not-registered / lookup-failed or throw on
lookup-failure), then update this block (using rebuildProvider, runOpenshell,
gatewayHasProvider, rebuildCredentialEnv, bail) to: if lookup-failed -> log a
distinct connectivity/lookup-error message telling users to retry/check
OpenShell and bail with a different error code/message; if not-registered ->
keep the existing missing-credential flow; if registered -> keep current
skip-and-nullify behavior. Add/adjust unit/e2e tests (channels-stop-start-e2e)
to cover the lookup-failure path so the rebuild preflight distinguishes
transient gateway failures from true missing credentials.
In `@src/lib/onboard.ts`:
- Around line 7506-7515: The current skipUpsertReusingGatewayCredential guard
only checks credentialValue and providerExistsInGateway(provider) and therefore
incorrectly skips upsert for mutable provider types that carry a user-selected
resolvedEndpointUrl; update the guard so that you only skip the upsert when the
provider exists in the gateway AND the provider type is immutable (i.e., not
'compatible-endpoint' or 'compatible-anthropic-endpoint') and there is no
resolvedEndpointUrl change. Concretely, change the logic around
skipUpsertReusingGatewayCredential (the variable computed before calling
upsertProvider) to also inspect config.providerType and resolvedEndpointUrl (and
credentialValue) so that for mutable types (compatible-endpoint /
compatible-anthropic-endpoint) you do not skip calling upsertProvider(provider,
config.providerType, resolvedCredentialEnv, resolvedEndpointUrl, env).
---
Nitpick comments:
In `@src/lib/actions/sandbox/rebuild.ts`:
- Around line 371-402: Run the rebuild channel lifecycle E2E
(channels-stop-start-e2e) against this branch before merging: execute `gh
workflow run nightly-e2e.yaml --ref <branch> -f jobs=channels-stop-start-e2e`
and verify the rebuild/destroy→recreate flow in
src/lib/actions/sandbox/rebuild.ts (especially the logic around
rebuildCredentialEnv and providerExistsInGateway/gateway provider path) to catch
resume-time regressions with cached credentials and channel state; only merge
after the job passes and you confirm the rebuild behavior is correct.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 48f597b6-194e-4f17-a218-13c120e91243
📒 Files selected for processing (3)
src/lib/actions/sandbox/rebuild.tssrc/lib/onboard.tstest/rebuild-credential-preflight.test.ts
Keep src/lib/onboard.ts net-neutral by moving the inference provider upsert with its gateway reuse short circuit into src/lib/onboard/inference-provider-upsert.ts. Behavior is unchanged, this only addresses the onboard entrypoint line budget for issue 3895 work. Signed-off-by: Samaresh Kumar Singh <ssam3003@gmail.com>
…points Address review feedback on PR 3918. providerExistsInGateway collapsed every non-zero openshell exit to false, so a gateway outage looked identical to a missing provider and the user got a misleading missing credential message. Added a tristate lookupProviderInGateway and a dedicated lookup_failed error path that points at openshell status. Also stopped reusing the gateway credential for compatible-endpoint and compatible-anthropic-endpoint, since those carry an operator-supplied base URL that a rebuild must re-upsert. Added regression tests for both paths. Signed-off-by: Samaresh Kumar Singh <ssam3003@gmail.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/lib/actions/sandbox/rebuild.ts (1)
303-429: Run the targeted rebuild/channel lifecycle E2E for regression confidence.Given this file’s rebuild preflight changes, run
channels-stop-start-e2eto confirm stop/start state remains stable across destroy/recreate with cached credentials.As per coding guidelines: "
src/lib/actions/sandbox/rebuild.ts... E2E test recommendation:channels-stop-start-e2e."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/actions/sandbox/rebuild.ts` around lines 303 - 429, Run the E2E regression "channels-stop-start-e2e" to validate the rebuilt preflight logic around credential handling; specifically exercise rebuild paths that hit getRebuildCredentialEnvFromRegistry, the session-matching branch (sessionMatchesTarget/rebuildProvider), the hermesProviderAuth.HERMES_PROVIDER_NAME branch which calls preflightHermesProviderCredentials, the hydrateCredentialEnv code path, and the gateway lookup path via lookupProviderInGateway/runOpenshell to confirm stop/start stability across destroy/recreate with cached credentials.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/lib/actions/sandbox/rebuild.ts`:
- Around line 303-429: Run the E2E regression "channels-stop-start-e2e" to
validate the rebuilt preflight logic around credential handling; specifically
exercise rebuild paths that hit getRebuildCredentialEnvFromRegistry, the
session-matching branch (sessionMatchesTarget/rebuildProvider), the
hermesProviderAuth.HERMES_PROVIDER_NAME branch which calls
preflightHermesProviderCredentials, the hydrateCredentialEnv code path, and the
gateway lookup path via lookupProviderInGateway/runOpenshell to confirm
stop/start stability across destroy/recreate with cached credentials.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c8a5936a-cff9-4c53-bd78-b4e6356a1d06
📒 Files selected for processing (5)
src/lib/actions/sandbox/rebuild.tssrc/lib/onboard.tssrc/lib/onboard/inference-provider-upsert.tssrc/lib/onboard/providers.tstest/rebuild-credential-preflight.test.ts
Rebuild was demanding the provider credential from the shell even when onboard had already registered it with the OpenShell gateway, which broke the channel-add auto-rebuild flow whenever the operator no longer had the env var exported. The preflight now falls back to a gateway provider lookup before bailing, and setupInference no longer overwrites the stored credential when the gateway already holds one. Existing tests were tightened so the fail path still requires both the env var and the gateway provider to be missing, and new tests cover the gateway reuse path for nvidia, anthropic, openai, and gemini. Closes #3895
Summary by CodeRabbit
Bug Fixes
Tests