[aw-failures] copilot-sdk driver mode broken by PR #36679 — Session was not created with authentication info or custom provider kills 9 work
[Content truncated due to length] #36710
Executive summary
In the 2026-06-03 17:30–19:53 UTC window, 10 scheduled copilot-engine runs across 9 distinct workflows died in ~1s with exitCode=1, 0 turns / 0 tokens, and an empty agent_output.json. This is the same surface symptom tracked by parent #36656, but fresh log evidence supersedes that issue's root-cause hypothesis: the failure is not absent GitHub tokens. It is a regression in copilot-sdk driver mode introduced by PR #36679 ("Remove copilot SDK driver inlined mode", merged 2026-06-03 17:29:43 UTC). The first failure landed at 17:30:06 UTC, 23 seconds after the merge.
Every failing run uses the copilot-sdk subprocess driver (copilot_sdk_driver.cjs + headless sidecar on 127.0.0.1:3002) and aborts with Error: Session was not created with authentication info or custom provider. Every passing copilot run in the same window invokes /usr/local/bin/copilot directly and succeeds. The token health-check (COPILOT_GITHUB_TOKEN is placeholder value (correct), offline+BYOK mode) is identical in both, so the placeholder tokens are correct by design, and the harness's isAuthError=true classification is a misdiagnosis of a driver-side session-bootstrap failure.
Copilot failure rate in-window: 11/28 runs (39%); 10 are this SDK-driver regression, 1 is an unrelated Smoke Copilot test-verdict FAIL.
Probable root cause
PR #36679 ("Remove copilot SDK driver inlined mode") removed the in-process inline path from copilot_harness.cjs, so copilot-sdk mode now always spawns the copilot_sdk_driver.cjs subprocess against a headless Copilot CLI sidecar.
In driver mode the harness generates a per-run COPILOT_CONNECTION_TOKEN, starts the sidecar (--headless --port 3002), then copilot_sdk_driver.cjs connects and creates a session that is missing the custom-provider / auth context, producing Error: Session was not created with authentication info or custom provider the moment the prompt is sent.
The harness maps this to isAuthError=true and emits no authentication information found — not retrying (COPILOT_GITHUB_TOKEN, GH_TOKEN, and GITHUB_TOKEN are all absent or invalid) — a false signal: the env tokens are intentional offline+BYOK placeholders and are identical on passing runs. Because it's classified as a non-retryable auth error, the run is lost immediately.
Proposed remediation
Immediate: revert PR #36679 ("Remove copilot SDK driver inlined mode"), or restore the inline copilot-sdk path, so copilot-sdk workflows stop losing scheduled runs until the driver session bootstrap is fixed.
Driver fix: in copilot_sdk_driver.cjs, propagate the custom provider / COPILOT_CONNECTION_TOKEN into the SDK session before the prompt is sent (the sidecar starts and the session is created, but without provider context).
Classifier fix: distinguish driver-side Session was not created with authentication info or custom provider from genuine env-token absence; do not suppress retry for it (allow one bounded re-bootstrap-and-retry of the sidecar).
Recompile/redeploy the affected copilot-sdk .lock.yml workflows after the fix.
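The classifier fix in the list above could look roughly like the sketch below. It is an illustration under stated assumptions, not the harness's actual code: the function name `classifyFailure`, the result shape, and the retry budgets are hypothetical. Only the error message text and the three token variable names are taken from the logs in this issue. The token check is also simplified to "set or unset", whereas the real health-check additionally recognizes placeholder values.

```javascript
// Hypothetical sketch: tell a driver-side session-bootstrap failure apart
// from genuinely missing env tokens. All names here are illustrative.
const SESSION_BOOTSTRAP_ERROR =
  "Session was not created with authentication info or custom provider";

function classifyFailure(stderrText, env) {
  // Check the driver-side signature first, so it is never reported as an
  // env-token problem.
  if (stderrText.includes(SESSION_BOOTSTRAP_ERROR)) {
    // Assumed policy: allow one bounded re-bootstrap-and-retry.
    return { kind: "driver-session-bootstrap", retryable: true, maxRetries: 1 };
  }
  const tokenNames = ["COPILOT_GITHUB_TOKEN", "GH_TOKEN", "GITHUB_TOKEN"];
  const anyTokenSet = tokenNames.some((name) => Boolean(env[name]));
  if (!anyTokenSet) {
    // Nothing to authenticate with: retrying cannot help.
    return { kind: "env-token-absent", retryable: false, maxRetries: 0 };
  }
  // Assumed default budget for every other failure.
  return { kind: "other", retryable: true, maxRetries: 3 };
}
```

Ordering matters in this sketch: the session-bootstrap check runs before the token check, so a driver failure keeps its own category even when the environment carries no usable tokens.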
Success criteria / verification
copilot-sdk workflows complete with > 0 turns; no Session was not created with authentication info or custom provider in copilot_sdk_driver logs.
The isAuthError=true / 0-turn 1s-death signature does not recur for copilot-sdk runs.
Re-run the 9 affected workflows on schedule for 24–48h with 0 auth-class 0-turn failures.
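A check along these lines could automate the criteria above. It is a sketch only: the run-summary shape (`workflow`, `engineMode`, `turns`, `log`) and both function names are assumptions made for illustration, and are not a format that the harness is known to emit.

```javascript
// Hypothetical run-summary shape: { workflow, engineMode, turns, log }.
const SESSION_ERROR =
  "Session was not created with authentication info or custom provider";

// A run is clean when the agent took at least one turn and the
// session-bootstrap error never appears in its log.
function isCleanSdkRun(run) {
  return run.turns > 0 && !run.log.includes(SESSION_ERROR);
}

// Summarize one scheduled cycle: how many copilot-sdk runs were checked
// and which workflows still show the failure signature.
function verifyCycle(runs) {
  const sdkRuns = runs.filter((r) => r.engineMode === "copilot-sdk");
  const failing = sdkRuns.filter((r) => !isCleanSdkRun(r));
  return { checked: sdkRuns.length, failing: failing.map((r) => r.workflow) };
}
```

An empty `failing` list across every cycle in the 24–48h observation period would correspond to the last criterion.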
Evidence
Divergence: failed (SDK-driver) vs successful (direct) — same window, same token health-check
FAILED §26909254135 (Daily Safe Output Integrator):
[health-check] ✓ COPILOT_GITHUB_TOKEN is placeholder value (correct) # offline+BYOK, by design
[copilot-harness] copilot-sdk mode active: generated per-run COPILOT_CONNECTION_TOKEN
[copilot-harness] copilot-sdk driver mode: starting sidecar command=/usr/local/bin/copilot
[copilot-sdk-driver] [sdk-driver] session created: sessionId=401a2cce-...
[copilot-sdk-driver] [sdk-driver] sending prompt...
[Error: Execution failed: Error: Session was not created with authentication info or custom provider]
[copilot-harness] attempt 1 failed: ... isAuthError=true ... retriesRemaining=3
[copilot-harness] attempt 1: no authentication information found — not retrying (...absent or invalid)
[copilot-harness] done: exitCode=1 totalDuration=1s
SUCCESS §26908784431 (Auto-Triage Issues):
[health-check] ✓ COPILOT_GITHUB_TOKEN is placeholder value (correct) # identical
[copilot-harness] attempt 1: spawning: /usr/local/bin/copilot --add-dir ... # DIRECT, no sidecar
[copilot-harness] success on attempt 1: totalDuration=40s
SDK-driver vs direct mode confirmed across the cluster
PR #36679 merged 17:29:43 UTC → first cluster failure 17:30:06 UTC.
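The gap between the merge and the first failure can be checked directly from the two timestamps above. The helper name `secondsBetween` is just for illustration; the date (2026-06-03) comes from the summary of this issue.

```javascript
// Difference in whole seconds between two ISO-8601 UTC timestamps.
function secondsBetween(startIso, endIso) {
  return (Date.parse(endIso) - Date.parse(startIso)) / 1000;
}

const mergedAt = "2026-06-03T17:29:43Z"; // PR #36679 merge time
const firstFailureAt = "2026-06-03T17:30:06Z"; // first cluster failure

console.log(secondsBetween(mergedAt, firstFailureAt)); // 23
```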
audit-diff — clean pre-turn abort
Failed SDK-driver runs show agent job failure with 0 turns, token_usage=None, errors=0, empty agent_output.json, missing_tools=None, mcp_failures=None — a pre-turn abort with no model interaction. Successful direct runs in the same cohort complete in 40s–1m with normal turn/token counts and read_only posture. The delta is execution mode, not workload.
Out of scope (distinct clusters, already tracked separately)
[aw] Daily BYOK Ollama Test failed #36747 — Daily BYOK Ollama Test: 503 Credentials for GitHub Copilot (port 10002) are not configured (BYOK config, not this regression).
Failure clusters
effective_tokens_limit_exceeded (25M cap)
Affected workflows and run IDs (Cluster A — P0)
Existing-issue correlation
[…] (effective_tokens_limit_exceeded 25322117/25000000, 72 turns, also 2 permission-denied bash retries) is a new instance of the already-tracked 25M effective-token cap pattern; folded into those issues, no new tracking filed.
node:lts-alpine pull from Docker Hub times out — mirror base image to ghcr.io #36595 (Docker Hub image pull), [aw-failures] Squid firewall container (awf-squid) unhealthy → claude engine fails to start (0-turn run failures) #35780 (squid startup), [aw-failures] Contribution Check safe_outputs job fails — agent emits add_comment with target: "*" and no issue_number #35984 (safe-output contract): no recurrence in this window; distinct classes.
References: §26909254135 · §26901717191 · §26908784431
Related to #36656
Investigation update — 2026-06-04 (6h failure sweep)
Fix the copilot-sdk driver session bootstrap (done): PR #36769 (merged 2026-06-04 01:29:18Z) resolves this P0 at the root. The driver now resolves the BYOK custom provider before sending the prompt, eliminating Session was not created with authentication info or custom provider.
Verification status: RESOLVED, pending one clean scheduled cycle
Every cluster-A failure in the last 6h, including the most recent, checked out pre-fix code, which confirms the fix rather than contradicting it:
318e58a2 (direct parent of fix 4757ec0)
The last failure (#36784) started 29 seconds before the fix merged; it is the regression's tail, not a fix gap.
Correction to remediation step 4 (no .lock.yml recompile needed)
Lock files invoke the driver by path (${RUNNER_TEMP}/gh-aw/actions/copilot_sdk_driver.cjs), which actions/checkout + ./actions/setup populate fresh from actions/setup/js/ at runtime. Verified: 0 of 62 copilot_sdk_driver lock files inline the driver. The merged source fix is therefore live for the next scheduled run of every affected workflow with no recompile/redeploy.
Note on remediation step 3 (classifier/retry)
PR #36769 deliberately fails fast (process.exit(1)) when the BYOK provider cannot be resolved, with a clear diagnostic, rather than adding a bounded retry. This is a reasonable design choice (retrying a misconfigured env is pointless), so no separate classifier sub-issue is warranted.
Consolidation
The four auto-filed per-failure duplicates of this regression (#36745, #36765, #36768, #36784) are being closed as duplicates of this issue / fixed by #36769. This issue remains the single tracking point.
Recommended next step: close this issue once the next scheduled cycle of the affected copilot-sdk workflows completes with > 0 turns and no recurrence of the session-bootstrap error.
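The fail-fast behaviour described in the note on remediation step 3 could be sketched as follows. This is not the code from PR #36769: the environment variable names `EXAMPLE_PROVIDER_BASE_URL` and `EXAMPLE_PROVIDER_API_KEY`, the function names, and the diagnostic wording are all invented for illustration. The sketch only shows the general pattern of resolving the provider up front and exiting with a clear message instead of retrying.

```javascript
// Hypothetical sketch of fail-fast provider resolution. The env variable
// names are placeholders and are NOT taken from PR #36769.
function resolveProvider(env) {
  const baseUrl = env.EXAMPLE_PROVIDER_BASE_URL;
  const apiKey = env.EXAMPLE_PROVIDER_API_KEY;
  if (!baseUrl || !apiKey) {
    return {
      ok: false,
      reason: "BYOK provider not configured: missing base URL or API key",
    };
  }
  return { ok: true, provider: { baseUrl, apiKey } };
}

// Resolve before any prompt is sent. On failure, print a diagnostic and
// exit with code 1. The exit function is injectable so it can be stubbed.
function resolveProviderOrExit(env, exit = (code) => process.exit(code)) {
  const result = resolveProvider(env);
  if (!result.ok) {
    console.error(`[sdk-driver] ${result.reason}`);
    exit(1); // no retry: the environment will not change between attempts
    return null;
  }
  return result.provider;
}
```

Keeping the resolution logic in a pure function and the exit in a thin wrapper is a design choice of this sketch: it lets the failure path be exercised in tests without terminating the process.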
Out of scope (distinct clusters, already tracked separately)
[aw] Daily BYOK Ollama Test failed #36747 — Daily BYOK Ollama Test: 503 Credentials for GitHub Copilot (port 10002) are not configured (BYOK config, not this regression).
report_incomplete (agent/tooling, codex engine).
References: §26924221485, §26921808538, §26917699243