feat: replace tmate with native SSH debug step for --ssh-after-step#456
feat: replace tmate with native SSH debug step for --ssh-after-step#456robstolarz wants to merge 3 commits intomainfrom
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for all 3 issues found in the latest run.
- ✅ Fixed: waitForLogMarker ignores terminal job/attempt status
- Added a switch statement to check for terminal attempt statuses (finished, failed, cancelled) and return an error immediately, matching the pattern used in waitForSandbox.
- ✅ Fixed: Duplicate "Connecting to sandbox" message printed
- Removed the duplicate fmt.Fprintf call from run.go since waitForSandbox already prints this message before returning.
- ✅ Fixed:
printSSHInfosuggests invaliddepot ci sshcommand- Added a runOrJobID parameter to printSSHInfo and only display the 'depot ci ssh' suggestion when a valid run/job ID is available, passing empty string when called from 'depot ci run'.
Or push these changes by commenting:
@cursor push 0ccc9c9234
Preview (0ccc9c9234)
diff --git a/pkg/cmd/ci/run.go b/pkg/cmd/ci/run.go
--- a/pkg/cmd/ci/run.go
+++ b/pkg/cmd/ci/run.go
@@ -237,13 +237,12 @@
}
}
- if sshAfterStep > 0 {
- fmt.Fprintf(os.Stderr, "Run 'touch /tmp/depot-continue' to resume the workflow. (Your session will not end.)\n")
- }
- fmt.Fprintf(os.Stderr, "Connecting to sandbox %s...\n", sandboxID)
- if !helpers.IsTerminal() {
- return printSSHInfo(sandboxID, sessionID, "")
- }
+ if sshAfterStep > 0 {
+ fmt.Fprintf(os.Stderr, "Run 'touch /tmp/depot-continue' to resume the workflow. (Your session will not end.)\n")
+ }
+ if !helpers.IsTerminal() {
+ return printSSHInfo(sandboxID, sessionID, "", "")
+ }
return pty.Run(ctx, pty.SessionOptions{
Token: tokenVal,
SandboxID: sandboxID,
@@ -549,6 +548,11 @@
continue
}
+ switch attempt.Status {
+ case "finished", "failed", "cancelled":
+ return fmt.Errorf("job has already completed (status: %s) before marker appeared", attempt.Status)
+ }
+
lines, err := api.CIGetJobAttemptLogs(ctx, token, orgID, attempt.AttemptId)
if err == nil {
for _, line := range lines {
diff --git a/pkg/cmd/ci/ssh.go b/pkg/cmd/ci/ssh.go
--- a/pkg/cmd/ci/ssh.go
+++ b/pkg/cmd/ci/ssh.go
@@ -72,9 +72,9 @@
return err
}
- if info || !helpers.IsTerminal() {
- return printSSHInfo(sandboxID, sessionID, output)
- }
+ if info || !helpers.IsTerminal() {
+ return printSSHInfo(sandboxID, sessionID, output, runID)
+ }
return pty.Run(ctx, pty.SessionOptions{
Token: tokenVal,
@@ -254,7 +254,7 @@
return latest
}
-func printSSHInfo(sandboxID, sessionID, output string) error {
+func printSSHInfo(sandboxID, sessionID, output, runOrJobID string) error {
if output == "json" {
enc := json.NewEncoder(os.Stdout)
enc.SetIndent("", " ")
@@ -270,10 +270,14 @@
fmt.Printf("User: %s\n", sandboxID)
fmt.Printf("Password: Your Depot API token ($DEPOT_TOKEN)\n")
fmt.Println()
- fmt.Printf("Connect interactively:\n")
- fmt.Printf(" depot ci ssh %s\n", sandboxID)
- fmt.Println()
- fmt.Printf("Or via SSH directly:\n")
+ if runOrJobID != "" {
+ fmt.Printf("Connect interactively:\n")
+ fmt.Printf(" depot ci ssh %s\n", runOrJobID)
+ fmt.Println()
+ fmt.Printf("Or via SSH directly:\n")
+ } else {
+ fmt.Printf("Connect via SSH:\n")
+ }
fmt.Printf(" ssh %s@api.depot.dev\n", sandboxID)
return nil
}Replace the `mxschmitt/action-tmate@v3` GitHub Action with a simple shell script that pauses the workflow by polling for a magic file. After starting the run, the CLI waits for the sandbox, polls logs for a `::depot-ssh-ready::` marker, then connects via the existing PTY infrastructure. - Inject a shell-based debug step instead of tmate action - Wait for sandbox + log marker before connecting - Fall back to printing SSH info when no TTY is available - Fix job key matching for inline workflows (e.g. `_inline_0.yaml:e2e`) - Improve `printSSHInfo` to show `depot ci ssh` command Closes DEP-3862 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Early-exit waitForLogMarker when attempt reaches terminal status instead of polling forever if a step before the debug step fails - Remove duplicate "Connecting to sandbox" message (already in waitForSandbox) - Fix printSSHInfo to use run ID for `depot ci ssh` suggestion - Remove arbitrary 10min timeout on log marker polling (rely on terminal status check and context cancellation instead) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
9408c41 to
3707ba5
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Exact job key match priority lost in findJob
- Split the combined matching loop into two separate passes: first checking only exact matches, then checking short-name matches, ensuring exact matches always take priority regardless of job order.
- ✅ Fixed: Redundant fallback suffix-match loop in findJob
- Removed the redundant suffix-match fallback loop as part of the refactoring that separated exact and short-name matching into distinct passes.
Or push these changes by commenting:
@cursor push 810e322841
Preview (810e322841)
diff --git a/pkg/cmd/ci/ssh.go b/pkg/cmd/ci/ssh.go
--- a/pkg/cmd/ci/ssh.go
+++ b/pkg/cmd/ci/ssh.go
@@ -208,24 +208,22 @@
return nil, &retryableJobError{msg: fmt.Sprintf("run %s has no jobs yet", resp.RunId)}
}
- // Match by job key (--job flag): exact match or short name (after colon).
+ // Match by job key (--job flag): exact match first, then short name (after colon).
// Job keys from inline workflows look like "_inline_0.yaml:e2e" — the
// user passes just "e2e", so match on the suffix after the colon too.
if jobKey != "" {
+ // First pass: exact match takes priority.
for _, j := range allJobs {
- short := j.JobKey
- if i := strings.IndexByte(short, ':'); i >= 0 {
- short = short[i+1:]
- }
- if j.JobKey == jobKey || short == jobKey {
+ if j.JobKey == jobKey {
return j, nil
}
}
- // Inline workflows get prefixed keys (e.g. "_inline_0.yaml:lint_typecheck"),
- // so fall back to a suffix match when the user passes just the job name.
+ // Second pass: match on short name (part after colon).
for _, j := range allJobs {
- if strings.HasSuffix(j.JobKey, ":"+jobKey) {
- return j, nil
+ if i := strings.IndexByte(j.JobKey, ':'); i >= 0 {
+ if j.JobKey[i+1:] == jobKey {
+ return j, nil
+ }
}
}
// Job might not exist yet if workflows are still being expanded.Split the single-pass loop into exact-first then short-name fallback so a bare "build" job key is preferred over "_inline_0.yaml:build". Also removes the redundant HasSuffix fallback loop. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Missing timeout causes indefinite hang in log polling
- Added a 5-minute timeout to waitForLogMarker matching the pattern used in waitForSandbox, so the function now returns an error instead of hanging indefinitely when the marker never appears.
Or push these changes by commenting:
@cursor push 0aad5a83f3
Preview (0aad5a83f3)
diff --git a/pkg/cmd/ci/run.go b/pkg/cmd/ci/run.go
--- a/pkg/cmd/ci/run.go
+++ b/pkg/cmd/ci/run.go
@@ -512,8 +512,14 @@
// appears. This is used to detect when the injected debug step is running.
func waitForLogMarker(ctx context.Context, token, orgID, runID, jobKey, marker string) error {
const pollInterval = 3 * time.Second
+ const timeout = 5 * time.Minute
+ deadline := time.Now().Add(timeout)
+
for {
+ if time.Now().After(deadline) {
+ return fmt.Errorf("timed out waiting for debug step marker (waited %s)", timeout)
+ }
// Resolve the latest attempt ID for the job.
resp, err := api.CIGetRunStatus(ctx, token, orgID, runID)| case <-time.After(pollInterval): | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Missing timeout causes indefinite hang in log polling
Medium Severity
waitForLogMarker has no timeout, unlike waitForSandbox which uses a 5-minute deadline. If the marker never appears and the job stays running (e.g., a preceding step hangs, the debug step isn't reached, or the API returns persistent errors that are silently retried), the CLI blocks indefinitely. The sandbox is already provisioned at this point, and the caller on line 234 treats errors as non-fatal warnings to "connect anyway" — but it never gets the chance because waitForLogMarker never returns. API errors are also swallowed and retried forever (unlike waitForSandbox which fails immediately on API errors), compounding the risk.



Summary
mxschmitt/action-tmate@v3with a shell script that pauses viatouch /tmp/depot-continue::depot-ssh-ready::marker, then connects via PTY_inline_0.yaml:e2e→e2e)printSSHInfoto showdepot ci sshcommandStacked on #454 and needs https://github.com/depot/api/pull/3233. Closes DEP-3862.
Test plan
--ssh-after-step 2ondocker-integration.ymltouch /tmp/depot-continueresumes workflowgo build ./...andgo test ./...pass🤖 Generated with Claude Code
Note
Medium Risk
Changes the CI run/SSH connection flow by injecting a blocking debug step and polling logs before attaching, which could affect workflow execution timing and CLI attach reliability.
Overview
depot ci run --ssh-after-stepnow injects a native "Depot SSH Debug" shell step that pauses the job until/tmp/depot-continueis touched, replacing the previousaction-tmateapproach.After starting a run, the CLI can wait for sandbox provisioning and (for
--ssh-after-step) poll job attempt logs for a::depot-ssh-ready::marker before connecting; in non-TTY contexts it prints improved SSH info (now includesrun_idand adepot ci ssh <run-id>hint).Job lookup for
depot ci sshis tightened to prefer exact job-key matches and only fall back to matching the short name portion of inline workflow keys.Written by Cursor Bugbot for commit 4fbb90f. This will update automatically on new commits. Configure here.