feat: replace tmate with native SSH debug step for --ssh-after-step by robstolarz · Pull Request #456 · depot/cli

robstolarz · 2026-03-19T03:31:28Z

Summary

Replace mxschmitt/action-tmate@v3 with a shell script that pauses via touch /tmp/depot-continue
After starting the run, CLI waits for sandbox provisioning, polls logs for ::depot-ssh-ready:: marker, then connects via PTY
Falls back to printing SSH connection info when no TTY is available
Fix job key matching for inline workflows (_inline_0.yaml:e2e → e2e)
Improve printSSHInfo to show depot ci ssh command

Stacked on #454 and needs https://github.com/depot/api/pull/3233. Closes DEP-3862.

Test plan

Full end-to-end test: --ssh-after-step 2 on docker-integration.yml
Debug step injected correctly, workflow pauses
Log marker detected, PTY session connects after step N completes
touch /tmp/depot-continue resumes workflow
No-TTY fallback prints SSH connection info
go build ./... and go test ./... pass

🤖 Generated with Claude Code

Note

Medium Risk
Changes the CI run/SSH connection flow by injecting a blocking debug step and polling logs before attaching, which could affect workflow execution timing and CLI attach reliability.

Overview
depot ci run --ssh-after-step now injects a native "Depot SSH Debug" shell step that pauses the job until /tmp/depot-continue is touched, replacing the previous action-tmate approach.

After starting a run, the CLI can wait for sandbox provisioning and (for --ssh-after-step) poll job attempt logs for a ::depot-ssh-ready:: marker before connecting; in non-TTY contexts it prints improved SSH info (now includes run_id and a depot ci ssh <run-id> hint).

Job lookup for depot ci ssh is tightened to prefer exact job-key matches and only fall back to matching the short name portion of inline workflow keys.

^{Written by Cursor Bugbot for commit 4fbb90f. This will update automatically on new commits. Configure here.}

linear · 2026-03-19T03:31:31Z

DEP-3862 replace tmate action in --ssh-after-step with our own solution

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for all 3 issues found in the latest run.

✅ Fixed: waitForLogMarker ignores terminal job/attempt status
- Added a switch statement to check for terminal attempt statuses (finished, failed, cancelled) and return an error immediately, matching the pattern used in waitForSandbox.
✅ Fixed: Duplicate "Connecting to sandbox" message printed
- Removed the duplicate fmt.Fprintf call from run.go since waitForSandbox already prints this message before returning.
✅ Fixed: printSSHInfo suggests invalid depot ci ssh command
- Added a runOrJobID parameter to printSSHInfo and only display the 'depot ci ssh' suggestion when a valid run/job ID is available, passing empty string when called from 'depot ci run'.

Or push these changes by commenting:

@cursor push 0ccc9c9234

Preview (0ccc9c9234)

diff --git a/pkg/cmd/ci/run.go b/pkg/cmd/ci/run.go
--- a/pkg/cmd/ci/run.go
+++ b/pkg/cmd/ci/run.go
@@ -237,13 +237,12 @@
 					}
 				}
 
-				if sshAfterStep > 0 {
-					fmt.Fprintf(os.Stderr, "Run 'touch /tmp/depot-continue' to resume the workflow. (Your session will not end.)\n")
-				}
-				fmt.Fprintf(os.Stderr, "Connecting to sandbox %s...\n", sandboxID)
-				if !helpers.IsTerminal() {
-					return printSSHInfo(sandboxID, sessionID, "")
-				}
+			if sshAfterStep > 0 {
+				fmt.Fprintf(os.Stderr, "Run 'touch /tmp/depot-continue' to resume the workflow. (Your session will not end.)\n")
+			}
+			if !helpers.IsTerminal() {
+				return printSSHInfo(sandboxID, sessionID, "", "")
+			}
 				return pty.Run(ctx, pty.SessionOptions{
 					Token:     tokenVal,
 					SandboxID: sandboxID,
@@ -549,6 +548,11 @@
 			continue
 		}
 
+		switch attempt.Status {
+		case "finished", "failed", "cancelled":
+			return fmt.Errorf("job has already completed (status: %s) before marker appeared", attempt.Status)
+		}
+
 		lines, err := api.CIGetJobAttemptLogs(ctx, token, orgID, attempt.AttemptId)
 		if err == nil {
 			for _, line := range lines {

diff --git a/pkg/cmd/ci/ssh.go b/pkg/cmd/ci/ssh.go
--- a/pkg/cmd/ci/ssh.go
+++ b/pkg/cmd/ci/ssh.go
@@ -72,9 +72,9 @@
 				return err
 			}
 
-			if info || !helpers.IsTerminal() {
-				return printSSHInfo(sandboxID, sessionID, output)
-			}
+		if info || !helpers.IsTerminal() {
+			return printSSHInfo(sandboxID, sessionID, output, runID)
+		}
 
 			return pty.Run(ctx, pty.SessionOptions{
 				Token:     tokenVal,
@@ -254,7 +254,7 @@
 	return latest
 }
 
-func printSSHInfo(sandboxID, sessionID, output string) error {
+func printSSHInfo(sandboxID, sessionID, output, runOrJobID string) error {
 	if output == "json" {
 		enc := json.NewEncoder(os.Stdout)
 		enc.SetIndent("", "  ")
@@ -270,10 +270,14 @@
 	fmt.Printf("User:     %s\n", sandboxID)
 	fmt.Printf("Password: Your Depot API token ($DEPOT_TOKEN)\n")
 	fmt.Println()
-	fmt.Printf("Connect interactively:\n")
-	fmt.Printf("  depot ci ssh %s\n", sandboxID)
-	fmt.Println()
-	fmt.Printf("Or via SSH directly:\n")
+	if runOrJobID != "" {
+		fmt.Printf("Connect interactively:\n")
+		fmt.Printf("  depot ci ssh %s\n", runOrJobID)
+		fmt.Println()
+		fmt.Printf("Or via SSH directly:\n")
+	} else {
+		fmt.Printf("Connect via SSH:\n")
+	}
 	fmt.Printf("  ssh %s@api.depot.dev\n", sandboxID)
 	return nil
 }

pkg/cmd/ci/run.go

pkg/cmd/ci/ssh.go

Replace the `mxschmitt/action-tmate@v3` GitHub Action with a simple shell script that pauses the workflow by polling for a magic file. After starting the run, the CLI waits for the sandbox, polls logs for a `::depot-ssh-ready::` marker, then connects via the existing PTY infrastructure. - Inject a shell-based debug step instead of tmate action - Wait for sandbox + log marker before connecting - Fall back to printing SSH info when no TTY is available - Fix job key matching for inline workflows (e.g. `_inline_0.yaml:e2e`) - Improve `printSSHInfo` to show `depot ci ssh` command Closes DEP-3862 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Early-exit waitForLogMarker when attempt reaches terminal status instead of polling forever if a step before the debug step fails - Remove duplicate "Connecting to sandbox" message (already in waitForSandbox) - Fix printSSHInfo to use run ID for `depot ci ssh` suggestion - Remove arbitrary 10min timeout on log marker polling (rely on terminal status check and context cancellation instead) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

✅ Fixed: Exact job key match priority lost in findJob
- Split the combined matching loop into two separate passes: first checking only exact matches, then checking short-name matches, ensuring exact matches always take priority regardless of job order.
✅ Fixed: Redundant fallback suffix-match loop in findJob
- Removed the redundant suffix-match fallback loop as part of the refactoring that separated exact and short-name matching into distinct passes.

Or push these changes by commenting:

@cursor push 810e322841

Preview (810e322841)

diff --git a/pkg/cmd/ci/ssh.go b/pkg/cmd/ci/ssh.go
--- a/pkg/cmd/ci/ssh.go
+++ b/pkg/cmd/ci/ssh.go
@@ -208,24 +208,22 @@
 		return nil, &retryableJobError{msg: fmt.Sprintf("run %s has no jobs yet", resp.RunId)}
 	}
 
-	// Match by job key (--job flag): exact match or short name (after colon).
+	// Match by job key (--job flag): exact match first, then short name (after colon).
 	// Job keys from inline workflows look like "_inline_0.yaml:e2e" — the
 	// user passes just "e2e", so match on the suffix after the colon too.
 	if jobKey != "" {
+		// First pass: exact match takes priority.
 		for _, j := range allJobs {
-			short := j.JobKey
-			if i := strings.IndexByte(short, ':'); i >= 0 {
-				short = short[i+1:]
-			}
-			if j.JobKey == jobKey || short == jobKey {
+			if j.JobKey == jobKey {
 				return j, nil
 			}
 		}
-		// Inline workflows get prefixed keys (e.g. "_inline_0.yaml:lint_typecheck"),
-		// so fall back to a suffix match when the user passes just the job name.
+		// Second pass: match on short name (part after colon).
 		for _, j := range allJobs {
-			if strings.HasSuffix(j.JobKey, ":"+jobKey) {
-				return j, nil
+			if i := strings.IndexByte(j.JobKey, ':'); i >= 0 {
+				if j.JobKey[i+1:] == jobKey {
+					return j, nil
+				}
 			}
 		}
 		// Job might not exist yet if workflows are still being expanded.

pkg/cmd/ci/ssh.go

Split the single-pass loop into exact-first then short-name fallback so a bare "build" job key is preferred over "_inline_0.yaml:build". Also removes the redundant HasSuffix fallback loop. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Missing timeout causes indefinite hang in log polling
- Added a 5-minute timeout to waitForLogMarker matching the pattern used in waitForSandbox, so the function now returns an error instead of hanging indefinitely when the marker never appears.

Or push these changes by commenting:

@cursor push 0aad5a83f3

Preview (0aad5a83f3)

diff --git a/pkg/cmd/ci/run.go b/pkg/cmd/ci/run.go
--- a/pkg/cmd/ci/run.go
+++ b/pkg/cmd/ci/run.go
@@ -512,8 +512,14 @@
 // appears. This is used to detect when the injected debug step is running.
 func waitForLogMarker(ctx context.Context, token, orgID, runID, jobKey, marker string) error {
 	const pollInterval = 3 * time.Second
+	const timeout = 5 * time.Minute
 
+	deadline := time.Now().Add(timeout)
+
 	for {
+		if time.Now().After(deadline) {
+			return fmt.Errorf("timed out waiting for debug step marker (waited %s)", timeout)
+		}
 
 		// Resolve the latest attempt ID for the job.
 		resp, err := api.CIGetRunStatus(ctx, token, orgID, runID)

cursor · 2026-03-19T19:30:00Z

pkg/cmd/ci/run.go

+		case <-time.After(pollInterval):
+		}
+	}
+}


Missing timeout causes indefinite hang in log polling

Medium Severity

waitForLogMarker has no timeout, unlike waitForSandbox which uses a 5-minute deadline. If the marker never appears and the job stays running (e.g., a preceding step hangs, the debug step isn't reached, or the API returns persistent errors that are silently retried), the CLI blocks indefinitely. The sandbox is already provisioned at this point, and the caller on line 234 treats errors as non-fatal warnings to "connect anyway" — but it never gets the chance because waitForLogMarker never returns. API errors are also swallowed and retried forever (unlike waitForSandbox which fails immediately on API errors), compounding the risk.

robstolarz marked this pull request as ready for review March 19, 2026 03:32

cursor bot reviewed Mar 19, 2026

View reviewed changes

pkg/cmd/ci/run.go Show resolved Hide resolved

pkg/cmd/ci/run.go Outdated Show resolved Hide resolved

pkg/cmd/ci/ssh.go Outdated Show resolved Hide resolved

Base automatically changed from rob/dep-3852-cli-depot-ci-ssh-and-interactive-terminal-for-ci-jobs to main March 19, 2026 11:41

robstolarz and others added 2 commits March 19, 2026 11:54

robstolarz force-pushed the rob/dep-3862-replace-tmate-action-in-ssh-after-step-with-our-own-solution branch from 9408c41 to 3707ba5 Compare March 19, 2026 19:03

cursor bot reviewed Mar 19, 2026

View reviewed changes

pkg/cmd/ci/ssh.go Outdated Show resolved Hide resolved

pkg/cmd/ci/ssh.go Outdated Show resolved Hide resolved

cursor bot reviewed Mar 19, 2026

View reviewed changes

121watts approved these changes Mar 24, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: replace tmate with native SSH debug step for --ssh-after-step#456

feat: replace tmate with native SSH debug step for --ssh-after-step#456
robstolarz wants to merge 3 commits intomainfrom
rob/dep-3862-replace-tmate-action-in-ssh-after-step-with-our-own-solution

robstolarz commented Mar 19, 2026 •

edited by cursor bot

Loading

Uh oh!

linear bot commented Mar 19, 2026

Uh oh!

cursor bot left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor bot left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

cursor bot left a comment •

edited

Loading

Uh oh!

cursor bot Mar 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

robstolarz commented Mar 19, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

linear bot commented Mar 19, 2026

Uh oh!

cursor bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

cursor bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cursor bot Mar 19, 2026

Choose a reason for hiding this comment

Missing timeout causes indefinite hang in log polling

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

robstolarz commented Mar 19, 2026 •

edited by cursor bot

Loading

cursor bot left a comment •

edited

Loading

cursor bot left a comment •

edited

Loading

cursor bot left a comment •

edited

Loading