fix: fully detach manifest service starts so Daytona evals don't wedge (#676) #734

Open
ElegantLin wants to merge 1 commit into main from fix/676-manifest-service-fd-detach

@ElegantLin

Contributor

Summary

`ManifestEnvironment.provision()` started declared services with `{cmd} > {log} 2>&1 &`: stdout and stderr were redirected, but stdin was left attached and there was no `nohup`.

On Daytona, `exec` runs as a session command that only reports an exit code once nothing holds the session's streams (the failure class root-caused in smolclaws#85). A backgrounded service that still holds any of the session's stdin/stdout/stderr keeps the call "running" until the hard timeout cap (#670), so a manifest-driven eval wedges (~1h) instead of starting.
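The failure class can be illustrated locally with an ordinary pipe. The sketch below is a loose analogy, not Daytona's implementation: it assumes a POSIX `sh`, `nohup`, and `sleep` are available, and `timed_run` is a hypothetical helper. The caller waits for end-of-file on the command's stdout, so a backgrounded child that inherits that stream keeps the call from returning even though the shell itself has already exited.

```python
import subprocess
import time


def timed_run(shell_command: str) -> float:
    """Run a command through sh with stdout piped; return seconds until the call returns."""
    start = time.monotonic()
    # With stdout piped, run() reads the pipe until EOF, so it cannot
    # return while any process still holds the write end of that pipe.
    subprocess.run(["sh", "-c", shell_command], stdout=subprocess.PIPE, check=True)
    return time.monotonic() - start


if __name__ == "__main__":
    # Background child inherits the piped stdout and holds it for 2 seconds.
    attached = timed_run("sleep 2 &")
    # Background child has all three fds redirected, so nothing holds the pipe.
    detached = timed_run("nohup sleep 2 </dev/null >/dev/null 2>&1 &")
    print(f"attached: {attached:.1f}s, detached: {detached:.1f}s")
```

On a typical Linux machine the first call should take about as long as the `sleep`, while the second should return almost immediately.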

Docker is unaffected because `communicate()` returns when the foreground shell exits, so the bug is invisible in local runs and CI and only bites on Daytona.

Fix

Start services with full detachment:

nohup {cmd} </dev/null >/tmp/benchflow-env-<svc>.log 2>&1 &
  • all three fds redirected — the missing </dev/null (stdin) is the operative change; the old form only covered stdout/stderr
  • nohup so the service ignores the SIGHUP sent when the session ends
  • no disown — it is a bash/zsh builtin and the Daytona DinD shell is sh/dash
  • per-service logs stay retrievable at /tmp/benchflow-env-<svc>.log

Mirrors the fix that resolved the identical wedge downstream.
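As a minimal sketch of how such a start command could be assembled, assuming a hypothetical helper named `build_service_start_command` (the actual code inside `ManifestEnvironment.provision()` is not shown in this PR description and may differ):

```python
import shlex

# Log location taken from the PR description; {name} is the service name.
LOG_TEMPLATE = "/tmp/benchflow-env-{name}.log"


def build_service_start_command(name: str, cmd: str) -> str:
    """Return a shell line that starts cmd fully detached from the caller's streams.

    Hypothetical helper for illustration, not the project's real API.
    """
    log_path = shlex.quote(LOG_TEMPLATE.format(name=name))
    # stdin from /dev/null, stdout and stderr to the per-service log,
    # nohup to ignore SIGHUP, trailing & to background. No disown,
    # because sh/dash does not provide it.
    return f"nohup {cmd} </dev/null >{log_path} 2>&1 &"


if __name__ == "__main__":
    print(build_service_start_command("db", "redis-server --port 6379"))
```

The service name and command in the usage line are made-up examples.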

Test plan

  • test_provision_fully_detaches_service_fds — asserts nohup + all three fd redirections + backgrounded + no disown; verified to fail on the old > {log} 2>&1 & form (fail-then-pass). The pre-existing test only checked the command ended with &, which is exactly why the missing stdin redirect slipped through.
  • tests/environment/ + manifest inbound suite: 75 passed
  • ruff check + format, ty clean
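The properties listed for `test_provision_fully_detaches_service_fds` can be sketched as a standalone checker. This is an illustration under assumptions, not the test added in this PR: `detachment_problems` is a hypothetical name, and the regular expressions only cover the simple command shapes quoted above.

```python
import re


def detachment_problems(command: str) -> list[str]:
    """Return the reasons a service start command is not fully detached."""
    stripped = command.strip()
    problems = []
    if not stripped.startswith("nohup "):
        problems.append("missing nohup")
    if not re.search(r"<\s*/dev/null", stripped):
        problems.append("stdin not redirected")
    # A plain '>' redirect: not part of '2>', '<>', or '>&'.
    if not re.search(r"(?<![<&\d])>(?!&)\s*\S+", stripped):
        problems.append("stdout not redirected")
    if "2>&1" not in stripped:
        problems.append("stderr not redirected")
    if not stripped.endswith("&"):
        problems.append("not backgrounded")
    if re.search(r"\bdisown\b", stripped):
        problems.append("uses disown")
    return problems


if __name__ == "__main__":
    old_form = "my-service > /tmp/svc.log 2>&1 &"
    new_form = "nohup my-service </dev/null >/tmp/svc.log 2>&1 &"
    print(detachment_problems(old_form))
    print(detachment_problems(new_form))
```

A check that only looks at the trailing `&` accepts both forms, which is how the old command passed the pre-existing test.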

Closes #676

🤖 Generated with Claude Code

fix: fully detach manifest service starts so Daytona evals don't wedge (#676)

ManifestEnvironment.provision() started declared services with
`{cmd} > {log} 2>&1 &` — stdout/stderr redirected, but stdin left
attached and no nohup. On Daytona, `exec` is a *session* command that
only reports an exit code once nothing holds the session's streams, so a
backgrounded service inheriting the session fds keeps the call "running"
until the hard timeout cap (#670) — the manifest eval wedges (~1h) instead
of starting. Docker is unaffected (`communicate()` returns when the
foreground shell exits), so the bug is invisible locally and in CI. Same
failure class as the service-hook wedge root-caused in smolclaws#85.

Fix: start services with `nohup {cmd} </dev/null >{log} 2>&1 &` — all
three fds redirected (the missing `</dev/null` is the operative change)
plus nohup to detach from the session. No `disown`: it is a bash/zsh
builtin and the Daytona DinD shell is `sh`/dash.

The existing test only asserted the start command ends with `&`, which is
why the missing stdin redirect slipped through. Added
test_provision_fully_detaches_service_fds asserting nohup + all three fd
redirections + no disown; verified to fail on the old `> {log} 2>&1 &`
form (fail-then-pass).

Closes #676

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

ManifestEnvironment service start lacks fd detachment — manifest evals wedge on Daytona (same class as downstream service-hook wedge)