Fix macOS handoff flake: tolerate EINVAL when arming SO_RCVTIMEO on a closed peer #2

Merged
jaredLunde merged 1 commit into main from fix/macos-einval-rcvtimeo on Jun 5, 2026

@jaredLunde
Contributor

The bug

On macOS, ~1–2 of 22 crash_matrix fault-injection scenarios failed per run, a different subset each time, all with the same supervisor-log signature:

handoff aborted: Some("ready read failed: io: Invalid argument (os error 22)")

This is EINVAL surfacing on the supervisor's timeout-bounded read of a control frame. Linux is unaffected.

Root cause (proven at the syscall level)

The supervisor arms a per-recv liveness timeout with SO_RCVTIMEO (UnixStream::set_read_timeout) before each control-socket read. macOS/BSD sosetoptlock rejects any setsockopt with EINVAL once a socket is fully shut down (SS_CANTRCVMORE | SS_CANTSENDMORE) — exactly the state a peer that closed its end leaves behind. Linux has no such check.

So when the successor (or incumbent) closed its socket end just as the supervisor reached set_read_timeout, the call returned EINVAL instead of the read returning a clean EOF. The failure was flaky because it raced the peer's fork/exec/exit window, which is why a different subset failed each run.
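A minimal probe of the behavior described above, as a sketch only: `probe_closed_peer` is a hypothetical helper, not code from this PR. It closes one end of a socketpair, tries to arm `SO_RCVTIMEO` on the other, then reads. On Linux the arm is expected to succeed; on macOS/BSD it may return EINVAL (os error 22), depending on whether the kernel has already marked the socket fully shut down. Either way the read itself reports EOF.

```rust
use std::io::{self, Read};
use std::os::unix::net::UnixStream;
use std::time::Duration;

/// Hypothetical probe: returns the result of arming the timeout after the
/// peer has closed, plus the byte count of the following read (0 == EOF).
fn probe_closed_peer() -> io::Result<(io::Result<()>, usize)> {
    let (mut supervisor, peer) = UnixStream::pair()?;
    drop(peer); // the peer closes its end before the supervisor arms the timeout

    // Linux: Ok(()). macOS/BSD: may be Err with raw_os_error() == Some(22).
    let armed = supervisor.set_read_timeout(Some(Duration::from_secs(10)));

    // With the peer gone and nothing buffered, the read cannot block: it is EOF.
    let mut buf = [0u8; 1];
    let n = supervisor.read(&mut buf)?;
    Ok((armed, n))
}
```

The point of the probe is that the socket is perfectly readable in this state; only the `setsockopt` call is rejected.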

Instrumentation at the failing setsockopt confirmed: fd was a valid connected SOCK_STREAM, SO_ERROR=0, timeval {10,0} (valid), yet both set_read_timeout and a hand-rolled setsockopt(SO_RCVTIMEO) returned EINVAL. Two states observed:

| state | so_nread | peek | meaning |
|---|---|---|---|
| A | 0 | EOF (0) | peer closed, empty buffer → clean EOF |
| B | 28 | data (1) | peer closed with a complete Ready frame still buffered |

State B is why the task's "no blind retry / no swallow" constraint matters: treating EINVAL as failure/EOF would drop a buffered Ready and abort a handoff the successor actually completed.

Fix

arm_recv_timeout() arms the timeout and, on EINVAL, confirms via a non-blocking MSG_PEEK that the read cannot block (peer gone → buffered frame is delivered, otherwise EOF) before reading without the timeout. A genuine EINVAL on a still-open empty socket is surfaced rather than risking an unbounded blocking read. Linux path unchanged; atomic SOCK_CLOEXEC socketpair creation preserved. Applied to all three arming sites (read_until/Ready, seal-wait, Hello).
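The shape of that logic can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged implementation: the signature of `arm_recv_timeout`, the `Peek` enum, and `peek_nonblocking` are all hypothetical, and `recv` is declared by hand so the sketch needs no external crate. The flag values are the usual ones for Linux (`MSG_DONTWAIT = 0x40`) and macOS/BSD (`0x80`).

```rust
use std::ffi::c_void;
use std::io::{self, ErrorKind};
use std::os::unix::io::AsRawFd;
use std::os::unix::net::UnixStream;
use std::time::Duration;

// Hand-declared libc recv(2); `unsafe extern` needs Rust 1.82 or newer.
unsafe extern "C" {
    fn recv(fd: i32, buf: *mut c_void, len: usize, flags: i32) -> isize;
}

const EINVAL: i32 = 22;
const MSG_PEEK: i32 = 0x2;
#[cfg(any(target_os = "linux", target_os = "android"))]
const MSG_DONTWAIT: i32 = 0x40;
#[cfg(not(any(target_os = "linux", target_os = "android")))]
const MSG_DONTWAIT: i32 = 0x80;

/// What a non-blocking peek says about the next read (hypothetical type).
#[derive(Debug, PartialEq)]
enum Peek {
    Data,       // at least one byte is buffered
    Eof,        // peer closed and the buffer is empty
    WouldBlock, // peer still open, nothing buffered: a read could block
}

fn peek_nonblocking(sock: &UnixStream) -> io::Result<Peek> {
    let mut byte = 0u8;
    let ptr = &mut byte as *mut u8 as *mut c_void;
    // SAFETY: `ptr` points at one writable byte and the fd is owned by `sock`.
    let n = unsafe { recv(sock.as_raw_fd(), ptr, 1, MSG_PEEK | MSG_DONTWAIT) };
    match n {
        0 => Ok(Peek::Eof),
        n if n > 0 => Ok(Peek::Data),
        _ => {
            let err = io::Error::last_os_error();
            if err.kind() == ErrorKind::WouldBlock {
                Ok(Peek::WouldBlock)
            } else {
                Err(err)
            }
        }
    }
}

/// Arm the receive timeout; tolerate EINVAL only when the read cannot block.
fn arm_recv_timeout(sock: &UnixStream, timeout: Duration) -> io::Result<()> {
    match sock.set_read_timeout(Some(timeout)) {
        Ok(()) => Ok(()),
        Err(e) if e.raw_os_error() == Some(EINVAL) => match peek_nonblocking(sock)? {
            // Peer gone: the read returns the buffered frame or EOF at once.
            Peek::Data | Peek::Eof => Ok(()),
            // Still-open empty socket: surface the error, never read unbounded.
            Peek::WouldBlock => Err(e),
        },
        Err(e) => Err(e),
    }
}
```

A caller would invoke `arm_recv_timeout(&sock, Duration::from_secs(10))` immediately before each control-socket read and treat an `Err` as a failed handoff step, exactly as it would a failed `set_read_timeout`.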

Test fixes (pre-existing, macOS-timing-exposed)

  • Several crash scenarios asserted O's asynchronously-written resume-called (and N's startup successor-pid) marker with an instant marker_exists right after the crashed process exited — racing the survivor's recovery. Converted to the bounded wait_marker(…, 3s) already used elsewhere in the file. A real "never resumes" bug still fails after 3s, so this absorbs scheduling jitter without masking regressions.
  • count_open_fds read Linux-only /proc/self/fd; now uses /dev/fd on macOS/BSD so the FD-leak stress check runs (and passes) cross-platform — also confirming the fix leaks no fds.
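For readers unfamiliar with the bounded-wait pattern in the first bullet, here is a rough sketch. The real `wait_marker` in the test file is not shown in this PR, so the signature, the 10 ms poll interval, and the use of a filesystem path as the marker are all assumptions made for illustration.

```rust
use std::path::Path;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical bounded wait: poll for a marker file until it appears or the
/// deadline passes. Returns true if the marker was seen in time.
fn wait_marker(path: &Path, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if path.exists() {
            return true;
        }
        if Instant::now() >= deadline {
            return false; // a real "never resumes" bug ends up here
        }
        thread::sleep(Duration::from_millis(10)); // assumed poll interval
    }
}
```

Compared with an instant existence check, this returns as soon as the marker shows up, so a passing test pays only for the actual scheduling delay, while a missing marker still fails once the bound expires.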

Verification (macOS)

  • crash_matrix green over 50 consecutive runs; the three task-named EINVAL scenarios over 40 more.
  • Full cargo test --workspace --lib --tests green.
  • cargo clippy --workspace --all-targets and cargo fmt --check clean.
  • ARCHITECTURE.md updated (same commit) to document the macOS SO_RCVTIMEO-on-shutdown behavior.

🤖 Generated with Claude Code

… closed peer

The supervisor arms a per-recv liveness timeout with `SO_RCVTIMEO`
(`UnixStream::set_read_timeout`) before every control-socket read. On macOS
and the BSDs, `sosetoptlock` rejects *any* `setsockopt` with `EINVAL` once a
socket is fully shut down (`SS_CANTRCVMORE | SS_CANTSENDMORE`) — exactly the
state a peer that closed its end leaves behind. Linux has no such check:
`setsockopt` succeeds and the following read returns EOF.

So when a successor (or incumbent) closed its end of the control socket right
as the supervisor reached `set_read_timeout`, the call returned EINVAL. That
surfaced as `ready read failed: io: Invalid argument (os error 22)` and
aborted the handoff. It was flaky because it depended on the peer's close
propagating into both shutdown flags before the supervisor armed the timeout —
a race against the successor's fork/exec/exit window. Different scenarios lost
the race on different runs.

Proven at the syscall level by instrumenting the failing `setsockopt`:
`fd` was a valid connected `SOCK_STREAM`, `SO_ERROR=0`, the timeval was
`{10,0}` (unquestionably valid), yet both `set_read_timeout` and a hand-rolled
`setsockopt(SO_RCVTIMEO)` returned EINVAL. Two states observed: peer closed
with an empty buffer (clean EOF) and — critically — peer closed with a
complete 28-byte `Ready` frame still buffered. A blind "treat EINVAL as
failure" would have dropped that buffered `Ready` and aborted a handoff the
successor actually completed.
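The buffered-frame state can be illustrated with a small sketch. `drain_after_close` is a hypothetical helper and the frame bytes passed to it are placeholders; nothing here reflects the real `Ready` wire format. It shows that a frame written before the peer closes is still delivered in full, and only then does the reader see EOF.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::UnixStream;

/// Hypothetical demo: the peer writes one frame and closes; the reader still
/// receives the whole frame, followed by EOF (a read of 0 bytes).
fn drain_after_close(frame: &[u8]) -> io::Result<(Vec<u8>, usize)> {
    let (mut supervisor, mut peer) = UnixStream::pair()?;
    peer.write_all(frame)?;
    drop(peer); // peer exits with the frame still sitting in the buffer

    let mut got = vec![0u8; frame.len()];
    supervisor.read_exact(&mut got)?; // the buffered frame survives the close

    let mut tail = [0u8; 1];
    let n = supervisor.read(&mut tail)?; // nothing left: clean EOF
    Ok((got, n))
}
```

Mapping EINVAL straight to EOF or to a failure would skip the `read_exact` step and lose a frame the peer had already sent.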

`arm_recv_timeout()` arms the timeout, and on EINVAL confirms via a
non-blocking `MSG_PEEK` that the read cannot block (peer gone: buffered frame
is delivered, otherwise EOF) before proceeding to read without the timeout.
A genuine EINVAL on a still-open empty socket is surfaced rather than risking
an unbounded blocking read. The Linux path is unchanged (setsockopt never
fails there). Applied to all three arming sites (Ready/`read_until`,
seal-wait, Hello). The Linux build keeps the atomic `SOCK_CLOEXEC`
socketpair creation.

Also fixes two pre-existing, macOS-timing-exposed test issues surfaced by
looping the suite:

- Several crash scenarios asserted O's asynchronously-written `resume-called`
  (and N's startup `successor-pid`) marker with an instant `marker_exists`
  immediately after the crashed process exited, racing the survivor's
  recovery. Converted to the bounded `wait_marker(..., 3s)` already used
  elsewhere — a real "never resumes" bug still fails after 3s, so this absorbs
  scheduling jitter without masking regressions.
- `count_open_fds` in the stress test read Linux-only `/proc/self/fd`; now uses
  `/dev/fd` on macOS/BSD so the FD-leak check runs (and passes) cross-platform.
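A cross-platform descriptor count along those lines might look like the sketch below. It is an assumption about the approach, not the test's actual code: `fd_dir` is a hypothetical helper, and the count includes the descriptor that `read_dir` itself holds while iterating, so only differences between two calls are meaningful.

```rust
use std::fs;
use std::io;

/// Directory that lists this process's open descriptors (hypothetical helper).
fn fd_dir() -> &'static str {
    if cfg!(any(target_os = "linux", target_os = "android")) {
        "/proc/self/fd"
    } else {
        "/dev/fd" // macOS and the BSDs
    }
}

/// Count open descriptors by listing the per-process fd directory. The
/// directory handle used for the listing is counted too, consistently.
fn count_open_fds() -> io::Result<usize> {
    Ok(fs::read_dir(fd_dir())?.count())
}
```

A leak check would take one count before the stress loop and one after, then assert the two are equal.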

Verified: `crash_matrix` green over 50 consecutive runs and the named EINVAL
scenarios over 40 more; full `cargo test --workspace --lib --tests` green;
clippy and fmt clean. ARCHITECTURE.md updated to document the macOS
`SO_RCVTIMEO`-on-shutdown behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jaredLunde jaredLunde force-pushed the fix/macos-einval-rcvtimeo branch from 5e3e621 to aad5609 Compare June 5, 2026 00:49
@jaredLunde jaredLunde merged commit ee91e82 into main Jun 5, 2026
8 checks passed
@jaredLunde jaredLunde mentioned this pull request Jun 5, 2026
jaredLunde added a commit that referenced this pull request Jun 5, 2026
Patch release: macOS/BSD portability (#1) plus the macOS EINVAL flake fix
on the supervisor's SO_RCVTIMEO arm against a closed peer (#2).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>