Merged
32 changes: 32 additions & 0 deletions .github/workflows/ci.yml
@@ -37,6 +37,38 @@ jobs:
       - name: Doc tests
         run: cargo test --workspace --doc

+  test-macos:
+    name: Test (macOS)
+    runs-on: macos-latest
+    needs: check
+    timeout-minutes: 15
+    steps:
+      - uses: actions/checkout@v4
+      - uses: dtolnay/rust-toolchain@stable
+      - uses: Swatinem/rust-cache@v2
+      - name: Unit + integration tests
+        # Exercises the macOS-specific paths that cross-compilation can't:
+        # the `fcntl(FD_CLOEXEC)` fallback for socketpair (no SOCK_CLOEXEC
+        # on macOS) and the `/dev/fd` FD-leak enumeration.
+        run: cargo test --workspace --lib --tests
+      - name: Stress test (FD stability)
+        run: cargo test -p handoff-tests --test stress
+
+  cross-check:
+    name: Cross-compile (FreeBSD)
+    runs-on: ubuntu-latest
+    needs: check
+    steps:
+      - uses: actions/checkout@v4
+      - uses: dtolnay/rust-toolchain@stable
+        with:
+          targets: x86_64-unknown-freebsd
+      - uses: Swatinem/rust-cache@v2
+      - name: Type-check BSD target
+        # GitHub has no free FreeBSD runner; `cargo check` proves the BSD
+        # path stays buildable without one (type-check only, no linking).
+        run: cargo check --workspace --all-targets --target x86_64-unknown-freebsd
+
   crash-matrix:
     name: Crash matrix (fault injection)
     runs-on: ubuntu-latest
2 changes: 1 addition & 1 deletion ARCHITECTURE.md
@@ -264,7 +264,7 @@ If `ChildGuard` were still armed when the `Commit` write failed (e.g. O crashed

### Why the state journal uses rename, not O_DSYNC write

-`write tmp + rename` produces an atomic view: the on-disk file is always either the old complete state or the new complete state, never a partial write. `O_DSYNC` only ensures the write itself is durable — it doesn't prevent a torn record if the supervisor crashes mid-write. Rename on Linux ext4/XFS/btrfs is atomic with respect to crash consistency.
+`write tmp + rename` produces an atomic view: the on-disk file is always either the old complete state or the new complete state, never a partial write. `O_DSYNC` only ensures the write itself is durable — it doesn't prevent a torn record if the supervisor crashes mid-write. `rename(2)` is atomic with respect to crash consistency on every supported filesystem — Linux ext4/XFS/btrfs, macOS APFS, and BSD UFS/ZFS — so the guarantee is not Linux-specific.
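
A minimal sketch of the `write tmp + rename` pattern this paragraph describes, using only the Rust standard library. The helper name `write_state_atomically`, the `.tmp` suffix, and the parent-directory fsync are illustrative assumptions, not the crate's actual journal code.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: replace `path` with `contents` so that readers
/// observe either the old file or the new one, never a partial write.
fn write_state_atomically(path: &Path, contents: &[u8]) -> io::Result<()> {
    // Assumption: the temp file sits next to the target, so the rename
    // stays within one filesystem.
    let tmp = path.with_extension("tmp");
    {
        let mut f = File::create(&tmp)?;
        f.write_all(contents)?;
        // Flush the data to disk before the rename makes it visible.
        f.sync_all()?;
    }
    // Atomically swap the new file into place.
    fs::rename(&tmp, path)?;
    // Sync the parent directory so the new directory entry is durable too.
    if let Some(parent) = path.parent() {
        let dir = if parent.as_os_str().is_empty() {
            Path::new(".")
        } else {
            parent
        };
        File::open(dir)?.sync_all()?;
    }
    Ok(())
}
```

A crash before the rename leaves the old state file untouched (plus a stray temp file); a crash after it leaves the new state complete.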

### Liveness: heartbeats during drain/seal + two-tier supervisor timeout

4 changes: 4 additions & 0 deletions README.md
@@ -15,6 +15,10 @@ Three roles: a **supervisor** that holds listener FDs and drives the swap, an **

See [ARCHITECTURE.md](./ARCHITECTURE.md) for the wire protocol, state machine, and correctness invariants.

+## Platforms
+
+Linux, macOS, and the BSDs. The mechanism uses only calls available on every Unix-like target (`fork`/`exec` FD inheritance, `flock`, Unix-domain control sockets, and signals), with no Linux-only syscalls. Windows is unsupported: it has no `fork`/`exec` FD inheritance and no `flock`, so the handoff model doesn't map without a separate backend.

## Integrate your daemon

### 1. Implement `Drainable`
15 changes: 8 additions & 7 deletions crates/handoff-tests/tests/stress.rs
@@ -61,13 +61,14 @@ fn many_handoffs_no_resource_leaks() {
     );
 }

-/// Count of open file descriptors in the test process. Reads
-/// `/proc/self/fd` directly so it captures everything — not just FDs we
-/// know about. The directory entry for `/proc/self/fd` itself opens an
-/// FD during enumeration; we measure with the same method on both sides
-/// so the bias cancels.
+/// Count of open file descriptors in the test process. Reads `/dev/fd`
+/// directly so it captures everything, not just FDs we know about.
+/// `/dev/fd` is the portable spelling of the per-process FD directory: a
+/// symlink to `/proc/self/fd` on Linux, kernel-provided on macOS, and
+/// complete on FreeBSD only when fdescfs is mounted there. Enumeration itself
+/// opens an FD; we measure the same way on both sides so the bias cancels.
 fn count_open_fds() -> usize {
-    std::fs::read_dir("/proc/self/fd")
-        .expect("Linux: /proc/self/fd should exist")
+    std::fs::read_dir("/dev/fd")
+        .expect("/dev/fd should exist on any supported Unix")
         .count()
 }
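
One way such a counter can be used for leak detection is to compare counts around an operation. The sketch below is illustrative only: `fd_delta` is a hypothetical helper, not part of the test suite, and it assumes `/dev/fd` is listable on the host.

```rust
use std::fs;

/// Same approach as the test helper above: list `/dev/fd` and count entries.
fn count_open_fds() -> usize {
    fs::read_dir("/dev/fd")
        .expect("/dev/fd should exist on any supported Unix")
        .count()
}

/// Hypothetical helper: run `op` and return how many FDs the process gained
/// (positive) or lost (negative). The FD opened by the enumeration itself is
/// counted on both sides, so it cancels out of the difference.
fn fd_delta<F: FnOnce()>(op: F) -> isize {
    let before = count_open_fds() as isize;
    op();
    let after = count_open_fds() as isize;
    after - before
}
```

An operation that closes everything it opens should yield a delta of zero; any positive delta after many iterations points at a leak.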
6 changes: 5 additions & 1 deletion crates/handoff/src/fd.rs
@@ -150,11 +150,15 @@ mod tests {
         use std::os::fd::IntoRawFd;

         let mk = || {
+            // `SockFlag::empty()` (not SOCK_CLOEXEC): these are throwaway
+            // source FDs for the dup2-shuffle assertion, and the flag isn't
+            // defined on macOS — keeping it portable lets the test compile
+            // everywhere.
             let (a, b) = socketpair(
                 AddressFamily::Unix,
                 SockType::Stream,
                 None,
-                SockFlag::SOCK_CLOEXEC,
+                SockFlag::empty(),
             )
             .unwrap();
             (a.into_raw_fd(), b.into_raw_fd())
23 changes: 23 additions & 0 deletions crates/handoff/src/supervisor.rs
@@ -775,12 +775,35 @@ fn send_best_effort_abort(
 }

 fn make_socketpair() -> Result<(UnixStream, UnixStream)> {
+    // Linux and the BSDs create the pair close-on-exec atomically via the
+    // SOCK_CLOEXEC flag. macOS doesn't define that flag for socketpair (nix
+    // won't even compile the symbol there), so set FD_CLOEXEC with a
+    // follow-up fcntl on each end. The non-atomic window on macOS is
+    // theoretical: this runs on the rare swap path, not under a fork storm.
+    #[cfg(not(target_os = "macos"))]
     let (a, b) = socketpair(
         AddressFamily::Unix,
         SockType::Stream,
         None,
         SockFlag::SOCK_CLOEXEC,
     )?;
+    #[cfg(target_os = "macos")]
+    let (a, b) = {
+        use std::os::fd::AsFd;
+        let pair = socketpair(
+            AddressFamily::Unix,
+            SockType::Stream,
+            None,
+            SockFlag::empty(),
+        )?;
+        for fd in [pair.0.as_fd(), pair.1.as_fd()] {
+            nix::fcntl::fcntl(
+                fd,
+                nix::fcntl::FcntlArg::F_SETFD(nix::fcntl::FdFlag::FD_CLOEXEC),
+            )?;
+        }
+        pair
+    };
     // SAFETY: both ends are freshly owned by us, valid, non-blocking unset.
     let s_a = unsafe {
         use std::os::fd::FromRawFd;