fleetnode: harden enrollment transport + stage install layout (stack 1/3)#322
Conversation
- Scheme-aware gateway HTTP client: TLS-validating http2.Transport for
https://, h2c for http://. NewGatewayClient and NewAuthenticatedGateway-
Client return error so the scheme check has a place to surface.
- Per-RPC handshake timeout (var so tests can shorten it). Register wrapped
so a blackholed server cannot hang fleetnode enroll while the state lock
is held.
- State lock: LOCK_EX | LOCK_NB plus owner-PID write under the lock so a
contending invocation gets an actionable error instead of blocking.
- Shared testutil.NewH2CServer / NewH2CClient helpers; fake_gateway_test
serves the agent client over h2c to match production behavior.
- just build-fleetnode recipe stages server/.fleetnode/{fleetnode, nmap,
plugins/}; .gitignore covers the artifact dir.
- README skeleton: install/enroll surface only; run + discovery sections
land in later stack PRs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ab1b22c09b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
🔐 Codex Security Review
Review SummaryOverall Risk: MEDIUM Findings[MEDIUM] Heartbeat RPCs Can Hang Indefinitely After Client Timeout Removal
NotesNo high-impact auth, SQLi, command injection, protobuf wire-format, cryptostealing/pool-hijack, or plugin trust-boundary issues were evident in the reviewed diff. Tests were not run; this was a diff-only review against Generated by Codex Security Review | |
There was a problem hiding this comment.
Pull request overview
This PR (stack 1/3) hardens the fleetnode enrollment/handshake transport and improves operational ergonomics around testing, state locking, and local build staging so the agent can be built + enrolled safely ahead of later “run/discovery” work.
Changes:
- Make gateway clients scheme-aware (TLS-validating HTTP/2 for
https://, h2c forhttp://) and plumb scheme/URL errors viaNewGatewayClient(...)(..., error)/NewAuthenticatedGatewayClient(...)(..., error). - Add per-RPC handshake timeouts and improve state-lock contention diagnostics (fast-fail + PID reporting).
- Add shared h2c test utilities; update tests and
just build-fleetnodeto stage artifacts intoserver/.fleetnode/(and ignore that directory).
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
server/internal/testutil/h2c.go |
Adds shared h2c server/client helpers for HTTP/2 cleartext testing. |
server/internal/fleetnodebootstrap/refresh.go |
Propagates NewGatewayClient errors and uses the new client factory shape. |
server/internal/fleetnodebootstrap/locking_unix.go |
Makes state lock non-blocking and records/reads holder PID for actionable contention errors. |
server/internal/fleetnodebootstrap/handshake.go |
Introduces per-step handshake timeouts and a test override seam. |
server/internal/fleetnodebootstrap/handshake_test.go |
Switches tests to h2c server helper; adapts to new NewGatewayClient signature. |
server/internal/fleetnodebootstrap/enrollment.go |
Adds per-call timeout for Register; adapts to new gateway client constructors. |
server/internal/fleetnodebootstrap/enrollment_test.go |
Adds a blackhole timeout regression test for Register. |
server/internal/fleetnodebootstrap/client.go |
Implements scheme-aware HTTP/2 transport selection and returns errors for invalid schemes; removes global client timeout. |
server/internal/fleetnodebootstrap/client_test.go |
Adds TLS validation regression coverage; updates redirect test to h2c and new constructor signature. |
server/internal/fleetnodebootstrap/authclient.go |
Returns (client, error) and uses the new scheme-aware HTTP client builder. |
server/internal/fleetnodebootstrap/authclient_test.go |
Updates to h2c server helper and new authenticated client signature. |
server/cmd/fleetnode/run.go |
Updates client factory signature to return errors and handles construction failures. |
server/cmd/fleetnode/run_test.go |
Updates test stubs for the new clientFactory signature. |
server/cmd/fleetnode/README.md |
Adds initial CLI documentation, state/lock semantics, build and security model notes. |
server/cmd/fleetnode/fake_gateway_test.go |
Switches fake gateway tests to serve over h2c to match production wire behavior. |
justfile |
Stages fleetnode + plugins + nmap symlink into server/.fleetnode/; makes asicrs build output directory configurable. |
.gitignore |
Ignores server/.fleetnode/ staging output. |
- Replace bare http2.Transport with net/http Transport on the https branch so ALPN negotiates H2 when available and falls back to HTTP/1.1. Packaged nginx terminates TLS and proxies upstream over HTTP/1.1, which the H2-only transport refused (Codex P1). - Stop calling t.Cleanup from the blackhole accept-loop goroutine; collect conns under a mutex and close them via a single Cleanup registered on the test goroutine. testing.T.Cleanup is not goroutine-safe. - Replace the "fleetnode run lands in the run-heartbeat PR" placeholder with a real Subcommands row: run is already registered in main.go. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 83ff5297dd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Codex Security Review responseHeartbeat RPCs can hang forever after removing client timeout (MEDIUM) — Per-RPC timeout for the daemon RPCs lands in PR #323: |
Brings in: - #325 RBAC swap: List/GetActive/Update curtailment RPCs now gated via RequirePermission; new entries in ProcedurePermissions. - #329 UpdateCurtailmentEvent client wiring + edit-modal UI. - #321 Curtailment start/restore action UI. - #322 / #323 fleetnode enrollment + heartbeat stack. Conflict resolution: server/internal/handlers/middleware/rpc_permissions.go. Main added CurtailmentService reads + Update + DeviceCollection + DeviceSet + ErrorQuery + FleetManagement entries; the branch added IngestCurtailmentSignal. Took the union — folded the Ingest entry into the CurtailmentService block alongside Update/reads/AdminTerminate. handler.go auto-merged cleanly; build + targeted tests pass.
Summary
PR 1 of 3 in a stack that splits up #256. Carries the install + enrollment surface so the agent can be built and registered before any daemon/discovery code lands.
http2.Transportforhttps://, h2c forhttp://. The previous sharedAllowHTTP + DialTLSContextshim silently downgradedhttps://requests to plaintext, defeatingValidateServerURL's non-loopback requirement. `NewGatewayClient` and `NewAuthenticatedGatewayClient` now return error so a bad scheme has somewhere to surface.just build-fleetnodestages `server/.fleetnode/{fleetnode, nmap, plugins/}`; `.gitignore` covers the artifact dir. Operators can build and enroll today; the run + discovery layers land in stack PRs 2 and 3.Stack
Test plan
🤖 Generated with Claude Code