fix(fleetnode): drop http.Client.Timeout on authenticated gateway client by ankitgoswami · Pull Request #234 · block/proto-fleet

ankitgoswami · 2026-05-14T15:15:32Z

Summary

The authenticated gateway client introduced in #227 reuses `newGatewayHTTPClient()` whose `http.Client.Timeout: 30s` covers the entire request lifecycle (including reading the response body). For long-lived streaming RPCs that's the streamed message channel itself — so any `ControlStream` / `UploadTelemetry` / `UploadEvents` open via this client would be forcibly torn down at the 30-second mark.

Latent today (`fleetnode run` only does the unary `UploadHeartbeat`, which finishes well under 30s), but the next PR that wires up any streaming RPC would have discovered it immediately. Fixing the foundation here keeps that PR focused on its actual feature.

Changes

New `newStreamingGatewayHTTPClient()` in `authclient.go`: no top-level `Timeout`; keeps the no-redirect `CheckRedirect`. Used by `NewAuthenticatedGatewayClient`.
Unchanged: `newGatewayHTTPClient()` (used by bootstrap's unary `Register` / `BeginAuthHandshake` / `CompleteAuthHandshake`). Those calls are bounded and the 30s circuit-breaker is correct for the interactive enroll/refresh flows.
Daemon: `sendHeartbeat` now wraps each call in `context.WithTimeout(heartbeatRequestTimeout)`. Preserves the "single hung heartbeat doesn't pin the daemon" guarantee that the client-level `Timeout` used to provide, but applies it only to unary calls; future streaming methods will pass the loop ctx without an artificial deadline.
Regression test pins that `newStreamingGatewayHTTPClient().Timeout == 0` and that `CheckRedirect` still refuses redirects.

Test plan

`PATH=./bin:$PATH go build ./server/...`
`PATH=./bin:$PATH go vet ./server/...`
`cd server && PATH=../bin:$PATH golangci-lint run -c .golangci.yaml ./...` — 0 issues
`DB_PASSWORD=fleet PATH=./bin:$PATH go test ./server/... -count=1` — 68 packages OK
CI runs the full suite

🤖 Generated with Claude Code

The authenticated gateway client (used by the daemon and by future streaming RPCs) was reusing newGatewayHTTPClient() with its 30s http.Client.Timeout. http.Client.Timeout covers the entire request lifecycle including reading the response body, so it would forcibly tear down every long-lived streaming RPC at the 30s mark — making the client unsuitable for the very methods it was named to support (ControlStream, UploadTelemetry, UploadEvents). Latent in PR #227 (the only RPC the daemon makes today is the unary UploadHeartbeat, which completes well under 30s), but the next PR that wires up any streaming RPC would discover this the hard way. Fixing the foundation now keeps the streaming PR focused on its actual feature. Changes: - New newStreamingGatewayHTTPClient() in authclient.go: no top-level Timeout, keeps the no-redirect CheckRedirect. Used by NewAuthenticatedGatewayClient. - newGatewayHTTPClient() (bootstrap unary path: Register, BeginAuth, CompleteAuth) is unchanged. Those calls are bounded and the 30s circuit-breaker is correct for the interactive enroll/refresh flows. - The daemon's sendHeartbeat now wraps each call in a per-call context.WithTimeout(heartbeatRequestTimeout). This preserves the "single hung heartbeat doesn't pin the daemon" guarantee that the client-level Timeout used to provide, but applies it only to unary calls; future streaming methods will pass the loop ctx without an artificial deadline. - Regression test asserts newStreamingGatewayHTTPClient().Timeout is zero and that CheckRedirect still refuses redirects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-14T15:16:46Z

🔐 Codex Security Review

Note: This is an automated security-focused code review generated by Codex.
It should be used as a supplementary check alongside human review.
False positives are possible - use your judgment.

Scope summary

Reviewed pull request diff only (dbb1a4f9534c8cb88a74d1a11db5908f2bca3dd1...b0d5e1a58c5e6cd979e0fdee631d67d781e4fe92, exact PR three-dot diff)

Model: gpt-5.5

💡 Click "edited" above to see previous reviews for this PR.

Review Summary

Overall Risk: NONE

Findings

No security, correctness, or reliability findings in the reviewed diff.

Notes

The PR scope is limited to replacing the authenticated fleet-node gateway client’s top-level http.Client.Timeout with a streaming-safe client, while adding a per-heartbeat context.WithTimeout around UploadHeartbeat. I verified the authenticated client is only used by the fleetnode daemon heartbeat path in this diff, and enrollment/refresh still use the existing bounded NewGatewayClient.

I did not run tests due to the read-only sandbox, but the added test coverage matches the intended behavior: no top-level timeout on the streaming-capable authenticated client, redirect rejection preserved, and heartbeat calls bounded by context.

_{Generated by Codex Security Review |

Triggered by: @ankitgoswami |

Review workflow run}

Copilot

Pull request overview

This PR fixes an important foundation issue for the fleetnode authenticated gateway client: it removes the http.Client.Timeout that would otherwise hard-cut long-lived streaming RPCs after 30 seconds, while preserving redirect hardening and maintaining unary-call safety via per-call context timeouts.

Changes:

Switch NewAuthenticatedGatewayClient to use a new HTTP client constructor intended for streaming (no top-level http.Client.Timeout, still rejects redirects).
Add a regression test to pin Timeout == 0 and redirect rejection behavior for the streaming client.
Add a per-heartbeat context.WithTimeout so a hung unary heartbeat can’t block the daemon loop indefinitely.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File	Description
`server/internal/fleetnodebootstrap/authclient.go`	Introduces `newStreamingGatewayHTTPClient()` and uses it for the authenticated gateway client to avoid breaking streaming RPCs.
`server/internal/fleetnodebootstrap/authclient_test.go`	Adds a regression test ensuring the streaming HTTP client has no top-level timeout and still rejects redirects.
`server/cmd/fleetnode/run.go`	Wraps each `UploadHeartbeat` call with a per-call context timeout to preserve “no hung heartbeat pins daemon” behavior.

ankitgoswami · 2026-05-14T17:40:34Z

Closing in favor of bundling the streaming-friendly HTTP client into the first PR that actually uses streaming. Commit b0d5e1a stays on the ankitg/fleetnode-streaming-timeout branch for cherry-pick when needed.

Adds the first remote-agent discovery flow under RFC 0001 phase 2: fleet_nodes scan their LAN, report findings, and an operator pairs devices to nodes through a new admin RPC. Bundled server + agent in one PR so the end-to-end smoke is the merge gate. Protocol - ReportDiscoveredDevices: unary, batched up to 1024 devices per request, proto-validated. - Three pairing admin RPCs: PairDeviceToFleetNode, UnpairDevice, ListFleetNodeDevices. Session-only auth. Server - Migration 000050 adds discovered_device.discovered_by_fleet_node_id and backfills from existing fleet_node_device pairings so already- paired legacy rows can't be retargeted by another node. - fleetnodepairing.Service wraps pair/unpair in a transactor; both fleet_node_device and discovered_device.discovered_by_fleet_node_id update atomically. PairDevice transfers attribution to the new owner; UnpairDevice clears it. - UpsertDiscoveredDeviceFromFleetNode's WHERE blocks claims from a different fleet node and from NULL-attributed rows already paired to a different node. 0 rows on conflict signals rejection without raising an error. - validateReport restricts ip_address to RFC1918/RFC4193 private ranges; rejects loopback, link-local, multicast, unspecified, and public addresses. URL scheme allowlist: http, https. - Gateway handler ReportDiscoveredDevices derives fleet_node_id + org_id from the auth Subject (never from the body); admin handler exposes the three pairing RPCs under SessionOnlyProcedures. Agent (server/cmd/fleetnode) - New --scan-cidr / --scan-port / --discovery-interval flags. If no --scan-cidr is supplied, discovery no-ops (heartbeat still runs). - Bounded-concurrency probes (32 in-flight, 3s per probe). CIDR expansion capped at 4096 hosts; IPv6 CIDRs rejected for v1. - Production SDK drivers (Antminer/ProtoRig/...) leave DeviceIdentifier empty; the agent synthesizes "mac:<addr>" or "serial:<sn>" so reports are stable across rescans without a server-side reconciliation pass. - Heartbeat and discovery run on separate goroutines. The discovery tick uses tickCtx for both probes and reports, hard-capping the tick at 60s. Heartbeat owns the session lifecycle; discovery delegates refresh to it. sync.Mutex guards SessionToken across the two loops; refreshAndSave does the handshake against a shallow copy so the network call doesn't block discovery token reads. Notes - discovered_device.ip_address is the LAN IP as seen by the reporting agent; the cloud never dials it. Per-fleet-node CIDR allowlist is a follow-up before any server-side code reads these IPs for outbound connections. - http.Client.Timeout fix from PR #234 deliberately not folded in; rolls in with the first streaming RPC (UploadTelemetry or ControlStream). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 14, 2026 15:15

ankitgoswami requested a review from a team as a code owner May 14, 2026 15:15

github-actions Bot assigned ankitgoswami May 14, 2026

github-actions Bot added the server label May 14, 2026

Copilot AI reviewed May 14, 2026

View reviewed changes

ankitgoswami closed this May 14, 2026

ankitgoswami deleted the ankitg/fleetnode-streaming-timeout branch May 14, 2026 17:40

ankitgoswami mentioned this pull request May 14, 2026

feat(fleetnode): server-initiated discovery via ControlStream #235

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(fleetnode): drop http.Client.Timeout on authenticated gateway client#234

fix(fleetnode): drop http.Client.Timeout on authenticated gateway client#234
ankitgoswami wants to merge 1 commit into
mainfrom
ankitg/fleetnode-streaming-timeout

ankitgoswami commented May 14, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

ankitgoswami commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ankitgoswami commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Test plan

Uh oh!

github-actions Bot commented May 14, 2026

🔐 Codex Security Review

Review Summary

Findings

Notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

ankitgoswami commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ankitgoswami commented May 14, 2026 •

edited

Loading