feat(fleetnode): pair discovered miners (orchestration + operator RPCs)#396
Conversation
🔐 Codex Security Review
Review SummaryOverall Risk: NONE FindingsNo security, correctness, or reliability findings in the scoped PR diff. NotesReviewed I did not run the test suite because the environment is read-only; this review is based on static inspection of the supplied diff and surrounding code. Generated by Codex Security Review | |
924106b to
c4e448f
Compare
…ring guard) From the multi-agent review of #396: - Skip plugin default credentials that exceed the FleetNodePairResult caps before pairing, so an oversized default can't fail ReportPairedDevices validation for the whole batch (used_username/used_password were the gap left by the earlier identity-field clamp). - requestedPairResults consumes each target on first match, so a node replaying an in-target identifier (up to its quota) can't amplify into repeated heavy persists. - Guard the AUTHENTICATION_NEEDED downgrade with DeviceHasActiveCloudPairing so a node auth-needed report can't downgrade a device that became cloud-paired since target resolution; regression test added. - AckFailure maps ACK_CODE_REPORT_FAILED to a clearer 'some may have been applied, re-list' message instead of an opaque internal error. - Correct the ListFleetNodeDiscoveredDevices limit proto comment (0 = default 1024, not 'no limit') to match the handler clamp. - Add gateway ReportPairedDevices handler tests; add the missing AAA // Arrange. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fc56691 to
56cd24c
Compare
…ring guard) From the multi-agent review of #396: - Skip plugin default credentials that exceed the FleetNodePairResult caps before pairing, so an oversized default can't fail ReportPairedDevices validation for the whole batch (used_username/used_password were the gap left by the earlier identity-field clamp). - requestedPairResults consumes each target on first match, so a node replaying an in-target identifier (up to its quota) can't amplify into repeated heavy persists. - Guard the AUTHENTICATION_NEEDED downgrade with DeviceHasActiveCloudPairing so a node auth-needed report can't downgrade a device that became cloud-paired since target resolution; regression test added. - AckFailure maps ACK_CODE_REPORT_FAILED to a clearer 'some may have been applied, re-list' message instead of an opaque internal error. - Correct the ListFleetNodeDiscoveredDevices limit proto comment (0 = default 1024, not 'no limit') to match the handler clamp. - Add gateway ReportPairedDevices handler tests; add the missing AAA // Arrange. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
4b441a7 to
a8e23d9
Compare
56cd24c to
09b523d
Compare
…ring guard) From the multi-agent review of #396: - Skip plugin default credentials that exceed the FleetNodePairResult caps before pairing, so an oversized default can't fail ReportPairedDevices validation for the whole batch (used_username/used_password were the gap left by the earlier identity-field clamp). - requestedPairResults consumes each target on first match, so a node replaying an in-target identifier (up to its quota) can't amplify into repeated heavy persists. - Guard the AUTHENTICATION_NEEDED downgrade with DeviceHasActiveCloudPairing so a node auth-needed report can't downgrade a device that became cloud-paired since target resolution; regression test added. - AckFailure maps ACK_CODE_REPORT_FAILED to a clearer 'some may have been applied, re-list' message instead of an opaque internal error. - Correct the ListFleetNodeDiscoveredDevices limit proto comment (0 = default 1024, not 'no limit') to match the handler clamp. - Add gateway ReportPairedDevices handler tests; add the missing AAA // Arrange. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
a8e23d9 to
9632479
Compare
…ring guard) From the multi-agent review of #396: - Skip plugin default credentials that exceed the FleetNodePairResult caps before pairing, so an oversized default can't fail ReportPairedDevices validation for the whole batch (used_username/used_password were the gap left by the earlier identity-field clamp). - requestedPairResults consumes each target on first match, so a node replaying an in-target identifier (up to its quota) can't amplify into repeated heavy persists. - Guard the AUTHENTICATION_NEEDED downgrade with DeviceHasActiveCloudPairing so a node auth-needed report can't downgrade a device that became cloud-paired since target resolution; regression test added. - AckFailure maps ACK_CODE_REPORT_FAILED to a clearer 'some may have been applied, re-list' message instead of an opaque internal error. - Correct the ListFleetNodeDiscoveredDevices limit proto comment (0 = default 1024, not 'no limit') to match the handler clamp. - Add gateway ReportPairedDevices handler tests; add the missing AAA // Arrange. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
9632479 to
442d299
Compare
…ring guard) From the multi-agent review of #396: - Skip plugin default credentials that exceed the FleetNodePairResult caps before pairing, so an oversized default can't fail ReportPairedDevices validation for the whole batch (used_username/used_password were the gap left by the earlier identity-field clamp). - requestedPairResults consumes each target on first match, so a node replaying an in-target identifier (up to its quota) can't amplify into repeated heavy persists. - Guard the AUTHENTICATION_NEEDED downgrade with DeviceHasActiveCloudPairing so a node auth-needed report can't downgrade a device that became cloud-paired since target resolution; regression test added. - AckFailure maps ACK_CODE_REPORT_FAILED to a clearer 'some may have been applied, re-list' message instead of an opaque internal error. - Correct the ListFleetNodeDiscoveredDevices limit proto comment (0 = default 1024, not 'no limit') to match the handler clamp. - Add gateway ReportPairedDevices handler tests; add the missing AAA // Arrange. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
91f9f5b to
090667a
Compare
46f1bfd to
4f7f454
Compare
…ring guard) From the multi-agent review of #396: - Skip plugin default credentials that exceed the FleetNodePairResult caps before pairing, so an oversized default can't fail ReportPairedDevices validation for the whole batch (used_username/used_password were the gap left by the earlier identity-field clamp). - requestedPairResults consumes each target on first match, so a node replaying an in-target identifier (up to its quota) can't amplify into repeated heavy persists. - Guard the AUTHENTICATION_NEEDED downgrade with DeviceHasActiveCloudPairing so a node auth-needed report can't downgrade a device that became cloud-paired since target resolution; regression test added. - AckFailure maps ACK_CODE_REPORT_FAILED to a clearer 'some may have been applied, re-list' message instead of an opaque internal error. - Correct the ListFleetNodeDiscoveredDevices limit proto comment (0 = default 1024, not 'no limit') to match the handler clamp. - Add gateway ReportPairedDevices handler tests; add the missing AAA // Arrange. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
090667a to
f7caf78
Compare
4f7f454 to
cd54a3a
Compare
…ring guard) From the multi-agent review of #396: - Skip plugin default credentials that exceed the FleetNodePairResult caps before pairing, so an oversized default can't fail ReportPairedDevices validation for the whole batch (used_username/used_password were the gap left by the earlier identity-field clamp). - requestedPairResults consumes each target on first match, so a node replaying an in-target identifier (up to its quota) can't amplify into repeated heavy persists. - Guard the AUTHENTICATION_NEEDED downgrade with DeviceHasActiveCloudPairing so a node auth-needed report can't downgrade a device that became cloud-paired since target resolution; regression test added. - AckFailure maps ACK_CODE_REPORT_FAILED to a clearer 'some may have been applied, re-list' message instead of an opaque internal error. - Correct the ListFleetNodeDiscoveredDevices limit proto comment (0 = default 1024, not 'no limit') to match the handler clamp. - Add gateway ReportPairedDevices handler tests; add the missing AAA // Arrange. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
f7caf78 to
1731a5b
Compare
cd54a3a to
cfb3fb2
Compare
…ring guard) From the multi-agent review of #396: - Skip plugin default credentials that exceed the FleetNodePairResult caps before pairing, so an oversized default can't fail ReportPairedDevices validation for the whole batch (used_username/used_password were the gap left by the earlier identity-field clamp). - requestedPairResults consumes each target on first match, so a node replaying an in-target identifier (up to its quota) can't amplify into repeated heavy persists. - Guard the AUTHENTICATION_NEEDED downgrade with DeviceHasActiveCloudPairing so a node auth-needed report can't downgrade a device that became cloud-paired since target resolution; regression test added. - AckFailure maps ACK_CODE_REPORT_FAILED to a clearer 'some may have been applied, re-list' message instead of an opaque internal error. - Correct the ListFleetNodeDiscoveredDevices limit proto comment (0 = default 1024, not 'no limit') to match the handler clamp. - Add gateway ReportPairedDevices handler tests; add the missing AAA // Arrange. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1731a5b to
f59394f
Compare
…ring guard) From the multi-agent review of #396: - Skip plugin default credentials that exceed the FleetNodePairResult caps before pairing, so an oversized default can't fail ReportPairedDevices validation for the whole batch (used_username/used_password were the gap left by the earlier identity-field clamp). - requestedPairResults consumes each target on first match, so a node replaying an in-target identifier (up to its quota) can't amplify into repeated heavy persists. - Guard the AUTHENTICATION_NEEDED downgrade with DeviceHasActiveCloudPairing so a node auth-needed report can't downgrade a device that became cloud-paired since target resolution; regression test added. - AckFailure maps ACK_CODE_REPORT_FAILED to a clearer 'some may have been applied, re-list' message instead of an opaque internal error. - Correct the ListFleetNodeDiscoveredDevices limit proto comment (0 = default 1024, not 'no limit') to match the handler clamp. - Add gateway ReportPairedDevices handler tests; add the missing AAA // Arrange. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
f59394f to
3b8fa24
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 45d63e8c8d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
c2d3d6d to
c84a540
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c84a540026
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
57954a1 to
1c24fe5
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1c24fe5020
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
1c24fe5 to
9bb671e
Compare
There was a problem hiding this comment.
💡 Codex Review
proto-fleet/server/cmd/fleetnode/pair.go
Lines 258 to 261 in 9bb671e
When the gateway persists only part of this chunk (for example PersistFleetNodePairResult fails for one paired miner), ReportPairedDevices returns a successful RPC with RejectedCount > 0, but this caller discards the response and later sends an OK ack for the command. In that scenario the node considers the result uploaded even though the cloud did not store/bind the miner it just paired, so the failed result is not retried; treat any rejected pair result as a report failure before acknowledging the command. Fresh evidence in this revision is that the current node code still ignores the response while the gateway now reports persistence drops via RejectedCount.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
6d87253 to
688b5e0
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 688b5e05b3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
688b5e0 to
8327449
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8327449d14
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
8327449 to
230a870
Compare
230a870 to
21ce9a8
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 21ce9a8fe4
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
e98f25f to
6e99b8c
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6e99b8c8f7
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Cloud orchestration for pairing fleet-node-discovered miners over the ControlStream, built on the #388 foundation (AgentCommand pair arm, node pairer, SQL listing + cloud-dial guards). Server domain: - PersistFleetNodePairResult records each device's outcome in its own transaction: PAIRED creates the device, binds it to the node before marking device_pairing PAIRED (so the cloud-paired guard never sees a PAIRED-but-unbound row), seeds an ACTIVE status, and stores encrypted credentials for password-auth drivers only; AUTH_NEEDED/AUTH_FAILED persist AUTHENTICATION_NEEDED for retry; ERROR persists nothing. It never downgrades an already-PAIRED device (race with the cloud or another node) and preserves a learned serial when a report omits it. - ResolvePairTargets builds the batch from the node's not-yet-paired devices; pair-all without credentials excludes AUTHENTICATION_NEEDED rows so a capped batch can't starve never-attempted devices on re-issue for large fleets. - pairDeviceLocked is factored out of PairDevice so both share the binding logic. Registry + dispatch (authoritative, decoupled from the operator stream): - The gateway ReportPairedDevices handler is the source of truth: it admits + scopes results to the dispatched targets (consuming each to bar replay), rejects empty batches, enforces a per-command quota = target count, persists each result before acking, and forwards only successfully-persisted results (with the persisted status) for live display. - PublishPairResults / report-kind admission keep discovery and pairing reports from cross-admitting. - The operator command is dispatched on a disconnect-immune context so pairing runs to completion server-side and the gateway keeps persisting even if the operator/browser disconnects; the admin stream is a best-effort observer. Operator admin RPCs: - ListFleetNodeDiscoveredDevices (paged) and streaming PairDiscoveredDevicesOnFleetNode, gated by miner:pair + fleetnode:manage, session-only, with credential bodies redacted/suppressed in logs. - DevicePairingResult.pairing_status uses the shared fleetmanagement PairingStatus enum. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
6e99b8c to
e219e36
Compare
Summary
Completes pairing miners discovered via fleet nodes, stacked on the foundation in #388 . With this, an operator can take a fleet-node-discovered miner from a
discovered_devicedead-end through to a fully paired, node-owned device.What is in here
PublishPairResults).ResolvePairTargetsbuilds the batch from the node's not-yet-paired devices, andPersistFleetNodePairResultrecords each result in one transaction:PAIRED: create the device, bind it to the node (fleet_node_device) before markingdevice_pairingPAIRED so the cloud-paired guard never sees a PAIRED-but-unbound row, store encrypted credentials for basic-auth drivers, seed an ACTIVE status.AUTH_NEEDED/AUTH_FAILED: persist an AUTHENTICATION_NEEDED device for credential retry.ERROR: persist nothing.pairDeviceLockedis factored out ofPairDeviceso both share the binding logic inside a caller-owned transaction.ReportPairedDeviceshandler binds the node's results to the in-flight command and routes them up. The server-streamingPairDiscoveredDevicesOnFleetNodedispatches a pairAgentCommanddown the ControlStream and streams per-device outcomes as it persists them (mirroringDiscoverOnFleetNode);ListFleetNodeDiscoveredDevicesenumerates candidates.fleetnode:manage./Gated byfleetnode:read(listing) andfleetnode:manage(pairing);PairDiscoveredDevicesOnFleetNodeadditionally requiresminer:pair, consistent with every other miner-onboarding path (e.g. ForemanImport), since it authenticates and onboards miners./fleetnode:manage.Testing
go build ./...clean; clienttscclean.-race.PersistFleetNodePairResult(asymmetric PAIRED, basic-auth PAIRED with creds stored, AUTH_NEEDED then retry to PAIRED, foreign-node rejection, ERROR persists nothing) andResolvePairTargets.device_pairing = PAIRED+fleet_node_devicebinding, plus no-stream / no-targets / permission cases; gateway tests coverReportPairedDevicesrouting.The fake-rig fixtures need no change: the node calls the identical plugin
PairDevicecontract the cloud uses (only the SecretBundle source differs). A full Docker E2E run needs a livejust devstack with an enrolled, connected fleet node; the demo script is that manual-verification path.🤖 Generated with Claude Code