feat(fleetnode): server-initiated discovery via ControlStream #235

Merged
ankitgoswami merged 1 commit into main from ankitg/discovery-pairing on Jun 2, 2026

Conversation

@ankitgoswami ankitgoswami commented May 14, 2026

Contributor

1. Overview

This PR adds the server-initiated discovery path, where
an operator in the web UI can kick off a scan on a chosen fleet node and watch
results stream in live.

The cast

| Actor | What it is | Where it runs |
| --- | --- | --- |
| Operator | A person managing the fleet | Web browser |
| ProtoFleet | The fleet-management web app | Browser |
| fleetd | The cloud backend ("the server") | Cloud |
| Fleet node agent | The on-prem program that scans the LAN | Inside the miner network |
| Miner plugins | Per-vendor drivers that know how to talk to a miner | Alongside the agent |
| Miners | The actual machines being discovered | On the LAN |

flowchart LR
    Operator((Operator)) -->|web UI| ProtoFleet
    ProtoFleet -->|RPC over internet| fleetd[fleetd cloud server]
    fleetd -. "cannot reach directly" .-> Miners
    fleetd <-->|persistent control stream| Agent[Fleet node agent]
    subgraph LAN["Private miner network"]
        Agent -->|miner plugins| Miners[(Miners)]
    end

The dashed line is the whole point: the cloud cannot dial the miners. The
agent can.


2. Fleet node lifecycle

Before a node can scan anything, it has to be enrolled and trusted. This is a
one-time setup per node.

stateDiagram-v2
    [*] --> PENDING: agent starts enrollment
    PENDING --> AWAITING_CONFIRMATION: identity handshake completes
    AWAITING_CONFIRMATION --> CONFIRMED: operator approves in the UI
    CONFIRMED --> REVOKED: operator revokes
    REVOKED --> [*]
  1. Enroll. The operator generates a short-lived enrollment code in the
    UI. They paste it into the agent, which performs an identity handshake with
    the server (proving it holds a private key it generated locally).
  2. Confirm. The new node shows up in the UI as awaiting confirmation. The
    operator approves it. The server mints an API key the agent stores
    locally. Only confirmed nodes can be asked to scan.
  3. Run. The agent keeps two things going continuously (see below):
    heartbeats and the control stream. It refreshes its session credentials
    automatically before they expire.
  4. Revoke. The operator can revoke a node at any time. The node can no
    longer authenticate, and its device pairings are released.

Everything in this guide assumes the node is CONFIRMED and running.


3. The control stream: the backbone

A running agent opens one long-lived, two-way connection to the server called
the control stream. Think of it as an always-open phone line:

  • The server speaks down it to push commands ("go scan these addresses").
  • The agent speaks up it to acknowledge commands and report progress.

Key properties:

  • One command at a time per node. If a second command arrives while one is
    running, the agent answers "busy" and the operator is told to retry shortly.
  • Newest connection wins. If an agent reconnects (e.g. after a network
    blip), the new connection replaces the old one cleanly.
  • Self-healing. If the connection drops, the agent reconnects with an
    increasing backoff delay so a flapping network does not hammer the server.
  • In-memory only. The server tracks active connections in memory. This is a
    single-server design; it is not yet built for a horizontally-scaled cloud.

Separately, the agent sends a periodic heartbeat so the server knows it is
alive even when no scan is running.
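
The self-healing reconnect behaviour can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: `baseDelay`, `maxDelay`, `nextBackoff`, and `reconnectLoop` are hypothetical names, and the doubling policy and the concrete delay values are assumptions (the PR only says the delay increases).

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical tuning values; the PR does not state the real ones.
const (
	baseDelay = 1 * time.Second
	maxDelay  = 30 * time.Second
)

// nextBackoff doubles the current delay and clamps it to maxDelay.
// A zero (or negative) current delay starts the sequence at baseDelay.
func nextBackoff(cur time.Duration) time.Duration {
	if cur <= 0 {
		return baseDelay
	}
	next := cur * 2
	if next > maxDelay {
		return maxDelay
	}
	return next
}

// reconnectLoop calls dial until it succeeds or maxAttempts is reached,
// waiting an increasing delay after every failed attempt. The sleep
// function is injected so the loop can be exercised without real waiting.
func reconnectLoop(dial func() error, sleep func(time.Duration), maxAttempts int) (int, error) {
	var delay time.Duration
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = dial(); err == nil {
			return attempt, nil
		}
		delay = nextBackoff(delay)
		sleep(delay)
	}
	return maxAttempts, err
}

func main() {
	calls := 0
	dial := func() error {
		calls++
		if calls < 4 {
			return fmt.Errorf("dial failed (attempt %d)", calls)
		}
		return nil
	}
	// Record the delays instead of sleeping so the example runs instantly.
	var delays []time.Duration
	record := func(d time.Duration) { delays = append(delays, d) }

	attempts, err := reconnectLoop(dial, record, 10)
	fmt.Println(attempts, err, delays)
}
```

A production loop would normally also add jitter and reset the delay after a connection has stayed healthy for a while; neither is described in this PR, so both are left out here.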


4. The discovery workflow

This is the main feature. An operator starts a scan and watches results arrive
in real time until the scan finishes.

sequenceDiagram
    participant Op as Operator (ProtoFleet)
    participant S as fleetd (cloud)
    participant A as Fleet node agent
    participant M as Miners (LAN)

    Op->>S: DiscoverOnFleetNode(node, request)
    Note over S: Validate request,<br/>compute allowed "scope",<br/>assign a command id
    S->>A: command (over control stream)
    alt agent already busy
        A-->>S: ack BUSY
        S-->>Op: "node busy, retry shortly"
    else accepted
        loop for each address:port
            A->>M: probe / nmap
        end
        loop results in batches
            A->>S: ReportDiscoveredDevices(batch)
            Note over S: Drop anything out of scope,<br/>not a private IP, or over quota
            S-->>Op: stream accepted devices
        end
        A-->>S: final ack (OK or PARTIAL)
        S-->>Op: stream completes
    end

Step by step:

  1. Operator starts it. From the UI, the operator picks a fleet node and a
    scan request, then calls DiscoverOnFleetNode. This requires the
    fleetnode:manage permission. It is a streaming call: the connection
    stays open and devices appear as they are found.
  2. Server validates and dispatches. Before touching the agent, the server
    checks the request is sane (see Guardrails), generates a
    unique command id, computes the scope (exactly which addresses and
    ports the agent is allowed to report for this command), and sends the command
    down the control stream.
  3. Agent scans. The agent probes each address/port, either by asking the
    miner plugins to connect and identify the device, or by running an nmap
    sweep. Probes run in parallel with a per-probe time limit so one slow address
    cannot stall the whole scan.
  4. Agent reports in batches. As devices are found, the agent uploads them
    via ReportDiscoveredDevices, tagged with the command id. It does not
    wait until the end; results stream up incrementally.
  5. Server filters and forwards. For every reported device the server
    enforces the guardrails, stores the ones it accepts, and streams those back
    to the operator's open connection.
  6. Scan ends. The agent sends a final acknowledgement: OK (clean
    finish) or PARTIAL (some results uploaded, but it ran out of time or hit
    a limit). The operator's stream then completes.
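
Steps 2 and 5 above (compute a scope, then filter every report against it) can be pictured with a small sketch. The types and names here (`Endpoint`, `Scope`, `NewScope`, `Filter`) are hypothetical and chosen for illustration; the only facts taken from this description are that the scope is the set of address/port pairs a command covers and that a command may report at most addresses x ports devices.

```go
package main

import "fmt"

// Endpoint is a hypothetical address:port pair reported by the agent.
type Endpoint struct {
	Addr string
	Port int
}

// Scope holds the endpoints one command is allowed to report,
// plus the number of reports it may still accept (the quota).
type Scope struct {
	allowed   map[Endpoint]bool
	remaining int
}

// NewScope builds the allowed set as the cross product of addresses
// and ports, and sets the quota to addresses x ports.
func NewScope(addrs []string, ports []int) *Scope {
	s := &Scope{
		allowed:   make(map[Endpoint]bool),
		remaining: len(addrs) * len(ports),
	}
	for _, a := range addrs {
		for _, p := range ports {
			s.allowed[Endpoint{Addr: a, Port: p}] = true
		}
	}
	return s
}

// Filter returns the endpoints from batch that are in scope,
// stopping acceptance once the per-command quota is used up.
func (s *Scope) Filter(batch []Endpoint) []Endpoint {
	var accepted []Endpoint
	for _, e := range batch {
		if !s.allowed[e] || s.remaining == 0 {
			continue // out of scope or over quota: drop silently
		}
		s.remaining--
		accepted = append(accepted, e)
	}
	return accepted
}

func main() {
	scope := NewScope([]string{"192.168.1.10", "192.168.1.11"}, []int{4028})
	batch := []Endpoint{
		{Addr: "192.168.1.10", Port: 4028}, // in scope
		{Addr: "192.168.1.99", Port: 4028}, // address never requested
		{Addr: "192.168.1.11", Port: 80},   // port never requested
	}
	fmt.Println(scope.Filter(batch))
}
```

Because the filter runs on the server, an agent that reports extra endpoints only has those rows dropped; the rest of the batch is still accepted.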

Scan modes

The operator chooses how to describe what to scan:

| Mode | Meaning | Notes |
| --- | --- | --- |
| IP list | An explicit list of addresses | Each entry must be a valid IP or hostname |
| IP range | A start/end address range | Expanded into a list before scanning; network/gateway addresses are skipped |
| nmap target | An nmap-style target (single IP, CIDR block, or A.B.C.D-N range) | CIDR breadth is capped; IPv6 CIDR and multi-octet ranges are rejected |
| mDNS | Multicast discovery | Not supported on fleet nodes; rejected up front |

Ports can be specified explicitly, or left empty to use each plugin's default
discovery ports.
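
To make the "IP range" mode concrete, here is a minimal sketch of expanding a start/end pair into a list. `expandRange` and `maxExpanded` are hypothetical names; the 4096 cap is the figure quoted in the commit message for expanded ranges. Restricting the sketch to IPv4 is a simplification, and the skipping of network/gateway addresses is deliberately not modelled because the exact rule is not spelled out here.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// maxExpanded mirrors the 4096-address cap mentioned in the commit message.
const maxExpanded = 4096

// expandRange returns every address from start to end inclusive.
// It rejects unparsable input, non-IPv4 addresses (a simplification of
// this sketch), reversed ranges, and ranges larger than maxExpanded.
func expandRange(start, end string) ([]string, error) {
	lo, err := netip.ParseAddr(start)
	if err != nil {
		return nil, fmt.Errorf("start address: %w", err)
	}
	hi, err := netip.ParseAddr(end)
	if err != nil {
		return nil, fmt.Errorf("end address: %w", err)
	}
	if !lo.Is4() || !hi.Is4() {
		return nil, errors.New("only IPv4 ranges are handled in this sketch")
	}
	if hi.Less(lo) {
		return nil, errors.New("end address is before start address")
	}
	var out []string
	for a := lo; ; a = a.Next() {
		if len(out) == maxExpanded {
			return nil, errors.New("range expands to more than 4096 addresses")
		}
		out = append(out, a.String())
		if a == hi {
			break
		}
	}
	return out, nil
}

func main() {
	addrs, err := expandRange("192.168.1.10", "192.168.1.12")
	fmt.Println(addrs, err)

	_, err = expandRange("10.0.0.0", "10.0.16.0")
	fmt.Println(err)
}
```

Checking the cap while expanding, rather than after, keeps memory bounded even when the operator submits a very wide range.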


5. How devices are identified

Every discovered device needs a stable identifier so that re-scanning the
same network updates the existing record instead of creating duplicates. The
agent picks one in this order:

  1. MAC address (mac:...) if the device reports one.
  2. Serial number (serial:...) if it reports one.
  3. A synthesized fingerprint (auto:...) otherwise: a hash of the fleet
    node's identity plus the address, port, and device type.

The fleet node's identity is mixed into that fingerprint on purpose. Two
different networks often reuse the same private address ranges (for example,
both using 192.168.1.x). Including the node identity stops a miner on one
network from being mistaken for a different miner at the same address on another
network.
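
The selection order can be sketched like this. The `Probe` type and `deviceIdentifier` function are hypothetical, and the hash (SHA-256 truncated to 16 bytes) and the `|`-separated input layout are assumptions; this description only says the fingerprint is a hash of the node identity plus address, port, and device type.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Probe is a hypothetical summary of what one probe learned about a device.
type Probe struct {
	MAC        string
	Serial     string
	Addr       string
	Port       int
	DeviceType string
}

// deviceIdentifier picks the most stable identifier available:
// MAC first, then serial, then a synthesized fingerprint that mixes in
// the fleet node's identity so identical private addresses on different
// networks do not collide.
func deviceIdentifier(nodeID string, p Probe) string {
	if p.MAC != "" {
		return "mac:" + p.MAC
	}
	if p.Serial != "" {
		return "serial:" + p.Serial
	}
	input := fmt.Sprintf("%s|%s|%d|%s", nodeID, p.Addr, p.Port, p.DeviceType)
	sum := sha256.Sum256([]byte(input))
	return "auto:" + hex.EncodeToString(sum[:16])
}

func main() {
	p := Probe{Addr: "192.168.1.20", Port: 4028, DeviceType: "example-miner"}
	// Same endpoint, two different fleet nodes: the identifiers differ.
	fmt.Println(deviceIdentifier("node-a", p))
	fmt.Println(deviceIdentifier("node-b", p))
}
```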


6. Discovered devices, attribution, and pairing

Attribution

Every discovered device record remembers which fleet node found it
("attribution"). This single fact drives an important safety rule.

The cloud-exclusion rule (important)

The cloud has its own, separate discovery path for miners it can reach
directly. To prevent the cloud from ever trying to dial a private LAN address it
cannot actually reach (and should not be probing), the rule is:

Devices found by a fleet node are excluded from the cloud's own
dial-the-miner discovery list.

In other words, once a device is attributed to a fleet node, only that node's
reports (and the pairing flow) touch it. The cloud leaves it alone.

Pairing

To actively manage a discovered device through a fleet node, the operator
pairs it (PairDeviceToFleetNode). Pairing rules:

  • The node must be confirmed.
  • The device must not already be cloud-paired. (If it is, the operator is
    told to unpair it from the cloud first. Allowing both would leave the node
    unable to refresh the device while the system reported it as fleet-node
    managed.)
  • Pairing transfers attribution to that node, so the node's future scans
    keep the record fresh.
flowchart TD
    Found[Device discovered by Node A] -->|attributed to A| Excluded[Hidden from cloud dial list]
    Found -->|operator pairs to A| Paired[Managed via Node A]
    Paired -->|operator unpairs| Found
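
The pairing rules above amount to two preconditions and one side effect. The sketch below uses hypothetical in-memory types (`FleetNode`, `Device`) and a hypothetical `pairDevice` function purely to show the order of checks. It ignores persistence and concurrency; the security review on this PR notes that the real check must be made atomic against concurrent cloud pairing at the database level.

```go
package main

import (
	"errors"
	"fmt"
)

// FleetNode and Device are hypothetical in-memory stand-ins for the
// server's records.
type FleetNode struct {
	ID        string
	Confirmed bool
}

type Device struct {
	CloudPaired  bool
	DiscoveredBy string // attribution: which fleet node found the device
	PairedNode   string // empty when not paired to any fleet node
}

var (
	errNotConfirmed = errors.New("fleet node is not confirmed")
	errCloudPaired  = errors.New("device is cloud-paired; unpair it from the cloud first")
)

// pairDevice applies the rules in order: the node must be confirmed,
// the device must not be cloud-paired, and a successful pairing moves
// attribution to the pairing node.
func pairDevice(n FleetNode, d *Device) error {
	if !n.Confirmed {
		return errNotConfirmed
	}
	if d.CloudPaired {
		return errCloudPaired
	}
	d.PairedNode = n.ID
	d.DiscoveredBy = n.ID
	return nil
}

func main() {
	node := FleetNode{ID: "node-a", Confirmed: true}
	dev := &Device{DiscoveredBy: "node-old"}
	fmt.Println(pairDevice(node, dev), dev.PairedNode, dev.DiscoveredBy)

	cloudDev := &Device{CloudPaired: true}
	fmt.Println(pairDevice(node, cloudDev))
}
```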

7. Guardrails

Because a fleet node runs on someone's network and reports back over the
internet, the server treats every report as untrusted input. These checks
all live on the server side, so a buggy or compromised agent cannot bypass them.

| Guardrail | What it does | Why it matters |
| --- | --- | --- |
| Permissions | DiscoverOnFleetNode, pairing, confirm, and revoke require fleetnode:manage; listing requires fleetnode:read | Only authorized operators can drive a node |
| Scope enforcement | The server records exactly which addresses/ports a command covers, then drops any reported device outside that set | A node cannot report devices it was never asked to scan |
| Endpoint stamping | The agent records the address/port it actually probed, ignoring whatever the plugin claims | A misbehaving plugin cannot smuggle in a spoofed address |
| Private-IP only | Reported addresses must be in private ranges (RFC1918 / RFC4193) | Keeps discovery confined to local networks |
| URL-scheme validation | A device's link scheme must match the standard URL grammar; the clickable web link is further limited to http/https | Stops injection (e.g. javascript:) reaching the operator's browser |
| Per-command quota | A command can report at most (addresses x ports) devices | Bounds a runaway or hostile agent |
| Scan limits | Caps on number of targets and ports; nmap CIDR breadth capped | A single command cannot launch an enormous scan |
| Timeouts | The agent has a scan budget; the server waits a bit longer, then gives up | A silent or stuck node cannot tie up resources forever |
| One command per node | A second concurrent command is refused as "busy" | Predictable behavior and resource use |
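
Two of these guardrails, private-IP only and URL-scheme validation, are easy to picture in code. The function names below are hypothetical. The sketch assumes "the standard URL grammar" means the RFC 3986 scheme production, and it takes the 32-character length limit from the commit message's note about `url_scheme`.

```go
package main

import (
	"fmt"
	"net/netip"
	"regexp"
	"strings"
)

// schemeRE follows the RFC 3986 scheme production:
// a letter followed by letters, digits, "+", "-", or ".".
var schemeRE = regexp.MustCompile(`^[A-Za-z][A-Za-z0-9+.-]*$`)

// isPrivateAddr reports whether s parses as an IP address in the
// RFC 1918 (IPv4) or RFC 4193 (IPv6) private ranges.
func isPrivateAddr(s string) bool {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return false
	}
	// Unmap so an IPv4-mapped IPv6 address is judged as IPv4.
	return addr.Unmap().IsPrivate()
}

// validScheme checks the scheme's syntax and a 32-character length limit.
func validScheme(scheme string) bool {
	return len(scheme) <= 32 && schemeRE.MatchString(scheme)
}

// webLinkAllowed limits clickable links to http and https.
func webLinkAllowed(scheme string) bool {
	s := strings.ToLower(scheme)
	return s == "http" || s == "https"
}

func main() {
	for _, a := range []string{"192.168.1.20", "fd00::1", "8.8.8.8", "169.254.169.254"} {
		fmt.Printf("%-16s private=%v\n", a, isPrivateAddr(a))
	}
	for _, s := range []string{"https", "stratum+tcp", "javascript"} {
		fmt.Printf("%-12s valid=%v clickable=%v\n", s, validScheme(s), webLinkAllowed(s))
	}
}
```

Note that `169.254.169.254` is link-local rather than private, so it fails this check; a syntactically valid scheme such as `javascript` passes the grammar check but is still refused as a clickable link.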

8. Outcomes and what the operator sees

Each scan ends with a coded acknowledgement. The server translates these into
clear outcomes for the operator:

| Agent result | Operator sees | Meaning / next step |
| --- | --- | --- |
| OK | Success | Scan completed, all results delivered |
| PARTIAL | Partial success | Some devices were delivered, but the scan hit its time or work budget before finishing |
| BUSY | "Node busy, retry shortly" | The node is already running a command; try again in a moment |
| BAD_REQUEST | Invalid request | The request was malformed; fix and resend |
| AGENT_INCAPABLE | "Try another node" | This node cannot perform this kind of scan; retrying it will not help |
| REPORT_FAILED | Failure | The scan ran, but results could not be uploaded |
| (silence past the timeout) | Timed out | The node never finished; the command is abandoned server-side |
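
The translation in this table is essentially a lookup. A minimal sketch follows, assuming the acknowledgement arrives as a string code plus a separate timed-out flag; `operatorOutcome` is a hypothetical name, and the fallback text for unrecognised codes is an assumption rather than anything stated in the PR.

```go
package main

import "fmt"

// operatorOutcome maps an agent acknowledgement code to the text shown
// to the operator. timedOut is true when the server gave up waiting
// before any final acknowledgement arrived.
func operatorOutcome(code string, timedOut bool) string {
	if timedOut {
		return "Timed out"
	}
	switch code {
	case "OK":
		return "Success"
	case "PARTIAL":
		return "Partial success"
	case "BUSY":
		return "Node busy, retry shortly"
	case "BAD_REQUEST":
		return "Invalid request"
	case "AGENT_INCAPABLE":
		return "Try another node"
	case "REPORT_FAILED":
		return "Failure"
	default:
		// Assumption: unknown codes are treated as a generic failure.
		return "Failure"
	}
}

func main() {
	for _, c := range []string{"OK", "PARTIAL", "BUSY", "AGENT_INCAPABLE"} {
		fmt.Printf("%-16s -> %s\n", c, operatorOutcome(c, false))
	}
	fmt.Printf("%-16s -> %s\n", "(no ack)", operatorOutcome("", true))
}
```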

9. Tricky lifecycle cases worth knowing

  • Replacing a node. If a node that discovered devices is revoked and a
    replacement node is enrolled, the replacement can reclaim those device
    records the first time it re-scans and finds the same machines. They do not
    get stuck pointing at the dead node. Attribution simply moves to the new node,
    so the cloud-exclusion rule still holds and the cloud never starts dialing
    those addresses.
  • Cloud-paired devices. A device already managed directly by the cloud
    cannot be paired to a fleet node until it is unpaired from the cloud first
    (see Pairing).
  • Overlapping address ranges. Two networks using the same private IPs stay
    distinct because the fleet node identity is folded into synthesized device
    fingerprints (see How devices are identified).

10. Orientation map

If you want to trace the feature in the codebase, the moving parts are:

| Piece | Roughly where it lives |
| --- | --- |
| Operator-facing RPCs (discover, pair, confirm, revoke) | server/internal/handlers/fleetnode/admin/ |
| Agent-facing RPCs (control stream, heartbeat, report) | server/internal/handlers/fleetnode/gateway/ |
| In-memory registry of active control streams | server/internal/domain/fleetnode/control/ |
| Pairing and discovered-device rules | server/internal/domain/fleetnode/pairing/ |
| The fleet node agent itself | server/cmd/fleetnode/ |
| Scan-request validation (limits, nmap grammar) | server/internal/domain/discoverylimits/, server/internal/domain/nmaptarget/ |
| Database changes | server/migrations/000069_*, server/migrations/000070_* |

Test plan

  • go build ./... and go vet ./...
  • golangci-lint clean on changed packages
  • DB_PASSWORD=fleet go test -race across the registry, discovery, pairing, and handler packages (registry race tests, command-id binding, quota, scope binding, attribution transfer, and the DB-backed admin/gateway suites)
  • CI green

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 14, 2026 18:02
@ankitgoswami ankitgoswami requested a review from a team as a code owner May 14, 2026 18:02
@github-actions github-actions Bot added the javascript, client, server, and shared labels May 14, 2026
@github-actions

github-actions Bot commented May 14, 2026


🔐 Codex Security Review

Note: This is an automated security-focused code review generated by Codex.
It should be used as a supplementary check alongside human review.
False positives are possible - use your judgment.

Scope summary

  • Reviewed pull request diff only (f57be3ca649978744796e767726f977a6882430f...492e2e6e419777a30240121e36cafb6ef8a444d4, exact PR three-dot diff)
  • Model: gpt-5.5


Review Summary

Overall Risk: HIGH

Findings

[HIGH] Hostname Discovery Bypasses Private-Address Enforcement

  • Category: Network Discovery
  • Location: server/internal/handlers/fleetnode/admin/handler.go:328
  • Description: normalizeDiscoverRequest rejects public IP literals but allows any syntactically valid hostname because it is resolved agent-side. The agent then resolves IP-list hostnames and probes the result, and nmap mode substitutes resolved hostname IPs before scanning. nmapTargetIsPrivate also returns true for hostnames, and hostname report scopes accept any reported IP on matching ports.
  • Impact: A hostname resolving to 169.254.169.254, public IPs, or other non-private infrastructure bypasses the intended private-only discovery policy. Even if later reports are rejected, the fleet node has already performed probes/scans. A compromised fleet node can also use hostname-scoped commands to report arbitrary private endpoints on allowed ports.
  • Recommendation: Resolve and validate hostnames before probing/scanning. Either reject unresolved hostnames, pass validated literal IPs to the agent, or enforce private/non-link-local checks agent-side immediately after resolution. Scope reports to the resolved address set rather than falling back to port-only matching.

[MEDIUM] Fleet-Node Reports Can Claim Unowned Discovery Rows By Identifier Alone

  • Category: Plugin
  • Location: server/sqlc/queries/fleetnodepairing.sql:35
  • Description: UpsertDiscoveredDeviceFromFleetNode allows a fleet-node report to update a conflicting discovered_device row whenever discovered_by_fleet_node_id IS NULL, then rewrites its endpoint and attribution. The conflicting key is device_identifier, which is untrusted plugin/fleet-node input.
  • Impact: A buggy or malicious fleet node/plugin can spoof an existing unpaired device identifier and take over that discovery row, changing its IP/port/metadata and removing it from cloud-local pairing flows. This can misroute operator pairing/control to the wrong miner.
  • Recommendation: Do not reattribute null-attributed rows based only on reported identifier. Require endpoint match, verified MAC/serial match, or explicit operator linking; otherwise reject the conflict or create a separate fleet-node-scoped discovery record.

[MEDIUM] Fleet-Node Pairing Can Race Cloud Pairing

  • Category: Concurrency
  • Location: server/internal/domain/fleetnode/pairing/service.go:71
  • Description: The new cloud-pairing exclusion checks DeviceHasActiveCloudPairing before inserting into fleet_node_device, but it does not lock the device or enforce mutual exclusion at the database level. A concurrent cloud pairing can still insert/update device_pairing after the check.
  • Impact: The same miner can end up both cloud-paired and fleet-node-paired, leaving conflicting ownership/control paths and causing future fleet-node refreshes to be rejected by the upsert guard.
  • Recommendation: Enforce exclusivity atomically: lock the device row during both pairing flows, use an INSERT ... SELECT ... WHERE NOT EXISTS under the same lock, or add a DB-level invariant/constraint plus reciprocal checks in cloud pairing.

Notes

No cryptostealing or pool-hijack behavior was evident in the reviewed diff. Generated protobuf/sqlc changes were considered only as they reflected source changes.


Generated by Codex Security Review |
Triggered by: @ankitgoswami |
Review workflow run

@ankitgoswami ankitgoswami marked this pull request as draft May 14, 2026 18:10
@ankitgoswami ankitgoswami force-pushed the ankitg/discovery-pairing branch from 9839915 to 0f7b702 Compare May 14, 2026 22:37
@ankitgoswami ankitgoswami changed the title feat(fleetnode): discovery + pairing E2E (server + agent) feat(fleetnode): discovery + pairing E2E (server) May 18, 2026
@ankitgoswami ankitgoswami force-pushed the ankitg/discovery-pairing branch from b4d4ad4 to be68841 Compare May 19, 2026 21:01
@github-actions github-actions Bot removed the javascript, client, and shared labels May 19, 2026
@ankitgoswami ankitgoswami force-pushed the ankitg/discovery-pairing branch from e5c338a to 717ac76 Compare May 27, 2026 17:34
@ankitgoswami ankitgoswami force-pushed the ankitg/discovery-pairing branch from 65c5fa2 to e48be40 Compare May 28, 2026 17:26
ankitgoswami added a commit that referenced this pull request May 28, 2026
Three findings from a fresh re-review of PR #235 after the server/agent split (the original Codex inline comments on this PR target a file that now lives on a different branch).

- ProcedurePermissions catalog drift: Pair/Unpair/ListFleetNodeDevices/DiscoverOnFleetNode were listed as "UNIMPLEMENTED STUB" in ProceduresPendingMigration even though their handlers are fully implemented and gated via RequirePermission. The contract test reads this map as the source of truth, so a regression that dropped the gate would have gone unnoticed. Moved all four entries into ProcedurePermissions with the right key (manage / read).
- Unbounded IPList / ports counts in DiscoverOnFleetNode: an operator with fleetnode:manage could submit 1M IPs * 10k ports. Added (buf.validate.field).repeated.max_items on the proto (4096 on ip_addresses, 256 on ports in all three modes) plus defense-in-depth maxIPListEntries / maxPortsPerMode checks in normalizeDiscoverRequest so the limits hold even if the validator interceptor is misconfigured.
- Silent event drop in fleetnodecontrol.Registry: the non-blocking publish to a 16-slot buffer dropped batches silently when the operator stream fell behind. Bumped the buffer to 64 and added an atomic dropped-event counter exposed via Registry.DroppedEvents so callers and tests have a signal that batches were lost.

Two items deliberately deferred (see PR description): RFC1918 / private-range gating on discovery targets, and surfacing the dropped-batch count to the operator UX.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the javascript, client, and shared labels May 28, 2026
@ankitgoswami ankitgoswami force-pushed the ankitg/discovery-pairing branch from 8f24a99 to 9254c2d Compare May 28, 2026 20:54
@ankitgoswami ankitgoswami force-pushed the ankitg/discovery-pairing branch 2 times, most recently from 27fec06 to 764d1a2 Compare June 1, 2026 21:35
@ankitgoswami ankitgoswami marked this pull request as draft June 1, 2026 21:41
@ankitgoswami ankitgoswami force-pushed the ankitg/discovery-pairing branch 8 times, most recently from 25676b7 to 2c33c35 Compare June 1, 2026 23:24
@ankitgoswami ankitgoswami changed the title feat(fleetnode): server-initiated discovery via ControlStream [PR 2/2] feat(fleetnode): server-initiated discovery via ControlStream Jun 1, 2026
@ankitgoswami ankitgoswami marked this pull request as ready for review June 1, 2026 23:39
@ankitgoswami ankitgoswami force-pushed the ankitg/discovery-pairing branch 3 times, most recently from 0e240cd to a6605bd Compare June 2, 2026 05:50
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fa3cf42025

PR 2 of a stack. Layers operator-initiated discovery on top of the
pairing + agent-reporting surface in PR 1 (#332). Builds on the existing
fleetnodepairing.UpsertDiscoveredDevices ingestion path; an in-memory
registry correlates server-issued ControlCommand requests with the
agent's eventual ReportDiscoveredDevices batches.

What's in this PR:

- fleetnodecontrol.Registry: single-instance in-memory map of
  fleet_node_id -> active ControlStream + per-command_id event channel
  (CommandEvent { Batch | Ack }). Newest-wins eviction signaled via a
  done channel (so outgoing channel is never closed under a publisher);
  Send selects on done to bail cleanly. Publishers hold the mutex
  through the bounded non-blocking send to avoid panicking on a
  closed channel when cleanup races. Dropped-event counter on a 64-slot
  buffer, exposed via DroppedEvents().

- FleetNodeGateway.ControlStream: bidi handler. Hello receive is wrapped
  in a 5s timeout (HelloTimeout var) so an authenticated-but-idle agent
  cannot hold a server goroutine + HTTP/2 stream indefinitely. After
  Hello, registers the stream and pumps outgoing ControlCommand
  requests + incoming ControlAck responses through a side goroutine
  (2-buffer to avoid linger on exit).

- ReportDiscoveredDevices: rejects reports without a command_id or
  whose command_id is not in flight for this fleet_node (binds to
  server-issued ControlCommand). UpsertDiscoveredDevices now returns
  acceptedIdx []int instead of an opaque count; only the rows the store
  actually accepted are forwarded to the operator's command stream so
  ownership-rejected rows can't leak.

- FleetNodeAdmin.DiscoverOnFleetNode: operator-facing streaming RPC.
  Validates target is CONFIRMED, normalizes IPRange to IPList (capped
  at 4096 expanded addresses), rejects MDNS, forwards IPList/Nmap.
  Wraps the operator ctx with DiscoverCommandTimeout (5m default, var
  for test override) so a buggy/silent agent cannot pin operator
  streams + registry entries forever. Returns CodeDeadlineExceeded on
  timeout. Uses id.GenerateID() for command_id and proto.Marshal for
  the payload.

- discovered_by_fleet_node_id is immutable origin tracking. Set on
  first agent report; never cleared by PairDevice / UnpairDevice /
  RevokeFleetNode. Cloud-side pairing.PairDevices refuses to dial any
  discovered_device with DiscoveredByFleetNodeID != nil so an
  agent-reported private IP cannot redirect cloud credentialing later.
  Migration 000064 adds the column + FK + partial index.

- UpsertDiscoveredDeviceFromFleetNode reconciles auto:* identifiers
  per (fleet_node, ip, port) endpoint so re-keyed scans collapse onto
  one row; mac:/serial: identifiers pass through unchanged.

- pairing.proto: buf.validate count caps on DiscoverRequest modes
  (4096 IPs, 256 ports per mode).

- middleware: DiscoverOnFleetNode gated on fleetnode:manage.

Review fixes folded in:

- Migration 000065 widens discovered_device.url_scheme from VARCHAR(10)
  to VARCHAR(32) to match the gateway proto's advertised max_len. Schemes
  of 11-32 chars (e.g. "stratum+tcp") passed validation but overflowed the
  column, failing the whole batch as an internal error.

- UpsertDiscoveredDevices tallies accepted/rejected into per-attempt
  locals reset on closure entry, so a RunInTx retry after a retryable
  Postgres/commit failure can no longer double-count a batch. Adds a unit
  test for the retry path and a DB-backed test for the 32-char scheme.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@mcharles-square mcharles-square left a comment

Collaborator

A synthesized fingerprint (auto:...) otherwise: a hash of the fleet
node's identity plus the address, port, and device type.

I wonder if this is too forgiving/brittle? Would it be better to bail if devices can't be identified by MAC address or serial? How is this handled by other MMSs?

@ankitgoswami

Contributor Author

A synthesized fingerprint (auto:...) otherwise: a hash of the fleet
node's identity plus the address, port, and device type.

I wonder if this is too forgiving/brittle? Would it be better to bail if devices can't be identified by MAC address or serial? How is this handled by other MMSs?

Antminers don't advertise a MAC or serial via an unauthenticated API. Foreman asks for credentials even before discovery, so that's a very different UX.
