Skip to content

Atomic reparent RPCs + naming normalization for fleet hierarchy#430

Open
flesher wants to merge 6 commits into
mainfrom
issue-420
Open

Atomic reparent RPCs + naming normalization for fleet hierarchy#430
flesher wants to merge 6 commits into
mainfrom
issue-420

Conversation

@flesher

@flesher flesher commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Summary

Closes the orphan window and rename race in the miner reparent flow (issue #420) and normalizes the parent-assignment surface across the fleet hierarchy.

Architectural picture. The fleet has three levels of parent-assignment (site, building, rack). Before this PR, each level had its own RPC verb, shape, and arity (some singular, one prefixed Reassign, one with an add+remove client-side orchestration). The matrix is now consistent: every parent-assignment is one Assign<Children>To<Parent> RPC, bulk-friendly (repeated child ids), and null-aware for unassignment (optional target_*_id unset → "Unassigned"). All five RPCs land their write inside a single transaction.

Orphan window + rename race. The miner→rack reparent flow previously orchestrated RemoveDevicesFromDeviceSet(source) followed by AddDevicesToDeviceSet(target) as two independent calls. A server error or network blip between them left miners removed from their source rack but never added to the target; concurrent rename of the source rack between the client's label→id resolution and the remove call caused the same partial-state outcome. The new AssignDevicesToRack RPC bundles both halves into one server-side transaction with rack identity resolved under lock, closing both holes; the client orchestration collapses from ~70 lines of remove-then-add looping to a single dispatch.

Partial-update rack→site reparent. SaveRack previously required a full round-trip of label / layout / members / slot assignments on every site change. The new AssignRacksToSite is a partial update that only touches site_id; building_id is auto-cleared on site transition (a building belongs to a single site) and the response reports cleared-building count so callers can prompt for re-assignment.

Non-goals / deferred

  • Miner→building reparent (AssignDevicesToBuilding + device.building_id column). Adding a direct FK for building requires five cascade invariants on existing RPCs (rack→building, rack→site, save rack, devices→rack, devices→site) to keep the new column in sync with the rack-derived view. The migration and cascade design exist on a working branch but are deferred to a follow-up PR to keep this one reviewable. Until then, miner→building goes through the existing rack-picker flow.
  • No production data migrations. This PR is server-code + proto only; no schema changes.
  • Naming normalization touches in-flight UI. PR feat(fleet): row-action ellipsis menus + ParentPickerModal re-parent #412's row-action modals had to be re-wired to the new RPC shapes; this PR carries those edits but the underlying UI design is unchanged.

🤖 Generated with Claude Code

flesher added 5 commits June 11, 2026 11:42
Standardizes the parent-assignment RPC verb across the fleet hierarchy:
all hierarchical assignment RPCs now use Assign* (no Reassign/Set/Move
variants). No semantic change — the existing optional target_site_id
already encodes both initial assignment and re-parenting via the same
null-equals-unassigned convention.

Sets up the naming convention used by the new RPCs landing in
subsequent commits (AssignDevicesToRack, AssignDevicesToBuilding,
AssignRacksToSite).

Refs #420.
Promotes the building→site assignment RPC to a bulk shape so operators
can move multiple buildings between sites in one atomic transaction.
A single-building caller passes a one-element `building_ids` array;
the response aggregates cascade counts across every building.

The store-level operation stays singular (one UPDATE per building);
the service layer sorts by id for deterministic lock ordering then
loops inside a single tx — if any building fails, the whole batch
rolls back.

Updates the in-flight FleetBuildingsPage row-action call site (#412)
to pass the new array shape.

Refs #420.
…ilding

Promotes the rack→building assignment RPC to a bulk shape so the
ManageBuildingModal save flow (which previously fired N concurrent
singular calls) can land an entire grid layout in two atomic batches
— vacate first, then place — instead of N independent transactions.
Single-rack callers (RacksPage row action) pass a one-element array.

Request shape: repeated RackPlacement { rack_id, optional
aisle_index, optional position_in_aisle } + optional
target_building_id. Each entry carries its own placement so a single
call can mix placed and cleared positions.

Service-layer behavior preserved per rack — building lock once,
canonical lock order (building → rack id ascending), per-rack zone
clear / cascade / position write. If any rack fails, the whole tx
rolls back.

Activity-log payload now carries rack_ids[] + aggregate cascade
counts; consumers reading the singular rack_id metadata key will need
to migrate (no existing consumers in tree).

Refs #420.
Closes the orphan window in the miner reparent flow. Previously the
client orchestrated removeDevicesFromDeviceSet(source) +
addDevicesToDeviceSet(target) as two independent RPCs; a server error
or network blip between them left miners removed from their source
rack but never added to the target. The cross-rack rename race
(source rack renamed mid-confirm) had the same failure shape via the
client-side label→id resolution.

The new RPC bundles both halves into one server-side transaction:
remove from any current rack (single SQL DELETE on
device_set_membership filtered by device_set_type='rack'), insert
into target rack, cascade device.site_id when the target rack's site
differs. target_rack_id unset clears rack membership without
re-assigning (site/building stay intact).

MinerReparentPicker's rack path collapses from ~50 lines of
remove-then-add orchestration + listRacks label resolution to a
single dispatch. Site reparent path is unchanged.

Server: new sqlc query RemoveDevicesFromAnyRack, new
CollectionStore.RemoveDevicesFromAnyRack, new
collection.Service.AssignDevicesToRack with 4 unit tests covering
happy path, unassign, target-must-be-rack, and empty-input
rejection.

Refs #420.
Adds the missing rack→site reparent RPC called out in #420. SaveRack
was a full-replace requiring the rack's label, layout, members, and
slot assignments to round-trip on every site change; the new RPC is a
partial update — only site_id changes.

Behavior:
  - target_site_id unset → racks move to "Unassigned"
  - building_id is auto-cleared on any site transition (a building
    belongs to one site, so the move invalidates building membership);
    the response carries the count of racks whose building was cleared
    so the UI can prompt the operator to pick a building in the new
    site
  - same transaction cascades device.site_id for every rack member
  - lock order: target site → each rack id ascending (matches
    AssignBuildingsToSite + AssignDevicesToSite)
  - bulk-friendly: pass repeated rack_ids; the batch rolls back on any
    per-rack failure

Plumbs collectionStore through sites.Service so the new method can
reuse LockRackPlacementForWrite + UpdateRackPlacement +
CascadeRackDeviceSites instead of duplicating the rack-placement
write path.

Server: 4 handler tests (cascade + clear building, target-unset
unassign, same-site no-op, bulk aggregation with sorted lock order),
2 service tests (empty rejection, nil-store guard). Client:
useSites().assignRacksToSite wrapper.

Refs #420.
Copilot AI review requested due to automatic review settings June 11, 2026 16:52
@flesher flesher requested a review from a team as a code owner June 11, 2026 16:52
@github-actions github-actions Bot added javascript Pull requests that update javascript code client server shared labels Jun 11, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4f38d54b85

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

// that previously lived in the client-side
// RemoveDevicesFromDeviceSet call; bundling it into the tx is
// what closes the orphan window described in #420.
removed, err := s.collectionStore.RemoveDevicesFromAnyRack(ctx, params.OrgID, params.DeviceIdentifiers)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid deleting target-rack memberships during reassignment

When target_rack_id is set, this removes the selected devices from every rack before re-adding them to the target. If the selection includes miners already in the target rack (for example selecting the same rack, or a bulk selection that includes existing target members), their existing membership is deleted and the rack_slot rows cascade away, then AddDevicesToCollection reinserts membership without restoring slot positions. The client even discounts current target members in its capacity check, so this path is expected to occur; exclude the target rack from the removal or only remove devices whose current rack differs.

Useful? React with 👍 / 👎.

@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown

🔐 Codex Security Review

Note: This is an automated security-focused code review generated by Codex.
It should be used as a supplementary check alongside human review.
False positives are possible - use your judgment.

Scope summary

  • Reviewed pull request diff only (3421f32d5969829d5af0f0bd6cb7e07f59410ba1...5bd451c9da29151abae56cbf4e1dda143ea74b13, exact PR three-dot diff)
  • Model: gpt-5.5

💡 Click "edited" above to see previous reviews for this PR.


Review Summary

Overall Risk: HIGH

Findings

[HIGH] AssignDevicesToRack Clears Slot Assignments For Devices Already In The Target Rack

  • Category: Reliability
  • Location: server/internal/domain/collection/service.go:957
  • Description: The new atomic rack move deletes rack membership for every requested device before re-adding them to the target rack. If any requested device is already in the target rack, this delete removes its existing membership row. rack_slot rows are foreign-keyed to membership with ON DELETE CASCADE, so the device’s slot position is silently deleted even though the move is a no-op for that device. The frontend still sends all selected IDs to assignDevicesToRack, including current target members.
  • Impact: Bulk “move/add to rack” can permanently clear rack slot layout data for miners already in the selected target rack.
  • Recommendation: Preserve existing target-rack memberships. For example, make the delete skip device_set_id = target_rack_id, or pre-resolve current target members and only remove/reinsert devices whose current rack differs. Add a regression test covering an already-in-target miner with a rack_slot.

[MEDIUM] Bulk Assignment Counts Accumulate Across Transaction Retries

  • Category: Concurrency | Reliability
  • Location: server/internal/domain/sites/service.go:402
  • Description: The new bulk assignment paths accumulate response/audit counters in variables captured outside RunInTx. Since the SQL transactor retries retryable transaction failures, any retry after a partial in-closure accumulation can double-count rolled-back work. This pattern also appears in AssignRacksToSite and AssignRacksToBuilding.
  • Impact: Database state remains transactional, but API responses and activity logs can over-report reassigned racks/devices after deadlock or serialization retries.
  • Recommendation: Build per-attempt counters inside the transaction closure and assign them to outer variables only after the closure completes successfully, or return a result struct via RunInTxWithResult.

[MEDIUM] Existing RPC Methods Are Removed Instead Of Kept As Compatible Aliases

  • Category: Protobuf
  • Location: proto/sites/v1/sites.proto:38
  • Description: The PR renames/removes existing service methods (ReassignDevicesToSite, AssignBuildingToSite, AssignRackToBuilding) and replaces them with new method names. Existing generated clients will call the old Connect paths and receive Unimplemented.
  • Impact: Any deployed or third-party client generated from the previous proto will break on upgrade, even though the underlying operation still exists.
  • Recommendation: If client/server versions can be skewed, keep deprecated old RPCs as wrappers around the new bulk implementations, or introduce the bulk methods alongside the old methods for one release.

Notes

No changed hunks in scope touched pool URLs, wallet/payout addresses, shell execution, nmap/discovery, plugin execution, Docker/Nginx, localStorage/session token handling, or raw SQL string interpolation outside sqlc-generated queries.


Generated by Codex Security Review |
Triggered by: @flesher |
Review workflow run

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR standardizes fleet hierarchy “parent assignment” RPCs into consistent, bulk-friendly Assign<Children>To<Parent> shapes and closes the miner→rack orphan window by introducing an atomic server-side AssignDevicesToRack operation. It also adds a partial-update rack→site assignment flow and updates the ProtoFleet UI to use the new RPCs.

Changes:

  • Added atomic AssignDevicesToRack (server + client) to replace the prior client-side remove→add orchestration and eliminate partial-state orphaning.
  • Renamed/normalized site-level device + building assignment RPCs (AssignDevicesToSite, AssignBuildingsToSite) and added AssignRacksToSite.
  • Converted rack→building assignment to a bulk RPC (AssignRacksToBuilding) and rewired UI workflows to use the new request shapes.

Reviewed changes

Copilot reviewed 47 out of 59 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
server/sqlc/queries/site.sql Renames ReassignDevicesToSite query to AssignDevicesToSite and updates comments for renamed RPCs.
server/sqlc/queries/device_set.sql Adds RemoveDevicesFromAnyRack query to enable atomic rack reassignment.
server/internal/handlers/sites/translate.go Updates request→domain param translation for renamed/bulk site RPCs.
server/internal/handlers/sites/handler.go Renames device/site + building/site handlers and adds AssignRacksToSite handler.
server/internal/handlers/sites/handler_test.go Updates handler tests for renamed/bulk RPCs and new rack→site flow.
server/internal/handlers/sites/handler_stats_test.go Updates sites.NewService wiring after constructor signature change.
server/internal/handlers/middleware/rpc_permissions.go Updates procedure permission map for renamed/new RPC procedures.
server/internal/handlers/deviceset/handler.go Adds AssignDevicesToRack RPC handler.
server/internal/handlers/buildings/translate.go Updates translation for new bulk AssignRacksToBuilding request shape.
server/internal/handlers/buildings/handler.go Renames handler to AssignRacksToBuilding.
server/internal/handlers/buildings/handler_test.go Updates building handler tests for bulk assignment shape.
server/internal/domain/stores/sqlstores/site.go Renames store method to AssignDevicesToSite.
server/internal/domain/stores/sqlstores/site_invariant_integration_test.go Updates integration tests for renamed site assignment store call.
server/internal/domain/stores/sqlstores/collection.go Adds SQL-backed RemoveDevicesFromAnyRack store method.
server/internal/domain/stores/sqlstores/collection_site_placement_integration_test.go Updates integration tests for renamed site assignment store call(s).
server/internal/domain/stores/interfaces/site.go Renames interface method to AssignDevicesToSite and updates comments.
server/internal/domain/stores/interfaces/mocks/mock_site_store.go Regenerates mock to include AssignDevicesToSite and remove old method.
server/internal/domain/stores/interfaces/mocks/mock_collection_store.go Regenerates mock to include RemoveDevicesFromAnyRack.
server/internal/domain/stores/interfaces/collection.go Extends CollectionStore interface with RemoveDevicesFromAnyRack.
server/internal/domain/stores/interfaces/building.go Updates docs referencing renamed SiteService RPC.
server/internal/domain/sites/service.go Implements AssignBuildingsToSite (bulk) and adds AssignRacksToSite partial-update flow.
server/internal/domain/sites/service_test.go Updates unit tests for renamed/bulk site assignment flows + new rack→site behavior.
server/internal/domain/sites/service_stats_test.go Updates stats tests for sites.NewService signature change.
server/internal/domain/sites/models/models.go Renames/introduces models for bulk assignment flows + rack→site result/params.
server/internal/domain/fleetmanagement/collection_test.go Updates tests to use renamed AssignDevicesToSite store method.
server/internal/domain/collection/service.go Adds domain-level AssignDevicesToRack atomic reassignment behavior + activity logging.
server/internal/domain/collection/service_test.go Adds unit tests for atomic rack reassignment scenarios.
server/internal/domain/buildings/service.go Converts rack→building assignment to bulk AssignRacksToBuilding with validation + deterministic lock order.
server/internal/domain/buildings/service_test.go Updates tests for bulk rack→building assignment behavior.
server/internal/domain/buildings/service_stats_test.go Updates stats test comments referencing renamed SiteService RPC.
server/internal/domain/buildings/models/models.go Introduces bulk rack placement params and replaces singular assignment model.
server/generated/sqlc/site.sql.go Generated sqlc output for renamed AssignDevicesToSite query.
server/generated/sqlc/device_set.sql.go Generated sqlc output for RemoveDevicesFromAnyRack.
server/generated/sqlc/db.go Generated sqlc prepare/close wiring for new/renamed statements.
server/generated/grpc/sites/v1/sitesv1connect/sites.connect.go Generated Connect code for renamed/new site RPC procedures.
server/generated/grpc/device_set/v1/device_setv1connect/device_set.connect.go Generated Connect code for new AssignDevicesToRack RPC.
server/generated/grpc/buildings/v1/buildingsv1connect/buildings.connect.go Generated Connect code for bulk AssignRacksToBuilding RPC.
server/cmd/fleetd/main.go Updates production wiring for sites.NewService to pass collectionStore.
proto/sites/v1/sites.proto Renames ReassignDevicesToSiteAssignDevicesToSite, adds AssignBuildingsToSite + AssignRacksToSite.
proto/device_set/v1/device_set.proto Adds AssignDevicesToRack RPC and request/response messages.
proto/buildings/v1/buildings.proto Replaces singular rack→building RPC with bulk AssignRacksToBuilding + RackPlacement.
client/src/protoFleet/features/sites/hooks/useSiteModals.ts Updates inline TODO reference to renamed site assignment wrapper.
client/src/protoFleet/features/sites/hooks/useSiteModals.test.ts Updates mocks for renamed assignDevicesToSite API.
client/src/protoFleet/features/rackManagement/pages/RacksPage.tsx Rewires rack→building reparent UI to use bulk assignRacksToBuilding.
client/src/protoFleet/features/fleetManagement/pages/FleetBuildingsPage.tsx Rewires building→site reparent UI to use bulk assignBuildingsToSite.
client/src/protoFleet/features/fleetManagement/components/MinerActionsMenu/MinerReparentPicker.tsx Collapses miner→rack flow to single atomic assignDevicesToRack dispatch.
client/src/protoFleet/features/fleetManagement/components/FleetLayout/FleetLayout.test.tsx Updates sites API mock for renamed wrapper.
client/src/protoFleet/features/buildings/components/ManageBuildingModal/types.ts Updates comment to reference AssignRacksToBuilding.
client/src/protoFleet/features/buildings/components/ManageBuildingModal/ManageBuildingModal.tsx Batches rack placement/unplacement via bulk assignRacksToBuilding.
client/src/protoFleet/features/buildings/components/BuildingModals/BuildingModals.test.tsx Updates buildings API mock for renamed bulk wrapper.
client/src/protoFleet/api/useDeviceSets.ts Adds assignDevicesToRack wrapper for new atomic RPC.
client/src/protoFleet/api/sites.ts Renames site wrapper + adds bulk building/site + rack/site assignment wrappers.
client/src/protoFleet/api/generated/buildings/v1/buildings_pb.ts Generated TS protobuf types for bulk AssignRacksToBuilding.
client/src/protoFleet/api/buildings.ts Replaces singular rack→building wrapper with bulk assignRacksToBuilding + RackPlacementInput.

Comment on lines +1013 to +1018
assignedInt := int(out.assigned + out.removed)
event := activitymodels.Event{
Category: activitymodels.CategoryCollection,
Type: "assign_devices_to_rack",
Description: eventDescription,
ScopeCount: &assignedInt,
Comment on lines +473 to +480
// Lock target site first if assigning so a concurrent
// DeleteSite can't soft-delete it between the check and the
// cascade writes. target=nil/0 (Unassigned) needs no lock.
if params.TargetSiteID != nil && *params.TargetSiteID > 0 {
if err := s.store.LockSiteForWrite(txCtx, params.OrgID, *params.TargetSiteID); err != nil {
return err
}
}
CI's generated-code check failed because the local regen on the
naming-normalization commits used protoc-gen-es 2.11.0 (a sibling
worktree's cached node_modules) instead of the pinned 2.12.0. The
2.12.0 output annotates optional Timestamp fields as `Timestamp |
undefined` rather than `Timestamp?`. Also fixes a prettier ternary
formatting issue prettier --write applied to MinerReparentPicker.tsx
during CI's gen pass that didn't run locally.

No semantic change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

client javascript Pull requests that update javascript code server shared

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants