Skip to content

refactor: Migrate remaining ExpectedMachine BeginTx sites to WithTx#557

Open
chet wants to merge 1 commit into
NVIDIA:mainfrom
chet:with-tx-expectedmachine
Open

refactor: Migrate remaining ExpectedMachine BeginTx sites to WithTx#557
chet wants to merge 1 commit into
NVIDIA:mainfrom
chet:with-tx-expectedmachine

Conversation

@chet
Copy link
Copy Markdown
Contributor

@chet chet commented May 21, 2026

Description

Picks up the remaining BeginTx sites left over in expectedmachine.go after the original migration (Create and Update flows).

In this PR:

  • Both call sites use WithTxResult naturally.
  • Moved Create validation reads out of the tx (for existingMachinesOnSite and existingSkus).
  • Moved Update validation reads out of the tx (for requestedExpectedMachine, expectedMachinesOnSite, and skus).
  • Errors switched to NewAPIError.

Signed-off-by: Chet Nichols III chetn@nvidia.com

Type of Change

  • Feature - New feature or functionality (feat:)
  • Fix - Bug fixes (fix:)
  • Chore - Modification or removal of existing functionality (chore:)
  • Refactor - Refactoring of existing functionality (refactor:)
  • Docs - Changes in documentation or OpenAPI schema (docs:)
  • CI - Changes in GitHub workflows. Requires additional scrutiny (ci:)
  • Version - Issuing a new release version (version:)

Services Affected

  • API - API models or endpoints updated
  • Workflow - Workflow service updated
  • DB - DB DAOs or migrations updated
  • Site Manager - Site Manager updated
  • Cert Manager - Cert Manager updated
  • Site Agent - Site Agent updated
  • Flow - Flow service updated
  • Powershelf Manager - Powershelf Manager updated
  • NVSwitch Manager - NVSwitch Manager updated

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@chet chet requested a review from a team as a code owner May 21, 2026 19:28
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 21, 2026

Review Change Stack

Summary by CodeRabbit

  • Refactor
    • Improved batch machine create/update flows to separate validation from write operations, use more robust workflow identifiers, and add stronger checks on created/updated items—resulting in more reliable batch processing, safer error detection, and reduced risk of inconsistent outcomes.

Walkthrough

The PR refactors batch create and batch update handlers for ExpectedMachine to decouple validation reads (site resolution, MAC/serial unicity, SKU existence) from transactional writes. Validation queries run outside transactions; DB writes and Temporal workflow scheduling occur inside cdb.WithTxResult callbacks. Workflow IDs now use UUID suffixes and correlation uses ExpectedMachineID.

Changes

ExpectedMachine Batch Transaction Refactor

Layer / File(s) Summary
Batch Create Refactor: Validation and Write Transaction Pattern
api/pkg/api/handler/expectedmachine.go
Import cleanup removes database/sql. MAC/serial unicity and SKU existence validation queries moved outside the write transaction. Write phase consolidated into cdb.WithTxResult callback, incorporating CreateMultiple, credential correlation by ExpectedMachineID, Temporal workflow scheduling with UUID workflow ID, and transaction finalization.
Batch Update Refactor: Validation and Write Transaction Pattern
api/pkg/api/handler/expectedmachine.go
Requested machines loaded to resolve SiteID; site-scoped unicity and SKU existence validation performed outside the write transaction. Update phase consolidated into cdb.WithTxResult callback, incorporating UpdateMultiple, defensive credential correlation by ExpectedMachineID, Temporal workflow scheduling with UUID workflow ID, and transaction finalization.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely summarizes the primary change: migrating remaining BeginTx call sites in the ExpectedMachine handler to the WithTx pattern.
Description check ✅ Passed The description clearly outlines the refactoring scope, specific changes made, and affected flows, providing sufficient context for understanding the changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🔗 View scan details

🕐 Last updated: 2026-05-21 19:29:34 UTC | Commit: c8defbc

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/pkg/api/handler/expectedmachine.go (1)

1031-1038: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't cap the SKU lookup to the request count unless the query is filtered by those SKU IDs.

SkuDAO.GetAll here is only scoped by SiteIDs, so Limit: len(requestedSkuIDs) can return an arbitrary subset of the site's SKUs. On sites with more SKUs than the batch references, valid skuID values can be rejected as missing.

Proposed fix
- existingSkus, _, err := skuDAO.GetAll(ctx, nil, cdbm.SkuFilterInput{
- 	SiteIDs: []uuid.UUID{siteID},
- }, paginator.PageInput{
- 	Limit: cdb.GetIntPtr(len(requestedSkuIDs)),
- })
+ existingSkus, _, err := skuDAO.GetAll(ctx, nil, cdbm.SkuFilterInput{
+ 	SiteIDs: []uuid.UUID{siteID},
+ }, paginator.PageInput{
+ 	// We are not filtering by requested SKU IDs here, so we need the full site set.
+ 	Limit: cdb.GetIntPtr(paginator.TotalLimit),
+ })

Apply the same change in both batch-create and batch-update paths. If cdbm.SkuFilterInput supports filtering by specific SKU IDs, that would be even better than loading the full site set.

Also applies to: 1436-1443

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/pkg/api/handler/expectedmachine.go` around lines 1031 - 1038, The SKU
lookup is incorrectly limited by Limit: cdb.GetIntPtr(len(requestedSkuIDs))
while SkuDAO.GetAll is only filtered by SiteIDs, which can return a subset and
cause missing validations; update the calls around SkuDAO.GetAll (used in the
batch-create and batch-update paths) to remove or increase the Limit so you
fetch all SKUs for the site, or—preferably—add filtering by the specific SKU IDs
via cdbm.SkuFilterInput (e.g., include SKUIDs: requestedSkuIDs) so SkuDAO.GetAll
returns precisely the referenced SKUs before comparing against requestedSkuIDs
(apply the same change at both occurrences).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/pkg/api/handler/expectedmachine.go`:
- Around line 1179-1182: The code currently only warns when
workflowResult.GetResults() length differs from len(createdMachines) and still
commits; change this to fail the transaction and return an error when
AcceptPartialResults is false so the operation is all-or-nothing: after
receiving workflowResult in the create path (where createdMachines is used)
check if len(workflowResult.GetResults()) != len(createdMachines) and if
AcceptPartialResults == false, abort the transaction and return an error
(instead of logger.Warn()), and make the identical change in the batch-update
branch (the other branch handling the update at the referenced lines) to ensure
both handlers enforce the same all-or-nothing behavior.

---

Outside diff comments:
In `@api/pkg/api/handler/expectedmachine.go`:
- Around line 1031-1038: The SKU lookup is incorrectly limited by Limit:
cdb.GetIntPtr(len(requestedSkuIDs)) while SkuDAO.GetAll is only filtered by
SiteIDs, which can return a subset and cause missing validations; update the
calls around SkuDAO.GetAll (used in the batch-create and batch-update paths) to
remove or increase the Limit so you fetch all SKUs for the site,
or—preferably—add filtering by the specific SKU IDs via cdbm.SkuFilterInput
(e.g., include SKUIDs: requestedSkuIDs) so SkuDAO.GetAll returns precisely the
referenced SKUs before comparing against requestedSkuIDs (apply the same change
at both occurrences).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 789ac409-e7e6-4306-9324-d3a899290ef3

📥 Commits

Reviewing files that changed from the base of the PR and between f813f8d and c8defbc.

📒 Files selected for processing (1)
  • api/pkg/api/handler/expectedmachine.go

Comment thread api/pkg/api/handler/expectedmachine.go
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 21, 2026

🔍 Container Scan Summary

Service Total Critical High Medium Low Other
nico-flow 66 4 34 18 2 8
nico-nsm 82 2 28 43 9 0
nico-psm 67 4 35 18 2 8
nico-rest-api 100 6 53 30 3 8
nico-rest-cert-manager 65 4 34 18 1 8
nico-rest-db 66 4 34 18 2 8
nico-rest-site-agent 65 4 34 18 1 8
nico-rest-site-manager 65 4 34 18 1 8
nico-rest-workflow 67 4 35 18 2 8
TOTAL 643 36 321 199 23 64

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

Picks up the remaining `BeginTx` sites left over in `expectedmachine.go` after the original migration (Create and Update flows).

In this PR:
- Both call sites use `WithTxResult` naturally.
- Moved Create validation reads out of the tx (for `existingMachinesOnSite` and `existingSkus`).
- Moved Update validation reads out of the tx (for `requestedExpectedMachine`, `expectedMachinesOnSite`, and `skus`).
- Errors switched to `NewAPIError`.

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
@chet chet force-pushed the with-tx-expectedmachine branch from c8defbc to 0048c34 Compare May 21, 2026 20:53
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/pkg/api/handler/expectedmachine.go (1)

1076-1076: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Error message incorrectly references "update" in the create handler.

This error message states "Failed to validate Expected Machine update data" but resides within CreateExpectedMachinesHandler. The message should reference "create" or "creation" for consistency and to avoid confusion during debugging.

Proposed fix
-		return cutil.NewAPIErrorResponse(c, http.StatusBadRequest, "Failed to validate Expected Machine update data", validationErrors)
+		return cutil.NewAPIErrorResponse(c, http.StatusBadRequest, "Failed to validate Expected Machine create data", validationErrors)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/pkg/api/handler/expectedmachine.go` at line 1076, The error message in
CreateExpectedMachinesHandler incorrectly says "Failed to validate Expected
Machine update data"; update the message passed to cutil.NewAPIErrorResponse in
CreateExpectedMachinesHandler to reference "create" or "creation" (e.g., "Failed
to validate Expected Machine creation data") so it accurately reflects the
handler's purpose while leaving the validationErrors payload unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@api/pkg/api/handler/expectedmachine.go`:
- Line 1076: The error message in CreateExpectedMachinesHandler incorrectly says
"Failed to validate Expected Machine update data"; update the message passed to
cutil.NewAPIErrorResponse in CreateExpectedMachinesHandler to reference "create"
or "creation" (e.g., "Failed to validate Expected Machine creation data") so it
accurately reflects the handler's purpose while leaving the validationErrors
payload unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5eb483ca-8236-430a-9dca-637423e9386f

📥 Commits

Reviewing files that changed from the base of the PR and between c8defbc and 0048c34.

📒 Files selected for processing (1)
  • api/pkg/api/handler/expectedmachine.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant