Skip to content

ci: REST API CI migration + per-workflow aggregators (#1756)#1800

Open
lachen-nv wants to merge 23 commits into
NVIDIA:mainfrom
lachen-nv:chore/rest-ci-test
Open

ci: REST API CI migration + per-workflow aggregators (#1756)#1800
lachen-nv wants to merge 23 commits into
NVIDIA:mainfrom
lachen-nv:chore/rest-ci-test

Conversation

@lachen-nv
Copy link
Copy Markdown
Contributor

@lachen-nv lachen-nv commented May 19, 2026

Description

Migrates the REST API GitHub Actions from the standalone nico-rest repo into infra-controller-core, and refactors the CI infrastructure so Core CI (Carbide CI) and REST CI (NICo REST CI) can coexist as two independent workflows on the same repo with stable required checks.

Issue: #1756.

What changed

REST CI workflows copied over (new files)

  • .github/workflows/rest-ci.yml — top-level REST CI orchestrator
  • .github/workflows/rest-prepare-build-info.yml — version + build metadata
  • .github/workflows/rest-lint-and-test.yml — Go lint + unit tests
  • .github/workflows/rest-build-binaries.yml — Go cross-compile (linux-amd64, linux-arm64, darwin-arm64)
  • .github/workflows/rest-build-push-docker.yml — 9 nico-* container images
  • .github/workflows/rest-build-push-service.yml — service image plumbing
  • .github/workflows/rest-helm-workflows.yml — Helm chart validate + push

Dual-pipeline gating

  • Added changes job to both ci.yaml and rest-ci.yml using dorny/paths-filter@v3 with:
    • base: main + fetch-depth: 0 + PR-refs-only if: (dorny only runs on pull-request/N mirror refs from copy-pr-bot; main/release/tag short-circuit to run_*_ci=true)
    • predicate-quantifier: every on the core side with negative filters to catch any non-rest path
  • Gate output gates prepare; downstream jobs skip via needs.prepare.result == 'success' cascade
  • Commit-message escape hatch ci-run-complete-pipeline forces both pipelines

Aggregator pattern (stable required check per workflow)

  • New carbide-ci-pass (in ci.yaml) — needs: [changes, prepare, build-release-container-x86_64, build-release-container-aarch64, security-secret-scan, lint-police]
  • New rest-ci-pass (in rest-ci.yml) — needs: [changes, prepare, lint-and-test, build-binaries, security-secret-scan, build-and-push, helm]
  • Both use if: always() + jq logic: skipped counts as pass, failure/cancelled fail the aggregator
  • Designed so the branch protection ruleset can require ONE context per workflow

REST helm aligned to core's pattern

  • Dropped detect-changes and validate-versions jobs (no more manual Chart.yaml bump check)
  • validate-charts now runs unconditionally on every PR
  • push-charts gate: !cancelled() && event != schedule && event != workflow_dispatch && validate.success && !pull-request
  • Chart version source: prepare.outputs.helm_version (git-describe-derived, SemVer-strict 1.6.0-3.gabc1234) — replaces reading version: from Chart.yaml
  • helm-package-push action SHA pinned to match core (7de61972...)
  • Side benefit: fixes pre-existing bug where helm_version only read nico-rest/Chart.yaml, ignoring nico-rest-site-agent

Misc

  • Renamed REST TruffleHog job to REST Secret Scan with TruffleHog to avoid collision with core's same-named job in the PR checks UI
  • Added name: Detect Carbide CI Gate to core changes job (UI alignment with REST's Detect REST CI Gate)
  • Reverted root VERSION file; both core and REST now use git describe
  • Switched REST test execution to make rest-api/test-<module> (Kyle's Makefile)

Why this design

GitHub Actions has no cross-workflow needs: and workflow-level path filters leave the other side's required checks stuck at "Expected — Waiting for status" when only one pipeline fires. We considered four options (analysis):

  1. Aggregator job per workflow + both requiredchosen: each workflow always wakes up; job-level gates handle "do nothing"; one stable required check name per workflow.
  2. Reusable + central router workflow — cleanest in theory but ci.yaml is 1467 lines and refactor cost is too high before Computex; also incompatible with copy-pr-bot's pull-request/N push-event model.
  3. GitHub merge queue — needs Enterprise + on: merge_group.
  4. External Status API aggregator — more moving parts than needed.

base: main on the dorny filter is critical for copy-pr-bot: each /ok to test force-pushes the PR head to pull-request/N; without an explicit base, dorny/paths-filter would diff against the previous force-push tip (i.e., only the delta between PR head versions), which would lose earlier changes after a fixup commit.

REST helm aligned to core because the original detect-changes + manual Chart.yaml bump pattern would block main/release/tag publishes whenever someone forgot to bump the chart version. Git-describe-derived version guarantees uniqueness per commit and unblocks ngc-duplicate: fail.

Test plan

Tested

Scenario Result
Mixed PR run (this PR's own pipeline) Both pipelines completed end-to-end; both aggregators green
Aggregator mechanism (if: always() + jq) skipped correctly counts as pass; verified on 4 separate runs
dorny/paths-filter with base: main on copy-pr-bot mirror Gate log shows correct file list and Filter X = true/false decision
REST TruffleHog rename Both Secret Scan with TruffleHog (core) and REST Secret Scan with TruffleHog (rest) appear separately in PR checks UI; no collision
Helm validate-charts unconditional Validates both nico-rest and nico-rest-site-agent charts
Helm push-charts skipped on PR Confirmed !contains(github.ref, 'pull-request/') evaluates false → SKIPPED
REST end-to-end on monorepo All 9 nico-* container builds, all binary cross-compile, all lint/test pass

Not yet tested

Scenario Reason Mitigation
True path-filter SKIP behavior on isolated PRs Throwaway test PRs (#1857, #1858, #1859) branched from this PR, so dorny base: main saw the full PR diff (113 files) and both gates evaluated true; no PR yet has isolated paths against main Will validate organically after merge with the next real REST-only / core-only PR; gate logic verified by code review + dry-run git diff against chore/rest-ci-test base showed correct file-list isolation
release/v0.X.0 push Not exercised due to publish side-effects Workflow trigger pattern matches existing release/v0.1.0release/v0.9.0 branches; gate logic forces run_*_ci=true for non-PR refs
Tag push Not exercised (would trigger real NGC publish) Same as above — gate forces both pipelines true; publish gating already validated on PR side as inverted condition

Open gaps (from #1756 spec, not blocking this PR)

  • 3 REST modules untested in CI: flow, powershelf-manager, nvswitch-manager have no test-X target in rest-api/Makefile; matrix has a TODO comment. Blocked on Makefile updates.

Next steps

required-checks ruleset

Update ruleset 10088763 (main branch) required_status_checks from:

  • build-release-container-x86_64 / build
  • build-release-container-aarch64 / build
  • Secret Scan with TruffleHog
  • lint-police

To:

  • carbide-ci-pass
  • rest-ci-pass

Versioning

  • Manually tag v1.6.0 post-merge so git describe produces clean output (avoids REST going backward from 1.x to 0.x).

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 19, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@lachen-nv
Copy link
Copy Markdown
Contributor Author

/ok to test c45dd87

@github-actions
Copy link
Copy Markdown

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🔗 View scan details

🕐 Last updated: 2026-05-19 08:22:01 UTC | Commit: c45dd87

@lachen-nv
Copy link
Copy Markdown
Contributor Author

/ok to test a56dff8

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 19, 2026

🔍 Container Scan Summary

Service Total Critical High Medium Low Other
nico-flow 66 4 34 18 2 8
nico-nsm 82 2 28 43 9 0
nico-psm 67 4 35 18 2 8
nico-rest-api 100 6 53 30 3 8
nico-rest-cert-manager 65 4 34 18 1 8
nico-rest-db 66 4 34 18 2 8
nico-rest-site-agent 65 4 34 18 1 8
nico-rest-site-manager 65 4 34 18 1 8
nico-rest-workflow 67 4 35 18 2 8
TOTAL 643 36 321 199 23 64

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

@lachen-nv
Copy link
Copy Markdown
Contributor Author

/ok to test c90fc67

@lachen-nv lachen-nv force-pushed the chore/rest-ci-test branch from 2a6e535 to f1eb47d Compare May 20, 2026 03:12
@lachen-nv
Copy link
Copy Markdown
Contributor Author

/ok to test f1eb47d

@lachen-nv
Copy link
Copy Markdown
Contributor Author

/ok to test ede0dcf

@lachen-nv
Copy link
Copy Markdown
Contributor Author

/ok to test 2c1e90b

@lachen-nv
Copy link
Copy Markdown
Contributor Author

/ok to test 590d19e

@lachen-nv
Copy link
Copy Markdown
Contributor Author

/ok to test 4fafb10

@lachen-nv
Copy link
Copy Markdown
Contributor Author

/ok to test 3768f2c

@lachen-nv lachen-nv changed the title Chore/rest ci test ci: REST API CI migration + per-workflow aggregators (#1756) May 21, 2026
@lachen-nv lachen-nv marked this pull request as ready for review May 21, 2026 12:27
@lachen-nv lachen-nv requested a review from a team as a code owner May 21, 2026 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant