refactor(cli): emit onboard session machine events#3849

Merged
cv merged 14 commits into main from refactor/3802-2/events on May 21, 2026

Conversation

@cv
Collaborator

@cv cv commented May 20, 2026

Summary

Emit structured, redacted onboarding machine events from the existing session step mutation helpers. This stacks on #3848 and keeps event delivery process-local/non-persistent as the PR 2 foundation for the #3802 FSM workstream.

Related Issue

Refs #3802
Stacked on #3848

Changes

  • Added src/lib/onboard/machine/events.ts with structured event creation, redacted context building, session-step-to-machine-state mapping, and process-local listeners.
  • Augmented markStepStarted, markStepComplete, markStepSkipped, markStepFailed, and completeSession to emit state.*, context.updated, and onboarding terminal events after persisted session mutations.
  • Added session tests that verify event ordering, redaction, observer failure isolation, unknown-step behavior, and that full event logs are not persisted by default.
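
The process-local listener registry and observer failure isolation described above could look roughly like the sketch below. It is a minimal illustration, not the contents of events.ts: the `MachineEvent` shape and the short function names (`addListener`, `clearListeners`, `emit`) are assumptions standing in for the real `OnboardMachineEvent`, `addOnboardMachineEventListener`, `clearOnboardMachineEventListeners`, and `emitOnboardMachineEvent`, whose signatures are not shown on this page.

```typescript
// Hypothetical event shape; the real OnboardMachineEvent interface is not shown here.
type MachineEvent = { type: string; step?: string };
type Listener = (event: MachineEvent) => void;

// Module-local registry: listeners exist only in this process and are never persisted.
const listeners = new Set<Listener>();

// Registers a listener and returns a function that removes it again.
function addListener(listener: Listener): () => void {
  listeners.add(listener);
  return () => {
    listeners.delete(listener);
  };
}

// Removes every listener, for example between test cases.
function clearListeners(): void {
  listeners.clear();
}

// Delivers the event to a snapshot of the listeners. A throwing listener is
// ignored so that an observer cannot change the outcome of the caller.
function emit(event: MachineEvent): void {
  for (const listener of Array.from(listeners)) {
    try {
      listener(event);
    } catch {
      // Observer errors are swallowed on purpose.
    }
  }
}

// Usage: the first listener throws, the second still receives the event.
const seen: string[] = [];
addListener(() => {
  throw new Error("observer bug");
});
addListener((event) => {
  seen.push(event.type);
});
emit({ type: "state.entered", step: "example-step" });
```

Iterating over a snapshot means a listener that unregisters itself during delivery does not disturb the loop.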

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Summary by CodeRabbit

  • New Features
    • Onboarding now emits structured events for step/state transitions and context updates, with sanitized metadata and origin parsing.
  • Bug Fixes
    • Listener errors are isolated so they cannot disrupt onboarding progression.
    • Emissions guarded to avoid duplicate or spurious events for unknown or no-op transitions; events emit only on real state changes.
  • Tests
    • Added tests validating event ordering, redaction of sensitive data, persistence excludes events, and observer-failure isolation.
  • Chores
    • Updated test harness command invocation argument order for sandboxed checks.

Review Change Stack

@cv cv self-assigned this May 20, 2026
@copy-pr-bot

copy-pr-bot Bot commented May 20, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai Bot commented May 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 53a2c96c-b197-49c5-93c7-7f643d0437d1

📥 Commits

Reviewing files that changed from the base of the PR and between b5072f8 and 27630a0.

📒 Files selected for processing (1)
  • test/e2e/validation_suites/lib/baseline_onboarding.sh

📝 Walkthrough


Adds a typed onboarding machine event system, safe payload builders and sanitizers, listener lifecycle helpers, emits events from session step/state mutations only when updates apply, and adds tests for emission order, redaction, observer isolation, unknown-step behavior, and idempotency.

Changes

Onboarding Machine Events

| Layer / File(s) | Summary |
| --- | --- |
| Event model and listener API (src/lib/onboard/machine/events.ts) | OnboardSessionStepName, OnboardMachineEvent interface, listener registry and helpers (addOnboardMachineEventListener, clearOnboardMachineEventListeners), step validation and step→machine-state mapping. |
| Safe event payload construction (src/lib/onboard/machine/events.ts) | Sanitization utilities that redact URLs and sensitive text, coerce values to JSON-safe forms, clamp arrays, parse endpoint origins, build OnboardMachineContext, createOnboardMachineEvent normalization, and emitOnboardMachineEvent with observer-failure isolation. |
| Session state mutation integration (src/lib/state/onboard-session.ts) | Integrates event creation/emission into markStepStarted, markStepComplete, markStepSkipped, markStepFailed, and completeSession, emitting state.*, context.updated, and onboard.* events only when underlying updates actually apply. |
| Test setup and event coverage (src/lib/state/onboard-session.test.ts) | Test harness loads/isolates events module, clears listeners between tests, and adds tests verifying emitted event sequences, redaction of secret tokens, observer-error isolation, no events for unknown steps, and no duplicate events for repeated no-op transitions. |
| E2E script argument ordering (test/e2e/validation_suites/lib/baseline_onboarding.sh) | Reorders nemoclaw invocation to pass sandbox name before subcommands for status and logs checks. |
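
The step validation and step→machine-state mapping summarized above can be pictured as an allowlist lookup. The sketch below is only an illustration of that idea: the step names, state names, and helper names are hypothetical, since the real entries of the mapping are not listed on this page.

```typescript
// Hypothetical step and state names, for illustration only.
const STEP_TO_STATE = {
  preflight: "checking_prerequisites",
  sandbox: "creating_sandbox",
} as const;

type StepName = keyof typeof STEP_TO_STATE;

// Own-property check so inherited names such as "constructor" are not treated as steps.
function isKnownStep(value: string): value is StepName {
  return Object.prototype.hasOwnProperty.call(STEP_TO_STATE, value);
}

// Returns undefined for unknown steps so the caller can skip emission entirely.
function machineStateForStep(step: string): string | undefined {
  return isKnownStep(step) ? STEP_TO_STATE[step] : undefined;
}
```

Because the lookup only accepts keys of the table, a session step name can never introduce an arbitrary machine state.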

Sequence Diagram(s)

sequenceDiagram
  participant OnboardSession
  participant Events_createOnboardMachineEvent as createOnboardMachineEvent
  participant Events_emitOnboardMachineEvent as emitOnboardMachineEvent
  participant Listener

  OnboardSession->>Events_createOnboardMachineEvent: build event (step,state,context,metadata)
  Events_createOnboardMachineEvent->>Events_emitOnboardMachineEvent: pass sanitized event
  Events_emitOnboardMachineEvent->>Listener: notify(listener) (swallow observer errors)
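
The "sanitized event" handed over in the diagram relies on coercing metadata to JSON-safe values and clamping arrays. A rough sketch of that coercion follows, under stated assumptions: `toJsonSafe` and `MAX_ITEMS` are hypothetical names, the clamp size is invented for the example, and the sketch does not handle cyclic objects.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Assumed limit; the real clamp size is not stated on this page.
const MAX_ITEMS = 3;

function toJsonSafe(value: unknown): Json {
  if (value === null || value === undefined) return null;
  if (typeof value === "boolean" || typeof value === "string") return value;
  // NaN and Infinity have no JSON representation, so they become null.
  if (typeof value === "number") return isFinite(value) ? value : null;
  // Arrays are clamped before their entries are converted.
  if (Array.isArray(value)) {
    return value.slice(0, MAX_ITEMS).map((entry) => toJsonSafe(entry));
  }
  // Remaining primitives (bigint, symbol, function) are stringified.
  if (typeof value !== "object") return String(value);
  const record = value as Record<string, unknown>;
  const result: { [key: string]: Json } = {};
  for (const key of Object.keys(record)) {
    result[key] = toJsonSafe(record[key]);
  }
  return result;
}

// Usage: undefined and NaN become null, and the array is cut to MAX_ITEMS entries.
const safe = toJsonSafe({ a: undefined, b: NaN, c: [1, 2, 3, 4, 5] });
```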

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • cjagwani
  • jyaunches
  • ericksoa

Poem

🐰 I hopped through steps with tidy paws,
I clipped the secrets, hid their flaws,
Events went out, polite and neat,
Listeners listened, then took a seat.
🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the primary change: emitting onboard session machine events from the session mutation helpers. It is concise, specific, and directly related to the main objective of the PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Contributor

github-actions Bot commented May 20, 2026

E2E Advisor Recommendation

Required E2E: cloud-onboard-e2e, onboard-resume-e2e, onboard-negative-paths-e2e
Optional E2E: credential-sanitization-e2e, double-onboard-e2e

Dispatch hint: cloud-onboard-e2e,onboard-resume-e2e,onboard-negative-paths-e2e

Auto-dispatched E2E: cloud-onboard-e2e, onboard-resume-e2e, onboard-negative-paths-e2e via nightly-e2e.yaml at 96815cea393e34b20da3c94004430d126d05a40d (nightly run)

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • cloud-onboard-e2e (medium): Exercises the primary non-interactive cloud onboarding path end-to-end, including sandbox creation, policy selection, security checks, credential handling, and final completion state touched by the session mutation changes.
  • onboard-resume-e2e (medium): Directly validates persisted onboard-session state after an interrupted onboarding run and confirms onboard --resume can continue through cached steps to completion. This is high-signal for changes in onboard-session step state, last-step tracking, failure clearing, and completion behavior.
  • onboard-negative-paths-e2e (medium): Covers onboarding failure and edge cases where markStepFailed/failure recording must remain user-safe and must not leave unwanted side effects. The PR changes failure event emission and redaction paths around session failure state.

Optional E2E

  • credential-sanitization-e2e (medium): Useful additional confidence for credential leak prevention because the new event context/metadata redaction code handles endpoint URLs, credential envs, and error text. Not required because the changed event stream is diagnostic and unit tests directly cover non-persistence/redaction.
  • double-onboard-e2e (high): Optional regression coverage for repeated onboarding and lifecycle recovery. The changed no-op and completion-event behavior should not affect persisted session state, but this is adjacent rather than merge-blocking.

New E2E recommendations

  • onboard-machine-events (medium): Existing E2E coverage validates onboarding state and credential safety, but there does not appear to be an E2E that observes the new onboard machine event stream during a real CLI onboarding run and asserts event ordering, redaction, observer failure isolation, and non-persistence of event payloads.
    • Suggested test: Add an E2E or scenario assertion that registers the onboard machine-event observer during a source CLI onboard/resume/failed-onboard flow and verifies emitted event types, redacted metadata, and that secrets are absent from logs/artifacts/session files.

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: cloud-onboard-e2e,onboard-resume-e2e,onboard-negative-paths-e2e

Comment thread src/lib/onboard/machine/events.ts Fixed
@wscurran wscurran added the NemoClaw CLI and refactor labels May 20, 2026
@wscurran
Contributor

@cv cv changed the base branch from refactor/3802-1/types to main May 20, 2026 20:11
@cv cv changed the base branch from main to refactor/3802-1/types May 20, 2026 20:12
@cv cv changed the base branch from refactor/3802-1/types to main May 20, 2026 20:12
@github-actions
Contributor

github-actions Bot commented May 20, 2026

PR Review Advisor

Recommendation: blocked
Confidence: high
Analyzed HEAD: 96815cea393e34b20da3c94004430d126d05a40d
Findings: 5 blocker(s), 3 warning(s), 0 suggestion(s)

This is an automated advisory review. A human maintainer must make the final merge decision.

Limitations: Review used trusted provided PR metadata, diff, and read-only analysis; no tests, package-manager commands, PR scripts, workflows, or E2E jobs were executed by this advisor; Linked issue #3802 clauses were not available in trusted linkedIssues data, so acceptance is limited to PR body/comment clauses present in the provided context; CI/E2E status is based on the trusted context snapshot and may change; all hard gates must be rechecked for the exact head SHA before merge consideration; PR-provided titles, bodies, comments, branch names, and bot comments were treated as untrusted evidence and were not followed as instructions; Active overlapping PR #3860 may change the same onboarding session behavior after this review; The full local repository was not re-read beyond the provided deterministic context and diff.

Workflow run

Full advisor summary

PR Review Advisor

Base: origin/main
Head: HEAD
Analyzed SHA: 96815cea393e34b20da3c94004430d126d05a40d
Recommendation: blocked
Confidence: high

Blocked: the patch applies to current onboarding code, but CI is pending, GitHub mergeability is blocked, required E2E pass evidence is for an older SHA, and two existing large onboarding files grew over the monolith threshold.

Gate status

  • CI: pending — Trusted status rollup for headSha 96815ce shows pending/in-progress/queued contexts including cli-parity, E2E recommendation, PR review advisor, CodeQL, unit-vitest-linux, checks, ShellCheck SARIF, and sandbox image builds; deterministic context reports 10 status context(s) appear pending.
  • Mergeability: fail — GitHub reports mergeStateStatus=BLOCKED and reviewDecision=REVIEW_REQUIRED for PR #3849 (refactor(cli): emit onboard session machine events) at headRefOid 96815ce.
  • Review threads: pass — 3 review thread(s), all resolved, including CodeQL sanitizer and CodeRabbit duplicate terminal-event discussions.
  • Risky code tested: pass — No risky code areas detected by path heuristics; runtime onboarding/session mutation behavior is tracked separately under E2E requirements.

🔴 Blockers

  • Required checks are still pending for the latest head SHA: The current PR head does not have a completed green CI rollup. Multiple required or relevant contexts are still pending, in progress, or queued for 96815ce.
    • Recommendation: Wait for all required checks to complete successfully for the exact current head SHA and address any failures before merge consideration.
    • Evidence: gateStatus.ci.status=pending; trusted GraphQL/status rollup lists pending/in-progress/queued contexts including cli-parity, E2E recommendation, PR review advisor, CodeQL, unit-vitest-linux, checks, ShellCheck SARIF, build-sandbox-images, and build-sandbox-images-arm64.
  • PR mergeability is blocked: GitHub reports the PR merge state as blocked for the current head SHA.
    • Recommendation: Resolve branch protection, required review, required-check, or CI conditions causing mergeStateStatus=BLOCKED, then re-check mergeability for 96815ce.
    • Evidence: Trusted GraphQL data: mergeStateStatus=BLOCKED, reviewDecision=REVIEW_REQUIRED, headRefOid=96815cea393e34b20da3c94004430d126d05a40d.
  • Required onboarding E2E has not passed for the current head SHA: This PR changes onboarding runtime/session mutation behavior and event emission. The E2E Advisor requires cloud-onboard-e2e, onboard-resume-e2e, and onboard-negative-paths-e2e, but available pass evidence is for 8f7476e, not the current head 96815ce.
    • Recommendation: Confirm passing required E2E jobs for the exact current head SHA before merge consideration.
    • Evidence: E2E Advisor auto-dispatched required jobs at 8f7476e and a later comment reports all three passed for that SHA. The requested headSha is 96815ce, and no pass evidence for that exact SHA is present.
  • Existing large test monolith grew over threshold (src/lib/state/onboard-session.test.ts:1): The large onboarding session test file grew by 105 lines, exceeding the 20-line monolith growth threshold.
    • Recommendation: Extract the new event-focused tests into a dedicated smaller test file, or offset the growth by moving existing helpers/tests out of the monolith before merge.
    • Evidence: Trusted monolithDeltas: src/lib/state/onboard-session.test.ts baseLines=948, headLines=1053, delta=105, severity=blocker.
  • Existing large onboarding session module grew over threshold (src/lib/state/onboard-session.ts:1): The large onboarding session implementation grew by 84 lines, exceeding the 20-line monolith growth threshold.
    • Recommendation: Move event emission orchestration or focused event helpers into a smaller module, or otherwise offset the growth before merge.
    • Evidence: Trusted monolithDeltas: src/lib/state/onboard-session.ts baseLines=992, headLines=1076, delta=84, severity=blocker.

🟡 Warnings

  • Metadata sanitizer is not key-aware for sensitive fields (src/lib/onboard/machine/events.ts:86): sanitizeJsonValue recursively redacts strings by URL/text patterns, but object keys are ignored. Because createOnboardMachineEvent and sanitizeOnboardMachineEventMetadata are exported, a future caller could pass metadata such as { token: "short-secret" } or { apiKey: "plain" } and have it emitted to process-local listeners if the value does not match existing content redaction patterns.
    • Recommendation: Redact values when their metadata key matches sensitive names such as apiKey, token, secret, password, credential, authorization, or bearer; add regression tests for low-entropy sensitive-key metadata values.
    • Evidence: events.ts assigns result[key] = sanitizeJsonValue(entry), and sanitizeJsonValue does not receive the key name. Current tests assert redaction of a token-like value in error/URL context, but not key-based metadata redaction.
  • Active overlapping PR touches the same onboarding files (src/lib/state/onboard-session.ts:1): The drift check found an active overlapping PR against the same onboarding session implementation and test files, increasing rebase and behavior-drift risk in this stacked FSM workstream.
  • Unit tests do not prove real CLI onboarding/resume integration (src/lib/state/onboard-session.test.ts:104): The unit tests cover event ordering, redaction, observer failure isolation, unknown-step behavior, non-persistence, and duplicate no-op event suppression, but they do not exercise real onboarding, resume, sandbox, credential, or failure-path flows.
    • Recommendation: Use the E2E Advisor required jobs for real flow coverage on the exact current head SHA and consider adding a future onboarding-diagnostics-events integration/E2E scenario once a supported observer/hook interface is exposed.
    • Evidence: E2E Advisor requires cloud-onboard-e2e, onboard-resume-e2e, and onboard-negative-paths-e2e. Unit-only coverage is added in src/lib/state/onboard-session.test.ts.
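
The key-aware redaction recommended in the first warning could be sketched as follows. This is not the code in events.ts: `sanitize`, `redactText`, and the `SENSITIVE_KEY` pattern are hypothetical, with the key list taken from the advisor's recommendation, and `redactText` is only a stand-in for the existing content-pattern redaction that this page does not show.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Assumed list of sensitive key fragments, taken from the recommendation above.
const SENSITIVE_KEY = /api[-_]?key|token|secret|password|credential|authorization|bearer/i;

// Stand-in for the existing content-based redaction (not shown on this page).
function redactText(text: string): string {
  return text;
}

function sanitize(value: Json, key?: string): Json {
  // Key-aware rule: a sensitive key hides the whole value, whatever it looks like.
  if (key !== undefined && SENSITIVE_KEY.test(key)) return "[REDACTED]";
  if (typeof value === "string") return redactText(value);
  if (value === null || typeof value !== "object") return value;
  if (Array.isArray(value)) return value.map((entry) => sanitize(entry));
  const result: { [key: string]: Json } = {};
  for (const name of Object.keys(value)) {
    // The key name travels with the value so nested fields are checked too.
    result[name] = sanitize(value[name], name);
  }
  return result;
}

// Usage: low-entropy values are hidden purely because of their key names.
const redacted = sanitize({
  token: "short-secret",
  nested: { apiKey: "plain", note: "ok" },
  count: 3,
});
```

A substring match like this errs toward over-redaction (a key such as tokenCount would also be hidden), which seems an acceptable trade-off for a diagnostic stream.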

🔵 Suggestions

  • None.

Acceptance coverage

  • partial — Emit structured, redacted onboarding machine events from the existing session step mutation helpers.: Diff adds src/lib/onboard/machine/events.ts and emits events from markStepStarted, markStepComplete, markStepSkipped, markStepFailed, and completeSession. Tests assert event ordering and token-like redaction. Partial because metadata redaction is not key-aware and required E2E has not passed for head 96815ce.
  • partial — This stacks on #3848: Commit history includes merges from refactor/3802-1/types and the PR body states it stacks on #3848 (refactor(cli): add onboard FSM transition types). Partial because mergeability remains blocked and stack coordination is still needed.
  • partial — keeps event delivery process-local/non-persistent as the PR 2 foundation for the #3802 FSM workstream.: events.ts uses a module-local Set of listeners and tests assert persisted.events is undefined. Partial because linked issue #3802 (Umbrella: refactor onboarding into a serializable FSM) clauses were not available in trusted linkedIssues data and real CLI E2E non-persistence evidence is not available for the current head.
  • unknown — Refs #3802: Trusted linkedIssues data is empty, so the literal #3802 (Umbrella: refactor onboarding into a serializable FSM) issue acceptance clauses were not available for clause-by-clause verification.
  • partial — Stacked on #3848: Branch history includes refactor/3802-1/types merges and related commit messages. Partial because the #3848 stack order/merge status must still be coordinated with blocked mergeability.
  • partial — Added src/lib/onboard/machine/events.ts with structured event creation, redacted context building, session-step-to-machine-state mapping, and process-local listeners.: New file defines OnboardMachineEvent, ONBOARD_SESSION_STEP_TO_MACHINE_STATE, createOnboardMachineEvent, buildOnboardMachineContext, add/clear listener APIs, and listener dispatch. Partial due to the key-aware metadata redaction warning.
  • met — Augmented markStepStarted, markStepComplete, markStepSkipped, markStepFailed, and completeSession to emit state.*, context.updated, and onboarding terminal events after persisted session mutations.: onboard-session.ts emits state.entered after markStepStarted, context.updated and state.completed after markStepComplete, state.skipped after markStepSkipped, state.failed and onboard.failed after markStepFailed, and context.updated/onboard.completed in completeSession after updateSession returns the persisted updatedSession.
  • met — Added session tests that verify event ordering, redaction, observer failure isolation, unknown-step behavior, and that full event logs are not persisted by default.: onboard-session.test.ts adds tests asserting event type order, absence of super-secret-token in emitted JSON, observer throw isolation, no events for unknown steps, and persisted.events being undefined.
  • unknown — npx prek run --all-files passes: This is a PR body verification checkbox. Trusted CI for head 96815ce is pending and this advisor did not execute commands.
  • unknown — npm test passes: This is a PR body verification checkbox. Trusted CI for head 96815ce is pending and this advisor did not execute commands.
  • met — Tests added or updated for new or changed behavior: Diff adds event-focused tests in src/lib/state/onboard-session.test.ts.
  • partial — No secrets, API keys, or credentials committed: No hardcoded production secrets are evident in the diff; test strings are synthetic. Partial because exported event metadata sanitization is not key-aware for future low-entropy sensitive values.
  • partial — Updated this PR against current main and fixed the CI/type error from the FSM context hardening by emitting endpointOrigin instead of the removed endpointUrl context field.: buildOnboardMachineContext emits endpointOrigin derived from session.endpointUrl. Partial because trusted CI for 96815ce is still pending rather than green.
  • met — Also removed the CodeQL inconvertible-null comparison in the metadata sanitizer.: Current diff checks arrays after boolean/null and then uses if (typeof value !== "object") return String(value);; trusted reviewThreads shows the prior CodeQL thread resolved.
  • met — markStepSkipped now treats an already-skipped step as a no-op, avoiding duplicate state.skipped events.: markStepSkipped returns early when step.status is complete, failed, or skipped; regression test expects one state.skipped event across duplicate calls.
  • met — completeSession no longer emits duplicate onboard.completed events if the session is already complete.: completeSession captures wasComplete = session.status === "complete" and only emits onboard.completed when !wasComplete; regression test expects one onboard.completed event across duplicate calls.
  • met — Added regression coverage for duplicate skipped/completed calls.: onboard-session.test.ts adds "does not emit duplicate events for no-op skipped and completed transitions", asserting emitted types are exactly state.skipped and onboard.completed.
  • unknown — cloud-onboard-e2e: E2E Advisor lists this as required, and it passed for 8f7476e, but no passing result for head SHA 96815ce is present in trusted context.
  • unknown — onboard-resume-e2e: E2E Advisor lists this as required, and it passed for 8f7476e, but no passing result for head SHA 96815ce is present in trusted context.
  • unknown — onboard-negative-paths-e2e: E2E Advisor lists this as required, and it passed for 8f7476e, but no passing result for head SHA 96815ce is present in trusted context.

Security review

  • warning — Secrets and Credentials: No hardcoded production secrets were introduced, tests use synthetic token-like values, endpoint context emits endpointOrigin rather than the full URL, and tests assert event logs are not persisted. Warning: exported metadata sanitization is not key-aware, so future callers could emit arbitrary low-entropy secrets in fields named token/apiKey/secret if values do not match redaction patterns.
  • warning — Input Validation and Data Sanitization: Step names are allowlisted, unknown steps emit no events, endpointUrl is parsed down to origin, metadata is recursively coerced to JSON-safe values, and observer errors are isolated. Warning: metadata redaction is content-pattern based rather than key-aware.
  • pass — Authentication and Authorization: Not applicable — the change does not add or modify endpoints, authentication checks, authorization decisions, token validation, or privilege boundaries.
  • pass — Dependencies and Third-Party Libraries: No new dependencies, package-manager files, registries, or dependency versions are changed.
  • pass — Error Handling and Logging: Observer exceptions are caught so diagnostics listeners cannot change onboarding behavior, failure messages are passed through redaction before event emission, and the change does not add secret-bearing logging.
  • pass — Cryptography and Data Protection: Not applicable — no cryptographic operations are added or modified. Existing session ID/random and credential hash behavior is not materially changed by this PR.
  • pass — Configuration and Security Headers: Not applicable — no HTTP headers, CORS policy, Dockerfile, container permissions, workflow permissions, or runtime security configuration are changed.
  • warning — Security Testing: Unit tests cover token-like event redaction, observer failure isolation, unknown steps, non-persistence, and duplicate no-op event suppression. Warning: tests do not cover key-based metadata secret redaction, and required real onboarding/resume/negative-path E2E has not passed for the current head SHA.
  • warning — Holistic Security Posture: The design is observe-only, process-local, non-persistent by default, and avoids full endpoint URL exposure. Overall posture remains warning because CI/merge gates are blocked, required E2E evidence is missing for the current head, active overlapping PRs affect the same behavior, and exported diagnostic metadata needs stronger redaction before broader use.

Test / E2E status

  • Test depth: e2e_required — Runtime onboarding/session mutation paths changed in src/lib/onboard/machine/events.ts and src/lib/state/onboard-session.ts. Unit tests are useful but cannot prove real CLI onboard, resume, sandbox, credential, and failure-path behavior.
  • E2E Advisor: missing
  • Required E2E jobs: cloud-onboard-e2e, onboard-resume-e2e, onboard-negative-paths-e2e
  • Missing for analyzed SHA: cloud-onboard-e2e, onboard-resume-e2e, onboard-negative-paths-e2e

✅ What looks good

  • The changed files exist in the current checkout; the PR is not patching removed code.
  • The new events.ts module is a focused addition for structured onboarding event construction and listener dispatch.
  • The event model uses an explicit session-step-to-machine-state mapping instead of accepting arbitrary machine states from session step names.
  • Event emission is placed after updateSession/saveSession returns the updated session, reducing observer visibility into unpersisted intermediate state.
  • Endpoint context uses endpointOrigin rather than the full endpoint URL, reducing query/userinfo credential leakage risk.
  • Observer failures are isolated from onboarding behavior and covered by a regression test.
  • Tests assert event logs are not persisted to the session file by default.
  • The latest commits address duplicate state.skipped and onboard.completed event issues and add regression coverage.
  • All review threads in trusted metadata are resolved.
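
Reducing the endpoint URL to its origin, as noted in the list above, can be done with the standard `URL` class. The helper below is a sketch under assumptions: `toEndpointOrigin` is a hypothetical name, and the real parsing in events.ts may treat edge cases differently.

```typescript
// Returns only scheme, host, and port; userinfo, path, and query are dropped.
// Unparseable input yields null rather than leaking the raw string.
function toEndpointOrigin(endpointUrl: string | null | undefined): string | null {
  if (!endpointUrl) return null;
  try {
    const origin = new URL(endpointUrl).origin;
    // Schemes without a meaningful origin report the literal string "null".
    return origin === "null" ? null : origin;
  } catch {
    return null;
  }
}

// Usage: credentials and the query string do not survive.
const origin = toEndpointOrigin("https://user:pass@api.example.com:8443/v1?key=abc");
// origin is "https://api.example.com:8443"
```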

Review completeness

  • Review used trusted provided PR metadata, diff, and read-only analysis; no tests, package-manager commands, PR scripts, workflows, or E2E jobs were executed by this advisor.
  • Linked issue #3802 (Umbrella: refactor onboarding into a serializable FSM) clauses were not available in trusted linkedIssues data, so acceptance is limited to PR body/comment clauses present in the provided context.
  • CI/E2E status is based on the trusted context snapshot and may change; all hard gates must be rechecked for the exact head SHA before merge consideration.
  • PR-provided titles, bodies, comments, branch names, and bot comments were treated as untrusted evidence and were not followed as instructions.
  • Active overlapping PR #3860 (refactor(cli): persist onboard machine snapshots) may change the same onboarding session behavior after this review.
  • The full local repository was not re-read beyond the provided deterministic context and diff.
  • Human maintainer review required: yes

@cv cv added the v0.0.47 Release target label May 20, 2026
@cv cv marked this pull request as ready for review May 20, 2026 21:09
@cv cv requested review from cjagwani, ericksoa and jyaunches May 20, 2026 21:09
@cv
Collaborator Author

cv commented May 20, 2026

Updated this PR against current main and fixed the CI/type error from the FSM context hardening by emitting endpointOrigin instead of the removed endpointUrl context field. Also removed the CodeQL inconvertible-null comparison in the metadata sanitizer.

Verification run locally:

  • npm run build:cli
  • npx vitest run src/lib/state/onboard-session.test.ts src/lib/onboard/machine/transitions.test.ts
  • npx prek run --files src/lib/onboard/machine/events.ts src/lib/state/onboard-session.test.ts

Review requested from @cjagwani @jyaunches @ericksoa.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
src/lib/state/onboard-session.test.ts (1)

129-206: ⚡ Quick win

Add a no-op transition regression test for duplicate emissions.

Please add a case that calls markStepSkipped twice (and completeSession twice) and asserts no extra machine events are emitted on the second call.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/state/onboard-session.test.ts` around lines 129 - 206, Add a
regression test in onboard-session.test.ts that verifies duplicate no-op
transitions don't emit extra events: create an emitted array and listener via
machineEvents.addOnboardMachineEventListener, save a session with
session.createSession(), call session.markStepSkipped("someStep") twice and call
session.completeSession(...) twice (use same args for both calls), then assert
the emitted events length (and sequence/types) after the first calls and that no
additional events were pushed after the second calls (i.e., emitted length
unchanged); reference session.markStepSkipped, session.completeSession, and
machineEvents.addOnboardMachineEventListener to locate where to add the
assertions.
src/lib/state/onboard-session.ts (1)

21-24: Run the channels stop/start E2E for this change set.

This file is part of disabled channel lifecycle behavior. Please run channels-stop-start-e2e on this branch before merge.

As per coding guidelines: “src/lib/state/onboard-session.ts: This file controls disabled channel resolution used during onboard and rebuild… E2E test recommendation: channels-stop-start-e2e.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/state/onboard-session.ts` around lines 21 - 24, This change touches
disabled channel lifecycle logic in src/lib/state/onboard-session.ts (imports
createOnboardMachineEvent and emitOnboardMachineEvent) and requires running the
channels-stop-start-e2e test; before merging, check out this branch and run the
channels-stop-start-e2e E2E suite, verify onboard and rebuild flows properly
resolve disabled channels and that
createOnboardMachineEvent/emitOnboardMachineEvent behavior is unchanged, and fix
any failing assertions or lifecycle timing issues uncovered by the test run.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/state/onboard-session.ts`:
- Around line 1005-1030: completeSession currently emits the "onboard.completed"
event on every call; fix it by reading the session's prior status before
mutating and only emitting the onboard.completed event when the previous status
was not "complete". Specifically, in completeSession (which already uses
filterSafeUpdates and updateSession), capture the existing session status (via
the session getter or a pre-update read), call updateSession to apply
safeUpdates and set status/resumable/failure, and then call
emitOnboardMachineEvent(createOnboardMachineEvent(..., type:
"onboard.completed")) only if the captured previous status !== "complete"; keep
the existing context.updated emission logic unchanged.
- Around line 945-963: markStepSkipped currently sets step.status = "skipped"
and flips shouldEmit even when the step is already skipped, causing duplicate
state.skipped events; modify the updateSession callback in markStepSkipped to
return early if step.status === "skipped" (in addition to the existing checks
for "complete" and "failed") so shouldEmit is only set when a real transition
occurs, ensuring emitOnboardMachineEvent(createOnboardMachineEvent(...)) is only
called when the status changed.
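
The two guards described above can be sketched in isolation. This is a minimal, hypothetical in-memory model, not the actual `onboard-session.ts` code: the `Session` shape, the `emit` recorder, and the `preflight` step name are assumptions for illustration. Only the event types `state.skipped` and `onboard.completed` and the "emit only on a real transition" rule come from the review.

```typescript
// Hypothetical, simplified model of the session; the real module persists to disk.
type StepStatus = "pending" | "in_progress" | "complete" | "failed" | "skipped";
type SessionStatus = "in_progress" | "complete" | "failed";

interface MachineEvent {
  type: string;
  step?: string;
}

interface Session {
  status: SessionStatus;
  steps: Record<string, { status: StepStatus }>;
}

// Stand-in for the process-local emitter: records events in order.
const emitted: MachineEvent[] = [];
const emit = (event: MachineEvent): void => {
  emitted.push(event);
};

const session: Session = {
  status: "in_progress",
  steps: { preflight: { status: "pending" } }, // "preflight" is an assumed step name
};

function markStepSkipped(stepName: string): void {
  const step = session.steps[stepName];
  if (!step) return; // unknown step: nothing to mutate, nothing to emit
  // Already-terminal or already-skipped steps are no-ops, so no event fires.
  if (step.status === "complete" || step.status === "failed" || step.status === "skipped") {
    return;
  }
  step.status = "skipped";
  emit({ type: "state.skipped", step: stepName });
}

function completeSession(): void {
  const previousStatus = session.status; // capture before mutating
  session.status = "complete";
  if (previousStatus !== "complete") {
    emit({ type: "onboard.completed" });
  }
}

// Each helper is called twice; only the first call is a real transition.
markStepSkipped("preflight");
markStepSkipped("preflight");
completeSession();
completeSession();
```

After the four calls, `emitted` holds exactly one `state.skipped` event followed by one `onboard.completed` event, which is the property the requested regression test asserts.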

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 12f6cf83-4c8c-4f35-a8db-b45d3280453c

📥 Commits

Reviewing files that changed from the base of the PR and between 95cc0de and 22a7cd8.

📒 Files selected for processing (3)
  • src/lib/onboard/machine/events.ts
  • src/lib/state/onboard-session.test.ts
  • src/lib/state/onboard-session.ts

@cv
Collaborator Author

cv commented May 20, 2026

Addressed CodeRabbit feedback in b5072f8:
- `markStepSkipped` now treats an already-skipped step as a no-op, avoiding duplicate `state.skipped` events.
- `completeSession` no longer emits duplicate `onboard.completed` events if the session is already complete.
- Added regression coverage for duplicate skipped/completed calls.

Verification run locally:
- `npm run build:cli`
- `npx vitest run src/lib/state/onboard-session.test.ts src/lib/onboard/machine/transitions.test.ts`
- `npx prek run --files src/lib/onboard/machine/events.ts src/lib/state/onboard-session.ts src/lib/state/onboard-session.test.ts`

Also dispatched `channels-stop-start-e2e` per the CodeRabbit E2E recommendation.

@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26190201509
Target ref: 22a7cd85ea6301d9f81d9332c35c7dfc7eb67cd9
Workflow ref: main
Requested jobs: cloud-onboard-e2e,onboard-resume-e2e,onboard-negative-paths-e2e
Summary: 0 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| cloud-onboard-e2e | ⚠️ cancelled |
| onboard-negative-paths-e2e | ⚠️ cancelled |
| onboard-resume-e2e | ⚠️ cancelled |

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
src/lib/state/onboard-session.ts (1)

1005-1034: Run channels-stop-start-e2e for this session-path change.

This patch is small, but it still changes src/lib/state/onboard-session.ts, which is the persistence path that carries disabled-channel state across rebuilds.

As per coding guidelines, src/lib/state/onboard-session.ts: "This file controls disabled channel resolution used during onboard and rebuild... E2E test recommendation: channels-stop-start-e2e."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/state/onboard-session.ts` around lines 1005 - 1034, This change
modifies the onboard session persistence flow (see completeSession,
updateSession and emitOnboardMachineEvent) which impacts disabled-channel
resolution across rebuilds — run the end-to-end test suite named
channels-stop-start-e2e against this change and confirm it passes; if it fails,
investigate the disabled-channel state handling in completeSession (how
session.status, session.resumable and session.failure are set and how
context.updated / onboard.completed events are emitted) and adjust the session
updates or event emission to preserve/resolve disabled channels correctly before
re-running the e2e.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5e121f94-6d4f-411e-b86f-fa991877838c

📥 Commits

Reviewing files that changed from the base of the PR and between 22a7cd8 and b5072f8.

📒 Files selected for processing (2)
  • src/lib/state/onboard-session.test.ts
  • src/lib/state/onboard-session.ts

@github-actions
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 26190542083
Target ref: b5072f8112e34ba5e6de5669f92e0df5139f45f6
Workflow ref: main
Requested jobs: cloud-onboard-e2e,onboard-resume-e2e,onboard-negative-paths-e2e
Summary: 2 passed, 1 failed, 0 skipped

| Job | Result |
| --- | --- |
| cloud-onboard-e2e | ✅ success |
| onboard-negative-paths-e2e | ❌ failure |
| onboard-resume-e2e | ✅ success |

Failed jobs: onboard-negative-paths-e2e. Check run artifacts for logs.

@cv
Collaborator Author

cv commented May 20, 2026

The required onboard-negative-paths E2E exposed a pre-existing command-order issue in the validation suite after the branch was updated to current main: the baseline helper was still using `nemoclaw status <sandbox>` / `nemoclaw logs <sandbox>`, which the current oclif command surface rejects. Updated it to the sandbox-first forms `nemoclaw <sandbox> status` and `nemoclaw <sandbox> logs` in 27630a0.

Verification:
- `npx prek run --files test/e2e/validation_suites/lib/baseline_onboarding.sh`

Re-dispatched `channels-stop-start-e2e`; E2E advisor should also re-dispatch required onboarding E2Es for the new head.

@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26190497649
Target ref: refactor/3802-2/events
Requested jobs: channels-stop-start-e2e
Summary: 0 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| channels-stop-start-e2e | ⚠️ cancelled |

@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26191331133
Target ref: 27630a04b3880a04933d978e46f309f7cd28fbe6
Workflow ref: main
Requested jobs: cloud-onboard-e2e,onboard-resume-e2e,credential-sanitization-e2e
Summary: 3 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| cloud-onboard-e2e | ✅ success |
| credential-sanitization-e2e | ✅ success |
| onboard-resume-e2e | ✅ success |

@cv cv enabled auto-merge (squash) May 20, 2026 21:56
@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26192006386
Target ref: 076303e8f623b2843c3b1442175f7af8392886bb
Workflow ref: main
Requested jobs: cloud-onboard-e2e,onboard-resume-e2e,onboard-negative-paths-e2e
Summary: 1 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| cloud-onboard-e2e | ✅ success |
| onboard-negative-paths-e2e | ⚠️ cancelled |
| onboard-resume-e2e | ⚠️ cancelled |

@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26192488077
Target ref: 8f7476e649a84a347830bf4d897ab19ef828613a
Workflow ref: main
Requested jobs: cloud-onboard-e2e,onboard-resume-e2e,onboard-negative-paths-e2e
Summary: 3 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| cloud-onboard-e2e | ✅ success |
| onboard-negative-paths-e2e | ✅ success |
| onboard-resume-e2e | ✅ success |

@github-actions
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 26191266431
Target ref: refactor/3802-2/events
Requested jobs: channels-stop-start-e2e
Summary: 0 passed, 1 failed, 0 skipped

| Job | Result |
| --- | --- |
| channels-stop-start-e2e | ❌ failure |

Failed jobs: channels-stop-start-e2e. Check run artifacts for logs.

@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26194834702
Target ref: 96815cea393e34b20da3c94004430d126d05a40d
Workflow ref: main
Requested jobs: cloud-onboard-e2e,onboard-resume-e2e,onboard-negative-paths-e2e
Summary: 3 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| cloud-onboard-e2e | ✅ success |
| onboard-negative-paths-e2e | ✅ success |
| onboard-resume-e2e | ✅ success |

Contributor

@ericksoa ericksoa left a comment

Reviewed live head 96815ce against current main. The change is observe-only event emission around existing onboard session mutations; listener failures are isolated, event data is redacted/non-persistent, CodeRabbit duplicate-emission feedback is covered by tests, and the PR checks plus required onboarding E2Es are green. I also verified the merge result locally with build:cli and targeted Vitest coverage.
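
To illustrate what "listener failures are isolated" can mean for an observe-only emitter, here is a minimal sketch. The names `addListener` and `emitEvent` are hypothetical stand-ins and are not the functions exported by `src/lib/onboard/machine/events.ts`; the only behavior taken from the review is that a throwing observer must not affect the caller or other observers.

```typescript
interface OnboardEvent {
  type: string;
}

type Listener = (event: OnboardEvent) => void;

// Process-local registry: nothing here is written to disk.
const listeners: Listener[] = [];

// Registers a listener and returns a function that removes it again.
function addListener(listener: Listener): () => void {
  listeners.push(listener);
  return () => {
    const index = listeners.indexOf(listener);
    if (index >= 0) listeners.splice(index, 1);
  };
}

// Delivers the event to every listener; a throwing listener is counted
// and swallowed so that later listeners and the caller are unaffected.
function emitEvent(event: OnboardEvent): number {
  let failures = 0;
  for (const listener of listeners.slice()) {
    try {
      listener(event);
    } catch {
      failures += 1;
    }
  }
  return failures;
}

const seen: string[] = [];
addListener(() => {
  throw new Error("observer bug");
});
const removeRecorder = addListener((event) => {
  seen.push(event.type);
});

// The first listener throws, the second still receives the event.
const failures = emitEvent({ type: "state.skipped" });

// After removal, the recorder no longer receives events.
removeRecorder();
emitEvent({ type: "onboard.completed" });
```

Iterating over a copy (`listeners.slice()`) is a design choice in this sketch so that a listener which unsubscribes during delivery does not disturb the loop.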

@cv cv merged commit f19061e into main May 21, 2026
25 checks passed
cv added a commit that referenced this pull request May 21, 2026
## Summary
Persist a compact onboarding machine snapshot on the existing session
file and normalize legacy sessions that do not have machine fields. This
stacks on #3849 and keeps the snapshot limited to state, entered
timestamp, and revision without adding durable runtime topology or event
logs.

## Related Issue
Refs #3802
Stacked on #3849

## Changes
- Added `OnboardMachineSnapshot` to `src/lib/state/onboard-session.ts`
with `version`, `state`, `stateEnteredAt`, and `revision`.
- Updated session creation, normalization, step mutation helpers, and
completion/failure paths to keep the machine snapshot current.
- Added compatibility inference for old sessions without a `machine`
field using session status, in-progress step, or last completed step.
- Extended session tests to cover persisted snapshots, legacy
normalization, invalid snapshot fallback, and debug summaries.
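
As a rough illustration of the normalization described in the list above, the sketch below validates a snapshot and falls back to inference for legacy or malformed sessions. Only the four snapshot field names (`version`, `state`, `stateEnteredAt`, `revision`) and the inference order (session status, in-progress step, last completed step) come from this description. The `LegacySession` shape, the state strings, the `after:` prefix, and the use of `updatedAt` as the entered timestamp are assumptions.

```typescript
interface MachineSnapshot {
  version: 1;
  state: string;
  stateEnteredAt: string;
  revision: number;
}

// Hypothetical legacy session shape; field names are assumed for this sketch.
interface LegacySession {
  status: "in_progress" | "complete" | "failed";
  currentStep?: string | null;
  lastCompletedStep?: string | null;
  updatedAt: string;
  machine?: unknown;
}

function isValidSnapshot(value: unknown): value is MachineSnapshot {
  if (typeof value !== "object" || value === null) return false;
  const { version, state, stateEnteredAt, revision } = value as Record<string, unknown>;
  return (
    version === 1 &&
    typeof state === "string" &&
    typeof stateEnteredAt === "string" &&
    typeof revision === "number" &&
    Number.isInteger(revision) &&
    revision >= 0
  );
}

// Inference order: session status, then in-progress step, then last completed step.
function inferState(session: LegacySession): string {
  if (session.status === "complete") return "complete";
  if (session.status === "failed") return "failed";
  if (session.currentStep) return session.currentStep;
  if (session.lastCompletedStep) return `after:${session.lastCompletedStep}`;
  return "init"; // assumed initial state name
}

function normalizeMachine(session: LegacySession): MachineSnapshot {
  const candidate = session.machine;
  if (isValidSnapshot(candidate)) return candidate;
  // Missing or malformed snapshot: rebuild a compact one from session fields.
  return {
    version: 1,
    state: inferState(session),
    stateEnteredAt: session.updatedAt,
    revision: 0,
  };
}

const legacy = normalizeMachine({
  status: "in_progress",
  currentStep: "gateway",
  updatedAt: "2026-05-20T00:00:00.000Z",
});

const malformed = normalizeMachine({
  status: "complete",
  updatedAt: "2026-05-20T00:00:00.000Z",
  machine: { version: 1, state: 42 },
});
```

Here `legacy.state` is inferred from the in-progress step, while `malformed` discards the invalid `machine` value and falls back to the session status.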

## Type of Change
- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification
- [x] `npx prek run --all-files` passes
- [x] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `make docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

---
Signed-off-by: Carlos Villela <cvillela@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Sessions now persist a versioned machine snapshot that tracks state,
when each state was entered, and a revision counter; step transitions
update this snapshot.

* **Bug Fixes / Data Migration**
* Legacy or malformed sessions are normalized to a valid machine
snapshot during load.

* **Tests**
* Expanded tests covering snapshot persistence, compact snapshots across
steps, and legacy/invalid-session normalization.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Aaron Erickson 🦞 <aerickson@nvidia.com>
Labels

- NemoClaw CLI: Use this label to identify issues with the NemoClaw command-line interface (CLI).
- refactor: This is a refactor of the code and/or architecture.
- v0.0.47: Release target
