feat(hris): durable onboarding saga (Temporal) — Phase 5a#31
Conversation
Evolve the HRIS example with a durable new-hire onboarding saga, mirroring the car-sharing trip saga. A new OnboardingService gateway runs a synchronous pre-check (the employee id must be free → AlreadyExists otherwise) and then starts an OnboardingWorkflow that provisions the hire across services with automatic LIFO compensation: createEmployee → setupPayroll → grantTimeOff → provisionAccess → activate - Three orthogonal mechanisms now coexist in one example: cross-service ctx.call (RequestLeave → directory), EventBus broadcast (LeaveApproved → payroll), and a Temporal saga (onboarding). The framework stays thin. - New AccessService (leaf, in-memory) is the saga's IT-provisioning step. - Directory gains CreateEmployee/ActivateEmployee/OffboardEmployee over the existing Drizzle db; CreateEmployee uses an atomic `insert ... on conflict do nothing returning` → AlreadyExists (no raw DB error leak). The activity rethrows it as a non-retryable ApplicationFailure so the workflow fails fast with nothing to compensate. - Payroll/TimeOff gain idempotent Setup/Teardown and Grant/Revoke RPCs; payroll-setup is the decrementable leave BALANCE, timeoff-grant is the distinct annual PTO policy allotment. - The worker (src/worker.ts) is the ONLY importer of @temporalio/worker; the RPC roles import only the pure-JS @temporalio/client, keeping their no-build native-TS run model. @swc/core build is approved for the workflow bundle. - Runs on the published 1.0.0 packages — the saga uses no 1.1.0-only API, so Phase 5a is not blocked on the 1.1.0 release. Tests (dockerless, 33/33): workflow orchestration + LIFO compensation (time-skipping, mocked activities), real activity ↔ RPC wiring + compensation idempotency (in-process monolith + PGlite), and the onboarding edge pre-check (AlreadyExists before Temporal; Unavailable with no client). The existing LeaveApproved e2e is unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MdeH7fExPmiRHRirGuvGk3
…file - README: a "three orthogonal mechanisms" framing (ctx.call / EventBus / Temporal saga), a dedicated onboarding-saga section (inverted pre-check, LIFO compensation diagram, the separate worker process, run instructions), and the updated layout / testing / key-points. - docker-compose: a `saga` profile adding Temporal (server + Web UI), the OnboardingService gateway (:5005), the access role (:5004), and the worker (`node src/worker.ts`). Pairs with `split`. - Dockerfile: copy pnpm-workspace.yaml so the @swc/core build approval applies in the image (the worker bundles workflows with swc); drop the lockfile requirement (mirrors car-sharing); expose the access/onboarding ports. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MdeH7fExPmiRHRirGuvGk3
Doc-currency follow-up: update the server/topology docstrings ("three" → "five"
services) and the README (resolver env vars, the generated-catalog example with
the new saga RPCs, and the split-profile run commands incl. the access role).
Comments/docs only — no logic change.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MdeH7fExPmiRHRirGuvGk3
Reflect Phase 5a in the examples index: the hris row now reads "EventBus + durable onboarding saga with Temporal" (was an imprecise "EventBus saga"), mirroring the car-sharing row. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MdeH7fExPmiRHRirGuvGk3
|
Warning Review limit reached
More reviews will be available in 49 minutes and 15 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses rolling per-developer review limits. Reviews become available again as older review attempts age out of the rolling limit window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (9)
📝 WalkthroughWalkthroughAdds a durable Temporal onboarding saga to the HRIS example. New ChangesHRIS Temporal Onboarding Saga
Sequence Diagram(s)sequenceDiagram
participant Client
participant OnboardingService
participant DirectoryService
participant Temporal
participant OnboardingWorkflow
participant AccessService
rect rgba(100, 149, 237, 0.5)
note over Client, OnboardingService: OnboardEmployee (synchronous pre-check)
Client->>OnboardingService: OnboardEmployee(req)
OnboardingService->>DirectoryService: GetEmployee(employeeId)
alt employee found
DirectoryService-->>OnboardingService: Employee response
OnboardingService-->>Client: Code.AlreadyExists
else Code.NotFound
DirectoryService-->>OnboardingService: Code.NotFound
OnboardingService->>Temporal: startWorkflow(OnboardingWorkflow, workflowId=employeeId)
Temporal-->>OnboardingService: workflowId
OnboardingService-->>Client: OnboardEmployeeResponse(status=STARTED, workflowId)
end
end
rect rgba(60, 179, 113, 0.5)
note over Temporal, AccessService: OnboardingWorkflow (durable saga, separate worker process)
Temporal->>OnboardingWorkflow: execute(input)
OnboardingWorkflow->>DirectoryService: createEmployee
OnboardingWorkflow->>DirectoryService: activateEmployee
OnboardingWorkflow->>AccessService: provisionAccess
OnboardingWorkflow-->>Temporal: COMPLETED
end
rect rgba(220, 80, 80, 0.5)
note over Temporal, AccessService: On failure — LIFO compensation unwind
OnboardingWorkflow->>AccessService: revokeAccess
OnboardingWorkflow->>DirectoryService: offboardEmployee
OnboardingWorkflow-->>Temporal: FAILED (ApplicationFailure)
end
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Suggested labels
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
hris/Dockerfile (1)
13-24:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRun the runtime container as a non-root user.
The runtime stage has no
USERdirective, so the process runs as root. This is a container hardening gap.Suggested fix
FROM node:25-slim AS runtime RUN apt-get update && apt-get install -y --no-install-recommends wget && rm -rf /var/lib/apt/lists/* WORKDIR /app COPY --from=deps /app/node_modules ./node_modules COPY package.json ./ COPY src/ ./src/ COPY gen/ ./gen/ +RUN chown -R node:node /app +USER node ENV NODE_ENV=production EXPOSE 5000 5001 5002 5003 5004 5005🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hris/Dockerfile` around lines 13 - 24, The runtime stage in the Dockerfile is missing a USER directive, causing the container to run as root, which is a security vulnerability. Add a RUN instruction to create a non-root user (such as 'node'), and then add a USER directive before the CMD instruction to ensure the application in the runtime stage runs as that non-root user instead of root.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@hris/Dockerfile`:
- Around line 10-11: The COPY command at line 10 only copies package.json and
pnpm-workspace.yaml but omits the pnpm-lock.yaml file, and the RUN pnpm install
command at line 11 uses --prod flag without --frozen-lockfile, causing
non-deterministic dependency resolution across builds. Add pnpm-lock.yaml to the
COPY instruction and add the --frozen-lockfile flag to the pnpm install command
to ensure reproducible builds with locked dependency versions.
In `@hris/src/services/onboardingService.ts`:
- Around line 136-141: The catch block in the onboarding status check treats all
query failures identically by assuming the workflow is closed and mapping it to
a terminal status via terminalStatusFor. Instead of collapsing all errors into a
FAILED status, examine the actual error thrown when handle.query fails to
determine if it indicates the workflow is actually closed or if it is a
transient failure. Only call terminalStatusFor and fall back to the describe
approach when you are certain the error indicates workflow closure, and re-throw
or handle other error types appropriately to avoid masking transient failures.
In `@hris/src/temporal/activities.ts`:
- Around line 98-103: The current implementation in the ConnectError handling
block immediately throws an ApplicationFailure when Code.AlreadyExists is
encountered, but this mishandles retried attempts where the employee was already
successfully created. Before throwing the error for the AlreadyExists condition,
first perform a read-back verification to check if the employee record actually
exists with equivalent data to what was being created. If the read-back confirms
the employee exists with matching data, treat it as a successful operation and
return normally instead of throwing ApplicationFailure. Only throw
ApplicationFailure if the conflict cannot be reconciled through read-back
equivalence verification.
In `@hris/src/temporal/workflows.ts`:
- Around line 100-109: Reorder the compensation registration to occur before the
corresponding side effect calls in steps 2-4. For setupPayroll, grantTimeOff,
and provisionAccess, move each compensations.unshift call (registering
teardownPayroll, revokeTimeOff, and revokeAccess respectively) to execute
immediately before the corresponding await call rather than after. This ensures
that if a side effect commits but the workflow fails before the compensation is
registered, the undo action is already available in the compensations array for
proper unwinding.
---
Outside diff comments:
In `@hris/Dockerfile`:
- Around line 13-24: The runtime stage in the Dockerfile is missing a USER
directive, causing the container to run as root, which is a security
vulnerability. Add a RUN instruction to create a non-root user (such as 'node'),
and then add a USER directive before the CMD instruction to ensure the
application in the runtime stage runs as that non-root user instead of root.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 401929ce-f422-421a-a488-a62c301f11aa
📒 Files selected for processing (29)
README.mdhris/Dockerfilehris/README.mdhris/docker-compose.ymlhris/package.jsonhris/pnpm-workspace.yamlhris/proto/access/v1/access.protohris/proto/directory/v1/directory.protohris/proto/onboarding/v1/onboarding.protohris/proto/payroll/v1/payroll.protohris/proto/timeoff/v1/timeoff.protohris/src/empty.tshris/src/server.tshris/src/services/accessService.tshris/src/services/directoryService.tshris/src/services/onboardingService.tshris/src/services/payrollService.tshris/src/services/timeOffService.tshris/src/temporal/activities.tshris/src/temporal/clients.tshris/src/temporal/config.tshris/src/temporal/onboardingStatus.tshris/src/temporal/workflowClient.tshris/src/temporal/workflows.tshris/src/topology.tshris/src/worker.tshris/tests/activity/activities.test.tshris/tests/e2e/onboarding.test.tshris/tests/workflow/onboardingWorkflow.test.ts
| COPY package.json pnpm-workspace.yaml ./ | ||
| RUN pnpm install --prod |
There was a problem hiding this comment.
Restore lockfile-based installs for reproducible images.
At Line 10 and Line 11, dropping pnpm-lock.yaml + --frozen-lockfile makes image dependency resolution non-deterministic across builds.
Suggested fix
-COPY package.json pnpm-workspace.yaml ./
-RUN pnpm install --prod
+COPY package.json pnpm-lock.yaml pnpm-workspace.yaml ./
+RUN pnpm install --prod --frozen-lockfile🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@hris/Dockerfile` around lines 10 - 11, The COPY command at line 10 only
copies package.json and pnpm-workspace.yaml but omits the pnpm-lock.yaml file,
and the RUN pnpm install command at line 11 uses --prod flag without
--frozen-lockfile, causing non-deterministic dependency resolution across
builds. Add pnpm-lock.yaml to the COPY instruction and add the --frozen-lockfile
flag to the pnpm install command to ensure reproducible builds with locked
dependency versions.
Match the example-code currency cleanup: an ADR number in example code is internal jargon. The reason (no enum under erasable TypeScript) is already stated in the comment just above. Comment-only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MdeH7fExPmiRHRirGuvGk3
Address the four CodeRabbit "Major" findings on this PR, mirroring the same
hardenings applied to the gold-standard car-sharing saga so the two examples
stay in lockstep.
- Idempotent create (F3): createEmployee is a create-by-id, so on Code.AlreadyExists
it now reads the row back and treats a match as success (a Temporal retry
observing its own prior commit) — only a genuinely different employee under the
same id stays a terminal EmployeeExists. Compares business fields (not id), so
the existing e-001 conflict test still fails fast. No proto change. (car-sharing
uses a holder key instead, since its step 1 is a lock acquisition.)
- GetOnboarding error mapping (F2): the blanket `catch {}` mapped ANY query
failure to a terminal status. Narrow it: only WorkflowNotFoundError /
QueryNotRegistered / QueryRejected (run closed/gone) fall back to describe(); a
transient failure is surfaced as Code.Unavailable instead of a bogus
COMPLETED/FAILED onboarding.
- Compensation ordering (F4): register teardownPayroll / revokeTimeOff /
revokeAccess (steps 2-4) BEFORE their forward calls, so an ambiguous failure
that committed the side effect is still unwound (each comp no-ops on
uncommitted state). Step 1 (createEmployee) stays register-after.
- Dockerfile (F1): document the no-committed-lockfile policy honestly and point
production users at `--frozen-lockfile`; run as the non-root `node` user
(parity with car-sharing).
New/updated tests: read-back retry succeeds; describe() fallback on a closed run;
GetOnboarding surfaces Unavailable on a transient query failure; the steps-2-4
failure unwinds now include the step's own pre-registered compensation.
typecheck + tests: 36/36 green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MdeH7fExPmiRHRirGuvGk3
What
Evolves the hris example with a durable new-hire onboarding saga on
Temporal, mirroring the car-sharing trip saga. The
example now demonstrates three orthogonal cross-service mechanisms side by
side, each fit to its job:
ctx.call(sync)RequestLeavevalidates against the directoryLeaveApproved→ payroll decrements the balanceA new
OnboardingServicegateway runs a synchronous pre-check (the employeeid must be free →
AlreadyExistsotherwise) and then starts anOnboardingWorkflow:AccessService(leaf, in-memory) is the saga's IT-provisioning step.CreateEmployee/ActivateEmployee/OffboardEmployeeover theexisting Drizzle db;
CreateEmployeeuses an atomicinsert … on conflict do nothing returning→AlreadyExists(no raw DB errorleak). The activity rethrows it as a non-retryable
ApplicationFailure, sothe workflow fails fast with nothing to compensate.
payroll-setup is the decrementable leave balance, timeoff-grant is the
distinct annual PTO policy allotment.
src/worker.ts) is the only importer of@temporalio/worker;the RPC roles import only the pure-JS
@temporalio/client, keeping theirno-build native-TS run model.
@swc/corebuild is approved for the workflowbundle.
Scope of verification (honest)
pnpm test,typecheck clean):
tests/workflow/onboardingWorkflow.test.ts— the realOnboardingWorkflow(swc-bundled under a real Worker) with mocked activities under Temporal's
time-skipping env: forward order + LIFO compensation on each failing
step + the non-retryable first step.
tests/activity/activities.test.ts— the real activity bodies against anin-process Connectum monolith (PGlite): each step mutates real service state,
the duplicate-id failure is non-retryable, compensations are idempotent.
tests/e2e/onboarding.test.ts— the gateway pre-check (AlreadyExistsbeforeTemporal;
Unavailablewith no client).only in
docker-compose.yml(the newsagaprofile). That compose isconfig-validated (
docker compose config), not run in this PR — thesame config-only posture as car-sharing.
Not blocked on 1.1.0
Unlike the EventBus-broadcast Phase 3 PR (#29, draft, awaiting unpublished
1.1.0), this saga uses no Connectum API beyond 1.0.0 (only
createEventBus/ctx.call/defineService; Temporal is an external dep). Itruns on the published 1.0.0 packages — so this is a normal (non-draft) PR.
@connectum/*deps stay^1.0.0.Run it
🤖 Generated with Claude Code
https://claude.ai/code/session_01MdeH7fExPmiRHRirGuvGk3
Summary by CodeRabbit
Release Notes
New Features
Documentation