Skip to content

feat: systemd support#295

Merged
jason-lynch merged 3 commits intomainfrom
feat/PLAT-417/systemd
Apr 3, 2026
Merged

feat: systemd support#295
jason-lynch merged 3 commits intomainfrom
feat/PLAT-417/systemd

Conversation

@jason-lynch
Copy link
Copy Markdown
Member

Summary

Adds preliminary support for orchestrating pgEdge Enterprise Postgres databases using SystemD instead of Docker Swarm.

Testing

This PR includes a lima-based development environment. See the changes in running-locally.md for complete instructions. If you're already set up to run the lima-based E2E fixtures, you can do:

# install circus
pipx install --include-deps circus

# deploy the dev-lima environment. this is more similar to the compose environment in
# that all ports are exposed on your host machine and we mount the project directory
# into all of the VMs.
make dev-lima-deploy

# that will take a while to set everything up
# once it's done, you can run the environment
make dev-lima-run

# then, in a new terminal, you can use the new dev-lima environment
use-dev-lima
cp1-req get-version
cp1-ssh
# etc..

# you can also run E2Es against this environment
make test-e2e E2E_PARALLEL=4 E2E_FIXTURE=dev-lima

Notes for Reviewers

As of this PR, the control-plane server does not respond to upgrades or other package changes that happen after the server starts. If you install a minor version update, you will be blocked from updating existing databases until you:

  • Performed the same upgrade on all hosts that are running the database
  • Update the postgres_version in your database spec

This will change in a subsequent PR.

PLAT-417

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a SystemD orchestrator implementation and Lima-based developer environment; extends instance metadata (ports, pgEdge version) and validation; updates orchestrator interfaces, DI wiring, resources, workflows, tests, docs, Make/Ansible/Zsh tooling, and license/module entries.

Changes

Cohort / File(s) Summary
Build & Dev-env
\.gitignore, Makefile, lima/Makefile, lima/*, hack/dev-env.zsh, hack/pgedge-cp-env.plugin.zsh
Ignore lima/data; add CGO_ENABLED=0 GOOS=linux to build targets; add top-level dev-lima-* Make targets; add lima Makefile, scripts, circus, and zsh helpers (including use-dev-lima).
Lima playbooks & roles
lima/run.sh, lima/circus.ini, lima/*.yaml, lima/roles/*, lima/vars.yaml, lima/lima-template.yaml
New Lima orchestration artifacts: run script, Circus watchers, Ansible playbooks/inventory, template, vars, and roles to start/teardown VMs, install prerequisites, generate test configs, stop DBs, and teardown/reset flows.
Licenses & Modules
go.mod, NOTICE.txt
Move github.com/coreos/go-systemd/v22 to direct require; add github.com/elastic/gosigar and github.com/godbus/dbus/v5; add corresponding NOTICE sections and license texts.
Docs & Changelog
docs/development/running-locally.md, changes/unreleased/*
Revise running-locally doc (TOC, prerequisites, Restish flow), add SystemD/Lima developer instructions and new unreleased changelog entry for preliminary systemd support.
API schema
api/apiv1/design/database.go
Clarify patroni_port doc: 0 means per-instance random port; field remains unsupported for Docker Swarm.
Config & validation
server/internal/config/config.go, server/internal/api/apiv1/validate*.go, server/internal/api/apiv1/convert.go
Add SystemD config type and OrchestratorSystemD constant; pass orchestrator into API->spec conversion and validation; validate required ports for systemd and compute effective node/db ports for checks.
Orchestrator DI & wiring
server/cmd/root.go, server/internal/app/app.go, server/internal/orchestrator/provide.go
Register systemd resource types at startup; resolve orchestrator from DI and call Start(ctx); add orchestrator provider wiring for systemd.
SystemD orchestrator package
server/internal/orchestrator/systemd/...
New systemd package: DBus-backed Client, DNF package manager, package/version types, Patroni/unit/pgBackRest resources, orchestrator implementation (PopulateHost, GenerateInstanceResources/Restore, Start/Stop, connection info, validation, ExecuteInstanceCommand), provider and resource registration.
Swarm orchestrator updates
server/internal/orchestrator/swarm/orchestrator.go
Add Start(ctx) (no-op); expand GetInstanceConnectionInfo and CreatePgBackRestBackup signatures to accept optional postgres/patroni port and pgEdge version or *InstanceSpec.
Database model & service
server/internal/database/instance*.go, server/internal/database/instance_store.go, server/internal/database/service.go, server/internal/database/spec.go, server/internal/database/orchestrator.go
Persist and expose Port, PatroniPort, PgEdgeVersion; add InstanceStateUpdateOptions + UpdateInstanceState; expand InstanceUpdateOptions; compute/store AllHostIDs; service methods use stored instance/spec for connection info and backups; orchestrator interface updated for connection/backup calls.
Workflows & monitor
server/internal/workflows/*, server/internal/monitor/instance_monitor.go
Update activities to use state-only instance updates (UpdateInstanceState), route backups via service methods, persist delete plans before apply; monitor uses UpdateInstanceState for state correction.
E2E tests & fixtures
e2e/*, e2e/fixtures/*
Fixture now records orchestrator from hosts; add skip helpers for unsupported features; many tests add PatroniPort: pointerTo(0) (keeping Port: pointerTo(0)), branch POSIX/S3 setup by orchestrator, and adjust logging/skip handling.
PgBackRest restore resource
server/internal/orchestrator/systemd/pgbackrest_restore.go
Add PgBackRestRestore resource implementing restore lifecycle: stop/delete Patroni, run pgBackRest restore, rename data, restart instance, and update task state.
Unit & Patroni resources
server/internal/orchestrator/systemd/unit.go, server/internal/orchestrator/systemd/patroni_config.go, server/internal/orchestrator/systemd/patroni_unit.go
Add UnitResource to manage systemd unit files (create/reload/enable/stop/delete), PatroniConfig resource aggregating peer addresses, and helper to generate Patroni unit options.
SystemD client & package manager
server/internal/orchestrator/systemd/client.go, server/internal/orchestrator/systemd/dnf.go, server/internal/orchestrator/systemd/packages.go
Add DBus-backed client (Start/Shutdown/Reload/StartUnit/StopUnit/RestartUnit/Enable/Disable/UnitExists/GetMainPID), DNF-based PackageManager to discover installed Postgres/Spock packages, and package comparison/types.
Utilities & golden tests
server/internal/ds/versions.go, server/internal/postgres/hba/entry.go, server/internal/orchestrator/common/golden_test/*
Add Version helpers (MajorString, MajorVersion) and tighten semver regex; normalize bare IPs to CIDR /32 or /128 in hba printing; update golden tests to use CIDR.
Linters & unit tests
.golangci.yml, server/internal/orchestrator/systemd/dnf_test.go
Enable staticcheck for test files; add unit tests for toVersion parsing.
Misc infra
.gitignore, e2e/fixtures/ansible.cfg, lima/ansible.cfg, Makefile
Ignore lima/data; set Ansible interpreter_python = auto_silent in fixtures and lima; update Makefile build flags and add dev-lima targets.

Poem

🐰 I hop through code with tiny, clever paws,
SystemD wakes and Lima hums because—
Patroni ports set zero, random doors appear,
Units spin up, playbooks sing, the bunnies cheer,
I thump my foot: orchestration now draws near!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 26.42% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title "feat: systemd support" is concise, follows Conventional Commits, and directly describes the main change—adding systemd orchestration support alongside existing Docker Swarm.
Description check ✅ Passed The description includes a clear Summary, comprehensive Testing instructions with multiple command examples, detailed Notes for Reviewers explaining constraints, and references the issue (PLAT-417). However, it lacks a structured Changes section and a formal Checklist with verification items.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/PLAT-417/systemd

@jason-lynch jason-lynch force-pushed the feat/PLAT-417/rename-binary branch from 5b65dd1 to 7512d78 Compare March 9, 2026 22:37
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch 4 times, most recently from 0292169 to fecca31 Compare March 10, 2026 14:33
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/rename-binary branch from 3b2cb2d to 1811d33 Compare March 10, 2026 14:33
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch 2 times, most recently from 8cb89f9 to c752911 Compare March 10, 2026 16:39
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/rename-binary branch from 32a39ff to 1ef89d0 Compare March 10, 2026 16:39
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch from c752911 to 9c4262c Compare March 10, 2026 21:01
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/rename-binary branch from 32ef891 to 5c8c403 Compare March 11, 2026 12:58
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch from 9c4262c to e179f4c Compare March 11, 2026 12:58
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/rename-binary branch from 5c8c403 to 79fbfc1 Compare March 11, 2026 13:25
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch from e179f4c to 40f2797 Compare March 11, 2026 13:25
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/rename-binary branch from 79fbfc1 to b1b5cae Compare March 11, 2026 13:43
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch from 40f2797 to 0e319c3 Compare March 11, 2026 13:43
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/rename-binary branch from b1b5cae to 3b749f2 Compare March 11, 2026 13:49
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch from 0e319c3 to 900598e Compare March 11, 2026 13:49
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/rename-binary branch from 3b749f2 to ccf2e10 Compare March 12, 2026 19:29
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch from 900598e to e181cc5 Compare March 12, 2026 19:29
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/rename-binary branch from ccf2e10 to 7607b74 Compare March 12, 2026 21:17
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch from e181cc5 to 42824fe Compare March 12, 2026 21:17
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/rename-binary branch from 7607b74 to 09c6ec7 Compare March 13, 2026 13:13
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch from 42824fe to 89dca9e Compare March 13, 2026 13:13
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/rename-binary branch from 40fd983 to 307f1e9 Compare March 24, 2026 18:15
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch 2 times, most recently from b56d8cd to bf6b508 Compare March 25, 2026 18:16
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/rename-binary branch from 307f1e9 to 6a1d107 Compare March 25, 2026 19:38
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch from bf6b508 to d41571c Compare March 25, 2026 19:38
@jason-lynch jason-lynch requested a review from tsivaprasad March 30, 2026 15:36
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/rename-binary branch from 6a1d107 to 8f7d2ca Compare March 31, 2026 16:43
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch from d41571c to 26c6108 Compare March 31, 2026 16:43
@codacy-production
Copy link
Copy Markdown

codacy-production bot commented Mar 31, 2026

Up to standards ✅

🟢 Issues 13 medium

Results:
13 new issues

Category Results
Complexity 13 medium

View in Codacy

🟢 Metrics 153 complexity · 37 duplication

Metric Results
Complexity 153
Duplication 37

View in Codacy

TIP This summary will be updated as you push new changes. Give us feedback

@jason-lynch jason-lynch force-pushed the feat/PLAT-417/rename-binary branch from 8f7d2ca to 8a3fa31 Compare March 31, 2026 16:51
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch from 26c6108 to 88440d0 Compare March 31, 2026 16:51
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/rename-binary branch from 8a3fa31 to ab652f8 Compare March 31, 2026 17:35
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch from 88440d0 to 09d6b52 Compare March 31, 2026 17:35
Base automatically changed from feat/PLAT-417/rename-binary to main March 31, 2026 18:19
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch from 09d6b52 to 2d1d84c Compare March 31, 2026 20:10
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@go.mod`:
- Line 12: Update the dependency declaration for github.com/docker/docker from
v27.1.1 to v29.3.1 (or later) in go.mod to remediate the listed HIGH severity
CVEs; after editing the module version for github.com/docker/docker, run module
resolution (e.g., go mod tidy / go get) and rebuild/tests to ensure
compatibility and update go.sum accordingly, and verify any code using docker
client types/APIs in your repo still compiles against the new version.

In `@server/internal/config/config.go`:
- Around line 368-382: The code appends the same "unsupported orchestrator"
error twice because the early guard checking c.Orchestrator against
OrchestratorSwarm and OrchestratorSystemD and the switch default both report the
same issue; remove the redundant error by eliminating the switch default (or
make the switch only handle OrchestratorSwarm and OrchestratorSystemD cases) so
that the guard is the single source of truth for unsupported orchestrator
reporting—update the switch over c.Orchestrator (and references to
c.DockerSwarm.validate() and c.SystemD.validate()) to not append the default
error.

In `@server/internal/orchestrator/systemd/client.go`:
- Around line 180-194: GetUnitFilePath currently calls prop.Value.String() which
yields a debug text representation; change it to use prop.Value() and
type-assert the result to string (e.g., v, ok := prop.Value().(string)) and
return a clear error if the assertion fails so the path passed to RemoveUnitFile
is the actual filesystem path; update logging to use the asserted string value
and ensure GetUnitFilePath still returns an error when the Variant isn't a
string.
- Around line 84-105: The code is using the integer returned from
StopUnitContext as if it were the unit's process ID; treat that value as the
systemd job ID only, and when wait==true fetch the unit's MainPID (or
ActiveState) via c.conn.GetUnitProperties(name) (or equivalent) and use that
MainPID to call waitForPid, or poll the unit state until it is inactive; update
references in StopUnitContext/awaitJob handling to keep the returned int as
jobID and replace usages of pid for waitForPid with the fetched MainPID (check
for MainPID==0 and fallback to polling ActiveState).

In `@server/internal/orchestrator/systemd/dnf.go`:
- Around line 126-135: The semverRegexp and toVersion must handle optional RPM
epoch prefixes (e.g., "1:1.0.2-1"); update semverRegexp to allow an optional
leading "<digits>:" and capture the real semver (e.g., use a pattern like
^(?:\d+:)?(\d+(?:\.\d+){0,2})), then change toVersion to extract the captured
group (use regexp.FindStringSubmatch on semverRegexp and use the first capture
instead of FindString) so it returns ds.ParseVersion on the semver after the
epoch while preserving the existing invalid-format error path when no match is
found.

In `@server/internal/orchestrator/systemd/orchestrator.go`:
- Around line 218-226: The code in GenerateInstanceResources incorrectly rejects
port value 0 (the valid "assign random port" sentinel); change the checks that
currently treat FromPointer(spec.PatroniPort) == 0 and FromPointer(spec.Port) ==
0 as errors to instead detect a nil/missing pointer and only error in that case.
In other words, use the pointer presence (spec.PatroniPort and spec.Port) to
decide missing vs present, keep patroniPort/postgresPort == 0 as allowed, and
apply the same change to the duplicate checks around the other block referenced
(lines ~497-504); update the error conditions around patroniPort/postgresPort in
GenerateInstanceResources so 0 is not treated as missing.

In `@server/internal/orchestrator/systemd/pgbackrest_restore.go`:
- Around line 122-125: The restore commit (StartInstance) is already
irreversible, so failures from the subsequent bookkeeping call
p.completeTask(ctx, taskSvc, t) must not be escalated via handleError; change
the flow so that after StartInstance returns nil you treat completeTask as
best-effort/non-fatal: catch its error, log it (including error details and task
ID) and do not return handleError or mark the task as Create-failed; likewise
apply the same non-fatal handling for the similar block around lines 159-166 —
ensure you still attempt the update (preferably using a
background/non-cancelable context or retry logic) but never convert its failure
into a retryable restore failure.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0e9c62f6-031b-4e20-90ee-e3a1cf4f777a

📥 Commits

Reviewing files that changed from the base of the PR and between 9f7c75f and 2d1d84c.

⛔ Files ignored due to path filters (8)
  • api/apiv1/gen/control_plane/service.go is excluded by !**/gen/**
  • api/apiv1/gen/http/control_plane/client/types.go is excluded by !**/gen/**
  • api/apiv1/gen/http/control_plane/server/types.go is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi.json is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi.yaml is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi3.json is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi3.yaml is excluded by !**/gen/**
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (74)
  • .gitignore
  • Makefile
  • NOTICE.txt
  • api/apiv1/design/database.go
  • changes/unreleased/Added-20260309-165318.yaml
  • docs/development/running-locally.md
  • e2e/backup_restore_test.go
  • e2e/cancel_task_test.go
  • e2e/custom_db_create_test.go
  • e2e/db_create_test.go
  • e2e/db_update_add_node_test.go
  • e2e/failover_test.go
  • e2e/fixture_test.go
  • e2e/fixtures/ansible.cfg
  • e2e/load_test.go
  • e2e/minor_version_upgrade_test.go
  • e2e/service_provisioning_test.go
  • e2e/switchover_test.go
  • e2e/whole_cluster_test.go
  • go.mod
  • hack/dev-env.zsh
  • hack/pgedge-cp-env.plugin.zsh
  • lima/Makefile
  • lima/ansible.cfg
  • lima/circus.ini
  • lima/config.json
  • lima/deploy.yaml
  • lima/inventory.yaml
  • lima/lima-template.yaml
  • lima/roles/generate_test_config/tasks/main.yaml
  • lima/roles/generate_test_config/templates/test_config.yaml.tmpl
  • lima/roles/install_prerequisites/tasks/main.yaml
  • lima/roles/install_prerequisites/vars/main.yaml
  • lima/roles/start_vms/tasks/main.yaml
  • lima/roles/stop_dbs/tasks/main.yaml
  • lima/roles/teardown_vms/tasks/main.yaml
  • lima/run.sh
  • lima/stop-dbs.yaml
  • lima/teardown.yaml
  • lima/vars.yaml
  • server/cmd/root.go
  • server/internal/api/apiv1/convert.go
  • server/internal/api/apiv1/post_init_handlers.go
  • server/internal/api/apiv1/validate.go
  • server/internal/api/apiv1/validate_test.go
  • server/internal/app/app.go
  • server/internal/config/config.go
  • server/internal/database/instance.go
  • server/internal/database/instance_resource.go
  • server/internal/database/instance_store.go
  • server/internal/database/orchestrator.go
  • server/internal/database/service.go
  • server/internal/database/spec.go
  • server/internal/ds/versions.go
  • server/internal/monitor/instance_monitor.go
  • server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/enable_fast_basebackup.yaml
  • server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/minimal_systemd.yaml
  • server/internal/orchestrator/provide.go
  • server/internal/orchestrator/swarm/orchestrator.go
  • server/internal/orchestrator/systemd/client.go
  • server/internal/orchestrator/systemd/dnf.go
  • server/internal/orchestrator/systemd/orchestrator.go
  • server/internal/orchestrator/systemd/packages.go
  • server/internal/orchestrator/systemd/patroni_config.go
  • server/internal/orchestrator/systemd/patroni_unit.go
  • server/internal/orchestrator/systemd/pgbackrest_restore.go
  • server/internal/orchestrator/systemd/provide.go
  • server/internal/orchestrator/systemd/resources.go
  • server/internal/orchestrator/systemd/unit.go
  • server/internal/postgres/hba/entry.go
  • server/internal/workflows/activities/create_pgbackrest_backup.go
  • server/internal/workflows/activities/update_db_state.go
  • server/internal/workflows/activities/update_planned_instance_states.go
  • server/internal/workflows/delete_database.go
✅ Files skipped from review due to trivial changes (29)
  • changes/unreleased/Added-20260309-165318.yaml
  • e2e/fixtures/ansible.cfg
  • e2e/custom_db_create_test.go
  • e2e/whole_cluster_test.go
  • e2e/cancel_task_test.go
  • e2e/db_update_add_node_test.go
  • e2e/load_test.go
  • lima/roles/generate_test_config/tasks/main.yaml
  • e2e/minor_version_upgrade_test.go
  • lima/ansible.cfg
  • lima/stop-dbs.yaml
  • e2e/failover_test.go
  • lima/roles/generate_test_config/templates/test_config.yaml.tmpl
  • lima/config.json
  • lima/lima-template.yaml
  • lima/deploy.yaml
  • api/apiv1/design/database.go
  • server/internal/workflows/activities/update_db_state.go
  • lima/teardown.yaml
  • server/internal/database/spec.go
  • lima/roles/stop_dbs/tasks/main.yaml
  • lima/roles/teardown_vms/tasks/main.yaml
  • lima/roles/install_prerequisites/vars/main.yaml
  • server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/minimal_systemd.yaml
  • NOTICE.txt
  • lima/inventory.yaml
  • lima/vars.yaml
  • lima/circus.ini
  • server/internal/orchestrator/systemd/packages.go
🚧 Files skipped from review as they are similar to previous changes (25)
  • e2e/service_provisioning_test.go
  • server/internal/workflows/delete_database.go
  • server/internal/api/apiv1/post_init_handlers.go
  • server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/enable_fast_basebackup.yaml
  • server/cmd/root.go
  • hack/pgedge-cp-env.plugin.zsh
  • server/internal/app/app.go
  • lima/roles/start_vms/tasks/main.yaml
  • e2e/backup_restore_test.go
  • server/internal/workflows/activities/create_pgbackrest_backup.go
  • server/internal/postgres/hba/entry.go
  • server/internal/orchestrator/systemd/resources.go
  • Makefile
  • e2e/switchover_test.go
  • lima/run.sh
  • .gitignore
  • server/internal/orchestrator/systemd/patroni_unit.go
  • server/internal/api/apiv1/validate_test.go
  • server/internal/ds/versions.go
  • server/internal/database/instance_store.go
  • lima/roles/install_prerequisites/tasks/main.yaml
  • server/internal/database/orchestrator.go
  • e2e/db_create_test.go
  • server/internal/database/instance.go
  • server/internal/database/service.go

@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch from 2d1d84c to cd7c876 Compare April 1, 2026 14:47
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/internal/database/instance_resource.go (1)

224-235: ⚠️ Potential issue | 🟠 Major

Don't persist future connection metadata on failure.

Line 250 now uses this helper from recordError(), and this helper always copies r.Spec.Port, r.Spec.PatroniPort, and r.Spec.PgEdgeVersion into the stored instance record. If an update fails before the instance actually switches over, etcd will advertise the new ports/version while the old process is still serving, which will break monitor/retry paths.

💡 Possible fix
 	opts.InstanceID = r.Spec.InstanceID
 	opts.DatabaseID = r.Spec.DatabaseID
 	opts.HostID = r.Spec.HostID
 	opts.NodeName = r.Spec.NodeName
-	opts.Port = r.Spec.Port
-	opts.PatroniPort = r.Spec.PatroniPort
-	opts.PgEdgeVersion = r.Spec.PgEdgeVersion
+	if opts.State == InstanceStateAvailable {
+		opts.Port = r.Spec.Port
+		opts.PatroniPort = r.Spec.PatroniPort
+		opts.PgEdgeVersion = r.Spec.PgEdgeVersion
+	}

Also applies to: 250-253

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/instance_resource.go` around lines 224 - 235, The
helper behavior in recordError/updateInstanceRecord currently always copies
r.Spec.Port, r.Spec.PatroniPort and r.Spec.PgEdgeVersion into the stored
instance record even on failure, which can advertise future connection metadata
prematurely; change the logic so updateInstanceRecord (and the recordError
helper it calls) do not persist these connection-related fields from r.Spec on
error paths — only set Port, PatroniPort and PgEdgeVersion on the instance
record after a successful update/hand-off (or read the live values from the
existing record when logging errors), and remove the unconditional copying of
r.Spec.Port/PatroniPort/PgEdgeVersion in recordError to ensure failed updates do
not overwrite the active advertised metadata.
🧹 Nitpick comments (4)
.golangci.yml (1)

17-20: Consider whether disabling staticcheck entirely for test files is intentional.

Adding staticcheck to the exclusions is broader than errcheck—it suppresses many useful checks (unused code, deprecated APIs, sync primitive misuse, etc.) across all test files. If specific checks are causing noise, you could selectively disable them using staticcheck's check codes (e.g., SA4006, ST1000) rather than the entire linter.

If this broad exclusion is intentional to reduce friction during development, that's fine—just wanted to flag it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.golangci.yml around lines 17 - 20, The .golangci.yml currently excludes the
entire staticcheck linter for test files (the "staticcheck" entry under the
"linters" block paired with "path: (.+)_test.go"); instead, either remove
"staticcheck" from that exclusion so tests keep full static analysis, or replace
the broad suppression by selectively disabling only the noisy staticcheck checks
using staticcheck check codes (e.g., SA4006, ST1000) via the linter's config
(staticcheck.disable or issues.exclude patterns) targeted at the test file path;
update the "linters" block and associated exclusion rules so "errcheck" remains
but staticcheck is only suppressed for specific codes rather than entirely.
lima/roles/stop_dbs/tasks/main.yaml (1)

2-4: Scope reset-failed to Patroni units.

systemctl reset-failed clears failure state for every unit on the VM. In a dev-lima box that can mask unrelated host issues when you're trying to debug a failed deploy/reset. Limit it to patroni-* so this role only touches the units it owns.

Suggested change
-- name: Reset failed services
-  ansible.builtin.command: systemctl reset-failed
+- name: Reset failed Patroni services
+  ansible.builtin.command:
+    argv:
+      - systemctl
+      - reset-failed
+      - "patroni-*"
   changed_when: false
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lima/roles/stop_dbs/tasks/main.yaml` around lines 2 - 4, The task currently
runs a global systemctl reset-failed which clears failure state for every unit;
scope it to only Patroni units by changing the command from "systemctl
reset-failed" to "systemctl reset-failed patroni-*" in the ansible task (the
task named "Reset failed services" in lima/roles/stop_dbs/tasks/main.yaml),
keeping the existing ansible.builtin.command usage and changed_when: false so
only patroni-* units are affected.
server/internal/orchestrator/systemd/dnf_test.go (1)

82-89: Consider adding explicit NoError assertion for clarity.

When expectedErr is false, the test only checks the output value. Adding an explicit assert.NoError(t, err) would make test failures more informative when parsing unexpectedly fails.

♻️ Suggested improvement
 		t.Run(tc.in, func(t *testing.T) {
 			out, err := toVersion(tc.in)
 			if tc.expectedErr {
 				assert.Error(t, err)
 			} else {
+				assert.NoError(t, err)
 				assert.Equal(t, tc.expected, out)
 			}
 		})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/orchestrator/systemd/dnf_test.go` around lines 82 - 89, The
test loop for toVersion currently only asserts the returned output when
tc.expectedErr is false; update the t.Run block to also assert NoError on err in
the success case so failures show parsing errors explicitly (use
assert.NoError(t, err) when tc.expectedErr is false before asserting equality on
out). Locate the anonymous test function where toVersion is called and the
assertions are made and add the NoError assertion conditioned on tc.expectedErr.
server/internal/database/instance_store.go (1)

40-45: Consider extracting duplicated now() logic.

Both InstanceUpdateOptions.now() and InstanceStateUpdateOptions.now() have identical implementations. This could be extracted into a shared helper to reduce duplication.

♻️ Optional: Extract shared helper
func resolveTimestamp(t time.Time) time.Time {
	if !t.IsZero() {
		return t
	}
	return time.Now()
}

Then use resolveTimestamp(o.Now) in both option types.

Also applies to: 83-88

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/instance_store.go` around lines 40 - 45, Both
InstanceUpdateOptions.now() and InstanceStateUpdateOptions.now() have identical
logic; extract that into a shared helper (e.g., resolveTimestamp) that takes a
time.Time and returns the provided time if not zero or time.Now() otherwise,
then replace both InstanceUpdateOptions.now() and
InstanceStateUpdateOptions.now() implementations to call resolveTimestamp(o.Now)
/ resolveTimestamp(s.Now) respectively to remove duplication.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hack/dev-env.zsh`:
- Around line 320-326: Help text for cp-psql is out of sync with the default
selection logic: when CP_ENV==dev-lima the code sets method=local but the helper
function _cp-psql-help() still states the default is "docker exec". Update the
_cp-psql-help() function to reflect the conditional default (show "local" as the
default when CP_ENV=dev-lima and "docker exec" otherwise), using the same CP_ENV
and method logic or interpolating a computed default string so the help text
matches the branch that sets method.

In `@server/internal/api/apiv1/validate.go`:
- Around line 267-276: The current SystemD branch only checks
db.Port/db.PatroniPort vs node.Port/node.PatroniPort separately and misses
cross-scope conflicts for each node; after the existing checks in the
config.OrchestratorSystemD case, iterate each node in db.Nodes and compute the
resolved values (use node.Port if non-nil else db.Port, and node.PatroniPort if
non-nil else db.PatroniPort) and run the same validation that ensures the
effective pair is valid (reuse/call validatePorts or the same validation logic)
and append errors with newValidationError using appendPath(path, "nodes",
nodeName, "port"/"patroni_port") so per-node resolved-port issues are reported.

In `@server/internal/orchestrator/systemd/client.go`:
- Around line 185-200: The UnitExists implementation currently uses
c.conn.ListUnitsContext and thus only detects loaded units; change
Client.UnitExists to call c.conn.ListUnitFilesContext(ctx) and iterate the
returned unit files for a matching name (or name with ".service"/appropriate
suffix) and return nil if found, otherwise return ErrUnitNotFound; update the
logic that currently uses ListUnitsContext (the similar loop later in the file)
the same way, or if you intended to check runtime state instead, rename and
document the method (e.g., UnitIsLoaded) and keep ListUnitsContext there while
providing a new UnitFileExists that uses ListUnitFilesContext; ensure references
to ErrUnitNotFound and callers like GetMainPID keep consistent semantics.

---

Outside diff comments:
In `@server/internal/database/instance_resource.go`:
- Around line 224-235: The helper behavior in recordError/updateInstanceRecord
currently always copies r.Spec.Port, r.Spec.PatroniPort and r.Spec.PgEdgeVersion
into the stored instance record even on failure, which can advertise future
connection metadata prematurely; change the logic so updateInstanceRecord (and
the recordError helper it calls) do not persist these connection-related fields
from r.Spec on error paths — only set Port, PatroniPort and PgEdgeVersion on the
instance record after a successful update/hand-off (or read the live values from
the existing record when logging errors), and remove the unconditional copying
of r.Spec.Port/PatroniPort/PgEdgeVersion in recordError to ensure failed updates
do not overwrite the active advertised metadata.

---

Nitpick comments:
In @.golangci.yml:
- Around line 17-20: The .golangci.yml currently excludes the entire staticcheck
linter for test files (the "staticcheck" entry under the "linters" block paired
with "path: (.+)_test.go"); instead, either remove "staticcheck" from that
exclusion so tests keep full static analysis, or replace the broad suppression
by selectively disabling only the noisy staticcheck checks using staticcheck
check codes (e.g., SA4006, ST1000) via the linter's config (staticcheck.disable
or issues.exclude patterns) targeted at the test file path; update the "linters"
block and associated exclusion rules so "errcheck" remains but staticcheck is
only suppressed for specific codes rather than entirely.

In `@lima/roles/stop_dbs/tasks/main.yaml`:
- Around line 2-4: The task currently runs a global systemctl reset-failed which
clears failure state for every unit; scope it to only Patroni units by changing
the command from "systemctl reset-failed" to "systemctl reset-failed patroni-*"
in the ansible task (the task named "Reset failed services" in
lima/roles/stop_dbs/tasks/main.yaml), keeping the existing
ansible.builtin.command usage and changed_when: false so only patroni-* units
are affected.

In `@server/internal/database/instance_store.go`:
- Around line 40-45: Both InstanceUpdateOptions.now() and
InstanceStateUpdateOptions.now() have identical logic; extract that into a
shared helper (e.g., resolveTimestamp) that takes a time.Time and returns the
provided time if not zero or time.Now() otherwise, then replace both
InstanceUpdateOptions.now() and InstanceStateUpdateOptions.now() implementations
to call resolveTimestamp(o.Now) / resolveTimestamp(s.Now) respectively to remove
duplication.

In `@server/internal/orchestrator/systemd/dnf_test.go`:
- Around line 82-89: The test loop for toVersion currently only asserts the
returned output when tc.expectedErr is false; update the t.Run block to also
assert NoError on err in the success case so failures show parsing errors
explicitly (use assert.NoError(t, err) when tc.expectedErr is false before
asserting equality on out). Locate the anonymous test function where toVersion
is called and the assertions are made and add the NoError assertion conditioned
on tc.expectedErr.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8fd3c04d-da1a-4783-b9a2-96d4916523b5

📥 Commits

Reviewing files that changed from the base of the PR and between 2d1d84c and cd7c876.

⛔ Files ignored due to path filters (8)
  • api/apiv1/gen/control_plane/service.go is excluded by !**/gen/**
  • api/apiv1/gen/http/control_plane/client/types.go is excluded by !**/gen/**
  • api/apiv1/gen/http/control_plane/server/types.go is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi.json is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi.yaml is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi3.json is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi3.yaml is excluded by !**/gen/**
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (76)
  • .gitignore
  • .golangci.yml
  • Makefile
  • NOTICE.txt
  • api/apiv1/design/database.go
  • changes/unreleased/Added-20260309-165318.yaml
  • docs/development/running-locally.md
  • e2e/backup_restore_test.go
  • e2e/cancel_task_test.go
  • e2e/custom_db_create_test.go
  • e2e/db_create_test.go
  • e2e/db_update_add_node_test.go
  • e2e/failover_test.go
  • e2e/fixture_test.go
  • e2e/fixtures/ansible.cfg
  • e2e/load_test.go
  • e2e/minor_version_upgrade_test.go
  • e2e/service_provisioning_test.go
  • e2e/switchover_test.go
  • e2e/whole_cluster_test.go
  • go.mod
  • hack/dev-env.zsh
  • hack/pgedge-cp-env.plugin.zsh
  • lima/Makefile
  • lima/ansible.cfg
  • lima/circus.ini
  • lima/config.json
  • lima/deploy.yaml
  • lima/inventory.yaml
  • lima/lima-template.yaml
  • lima/roles/generate_test_config/tasks/main.yaml
  • lima/roles/generate_test_config/templates/test_config.yaml.tmpl
  • lima/roles/install_prerequisites/tasks/main.yaml
  • lima/roles/install_prerequisites/vars/main.yaml
  • lima/roles/start_vms/tasks/main.yaml
  • lima/roles/stop_dbs/tasks/main.yaml
  • lima/roles/teardown_vms/tasks/main.yaml
  • lima/run.sh
  • lima/stop-dbs.yaml
  • lima/teardown.yaml
  • lima/vars.yaml
  • server/cmd/root.go
  • server/internal/api/apiv1/convert.go
  • server/internal/api/apiv1/post_init_handlers.go
  • server/internal/api/apiv1/validate.go
  • server/internal/api/apiv1/validate_test.go
  • server/internal/app/app.go
  • server/internal/config/config.go
  • server/internal/database/instance.go
  • server/internal/database/instance_resource.go
  • server/internal/database/instance_store.go
  • server/internal/database/orchestrator.go
  • server/internal/database/service.go
  • server/internal/database/spec.go
  • server/internal/ds/versions.go
  • server/internal/monitor/instance_monitor.go
  • server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/enable_fast_basebackup.yaml
  • server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/minimal_systemd.yaml
  • server/internal/orchestrator/provide.go
  • server/internal/orchestrator/swarm/orchestrator.go
  • server/internal/orchestrator/systemd/client.go
  • server/internal/orchestrator/systemd/dnf.go
  • server/internal/orchestrator/systemd/dnf_test.go
  • server/internal/orchestrator/systemd/orchestrator.go
  • server/internal/orchestrator/systemd/packages.go
  • server/internal/orchestrator/systemd/patroni_config.go
  • server/internal/orchestrator/systemd/patroni_unit.go
  • server/internal/orchestrator/systemd/pgbackrest_restore.go
  • server/internal/orchestrator/systemd/provide.go
  • server/internal/orchestrator/systemd/resources.go
  • server/internal/orchestrator/systemd/unit.go
  • server/internal/postgres/hba/entry.go
  • server/internal/workflows/activities/create_pgbackrest_backup.go
  • server/internal/workflows/activities/update_db_state.go
  • server/internal/workflows/activities/update_planned_instance_states.go
  • server/internal/workflows/delete_database.go
✅ Files skipped from review due to trivial changes (28)
  • e2e/whole_cluster_test.go
  • .gitignore
  • lima/roles/generate_test_config/tasks/main.yaml
  • lima/stop-dbs.yaml
  • e2e/custom_db_create_test.go
  • lima/lima-template.yaml
  • server/cmd/root.go
  • e2e/switchover_test.go
  • e2e/load_test.go
  • lima/ansible.cfg
  • changes/unreleased/Added-20260309-165318.yaml
  • lima/roles/generate_test_config/templates/test_config.yaml.tmpl
  • lima/config.json
  • server/internal/api/apiv1/post_init_handlers.go
  • e2e/minor_version_upgrade_test.go
  • NOTICE.txt
  • server/internal/orchestrator/systemd/resources.go
  • server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/enable_fast_basebackup.yaml
  • lima/roles/teardown_vms/tasks/main.yaml
  • lima/vars.yaml
  • lima/roles/install_prerequisites/vars/main.yaml
  • server/internal/orchestrator/systemd/patroni_unit.go
  • server/internal/database/orchestrator.go
  • lima/inventory.yaml
  • Makefile
  • server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/minimal_systemd.yaml
  • lima/circus.ini
  • server/internal/orchestrator/systemd/pgbackrest_restore.go
🚧 Files skipped from review as they are similar to previous changes (26)
  • lima/teardown.yaml
  • e2e/fixtures/ansible.cfg
  • e2e/db_create_test.go
  • server/internal/orchestrator/provide.go
  • e2e/service_provisioning_test.go
  • server/internal/workflows/activities/update_db_state.go
  • e2e/failover_test.go
  • e2e/cancel_task_test.go
  • server/internal/workflows/delete_database.go
  • lima/roles/start_vms/tasks/main.yaml
  • api/apiv1/design/database.go
  • lima/deploy.yaml
  • server/internal/workflows/activities/update_planned_instance_states.go
  • server/internal/database/spec.go
  • server/internal/postgres/hba/entry.go
  • server/internal/api/apiv1/convert.go
  • server/internal/orchestrator/systemd/provide.go
  • server/internal/database/instance.go
  • server/internal/ds/versions.go
  • lima/roles/install_prerequisites/tasks/main.yaml
  • e2e/fixture_test.go
  • server/internal/workflows/activities/create_pgbackrest_backup.go
  • server/internal/orchestrator/systemd/unit.go
  • server/internal/database/service.go
  • server/internal/orchestrator/systemd/packages.go
  • e2e/backup_restore_test.go

@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch from cd7c876 to a574a99 Compare April 1, 2026 16:13
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/internal/orchestrator/swarm/orchestrator.go (1)

587-636: ⚠️ Potential issue | 🟠 Major

Honor the supplied port overrides here.

server/internal/database/service.go and server/internal/database/instance_resource.go now pass the desired Postgres/Patroni ports into this method, but this implementation still derives ClientPort from container inspection and hardcodes PatroniPort to 8888. After a port change, the Swarm path can keep returning stale or zero connection info until the service is recreated.

Possible fix
 	var clientPort int
-	// Some configurations won't expose the database port directly, such as
-	// those that use a pooler or load balancer in front of Postgres.
-	binding, ok := inspect.NetworkSettings.Ports[dbPort]
-	if ok && len(binding) > 0 {
+	if postgresPort != nil && *postgresPort > 0 {
+		clientPort = *postgresPort
+	} else if binding, ok := inspect.NetworkSettings.Ports[dbPort]; ok && len(binding) > 0 {
 		p, err := strconv.Atoi(binding[0].HostPort)
 		if err != nil {
 			return nil, fmt.Errorf("failed to parse client port %q: %w", binding[0].HostPort, err)
 		}
 		clientPort = p
 	}
+
+	resolvedPatroniPort := 8888
+	if patroniPort != nil && *patroniPort > 0 {
+		resolvedPatroniPort = *patroniPort
+	}
 
 	return &database.ConnectionInfo{
@@
-		PatroniPort:      8888,
+		PatroniPort:      resolvedPatroniPort,

Based on learnings, server/internal/database/instance_resource.go intentionally rebuilds connection info from desired spec values before Reload() / initializeInstance() completes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/orchestrator/swarm/orchestrator.go` around lines 587 - 636,
GetInstanceConnectionInfo currently ignores the supplied postgresPort and
patroniPort parameters and derives ClientPort from container inspection and
hardcodes PatroniPort to 8888; update GetInstanceConnectionInfo to honor the
pointers passed in: if postgresPort != nil set AdminPort, PeerPort and
ClientPort to *postgresPort (instead of hardcoded 5432 / inspected port), and if
patroniPort != nil set PatroniPort = *patroniPort (otherwise keep the current
defaults/inspection behavior); adjust the creation of the returned
database.ConnectionInfo (fields AdminPort, PeerPort, ClientPort, PatroniPort)
accordingly so callers’ overrides are respected.
🧹 Nitpick comments (5)
server/internal/postgres/hba/entry.go (1)

88-94: Minor: else branch is unreachable dead code.

If net.ParseIP() succeeds (line 81-84), then ip.To16() will always return non-nil because Go internally represents all valid IPs as 16-byte slices. The else branch returning "" can never execute.

This is harmless defensive code, so feel free to keep it or simplify:

♻️ Optional simplification
 	var cidr *net.IPNet
 	if ipv4 := ip.To4(); ipv4 != nil {
 		cidr = &net.IPNet{IP: ipv4, Mask: net.CIDRMask(32, 32)}
-	} else if ipv6 := ip.To16(); ipv6 != nil {
-		cidr = &net.IPNet{IP: ipv6, Mask: net.CIDRMask(128, 128)}
 	} else {
-		return ""
+		cidr = &net.IPNet{IP: ip.To16(), Mask: net.CIDRMask(128, 128)}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/postgres/hba/entry.go` around lines 88 - 94, The trailing
else branch that returns "" after checking ip.To4() and ip.To16() is unreachable
because net.ParseIP guarantees ip.To16() is non-nil for any parsed IP; remove
the unreachable else { return "" } and simplify the logic to set cidr from
ip.To4() when non-nil otherwise use ip.To16() (i.e., fall through to the IPv6
case), ensuring cidr is always assigned for valid parsed IPs and no dead-return
remains; update references around ip, cidr, ip.To4(), and ip.To16() accordingly.
lima/Makefile (1)

18-23: Consider adding CGO_ENABLED=0 for consistency.

The top-level Makefile sets CGO_ENABLED=0 for both dev-build and ci-compose-build, but this target doesn't. For consistent static binary builds across all environments, consider adding it.

Suggested fix
 .PHONY: build
 build:
-	GOOS=linux go build \
+	CGO_ENABLED=0 GOOS=linux go build \
 		-gcflags "all=-N -l" \
 		-o ./pgedge-control-plane \
 		$(shell pwd)/../server
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lima/Makefile` around lines 18 - 23, The build target `build` should set
CGO_ENABLED=0 to match the top-level Makefile's static build behavior; update
the `build` recipe that produces `./pgedge-control-plane` from `$(shell
pwd)/../server` to prefix the go build invocation with `CGO_ENABLED=0` so the
produced binary is built consistently without cgo.
lima/run.sh (2)

39-42: Error message should go to stderr.

The error message on line 40 is written to stdout. For consistency with other error handling in the script (line 51) and shell conventions, redirect to stderr.

Suggested fix
         *)
-            echo "unrecognized hostname ${HOSTNAME}"
+            echo "unrecognized hostname ${HOSTNAME}" >&2
             exit 1
             ;;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lima/run.sh` around lines 39 - 42, The error echo in the default case of the
hostname switch writes to stdout; change it to write to stderr. Locate the case
pattern handling the unknown HOSTNAME (the *) block in run.sh and update the
echo that prints "unrecognized hostname ${HOSTNAME}" to redirect its output to
stderr (consistent with the later error handling approach used on line 51), then
keep the exit 1 as-is.

57-65: Quote variable expansions to prevent word splitting.

Several variable expansions are unquoted. While paths are likely controlled in this dev environment, quoting prevents issues if paths ever contain spaces.

Suggested fix
 # Copy the binary to another location so that we can safely rebuild on the host
 # without disrupting the running server.
-cp ${LIMA_DIR}/pgedge-control-plane /usr/sbin
+cp "${LIMA_DIR}/pgedge-control-plane" /usr/sbin

 # Set environment variable configuration.
 set_env

 # The sed prefixes each output line with the host ID.
 /usr/sbin/pgedge-control-plane run \
-    --config-path ${LIMA_DIR}/config.json \
+    --config-path "${LIMA_DIR}/config.json" \
     2>&1 | sed "s/^/[${PGEDGE_HOST_ID}] /"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lima/run.sh` around lines 57 - 65, The script uses unquoted variable
expansions which can cause word-splitting; update cp and the
pgedge-control-plane invocation to quote paths and the host ID: change cp
${LIMA_DIR}/pgedge-control-plane to cp "${LIMA_DIR}/pgedge-control-plane",
change --config-path ${LIMA_DIR}/config.json to --config-path
"${LIMA_DIR}/config.json", and ensure the PGEDGE_HOST_ID expansion in the sed
expression is safely handled (e.g., use sed "s/^/[${PGEDGE_HOST_ID}] /" but with
PGEDGE_HOST_ID set/quoted where used) so all references to LIMA_DIR and
PGEDGE_HOST_ID are double-quoted to prevent word-splitting; keep the set_env and
pgedge-control-plane run invocation otherwise unchanged.
server/internal/orchestrator/systemd/dnf.go (1)

29-42: Consider logging dnf failures for debugging.

When dnf list --installed fails with output other than "no matching packages", the error includes the output but there's no structured logging. For production debugging, consider logging the failure at warn/error level before returning.

♻️ Optional: Add structured logging for dnf failures
 func (d *Dnf) InstalledPostgresVersions(ctx context.Context) ([]*InstalledPostgres, error) {
 	ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
 	defer cancel()
 
 	args := append([]string{"list", "--installed"}, supportedDnfPackages()...)
 	cmd := exec.CommandContext(ctx, "dnf", args...)
 
 	out, err := cmd.CombinedOutput()
 	if err != nil {
 		if strings.Contains(strings.ToLower(string(out)), "no matching packages to list") {
 			return nil, nil
 		}
+		// Note: Consider injecting a logger if debugging becomes necessary
 		return nil, fmt.Errorf("failed to execute command: %w, output: %s", err, string(out))
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/orchestrator/systemd/dnf.go` around lines 29 - 42, The
Dnf.InstalledPostgresVersions function currently returns an error from
exec.CommandContext(...) without logging details; before returning the wrapped
error in the error branch (after cmd.CombinedOutput and the strings.Contains
check) add a structured log call at warn/error level that records the command,
args (supportedDnfPackages()), the error value and the command output, then
return the same fmt.Errorf(...) as before; if the Dnf struct exposes a logger
field (e.g. d.logger or d.log) use that to log, otherwise use the package logger
you have in this package, ensuring the log message includes "dnf list
--installed" context, err and string(out).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/internal/database/service.go`:
- Around line 525-539: The current code mixes service placement hosts into the
host set used for version selection and compatibility checks; change the logic
to maintain two distinct sets: one (e.g., nodeHostIDs) populated only from
spec.Nodes (used for version computation via host.GreatestCommonDefaultVersion
and for h.Supports(specVersion) checks) and another (e.g., allHostIDs or
hostIDs) populated from both spec.Nodes and spec.Services (used only for
existence validation and passed to s.hostSvc.GetHosts and validateHostIDs).
Update references so host.GreatestCommonDefaultVersion(hosts...) and any
per-host compatibility loops use hosts resolved from nodeHostIDs, while
s.hostSvc.GetHosts(allHostIDs.ToSlice()) and validateHostIDs(allHostIDs, hosts)
continue to verify existence for service placement hosts.
- Around line 298-324: In UpdateInstanceState, do not create and persist a
partial StoredInstance when GetByKey returns storage.ErrNotFound (avoid calling
NewStoredInstance with only InstanceStateUpdateOptions); instead either fetch
the full spec (e.g., load InstanceSpec for opts.DatabaseID/InstanceID and
hydrate Port, PatroniPort, PgEdgeVersion, HostID, NodeName) and then apply opts
before Put, or return a clear ErrInstanceNotFound so callers don’t upsert
incomplete rows; update the flow in Service.UpdateInstanceState (and related
NewStoredInstance/InstanceStateUpdateOptions usage) to prevent persisting
partial records that break GetInstanceConnectionInfo/SystemD.

In `@server/internal/orchestrator/systemd/unit.go`:
- Around line 95-100: The error messages for EnableUnit and RestartUnit
incorrectly reference the file path variable `path` instead of the unit
identifier `r.Name`; update the fmt.Errorf calls inside the block that calls
`client.EnableUnit(ctx, r.Name)` and `client.RestartUnit(ctx, r.Name)` to use
`r.Name` in the formatted message (e.g., "failed to enable unit '%s': %w" and
"failed to restart unit '%s': %w") so the logs report the actual systemd unit
name rather than the file path.

---

Outside diff comments:
In `@server/internal/orchestrator/swarm/orchestrator.go`:
- Around line 587-636: GetInstanceConnectionInfo currently ignores the supplied
postgresPort and patroniPort parameters and derives ClientPort from container
inspection and hardcodes PatroniPort to 8888; update GetInstanceConnectionInfo
to honor the pointers passed in: if postgresPort != nil set AdminPort, PeerPort
and ClientPort to *postgresPort (instead of hardcoded 5432 / inspected port),
and if patroniPort != nil set PatroniPort = *patroniPort (otherwise keep the
current defaults/inspection behavior); adjust the creation of the returned
database.ConnectionInfo (fields AdminPort, PeerPort, ClientPort, PatroniPort)
accordingly so callers’ overrides are respected.

---

Nitpick comments:
In `@lima/Makefile`:
- Around line 18-23: The build target `build` should set CGO_ENABLED=0 to match
the top-level Makefile's static build behavior; update the `build` recipe that
produces `./pgedge-control-plane` from `$(shell pwd)/../server` to prefix the go
build invocation with `CGO_ENABLED=0` so the produced binary is built
consistently without cgo.

In `@lima/run.sh`:
- Around line 39-42: The error echo in the default case of the hostname switch
writes to stdout; change it to write to stderr. Locate the case pattern handling
the unknown HOSTNAME (the *) block in run.sh and update the echo that prints
"unrecognized hostname ${HOSTNAME}" to redirect its output to stderr (consistent
with the later error handling approach used on line 51), then keep the exit 1
as-is.
- Around line 57-65: The script uses unquoted variable expansions which can
cause word-splitting; update cp and the pgedge-control-plane invocation to quote
paths and the host ID: change cp ${LIMA_DIR}/pgedge-control-plane to cp
"${LIMA_DIR}/pgedge-control-plane", change --config-path ${LIMA_DIR}/config.json
to --config-path "${LIMA_DIR}/config.json", and ensure the PGEDGE_HOST_ID
expansion in the sed expression is safely handled (e.g., use sed
"s/^/[${PGEDGE_HOST_ID}] /" but with PGEDGE_HOST_ID set/quoted where used) so
all references to LIMA_DIR and PGEDGE_HOST_ID are double-quoted to prevent
word-splitting; keep the set_env and pgedge-control-plane run invocation
otherwise unchanged.

In `@server/internal/orchestrator/systemd/dnf.go`:
- Around line 29-42: The Dnf.InstalledPostgresVersions function currently
returns an error from exec.CommandContext(...) without logging details; before
returning the wrapped error in the error branch (after cmd.CombinedOutput and
the strings.Contains check) add a structured log call at warn/error level that
records the command, args (supportedDnfPackages()), the error value and the
command output, then return the same fmt.Errorf(...) as before; if the Dnf
struct exposes a logger field (e.g. d.logger or d.log) use that to log,
otherwise use the package logger you have in this package, ensuring the log
message includes "dnf list --installed" context, err and string(out).

In `@server/internal/postgres/hba/entry.go`:
- Around line 88-94: The trailing else branch that returns "" after checking
ip.To4() and ip.To16() is unreachable because net.ParseIP guarantees ip.To16()
is non-nil for any parsed IP; remove the unreachable else { return "" } and
simplify the logic to set cidr from ip.To4() when non-nil otherwise use
ip.To16() (i.e., fall through to the IPv6 case), ensuring cidr is always
assigned for valid parsed IPs and no dead-return remains; update references
around ip, cidr, ip.To4(), and ip.To16() accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8c861282-bd6b-4a57-b473-0f31a1d912f7

📥 Commits

Reviewing files that changed from the base of the PR and between cd7c876 and a574a99.

⛔ Files ignored due to path filters (8)
  • api/apiv1/gen/control_plane/service.go is excluded by !**/gen/**
  • api/apiv1/gen/http/control_plane/client/types.go is excluded by !**/gen/**
  • api/apiv1/gen/http/control_plane/server/types.go is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi.json is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi.yaml is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi3.json is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi3.yaml is excluded by !**/gen/**
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (76)
  • .gitignore
  • .golangci.yml
  • Makefile
  • NOTICE.txt
  • api/apiv1/design/database.go
  • changes/unreleased/Added-20260309-165318.yaml
  • docs/development/running-locally.md
  • e2e/backup_restore_test.go
  • e2e/cancel_task_test.go
  • e2e/custom_db_create_test.go
  • e2e/db_create_test.go
  • e2e/db_update_add_node_test.go
  • e2e/failover_test.go
  • e2e/fixture_test.go
  • e2e/fixtures/ansible.cfg
  • e2e/load_test.go
  • e2e/minor_version_upgrade_test.go
  • e2e/service_provisioning_test.go
  • e2e/switchover_test.go
  • e2e/whole_cluster_test.go
  • go.mod
  • hack/dev-env.zsh
  • hack/pgedge-cp-env.plugin.zsh
  • lima/Makefile
  • lima/ansible.cfg
  • lima/circus.ini
  • lima/config.json
  • lima/deploy.yaml
  • lima/inventory.yaml
  • lima/lima-template.yaml
  • lima/roles/generate_test_config/tasks/main.yaml
  • lima/roles/generate_test_config/templates/test_config.yaml.tmpl
  • lima/roles/install_prerequisites/tasks/main.yaml
  • lima/roles/install_prerequisites/vars/main.yaml
  • lima/roles/start_vms/tasks/main.yaml
  • lima/roles/stop_dbs/tasks/main.yaml
  • lima/roles/teardown_vms/tasks/main.yaml
  • lima/run.sh
  • lima/stop-dbs.yaml
  • lima/teardown.yaml
  • lima/vars.yaml
  • server/cmd/root.go
  • server/internal/api/apiv1/convert.go
  • server/internal/api/apiv1/post_init_handlers.go
  • server/internal/api/apiv1/validate.go
  • server/internal/api/apiv1/validate_test.go
  • server/internal/app/app.go
  • server/internal/config/config.go
  • server/internal/database/instance.go
  • server/internal/database/instance_resource.go
  • server/internal/database/instance_store.go
  • server/internal/database/orchestrator.go
  • server/internal/database/service.go
  • server/internal/database/spec.go
  • server/internal/ds/versions.go
  • server/internal/monitor/instance_monitor.go
  • server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/enable_fast_basebackup.yaml
  • server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/minimal_systemd.yaml
  • server/internal/orchestrator/provide.go
  • server/internal/orchestrator/swarm/orchestrator.go
  • server/internal/orchestrator/systemd/client.go
  • server/internal/orchestrator/systemd/dnf.go
  • server/internal/orchestrator/systemd/dnf_test.go
  • server/internal/orchestrator/systemd/orchestrator.go
  • server/internal/orchestrator/systemd/packages.go
  • server/internal/orchestrator/systemd/patroni_config.go
  • server/internal/orchestrator/systemd/patroni_unit.go
  • server/internal/orchestrator/systemd/pgbackrest_restore.go
  • server/internal/orchestrator/systemd/provide.go
  • server/internal/orchestrator/systemd/resources.go
  • server/internal/orchestrator/systemd/unit.go
  • server/internal/postgres/hba/entry.go
  • server/internal/workflows/activities/create_pgbackrest_backup.go
  • server/internal/workflows/activities/update_db_state.go
  • server/internal/workflows/activities/update_planned_instance_states.go
  • server/internal/workflows/delete_database.go
✅ Files skipped from review due to trivial changes (28)
  • .golangci.yml
  • .gitignore
  • server/cmd/root.go
  • lima/stop-dbs.yaml
  • changes/unreleased/Added-20260309-165318.yaml
  • e2e/custom_db_create_test.go
  • lima/ansible.cfg
  • server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/enable_fast_basebackup.yaml
  • lima/teardown.yaml
  • server/internal/app/app.go
  • lima/lima-template.yaml
  • server/internal/orchestrator/systemd/resources.go
  • lima/roles/generate_test_config/templates/test_config.yaml.tmpl
  • e2e/minor_version_upgrade_test.go
  • e2e/db_create_test.go
  • lima/config.json
  • server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/minimal_systemd.yaml
  • server/internal/orchestrator/systemd/dnf_test.go
  • lima/roles/teardown_vms/tasks/main.yaml
  • lima/roles/start_vms/tasks/main.yaml
  • lima/vars.yaml
  • lima/roles/install_prerequisites/vars/main.yaml
  • server/internal/orchestrator/systemd/patroni_unit.go
  • lima/roles/install_prerequisites/tasks/main.yaml
  • lima/roles/stop_dbs/tasks/main.yaml
  • lima/inventory.yaml
  • lima/circus.ini
  • server/internal/orchestrator/systemd/pgbackrest_restore.go
🚧 Files skipped from review as they are similar to previous changes (26)
  • e2e/fixtures/ansible.cfg
  • e2e/whole_cluster_test.go
  • e2e/cancel_task_test.go
  • lima/roles/generate_test_config/tasks/main.yaml
  • server/internal/workflows/activities/update_db_state.go
  • e2e/load_test.go
  • server/internal/workflows/activities/update_planned_instance_states.go
  • e2e/failover_test.go
  • server/internal/api/apiv1/post_init_handlers.go
  • e2e/service_provisioning_test.go
  • lima/deploy.yaml
  • server/internal/monitor/instance_monitor.go
  • server/internal/database/spec.go
  • api/apiv1/design/database.go
  • server/internal/api/apiv1/convert.go
  • hack/pgedge-cp-env.plugin.zsh
  • server/internal/ds/versions.go
  • server/internal/workflows/delete_database.go
  • e2e/backup_restore_test.go
  • server/internal/config/config.go
  • server/internal/api/apiv1/validate_test.go
  • server/internal/orchestrator/systemd/packages.go
  • server/internal/orchestrator/systemd/patroni_config.go
  • server/internal/api/apiv1/validate.go
  • server/internal/orchestrator/systemd/client.go
  • server/internal/orchestrator/systemd/orchestrator.go

@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch from a574a99 to 940cc99 Compare April 2, 2026 13:29
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/internal/database/spec.go (1)

560-580: ⚠️ Potential issue | 🟠 Major

Clone() drops InPlaceRestore state

InstanceSpec.Clone() now copies AllHostIDs (Line 579), but it still does not copy InPlaceRestore (defined at Line 526). This can silently flip cloned specs from true to false.

Proposed fix
 	return &InstanceSpec{
 		InstanceID:       s.InstanceID,
 		TenantID:         utils.ClonePointer(s.TenantID),
 		DatabaseID:       s.DatabaseID,
 		HostID:           s.HostID,
 		DatabaseName:     s.DatabaseName,
 		NodeName:         s.NodeName,
 		NodeOrdinal:      s.NodeOrdinal,
 		PgEdgeVersion:    s.PgEdgeVersion.Clone(),
 		Port:             utils.ClonePointer(s.Port),
 		PatroniPort:      utils.ClonePointer(s.PatroniPort),
 		CPUs:             s.CPUs,
 		MemoryBytes:      s.MemoryBytes,
 		DatabaseUsers:    users,
 		BackupConfig:     s.BackupConfig.Clone(),
 		RestoreConfig:    s.RestoreConfig.Clone(),
 		PostgreSQLConf:   maps.Clone(s.PostgreSQLConf),
 		ClusterSize:      s.ClusterSize,
 		OrchestratorOpts: s.OrchestratorOpts.Clone(),
+		InPlaceRestore:   s.InPlaceRestore,
 		AllHostIDs:       slices.Clone(s.AllHostIDs),
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/spec.go` around lines 560 - 580,
InstanceSpec.Clone() currently omits the InPlaceRestore field so cloned specs
lose that state; update the Clone method to copy InPlaceRestore (e.g. set
InPlaceRestore: utils.ClonePointer(s.InPlaceRestore) if it's a *bool or
InPlaceRestore: s.InPlaceRestore if it's a value) alongside the other fields
(InstanceSpec, TenantID, AllHostIDs, etc.) so clones preserve the original
InPlaceRestore value.
♻️ Duplicate comments (1)
go.mod (1)

12-12: ⚠️ Potential issue | 🟠 Major

github.com/docker/docker is still pinned to the OSV-flagged release.

Line 12 remains on v27.1.1+incompatible, so the HIGH-severity Docker Engine advisories already called out on this PR are still present. Please move to a patched version and refresh module metadata before merge.

Verify the pinned version and current OSV results with:

#!/bin/bash
set -euo pipefail

rg -n 'github.com/docker/docker' go.mod

version=$(awk '/github.com\/docker\/docker/{print $2}' go.mod)

response=$(
  jq -n --arg version "$version" '{
    version: $version,
    package: {
      ecosystem: "Go",
      name: "github.com/docker/docker"
    }
  }' |
  curl -s https://api.osv.dev/v1/query \
    -H 'Content-Type: application/json' \
    --data `@-`
)

echo "$response" | jq --arg version "$version" '{version: $version, vulns: [.vulns[]?.id]}'

This should return a non-empty vulns array for the currently pinned version.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go.mod` at line 12, The go.mod entry for the Docker module
("github.com/docker/docker" currently pinned as v27.1.1+incompatible) is on an
OSV-flagged release; update that module to a patched release (select the latest
non-vulnerable tag) and refresh module metadata so the vulnerability is
resolved. Change the version token for "github.com/docker/docker" in go.mod to
the patched version, run the module update/cleanup (e.g., fetch the new version
and run module tidy) and re-run the provided OSV verification snippet to confirm
no vulns remain; reference the go.mod line for "github.com/docker/docker" when
making the change. Ensure go.sum is updated and CI passes before merging.
🧹 Nitpick comments (2)
server/internal/database/spec.go (1)

652-674: Avoid sharing the same AllHostIDs slice across all instances

At Line 673, every InstanceSpec gets the same backing slice (allHostIDs). A later in-place mutation on one instance can leak into others. Safer to clone on assignment.

Proposed refactor
-				AllHostIDs:       allHostIDs,
+				AllHostIDs:       slices.Clone(allHostIDs),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/spec.go` around lines 652 - 674, The code assigns
the same backing slice allHostIDs into every InstanceSpec.AllHostIDs, risking
cross-instance mutation; in the block that constructs InstanceSpec (the
instances[...] = &InstanceSpec{...}) make a defensive copy of allHostIDs (e.g.,
create a new slice and copy the elements) and assign that copy to AllHostIDs so
each InstanceSpec has its own slice; update the construction for InstanceSpec to
use the cloned slice rather than the shared allHostIDs.
server/internal/config/config.go (1)

121-131: Minor: Inconsistent error message formatting.

The error messages here omit the field prefix colon used elsewhere in the codebase (e.g., "bind_addr cannot be empty" vs "bind_addr: cannot be empty"). This is a minor inconsistency but doesn't affect functionality.

♻️ Optional: Align error message format
 func (s SystemD) validate() []error {
 	var errs []error
 	if s.PgBackRestPath == "" {
-		errs = append(errs, errors.New("pgbackrest_path cannot be empty"))
+		errs = append(errs, errors.New("pgbackrest_path: cannot be empty"))
 	}
 	if s.PatroniPath == "" {
-		errs = append(errs, errors.New("patroni_path cannot be empty"))
+		errs = append(errs, errors.New("patroni_path: cannot be empty"))
 	}
 
 	return errs
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/config/config.go` around lines 121 - 131, Update
SystemD.validate() to make its error messages consistent with the rest of the
codebase by adding the colon after the field name; specifically, change the
errors produced for PgBackRestPath and PatroniPath in the validate method
(SystemD.validate) from "pgbackrest_path cannot be empty" and "patroni_path
cannot be empty" to "pgbackrest_path: cannot be empty" and "patroni_path: cannot
be empty" so they match the existing "bind_addr: cannot be empty" style.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/development/running-locally.md`:
- Around line 32-36: The prerequisites list is contradictory about Restish;
update the "Restish" bullet in docs/development/running-locally.md to clearly
indicate whether it's optional (recommended) or required and make it consistent
with the later "Restish" section — e.g., change the list entry to "Restish
(optional, recommended) — see the Restish section below for usage and
configuration" or remove it from the required list and add a short parenthetical
that points to the Restish section for optional setup details so both places
convey the same requirement level.

In `@hack/dev-env.zsh`:
- Around line 219-222: The code extracts ip_addr and port from conn_info using
jq but doesn't handle missing values, so validate after parsing that local
variables ip_addr and port are non-empty (and not "null") before they are used
to build the psql connection; if either is missing, print a clear error and
exit. Locate the local ip_addr/port assignments in hack/dev-env.zsh (the
conn_info parsing block) and add a guard that checks the parsed values (e.g.,
test -z or comparison against "null") and fails with a friendly message
explaining connection_info is incomplete and which field is missing.

In `@lima/Makefile`:
- Around line 20-23: The linux cross-build invocation using "GOOS=linux go build
-gcflags \"all=-N -l\" -o ./pgedge-control-plane $(shell pwd)/../server" should
set CGO_ENABLED=0 to disable cgo during cross-compilation; update that build
line to export CGO_ENABLED=0 alongside GOOS (i.e., prepend or include
CGO_ENABLED=0) so the target that produces ./pgedge-control-plane matches other
Linux build targets and won't fail on macOS when cgo dependencies exist.

In `@lima/run.sh`:
- Around line 57-64: The script uses unquoted variable expansions (notably
LIMA_DIR) which will break on paths containing spaces; update all occurrences to
use quoted expansions — e.g., quote LIMA_DIR in the cp invocation (cp
"${LIMA_DIR}/pgedge-control-plane" /usr/sbin), quote assignments created by
set_env (ensure variables set in that function are quoted when expanded), and
quote the --config-path argument passed to pgedge-control-plane run
(--config-path "${LIMA_DIR}/config.json"); search for other uses of LIMA_DIR and
any other path variables in this file and wrap them in double quotes to prevent
word splitting.

In `@server/internal/orchestrator/systemd/orchestrator.go`:
- Around line 482-489: The validation currently only checks ports independently;
add a check in the same validation block (where validatePort is called for
prevPort/prevPatroniPort and ch.Current.Port/ch.Current.PatroniPort) to reject
configurations where ch.Current.Port == ch.Current.PatroniPort (and likewise
compare prevPort vs prevPatroniPort if relevant), set result.Valid = false and
append a clear error via result.Errors (e.g., "postgres and patroni ports must
not be identical"), so GenerateInstanceResources / PatroniConfigGeneratorOptions
won't receive identical ports.

In `@server/internal/orchestrator/systemd/pgbackrest_restore.go`:
- Around line 107-115: The restore currently calls runRestoreCmd before
renameDataDir which causes the restore command (built against
Paths.Instance.PgData()) to write into the live data directory or have the
restored data immediately moved out; swap the calls so renameDataDir() is
invoked before runRestoreCmd() to ensure Paths.Instance.PgData() points at the
intended pgdata-restore location (PgDataRestore()) when building/executing the
restore, i.e., call p.renameDataDir(fs) before p.runRestoreCmd(ctx, orch,
logger, taskSvc) and keep subsequent StartInstance usage unchanged.

---

Outside diff comments:
In `@server/internal/database/spec.go`:
- Around line 560-580: InstanceSpec.Clone() currently omits the InPlaceRestore
field so cloned specs lose that state; update the Clone method to copy
InPlaceRestore (e.g. set InPlaceRestore: utils.ClonePointer(s.InPlaceRestore) if
it's a *bool or InPlaceRestore: s.InPlaceRestore if it's a value) alongside the
other fields (InstanceSpec, TenantID, AllHostIDs, etc.) so clones preserve the
original InPlaceRestore value.

---

Duplicate comments:
In `@go.mod`:
- Line 12: The go.mod entry for the Docker module ("github.com/docker/docker"
currently pinned as v27.1.1+incompatible) is on an OSV-flagged release; update
that module to a patched release (select the latest non-vulnerable tag) and
refresh module metadata so the vulnerability is resolved. Change the version
token for "github.com/docker/docker" in go.mod to the patched version, run the
module update/cleanup (e.g., fetch the new version and run module tidy) and
re-run the provided OSV verification snippet to confirm no vulns remain;
reference the go.mod line for "github.com/docker/docker" when making the change.
Ensure go.sum is updated and CI passes before merging.

---

Nitpick comments:
In `@server/internal/config/config.go`:
- Around line 121-131: Update SystemD.validate() to make its error messages
consistent with the rest of the codebase by adding the colon after the field
name; specifically, change the errors produced for PgBackRestPath and
PatroniPath in the validate method (SystemD.validate) from "pgbackrest_path
cannot be empty" and "patroni_path cannot be empty" to "pgbackrest_path: cannot
be empty" and "patroni_path: cannot be empty" so they match the existing
"bind_addr: cannot be empty" style.

In `@server/internal/database/spec.go`:
- Around line 652-674: The code assigns the same backing slice allHostIDs into
every InstanceSpec.AllHostIDs, risking cross-instance mutation; in the block
that constructs InstanceSpec (the instances[...] = &InstanceSpec{...}) make a
defensive copy of allHostIDs (e.g., create a new slice and copy the elements)
and assign that copy to AllHostIDs so each InstanceSpec has its own slice;
update the construction for InstanceSpec to use the cloned slice rather than the
shared allHostIDs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d2118d17-56c9-4c4d-affe-8a0380f16edd

📥 Commits

Reviewing files that changed from the base of the PR and between a574a99 and 940cc99.

⛔ Files ignored due to path filters (8)
  • api/apiv1/gen/control_plane/service.go is excluded by !**/gen/**
  • api/apiv1/gen/http/control_plane/client/types.go is excluded by !**/gen/**
  • api/apiv1/gen/http/control_plane/server/types.go is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi.json is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi.yaml is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi3.json is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi3.yaml is excluded by !**/gen/**
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (76)
  • .gitignore
  • .golangci.yml
  • Makefile
  • NOTICE.txt
  • api/apiv1/design/database.go
  • changes/unreleased/Added-20260309-165318.yaml
  • docs/development/running-locally.md
  • e2e/backup_restore_test.go
  • e2e/cancel_task_test.go
  • e2e/custom_db_create_test.go
  • e2e/db_create_test.go
  • e2e/db_update_add_node_test.go
  • e2e/failover_test.go
  • e2e/fixture_test.go
  • e2e/fixtures/ansible.cfg
  • e2e/load_test.go
  • e2e/minor_version_upgrade_test.go
  • e2e/service_provisioning_test.go
  • e2e/switchover_test.go
  • e2e/whole_cluster_test.go
  • go.mod
  • hack/dev-env.zsh
  • hack/pgedge-cp-env.plugin.zsh
  • lima/Makefile
  • lima/ansible.cfg
  • lima/circus.ini
  • lima/config.json
  • lima/deploy.yaml
  • lima/inventory.yaml
  • lima/lima-template.yaml
  • lima/roles/generate_test_config/tasks/main.yaml
  • lima/roles/generate_test_config/templates/test_config.yaml.tmpl
  • lima/roles/install_prerequisites/tasks/main.yaml
  • lima/roles/install_prerequisites/vars/main.yaml
  • lima/roles/start_vms/tasks/main.yaml
  • lima/roles/stop_dbs/tasks/main.yaml
  • lima/roles/teardown_vms/tasks/main.yaml
  • lima/run.sh
  • lima/stop-dbs.yaml
  • lima/teardown.yaml
  • lima/vars.yaml
  • server/cmd/root.go
  • server/internal/api/apiv1/convert.go
  • server/internal/api/apiv1/post_init_handlers.go
  • server/internal/api/apiv1/validate.go
  • server/internal/api/apiv1/validate_test.go
  • server/internal/app/app.go
  • server/internal/config/config.go
  • server/internal/database/instance.go
  • server/internal/database/instance_resource.go
  • server/internal/database/instance_store.go
  • server/internal/database/orchestrator.go
  • server/internal/database/service.go
  • server/internal/database/spec.go
  • server/internal/ds/versions.go
  • server/internal/monitor/instance_monitor.go
  • server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/enable_fast_basebackup.yaml
  • server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/minimal_systemd.yaml
  • server/internal/orchestrator/provide.go
  • server/internal/orchestrator/swarm/orchestrator.go
  • server/internal/orchestrator/systemd/client.go
  • server/internal/orchestrator/systemd/dnf.go
  • server/internal/orchestrator/systemd/dnf_test.go
  • server/internal/orchestrator/systemd/orchestrator.go
  • server/internal/orchestrator/systemd/packages.go
  • server/internal/orchestrator/systemd/patroni_config.go
  • server/internal/orchestrator/systemd/patroni_unit.go
  • server/internal/orchestrator/systemd/pgbackrest_restore.go
  • server/internal/orchestrator/systemd/provide.go
  • server/internal/orchestrator/systemd/resources.go
  • server/internal/orchestrator/systemd/unit.go
  • server/internal/postgres/hba/entry.go
  • server/internal/workflows/activities/create_pgbackrest_backup.go
  • server/internal/workflows/activities/update_db_state.go
  • server/internal/workflows/activities/update_planned_instance_states.go
  • server/internal/workflows/delete_database.go
✅ Files skipped from review due to trivial changes (32)
  • .golangci.yml
  • e2e/whole_cluster_test.go
  • changes/unreleased/Added-20260309-165318.yaml
  • e2e/custom_db_create_test.go
  • lima/ansible.cfg
  • e2e/fixtures/ansible.cfg
  • server/cmd/root.go
  • lima/stop-dbs.yaml
  • e2e/minor_version_upgrade_test.go
  • server/internal/api/apiv1/post_init_handlers.go
  • lima/lima-template.yaml
  • server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/enable_fast_basebackup.yaml
  • lima/roles/generate_test_config/templates/test_config.yaml.tmpl
  • lima/roles/start_vms/tasks/main.yaml
  • server/internal/orchestrator/systemd/resources.go
  • server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/minimal_systemd.yaml
  • lima/config.json
  • lima/deploy.yaml
  • .gitignore
  • api/apiv1/design/database.go
  • server/internal/orchestrator/systemd/dnf_test.go
  • lima/roles/stop_dbs/tasks/main.yaml
  • lima/vars.yaml
  • lima/roles/install_prerequisites/vars/main.yaml
  • lima/roles/generate_test_config/tasks/main.yaml
  • server/internal/orchestrator/systemd/unit.go
  • lima/inventory.yaml
  • lima/roles/install_prerequisites/tasks/main.yaml
  • lima/roles/teardown_vms/tasks/main.yaml
  • lima/circus.ini
  • lima/teardown.yaml
  • e2e/backup_restore_test.go
🚧 Files skipped from review as they are similar to previous changes (21)
  • e2e/db_create_test.go
  • e2e/cancel_task_test.go
  • e2e/switchover_test.go
  • NOTICE.txt
  • server/internal/workflows/delete_database.go
  • server/internal/workflows/activities/update_db_state.go
  • server/internal/api/apiv1/convert.go
  • hack/pgedge-cp-env.plugin.zsh
  • server/internal/workflows/activities/create_pgbackrest_backup.go
  • server/internal/postgres/hba/entry.go
  • server/internal/database/instance.go
  • Makefile
  • server/internal/orchestrator/systemd/patroni_unit.go
  • server/internal/database/instance_store.go
  • server/internal/orchestrator/swarm/orchestrator.go
  • server/internal/database/service.go
  • server/internal/app/app.go
  • server/internal/api/apiv1/validate_test.go
  • server/internal/orchestrator/systemd/patroni_config.go
  • server/internal/orchestrator/systemd/client.go
  • server/internal/api/apiv1/validate.go

@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch from 940cc99 to 490d89c Compare April 2, 2026 17:04
Copy link
Copy Markdown
Contributor

@tsivaprasad tsivaprasad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verification:
Created a Lima dev environment.


> cp1-req get-version
HTTP/1.1 200 OK
Content-Length: 171
Content-Type: application/json
Date: Thu, 02 Apr 2026 20:25:27 GMT

{
  arch: "arm64"
  revision: "490d89c3ddb60c621be308a25d3c0a4809a92313+dirty"
  revision_time: "2026-04-02T17:03:57Z"
  version: "v0.5.1-0.20260402170357-490d89c3ddb6+dirty"
}

Successfully completed E2E tests in the Lima dev environment.
DONE 41 tests, 12 skipped in 605.561s

return err
}

err = client.UnitExists(ctx, r.Name)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we check the unit file on disk instead? Looks UnitExists checks only with loaded units only.

path := filepath.Join(unitsDir, r.Name)
_, fileErr := os.Stat(path)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that makes sense here. StopUnit and DisableUnit are only valid on loaded units, so I think that suggestion would result in an error if the file existed but the unit was not loaded. In the current implementation, we would skip the StopUnit and DisableUnit calls and go right to deleting the file, which is correct.

Adds an orchestrator implementation for SystemD. As of this commit,
you'll need to install Postgres on the host before starting the Control
Plane:

```
dnf install -y epel-release dnf
dnf config-manager --set-enabled crb
dnf update -y --allowerasing
dnf install -y https://dnf.pgedge.com/reporpm/pgedge-release-latest.noarch.rpm
dnf install -y \
      pgedge-postgresql18 \
      pgedge-spock50_18 \
      pgedge-snowflake_18 \
      pgedge-lolor_18 \
      pgedge-postgresql18-contrib \
      pgedge-pgbackrest \
      pgedge-python3-psycopg2 \
      python3-pip
pip install 'patroni[etcd,jsonlogger]==4.1.0'
```

PLAT-417
Adds a development environment for the SystemD orchestrator, called
`dev-lima`. Just like the `compose` environment, this environment
exposes all services and databases on your host machine.
Adds support for running E2Es against the `dev-lima` environment.

PLAT-417
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch from 490d89c to 54ec399 Compare April 3, 2026 19:41
@jason-lynch jason-lynch merged commit 4457115 into main Apr 3, 2026
3 checks passed
@jason-lynch jason-lynch deleted the feat/PLAT-417/systemd branch April 3, 2026 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants