Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a SystemD orchestrator implementation and Lima-based developer environment; extends instance metadata (ports, pgEdge version) and validation; updates orchestrator interfaces, DI wiring, resources, workflows, tests, docs, Make/Ansible/Zsh tooling, and license/module entries. Changes
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
|
5b65dd1 to
7512d78
Compare
0292169 to
fecca31
Compare
3b2cb2d to
1811d33
Compare
8cb89f9 to
c752911
Compare
32a39ff to
1ef89d0
Compare
c752911 to
9c4262c
Compare
32ef891 to
5c8c403
Compare
9c4262c to
e179f4c
Compare
5c8c403 to
79fbfc1
Compare
e179f4c to
40f2797
Compare
79fbfc1 to
b1b5cae
Compare
40f2797 to
0e319c3
Compare
b1b5cae to
3b749f2
Compare
0e319c3 to
900598e
Compare
3b749f2 to
ccf2e10
Compare
900598e to
e181cc5
Compare
ccf2e10 to
7607b74
Compare
e181cc5 to
42824fe
Compare
7607b74 to
09c6ec7
Compare
42824fe to
89dca9e
Compare
40fd983 to
307f1e9
Compare
b56d8cd to
bf6b508
Compare
307f1e9 to
6a1d107
Compare
bf6b508 to
d41571c
Compare
6a1d107 to
8f7d2ca
Compare
d41571c to
26c6108
Compare
Up to standards ✅🟢 Issues
|
| Category | Results |
|---|---|
| Complexity | 13 medium |
🟢 Metrics 153 complexity · 37 duplication
Metric Results Complexity 153 Duplication 37
TIP This summary will be updated as you push new changes. Give us feedback
8f7d2ca to
8a3fa31
Compare
26c6108 to
88440d0
Compare
8a3fa31 to
ab652f8
Compare
88440d0 to
09d6b52
Compare
09d6b52 to
2d1d84c
Compare
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@go.mod`:
- Line 12: Update the dependency declaration for github.com/docker/docker from
v27.1.1 to v29.3.1 (or later) in go.mod to remediate the listed HIGH severity
CVEs; after editing the module version for github.com/docker/docker, run module
resolution (e.g., go mod tidy / go get) and rebuild/tests to ensure
compatibility and update go.sum accordingly, and verify any code using docker
client types/APIs in your repo still compiles against the new version.
In `@server/internal/config/config.go`:
- Around line 368-382: The code appends the same "unsupported orchestrator"
error twice because the early guard checking c.Orchestrator against
OrchestratorSwarm and OrchestratorSystemD and the switch default both report the
same issue; remove the redundant error by eliminating the switch default (or
make the switch only handle OrchestratorSwarm and OrchestratorSystemD cases) so
that the guard is the single source of truth for unsupported orchestrator
reporting—update the switch over c.Orchestrator (and references to
c.DockerSwarm.validate() and c.SystemD.validate()) to not append the default
error.
In `@server/internal/orchestrator/systemd/client.go`:
- Around line 180-194: GetUnitFilePath currently calls prop.Value.String() which
yields a debug text representation; change it to use prop.Value() and
type-assert the result to string (e.g., v, ok := prop.Value().(string)) and
return a clear error if the assertion fails so the path passed to RemoveUnitFile
is the actual filesystem path; update logging to use the asserted string value
and ensure GetUnitFilePath still returns an error when the Variant isn't a
string.
- Around line 84-105: The code is using the integer returned from
StopUnitContext as if it were the unit's process ID; treat that value as the
systemd job ID only, and when wait==true fetch the unit's MainPID (or
ActiveState) via c.conn.GetUnitProperties(name) (or equivalent) and use that
MainPID to call waitForPid, or poll the unit state until it is inactive; update
references in StopUnitContext/awaitJob handling to keep the returned int as
jobID and replace usages of pid for waitForPid with the fetched MainPID (check
for MainPID==0 and fallback to polling ActiveState).
In `@server/internal/orchestrator/systemd/dnf.go`:
- Around line 126-135: The semverRegexp and toVersion must handle optional RPM
epoch prefixes (e.g., "1:1.0.2-1"); update semverRegexp to allow an optional
leading "<digits>:" and capture the real semver (e.g., use a pattern like
^(?:\d+:)?(\d+(?:\.\d+){0,2})), then change toVersion to extract the captured
group (use regexp.FindStringSubmatch on semverRegexp and use the first capture
instead of FindString) so it returns ds.ParseVersion on the semver after the
epoch while preserving the existing invalid-format error path when no match is
found.
In `@server/internal/orchestrator/systemd/orchestrator.go`:
- Around line 218-226: The code in GenerateInstanceResources incorrectly rejects
port value 0 (the valid "assign random port" sentinel); change the checks that
currently treat FromPointer(spec.PatroniPort) == 0 and FromPointer(spec.Port) ==
0 as errors to instead detect a nil/missing pointer and only error in that case.
In other words, use the pointer presence (spec.PatroniPort and spec.Port) to
decide missing vs present, keep patroniPort/postgresPort == 0 as allowed, and
apply the same change to the duplicate checks around the other block referenced
(lines ~497-504); update the error conditions around patroniPort/postgresPort in
GenerateInstanceResources so 0 is not treated as missing.
In `@server/internal/orchestrator/systemd/pgbackrest_restore.go`:
- Around line 122-125: The restore commit (StartInstance) is already
irreversible, so failures from the subsequent bookkeeping call
p.completeTask(ctx, taskSvc, t) must not be escalated via handleError; change
the flow so that after StartInstance returns nil you treat completeTask as
best-effort/non-fatal: catch its error, log it (including error details and task
ID) and do not return handleError or mark the task as Create-failed; likewise
apply the same non-fatal handling for the similar block around lines 159-166 —
ensure you still attempt the update (preferably using a
background/non-cancelable context or retry logic) but never convert its failure
into a retryable restore failure.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0e9c62f6-031b-4e20-90ee-e3a1cf4f777a
⛔ Files ignored due to path filters (8)
api/apiv1/gen/control_plane/service.gois excluded by!**/gen/**api/apiv1/gen/http/control_plane/client/types.gois excluded by!**/gen/**api/apiv1/gen/http/control_plane/server/types.gois excluded by!**/gen/**api/apiv1/gen/http/openapi.jsonis excluded by!**/gen/**api/apiv1/gen/http/openapi.yamlis excluded by!**/gen/**api/apiv1/gen/http/openapi3.jsonis excluded by!**/gen/**api/apiv1/gen/http/openapi3.yamlis excluded by!**/gen/**go.sumis excluded by!**/*.sum
📒 Files selected for processing (74)
.gitignoreMakefileNOTICE.txtapi/apiv1/design/database.gochanges/unreleased/Added-20260309-165318.yamldocs/development/running-locally.mde2e/backup_restore_test.goe2e/cancel_task_test.goe2e/custom_db_create_test.goe2e/db_create_test.goe2e/db_update_add_node_test.goe2e/failover_test.goe2e/fixture_test.goe2e/fixtures/ansible.cfge2e/load_test.goe2e/minor_version_upgrade_test.goe2e/service_provisioning_test.goe2e/switchover_test.goe2e/whole_cluster_test.gogo.modhack/dev-env.zshhack/pgedge-cp-env.plugin.zshlima/Makefilelima/ansible.cfglima/circus.inilima/config.jsonlima/deploy.yamllima/inventory.yamllima/lima-template.yamllima/roles/generate_test_config/tasks/main.yamllima/roles/generate_test_config/templates/test_config.yaml.tmpllima/roles/install_prerequisites/tasks/main.yamllima/roles/install_prerequisites/vars/main.yamllima/roles/start_vms/tasks/main.yamllima/roles/stop_dbs/tasks/main.yamllima/roles/teardown_vms/tasks/main.yamllima/run.shlima/stop-dbs.yamllima/teardown.yamllima/vars.yamlserver/cmd/root.goserver/internal/api/apiv1/convert.goserver/internal/api/apiv1/post_init_handlers.goserver/internal/api/apiv1/validate.goserver/internal/api/apiv1/validate_test.goserver/internal/app/app.goserver/internal/config/config.goserver/internal/database/instance.goserver/internal/database/instance_resource.goserver/internal/database/instance_store.goserver/internal/database/orchestrator.goserver/internal/database/service.goserver/internal/database/spec.goserver/internal/ds/versions.goserver/internal/monitor/instance_monitor.goserver/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/enable_fast_basebackup.yamlserver/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/minimal_systemd.yamlserver/internal/orchestrator/provide.goserver/internal/orchestrator/swarm/orchestrator.goserver/internal/orchestrator/systemd/client.goserver/internal/orchestrator/systemd/dnf.goserver/internal/orchestrator/systemd/orchestrator.goserver/internal/orchestrator/systemd/packages.goserver/internal/orchestrator/systemd/patroni_config.goserver/internal/orchestrator/systemd/patroni_unit.goserver/internal/orchestrator/systemd/pgbackrest_restore.goserver/internal/orchestrator/systemd/provide.goserver/internal/orchestrator/systemd/resources.goserver/internal/orchestrator/systemd/unit.goserver/internal/postgres/hba/entry.goserver/internal/workflows/activities/create_pgbackrest_backup.goserver/internal/workflows/activities/update_db_state.goserver/internal/workflows/activities/update_planned_instance_states.goserver/internal/workflows/delete_database.go
✅ Files skipped from review due to trivial changes (29)
- changes/unreleased/Added-20260309-165318.yaml
- e2e/fixtures/ansible.cfg
- e2e/custom_db_create_test.go
- e2e/whole_cluster_test.go
- e2e/cancel_task_test.go
- e2e/db_update_add_node_test.go
- e2e/load_test.go
- lima/roles/generate_test_config/tasks/main.yaml
- e2e/minor_version_upgrade_test.go
- lima/ansible.cfg
- lima/stop-dbs.yaml
- e2e/failover_test.go
- lima/roles/generate_test_config/templates/test_config.yaml.tmpl
- lima/config.json
- lima/lima-template.yaml
- lima/deploy.yaml
- api/apiv1/design/database.go
- server/internal/workflows/activities/update_db_state.go
- lima/teardown.yaml
- server/internal/database/spec.go
- lima/roles/stop_dbs/tasks/main.yaml
- lima/roles/teardown_vms/tasks/main.yaml
- lima/roles/install_prerequisites/vars/main.yaml
- server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/minimal_systemd.yaml
- NOTICE.txt
- lima/inventory.yaml
- lima/vars.yaml
- lima/circus.ini
- server/internal/orchestrator/systemd/packages.go
🚧 Files skipped from review as they are similar to previous changes (25)
- e2e/service_provisioning_test.go
- server/internal/workflows/delete_database.go
- server/internal/api/apiv1/post_init_handlers.go
- server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/enable_fast_basebackup.yaml
- server/cmd/root.go
- hack/pgedge-cp-env.plugin.zsh
- server/internal/app/app.go
- lima/roles/start_vms/tasks/main.yaml
- e2e/backup_restore_test.go
- server/internal/workflows/activities/create_pgbackrest_backup.go
- server/internal/postgres/hba/entry.go
- server/internal/orchestrator/systemd/resources.go
- Makefile
- e2e/switchover_test.go
- lima/run.sh
- .gitignore
- server/internal/orchestrator/systemd/patroni_unit.go
- server/internal/api/apiv1/validate_test.go
- server/internal/ds/versions.go
- server/internal/database/instance_store.go
- lima/roles/install_prerequisites/tasks/main.yaml
- server/internal/database/orchestrator.go
- e2e/db_create_test.go
- server/internal/database/instance.go
- server/internal/database/service.go
2d1d84c to
cd7c876
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/internal/database/instance_resource.go (1)
224-235:⚠️ Potential issue | 🟠 MajorDon't persist future connection metadata on failure.
Line 250 now uses this helper from
recordError(), and this helper always copiesr.Spec.Port,r.Spec.PatroniPort, andr.Spec.PgEdgeVersioninto the stored instance record. If an update fails before the instance actually switches over, etcd will advertise the new ports/version while the old process is still serving, which will break monitor/retry paths.💡 Possible fix
opts.InstanceID = r.Spec.InstanceID opts.DatabaseID = r.Spec.DatabaseID opts.HostID = r.Spec.HostID opts.NodeName = r.Spec.NodeName - opts.Port = r.Spec.Port - opts.PatroniPort = r.Spec.PatroniPort - opts.PgEdgeVersion = r.Spec.PgEdgeVersion + if opts.State == InstanceStateAvailable { + opts.Port = r.Spec.Port + opts.PatroniPort = r.Spec.PatroniPort + opts.PgEdgeVersion = r.Spec.PgEdgeVersion + }Also applies to: 250-253
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/database/instance_resource.go` around lines 224 - 235, The helper behavior in recordError/updateInstanceRecord currently always copies r.Spec.Port, r.Spec.PatroniPort and r.Spec.PgEdgeVersion into the stored instance record even on failure, which can advertise future connection metadata prematurely; change the logic so updateInstanceRecord (and the recordError helper it calls) do not persist these connection-related fields from r.Spec on error paths — only set Port, PatroniPort and PgEdgeVersion on the instance record after a successful update/hand-off (or read the live values from the existing record when logging errors), and remove the unconditional copying of r.Spec.Port/PatroniPort/PgEdgeVersion in recordError to ensure failed updates do not overwrite the active advertised metadata.
🧹 Nitpick comments (4)
.golangci.yml (1)
17-20: Consider whether disablingstaticcheckentirely for test files is intentional.Adding
staticcheckto the exclusions is broader thanerrcheck—it suppresses many useful checks (unused code, deprecated APIs, sync primitive misuse, etc.) across all test files. If specific checks are causing noise, you could selectively disable them usingstaticcheck's check codes (e.g.,SA4006,ST1000) rather than the entire linter.If this broad exclusion is intentional to reduce friction during development, that's fine—just wanted to flag it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.golangci.yml around lines 17 - 20, The .golangci.yml currently excludes the entire staticcheck linter for test files (the "staticcheck" entry under the "linters" block paired with "path: (.+)_test.go"); instead, either remove "staticcheck" from that exclusion so tests keep full static analysis, or replace the broad suppression by selectively disabling only the noisy staticcheck checks using staticcheck check codes (e.g., SA4006, ST1000) via the linter's config (staticcheck.disable or issues.exclude patterns) targeted at the test file path; update the "linters" block and associated exclusion rules so "errcheck" remains but staticcheck is only suppressed for specific codes rather than entirely.lima/roles/stop_dbs/tasks/main.yaml (1)
2-4: Scopereset-failedto Patroni units.
systemctl reset-failedclears failure state for every unit on the VM. In a dev-lima box that can mask unrelated host issues when you're trying to debug a failed deploy/reset. Limit it topatroni-*so this role only touches the units it owns.Suggested change
-- name: Reset failed services - ansible.builtin.command: systemctl reset-failed +- name: Reset failed Patroni services + ansible.builtin.command: + argv: + - systemctl + - reset-failed + - "patroni-*" changed_when: false🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lima/roles/stop_dbs/tasks/main.yaml` around lines 2 - 4, The task currently runs a global systemctl reset-failed which clears failure state for every unit; scope it to only Patroni units by changing the command from "systemctl reset-failed" to "systemctl reset-failed patroni-*" in the ansible task (the task named "Reset failed services" in lima/roles/stop_dbs/tasks/main.yaml), keeping the existing ansible.builtin.command usage and changed_when: false so only patroni-* units are affected.server/internal/orchestrator/systemd/dnf_test.go (1)
82-89: Consider adding explicitNoErrorassertion for clarity.When
expectedErris false, the test only checks the output value. Adding an explicitassert.NoError(t, err)would make test failures more informative when parsing unexpectedly fails.♻️ Suggested improvement
t.Run(tc.in, func(t *testing.T) { out, err := toVersion(tc.in) if tc.expectedErr { assert.Error(t, err) } else { + assert.NoError(t, err) assert.Equal(t, tc.expected, out) } })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/orchestrator/systemd/dnf_test.go` around lines 82 - 89, The test loop for toVersion currently only asserts the returned output when tc.expectedErr is false; update the t.Run block to also assert NoError on err in the success case so failures show parsing errors explicitly (use assert.NoError(t, err) when tc.expectedErr is false before asserting equality on out). Locate the anonymous test function where toVersion is called and the assertions are made and add the NoError assertion conditioned on tc.expectedErr.server/internal/database/instance_store.go (1)
40-45: Consider extracting duplicatednow()logic.Both
InstanceUpdateOptions.now()andInstanceStateUpdateOptions.now()have identical implementations. This could be extracted into a shared helper to reduce duplication.♻️ Optional: Extract shared helper
func resolveTimestamp(t time.Time) time.Time { if !t.IsZero() { return t } return time.Now() }Then use
resolveTimestamp(o.Now)in both option types.Also applies to: 83-88
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/database/instance_store.go` around lines 40 - 45, Both InstanceUpdateOptions.now() and InstanceStateUpdateOptions.now() have identical logic; extract that into a shared helper (e.g., resolveTimestamp) that takes a time.Time and returns the provided time if not zero or time.Now() otherwise, then replace both InstanceUpdateOptions.now() and InstanceStateUpdateOptions.now() implementations to call resolveTimestamp(o.Now) / resolveTimestamp(s.Now) respectively to remove duplication.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hack/dev-env.zsh`:
- Around line 320-326: Help text for cp-psql is out of sync with the default
selection logic: when CP_ENV==dev-lima the code sets method=local but the helper
function _cp-psql-help() still states the default is "docker exec". Update the
_cp-psql-help() function to reflect the conditional default (show "local" as the
default when CP_ENV=dev-lima and "docker exec" otherwise), using the same CP_ENV
and method logic or interpolating a computed default string so the help text
matches the branch that sets method.
In `@server/internal/api/apiv1/validate.go`:
- Around line 267-276: The current SystemD branch only checks
db.Port/db.PatroniPort vs node.Port/node.PatroniPort separately and misses
cross-scope conflicts for each node; after the existing checks in the
config.OrchestratorSystemD case, iterate each node in db.Nodes and compute the
resolved values (use node.Port if non-nil else db.Port, and node.PatroniPort if
non-nil else db.PatroniPort) and run the same validation that ensures the
effective pair is valid (reuse/call validatePorts or the same validation logic)
and append errors with newValidationError using appendPath(path, "nodes",
nodeName, "port"/"patroni_port") so per-node resolved-port issues are reported.
In `@server/internal/orchestrator/systemd/client.go`:
- Around line 185-200: The UnitExists implementation currently uses
c.conn.ListUnitsContext and thus only detects loaded units; change
Client.UnitExists to call c.conn.ListUnitFilesContext(ctx) and iterate the
returned unit files for a matching name (or name with ".service"/appropriate
suffix) and return nil if found, otherwise return ErrUnitNotFound; update the
logic that currently uses ListUnitsContext (the similar loop later in the file)
the same way, or if you intended to check runtime state instead, rename and
document the method (e.g., UnitIsLoaded) and keep ListUnitsContext there while
providing a new UnitFileExists that uses ListUnitFilesContext; ensure references
to ErrUnitNotFound and callers like GetMainPID keep consistent semantics.
---
Outside diff comments:
In `@server/internal/database/instance_resource.go`:
- Around line 224-235: The helper behavior in recordError/updateInstanceRecord
currently always copies r.Spec.Port, r.Spec.PatroniPort and r.Spec.PgEdgeVersion
into the stored instance record even on failure, which can advertise future
connection metadata prematurely; change the logic so updateInstanceRecord (and
the recordError helper it calls) do not persist these connection-related fields
from r.Spec on error paths — only set Port, PatroniPort and PgEdgeVersion on the
instance record after a successful update/hand-off (or read the live values from
the existing record when logging errors), and remove the unconditional copying
of r.Spec.Port/PatroniPort/PgEdgeVersion in recordError to ensure failed updates
do not overwrite the active advertised metadata.
---
Nitpick comments:
In @.golangci.yml:
- Around line 17-20: The .golangci.yml currently excludes the entire staticcheck
linter for test files (the "staticcheck" entry under the "linters" block paired
with "path: (.+)_test.go"); instead, either remove "staticcheck" from that
exclusion so tests keep full static analysis, or replace the broad suppression
by selectively disabling only the noisy staticcheck checks using staticcheck
check codes (e.g., SA4006, ST1000) via the linter's config (staticcheck.disable
or issues.exclude patterns) targeted at the test file path; update the "linters"
block and associated exclusion rules so "errcheck" remains but staticcheck is
only suppressed for specific codes rather than entirely.
In `@lima/roles/stop_dbs/tasks/main.yaml`:
- Around line 2-4: The task currently runs a global systemctl reset-failed which
clears failure state for every unit; scope it to only Patroni units by changing
the command from "systemctl reset-failed" to "systemctl reset-failed patroni-*"
in the ansible task (the task named "Reset failed services" in
lima/roles/stop_dbs/tasks/main.yaml), keeping the existing
ansible.builtin.command usage and changed_when: false so only patroni-* units
are affected.
In `@server/internal/database/instance_store.go`:
- Around line 40-45: Both InstanceUpdateOptions.now() and
InstanceStateUpdateOptions.now() have identical logic; extract that into a
shared helper (e.g., resolveTimestamp) that takes a time.Time and returns the
provided time if not zero or time.Now() otherwise, then replace both
InstanceUpdateOptions.now() and InstanceStateUpdateOptions.now() implementations
to call resolveTimestamp(o.Now) / resolveTimestamp(s.Now) respectively to remove
duplication.
In `@server/internal/orchestrator/systemd/dnf_test.go`:
- Around line 82-89: The test loop for toVersion currently only asserts the
returned output when tc.expectedErr is false; update the t.Run block to also
assert NoError on err in the success case so failures show parsing errors
explicitly (use assert.NoError(t, err) when tc.expectedErr is false before
asserting equality on out). Locate the anonymous test function where toVersion
is called and the assertions are made and add the NoError assertion conditioned
on tc.expectedErr.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8fd3c04d-da1a-4783-b9a2-96d4916523b5
⛔ Files ignored due to path filters (8)
api/apiv1/gen/control_plane/service.gois excluded by!**/gen/**api/apiv1/gen/http/control_plane/client/types.gois excluded by!**/gen/**api/apiv1/gen/http/control_plane/server/types.gois excluded by!**/gen/**api/apiv1/gen/http/openapi.jsonis excluded by!**/gen/**api/apiv1/gen/http/openapi.yamlis excluded by!**/gen/**api/apiv1/gen/http/openapi3.jsonis excluded by!**/gen/**api/apiv1/gen/http/openapi3.yamlis excluded by!**/gen/**go.sumis excluded by!**/*.sum
📒 Files selected for processing (76)
.gitignore.golangci.ymlMakefileNOTICE.txtapi/apiv1/design/database.gochanges/unreleased/Added-20260309-165318.yamldocs/development/running-locally.mde2e/backup_restore_test.goe2e/cancel_task_test.goe2e/custom_db_create_test.goe2e/db_create_test.goe2e/db_update_add_node_test.goe2e/failover_test.goe2e/fixture_test.goe2e/fixtures/ansible.cfge2e/load_test.goe2e/minor_version_upgrade_test.goe2e/service_provisioning_test.goe2e/switchover_test.goe2e/whole_cluster_test.gogo.modhack/dev-env.zshhack/pgedge-cp-env.plugin.zshlima/Makefilelima/ansible.cfglima/circus.inilima/config.jsonlima/deploy.yamllima/inventory.yamllima/lima-template.yamllima/roles/generate_test_config/tasks/main.yamllima/roles/generate_test_config/templates/test_config.yaml.tmpllima/roles/install_prerequisites/tasks/main.yamllima/roles/install_prerequisites/vars/main.yamllima/roles/start_vms/tasks/main.yamllima/roles/stop_dbs/tasks/main.yamllima/roles/teardown_vms/tasks/main.yamllima/run.shlima/stop-dbs.yamllima/teardown.yamllima/vars.yamlserver/cmd/root.goserver/internal/api/apiv1/convert.goserver/internal/api/apiv1/post_init_handlers.goserver/internal/api/apiv1/validate.goserver/internal/api/apiv1/validate_test.goserver/internal/app/app.goserver/internal/config/config.goserver/internal/database/instance.goserver/internal/database/instance_resource.goserver/internal/database/instance_store.goserver/internal/database/orchestrator.goserver/internal/database/service.goserver/internal/database/spec.goserver/internal/ds/versions.goserver/internal/monitor/instance_monitor.goserver/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/enable_fast_basebackup.yamlserver/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/minimal_systemd.yamlserver/internal/orchestrator/provide.goserver/internal/orchestrator/swarm/orchestrator.goserver/internal/orchestrator/systemd/client.goserver/internal/orchestrator/systemd/dnf.goserver/internal/orchestrator/systemd/dnf_test.goserver/internal/orchestrator/systemd/orchestrator.goserver/internal/orchestrator/systemd/packages.goserver/internal/orchestrator/systemd/patroni_config.goserver/internal/orchestrator/systemd/patroni_unit.goserver/internal/orchestrator/systemd/pgbackrest_restore.goserver/internal/orchestrator/systemd/provide.goserver/internal/orchestrator/systemd/resources.goserver/internal/orchestrator/systemd/unit.goserver/internal/postgres/hba/entry.goserver/internal/workflows/activities/create_pgbackrest_backup.goserver/internal/workflows/activities/update_db_state.goserver/internal/workflows/activities/update_planned_instance_states.goserver/internal/workflows/delete_database.go
✅ Files skipped from review due to trivial changes (28)
- e2e/whole_cluster_test.go
- .gitignore
- lima/roles/generate_test_config/tasks/main.yaml
- lima/stop-dbs.yaml
- e2e/custom_db_create_test.go
- lima/lima-template.yaml
- server/cmd/root.go
- e2e/switchover_test.go
- e2e/load_test.go
- lima/ansible.cfg
- changes/unreleased/Added-20260309-165318.yaml
- lima/roles/generate_test_config/templates/test_config.yaml.tmpl
- lima/config.json
- server/internal/api/apiv1/post_init_handlers.go
- e2e/minor_version_upgrade_test.go
- NOTICE.txt
- server/internal/orchestrator/systemd/resources.go
- server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/enable_fast_basebackup.yaml
- lima/roles/teardown_vms/tasks/main.yaml
- lima/vars.yaml
- lima/roles/install_prerequisites/vars/main.yaml
- server/internal/orchestrator/systemd/patroni_unit.go
- server/internal/database/orchestrator.go
- lima/inventory.yaml
- Makefile
- server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/minimal_systemd.yaml
- lima/circus.ini
- server/internal/orchestrator/systemd/pgbackrest_restore.go
🚧 Files skipped from review as they are similar to previous changes (26)
- lima/teardown.yaml
- e2e/fixtures/ansible.cfg
- e2e/db_create_test.go
- server/internal/orchestrator/provide.go
- e2e/service_provisioning_test.go
- server/internal/workflows/activities/update_db_state.go
- e2e/failover_test.go
- e2e/cancel_task_test.go
- server/internal/workflows/delete_database.go
- lima/roles/start_vms/tasks/main.yaml
- api/apiv1/design/database.go
- lima/deploy.yaml
- server/internal/workflows/activities/update_planned_instance_states.go
- server/internal/database/spec.go
- server/internal/postgres/hba/entry.go
- server/internal/api/apiv1/convert.go
- server/internal/orchestrator/systemd/provide.go
- server/internal/database/instance.go
- server/internal/ds/versions.go
- lima/roles/install_prerequisites/tasks/main.yaml
- e2e/fixture_test.go
- server/internal/workflows/activities/create_pgbackrest_backup.go
- server/internal/orchestrator/systemd/unit.go
- server/internal/database/service.go
- server/internal/orchestrator/systemd/packages.go
- e2e/backup_restore_test.go
cd7c876 to
a574a99
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/internal/orchestrator/swarm/orchestrator.go (1)
587-636:⚠️ Potential issue | 🟠 MajorHonor the supplied port overrides here.
server/internal/database/service.goandserver/internal/database/instance_resource.gonow pass the desired Postgres/Patroni ports into this method, but this implementation still derivesClientPortfrom container inspection and hardcodesPatroniPortto8888. After a port change, the Swarm path can keep returning stale or zero connection info until the service is recreated.Possible fix
var clientPort int - // Some configurations won't expose the database port directly, such as - // those that use a pooler or load balancer in front of Postgres. - binding, ok := inspect.NetworkSettings.Ports[dbPort] - if ok && len(binding) > 0 { + if postgresPort != nil && *postgresPort > 0 { + clientPort = *postgresPort + } else if binding, ok := inspect.NetworkSettings.Ports[dbPort]; ok && len(binding) > 0 { p, err := strconv.Atoi(binding[0].HostPort) if err != nil { return nil, fmt.Errorf("failed to parse client port %q: %w", binding[0].HostPort, err) } clientPort = p } + + resolvedPatroniPort := 8888 + if patroniPort != nil && *patroniPort > 0 { + resolvedPatroniPort = *patroniPort + } return &database.ConnectionInfo{ @@ - PatroniPort: 8888, + PatroniPort: resolvedPatroniPort,Based on learnings,
server/internal/database/instance_resource.gointentionally rebuilds connection info from desired spec values beforeReload()/initializeInstance()completes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/orchestrator/swarm/orchestrator.go` around lines 587 - 636, GetInstanceConnectionInfo currently ignores the supplied postgresPort and patroniPort parameters and derives ClientPort from container inspection and hardcodes PatroniPort to 8888; update GetInstanceConnectionInfo to honor the pointers passed in: if postgresPort != nil set AdminPort, PeerPort and ClientPort to *postgresPort (instead of hardcoded 5432 / inspected port), and if patroniPort != nil set PatroniPort = *patroniPort (otherwise keep the current defaults/inspection behavior); adjust the creation of the returned database.ConnectionInfo (fields AdminPort, PeerPort, ClientPort, PatroniPort) accordingly so callers’ overrides are respected.
🧹 Nitpick comments (5)
server/internal/postgres/hba/entry.go (1)
88-94: Minor:elsebranch is unreachable dead code.If
net.ParseIP()succeeds (line 81-84), thenip.To16()will always return non-nil because Go internally represents all valid IPs as 16-byte slices. Theelsebranch returning""can never execute.This is harmless defensive code, so feel free to keep it or simplify:
♻️ Optional simplification
var cidr *net.IPNet if ipv4 := ip.To4(); ipv4 != nil { cidr = &net.IPNet{IP: ipv4, Mask: net.CIDRMask(32, 32)} - } else if ipv6 := ip.To16(); ipv6 != nil { - cidr = &net.IPNet{IP: ipv6, Mask: net.CIDRMask(128, 128)} } else { - return "" + cidr = &net.IPNet{IP: ip.To16(), Mask: net.CIDRMask(128, 128)} }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/postgres/hba/entry.go` around lines 88 - 94, The trailing else branch that returns "" after checking ip.To4() and ip.To16() is unreachable because net.ParseIP guarantees ip.To16() is non-nil for any parsed IP; remove the unreachable else { return "" } and simplify the logic to set cidr from ip.To4() when non-nil otherwise use ip.To16() (i.e., fall through to the IPv6 case), ensuring cidr is always assigned for valid parsed IPs and no dead-return remains; update references around ip, cidr, ip.To4(), and ip.To16() accordingly.lima/Makefile (1)
18-23: Consider addingCGO_ENABLED=0for consistency.The top-level
MakefilesetsCGO_ENABLED=0for bothdev-buildandci-compose-build, but this target doesn't. For consistent static binary builds across all environments, consider adding it.Suggested fix
.PHONY: build build: - GOOS=linux go build \ + CGO_ENABLED=0 GOOS=linux go build \ -gcflags "all=-N -l" \ -o ./pgedge-control-plane \ $(shell pwd)/../server🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lima/Makefile` around lines 18 - 23, The build target `build` should set CGO_ENABLED=0 to match the top-level Makefile's static build behavior; update the `build` recipe that produces `./pgedge-control-plane` from `$(shell pwd)/../server` to prefix the go build invocation with `CGO_ENABLED=0` so the produced binary is built consistently without cgo.lima/run.sh (2)
39-42: Error message should go to stderr.The error message on line 40 is written to stdout. For consistency with other error handling in the script (line 51) and shell conventions, redirect to stderr.
Suggested fix
*) - echo "unrecognized hostname ${HOSTNAME}" + echo "unrecognized hostname ${HOSTNAME}" >&2 exit 1 ;;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lima/run.sh` around lines 39 - 42, The error echo in the default case of the hostname switch writes to stdout; change it to write to stderr. Locate the case pattern handling the unknown HOSTNAME (the *) block in run.sh and update the echo that prints "unrecognized hostname ${HOSTNAME}" to redirect its output to stderr (consistent with the later error handling approach used on line 51), then keep the exit 1 as-is.
57-65: Quote variable expansions to prevent word splitting.Several variable expansions are unquoted. While paths are likely controlled in this dev environment, quoting prevents issues if paths ever contain spaces.
Suggested fix
# Copy the binary to another location so that we can safely rebuild on the host # without disrupting the running server. -cp ${LIMA_DIR}/pgedge-control-plane /usr/sbin +cp "${LIMA_DIR}/pgedge-control-plane" /usr/sbin # Set environment variable configuration. set_env # The sed prefixes each output line with the host ID. /usr/sbin/pgedge-control-plane run \ - --config-path ${LIMA_DIR}/config.json \ + --config-path "${LIMA_DIR}/config.json" \ 2>&1 | sed "s/^/[${PGEDGE_HOST_ID}] /"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lima/run.sh` around lines 57 - 65, The script uses unquoted variable expansions which can cause word-splitting; update cp and the pgedge-control-plane invocation to quote paths and the host ID: change cp ${LIMA_DIR}/pgedge-control-plane to cp "${LIMA_DIR}/pgedge-control-plane", change --config-path ${LIMA_DIR}/config.json to --config-path "${LIMA_DIR}/config.json", and ensure the PGEDGE_HOST_ID expansion in the sed expression is safely handled (e.g., use sed "s/^/[${PGEDGE_HOST_ID}] /" but with PGEDGE_HOST_ID set/quoted where used) so all references to LIMA_DIR and PGEDGE_HOST_ID are double-quoted to prevent word-splitting; keep the set_env and pgedge-control-plane run invocation otherwise unchanged.server/internal/orchestrator/systemd/dnf.go (1)
29-42: Consider logging dnf failures for debugging.When
dnf list --installedfails with output other than "no matching packages", the error includes the output but there's no structured logging. For production debugging, consider logging the failure at warn/error level before returning.♻️ Optional: Add structured logging for dnf failures
func (d *Dnf) InstalledPostgresVersions(ctx context.Context) ([]*InstalledPostgres, error) { ctx, cancel := context.WithTimeout(ctx, 10*time.Second) defer cancel() args := append([]string{"list", "--installed"}, supportedDnfPackages()...) cmd := exec.CommandContext(ctx, "dnf", args...) out, err := cmd.CombinedOutput() if err != nil { if strings.Contains(strings.ToLower(string(out)), "no matching packages to list") { return nil, nil } + // Note: Consider injecting a logger if debugging becomes necessary return nil, fmt.Errorf("failed to execute command: %w, output: %s", err, string(out)) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/orchestrator/systemd/dnf.go` around lines 29 - 42, The Dnf.InstalledPostgresVersions function currently returns an error from exec.CommandContext(...) without logging details; before returning the wrapped error in the error branch (after cmd.CombinedOutput and the strings.Contains check) add a structured log call at warn/error level that records the command, args (supportedDnfPackages()), the error value and the command output, then return the same fmt.Errorf(...) as before; if the Dnf struct exposes a logger field (e.g. d.logger or d.log) use that to log, otherwise use the package logger you have in this package, ensuring the log message includes "dnf list --installed" context, err and string(out).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/internal/database/service.go`:
- Around line 525-539: The current code mixes service placement hosts into the
host set used for version selection and compatibility checks; change the logic
to maintain two distinct sets: one (e.g., nodeHostIDs) populated only from
spec.Nodes (used for version computation via host.GreatestCommonDefaultVersion
and for h.Supports(specVersion) checks) and another (e.g., allHostIDs or
hostIDs) populated from both spec.Nodes and spec.Services (used only for
existence validation and passed to s.hostSvc.GetHosts and validateHostIDs).
Update references so host.GreatestCommonDefaultVersion(hosts...) and any
per-host compatibility loops use hosts resolved from nodeHostIDs, while
s.hostSvc.GetHosts(allHostIDs.ToSlice()) and validateHostIDs(allHostIDs, hosts)
continue to verify existence for service placement hosts.
- Around line 298-324: In UpdateInstanceState, do not create and persist a
partial StoredInstance when GetByKey returns storage.ErrNotFound (avoid calling
NewStoredInstance with only InstanceStateUpdateOptions); instead either fetch
the full spec (e.g., load InstanceSpec for opts.DatabaseID/InstanceID and
hydrate Port, PatroniPort, PgEdgeVersion, HostID, NodeName) and then apply opts
before Put, or return a clear ErrInstanceNotFound so callers don’t upsert
incomplete rows; update the flow in Service.UpdateInstanceState (and related
NewStoredInstance/InstanceStateUpdateOptions usage) to prevent persisting
partial records that break GetInstanceConnectionInfo/SystemD.
In `@server/internal/orchestrator/systemd/unit.go`:
- Around line 95-100: The error messages for EnableUnit and RestartUnit
incorrectly reference the file path variable `path` instead of the unit
identifier `r.Name`; update the fmt.Errorf calls inside the block that calls
`client.EnableUnit(ctx, r.Name)` and `client.RestartUnit(ctx, r.Name)` to use
`r.Name` in the formatted message (e.g., "failed to enable unit '%s': %w" and
"failed to restart unit '%s': %w") so the logs report the actual systemd unit
name rather than the file path.
---
Outside diff comments:
In `@server/internal/orchestrator/swarm/orchestrator.go`:
- Around line 587-636: GetInstanceConnectionInfo currently ignores the supplied
postgresPort and patroniPort parameters and derives ClientPort from container
inspection and hardcodes PatroniPort to 8888; update GetInstanceConnectionInfo
to honor the pointers passed in: if postgresPort != nil set AdminPort, PeerPort
and ClientPort to *postgresPort (instead of hardcoded 5432 / inspected port),
and if patroniPort != nil set PatroniPort = *patroniPort (otherwise keep the
current defaults/inspection behavior); adjust the creation of the returned
database.ConnectionInfo (fields AdminPort, PeerPort, ClientPort, PatroniPort)
accordingly so callers’ overrides are respected.
---
Nitpick comments:
In `@lima/Makefile`:
- Around line 18-23: The build target `build` should set CGO_ENABLED=0 to match
the top-level Makefile's static build behavior; update the `build` recipe that
produces `./pgedge-control-plane` from `$(shell pwd)/../server` to prefix the go
build invocation with `CGO_ENABLED=0` so the produced binary is built
consistently without cgo.
In `@lima/run.sh`:
- Around line 39-42: The error echo in the default case of the hostname switch
writes to stdout; change it to write to stderr. Locate the case pattern handling
the unknown HOSTNAME (the *) block in run.sh and update the echo that prints
"unrecognized hostname ${HOSTNAME}" to redirect its output to stderr (consistent
with the later error handling approach used on line 51), then keep the exit 1
as-is.
- Around line 57-65: The script uses unquoted variable expansions which can
cause word-splitting; update cp and the pgedge-control-plane invocation to quote
paths and the host ID: change cp ${LIMA_DIR}/pgedge-control-plane to cp
"${LIMA_DIR}/pgedge-control-plane", change --config-path ${LIMA_DIR}/config.json
to --config-path "${LIMA_DIR}/config.json", and ensure the PGEDGE_HOST_ID
expansion in the sed expression is safely handled (e.g., use sed
"s/^/[${PGEDGE_HOST_ID}] /" but with PGEDGE_HOST_ID set/quoted where used) so
all references to LIMA_DIR and PGEDGE_HOST_ID are double-quoted to prevent
word-splitting; keep the set_env and pgedge-control-plane run invocation
otherwise unchanged.
In `@server/internal/orchestrator/systemd/dnf.go`:
- Around line 29-42: The Dnf.InstalledPostgresVersions function currently
returns an error from exec.CommandContext(...) without logging details; before
returning the wrapped error in the error branch (after cmd.CombinedOutput and
the strings.Contains check) add a structured log call at warn/error level that
records the command, args (supportedDnfPackages()), the error value and the
command output, then return the same fmt.Errorf(...) as before; if the Dnf
struct exposes a logger field (e.g. d.logger or d.log) use that to log,
otherwise use the package logger you have in this package, ensuring the log
message includes "dnf list --installed" context, err and string(out).
In `@server/internal/postgres/hba/entry.go`:
- Around line 88-94: The trailing else branch that returns "" after checking
ip.To4() and ip.To16() is unreachable because net.ParseIP guarantees ip.To16()
is non-nil for any parsed IP; remove the unreachable else { return "" } and
simplify the logic to set cidr from ip.To4() when non-nil otherwise use
ip.To16() (i.e., fall through to the IPv6 case), ensuring cidr is always
assigned for valid parsed IPs and no dead-return remains; update references
around ip, cidr, ip.To4(), and ip.To16() accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8c861282-bd6b-4a57-b473-0f31a1d912f7
⛔ Files ignored due to path filters (8)
api/apiv1/gen/control_plane/service.gois excluded by!**/gen/**api/apiv1/gen/http/control_plane/client/types.gois excluded by!**/gen/**api/apiv1/gen/http/control_plane/server/types.gois excluded by!**/gen/**api/apiv1/gen/http/openapi.jsonis excluded by!**/gen/**api/apiv1/gen/http/openapi.yamlis excluded by!**/gen/**api/apiv1/gen/http/openapi3.jsonis excluded by!**/gen/**api/apiv1/gen/http/openapi3.yamlis excluded by!**/gen/**go.sumis excluded by!**/*.sum
📒 Files selected for processing (76)
.gitignore.golangci.ymlMakefileNOTICE.txtapi/apiv1/design/database.gochanges/unreleased/Added-20260309-165318.yamldocs/development/running-locally.mde2e/backup_restore_test.goe2e/cancel_task_test.goe2e/custom_db_create_test.goe2e/db_create_test.goe2e/db_update_add_node_test.goe2e/failover_test.goe2e/fixture_test.goe2e/fixtures/ansible.cfge2e/load_test.goe2e/minor_version_upgrade_test.goe2e/service_provisioning_test.goe2e/switchover_test.goe2e/whole_cluster_test.gogo.modhack/dev-env.zshhack/pgedge-cp-env.plugin.zshlima/Makefilelima/ansible.cfglima/circus.inilima/config.jsonlima/deploy.yamllima/inventory.yamllima/lima-template.yamllima/roles/generate_test_config/tasks/main.yamllima/roles/generate_test_config/templates/test_config.yaml.tmpllima/roles/install_prerequisites/tasks/main.yamllima/roles/install_prerequisites/vars/main.yamllima/roles/start_vms/tasks/main.yamllima/roles/stop_dbs/tasks/main.yamllima/roles/teardown_vms/tasks/main.yamllima/run.shlima/stop-dbs.yamllima/teardown.yamllima/vars.yamlserver/cmd/root.goserver/internal/api/apiv1/convert.goserver/internal/api/apiv1/post_init_handlers.goserver/internal/api/apiv1/validate.goserver/internal/api/apiv1/validate_test.goserver/internal/app/app.goserver/internal/config/config.goserver/internal/database/instance.goserver/internal/database/instance_resource.goserver/internal/database/instance_store.goserver/internal/database/orchestrator.goserver/internal/database/service.goserver/internal/database/spec.goserver/internal/ds/versions.goserver/internal/monitor/instance_monitor.goserver/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/enable_fast_basebackup.yamlserver/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/minimal_systemd.yamlserver/internal/orchestrator/provide.goserver/internal/orchestrator/swarm/orchestrator.goserver/internal/orchestrator/systemd/client.goserver/internal/orchestrator/systemd/dnf.goserver/internal/orchestrator/systemd/dnf_test.goserver/internal/orchestrator/systemd/orchestrator.goserver/internal/orchestrator/systemd/packages.goserver/internal/orchestrator/systemd/patroni_config.goserver/internal/orchestrator/systemd/patroni_unit.goserver/internal/orchestrator/systemd/pgbackrest_restore.goserver/internal/orchestrator/systemd/provide.goserver/internal/orchestrator/systemd/resources.goserver/internal/orchestrator/systemd/unit.goserver/internal/postgres/hba/entry.goserver/internal/workflows/activities/create_pgbackrest_backup.goserver/internal/workflows/activities/update_db_state.goserver/internal/workflows/activities/update_planned_instance_states.goserver/internal/workflows/delete_database.go
✅ Files skipped from review due to trivial changes (28)
- .golangci.yml
- .gitignore
- server/cmd/root.go
- lima/stop-dbs.yaml
- changes/unreleased/Added-20260309-165318.yaml
- e2e/custom_db_create_test.go
- lima/ansible.cfg
- server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/enable_fast_basebackup.yaml
- lima/teardown.yaml
- server/internal/app/app.go
- lima/lima-template.yaml
- server/internal/orchestrator/systemd/resources.go
- lima/roles/generate_test_config/templates/test_config.yaml.tmpl
- e2e/minor_version_upgrade_test.go
- e2e/db_create_test.go
- lima/config.json
- server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/minimal_systemd.yaml
- server/internal/orchestrator/systemd/dnf_test.go
- lima/roles/teardown_vms/tasks/main.yaml
- lima/roles/start_vms/tasks/main.yaml
- lima/vars.yaml
- lima/roles/install_prerequisites/vars/main.yaml
- server/internal/orchestrator/systemd/patroni_unit.go
- lima/roles/install_prerequisites/tasks/main.yaml
- lima/roles/stop_dbs/tasks/main.yaml
- lima/inventory.yaml
- lima/circus.ini
- server/internal/orchestrator/systemd/pgbackrest_restore.go
🚧 Files skipped from review as they are similar to previous changes (26)
- e2e/fixtures/ansible.cfg
- e2e/whole_cluster_test.go
- e2e/cancel_task_test.go
- lima/roles/generate_test_config/tasks/main.yaml
- server/internal/workflows/activities/update_db_state.go
- e2e/load_test.go
- server/internal/workflows/activities/update_planned_instance_states.go
- e2e/failover_test.go
- server/internal/api/apiv1/post_init_handlers.go
- e2e/service_provisioning_test.go
- lima/deploy.yaml
- server/internal/monitor/instance_monitor.go
- server/internal/database/spec.go
- api/apiv1/design/database.go
- server/internal/api/apiv1/convert.go
- hack/pgedge-cp-env.plugin.zsh
- server/internal/ds/versions.go
- server/internal/workflows/delete_database.go
- e2e/backup_restore_test.go
- server/internal/config/config.go
- server/internal/api/apiv1/validate_test.go
- server/internal/orchestrator/systemd/packages.go
- server/internal/orchestrator/systemd/patroni_config.go
- server/internal/api/apiv1/validate.go
- server/internal/orchestrator/systemd/client.go
- server/internal/orchestrator/systemd/orchestrator.go
a574a99 to
940cc99
Compare
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/internal/database/spec.go (1)
560-580:⚠️ Potential issue | 🟠 Major
Clone()dropsInPlaceRestorestate
InstanceSpec.Clone()now copiesAllHostIDs(Line 579), but it still does not copyInPlaceRestore(defined at Line 526). This can silently flip cloned specs fromtruetofalse.Proposed fix
return &InstanceSpec{ InstanceID: s.InstanceID, TenantID: utils.ClonePointer(s.TenantID), DatabaseID: s.DatabaseID, HostID: s.HostID, DatabaseName: s.DatabaseName, NodeName: s.NodeName, NodeOrdinal: s.NodeOrdinal, PgEdgeVersion: s.PgEdgeVersion.Clone(), Port: utils.ClonePointer(s.Port), PatroniPort: utils.ClonePointer(s.PatroniPort), CPUs: s.CPUs, MemoryBytes: s.MemoryBytes, DatabaseUsers: users, BackupConfig: s.BackupConfig.Clone(), RestoreConfig: s.RestoreConfig.Clone(), PostgreSQLConf: maps.Clone(s.PostgreSQLConf), ClusterSize: s.ClusterSize, OrchestratorOpts: s.OrchestratorOpts.Clone(), + InPlaceRestore: s.InPlaceRestore, AllHostIDs: slices.Clone(s.AllHostIDs), } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/database/spec.go` around lines 560 - 580, InstanceSpec.Clone() currently omits the InPlaceRestore field so cloned specs lose that state; update the Clone method to copy InPlaceRestore (e.g. set InPlaceRestore: utils.ClonePointer(s.InPlaceRestore) if it's a *bool or InPlaceRestore: s.InPlaceRestore if it's a value) alongside the other fields (InstanceSpec, TenantID, AllHostIDs, etc.) so clones preserve the original InPlaceRestore value.
♻️ Duplicate comments (1)
go.mod (1)
12-12:⚠️ Potential issue | 🟠 Major
github.com/docker/dockeris still pinned to the OSV-flagged release.Line 12 remains on
v27.1.1+incompatible, so the HIGH-severity Docker Engine advisories already called out on this PR are still present. Please move to a patched version and refresh module metadata before merge.Verify the pinned version and current OSV results with:
#!/bin/bash set -euo pipefail rg -n 'github.com/docker/docker' go.mod version=$(awk '/github.com\/docker\/docker/{print $2}' go.mod) response=$( jq -n --arg version "$version" '{ version: $version, package: { ecosystem: "Go", name: "github.com/docker/docker" } }' | curl -s https://api.osv.dev/v1/query \ -H 'Content-Type: application/json' \ --data `@-` ) echo "$response" | jq --arg version "$version" '{version: $version, vulns: [.vulns[]?.id]}'This should return a non-empty
vulnsarray for the currently pinned version.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@go.mod` at line 12, The go.mod entry for the Docker module ("github.com/docker/docker" currently pinned as v27.1.1+incompatible) is on an OSV-flagged release; update that module to a patched release (select the latest non-vulnerable tag) and refresh module metadata so the vulnerability is resolved. Change the version token for "github.com/docker/docker" in go.mod to the patched version, run the module update/cleanup (e.g., fetch the new version and run module tidy) and re-run the provided OSV verification snippet to confirm no vulns remain; reference the go.mod line for "github.com/docker/docker" when making the change. Ensure go.sum is updated and CI passes before merging.
🧹 Nitpick comments (2)
server/internal/database/spec.go (1)
652-674: Avoid sharing the sameAllHostIDsslice across all instancesAt Line 673, every
InstanceSpecgets the same backing slice (allHostIDs). A later in-place mutation on one instance can leak into others. Safer to clone on assignment.Proposed refactor
- AllHostIDs: allHostIDs, + AllHostIDs: slices.Clone(allHostIDs),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/database/spec.go` around lines 652 - 674, The code assigns the same backing slice allHostIDs into every InstanceSpec.AllHostIDs, risking cross-instance mutation; in the block that constructs InstanceSpec (the instances[...] = &InstanceSpec{...}) make a defensive copy of allHostIDs (e.g., create a new slice and copy the elements) and assign that copy to AllHostIDs so each InstanceSpec has its own slice; update the construction for InstanceSpec to use the cloned slice rather than the shared allHostIDs.server/internal/config/config.go (1)
121-131: Minor: Inconsistent error message formatting.The error messages here omit the field prefix colon used elsewhere in the codebase (e.g.,
"bind_addr cannot be empty"vs"bind_addr: cannot be empty"). This is a minor inconsistency but doesn't affect functionality.♻️ Optional: Align error message format
func (s SystemD) validate() []error { var errs []error if s.PgBackRestPath == "" { - errs = append(errs, errors.New("pgbackrest_path cannot be empty")) + errs = append(errs, errors.New("pgbackrest_path: cannot be empty")) } if s.PatroniPath == "" { - errs = append(errs, errors.New("patroni_path cannot be empty")) + errs = append(errs, errors.New("patroni_path: cannot be empty")) } return errs }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/config/config.go` around lines 121 - 131, Update SystemD.validate() to make its error messages consistent with the rest of the codebase by adding the colon after the field name; specifically, change the errors produced for PgBackRestPath and PatroniPath in the validate method (SystemD.validate) from "pgbackrest_path cannot be empty" and "patroni_path cannot be empty" to "pgbackrest_path: cannot be empty" and "patroni_path: cannot be empty" so they match the existing "bind_addr: cannot be empty" style.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/development/running-locally.md`:
- Around line 32-36: The prerequisites list is contradictory about Restish;
update the "Restish" bullet in docs/development/running-locally.md to clearly
indicate whether it's optional (recommended) or required and make it consistent
with the later "Restish" section — e.g., change the list entry to "Restish
(optional, recommended) — see the Restish section below for usage and
configuration" or remove it from the required list and add a short parenthetical
that points to the Restish section for optional setup details so both places
convey the same requirement level.
In `@hack/dev-env.zsh`:
- Around line 219-222: The code extracts ip_addr and port from conn_info using
jq but doesn't handle missing values, so validate after parsing that local
variables ip_addr and port are non-empty (and not "null") before they are used
to build the psql connection; if either is missing, print a clear error and
exit. Locate the local ip_addr/port assignments in hack/dev-env.zsh (the
conn_info parsing block) and add a guard that checks the parsed values (e.g.,
test -z or comparison against "null") and fails with a friendly message
explaining connection_info is incomplete and which field is missing.
In `@lima/Makefile`:
- Around line 20-23: The linux cross-build invocation using "GOOS=linux go build
-gcflags \"all=-N -l\" -o ./pgedge-control-plane $(shell pwd)/../server" should
set CGO_ENABLED=0 to disable cgo during cross-compilation; update that build
line to export CGO_ENABLED=0 alongside GOOS (i.e., prepend or include
CGO_ENABLED=0) so the target that produces ./pgedge-control-plane matches other
Linux build targets and won't fail on macOS when cgo dependencies exist.
In `@lima/run.sh`:
- Around line 57-64: The script uses unquoted variable expansions (notably
LIMA_DIR) which will break on paths containing spaces; update all occurrences to
use quoted expansions — e.g., quote LIMA_DIR in the cp invocation (cp
"${LIMA_DIR}/pgedge-control-plane" /usr/sbin), quote assignments created by
set_env (ensure variables set in that function are quoted when expanded), and
quote the --config-path argument passed to pgedge-control-plane run
(--config-path "${LIMA_DIR}/config.json"); search for other uses of LIMA_DIR and
any other path variables in this file and wrap them in double quotes to prevent
word splitting.
In `@server/internal/orchestrator/systemd/orchestrator.go`:
- Around line 482-489: The validation currently only checks ports independently;
add a check in the same validation block (where validatePort is called for
prevPort/prevPatroniPort and ch.Current.Port/ch.Current.PatroniPort) to reject
configurations where ch.Current.Port == ch.Current.PatroniPort (and likewise
compare prevPort vs prevPatroniPort if relevant), set result.Valid = false and
append a clear error via result.Errors (e.g., "postgres and patroni ports must
not be identical"), so GenerateInstanceResources / PatroniConfigGeneratorOptions
won't receive identical ports.
In `@server/internal/orchestrator/systemd/pgbackrest_restore.go`:
- Around line 107-115: The restore currently calls runRestoreCmd before
renameDataDir which causes the restore command (built against
Paths.Instance.PgData()) to write into the live data directory or have the
restored data immediately moved out; swap the calls so renameDataDir() is
invoked before runRestoreCmd() to ensure Paths.Instance.PgData() points at the
intended pgdata-restore location (PgDataRestore()) when building/executing the
restore, i.e., call p.renameDataDir(fs) before p.runRestoreCmd(ctx, orch,
logger, taskSvc) and keep subsequent StartInstance usage unchanged.
---
Outside diff comments:
In `@server/internal/database/spec.go`:
- Around line 560-580: InstanceSpec.Clone() currently omits the InPlaceRestore
field so cloned specs lose that state; update the Clone method to copy
InPlaceRestore (e.g. set InPlaceRestore: utils.ClonePointer(s.InPlaceRestore) if
it's a *bool or InPlaceRestore: s.InPlaceRestore if it's a value) alongside the
other fields (InstanceSpec, TenantID, AllHostIDs, etc.) so clones preserve the
original InPlaceRestore value.
---
Duplicate comments:
In `@go.mod`:
- Line 12: The go.mod entry for the Docker module ("github.com/docker/docker"
currently pinned as v27.1.1+incompatible) is on an OSV-flagged release; update
that module to a patched release (select the latest non-vulnerable tag) and
refresh module metadata so the vulnerability is resolved. Change the version
token for "github.com/docker/docker" in go.mod to the patched version, run the
module update/cleanup (e.g., fetch the new version and run module tidy) and
re-run the provided OSV verification snippet to confirm no vulns remain;
reference the go.mod line for "github.com/docker/docker" when making the change.
Ensure go.sum is updated and CI passes before merging.
---
Nitpick comments:
In `@server/internal/config/config.go`:
- Around line 121-131: Update SystemD.validate() to make its error messages
consistent with the rest of the codebase by adding the colon after the field
name; specifically, change the errors produced for PgBackRestPath and
PatroniPath in the validate method (SystemD.validate) from "pgbackrest_path
cannot be empty" and "patroni_path cannot be empty" to "pgbackrest_path: cannot
be empty" and "patroni_path: cannot be empty" so they match the existing
"bind_addr: cannot be empty" style.
In `@server/internal/database/spec.go`:
- Around line 652-674: The code assigns the same backing slice allHostIDs into
every InstanceSpec.AllHostIDs, risking cross-instance mutation; in the block
that constructs InstanceSpec (the instances[...] = &InstanceSpec{...}) make a
defensive copy of allHostIDs (e.g., create a new slice and copy the elements)
and assign that copy to AllHostIDs so each InstanceSpec has its own slice;
update the construction for InstanceSpec to use the cloned slice rather than the
shared allHostIDs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d2118d17-56c9-4c4d-affe-8a0380f16edd
⛔ Files ignored due to path filters (8)
api/apiv1/gen/control_plane/service.gois excluded by!**/gen/**api/apiv1/gen/http/control_plane/client/types.gois excluded by!**/gen/**api/apiv1/gen/http/control_plane/server/types.gois excluded by!**/gen/**api/apiv1/gen/http/openapi.jsonis excluded by!**/gen/**api/apiv1/gen/http/openapi.yamlis excluded by!**/gen/**api/apiv1/gen/http/openapi3.jsonis excluded by!**/gen/**api/apiv1/gen/http/openapi3.yamlis excluded by!**/gen/**go.sumis excluded by!**/*.sum
📒 Files selected for processing (76)
.gitignore.golangci.ymlMakefileNOTICE.txtapi/apiv1/design/database.gochanges/unreleased/Added-20260309-165318.yamldocs/development/running-locally.mde2e/backup_restore_test.goe2e/cancel_task_test.goe2e/custom_db_create_test.goe2e/db_create_test.goe2e/db_update_add_node_test.goe2e/failover_test.goe2e/fixture_test.goe2e/fixtures/ansible.cfge2e/load_test.goe2e/minor_version_upgrade_test.goe2e/service_provisioning_test.goe2e/switchover_test.goe2e/whole_cluster_test.gogo.modhack/dev-env.zshhack/pgedge-cp-env.plugin.zshlima/Makefilelima/ansible.cfglima/circus.inilima/config.jsonlima/deploy.yamllima/inventory.yamllima/lima-template.yamllima/roles/generate_test_config/tasks/main.yamllima/roles/generate_test_config/templates/test_config.yaml.tmpllima/roles/install_prerequisites/tasks/main.yamllima/roles/install_prerequisites/vars/main.yamllima/roles/start_vms/tasks/main.yamllima/roles/stop_dbs/tasks/main.yamllima/roles/teardown_vms/tasks/main.yamllima/run.shlima/stop-dbs.yamllima/teardown.yamllima/vars.yamlserver/cmd/root.goserver/internal/api/apiv1/convert.goserver/internal/api/apiv1/post_init_handlers.goserver/internal/api/apiv1/validate.goserver/internal/api/apiv1/validate_test.goserver/internal/app/app.goserver/internal/config/config.goserver/internal/database/instance.goserver/internal/database/instance_resource.goserver/internal/database/instance_store.goserver/internal/database/orchestrator.goserver/internal/database/service.goserver/internal/database/spec.goserver/internal/ds/versions.goserver/internal/monitor/instance_monitor.goserver/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/enable_fast_basebackup.yamlserver/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/minimal_systemd.yamlserver/internal/orchestrator/provide.goserver/internal/orchestrator/swarm/orchestrator.goserver/internal/orchestrator/systemd/client.goserver/internal/orchestrator/systemd/dnf.goserver/internal/orchestrator/systemd/dnf_test.goserver/internal/orchestrator/systemd/orchestrator.goserver/internal/orchestrator/systemd/packages.goserver/internal/orchestrator/systemd/patroni_config.goserver/internal/orchestrator/systemd/patroni_unit.goserver/internal/orchestrator/systemd/pgbackrest_restore.goserver/internal/orchestrator/systemd/provide.goserver/internal/orchestrator/systemd/resources.goserver/internal/orchestrator/systemd/unit.goserver/internal/postgres/hba/entry.goserver/internal/workflows/activities/create_pgbackrest_backup.goserver/internal/workflows/activities/update_db_state.goserver/internal/workflows/activities/update_planned_instance_states.goserver/internal/workflows/delete_database.go
✅ Files skipped from review due to trivial changes (32)
- .golangci.yml
- e2e/whole_cluster_test.go
- changes/unreleased/Added-20260309-165318.yaml
- e2e/custom_db_create_test.go
- lima/ansible.cfg
- e2e/fixtures/ansible.cfg
- server/cmd/root.go
- lima/stop-dbs.yaml
- e2e/minor_version_upgrade_test.go
- server/internal/api/apiv1/post_init_handlers.go
- lima/lima-template.yaml
- server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/enable_fast_basebackup.yaml
- lima/roles/generate_test_config/templates/test_config.yaml.tmpl
- lima/roles/start_vms/tasks/main.yaml
- server/internal/orchestrator/systemd/resources.go
- server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/minimal_systemd.yaml
- lima/config.json
- lima/deploy.yaml
- .gitignore
- api/apiv1/design/database.go
- server/internal/orchestrator/systemd/dnf_test.go
- lima/roles/stop_dbs/tasks/main.yaml
- lima/vars.yaml
- lima/roles/install_prerequisites/vars/main.yaml
- lima/roles/generate_test_config/tasks/main.yaml
- server/internal/orchestrator/systemd/unit.go
- lima/inventory.yaml
- lima/roles/install_prerequisites/tasks/main.yaml
- lima/roles/teardown_vms/tasks/main.yaml
- lima/circus.ini
- lima/teardown.yaml
- e2e/backup_restore_test.go
🚧 Files skipped from review as they are similar to previous changes (21)
- e2e/db_create_test.go
- e2e/cancel_task_test.go
- e2e/switchover_test.go
- NOTICE.txt
- server/internal/workflows/delete_database.go
- server/internal/workflows/activities/update_db_state.go
- server/internal/api/apiv1/convert.go
- hack/pgedge-cp-env.plugin.zsh
- server/internal/workflows/activities/create_pgbackrest_backup.go
- server/internal/postgres/hba/entry.go
- server/internal/database/instance.go
- Makefile
- server/internal/orchestrator/systemd/patroni_unit.go
- server/internal/database/instance_store.go
- server/internal/orchestrator/swarm/orchestrator.go
- server/internal/database/service.go
- server/internal/app/app.go
- server/internal/api/apiv1/validate_test.go
- server/internal/orchestrator/systemd/patroni_config.go
- server/internal/orchestrator/systemd/client.go
- server/internal/api/apiv1/validate.go
940cc99 to
490d89c
Compare
There was a problem hiding this comment.
Verification:
Created a Lima dev environment.
> cp1-req get-version
HTTP/1.1 200 OK
Content-Length: 171
Content-Type: application/json
Date: Thu, 02 Apr 2026 20:25:27 GMT
{
arch: "arm64"
revision: "490d89c3ddb60c621be308a25d3c0a4809a92313+dirty"
revision_time: "2026-04-02T17:03:57Z"
version: "v0.5.1-0.20260402170357-490d89c3ddb6+dirty"
}
Successfully completed E2E tests in the Lima dev environment.
DONE 41 tests, 12 skipped in 605.561s
| return err | ||
| } | ||
|
|
||
| err = client.UnitExists(ctx, r.Name) |
There was a problem hiding this comment.
Can we check the unit file on disk instead? Looks UnitExists checks only with loaded units only.
path := filepath.Join(unitsDir, r.Name)
_, fileErr := os.Stat(path)
There was a problem hiding this comment.
I don't think that makes sense here. StopUnit and DisableUnit are only valid on loaded units, so I think that suggestion would result in an error if the file existed but the unit was not loaded. In the current implementation, we would skip the StopUnit and DisableUnit calls and go right to deleting the file, which is correct.
Adds an orchestrator implementation for SystemD. As of this commit, you'll need to install Postgres on the host before starting the Control Plane: ``` dnf install -y epel-release dnf dnf config-manager --set-enabled crb dnf update -y --allowerasing dnf install -y https://dnf.pgedge.com/reporpm/pgedge-release-latest.noarch.rpm dnf install -y \ pgedge-postgresql18 \ pgedge-spock50_18 \ pgedge-snowflake_18 \ pgedge-lolor_18 \ pgedge-postgresql18-contrib \ pgedge-pgbackrest \ pgedge-python3-psycopg2 \ python3-pip pip install 'patroni[etcd,jsonlogger]==4.1.0' ``` PLAT-417
Adds a development environment for the SystemD orchestrator, called `dev-lima`. Just like the `compose` environment, this environment exposes all services and databases on your host machine.
Adds support for running E2Es against the `dev-lima` environment. PLAT-417
490d89c to
54ec399
Compare
Summary
Adds preliminary support for orchestrating pgEdge Enterprise Postgres databases using SystemD instead of Docker Swarm.
Testing
This PR includes a lima-based development environment. See the changes in
running-locally.mdfor complete instructions. If you're already set up to run the lima-based E2E fixtures, you can do:Notes for Reviewers
As of this PR, the control-plane server does not respond to upgrades or other package changes that happen after the server starts. If you install a minor version update, you will be blocked from updating existing databases until you:
postgres_versionin your database specThis will change in a subsequent PR.
PLAT-417