Conversation
📝 WalkthroughWalkthroughAdds logging-to-metrics instrumentation (slog + zap), exposes configurable controller logging via Helm and env, registers a new Prometheus metric, updates dependencies and Postgres images/versions, and tweaks scheduler/knowledge logging and error-handling behaviors. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant App as Application (main)
participant Zap as Zap Core
participant Slog as Slog Handler
participant Metrics as Prometheus Registry
App->>Zap: Initialize zap.New(..., RawZapOpts(WrapCoreWithLogMetrics))
App->>Slog: Set default slog handler = NewMetricsSlogHandler(JSONHandler)
Zap->>Metrics: on LogEntry(level>=Warn) -> increment cortex_log_messages_total
Slog->>Metrics: on Record(level>=Warn) -> increment cortex_log_messages_total
App->>Metrics: Registry.MustRegister(LogMessagesTotal)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Test Coverage ReportTest Coverage 📊: 68.8% |
|
|
||
| - alert: CortexNovaSyncObjectsDroppedToZero | ||
| expr: cortex_sync_objects{service="cortex-nova-metrics", datasource!="openstack_migrations"} == 0 | ||
| expr: cortex_sync_objects{service="cortex-nova-metrics", datasource!~"openstack_migrations|prometheus_kvm_libvirt_domain_steal_pct"} == 0 |
There was a problem hiding this comment.
Silence the alert instead?
There was a problem hiding this comment.
Maybe also an option. The issue in general is that not all datasource (e.g. prometheus_kvm_libvirt_domain_steal_pct) does not always have FEATURES > 0
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
helm/library/cortex-postgres/Chart.yaml (1)
1-9:⚠️ Potential issue | 🟡 MinorAdd the required local-source comment in this Chart.yaml.
This file is missing the mandated
# from: file://../../library/cortex-postgrescomment.📝 Suggested fix
# Copyright SAP SE # SPDX-License-Identifier: Apache-2.0 +# from: file://../../library/cortex-postgres apiVersion: v2 name: cortex-postgresAs per coding guidelines,
Helm Chart.yaml files must include the # from: file://../../library/cortex-postgres comment pointing to the local chart path.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helm/library/cortex-postgres/Chart.yaml` around lines 1 - 9, Add the mandated local-source comment to the Chart.yaml for the cortex-postgres chart: open Chart.yaml (name: cortex-postgres) and insert the exact comment line "# from: file://../../library/cortex-postgres" near the top (above or immediately below the license/header comments) so the file includes the required local chart path reference.internal/scheduling/nova/plugins/filters/filter_has_enough_capacity.go (1)
265-296:⚠️ Potential issue | 🟡 MinorInconsistent log level between CPU and memory capacity checks.
The CPU capacity check at line 266 was downgraded to
Warn, but the analogous memory capacity check at line 292 still usesError. Both conditions have identical control flow (skip the host viacontinue), so they should likely use the same log level for consistency.🔧 Proposed fix for consistency
freeMemory, ok := free["memory"] if !ok || freeMemory.Value() < 0 { - traceLog.Error( + traceLog.Warn( "host with invalid memory capacity", "host", host, "freeMemory", freeMemory.String(), ) continue }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/nova/plugins/filters/filter_has_enough_capacity.go` around lines 265 - 296, The memory-capacity check uses traceLog.Error while the analogous CPU check uses traceLog.Warn; change the traceLog.Error call in the memory validation block (the branch that checks freeMemory, referencing freeMemory and free["memory"]) to traceLog.Warn so both invalid-capacity branches use the same log level (traceLog.Warn) and remain consistent with the CPU check that uses traceLog.Warn for host skipping via continue.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@helm/library/cortex/templates/manager/manager.yaml`:
- Around line 38-43: The template currently unconditionally emits
--zap-log-level/--zap-devel and LOG_LEVEL which can duplicate user-provided
entries in .Values.controllerManager.container.args and
.Values.controllerManager.container.env; add guards so we only render the
generated flags when the user has not already provided them. Implement small
helper(s) (e.g. a template function containsArg that iterates
.Values.controllerManager.container.args to check for a matching prefix like
"--zap-log-level" or "--zap-devel", and a helper hasEnvVar that scans
.Values.controllerManager.container.env for name == "LOG_LEVEL"), then wrap the
blocks that emit the flags/ENV (the places using
.Values.controllerManager.container.logLevel,
.Values.controllerManager.container.zapDevel and the LOG_LEVEL emission) with
conditions that only render when the corresponding containsArg/hasEnvVar returns
false; apply the same guard to both the blocks at the shown location and the
similar block at the other referenced section (lines 66-75).
In `@helm/library/cortex/values.yaml`:
- Around line 19-22: Update the zapDevel comment to avoid implying it toggles
log level on its own: clarify that zapDevel only changes format/stacktrace
behavior (human-readable console logs and development stack traces) but the
effective log level is controlled by controllerManager.container.logLevel and
the injected flag --zap-log-level in manager.yaml; mention the default logLevel:
"info" so users know enabling zapDevel does not automatically enable debug
logging.
In `@pkg/monitoring/log_metrics_test.go`:
- Around line 215-218: The test currently checks for single-letter bytes in
output which can match timestamps or level names; update the assertion in
pkg/monitoring/log_metrics_test.go to assert on a stable field instead: either
parse the handler output into individual log records and assert each record's
msg field equals "d"/"i"/"w"/"e", or change the substring checks to look for
"msg=d", "msg=i", "msg=w", "msg=e" in the output string (referencing the
existing output variable and the inner handler used in the test) so the loop
verifies delegation using a stable field rather than raw single characters.
- Around line 337-339: The test creates a zapcore.BufferedWriteSyncer (sink)
which spawns a background goroutine that is never stopped; replace the
BufferedWriteSyncer usage in the subtests that call WrapCoreWithLogMetrics with
a plain synchronous writer (e.g., use zapcore.AddSync(&bytes.Buffer{}) for the
WS) or, if you must keep BufferedWriteSyncer, ensure you call sink.Stop() at the
end of each subtest to clean up; locate the test block creating sink, inner
(zapcore.NewCore), and wrapped (WrapCoreWithLogMetrics) and change the sink
construction or add explicit Stop() to avoid leaking goroutines.
In `@postgres/Dockerfile`:
- Line 2: The image currently runs as root (FROM debian:trixie-slim...), so add
a non-root user and switch to it at the end of the Dockerfile: create a
user/group (e.g., appuser), set ownership (chown) of any runtime directories the
container needs, drop privileges with USER appuser, and ensure any commands that
require root (apt installs, etc.) run earlier as root; reference the
Dockerfile's FROM line to locate where to append the user creation, chown, and
USER directives.
---
Outside diff comments:
In `@helm/library/cortex-postgres/Chart.yaml`:
- Around line 1-9: Add the mandated local-source comment to the Chart.yaml for
the cortex-postgres chart: open Chart.yaml (name: cortex-postgres) and insert
the exact comment line "# from: file://../../library/cortex-postgres" near the
top (above or immediately below the license/header comments) so the file
includes the required local chart path reference.
In `@internal/scheduling/nova/plugins/filters/filter_has_enough_capacity.go`:
- Around line 265-296: The memory-capacity check uses traceLog.Error while the
analogous CPU check uses traceLog.Warn; change the traceLog.Error call in the
memory validation block (the branch that checks freeMemory, referencing
freeMemory and free["memory"]) to traceLog.Warn so both invalid-capacity
branches use the same log level (traceLog.Warn) and remain consistent with the
CPU check that uses traceLog.Warn for host skipping via continue.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f9653713-4bbb-423c-a641-d17fbfe96b23
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (22)
.github/renovate.jsoncmd/main.gocortex.secrets.example.yamlgo.modhelm/bundles/cortex-cinder/Chart.yamlhelm/bundles/cortex-manila/Chart.yamlhelm/bundles/cortex-nova/Chart.yamlhelm/bundles/cortex-nova/alerts/nova.alerts.yamlhelm/bundles/cortex-nova/templates/knowledges_kvm.yamlhelm/library/cortex-postgres/Chart.yamlhelm/library/cortex/templates/manager/manager.yamlhelm/library/cortex/values.yamlinternal/knowledge/extractor/controller.gointernal/scheduling/lib/filter_weigher_pipeline.gointernal/scheduling/nova/external_scheduler_api.gointernal/scheduling/nova/hypervisor_overcommit_controller.gointernal/scheduling/nova/plugins/filters/filter_capabilities.gointernal/scheduling/nova/plugins/filters/filter_external_customer.gointernal/scheduling/nova/plugins/filters/filter_has_enough_capacity.gopkg/monitoring/log_metrics.gopkg/monitoring/log_metrics_test.gopostgres/Dockerfile
| # Last updated: 17 Mar 2026 | ||
| FROM debian:trixie-slim@sha256:26f98ccd92fd0a44d6928ce8ff8f4921b4d2f535bfa07555ee5d18f61429cf0c | ||
| # Last updated: 9 Apr 2026 | ||
| FROM debian:trixie-slim@sha256:4ffb3a1511099754cddc70eb1b12e50ffdb67619aa0ab6c13fcd800a78ef7c7a |
There was a problem hiding this comment.
Run the final image as a non-root user.
No USER is set after Line 2, so the image defaults to root at runtime. This weakens container isolation and matches the DS-0002 finding.
🔧 Proposed hardening change
ENTRYPOINT ["docker-entrypoint.sh"]
+USER postgres
@@
CMD ["postgres"]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@postgres/Dockerfile` at line 2, The image currently runs as root (FROM
debian:trixie-slim...), so add a non-root user and switch to it at the end of
the Dockerfile: create a user/group (e.g., appuser), set ownership (chown) of
any runtime directories the container needs, drop privileges with USER appuser,
and ensure any commands that require root (apt installs, etc.) run earlier as
root; reference the Dockerfile's FROM line to locate where to append the user
creation, chown, and USER directives.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
helm/library/cortex/templates/manager/manager.yaml (1)
38-43:⚠️ Potential issue | 🟠 MajorSuppress both autogenerated log-level entries when either manual override is present.
This still leaves a mixed-config upgrade path: a chart that already sets only
--zap-log-levelwill still get an autogeneratedLOG_LEVEL, and a chart that sets onlyLOG_LEVELwill still get an autogenerated--zap-log-level. That makes zap and slog drift to different levels even thoughcontrollerManager.container.logLevelis meant to be one logical knob. Please treat the arg/env pair as a single override surface and skip generating both when either manual form is already present.Suggested change
- {{- if and .Values.controllerManager.container.logLevel (ne (include "chart.argsContainPrefix" (dict "prefix" "--zap-log-level" "args" .Values.controllerManager.container.args)) "true") }} + {{- if and + .Values.controllerManager.container.logLevel + (ne (include "chart.argsContainPrefix" (dict "prefix" "--zap-log-level" "args" .Values.controllerManager.container.args)) "true") + (not (and .Values.controllerManager.container.env (hasKey .Values.controllerManager.container.env "LOG_LEVEL"))) }} - "--zap-log-level={{ .Values.controllerManager.container.logLevel }}" {{- end }} @@ - {{- if and .Values.controllerManager.container.logLevel (not (and .Values.controllerManager.container.env (hasKey .Values.controllerManager.container.env "LOG_LEVEL"))) }} + {{- if and + .Values.controllerManager.container.logLevel + (not (and .Values.controllerManager.container.env (hasKey .Values.controllerManager.container.env "LOG_LEVEL"))) + (ne (include "chart.argsContainPrefix" (dict "prefix" "--zap-log-level" "args" .Values.controllerManager.container.args)) "true") }} - name: LOG_LEVEL value: {{ .Values.controllerManager.container.logLevel | quote }} {{- end }}Please render the chart in both cases to confirm behavior stays aligned:
- only
controllerManager.container.args: ["--zap-log-level=debug"]- only
controllerManager.container.env.LOG_LEVEL: debugAlso applies to: 66-69
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helm/library/cortex/templates/manager/manager.yaml` around lines 38 - 43, The template currently avoids emitting an autogenerated "--zap-*" flag only if the same flag is already present in controllerManager.container.args, but it must treat the arg/env pair as one override surface: when deciding to add "--zap-log-level" or "--zap-devel" also check for the corresponding env override (LOG_LEVEL for zap-log-level and the zap devel env var if used) and skip generating both forms if either manual form exists. Update the conditional around the "--zap-log-level" emission (the block using .Values.controllerManager.container.logLevel, include "chart.argsContainPrefix", and controllerManager.container.args) to also test that controllerManager.container.env does NOT contain a LOG_LEVEL entry, and likewise update the parallel env-generation block to check that args do not contain the "--zap-log-level" prefix; apply the same paired logic to the other block at the 66-69 region so that presence of either manual arg or env suppresses emitting both autogenerated entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@helm/library/cortex/templates/manager/manager.yaml`:
- Around line 38-43: The template currently avoids emitting an autogenerated
"--zap-*" flag only if the same flag is already present in
controllerManager.container.args, but it must treat the arg/env pair as one
override surface: when deciding to add "--zap-log-level" or "--zap-devel" also
check for the corresponding env override (LOG_LEVEL for zap-log-level and the
zap devel env var if used) and skip generating both forms if either manual form
exists. Update the conditional around the "--zap-log-level" emission (the block
using .Values.controllerManager.container.logLevel, include
"chart.argsContainPrefix", and controllerManager.container.args) to also test
that controllerManager.container.env does NOT contain a LOG_LEVEL entry, and
likewise update the parallel env-generation block to check that args do not
contain the "--zap-log-level" prefix; apply the same paired logic to the other
block at the 66-69 region so that presence of either manual arg or env
suppresses emitting both autogenerated entries.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b8c18685-8bcf-4d6b-8b8d-cf049daa2682
📒 Files selected for processing (7)
helm/bundles/cortex-nova/alerts/nova.alerts.yamlhelm/library/cortex/templates/_helpers.tplhelm/library/cortex/templates/manager/manager.yamlhelm/library/cortex/values.yamlinternal/scheduling/lib/filter_weigher_pipeline.gopkg/monitoring/log_metrics.gopkg/monitoring/log_metrics_test.go
✅ Files skipped from review due to trivial changes (3)
- internal/scheduling/lib/filter_weigher_pipeline.go
- helm/library/cortex/values.yaml
- helm/bundles/cortex-nova/alerts/nova.alerts.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/monitoring/log_metrics_test.go
- pkg/monitoring/log_metrics.go
|
Will split PR => close |
Uh oh!
There was an error while loading. Please reload this page.