notifications (4/5): Grafana-backed service, handlers, and server wiring by illegalprime · Pull Request #455 · block/proto-fleet

illegalprime · 2026-06-15T15:53:49Z

Stack 4/5 — base: `eden/notifications-3-authz`

Wires up the server side end-to-end.

- Notifications domain: Grafana sidecar client, service with org isolation and value-level secret redaction, models
- Connect handlers for Channel / Rule / Silence / History
- `main.go` wiring; Grafana + destination-policy config on the metrics provider
- Marks channel mutations as redacted + session-only in the interceptor config (a leaked API key can't reconfigure alert delivery or mute alerts)

Stack

1. proto + generated code
2. notification_history keyset listing
3. notification:read / notification:manage permissions
4. this PR: Grafana-backed service, handlers, server wiring
5. notifications UI

🤖 Generated with Claude Code

github-actions · 2026-06-15T15:57:54Z

🔐 Codex Security Review

Note: This is an automated security-focused code review generated by Codex.
It should be used as a supplementary check alongside human review.
False positives are possible - use your judgment.

Scope summary

Reviewed pull request diff only (66e55db8f5c0d4bfe97e1fd3f88307bceeb66a05...75185f66396fe7581a62d34f4869e4be93b79392, exact PR three-dot diff)

Model: gpt-5.5

Review Summary

Overall Risk: HIGH

Findings

[HIGH] Created notification channels are not used for real alert delivery

Category: Reliability
Location: server/internal/domain/notifications/service.go:96
Description: CreateChannel only creates a Grafana contact point. It never updates Grafana’s notification policy tree or any fleet-api fan-out path to route org alerts to that contact point. The existing provisioned policy still routes alerts to the internal protofleet-internal receiver only.
Impact: Operators can create and successfully test channels, but production alerts will not be delivered to those channels. This is a silent alerting failure.
Recommendation: On channel create/update/delete, also manage an org-scoped notification policy route, or make the internal webhook fan out to configured channels. Add an integration test that fires a real alert and verifies it reaches a user-created channel.
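A minimal sketch of the first option in the recommendation, managing an org-scoped route alongside the contact point. The `Route` type is a simplified, hypothetical model of a policy tree node, `upsertOrgRoute` is an illustrative helper rather than code from this PR, and the `organization_id` matcher is assumed from the label the provisioned alert queries are described as selecting.

```go
package main

import "fmt"

// Route is a simplified, hypothetical model of one notification policy node.
// A real Grafana policy tree has more fields; verify names against the API.
type Route struct {
	Receiver       string
	ObjectMatchers [][3]string // label, operator, value
	Continue       bool
	Routes         []Route
}

// upsertOrgRoute returns a copy of root in which alerts labelled with orgID
// are routed to receiver. Calling it twice with the same arguments adds the
// child route only once, so it is safe to run on every channel save.
func upsertOrgRoute(root Route, orgID, receiver string) Route {
	matcher := [3]string{"organization_id", "=", orgID}
	out := root
	out.Routes = append([]Route(nil), root.Routes...)
	for _, r := range out.Routes {
		if r.Receiver == receiver && len(r.ObjectMatchers) == 1 && r.ObjectMatchers[0] == matcher {
			return out // route already present
		}
	}
	out.Routes = append(out.Routes, Route{
		Receiver:       receiver,
		ObjectMatchers: [][3]string{matcher},
		Continue:       true, // keep evaluating sibling routes for other channels
	})
	return out
}

func main() {
	root := Route{Receiver: "protofleet-internal"}
	root = upsertOrgRoute(root, "org-1", "org-1-slack")
	root = upsertOrgRoute(root, "org-1", "org-1-slack") // no duplicate added
	fmt.Println(len(root.Routes), root.Routes[0].Receiver)
}
```

In Alertmanager-style routing, an alert that matches a child route is no longer delivered to the root receiver, so a real implementation would likely also need an explicit child route for `protofleet-internal` to keep the internal webhook firing.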

[HIGH] Destination validation is bypassable by DNS rebinding or redirects

Category: Network Discovery
Location: server/internal/domain/notifications/service.go:369
Description: Notification destinations are classified only at save/test-preview time. Grafana performs the actual outbound request later, and the code explicitly notes the check is not rebinding-proof. A user can save a public domain, then change DNS or return a redirect to loopback, link-local, metadata, or private addresses before TestChannel or alert delivery.
Impact: A user with notification:manage can turn Grafana into an SSRF primitive against services reachable from the Grafana container or host network.
Recommendation: Enforce egress controls at the Grafana/container network boundary, including loopback, RFC1918, link-local, and metadata IP blocks. If app-level enforcement remains, proxy delivery through code that re-resolves and validates every dial and redirect hop.

[MEDIUM] `TestChannel` does not implement the request contract

Category: gRPC
Location: server/internal/handlers/notifications/handler.go:128
Description: TestChannel always calls protoToChannel, which rejects CHANNEL_KIND_UNSPECIFIED, even when the request only needs to test an existing saved channel by id. It also contradicts the proto comment that an unsaved definition wins when both id and config are present; the service uses the saved channel whenever id is non-empty.
Impact: Saved-channel tests can fail unexpectedly, and edited unsaved destinations can get false-positive test results from the old saved destination.
Recommendation: Treat id-only requests as Channel{ID: id} without requiring kind; when both id and a definition are present, either test the supplied definition or reject the ambiguous request. Add handler-level tests.
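As an illustration of the "reject the ambiguous request" variant of this recommendation, the sketch below decides what a test request targets before any kind validation runs. The request type, target names, and errors are hypothetical stand-ins, not the PR's actual proto or handler types.

```go
package main

import (
	"errors"
	"fmt"
)

type ChannelKind int

const (
	ChannelKindUnspecified ChannelKind = iota
	ChannelKindWebhook
)

// testChannelRequest is a hypothetical, trimmed-down view of the request.
type testChannelRequest struct {
	ID   string
	Kind ChannelKind
}

type testTarget string

const (
	targetStored     testTarget = "stored"
	targetDefinition testTarget = "definition"
)

var (
	errAmbiguousTest = errors.New("test request sets both id and a channel definition")
	errEmptyTest     = errors.New("test request sets neither id nor a channel definition")
)

// resolveTestTarget picks the saved channel for id-only requests, the
// supplied definition for definition-only requests, and rejects the rest.
func resolveTestTarget(req testChannelRequest) (testTarget, error) {
	hasID := req.ID != ""
	hasDef := req.Kind != ChannelKindUnspecified
	switch {
	case hasID && hasDef:
		return "", errAmbiguousTest
	case hasID:
		return targetStored, nil
	case hasDef:
		return targetDefinition, nil
	default:
		return "", errEmptyTest
	}
}

func main() {
	reqs := []testChannelRequest{
		{ID: "ch-1"},
		{Kind: ChannelKindWebhook},
		{ID: "ch-1", Kind: ChannelKindWebhook},
	}
	for _, req := range reqs {
		target, err := resolveTestTarget(req)
		fmt.Println(target, err)
	}
}
```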

[MEDIUM] Grafana admin password is exposed in process arguments

Category: Infrastructure
Location: deployment-files/run-fleet.sh:933
Description: The new token provisioning path passes admin:${admin_pass} to curl -u. Command-line arguments are commonly visible via ps or /proc to local users while the command runs.
Impact: A local user on the deployment host could capture the Grafana admin password and mint broader Grafana access.
Recommendation: Avoid secrets in argv. Use a temporary 0600 curl config/netrc file or stdin-fed config, remove it immediately after use, and keep the password out of process arguments.

[MEDIUM] Grafana JSON error bodies are returned after incomplete redaction

Category: gRPC
Location: server/internal/domain/notifications/grafana_client.go:213
Description: For JSON error responses, the client returns the redacted body as the error message. The redactor only handles known JSON keys plus URLs and bearer strings inside generic string fields, so JSON messages that echo SMTP passwords or other secrets in free-form text can still reach logs and RPC errors.
Impact: Notification credentials can leak to server logs or API clients during Grafana validation/provisioning failures.
Recommendation: Do not surface upstream Grafana bodies to clients. Return a generic status/category, and log only whitelisted safe fields or a fully generic redacted summary.
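One way to follow this recommendation is to discard the upstream body and return only a status-derived category. The sketch below assumes a hypothetical `UpstreamError` type and category strings; none of these names come from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// UpstreamError is a hypothetical client-safe error. It carries only the
// HTTP status and a fixed category, never the upstream response body.
type UpstreamError struct {
	Status   int
	Category string
}

func (e *UpstreamError) Error() string {
	return fmt.Sprintf("grafana request failed: %s (HTTP %d)", e.Category, e.Status)
}

// sanitizeUpstreamError ignores the body on purpose: free-form upstream text
// may echo credentials that key-based redaction would miss.
func sanitizeUpstreamError(status int, _ []byte) error {
	category := "upstream error"
	switch {
	case status == http.StatusUnauthorized || status == http.StatusForbidden:
		category = "authentication failed"
	case status == http.StatusNotFound:
		category = "not found"
	case status >= 400 && status < 500:
		category = "request rejected"
	}
	return &UpstreamError{Status: status, Category: category}
}

func main() {
	body := []byte(`{"message":"smtp auth failed for password hunter2"}`)
	err := sanitizeUpstreamError(http.StatusBadRequest, body)
	var ue *UpstreamError
	fmt.Println(errors.As(err, &ue), err)
}
```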

Notes

Reviewed only .git/codex-review.diff. I did not find changed code in this diff touching miner pool configuration, command execution, SQL queries, frontend rendering, plugin execution, Rust ASIC code, or protobuf wire-format definitions.

Generated by Codex Security Review | Triggered by: @illegalprime | Review workflow run

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8c55e88d7a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-15T16:07:15Z

+      FLEET_METRICS_GRAFANA_TOKEN: "${FLEET_METRICS_GRAFANA_TOKEN:-}"
+      # Basic-auth fallback; defaults empty so a token deployment never holds the Grafana admin credential.
+      FLEET_METRICS_GRAFANA_USER: "${FLEET_METRICS_GRAFANA_USER:-admin}"
+      FLEET_METRICS_GRAFANA_PASSWORD: "${FLEET_METRICS_GRAFANA_PASSWORD:-}"


Pass the generated Grafana password to fleet-api

In the documented production path (run-fleet.sh --enable-beta-notifications), the script generates GRAFANA_ADMIN_PASSWORD but does not set FLEET_METRICS_GRAFANA_TOKEN or FLEET_METRICS_GRAFANA_PASSWORD; this overlay then passes an empty password while GrafanaConfig falls back to basic auth as admin. The stack therefore starts with fleet-api unable to authenticate to the Grafana sidecar, so the newly wired channel/rule/silence APIs return Grafana 401s unless an operator manually discovers and sets an extra env var.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-15T16:07:15Z

+		}
+	case ChannelKindSlack:
+		// Slack URLs are write-only secrets: an edit without a fresh URL keeps the stored destination, a fresh URL is both a destination change and a new secret.
+		keepStoredSlackURL = c.Slack == nil || c.Slack.WebhookURL == ""


Require a Slack URL when changing channel kind

When an existing non-Slack channel is updated to SLACK with no webhook_url, this treats the missing URL as “keep the stored Slack URL” without checking owned.Kind; validation is skipped and carrySecretSettings later copies the old settings url into the new Slack contact point. That can silently reuse a webhook destination as a Slack secret when changing kinds, so only preserve an empty Slack URL for existing Slack channels and require a fresh URL otherwise.
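The guard described above can be sketched as a small function that consults the stored channel's kind. The types here are simplified stand-ins, and the free-standing `keepStoredSlackURL` function and `errSlackURLRequired` are hypothetical; only the `owned.Kind`, `ChannelKindSlack`, and `Slack.WebhookURL` names echo the diff.

```go
package main

import (
	"errors"
	"fmt"
)

type ChannelKind string

const (
	ChannelKindWebhook ChannelKind = "webhook"
	ChannelKindSlack   ChannelKind = "slack"
)

type SlackConfig struct{ WebhookURL string }

type Channel struct {
	Kind  ChannelKind
	Slack *SlackConfig
}

var errSlackURLRequired = errors.New("slack webhook_url is required when changing channel kind to slack")

// keepStoredSlackURL reports whether an update to Slack kind may reuse the
// stored URL. Reuse is allowed only when the stored channel is already Slack;
// a kind change must supply a fresh URL.
func keepStoredSlackURL(owned, c Channel) (bool, error) {
	urlMissing := c.Slack == nil || c.Slack.WebhookURL == ""
	if !urlMissing {
		return false, nil // fresh URL supplied: validate and store it
	}
	if owned.Kind != ChannelKindSlack {
		return false, errSlackURLRequired
	}
	return true, nil
}

func main() {
	storedWebhook := Channel{Kind: ChannelKindWebhook}
	update := Channel{Kind: ChannelKindSlack}
	keep, err := keepStoredSlackURL(storedWebhook, update)
	fmt.Println(keep, err)
}
```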

chatgpt-codex-connector · 2026-06-15T16:07:16Z

+	case SilenceScopeGroup:
+		if sil.Scope.GroupID != "" {
+			matchers = append(matchers, GrafanaSilenceMatcher{
+				Name:    "group_id",


Match group silences to labels alerts actually emit

A group-scoped silence is accepted but written with a group_id matcher; none of the bundled Grafana rules emit a group_id label (the static rule label is rule_group, while the metrics contract uses device_group). In the current notification stack this silence will look active in the API but will never match firing alerts, so group mutes do not stop notifications.

chatgpt-codex-connector · 2026-06-15T16:07:16Z

+	case SilenceScopeSite:
+		if sil.Scope.SiteID != "" {
+			matchers = append(matchers, GrafanaSilenceMatcher{
+				Name:    "site_id",


Include site labels before accepting site silences

Site-scoped silences are written with a site_id matcher, but the provisioned Grafana alert queries in proto-fleet-rules.yaml select only organization_id/device_id for device alerts and only organization_id for fleet alerts, so alert instances do not carry site_id. In that deployment a site silence is stored and returned as active but cannot match any alert, causing notifications for the site to continue firing.
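A guard against this class of bug (it applies equally to the `group_id` matcher in the previous comment) is to reject silences whose matcher labels no alert can carry. The sketch below assumes the emitted label set is exactly the `organization_id` and `device_id` labels named in this comment; the `Matcher` type and `validateMatchers` helper are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// emittedLabels lists labels that alert instances are assumed to carry,
// taken from the review comment rather than from the rules file itself.
var emittedLabels = map[string]bool{
	"organization_id": true,
	"device_id":       true,
}

// Matcher is a simplified stand-in for a silence matcher.
type Matcher struct {
	Name  string
	Value string
}

// validateMatchers returns an error naming every matcher label that no
// alert emits, so the silence is rejected instead of silently never matching.
func validateMatchers(matchers []Matcher) error {
	var unknown []string
	for _, m := range matchers {
		if !emittedLabels[m.Name] {
			unknown = append(unknown, m.Name)
		}
	}
	if len(unknown) > 0 {
		return fmt.Errorf("silence matchers use labels no alert emits: %s", strings.Join(unknown, ", "))
	}
	return nil
}

func main() {
	err := validateMatchers([]Matcher{
		{Name: "organization_id", Value: "org-1"},
		{Name: "site_id", Value: "site-9"},
	})
	fmt.Println(err)
}
```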

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6cdd7d2724

chatgpt-codex-connector · 2026-06-15T16:15:57Z

+		Type:     grafanaTypeFor(c.Kind),
+		Settings: settings,
+	}
+	created, err := s.grafana.CreateContactPoint(ctx, cp)


Route created channels into Grafana notification policies

When a user creates a Slack/SMTP/webhook channel, this only creates a contact point; the provisioned policy tree in server/monitoring/grafana/provisioning/alerting/notification-policies.yaml still has a single root receiver protofleet-internal and there is no policy API update in this service. In that deployed configuration Grafana will keep routing alerts only to the internal webhook, so newly created channels can be listed and tested but will never receive real alert notifications unless an operator manually edits Grafana routing.

chatgpt-codex-connector · 2026-06-15T16:15:57Z

+		return nil, err
+	}
+	// TestChannelRequest carries no name; the domain service derives one from the id when present.
+	dom, err := protoToChannel(req.Msg.GetId(), "", req.Msg.GetKind(), req.Msg.GetWebhook(), req.Msg.GetSmtp(), req.Msg.GetSlack())


Honor unsaved test definitions when an id is present

For TestChannelRequest payloads that include both an existing id and edited channel settings, passing the id through here makes Service.TestChannel take the saved-channel branch and ignore the supplied webhook/SMTP/Slack fields. That contradicts the proto contract that the unsaved definition wins when both are set, so an edit form that tests a changed destination before saving will send the probe to the old stored channel and can report success for settings that were never tested.

The prod overlay left FLEET_METRICS_GRAFANA_PASSWORD empty so fleet-api never held the Grafana admin credential, but that meant a default deployment had no way to authenticate to Grafana and every Grafana-proxying notification RPC 401'd while the stack came up healthy. Instead of defaulting fleet-api to the admin password, mint a least-privilege Grafana service account (Editor on org 1) and token post-boot in run-fleet.sh, persist it to .env, and recreate fleet-api so it authenticates with the scoped token. The basic-auth password stays empty by default. Idempotent across re-runs. Addresses Codex security review finding (MEDIUM) on #455. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

illegalprime · 2026-06-15T17:39:07Z

Codex findings on this PR:

[MEDIUM] prod overlay defaults fleet-api to Grafana admin creds — fixed (commit 6655a15). Reverted the admin-password default back to empty and instead provision a least-privilege Grafana service-account token (Editor on org 1) post-boot in run-fleet.sh, persisted to .env, with fleet-api recreated to pick it up. Basic-auth password stays empty by default, so a token deployment never holds the admin credential. Idempotent across re-runs.
[MEDIUM] destination validation bypass / SSRF — not fixed inline; the durable fix is Grafana-side egress enforcement (the pre-flight DNS check in fleet-api isn't the real egress boundary). Tracked in Notifications: enforce Grafana egress restrictions (destination SSRF) #459.

⚠️ The run-fleet.sh token provisioning calls the live Grafana API and could not be end-to-end tested here — please validate on a real --enable-beta-notifications deploy.

Adds the notifications domain (Grafana sidecar client, service with org isolation and secret redaction, models), the Connect handlers for Channel/Rule/Silence/History, and the main.go wiring. Marks channel mutations as redacted + session-only in the interceptor config and adds Grafana + destination-policy config to the metrics provider. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6655a150e3

chatgpt-codex-connector · 2026-06-15T17:43:25Z

+
+func (g *Grafana) UpdateContactPoint(ctx context.Context, uid string, cp GrafanaContactPoint) (*GrafanaContactPoint, error) {
+	var out GrafanaContactPoint
+	if err := g.do(ctx, http.MethodPut, "/api/v1/provisioning/contact-points/"+uid, cp, &out); err != nil {


Handle Grafana contact-point update ACK

When updating any saved channel against real Grafana, PUT /api/v1/provisioning/contact-points/:uid succeeds with a 202 Ack rather than an EmbeddedContactPoint. Decoding that ACK into GrafanaContactPoint leaves Settings empty (or returns EOF on an empty body), and UpdateChannel then passes it to contactPointToChannel, so a successful edit is reported as an RPC error and the UI cannot reliably update channels. Return/refetch the contact point or build the response from the request instead.

chatgpt-codex-connector · 2026-06-15T17:43:25Z

+	}
+	return GrafanaSilence{
+		StartsAt:  sil.StartsAt,
+		EndsAt:    sil.EndsAt,


Require or materialize a silence end time

When CreateSilence/UpdateSilence omits ends_at, protoToSilence leaves sil.EndsAt at the Go zero time; this copies that zero into the Alertmanager payload even though the Alertmanager silence schema requires endsAt. The service's own silenceActive treats zero as indefinite, so open-ended mutes accepted by this API shape are rejected by Grafana instead of being created; either reject missing ends_at or convert it to a real future expiration before posting.

…Ls in parse errors

Two follow-ups from review:

- run-fleet.sh swallowed the post-token fleet-api restart (>/dev/null 2>&1 || true), so a failed recreate left fleet-api on the pre-token env (still 401ing) while the script reported success. Check the result and fail loudly with manual-restart instructions.
- checkDestinationURL returned url.Parse's error verbatim, which embeds the raw input: a webhook/Slack URL can carry a capability token. Return a generic 'not parseable' message so the secret can't leak via errors or logs.

Addresses Codex security review findings (MEDIUM) on #455.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

illegalprime · 2026-06-15T17:50:30Z

Re the re-review (the first two are now fixed in commit 75185f6):

[MEDIUM] malformed webhook URLs leak tokens through errors — fixed. checkDestinationURL no longer returns url.Parse's error verbatim (it embeds the raw input, which can carry a capability token); returns a generic "not parseable" message instead.
[MEDIUM] token provisioning can strand fleet-api on hidden restart failure — fixed. run-fleet.sh no longer swallows the post-token fleet-api recreate (>/dev/null 2>&1 || true); it checks the result and fails loudly with manual-restart instructions.
[MEDIUM] TestChannel validates stored channel instead of edited definition — by design (documented at the call site): for a saved channel the echoed-back payload is redacted, so testing the request body would probe a secret-stripped target; we test the stored settings instead. New/unsaved channels are tested from the entered payload directly (AddChannelModal → testChannelApi). Not changing this.
[HIGH] DNS-rebinding SSRF — unchanged: tracked in Notifications: enforce Grafana egress restrictions (destination SSRF) #459 (needs Grafana-side egress enforcement, not a fleet-api pre-flight fix).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 75185f6639

chatgpt-codex-connector · 2026-06-15T17:54:31Z

+			"smtpHost":     c.SMTP.Host,
+			"smtpPort":     c.SMTP.Port,
+			"smtpUsername": c.SMTP.Username,
+			"fromAddress":  c.SMTP.From,


Don't store SMTP server settings on email contact points

For SMTP channels, these per-channel smtpHost/smtpPort/credential fields are written into the Grafana email contact point, but Grafana OSS reads SMTP server configuration from [smtp]/GF_SMTP_*, not from the contact point; the Grafana email docs require SMTP to be configured in grafana.ini and list the email contact-point settings as addresses/single-email/template fields (https://grafana.com/docs/grafana/latest/alerting/configure-notifications/manage-contact-points/integrations/configure-email/). Since the bundled server/monitoring/grafana/grafana.ini has no [smtp] section, creating/testing an SMTP channel with a host/password here can appear saved while real email delivery still uses an unconfigured global SMTP transport and fails.

chatgpt-codex-connector · 2026-06-15T17:54:31Z

+		return nil, ErrNotFound
+	}
+	sil.OrganizationID = orgID
+	gs := domainSilenceToGrafana(orgID, sil)


Preserve the silence creator on update

When updating an existing silence, UpdateSilenceRequest has no created_by, so sil.CreatedBy is still empty when this payload is built. For any edit of a silence created by a user, the POST replaces the Grafana silence with an empty createdBy, so subsequent reads lose the audit/display owner (and may be rejected if Grafana enforces the Alertmanager field); carry e.CreatedBy from the matched existing silence or stamp the updater before calling domainSilenceToGrafana.
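Both remedies in this comment (carry the existing creator, or stamp the updater) fit in one small helper. The `Silence` type below is a trimmed stand-in and `carryCreatedBy` is a hypothetical name.

```go
package main

import "fmt"

// Silence is a simplified stand-in with only the fields this sketch needs.
type Silence struct {
	ID        string
	CreatedBy string
	Comment   string
}

// carryCreatedBy fills an empty CreatedBy on an update: first from the
// existing silence, and only if that is also empty from the updater.
func carryCreatedBy(existing, updated Silence, updater string) Silence {
	if updated.CreatedBy != "" {
		return updated
	}
	if existing.CreatedBy != "" {
		updated.CreatedBy = existing.CreatedBy
	} else {
		updated.CreatedBy = updater
	}
	return updated
}

func main() {
	existing := Silence{ID: "s-1", CreatedBy: "alice", Comment: "maintenance"}
	edit := Silence{ID: "s-1", Comment: "maintenance, extended"}
	fmt.Println(carryCreatedBy(existing, edit, "bob").CreatedBy)
}
```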

illegalprime requested a review from a team as a code owner June 15, 2026 15:53

github-actions Bot assigned illegalprime Jun 15, 2026

github-actions Bot added the server label Jun 15, 2026

chatgpt-codex-connector Bot reviewed Jun 15, 2026

illegalprime mentioned this pull request Jun 15, 2026

notifications: preliminary UI work #306

Closed

chatgpt-codex-connector Bot reviewed Jun 15, 2026

illegalprime force-pushed the eden/notifications-3-authz branch from 507a4e5 to 3bd492f Compare June 15, 2026 17:37

illegalprime force-pushed the eden/notifications-4-server branch from 6cdd7d2 to 6655a15 Compare June 15, 2026 17:37

This was referenced Jun 15, 2026

Notifications: enforce Grafana egress restrictions (destination SSRF) #459

Open

notifications (1/5): proto definitions and generated code #452

Open

notifications (2/5): notification_history keyset listing #453

Open

illegalprime mentioned this pull request Jun 15, 2026

notifications (3/5): notification:read and notification:manage permissions #454

Open

illegalprime and others added 2 commits June 15, 2026 13:43

chatgpt-codex-connector Bot reviewed Jun 15, 2026

illegalprime force-pushed the eden/notifications-3-authz branch from 3bd492f to 66e55db Compare June 15, 2026 17:44

illegalprime force-pushed the eden/notifications-4-server branch from 6655a15 to 753a5b8 Compare June 15, 2026 17:44

chatgpt-codex-connector Bot reviewed Jun 15, 2026
