9 changes: 6 additions & 3 deletions AGENTS.md
@@ -147,9 +147,10 @@ Vue 3 + TypeScript, Vite, Tailwind, Pinia. Alerting UI present (`src/views/Alert
 Single-node Grafana Mimir with S3-compatible backend and multi-tenant Alertmanager. Containerfile, Makefile, docker-compose.yml + docker-compose.local.yml. `scripts/` contains Python helpers (`alert.py`, `alerting_config.py`) for manual testing.
 
 **Alerting integration**:
-- Backend (`backend/services/alerting/`) renders Alertmanager YAML from `AlertingConfig` models and pushes via `POST /api/v1/alerts` per tenant. Email templates are Go `html/template`-embedded, en/it locales, firing + resolved variants.
+- Backend (`backend/services/alerting/`) holds one `AlertingConfigLayer` per organization in `alert_config_layers` (flat recipient-based shape: `enabled`, `email_recipients[]`, `webhook_recipients[]`, `telegram_recipients[]`, each recipient carries its own `severities[]`; email also `language` + `format`). The effective per-tenant Mimir YAML is the server-side merge of every layer from Owner down to the tenant (union dedup, additive-only). `/alerts/config` only ever returns the caller's own layer — the merged view is internal and never leaves the backend. Templates are Go `html/template`-embedded, en/it locales, firing + resolved variants; both languages ship with every tenant push and the renderer picks per email recipient via per-language dispatchers (`alert_<lang>.html|txt|subject`).
 - Collect proxies systems to Alertmanager `alerts`/`silences` with `X-Scope-OrgID` from the authenticated system's org.
 - Alertmanager webhooks resolved alerts back to collect `/api/alert_history`, which persists them scoped by `organization_id` (column on `alert_history`, populated from the DB via `system_key` lookup — never trusted from the payload).
+- RBAC: `/alerts/config*` is gated on a dedicated `alerts` resource (`read:alerts` for GET, `manage:alerts` for POST/DELETE) — admin/super only. The list/silence endpoints (`/alerts`, `/alerts/history`, `/alerts/silences*`, `/alerts/activity/:fingerprint`, `/systems/:id/alerts*`) stay on `read:systems`/`manage:systems`. The cross-system `/alerts/silences*` set mirrors `/systems/:id/alerts/silences*` 1:1 — same backend `buildSystemAlertSilenceRequest` builds the Mimir payload, so a silence created via either route is interoperable with the other.
 
 ### 3.6 Proxy (`proxy/`)
 
@@ -370,8 +371,10 @@ Authoritative: `backend/openapi.yaml` (also `make docs` / redocly). High-level r
 /api/users/* CRUD + avatar + import/export + password reset + suspend/reactivate
 /api/systems/* CRUD + inventory + alerts + regenerate-secret + reachability + export
 /api/applications/* CRUD + assign/unassign org + totals/summary/trend
-/api/alerts, /api/alerts/{totals,trend,config} active alerts + config + aggregates
-/api/filters/{systems,applications,users} UI filter aggregation
+/api/alerts, /api/alerts/{totals,trend,stats,history,config} active alerts + config + aggregates + history
+/api/alerts/silences/* cross-system silences (mute/unmute) — parallel to /systems/:id/alerts/silences
+/api/alerts/activity/:fingerprint per-alert audit timeline (silence created/updated/removed)
+/api/filters/{systems,applications,users,alerts} UI filter aggregation (alerts: static catalog + data-driven systems/severities/orgs)
 /api/rebranding/* per-org per-product asset management
 /api/organizations, /api/roles, /api/organization-roles metadata
 /api/validators/vat/:entity_type VAT validation
53 changes: 45 additions & 8 deletions backend/cmd/apitool/README.md
@@ -145,24 +145,61 @@ working user whose token reflects the real hierarchy.
 
 ## Push test alerts
 
-`apitool` doesn't push alerts itself — they go straight to Mimir Alertmanager,
-which is per-tenant by `X-Scope-OrgID`. The org's `logto_id` (visible via
-`apitool list`) is the tenant ID:
+`apitool` doesn't push alerts itself. Alerts **must go through the `collect`
+service**, not directly to Mimir Alertmanager. Hitting Mimir on `:9009` directly
+bypasses authentication, skips the server-side label enrichment
+(`system_key`, `system_id`, `organization_*`, …) and annotation templating, and
+is not how real appliances behave — the GET `/api/alerts` aggregation will not
+show such alerts as expected.
 
+`collect` authenticates the pushing **system** via HTTP Basic Auth
+(`system_key:system_secret`) and injects `X-Scope-OrgID` itself from that
+system's organization. So you push as a system, never as an org, and you never
+send `X-Scope-OrgID` yourself.
+
+`create-system` now prints both the `system_key` and the full `system_secret`
+token (`my_<public>.<secret>`). The secret is only ever returned at creation
+time, so save it. Use `apitool create-system ...` from the hierarchy example
+above to get a system.
+
+**The system must be registered before `collect` will accept it.** `collect`
+rejects appliances that have a valid secret on file but never completed
+`POST /api/systems/register` (you'd get `401 invalid system credentials` on the
+push otherwise). Register once with the `system_secret`:
+
+```bash
+SYSTEM_KEY='NETH-...'
+SYSTEM_SECRET='my_....'
+
+# Public, unauthenticated endpoint — the secret is the credential.
+curl -s -X POST "http://localhost:8080/api/systems/register" \
+  -H "Content-Type: application/json" \
+  -d "{\"system_secret\":\"$SYSTEM_SECRET\"}"
+```
+
+Then push the alert through `collect`:
+
 ```bash
-ORG=$(./apitool list | awk '/TestCust1/ {print $3; exit}')
 NOW=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
 END=$(date -u -v+1H +"%Y-%m-%dT%H:%M:%SZ")
 
-curl -s -X POST "http://localhost:9009/alertmanager/api/v2/alerts" \
-  -H "X-Scope-OrgID: $ORG" -H "Content-Type: application/json" \
+# collect: localhost:18081 in the docker-compose full stack
+curl -s -X POST \
+  "http://localhost:18081/api/services/mimir/alertmanager/api/v2/alerts" \
+  -u "$SYSTEM_KEY:$SYSTEM_SECRET" \
+  -H "Content-Type: application/json" \
   -d "[{\"startsAt\":\"$NOW\",\"endsAt\":\"$END\",
        \"labels\":{\"alertname\":\"HighCPU\",\"severity\":\"critical\",
-                   \"system_key\":\"NETH-...\",\"instance\":\"test\"},
+                   \"instance\":\"test\"},
        \"annotations\":{\"summary\":\"Test alert\"}}]"
 ```
 
-Then verify aggregation through `/api/alerts/totals` with each role's token.
+`collect` enriches the payload with the authoritative `system_key`,
+`system_id`, `system_name`, `organization_*` labels itself — don't set them in
+the request; any client-supplied values are overridden.
+
+Then verify aggregation through `/api/alerts/totals` (and `/api/alerts`) with
+each role's token.
 
 ## Known quirks
 
21 changes: 12 additions & 9 deletions backend/cmd/apitool/client.go
@@ -499,32 +499,35 @@ func (c *Client) ListUsersInOrg(orgID string) ([]struct{ LogtoID, Email string }
 	return out, nil
 }
 
-// CreateSystem creates a system under an org. Returns the system_key.
-func (c *Client) CreateSystem(name, orgID string) (string, error) {
+// CreateSystem creates a system under an org. Returns the system_key and the
+// full system_secret token (my_<public>.<secret>), the latter only ever
+// returned by the API at creation time.
+func (c *Client) CreateSystem(name, orgID string) (key, secret string, err error) {
 	payload := map[string]interface{}{
 		"name":            name,
 		"organization_id": orgID,
 	}
 	r, err := c.api("POST", "/systems", payload)
 	if err != nil {
-		return "", err
+		return "", "", err
 	}
 	if r.status >= 400 {
-		return "", fmt.Errorf("create system failed (%d): %s", r.status, r.body)
+		return "", "", fmt.Errorf("create system failed (%d): %s", r.status, r.body)
 	}
 	var resp struct {
 		Data struct {
-			SystemKey string `json:"system_key"`
-			ID        string `json:"id"`
+			SystemKey    string `json:"system_key"`
+			SystemSecret string `json:"system_secret"`
+			ID           string `json:"id"`
 		} `json:"data"`
 	}
 	if err := json.Unmarshal(r.body, &resp); err != nil {
-		return "", err
+		return "", "", err
 	}
 	if resp.Data.SystemKey == "" {
-		return "", fmt.Errorf("no system_key in response: %s", r.body)
+		return "", "", fmt.Errorf("no system_key in response: %s", r.body)
 	}
-	return resp.Data.SystemKey, nil
+	return resp.Data.SystemKey, resp.Data.SystemSecret, nil
 }
 
 func (c *Client) ResetPassword(userID, password string) error {
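The new `CreateSystem` comment documents the token shape as `my_<public>.<secret>`. A caller who wants to sanity-check a pasted token before saving it could use a hypothetical helper like `splitSystemSecret` below. The helper assumes the public part never contains a `.`, and it is not taken from the project; the server's own parsing may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitSystemSecret (hypothetical) splits a token of the documented shape
// my_<public>.<secret> at the first '.' after the "my_" prefix.
func splitSystemSecret(token string) (public, secret string, err error) {
	if !strings.HasPrefix(token, "my_") {
		return "", "", errors.New("system_secret: missing my_ prefix")
	}
	public, secret, ok := strings.Cut(strings.TrimPrefix(token, "my_"), ".")
	if !ok || public == "" || secret == "" {
		return "", "", errors.New("system_secret: expected my_<public>.<secret>")
	}
	return public, secret, nil
}

func main() {
	pub, _, err := splitSystemSecret("my_abc123.s3cr3t")
	// Print only the public part; the secret half should never be logged.
	fmt.Println(pub, err)
}
```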
10 changes: 7 additions & 3 deletions backend/cmd/apitool/main.go
@@ -101,7 +101,7 @@ Usage:
 still has child orgs/users; clean those out first.
 
 apitool create-system --org=<name> <system-name>
-Create a system under a customer org. Prints the system_key.
+Create a system under a customer org. Prints system_key + system_secret.
 
 apitool cleanup-orphans --org=<name>
 Soft-delete every user listed in <org> whose email is NOT in registry.
@@ -482,11 +482,15 @@ func cmdCreateSystem(args []string) error {
 	if err != nil {
 		return err
 	}
-	systemKey, err := client.CreateSystem(systemName, org.LogtoID)
+	systemKey, systemSecret, err := client.CreateSystem(systemName, org.LogtoID)
 	if err != nil {
 		return err
 	}
-	fmt.Printf("Created system %q in org %q (system_key=%s)\n", systemName, orgKey, systemKey)
+	fmt.Printf("Created system %q in org %q\n", systemName, orgKey)
+	fmt.Printf(" system_key=%s\n", systemKey)
+	fmt.Printf(" system_secret=%s\n", systemSecret)
+	fmt.Printf("\nPush alerts as this system (Basic Auth) through collect:\n")
+	fmt.Printf(" curl -u '%s:%s' http://localhost:18081/api/services/mimir/alertmanager/api/v2/alerts ...\n", systemKey, systemSecret)
 	return nil
 }
 
41 changes: 41 additions & 0 deletions backend/database/migrations/023_add_alert_activity.sql
@@ -0,0 +1,41 @@
+-- Migration 023: Add alert_activity table
+-- Append-only timeline of operator actions performed on a single alert
+-- (silence created/updated/deleted). The UI renders this in the alert-detail
+-- drawer ("Activity" section). Per-alert scoped via (organization_id,
+-- fingerprint). Operator "notes" are not a separate concept: they are stored
+-- as the comment of the silence, so a note edit is recorded here as a
+-- silence_updated event whose details payload includes the comment change.
+
+CREATE TABLE IF NOT EXISTS alert_activity (
+    id BIGSERIAL PRIMARY KEY,
+
+    organization_id VARCHAR(255) NOT NULL,
+    fingerprint VARCHAR(255) NOT NULL,
+
+    -- Action identifier. Open-ended so new event types don't require a schema
+    -- change; current values: 'silenced', 'silence_updated', 'unsilenced'.
+    action VARCHAR(50) NOT NULL,
+
+    -- Actor identity (denormalized for cheap render). user_id may be NULL for
+    -- system-driven events (none today, kept for future).
+    actor_user_id VARCHAR(255),
+    actor_name VARCHAR(255),
+
+    -- Optional silence reference, set on silence-related actions so the
+    -- DELETE handler can resolve the originating fingerprint without a
+    -- separate mapping table.
+    silence_id VARCHAR(255),
+
+    -- Free-form structured payload (e.g. comment, end_at, note excerpt).
+    details JSONB NOT NULL DEFAULT '{}',
+
+    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
+);
+
+COMMENT ON TABLE alert_activity IS 'Append-only audit timeline of operator actions on individual alerts';
+COMMENT ON COLUMN alert_activity.fingerprint IS 'Alertmanager fingerprint (hex hash of labels) of the alert the action targets';
+COMMENT ON COLUMN alert_activity.action IS 'Event kind: silenced | silence_updated | unsilenced. Note changes are silence_updated events.';
+COMMENT ON COLUMN alert_activity.silence_id IS 'Silence ID associated with the event. Lets DELETE silence resolve the fingerprint.';
+
+CREATE INDEX IF NOT EXISTS idx_alert_activity_org_fp_created_at ON alert_activity(organization_id, fingerprint, created_at DESC);
+CREATE INDEX IF NOT EXISTS idx_alert_activity_silence_lookup ON alert_activity(organization_id, silence_id) WHERE silence_id IS NOT NULL;
@@ -0,0 +1 @@
+DROP TABLE IF EXISTS alert_activity;
49 changes: 49 additions & 0 deletions backend/database/migrations/024_add_alert_config_layers.sql
@@ -0,0 +1,49 @@
+-- Migration 024: alert_config_layers
+--
+-- One row per organization carrying that org's alerting configuration as a
+-- flat recipient-based JSON blob. The effective per-tenant Mimir YAML is
+-- the server-side merge of all rows walking up the org hierarchy from the
+-- tenant to the Owner:
+--
+--   Owner.layer → Distributor.layer → Reseller.layer → Customer.layer
+--
+-- The merge is internal — /alerts/config exposes only the caller's own
+-- row, never the merged effective view or anyone else's row.
+--
+-- Merge rules (additive-only for security-relevant fields):
+--   - bool channel toggles (enabled.{email,webhook,telegram}): OR. A
+--     descendant cannot disable a channel an ancestor enabled. Non-Owner
+--     layers cannot store an explicit false (normalised to null on save).
+--   - recipient lists (email/webhook/telegram): union with stable dedup.
+--     Dedup keys: email→address, webhook→url, telegram→(bot_token,chat_id).
+--   - per-recipient severities[]: union; if any contributor uses [] ("all
+--     severities") the merged copy widens back to [].
+--
+-- Mimir sees a flat YAML per tenant; the layered model is server-internal
+-- and invisible to Alertmanager.
+
+CREATE TABLE IF NOT EXISTS alert_config_layers (
+    organization_id VARCHAR(255) PRIMARY KEY,
+
+    -- Serialized AlertingConfigLayer (Go struct):
+    --   {
+    --     "enabled": {"email": *bool, "webhook": *bool, "telegram": *bool},
+    --     "email_recipients": [{address, severities[], language, format}],
+    --     "webhook_recipients": [{name, url, severities[]}],
+    --     "telegram_recipients": [{bot_token, chat_id, severities[]}]
+    --   }
+    -- Channel toggles are tri-state (null = "no opinion at this layer,
+    -- inherit from above"). Per-recipient severities=[] means "all".
+    config_json JSONB NOT NULL,
+
+    -- Audit fields. updated_by_user_id stores the logto_id of the user who
+    -- last saved this layer. updated_by_name is denormalised for cheap UI
+    -- rendering of "who set this".
+    updated_by_user_id VARCHAR(255),
+    updated_by_name VARCHAR(255),
+    updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
+    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
+);
+
+COMMENT ON TABLE alert_config_layers IS 'Per-organization alerting config layer. Effective Mimir YAML for a tenant is the merge of all layers from Owner down to that tenant; merge is server-side only and never exposed via API.';
+COMMENT ON COLUMN alert_config_layers.config_json IS 'Serialized AlertingConfigLayer: { enabled:{email,webhook,telegram}, email_recipients[], webhook_recipients[], telegram_recipients[] }. Each recipient carries its own severities[]; email recipients additionally carry language+format. Channel toggles are nullable tri-state.';
@@ -0,0 +1 @@
+DROP TABLE IF EXISTS alert_config_layers;
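The header of migration 024 lists the merge rules. The code below is a minimal, hypothetical sketch of them that models only the email channel; webhook and telegram would use `url` and `(bot_token, chat_id)` as their dedup keys. It makes two assumptions the migration does not specify: layers are passed Owner first, and when an address appears in several layers, the first-seen recipient keeps its `language` and `format`.

```go
package main

import "fmt"

// emailRecipient and layer are trimmed-down, hypothetical mirrors of the
// serialized AlertingConfigLayer.
type emailRecipient struct {
	Address    string
	Severities []string // empty means "all severities"
	Language   string
	Format     string
}

type layer struct {
	EmailEnabled    *bool // nil = no opinion at this layer
	EmailRecipients []emailRecipient
}

type merged struct {
	EmailEnabled    bool
	EmailRecipients []emailRecipient
}

// normalizeForSave drops an explicit false on non-Owner layers (stored as null).
func normalizeForSave(l layer, isOwner bool) layer {
	if !isOwner && l.EmailEnabled != nil && !*l.EmailEnabled {
		l.EmailEnabled = nil
	}
	return l
}

// unionSeverities merges two lists; an empty side means "all", so the
// result widens back to empty.
func unionSeverities(a, b []string) []string {
	if len(a) == 0 || len(b) == 0 {
		return nil
	}
	seen := map[string]bool{}
	var out []string
	for _, list := range [][]string{a, b} {
		for _, s := range list {
			if !seen[s] {
				seen[s] = true
				out = append(out, s)
			}
		}
	}
	return out
}

// mergeLayers ORs the toggle and unions recipients with stable dedup by address.
func mergeLayers(layers []layer) merged {
	var m merged
	index := map[string]int{} // address -> position in m.EmailRecipients
	for _, l := range layers {
		if l.EmailEnabled != nil && *l.EmailEnabled {
			m.EmailEnabled = true // no later layer can switch it off
		}
		for _, r := range l.EmailRecipients {
			if i, ok := index[r.Address]; ok {
				m.EmailRecipients[i].Severities = unionSeverities(m.EmailRecipients[i].Severities, r.Severities)
				continue
			}
			index[r.Address] = len(m.EmailRecipients)
			r.Severities = append([]string(nil), r.Severities...) // don't alias the input layer
			m.EmailRecipients = append(m.EmailRecipients, r)
		}
	}
	return m
}

func main() {
	on, off := true, false
	owner := layer{EmailEnabled: &on, EmailRecipients: []emailRecipient{{Address: "noc@example.com", Severities: []string{"critical"}}}}
	customer := normalizeForSave(layer{EmailEnabled: &off, EmailRecipients: []emailRecipient{
		{Address: "noc@example.com", Severities: []string{"warning"}},
		{Address: "ops@example.com"},
	}}, false)
	m := mergeLayers([]layer{owner, customer})
	fmt.Println("email enabled:", m.EmailEnabled)
	for _, r := range m.EmailRecipients {
		fmt.Println(r.Address, r.Severities)
	}
}
```

Every rule is either an OR or a union. As a result, layer order only affects the order of recipients and the assumed tie-break on `language`/`format`. It never affects whether a channel or a recipient ends up enabled.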