feat(monitoring)!: offload Cribl Stream to the homelab — laptop runs Edge + OTEL only #272

Merged
JacobPEvans-personal merged 2 commits into main from feat/offload-laptop-stream on Jun 12, 2026

@JacobPEvans-personal

Why

The laptop's cribl-stream-standalone is the heavy processor that chronically wedges the orbstack node (Ready/NotReady flapping under load). The homelab side of the replacement path is fully merged: Cribl S2S over TCP 10300 through an HAProxy frontend into the homelab Cribl Stream workers, whose S2S input passes events straight through to Splunk HEC (dryvist/terraform-proxmox#424, dryvist/ansible-proxmox-apps#393). #269 already ships bifrost logs over that path. This PR completes the cutover and removes the laptop Stream.

What changed

Edge takes over the retired Stream's duties (k8s/monitoring/cribl-edge-standalone/)

  • New inputs: in_otel (OTLP gRPC :4317, fed by the in-cluster otel-collector) and in_hec (Splunk HEC :8088; host producers keep POSTing to NodePort 30088 via the new cribl-edge-standalone-hec service — same port the Stream exposed).
  • New force-splunk-meta output-conditioning pipeline on proxmox-stream: derives Splunk index/sourcetype from datatype and masks emails/IPv4s on every event before it leaves the laptop. Ported from the Stream with one change — the evals are conditional (index || …) so inputs that pre-stamp metadata (bifrost, index=llm) pass through untouched.
  • proxmox-stream (S2S :10300) is now the default and only output; stream-hec is gone; PQ bumped to 3GB (sized to the 5Gi PVC).
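The conditional stamping and masking described for force-splunk-meta can be sketched in Python as follows. This is an illustration of the logic only, not the pipeline's actual configuration: the datatype table, the fallback values, the mask tokens, and both regular expressions are assumptions made for the example.

```python
import re

# Hypothetical datatype -> (index, sourcetype) table. The real mapping lives in
# the Edge pipeline config and is not shown in this PR.
DATATYPE_META = {
    "otel_logs": ("otel", "otel:logs"),
}
DEFAULT_META = ("main", "cribl:edge")  # assumed fallback, not from the PR

# Simplified patterns for illustration; the pipeline's real mask rules may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def force_splunk_meta(event: dict) -> dict:
    """Stamp index/sourcetype only when missing, then mask PII in _raw."""
    out = dict(event)
    derived_index, derived_sourcetype = DATATYPE_META.get(
        out.get("datatype"), DEFAULT_META
    )
    # Conditional eval (the `index || ...` idea): pre-stamped values win.
    out["index"] = out.get("index") or derived_index
    out["sourcetype"] = out.get("sourcetype") or derived_sourcetype
    # Masking runs on every event, pre-stamped or not.
    raw = str(out.get("_raw", ""))
    raw = EMAIL_RE.sub("<email>", raw)
    raw = IPV4_RE.sub("<ipv4>", raw)
    out["_raw"] = raw
    return out
```

An event that arrives with index=llm keeps that index but still has its _raw masked, while an event without metadata gets both fields derived from its datatype.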

Repointed consumers

  • otel-collector exporter → cribl-edge-standalone:4317.
  • pipeline-heartbeat probes the Edge health API (:9420).
  • Network policies: Edge egress is port-only :10300 (destination is outside the cluster; the homelab address never appears in this repo); a new allow-edge-standalone-data-ingress policy admits 4317 from OTEL, 8088 port-only for the NodePort, and 9420 from the heartbeat; OTEL egress now targets the Edge. The three allow-stream-* policies are deleted. This also fixes a latent gap: #269's S2S egress was never allowed by the old policy, so the bifrost output was blocked by default-deny.
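To make "port-only" egress concrete, here is a rough sketch that builds such a NetworkPolicy as a Python dict and prints it as JSON. The policy name and pod label are hypothetical placeholders, not the names used in this repo; only the monitoring namespace and port 10300 come from the PR. The point is that the egress rule carries a ports list and no "to" clause, so no destination address needs to be written down.

```python
import json


def port_only_egress_policy(name: str, namespace: str, pod_labels: dict, port: int) -> dict:
    """Build a NetworkPolicy that allows egress to any destination on one TCP port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": pod_labels},
            "policyTypes": ["Egress"],
            # No "to" clause: the rule matches every destination, limited to this port.
            "egress": [{"ports": [{"protocol": "TCP", "port": port}]}],
        },
    }


policy = port_only_egress_policy(
    name="allow-edge-standalone-s2s-egress",      # hypothetical name
    namespace="monitoring",
    pod_labels={"app": "cribl-edge-standalone"},  # hypothetical label
    port=10300,
)
print(json.dumps(policy, indent=2))
```

The trade-off of this shape is that it permits TCP 10300 to any address, which is the price of keeping the homelab address out of the repo.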

Removed

  • k8s/monitoring/cribl-stream-standalone/ (StatefulSet, services incl. UI :30900, PDB, config), its kustomization entry, deploy.sh rollout + cribl-stream-admin secret, and the copilot-pack secret (the pack never actually installed on the live system — flagging here: if GitHub Copilot usage collection is wanted later, port the pack to the homelab Stream via its ansible role).
  • splunk-hec-config stays — heartbeat-splunk still probes Splunk HEC directly.

Tests/docs: stream-dependent tests retargeted to the Edge or deleted where they fundamentally required the local Stream (details in commit); architecture diagrams and docs updated. Full pre-commit suite + kustomize build + unit/manifest tests pass (60 passed, 30 pre-existing skips).

Deploy gate (important)

Merging is safe — nothing auto-deploys. Run make deploy only after the homelab path is live:

  1. terraform-proxmox full apply (firewall security groups incl. 10300) — pending, gated on a credentialed session
  2. ansible-proxmox-apps --tags haproxy,cribl_stream deploy

If deployed early, Edge telemetry buffers in its 3GB persistent queue rather than being lost, but don't rely on that: deploy only after both steps above are complete.
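One way to guard the gate before running make deploy is a plain TCP reachability check against the S2S port. The helper below is a sketch under assumptions: the function name is made up for illustration, and the host has to be supplied by the operator (for example from an environment variable), since the homelab address is deliberately absent from the repo.

```python
import socket


def s2s_reachable(host: str, port: int = 10300, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A successful connect only shows that something (presumably the HAProxy frontend) is listening on 10300; it does not prove that events reach Splunk, so the post-deploy verification is still needed.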

Verification plan (post-deploy)

  • Sentinel claude/gemini event reaches Splunk via the homelab path; OTEL-sourced data advances.
  • kubectl -n monitoring get pods: no cribl-stream-standalone; Edge/OTEL Running; node stays Ready under load.
  • Host HEC POST to :30088 still lands in Splunk with correct index/sourcetype.
  • First-run watch item: test_cribl_edge_events_flowing assumes the Edge emits the same _raw stats lines the Stream did.
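The host HEC check above could be scripted along these lines. This sketch only builds the request; it assumes the Edge's in_hec input accepts the standard Splunk HEC event endpoint (/services/collector/event) with a "Splunk <token>" Authorization header over plain HTTP on the NodePort. The base URL, token, and helper name are placeholders, not values from this repo.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, index=None, sourcetype=None):
    """Build (but do not send) a Splunk HEC event POST."""
    payload = {"event": event}
    # Leave index/sourcetype out to exercise the Edge's own metadata stamping.
    if index:
        payload["index"] = index
    if sourcetype:
        payload["sourcetype"] = sourcetype
    return urllib.request.Request(
        url=f"{base_url}/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; substitute the real token before sending.
req = build_hec_request("http://localhost:30088", "REPLACE_ME", "hec smoke test")
```

Sending it with urllib.request.urlopen(req, timeout=5) and then searching Splunk for the event text would cover the "still lands in Splunk with correct index/sourcetype" item.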

Refs: dryvist/terraform-proxmox#424, dryvist/ansible-proxmox-apps#393

🤖 Generated with Claude Code

…Edge + OTEL only

Remove the laptop cribl-stream-standalone StatefulSet (the heavy processor
that chronically wedged the orbstack node) and make the existing proxmox-stream
S2S output the Edge's sole egress:

- Edge gains the retired Stream's duties: in_otel (OTLP gRPC 4317, fed by the
  otel-collector) and in_hec (Splunk HEC 8088, host producers via NodePort
  30088) inputs, plus the force-splunk-meta pipeline (index/sourcetype from
  datatype + PII masking) attached to the proxmox-stream output so metadata is
  stamped at the source — the homelab S2S input is a passthrough by contract.
  The eval is now conditional so inputs that pre-stamp metadata (bifrost) pass
  through untouched.
- otel-collector exports to the Edge; pipeline-heartbeat probes the Edge
  health API; network policies updated (port-only 10300 egress — the homelab
  address never appears in-repo — and a data-ingress policy for 4317/8088).
- deploy.sh/Makefile drop the Stream rollout, its admin secret, and the dead
  copilot-pack secret (the pack never installed on the live system).
- Tests retargeted to the Edge or deleted where they fundamentally required
  the local Stream; docs/diagrams updated to the new flow.

Flow: host producers + packs + OTEL -> Edge (stamp + mask, PQ) -> S2S 10300 ->
homelab HAProxy -> Cribl Stream -> Splunk HEC.

DEPLOY GATE: merge is safe (nothing auto-deploys), but run `make deploy` only
after the homelab side is live (terraform firewall apply + ansible
haproxy/cribl_stream deploy), or Edge telemetry buffers in its 3GB PQ.

Refs: dryvist/terraform-proxmox#424, dryvist/ansible-proxmox-apps#393

Assisted-by: Claude:claude-fable-5