feat(monitoring)!: offload Cribl Stream to the homelab — laptop runs Edge + OTEL only#272
Merged
…Edge + OTEL only

Remove the laptop cribl-stream-standalone StatefulSet (the heavy processor that chronically wedged the orbstack node) and make the existing proxmox-stream S2S output the Edge's sole egress:

- Edge gains the retired Stream's duties: in_otel (OTLP gRPC 4317, fed by the otel-collector) and in_hec (Splunk HEC 8088, host producers via NodePort 30088) inputs, plus the force-splunk-meta pipeline (index/sourcetype from datatype + PII masking) attached to the proxmox-stream output so metadata is stamped at the source; the homelab S2S input is a passthrough by contract. The eval is now conditional so inputs that pre-stamp metadata (bifrost) pass through untouched.
- otel-collector exports to the Edge; pipeline-heartbeat probes the Edge health API; network policies updated (port-only 10300 egress, with the homelab address never appearing in-repo, and a data-ingress policy for 4317/8088).
- deploy.sh/Makefile drop the Stream rollout, its admin secret, and the dead copilot-pack secret (the pack never installed on the live system).
- Tests retargeted to the Edge or deleted where they fundamentally required the local Stream; docs/diagrams updated to the new flow.

Flow: host producers + packs + OTEL -> Edge (stamp + mask, PQ) -> S2S 10300 -> homelab HAProxy -> Cribl Stream -> Splunk HEC.

DEPLOY GATE: merge is safe (nothing auto-deploys), but run `make deploy` only after the homelab side is live (terraform firewall apply + ansible haproxy/cribl_stream deploy), or Edge telemetry buffers in its 3GB PQ.

Refs: dryvist/terraform-proxmox#424, dryvist/ansible-proxmox-apps#393
Assisted-by: Claude:claude-fable-5
Why
The laptop's `cribl-stream-standalone` is the heavy processor that chronically wedges the orbstack node (Ready/NotReady flapping under load). The homelab side of the replacement path is fully merged: Cribl S2S over TCP 10300 through an HAProxy frontend into the homelab Cribl Stream workers, whose S2S input passes events straight through to Splunk HEC (dryvist/terraform-proxmox#424, dryvist/ansible-proxmox-apps#393). #269 already ships bifrost logs over that path. This PR completes the cutover and removes the laptop Stream.

What changed
Edge takes over the retired Stream's duties (`k8s/monitoring/cribl-edge-standalone/`)

- `in_otel` (OTLP gRPC :4317, fed by the in-cluster otel-collector) and `in_hec` (Splunk HEC :8088; host producers keep POSTing to NodePort 30088 via the new `cribl-edge-standalone-hec` service, the same port the Stream exposed).
- `force-splunk-meta` output-conditioning pipeline on `proxmox-stream`: derives Splunk index/sourcetype from `datatype` and masks emails/IPv4s on every event before it leaves the laptop. Ported from the Stream with one change: the evals are conditional (`index || …`) so inputs that pre-stamp metadata (bifrost, `index=llm`) pass through untouched.
- `proxmox-stream` (S2S :10300) is now the default and only output; `stream-hec` is gone; PQ bumped to 3GB (sized to the 5Gi PVC).

Repointed consumers
- otel-collector exports to `cribl-edge-standalone:4317`.
- pipeline-heartbeat probes the Edge health API.
- Network policies: `allow-edge-standalone-data-ingress` (4317 from OTEL, 8088 port-only for the NodePort, 9420 from the heartbeat), OTEL egress → Edge. The three `allow-stream-*` policies are deleted. This also fixes a latent gap: the S2S egress added in #269 ("feat(cribl-edge): ship bifrost pod logs to the homelab Stream (index=llm)") was never allowed by the old policy, so the bifrost output was blocked by default-deny.

Removed
- `k8s/monitoring/cribl-stream-standalone/` (StatefulSet, services incl. UI :30900, PDB, config), its kustomization entry, the deploy.sh rollout + `cribl-stream-admin` secret, and the copilot-pack secret. The pack never actually installed on the live system; flagging here: if GitHub Copilot usage collection is wanted later, port the pack to the homelab Stream via its ansible role.
- `splunk-hec-config` stays: heartbeat-splunk still probes Splunk HEC directly.

Tests/docs: stream-dependent tests retargeted to the Edge or deleted where they fundamentally required the local Stream (details in commit); architecture diagrams and docs updated. Full pre-commit suite + kustomize build + unit/manifest tests pass (60 passed, 30 pre-existing skips).
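For reviewers who want the `force-splunk-meta` behavior in executable form, here is a rough Python sketch of the logic described above: stamp index/sourcetype from `datatype` only when the event does not already carry them, then mask emails and IPv4 addresses. It is not the Cribl pipeline config. The mapping table, the fallback values, the mask tokens, the regexes, and the choice to mask only `_raw` are all assumptions made for illustration.

```python
import re

# Hypothetical datatype -> (index, sourcetype) table; the real mapping lives
# in the Edge pipeline config and is not shown in this PR description.
DATATYPE_META = {
    "otel_logs": ("otel", "otel:logs"),
    "host_metrics": ("metrics", "host:metrics"),
}
DEFAULT_META = ("main", "generic")  # assumed fallback, also hypothetical

# Simplified patterns, not the ones used by the actual pipeline.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def force_splunk_meta(event: dict) -> dict:
    """Stamp index/sourcetype only when missing, then mask PII in _raw."""
    out = dict(event)  # leave the caller's event untouched
    index, sourcetype = DATATYPE_META.get(out.get("datatype"), DEFAULT_META)
    # Conditional eval, i.e. `index || derived`: pre-stamped values win.
    out["index"] = out.get("index") or index
    out["sourcetype"] = out.get("sourcetype") or sourcetype
    raw = out.get("_raw", "")
    raw = EMAIL_RE.sub("<email>", raw)
    raw = IPV4_RE.sub("<ipv4>", raw)
    out["_raw"] = raw
    return out
```

An event that arrives with `index="llm"` already set, as the bifrost input does, keeps that value, while an unstamped event gets its metadata from the table. Masking runs in both cases.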
Deploy gate (important)
Merging is safe: nothing auto-deploys. Run `make deploy` only after the homelab path is live:

- terraform firewall apply
- ansible `--tags haproxy,cribl_stream` deploy

If deployed early, Edge telemetry buffers in its 3GB persistent queue rather than being lost, but don't.
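One way to make the gate harder to skip is a preflight check that the S2S port answers before `make deploy` runs. The sketch below is not part of this PR. The helper name is hypothetical, and because the homelab address never appears in-repo, the host would have to come from the operator's environment.

```python
import socket


def s2s_reachable(host: str, port: int = 10300, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A successful connect only shows that something is listening on 10300, most likely the HAProxy frontend. It does not prove that the Cribl Stream workers behind it are healthy or that events reach Splunk, so the post-deploy verification is still needed.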
Verification plan (post-deploy)
- `kubectl -n monitoring get pods`: no cribl-stream-standalone; Edge/OTEL Running; node stays Ready under load.
- `test_cribl_edge_events_flowing` assumes the Edge emits the same `_raw stats` lines the Stream did.

Refs: dryvist/terraform-proxmox#424, dryvist/ansible-proxmox-apps#393
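The pod check in the verification plan could be scripted along these lines. This is a minimal sketch that parses the default table output of `kubectl -n monitoring get pods`. The pod-name prefixes are assumptions based on the names used in this description, and the function does not cover the "node stays Ready under load" part.

```python
def check_pods(kubectl_output: str) -> list[str]:
    """Return a list of problems found in `kubectl get pods` text output."""
    problems = []
    # Skip the header row (NAME READY STATUS RESTARTS AGE).
    rows = [line.split() for line in kubectl_output.strip().splitlines()[1:]]
    for cols in rows:
        if len(cols) < 3:
            continue  # ignore blank or malformed rows
        name, status = cols[0], cols[2]
        if name.startswith("cribl-stream-standalone"):
            problems.append(f"{name}: laptop Stream pod still present")
        elif name.startswith(("cribl-edge-standalone", "otel-collector")) and status != "Running":
            problems.append(f"{name}: status is {status}, expected Running")
    return problems
```

An empty list means the check passed. Anything else lists the pods that need attention.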
🤖 Generated with Claude Code