feat(ingress): per-workload CF ingress + simplify apps to web-nvidia-smi demo by posix4e · Pull Request #134 · devopsdefender/dd

posix4e · 2026-04-18T21:20:18Z

Summary

- Adds optional `expose: {hostname_label, port}` on individual boot workloads. dd-agent collects those into `DD_EXTRA_INGRESS` and forwards them on `/register`; CP prepends them to the cloudflared ingress and provisions matching CNAMEs.
- Simplifies `apps/` from podman+ollama+openclaw to podman+`web-nvidia-smi`: one focused demo that proves podman, GPU passthrough, and the new ingress path end-to-end. Ollama+openclaw move out of this repo (next PR: slopandmop).
- `web-nvidia-smi` serves `nvidia-smi` output on `gpu..devopsdefender.com` via a tiny nc loop in an `nvidia/cuda:12.6.1-base-ubuntu22.04` container.
- Preview agent VM keeps its registration-smoke-test role but drops the CPU-ollama workload.

Scope boundary

Boot-time exposure only. Runtime `/deploy` exposure for POSTed workloads (e.g. anything slopandmop ships at runtime) is a follow-up — those workloads still run, they're just not auto-routed to a public hostname yet.

Test plan

`cargo fmt && cargo check && cargo test` pass locally
Run `./apps/_infra/local-agents.sh "" https://app.devopsdefender.com` on tdx2 with DD_PAT + DD_ITA_API_KEY
`virsh start dd-local-prod`; `virsh console` shows ITA mint, register succeeds, CP log shows `extra_ingress` entry
`curl https://.devopsdefender.com/` → dashboard (no regression)
`curl https://gpu..devopsdefender.com/` → `nvidia-smi` text table
Preview agent still registers against a PR CP and shows up in the fleet

🤖 Generated with Claude Code

…smi demo

Agent VMs can now declare `expose: {hostname_label, port}` on individual boot workloads; dd-agent forwards those on /register, CP prepends them to the cloudflared ingress alongside the default dashboard rule, and CF provisions matching CNAMEs. Each entry becomes a public hostname `<label>.<agent-hostname>` → `localhost:<port>`.

dd's apps/ example collapses from podman+ollama+openclaw down to podman+web-nvidia-smi: one focused demo that proves podman, GPU passthrough, and the new ingress path end-to-end. Ollama and openclaw move out of this repo; they'll land in slopandmop as a self-contained example where they belong.

Preview agent VM keeps its role as the registration smoke test against per-PR CPs but drops its CPU-ollama workload. Prod agent VM serves `gpu.<agent-host>.devopsdefender.com` with the container's `nvidia-smi` output.

Boot-time exposure only in this PR. Runtime /deploy exposure for POSTed workloads (e.g. anything slopandmop ships at runtime) is a follow-up: those workloads still run, they're just not auto-routed to a public hostname yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
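The mapping from `expose` entries to ingress rules described in this commit message can be sketched roughly as follows. This is a minimal illustration, not code from the dd repository: the `Expose` and `IngressRule` types, the `build_ingress` function, the `http://` scheme, and the dashboard port parameter are all assumptions introduced here. Only the `hostname_label`/`port` fields, the `<label>.<agent-hostname>` → `localhost:<port>` shape, and the "extras before the default dashboard rule" ordering come from the text above.

```rust
/// One `expose` entry from a boot workload (field names follow the PR text).
#[derive(Debug, Clone, PartialEq)]
pub struct Expose {
    pub hostname_label: String,
    pub port: u16,
}

/// A single ingress rule: public hostname -> local service URL.
/// Hypothetical type for illustration; not taken from the dd codebase.
#[derive(Debug, Clone, PartialEq)]
pub struct IngressRule {
    pub hostname: String,
    pub service: String,
}

/// Build the rule list: per-workload rules first, then the default dashboard rule.
/// The `http://` scheme and `dashboard_port` are assumptions of this sketch.
pub fn build_ingress(
    agent_hostname: &str,
    dashboard_port: u16,
    extras: &[Expose],
) -> Vec<IngressRule> {
    // Each extra becomes `<label>.<agent-hostname>` -> `localhost:<port>`.
    let mut rules: Vec<IngressRule> = extras
        .iter()
        .map(|e| IngressRule {
            hostname: format!("{}.{}", e.hostname_label, agent_hostname),
            service: format!("http://localhost:{}", e.port),
        })
        .collect();
    // The default dashboard rule goes last, so the extras are effectively prepended.
    rules.push(IngressRule {
        hostname: agent_hostname.to_string(),
        service: format!("http://localhost:{}", dashboard_port),
    });
    rules
}
```

Calling `build_ingress("agent1.example.com", 8080, &[Expose { hostname_label: "gpu".to_string(), port: 8081 }])` would yield a rule for `gpu.agent1.example.com` followed by the dashboard rule for `agent1.example.com`; both hostnames are placeholders.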

github-actions · 2026-04-18T21:24:13Z

DD preview ready

URL: https://pr-134.devopsdefender.com

Browser login: paste gh auth token output at https://pr-134.devopsdefender.com/auth/pat

CLI / curl: curl -H "Authorization: Bearer $(gh auth token)" https://pr-134.devopsdefender.com/

Register endpoint for a local agent: wss://pr-134.devopsdefender.com/register

The prior shape, a JSON array substituted into `"DD_EXTRA_INGRESS=${DD_EXTRA_INGRESS}"`, closed the outer env string at the first embedded `"`, producing invalid JSON that broke `jq -c .`:

jq: parse error: Invalid numeric literal at line 21, column 40

Seen on the dd-local-prod relaunch pipeline immediately after #134 merged (the failing job was in main's Release cascade).

Switches the wire format to comma-separated `label:port` pairs (`gpu:8081` or `gpu:8081,web:9000`) and adds unit tests covering the parser edge cases. HTTP request body from agent → CP /register still carries the structured JSON shape; only the env-var-to-env-var hop changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
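A parser for the comma-separated `label:port` format might look like the sketch below. The function name `parse_extra_ingress`, the error type, and the choices about edge cases (skipping empty segments, rejecting empty labels) are assumptions made for illustration; the actual parser and its unit tests in the dd repository may behave differently.

```rust
/// Parse a comma-separated list of `label:port` pairs, e.g. "gpu:8081,web:9000".
/// Empty input yields an empty list; malformed entries are reported as errors.
/// Hypothetical helper: the name and edge-case policy are assumptions.
pub fn parse_extra_ingress(raw: &str) -> Result<Vec<(String, u16)>, String> {
    let mut out = Vec::new();
    for entry in raw.split(',') {
        let entry = entry.trim();
        if entry.is_empty() {
            // Tolerate an unset/empty variable and stray commas.
            continue;
        }
        // Split on the first ':' into label and port.
        let (label, port) = entry
            .split_once(':')
            .ok_or_else(|| format!("missing ':' in entry `{}`", entry))?;
        if label.is_empty() {
            return Err(format!("empty label in entry `{}`", entry));
        }
        // Ports must fit in u16; anything else is rejected.
        let port: u16 = port
            .parse()
            .map_err(|_| format!("invalid port in entry `{}`", entry))?;
        out.push((label.to_string(), port));
    }
    Ok(out)
}
```

Because the value contains no double quotes, substituting it into a quoted env string such as `"DD_EXTRA_INGRESS=gpu:8081,web:9000"` cannot terminate the surrounding JSON string early, which is the failure the text format is meant to avoid.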

PR #134 wired `expose` at boot time: the ingress rules baked into the agent VM's config.iso got published when CP first created the tunnel. This extends that to the runtime path: when a workload POSTed to dd-agent's /deploy declares `expose: {hostname_label, port}`, the agent now calls the CP's new /ingress/replace endpoint with the merged (boot + runtime) extras list, and CP re-PUTs the tunnel config + upserts a CNAME for the new hostname.

Wire-level summary:

- cf.rs: extract `apply_ingress()` used by both `create()` (at register) and a new public `update_ingress()` (at runtime). The existing tunnel id + token stay stable.
- cp.rs: new endpoint POST /ingress/replace. PAT-authenticated, looks up the agent in the store by agent_id, re-PUTs the tunnel config, updates the store's `extras` field for the agent.
- collector::Agent: gains `tunnel_id` + `extras` fields, preserved across /health scrapes so the collector doesn't clobber them.
- agent.rs: stores `agent_id` from the register bootstrap, holds a live `Arc<RwLock<Vec<(String, u16)>>>` for the merged extras, hooks into /deploy to push updates.

Soft-fails: workload stays running even if the ingress update fails; only public reachability is affected.

Opens the runtime path slopandmop needs: POST openclaw to an agent, the agent asks CP to route `openclaw.<agent-host>` → localhost:port, CF picks up the ingress config within seconds, browser hits the URL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
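The agent-side bookkeeping described above (a shared extras list plus a soft-failing push) could be sketched like this. Only the `Arc<RwLock<Vec<(String, u16)>>>` type and the soft-fail behavior come from the commit message. The names `Extras`, `merge_extra`, and `push_soft_fail`, the "same label replaces the port" merge policy, and the closure standing in for the HTTP call to /ingress/replace are assumptions for illustration.

```rust
use std::sync::{Arc, RwLock};

/// Shared list of (hostname_label, port) pairs, matching the type named in the text.
pub type Extras = Arc<RwLock<Vec<(String, u16)>>>;

/// Merge one runtime `expose` entry into the shared list and return a snapshot
/// to send to the control plane. Assumed policy: an existing label has its
/// port replaced rather than being duplicated.
pub fn merge_extra(extras: &Extras, label: &str, port: u16) -> Vec<(String, u16)> {
    let mut guard = extras.write().expect("extras lock poisoned");
    if let Some(pos) = guard.iter().position(|(l, _)| l.as_str() == label) {
        guard[pos].1 = port;
    } else {
        guard.push((label.to_string(), port));
    }
    guard.clone()
}

/// Push the merged list through a caller-supplied closure (a stand-in for the
/// request to /ingress/replace). A failure is logged and reported as `false`,
/// but never turned into a deploy failure: the workload keeps running.
pub fn push_soft_fail<F>(snapshot: &[(String, u16)], push: F) -> bool
where
    F: FnOnce(&[(String, u16)]) -> Result<(), String>,
{
    match push(snapshot) {
        Ok(()) => true,
        Err(e) => {
            eprintln!("ingress update failed (workload keeps running): {}", e);
            false
        }
    }
}
```

Sending the full merged list on every update, rather than a delta, fits an endpoint named "replace": the control plane can re-PUT the whole tunnel config without tracking per-agent diffs.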


posix4e temporarily deployed to staging April 18, 2026 21:21 — with GitHub Actions Inactive

posix4e merged commit c57d248 into main Apr 18, 2026
4 checks passed

posix4e mentioned this pull request Apr 18, 2026

fix(apps): DD_EXTRA_INGRESS text format avoids JSON-in-JSON quote bug #136

Merged

3 tasks

posix4e mentioned this pull request Apr 18, 2026

feat(ingress): runtime per-workload CF ingress on /deploy #137

Merged

5 tasks

posix4e deleted the feat/per-workload-ingress branch April 18, 2026 21:57

posix4e merged 1 commit into main from feat/per-workload-ingress
