From eaf40d30d86cdc5b7932d680b42a4dcdecbddc3a Mon Sep 17 00:00:00 2001
From: Alex Newman <posix4e@gmail.com>
Date: Sat, 18 Apr 2026 18:12:22 +0000
Subject: [PATCH 1/2] ci: unify workloads + collapse deploy paths into Release
 + cleanup audit
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Consolidates PRs #127, #131, #132 into one commit. Net -381 lines;
a single "Release" workflow now drives the whole fleet lifecycle.

Workload spec:
  apps/<name>/workload.{json,json.tmpl} becomes the single source of
  truth for every EE workload (cloudflared, dd-agent, dd-management,
  ollama, openclaw, nv, mount-models, podman-static, podman-bootstrap).
  Boot-time (config.iso / ee-config metadata) and runtime (/deploy)
  both bake from the same file.
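
  A minimal sketch of the empty-value filter the bake step applies to
  templated specs, assuming jq is on PATH. The jq expression is the one
  used in deploy-cp.yml; the function name strip_empty_env and the
  sample "name" field are illustrative only and not part of the real
  workload spec.

```shell
# strip_empty_env: read one workload spec (JSON) on stdin and drop any
# "KEY=" entry from .env whose value is empty. Specs without an .env
# array pass through unchanged.
strip_empty_env() {
  jq -c 'if .env then .env |= map(select(test("^[^=]+=.+"))) else . end'
}

# Hypothetical spec: B has an empty value (e.g. an unset OAuth secret).
printf '%s\n' '{"name":"demo","env":["A=1","B=","C=x=y"]}' | strip_empty_env
```

  In the workflow this filter runs after envsubst has expanded the
  template placeholders, so a variable that expands to nothing is
  removed from the spec instead of being passed on as an empty value.
  Here B= is dropped while A=1 and C=x=y are kept.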

Workflow topology:
  release.yml is the one entry point.
    pull_request      → build → deploy-preview      → dd-local-preview relaunch
    push main         → build → deploy-production   → dd-local-prod    relaunch
    push v*           → build only (versioned artifact, no deploy)
    workflow_dispatch → build → deploy-production   (rollback: release_tag input)
  .github/workflows/deploy-cp.yml is the reusable workflow both paths
  call, so preview CI exercises the exact code prod uses. It provisions
  the GCP CP VM, verifies health + /cp/attest MRTD + dashboard + STONITH,
  comments on PR, and cascades agent relaunch. The cascade is blocking:
  a release goes green only when the matching dd-local-{kind} VM
  re-registers with the freshly-deployed CP (proves "everything works").
  .github/actions/relaunch-agent/ is the composite action that SSHes
  into tdx2, runs dd-relaunch.sh, and polls /api/agents for the
  freshly-registered entry (5-min budget).
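
  The /cp/attest MRTD check can be sketched roughly as follows. This is
  an illustration, not the workflow step itself: it assumes the quote
  arrives base64-encoded and that MRTD is the 48 bytes at offset 184 of
  a TDX v4 quote, as stated in deploy-cp.yml. The function names are
  hypothetical, and od stands in for the xxd call the workflow uses.

```shell
# Offsets taken from the comment in deploy-cp.yml (TDX quote v4).
MRTD_OFFSET=184
MRTD_LEN=48

# mrtd_hex: read a base64-encoded quote on stdin, print MRTD as hex.
mrtd_hex() {
  base64 -d \
    | tail -c +"$((MRTD_OFFSET + 1))" \
    | head -c "$MRTD_LEN" \
    | od -An -v -tx1 \
    | tr -d ' \n'
}

# mrtd_is_valid: succeed only for a full-length MRTD with at least one
# non-zero nibble (an all-zero MRTD means attestation did not work).
mrtd_is_valid() {
  hex=$1
  [ "${#hex}" -eq $((MRTD_LEN * 2)) ] || return 1
  case "$hex" in
    *[!0]*) return 0 ;;
    *)      return 1 ;;
  esac
}
```

  A caller would pipe the quote_b64 field of the /cp/attest response
  into mrtd_hex and retry until mrtd_is_valid succeeds, which is the
  shape of the retry loop in the "Verify NEW VM" step.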

Deleted:
  .github/workflows/production-deploy.yml (folded into release.yml
    as deploy-production job).
  .github/workflows/local-agents.yml (manual-dispatch path gone;
    push a commit to trigger a relaunch).
  .github/workflows/retire-staging.yml (one-shot, already run,
    no dd_env=staging VMs exist).
  scripts/ entirely. gcp-deploy.sh inlined into deploy-cp.yml;
  dd-relaunch.sh + local-agents.sh moved to apps/_infra/ (host-side);
  ollama-deploy.sh, redeploy-workload.sh, workloads.sh deleted as
  unused after the refactor.

Cleanup audit:
  gcloud survey revealed dd-pr-121-1776434711 RUNNING in staging a
  day after PR #121 merged — branch not deleted, so pr-teardown.yml
  never fired and cleanup.yml only reaps TERMINATED. Added a
  reap-merged-pr-previews job to cleanup.yml that resolves each
  RUNNING pr-N VM's PR state via gh and tears down VM + CF tunnel
  + DNS CNAME when MERGED/CLOSED. Dropped reap-staging (dead),
  trimmed workflow_run trigger (Production Deploy is gone).
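
  The reap decision can be sketched as below, assuming the PR state is
  the string printed by `gh pr view --json state --jq .state`. The
  function names are hypothetical. Note one deliberate difference from
  the job in cleanup.yml: this sketch keeps the VM for any unexpected
  state, whereas the job only skips OPEN and UNKNOWN.

```shell
# pr_number_from_env: map a dd_env label such as "pr-121" to "121".
pr_number_from_env() {
  printf '%s\n' "${1#pr-}"
}

# reap_action: decide what to do with a RUNNING preview env given the
# state of its PR. Prints "reap" or "keep".
reap_action() {
  case "$1" in
    MERGED|CLOSED) echo reap ;;  # leaked compute: VM + tunnel + CNAME go
    OPEN)          echo keep ;;  # preview still in use
    *)             echo keep ;;  # lookup failed or unknown: fail safe
  esac
}
```

  For the VM found in the survey, pr_number_from_env pr-121 yields 121,
  and a MERGED state maps to reap.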

Deferred (blocked on easyenclave):
  easyenclave image family split: preview → easyenclave-staging,
  prod → easyenclave-stable. easyenclave has no stable GCP family
  and no qcow2 on v0.1.14 today, so there is nothing to point prod
  at. Also deferred: tdx2 auto-refresh of the base qcow2 from
  easyenclave releases. Follow-up PR once easyenclave publishes
  stable images.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .github/actions/relaunch-agent/action.yml | 103 ++++++
 .github/workflows/cleanup.yml             | 152 ++++++---
 .github/workflows/deploy-cp.yml           | 362 ++++++++++++++++++++++
 .github/workflows/local-agents.yml        | 111 -------
 .github/workflows/production-deploy.yml   | 146 ---------
 .github/workflows/release.yml             | 297 ++++--------------
 .github/workflows/retire-staging.yml      |  98 ------
 README.md                                 |  12 +-
 apps/_infra/dd-relaunch.sh                |  52 ++++
 {scripts => apps/_infra}/local-agents.sh  |  11 +-
 apps/cloudflared/workload.json            |   8 +
 apps/dd-agent/workload.json.tmpl          |  22 ++
 apps/dd-management/workload.json.tmpl     |  29 ++
 apps/mount-models/workload.json           |   7 +
 apps/nv/workload.json                     |   7 +
 apps/ollama/workload.preview.json         |   7 +
 apps/ollama/workload.prod.json            |   7 +
 apps/openclaw/workload.json.tmpl          |   7 +
 apps/podman-bootstrap/workload.json       |   7 +
 apps/podman-static/workload.json          |   7 +
 scripts/dd-relaunch.sh                    |  53 ----
 scripts/gcp-deploy.sh                     | 177 -----------
 scripts/ollama-deploy.sh                  | 327 -------------------
 23 files changed, 814 insertions(+), 1195 deletions(-)
 create mode 100644 .github/actions/relaunch-agent/action.yml
 create mode 100644 .github/workflows/deploy-cp.yml
 delete mode 100644 .github/workflows/local-agents.yml
 delete mode 100644 .github/workflows/production-deploy.yml
 delete mode 100644 .github/workflows/retire-staging.yml
 create mode 100755 apps/_infra/dd-relaunch.sh
 rename {scripts => apps/_infra}/local-agents.sh (95%)
 create mode 100644 apps/cloudflared/workload.json
 create mode 100644 apps/dd-agent/workload.json.tmpl
 create mode 100644 apps/dd-management/workload.json.tmpl
 create mode 100644 apps/mount-models/workload.json
 create mode 100644 apps/nv/workload.json
 create mode 100644 apps/ollama/workload.preview.json
 create mode 100644 apps/ollama/workload.prod.json
 create mode 100644 apps/openclaw/workload.json.tmpl
 create mode 100644 apps/podman-bootstrap/workload.json
 create mode 100644 apps/podman-static/workload.json
 delete mode 100755 scripts/dd-relaunch.sh
 delete mode 100755 scripts/gcp-deploy.sh
 delete mode 100755 scripts/ollama-deploy.sh
diff --git a/.github/actions/relaunch-agent/action.yml b/.github/actions/relaunch-agent/action.yml
new file mode 100644
index 0000000..b449289
--- /dev/null
+++ b/.github/actions/relaunch-agent/action.yml
@@ -0,0 +1,103 @@
+name: Relaunch local TDX agent
+description: >-
+  SSH into the tdx2 host, recreate the matching dd-local-{kind} libvirt
+  domain against the given CP url (pulling apps/ from the given git ref),
+  then block until the agent re-registers with the CP. A release is "done"
+  only when this action succeeds end-to-end.
+
+inputs:
+  kind:
+    description: 'prod | preview — which libvirt domain to relaunch'
+    required: true
+  url:
+    description: 'CP URL the agent should register against (e.g. https://app.devopsdefender.com)'
+    required: true
+  ref:
+    description: 'git ref whose apps/ tree dd-relaunch.sh should check out on the host'
+    required: true
+  ssh-key:
+    description: 'Private SSH key for tdx2@host'
+    required: true
+  host:
+    description: 'Public host address of the tdx2 node'
+    required: true
+  dd-pat:
+    description: 'GitHub PAT the agent uses to talk to the CP'
+    required: true
+  ita-api-key:
+    description: 'Intel Trust Authority API key for attestation'
+    required: true
+
+runs:
+  using: composite
+  steps:
+    # CP must be reachable before we SSH — on PR pushes we race with
+    # Release's deploy-preview standing up the pr-N CP. /health is public.
+    - name: Wait for CP to be healthy
+      shell: bash
+      env:
+        URL: ${{ inputs.url }}
+      run: |
+        for i in $(seq 1 60); do
+          if curl -fsS --max-time 5 "$URL/health" >/dev/null 2>&1; then
+            echo "CP $URL healthy after ${i} attempts"
+            exit 0
+          fi
+          echo "  waiting for $URL... (${i}/60)"
+          sleep 10
+        done
+        echo "::error::CP $URL never came up within 10 min"
+        exit 1
+
+    # SSH in and relaunch the VM (destroy + redefine + start). Finishes
+    # in ~10 s — the baked config.iso's EE_BOOT_WORKLOADS drives the rest.
+    - name: ssh + relaunch VM
+      shell: bash
+      env:
+        SSH_KEY:        ${{ inputs.ssh-key }}
+        HOST:           ${{ inputs.host }}
+        DD_PAT:         ${{ inputs.dd-pat }}
+        DD_ITA_API_KEY: ${{ inputs.ita-api-key }}
+        KIND:           ${{ inputs.kind }}
+        URL:            ${{ inputs.url }}
+        REF:            ${{ inputs.ref }}
+      run: |
+        mkdir -p ~/.ssh
+        printf '%s\n' "$SSH_KEY" > ~/.ssh/id_ed25519
+        chmod 600 ~/.ssh/id_ed25519
+        ssh-keyscan -H "$HOST" >> ~/.ssh/known_hosts 2>/dev/null
+        ssh -o BatchMode=yes -o StrictHostKeyChecking=yes \
+            -i ~/.ssh/id_ed25519 "tdx2@$HOST" \
+            "DD_PAT='$DD_PAT' DD_ITA_API_KEY='$DD_ITA_API_KEY' /home/tdx2/src/dd/apps/_infra/dd-relaunch.sh '$KIND' '$URL' '$REF'"
+
+    # Block until the freshly-booted agent VM registers with the CP.
+    # This is the "I can see the local agent deployment worked" signal
+    # that gates the whole release. 5-min budget covers a cold VM boot
+    # (~60s) + cloudflared tunnel (~30s) + agent startup + register —
+    # plenty of headroom. Doesn't probe openclaw/ollama readiness:
+    # their first boot pays a 30-min npm-install tax and isn't part
+    # of the release gate.
+    - name: Verify agent registered with CP
+      shell: bash
+      env:
+        URL:    ${{ inputs.url }}
+        DD_PAT: ${{ inputs.dd-pat }}
+        KIND:   ${{ inputs.kind }}
+      run: |
+        vm="dd-local-$KIND"
+        started_at=$(date -u +%Y-%m-%dT%H:%M:%SZ)
+        AUTH=(-H "Authorization: Bearer $DD_PAT")
+        for i in $(seq 1 30); do
+          host=$(curl -fsS --max-time 10 "${AUTH[@]}" "$URL/api/agents" 2>/dev/null \
+            | jq -r --arg since "$started_at" --arg vm "$vm" '
+                [.[] | select(.vm_name==$vm and .status=="healthy" and .last_seen > $since)]
+                | sort_by(.last_seen) | reverse | .[0].hostname // empty' 2>/dev/null || true)
+          if [ -n "$host" ] && [ "$host" != "null" ]; then
+            echo "$vm registered at https://$host"
+            exit 0
+          fi
+          echo "  waiting for $vm to register with $URL... (${i}/30)"
+          sleep 10
+        done
+        echo "::error::$vm never registered with $URL within 5 min"
+        exit 1
diff --git a/.github/workflows/cleanup.yml b/.github/workflows/cleanup.yml
index 989b304..923ee87 100644
--- a/.github/workflows/cleanup.yml
+++ b/.github/workflows/cleanup.yml
@@ -1,29 +1,34 @@
 name: Cleanup
 
-# Reap TERMINATED dd-{env}-* VMs. STONITH self-poweroff leaves the VM
-# in TERMINATED state — it uses no compute but clutters the inventory
-# and a long enough chain of deploys turns into pages of dead VMs.
+# Background safety net that reaps GCE VMs the primary cleanup paths
+# missed. Primary paths today:
 #
-# Two jobs run in parallel, one per environment, so a regression in
-# either auth/zone/project doesn't block the other. The cleanup is
-# idempotent: skip if nothing to reap.
+#   - STONITH:  dd-register deletes the old VM's CF tunnel on startup →
+#               old cloudflared exits → old dd-register poweroffs →
+#               TERMINATED. Happens on every deploy of the same env.
+#   - Teardown: pr-teardown.yml fires on branch-delete and deletes the
+#               VM + tunnel + DNS. Happens when a dev deletes the branch.
+#
+# Gaps this workflow covers:
+#   - TERMINATED VMs accumulate between STONITH and branch-delete.
+#   - A PR that's merged/closed but whose branch survives → the preview
+#     VM stays RUNNING forever, burning compute. reap-merged-pr-previews
+#     finds these and treats them like a branch-delete (VM + tunnel + DNS).
 #
 # Triggers:
 #   - workflow_dispatch (operator-initiated cleanup)
-#   - workflow_run completion of Release / Production Deploy (catch
-#     post-deploy zombies opportunistically)
+#   - workflow_run completion of Release (catch post-deploy zombies
+#     opportunistically; Release covers both preview and prod now)
 #   - schedule, every 6 hours (background safety net)
 
 on:
   workflow_dispatch:
   workflow_run:
-    workflows: ["Release", "Production Deploy"]
+    workflows: ["Release"]
     types: [completed]
   schedule:
     - cron: '0 */6 * * *'
 
-# Don't pile up identical reaps when several deploys land in quick
-# succession — one in-flight reap is enough.
 concurrency:
   group: dd-cleanup
   cancel-in-progress: false
@@ -31,11 +36,14 @@ concurrency:
 permissions:
   contents: read
 
+env:
+  GCP_ZONE: us-central1-c
+
 jobs:
-  # PR preview envs (dd_env=pr-*) accumulate during active PRs — every
-  # push STONITHs the old VM into TERMINATED. PR close runs
-  # pr-teardown.yml which deletes them, but between pushes they stack
-  # up. This job reaps them in place.
+  # PR preview envs (dd_env=pr-*) accumulate TERMINATED VMs during
+  # active PRs — every push STONITHs the old VM. Branch-delete reaps
+  # the matching VMs; between pushes they stack up. This reaps them
+  # in place.
   reap-pr-previews:
     runs-on: ubuntu-latest
     environment: staging
@@ -51,11 +59,7 @@ jobs:
       - name: Reap TERMINATED dd-pr-* VMs
         env:
           GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
-          GCP_ZONE: us-central1-c
         run: |
-          # gcloud filter regex: `~` matches against the value. Anchor
-          # to start so we don't accidentally match an env like
-          # "foo-pr-bar" in the future.
           DEAD=$(gcloud compute instances list \
             --project="$GCP_PROJECT_ID" \
             --filter='labels.devopsdefender=managed AND labels.dd_env~"^pr-" AND status=TERMINATED' \
@@ -69,29 +73,28 @@ jobs:
           gcloud compute instances delete $DEAD \
             --project="$GCP_PROJECT_ID" --zone="$GCP_ZONE" --quiet
 
-  reap-staging:
+  reap-production:
     runs-on: ubuntu-latest
-    environment: staging
+    environment: production
     permissions:
       contents: read
       id-token: write
     steps:
       - uses: google-github-actions/auth@v2
         with:
-          workload_identity_provider: 'projects/654815109728/locations/global/workloadIdentityPools/github-actions-pool/providers/github-provider'
-          service_account: 'easyenclave-staging-ci@eestaging.iam.gserviceaccount.com'
+          workload_identity_provider: 'projects/779946350556/locations/global/workloadIdentityPools/github-actions-pool/providers/github-provider'
+          service_account: 'easyenclave-production-ci@easyenclave.iam.gserviceaccount.com'
       - uses: google-github-actions/setup-gcloud@v2
-      - name: Reap TERMINATED dd-staging VMs
+      - name: Reap TERMINATED dd-production VMs
         env:
           GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
-          GCP_ZONE: us-central1-c
         run: |
           DEAD=$(gcloud compute instances list \
             --project="$GCP_PROJECT_ID" \
-            --filter="labels.devopsdefender=managed AND labels.dd_env=staging AND status=TERMINATED" \
+            --filter="labels.devopsdefender=managed AND labels.dd_env=production AND status=TERMINATED" \
             --format="value(name)")
           if [ -z "$DEAD" ]; then
-            echo "No TERMINATED dd-staging VMs to reap."
+            echo "No TERMINATED dd-production VMs to reap."
             exit 0
           fi
           echo "Reaping: $(echo "$DEAD" | tr '\n' ' ')"
@@ -99,32 +102,97 @@ jobs:
           gcloud compute instances delete $DEAD \
             --project="$GCP_PROJECT_ID" --zone="$GCP_ZONE" --quiet
 
-  reap-production:
+  # RUNNING pr-N VMs whose PR is merged or closed are leaked compute —
+  # neither STONITH (waits for a new deploy) nor pr-teardown.yml (waits
+  # for branch-delete) reaches them. This finds them and tears them
+  # down like a branch-delete would have: VM + CF tunnel + DNS CNAME.
+  reap-merged-pr-previews:
     runs-on: ubuntu-latest
-    environment: production
+    environment: staging
     permissions:
       contents: read
       id-token: write
+      pull-requests: read
     steps:
       - uses: google-github-actions/auth@v2
         with:
-          workload_identity_provider: 'projects/779946350556/locations/global/workloadIdentityPools/github-actions-pool/providers/github-provider'
-          service_account: 'easyenclave-production-ci@easyenclave.iam.gserviceaccount.com'
+          workload_identity_provider: 'projects/654815109728/locations/global/workloadIdentityPools/github-actions-pool/providers/github-provider'
+          service_account: 'easyenclave-staging-ci@eestaging.iam.gserviceaccount.com'
       - uses: google-github-actions/setup-gcloud@v2
-      - name: Reap TERMINATED dd-production VMs
+      - name: Reap RUNNING pr-N VMs whose PR is closed or merged
         env:
           GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
-          GCP_ZONE: us-central1-c
+          CF_API_TOKEN:   ${{ secrets.DD_CP_CF_API_TOKEN }}
+          CF_ACCOUNT_ID:  ${{ secrets.DD_CP_CF_ACCOUNT_ID }}
+          CF_ZONE_ID:     ${{ secrets.DD_CP_CF_ZONE_ID }}
+          DD_DOMAIN:      ${{ vars.DD_CF_DOMAIN || 'devopsdefender.com' }}
+          GH_TOKEN:       ${{ github.token }}
         run: |
-          DEAD=$(gcloud compute instances list \
+          # Unique set of pr-N envs currently RUNNING.
+          envs=$(gcloud compute instances list \
             --project="$GCP_PROJECT_ID" \
-            --filter="labels.devopsdefender=managed AND labels.dd_env=production AND status=TERMINATED" \
-            --format="value(name)")
-          if [ -z "$DEAD" ]; then
-            echo "No TERMINATED dd-production VMs to reap."
+            --filter='labels.devopsdefender=managed AND labels.dd_env~"^pr-" AND status=RUNNING' \
+            --format='value(labels.dd_env)' | sort -u)
+          if [ -z "$envs" ]; then
+            echo "No RUNNING dd-pr-* VMs to consider."
             exit 0
           fi
-          echo "Reaping: $(echo "$DEAD" | tr '\n' ' ')"
-          # shellcheck disable=SC2086
-          gcloud compute instances delete $DEAD \
-            --project="$GCP_PROJECT_ID" --zone="$GCP_ZONE" --quiet
+
+          for env in $envs; do
+            pr="${env#pr-}"
+            state=$(gh pr view "$pr" --repo "${{ github.repository }}" \
+              --json state --jq .state 2>/dev/null || echo "UNKNOWN")
+            if [ "$state" = "OPEN" ]; then
+              echo "pr-$pr still OPEN — leaving RUNNING VMs alone."
+              continue
+            fi
+            if [ "$state" = "UNKNOWN" ]; then
+              echo "::warning::could not resolve state for pr-$pr (gh pr view failed); leaving alone"
+              continue
+            fi
+            echo "pr-$pr is $state — tearing down preview env $env"
+
+            # VMs
+            vms=$(gcloud compute instances list \
+              --project="$GCP_PROJECT_ID" \
+              --filter="labels.devopsdefender=managed AND labels.dd_env=$env" \
+              --format='value(name)')
+            if [ -n "$vms" ]; then
+              echo "  deleting VMs: $(echo "$vms" | tr '\n' ' ')"
+              # shellcheck disable=SC2086
+              gcloud compute instances delete $vms \
+                --project="$GCP_PROJECT_ID" --zone="$GCP_ZONE" --quiet
+            fi
+
+            # CF tunnels — named `dd-{env}-{uuid}`.
+            resp=$(curl -fsS \
+              -H "Authorization: Bearer $CF_API_TOKEN" \
+              "https://api.cloudflare.com/client/v4/accounts/$CF_ACCOUNT_ID/cfd_tunnel?is_deleted=false&per_page=200")
+            ids=$(echo "$resp" | jq -r --arg prefix "dd-$env-" \
+              '.result[] | select(.name | startswith($prefix)) | .id')
+            for id in $ids; do
+              echo "  deleting tunnel $id"
+              curl -fsS -X DELETE \
+                -H "Authorization: Bearer $CF_API_TOKEN" \
+                "https://api.cloudflare.com/client/v4/accounts/$CF_ACCOUNT_ID/cfd_tunnel/$id/connections" \
+                >/dev/null || true
+              curl -fsS -X DELETE \
+                -H "Authorization: Bearer $CF_API_TOKEN" \
+                "https://api.cloudflare.com/client/v4/accounts/$CF_ACCOUNT_ID/cfd_tunnel/$id" \
+                >/dev/null || echo "::warning::tunnel $id delete failed (may already be gone)"
+            done
+
+            # DNS CNAME for pr-N.{domain}
+            hostname="$env.$DD_DOMAIN"
+            record_id=$(curl -fsS \
+              -H "Authorization: Bearer $CF_API_TOKEN" \
+              "https://api.cloudflare.com/client/v4/zones/$CF_ZONE_ID/dns_records?type=CNAME&name=$hostname" \
+              | jq -r '.result[0].id // empty')
+            if [ -n "$record_id" ]; then
+              echo "  deleting CNAME $hostname ($record_id)"
+              curl -fsS -X DELETE \
+                -H "Authorization: Bearer $CF_API_TOKEN" \
+                "https://api.cloudflare.com/client/v4/zones/$CF_ZONE_ID/dns_records/$record_id" \
+                >/dev/null
+            fi
+          done
diff --git a/.github/workflows/deploy-cp.yml b/.github/workflows/deploy-cp.yml
new file mode 100644
index 0000000..d21265c
--- /dev/null
+++ b/.github/workflows/deploy-cp.yml
@@ -0,0 +1,362 @@
+name: Deploy CP
+
+# Reusable workflow: provision the CP TDX VM on GCP, wait for it to be
+# healthy, verify attestation + dashboard + STONITH, then cascade a
+# relaunch of the matching dd-local agent VM and block until it
+# re-registers. Called from release.yml's deploy-preview (PR path) and
+# deploy-production (main / dispatch path) with env-specific inputs —
+# both paths share this exact set of verification steps so every PR
+# exercises the prod deploy code.
+#
+# GitHub Actions allows ≤4 levels of workflow_call nesting. Today's
+# chain is `release.yml → deploy-cp.yml` (2). The agent-relaunch
+# cascade uses a composite action (same-job, no nesting) to keep
+# headroom for future wrapping.
+
+on:
+  workflow_call:
+    inputs:
+      env:
+        description: 'DD_ENV (e.g. "production", "pr-42")'
+        required: true
+        type: string
+      hostname:
+        description: 'Public hostname (e.g. app.devopsdefender.com)'
+        required: true
+        type: string
+      gcp_environment:
+        description: 'GitHub environment name — "production" | "staging"'
+        required: true
+        type: string
+      workload_identity_provider:
+        description: 'GCP Workload Identity Federation provider resource name'
+        required: true
+        type: string
+      service_account:
+        description: 'GCP service account email'
+        required: true
+        type: string
+      release_tag:
+        description: 'devopsdefender release tag to deploy (e.g. "latest", "pr-abc123")'
+        required: true
+        type: string
+      oauth_enabled:
+        description: 'Enable GitHub OAuth (prod only; previews use PAT)'
+        required: false
+        type: boolean
+        default: false
+      comment_on_pr:
+        description: 'Leave a PR comment with the preview URL'
+        required: false
+        type: boolean
+        default: false
+      relaunch_agent:
+        description: 'After CP deploy, cascade a relaunch of dd-local-{kind} via SSH'
+        required: false
+        type: boolean
+        default: true
+      ref:
+        description: 'Git ref the tdx2 host should pull before relaunching the agent VM'
+        required: false
+        type: string
+        default: main
+
+concurrency:
+  group: deploy-cp-${{ inputs.env }}
+  cancel-in-progress: false
+
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    environment: ${{ inputs.gcp_environment }}
+    permissions:
+      contents: read
+      id-token: write
+      pull-requests: write
+    env:
+      DD_ENV: ${{ inputs.env }}
+      DD_HOSTNAME: ${{ inputs.hostname }}
+      GCP_ZONE: us-central1-c
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: google-github-actions/auth@v2
+        with:
+          workload_identity_provider: ${{ inputs.workload_identity_provider }}
+          service_account: ${{ inputs.service_account }}
+      - uses: google-github-actions/setup-gcloud@v2
+
+      - name: Create TDX VM (boots from easyenclave, fetches dd from GitHub releases)
+        env:
+          GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
+          DD_DOMAIN: ${{ vars.DD_CF_DOMAIN || 'devopsdefender.com' }}
+          CLOUDFLARE_API_TOKEN: ${{ secrets.DD_CP_CF_API_TOKEN }}
+          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.DD_CP_CF_ACCOUNT_ID }}
+          CLOUDFLARE_ZONE_ID: ${{ secrets.DD_CP_CF_ZONE_ID }}
+          # OAuth is enabled only when oauth_enabled is true (production).
+          # Otherwise these resolve to empty strings, and bake() strips
+          # empty "KEY=" entries from the workload spec, so dd-web disables
+          # /auth/github/* and serves only /auth/pat in non-OAuth envs.
+          DD_GITHUB_CLIENT_ID: ${{ inputs.oauth_enabled && (vars.DD_GITHUB_CLIENT_ID || secrets.DD_GITHUB_CLIENT_ID) || '' }}
+          DD_GITHUB_CALLBACK_URL: ${{ inputs.oauth_enabled && vars.DD_GITHUB_CALLBACK_URL || '' }}
+          DD_GITHUB_CLIENT_SECRET: ${{ inputs.oauth_enabled && secrets.DD_GITHUB_CLIENT_SECRET || '' }}
+          DD_ITA_API_KEY: ${{ secrets.DD_ITA_API_KEY }}
+          DD_RELEASE_TAG: ${{ inputs.release_tag }}
+          EE_IMAGE_FAMILY: easyenclave-staging
+          EE_IMAGE_PROJECT: easyenclave
+          VM_MACHINE_TYPE: c3-standard-4
+          VM_DISK_SIZE: 10GB
+          DD_ITA_BASE_URL: https://api.trustauthority.intel.com
+          DD_ITA_JWKS_URL: https://portal.trustauthority.intel.com/certs
+          DD_ITA_ISSUER: https://portal.trustauthority.intel.com
+        run: |
+          set -euo pipefail
+
+          VM_NAME="dd-${DD_ENV}-$(date +%s)"
+          : "${DD_ITA_API_KEY:?set DD_ITA_API_KEY via secrets.DD_ITA_API_KEY}"
+          export DD_GITHUB_CALLBACK_URL="${DD_GITHUB_CALLBACK_URL:-https://${DD_HOSTNAME}/auth/github/callback}"
+
+          # Bake a workload template: envsubst ${VAR} placeholders and
+          # strip any "KEY=" env entries that ended up with empty values
+          # (e.g. OAuth creds in non-prod envs).
+          bake() {
+            case "$1" in
+              *.json.tmpl)
+                envsubst < "$1" \
+                  | jq -c 'if .env then .env |= map(select(test("^[^=]+=.+"))) else . end'
+                ;;
+              *.json)
+                jq -c . "$1"
+                ;;
+              *)
+                echo "::error::unknown workload file type: $1" >&2
+                return 1
+                ;;
+            esac
+          }
+
+          # Boot workloads come from apps/<name>/workload.{json,json.tmpl}.
+          # cloudflared fetches the binary onto PATH; dd-management runs
+          # devopsdefender in DD_MODE=management (CP + dashboard).
+          EE_BOOT_WORKLOADS=$({
+            bake apps/cloudflared/workload.json
+            bake apps/dd-management/workload.json.tmpl
+          } | jq -cs '.')
+
+          jq -c -n \
+            --arg workloads "$EE_BOOT_WORKLOADS" \
+            '{ "EE_BOOT_WORKLOADS": $workloads, "EE_OWNER": "devopsdefender" }' \
+            > /tmp/ee-config.json
+
+          gcloud compute instances create "$VM_NAME" \
+            --project="$GCP_PROJECT_ID" \
+            --zone="$GCP_ZONE" \
+            --machine-type="$VM_MACHINE_TYPE" \
+            --confidential-compute-type=TDX \
+            --maintenance-policy=TERMINATE \
+            --boot-disk-size="$VM_DISK_SIZE" \
+            --image-family="$EE_IMAGE_FAMILY" \
+            --image-project="$EE_IMAGE_PROJECT" \
+            --metadata-from-file=ee-config=/tmp/ee-config.json \
+            --labels=devopsdefender=managed,dd_env="${DD_ENV}" \
+            --tags=dd-management
+
+          rm -f /tmp/ee-config.json
+          echo "VM: $VM_NAME ($DD_HOSTNAME, release $DD_RELEASE_TAG)"
+
+      - name: Wait for agent health (streams serial console)
+        env:
+          AGENT_URL: https://${{ inputs.hostname }}
+          GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
+        run: |
+          VM_NAME=$(gcloud compute instances list \
+            --project="$GCP_PROJECT_ID" \
+            --filter="labels.devopsdefender=managed AND labels.dd_env=${DD_ENV}" \
+            --format="value(name)" --sort-by=~creationTimestamp | head -1)
+          if [ -z "$VM_NAME" ]; then
+            echo "::error::no dd-${DD_ENV} VM found; the Create TDX VM step must have failed"
+            exit 1
+          fi
+          echo "Watching VM: $VM_NAME (zone: $GCP_ZONE)"
+
+          LAST_LINES=0
+          for i in $(seq 1 60); do
+            # Stream serial console so boot failures (DHCP hang, release
+            # fetch error, cloudflared exit, etc.) are visible without
+            # shelling into GCP.
+            gcloud compute instances get-serial-port-output "$VM_NAME" \
+              --project="$GCP_PROJECT_ID" --zone="$GCP_ZONE" 2>/dev/null \
+              > /tmp/serial.log || true
+            TOTAL_LINES=$(wc -l < /tmp/serial.log)
+            if [ "$TOTAL_LINES" -gt "$LAST_LINES" ]; then
+              tail -n +$((LAST_LINES + 1)) /tmp/serial.log \
+                | sed 's/^/[serial] /'
+              LAST_LINES=$TOTAL_LINES
+            fi
+
+            if grep -qE "FATAL|Kernel panic|Invalid ELF header|/bin/sh: can't access tty" /tmp/serial.log; then
+              echo "::error::boot failed — serial log shows fatal pattern"
+              exit 1
+            fi
+
+            if curl -fsS "${AGENT_URL}/health" >/dev/null 2>&1; then
+              echo "Agent healthy at ${AGENT_URL}"
+              exit 0
+            fi
+            echo "  waiting for tunnel... (${i}/60)"
+            sleep 5
+          done
+          echo "::error::Agent not healthy within 5 minutes"
+          echo "--- final serial tail ---"
+          tail -80 /tmp/serial.log | sed 's/^/[serial] /'
+          exit 1
+
+      - name: Verify NEW VM via TDX attestation
+        env:
+          AGENT_URL: https://${{ inputs.hostname }}
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          # /cp/attest proves the freshly-deployed VM is serving the tunnel
+          # (stale tunnels point at old VMs that 404 on this endpoint).
+          # MRTD = 48 bytes at offset 184 in TDX quote v4; if non-zero,
+          # attestation actually worked.
+          NONCE=$(openssl rand -base64 16)
+          for attempt in $(seq 1 60); do
+            BODY=$(curl -sG -w '\n%{http_code}' \
+              -H "Authorization: Bearer ${GITHUB_TOKEN}" \
+              --data-urlencode "nonce=${NONCE}" \
+              "${AGENT_URL}/cp/attest" || echo $'\n000')
+            CODE=$(echo "$BODY" | tail -n1)
+            JSON=$(echo "$BODY" | sed '$d')
+            if [ "$CODE" = "200" ]; then
+              QUOTE_B64=$(echo "$JSON" | jq -r '.quote_b64 // empty')
+              if [ -n "$QUOTE_B64" ] && [ "$QUOTE_B64" != "null" ]; then
+                MRTD=$(echo "$QUOTE_B64" | base64 -d \
+                  | dd bs=1 skip=184 count=48 status=none | xxd -p -c 48)
+                if [ -n "$MRTD" ] && [ "$MRTD" != "$(printf '00%.0s' {1..48})" ]; then
+                  echo "NEW VM verified — MRTD: $MRTD"
+                  exit 0
+                fi
+                echo "  /cp/attest 200 but MRTD empty/zero, retrying... (${attempt}/60)"
+              else
+                echo "  /cp/attest 200 but no quote_b64, retrying... (${attempt}/60)"
+              fi
+            else
+              echo "  /cp/attest returned HTTP ${CODE}, retrying... (${attempt}/60)"
+            fi
+            sleep 10
+          done
+          echo "::error::/cp/attest never returned a valid quote — stale tunnel or new VM never came up"
+          exit 1
+
+      - name: Verify dashboard renders
+        env:
+          AGENT_URL: https://${{ inputs.hostname }}
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          # Fast sanity check on top of /cp/attest — proves dd-web is up
+          # and accepts the CI PAT's Bearer auth.
+          for attempt in $(seq 1 12); do
+            code=$(curl -s -o /dev/null -w '%{http_code}' \
+              -H "Authorization: Bearer ${GITHUB_TOKEN}" \
+              "${AGENT_URL}/" || echo 000)
+            if [ "$code" = "200" ]; then
+              echo "Dashboard renders (HTTP 200, attempt ${attempt})"
+              exit 0
+            fi
+            echo "  dashboard returned HTTP ${code}, retrying... (${attempt}/12)"
+            sleep 5
+          done
+          echo "::error::dashboard / never returned 200 (last HTTP ${code})"
+          exit 1
+
+      - name: Verify STONITH halted prior VM(s) in this env
+        env:
+          GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
+        run: |
+          # dd-register STONITHs the old VM on startup by deleting its
+          # CF tunnel → old cloudflared exits → old dd-register powers off.
+          # Scoped to this env: per-PR previews are hostname-isolated, so
+          # only prior deploys of the same env are reaped (120 s budget).
+          NEW_VM=$(gcloud compute instances list \
+            --project="$GCP_PROJECT_ID" \
+            --filter="labels.devopsdefender=managed AND labels.dd_env=${DD_ENV}" \
+            --format="value(name)" --sort-by=~creationTimestamp | head -1)
+          echo "new VM: $NEW_VM"
+          SURVIVORS=""
+          for i in $(seq 1 24); do
+            SURVIVORS=$(gcloud compute instances list \
+              --project="$GCP_PROJECT_ID" \
+              --filter="labels.devopsdefender=managed AND labels.dd_env=${DD_ENV} AND status=RUNNING" \
+              --format="value(name)" \
+              | grep -vx "$NEW_VM" || true)
+            if [ -z "$SURVIVORS" ]; then
+              echo "STONITH verified — only $NEW_VM running in ${DD_ENV}"
+              exit 0
+            fi
+            echo "  still running besides $NEW_VM: $(echo "$SURVIVORS" | tr '\n' ' ')"
+            echo "  waiting for STONITH poweroff... (${i}/24)"
+            sleep 5
+          done
+          echo "::warning::STONITH-by-tunnel-delete timed out; force-deleting zombies:"
+          echo "$SURVIVORS"
+          # shellcheck disable=SC2086
+          gcloud compute instances delete $SURVIVORS \
+            --project="$GCP_PROJECT_ID" --zone="$GCP_ZONE" --quiet || true
+          echo "force-delete issued; $NEW_VM should now be the only ${DD_ENV} VM"
+
+      - name: Comment preview URL on PR
+        if: inputs.comment_on_pr && github.event_name == 'pull_request'
+        uses: actions/github-script@v7
+        with:
+          script: |
+            const url = `https://${{ inputs.hostname }}`;
+            const body = [
+              `### DD preview ready`,
+              ``,
+              `**URL:** ${url}`,
+              ``,
+              `Browser login: paste \`gh auth token\` output at ${url}/auth/pat`,
+              ``,
+              `CLI / curl: \`curl -H "Authorization: Bearer $(gh auth token)" ${url}/\``,
+              ``,
+              `Register endpoint for a local agent: \`wss://${{ inputs.hostname }}/register\``,
+            ].join('\n');
+            const { data: comments } = await github.rest.issues.listComments({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              issue_number: context.issue.number,
+            });
+            const marker = '### DD preview ready';
+            const existing = comments.find(c => c.user.type === 'Bot' && c.body && c.body.includes(marker));
+            if (existing) {
+              await github.rest.issues.updateComment({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                comment_id: existing.id,
+                body,
+              });
+            } else {
+              await github.rest.issues.createComment({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                issue_number: context.issue.number,
+                body,
+              });
+            }
+
+      # Cascade a relaunch of the matching dd-local-{env} libvirt domain
+      # on the tdx2 host, then block on it registering with the freshly-
+      # deployed CP. This is the gate: a release is "done" only when the
+      # local agent is back online talking to the new CP.
+      - name: Relaunch dd-local-${{ inputs.env == 'production' && 'prod' || 'preview' }}
+        if: inputs.relaunch_agent
+        uses: ./.github/actions/relaunch-agent
+        with:
+          kind: ${{ inputs.env == 'production' && 'prod' || 'preview' }}
+          url: https://${{ inputs.hostname }}
+          ref: ${{ inputs.ref }}
+          ssh-key: ${{ secrets.DD_LOCAL_SSH_KEY }}
+          host: ${{ secrets.DD_LOCAL_HOST }}
+          dd-pat: ${{ secrets.GITHUB_TOKEN }}
+          ita-api-key: ${{ secrets.DD_ITA_API_KEY }}
diff --git a/.github/workflows/local-agents.yml b/.github/workflows/local-agents.yml
deleted file mode 100644
index 345dbc3..0000000
--- a/.github/workflows/local-agents.yml
+++ /dev/null
@@ -1,111 +0,0 @@
-name: Local Agents
-
-# Relaunches the local TDX agent VM on this user's host whenever the
-# corresponding CP gets new code:
-#   - Production Deploy success → reboot dd-local-prod against app.devopsdefender.com
-#   - Release success on a PR    → reboot dd-local-preview against pr-N.devopsdefender.com
-#
-# SSHs in via key auth to a public-IP host, then invokes
-# scripts/dd-relaunch.sh which handles the destroy/recreate cycle.
-
-on:
-  workflow_run:
-    workflows: ["Release", "Production Deploy"]
-    types: [completed]
-  # Every non-README push to main also fires a prod relaunch directly,
-  # so fixes to the relaunch / deploy scripts get exercised even when
-  # they don't cascade through Release → Production Deploy.
-  push:
-    branches: [main]
-    paths-ignore:
-      - "README.md"
-  workflow_dispatch:
-    inputs:
-      kind:
-        description: 'prod | preview'
-        required: true
-        default: 'prod'
-      cp_url:
-        description: 'CP URL (e.g. https://app.devopsdefender.com)'
-        required: true
-        default: 'https://app.devopsdefender.com'
-
-permissions:
-  contents: read
-  pull-requests: read
-
-concurrency:
-  group: local-agents-${{ github.event.workflow_run.name || github.event.inputs.kind }}
-  cancel-in-progress: false
-
-jobs:
-  relaunch:
-    if: |
-      github.event_name == 'workflow_dispatch'
-      || github.event_name == 'push'
-      || github.event.workflow_run.conclusion == 'success'
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v4
-      - id: pick
-        env:
-          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-          EVENT: ${{ github.event_name }}
-          WF: ${{ github.event.workflow_run.name }}
-          BRANCH: ${{ github.event.workflow_run.head_branch }}
-          DISPATCH_KIND: ${{ github.event.inputs.kind }}
-          DISPATCH_URL: ${{ github.event.inputs.cp_url }}
-        run: |
-          if [ "$EVENT" = "workflow_dispatch" ]; then
-            echo "kind=$DISPATCH_KIND" >> "$GITHUB_OUTPUT"
-            echo "url=$DISPATCH_URL"   >> "$GITHUB_OUTPUT"
-          elif [ "$EVENT" = "push" ] || [ "$WF" = "Production Deploy" ]; then
-            # push-to-main on local-agent scripts, or a prod CP redeploy
-            # → relaunch dd-local-prod against the live prod CP.
-            echo "kind=prod" >> "$GITHUB_OUTPUT"
-            echo "url=https://app.devopsdefender.com" >> "$GITHUB_OUTPUT"
-          else
-            # Release on a PR: derive pr-N. Released-on-main returns
-            # no open PR → skip (Production Deploy will fire shortly).
-            pr=$(gh pr list --head "$BRANCH" --state open \
-                   --repo "${{ github.repository }}" \
-                   --json number --jq '.[0].number' 2>/dev/null || true)
-            if [ -n "$pr" ]; then
-              echo "kind=preview" >> "$GITHUB_OUTPUT"
-              echo "url=https://pr-$pr.devopsdefender.com" >> "$GITHUB_OUTPUT"
-            else
-              echo "kind=skip" >> "$GITHUB_OUTPUT"
-            fi
-          fi
-
-      # Step 1: SSH in and relaunch the VM (destroy + redefine + start).
-      # Finishes in ~10 s — doesn't need keepalives. Only does the
-      # libvirt operations that require host-level access.
-      - name: ssh + relaunch VM
-        if: steps.pick.outputs.kind != 'skip'
-        env:
-          SSH_KEY:        ${{ secrets.DD_LOCAL_SSH_KEY }}
-          HOST:           ${{ secrets.DD_LOCAL_HOST }}
-          DD_PAT:         ${{ secrets.GITHUB_TOKEN }}
-          DD_ITA_API_KEY: ${{ secrets.DD_ITA_API_KEY }}
-          KIND:           ${{ steps.pick.outputs.kind }}
-          URL:            ${{ steps.pick.outputs.url }}
-        run: |
-          mkdir -p ~/.ssh
-          printf '%s\n' "$SSH_KEY" > ~/.ssh/id_ed25519
-          chmod 600 ~/.ssh/id_ed25519
-          ssh-keyscan -H "$HOST" >> ~/.ssh/known_hosts 2>/dev/null
-          ssh -o BatchMode=yes -o StrictHostKeyChecking=yes \
-              -i ~/.ssh/id_ed25519 "tdx2@$HOST" \
-              "DD_PAT='$DD_PAT' DD_ITA_API_KEY='$DD_ITA_API_KEY' /home/tdx2/src/dd/scripts/dd-relaunch.sh '$KIND' '$URL'"
-
-      # Step 2: Deploy ollama / pull model / sample query. Pure HTTPS
-      # against the CP + the newly-registered agent's tunnel. Can take
-      # minutes (model pull) — no SSH to keep alive.
-      - name: deploy ollama (HTTPS)
-        if: steps.pick.outputs.kind != 'skip'
-        env:
-          DD_PAT: ${{ secrets.GITHUB_TOKEN }}
-          KIND:   ${{ steps.pick.outputs.kind }}
-          URL:    ${{ steps.pick.outputs.url }}
-        run: ./scripts/ollama-deploy.sh "$KIND" "$URL"
diff --git a/.github/workflows/production-deploy.yml b/.github/workflows/production-deploy.yml
deleted file mode 100644
index 8253179..0000000
--- a/.github/workflows/production-deploy.yml
+++ /dev/null
@@ -1,146 +0,0 @@
-name: Production Deploy
-
-# Two triggers:
-#   - workflow_run: fires automatically after a successful Release run
-#     on main. Release publishes the `latest` tag, then this workflow
-#     deploys it to production. Sequential by design — if Release fails,
-#     we don't promote.
-#   - workflow_dispatch: manual re-deploy of any existing tag (e.g. a
-#     known-good v0.2.0 after a bad main push).
-
-on:
-  workflow_run:
-    workflows: ["Release"]
-    types: [completed]
-    branches: [main]
-  workflow_dispatch:
-    inputs:
-      release_tag:
-        description: 'Release tag to deploy (e.g. latest, v0.2.0)'
-        required: false
-        default: 'latest'
-
-concurrency:
-  group: dd-production
-  cancel-in-progress: false
-
-env:
-  GCP_ZONE: us-central1-c
-  DD_ENV: production
-  DD_DOMAIN: ${{ vars.DD_CF_DOMAIN || 'devopsdefender.com' }}
-
-permissions:
-  contents: read
-
-jobs:
-  # dd-register STONITHs the old VM on startup by deleting its CF
-  # tunnel, so no explicit teardown here.
-  deploy:
-    # workflow_run fires on every Release completion, including
-    # failures. Only promote on success.
-    if: github.event_name == 'workflow_dispatch' || github.event.workflow_run.conclusion == 'success'
-    runs-on: ubuntu-latest
-    environment: production
-    permissions:
-      contents: read
-      id-token: write
-    steps:
-      - uses: actions/checkout@v4
-      - uses: google-github-actions/auth@v2
-        with:
-          workload_identity_provider: 'projects/779946350556/locations/global/workloadIdentityPools/github-actions-pool/providers/github-provider'
-          service_account: 'easyenclave-production-ci@easyenclave.iam.gserviceaccount.com'
-      - uses: google-github-actions/setup-gcloud@v2
-
-      - name: Create TDX VM (boots from easyenclave, fetches dd from GitHub releases)
-        env:
-          GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
-          CLOUDFLARE_API_TOKEN: ${{ secrets.DD_CP_CF_API_TOKEN }}
-          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.DD_CP_CF_ACCOUNT_ID }}
-          CLOUDFLARE_ZONE_ID: ${{ secrets.DD_CP_CF_ZONE_ID }}
-          DD_GITHUB_CLIENT_ID: ${{ vars.DD_GITHUB_CLIENT_ID || secrets.DD_GITHUB_CLIENT_ID }}
-          DD_GITHUB_CALLBACK_URL: ${{ vars.DD_GITHUB_CALLBACK_URL }}
-          DD_GITHUB_CLIENT_SECRET: ${{ secrets.DD_GITHUB_CLIENT_SECRET }}
-          # Intel Trust Authority — optional. When the secret is set,
-          # the CP mints its own ITA token and verifies incoming agent
-          # registrations. DD_ITA_REQUIRED stays false (default).
-          DD_ITA_API_KEY: ${{ secrets.DD_ITA_API_KEY }}
-          # workflow_run has no `inputs`; fall back to `latest`, which
-          # release.yml just (re)published on push to main.
-          DD_RELEASE_TAG: ${{ inputs.release_tag || 'latest' }}
-        run: scripts/gcp-deploy.sh
-
-      - name: Wait for agent health
-        env:
-          AGENT_URL: https://app.${{ env.DD_DOMAIN }}
-        run: |
-          for i in $(seq 1 60); do
-            curl -fsS "${AGENT_URL}/health" >/dev/null 2>&1 && {
-              echo "Agent healthy at ${AGENT_URL}"
-              exit 0
-            }
-            echo "  waiting for tunnel... (${i}/60)"
-            sleep 5
-          done
-          echo "::error::Agent not healthy within 5 minutes"
-          exit 1
-
-      - name: Verify dashboard renders
-        env:
-          AGENT_URL: https://app.${{ env.DD_DOMAIN }}
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        run: |
-          # New auth model: dashboard expects a GitHub PAT/GITHUB_TOKEN with
-          # access to the dd repo; the CP verifies against DD_OWNER via the
-          # standard /user + /repos/{owner}/dd fallback. No OIDC audience wiring.
-          for attempt in $(seq 1 12); do
-            code=$(curl -s -o /dev/null -w '%{http_code}' \
-              -H "Authorization: Bearer ${GITHUB_TOKEN}" \
-              "${AGENT_URL}/" || echo 000)
-            if [ "$code" = "200" ]; then
-              echo "Dashboard renders (HTTP 200, attempt ${attempt})"
-              exit 0
-            fi
-            echo "  dashboard returned HTTP ${code}, retrying... (${attempt}/12)"
-            sleep 5
-          done
-          echo "::error::dashboard / never returned 200 (last HTTP ${code})"
-          exit 1
-
-      - name: Verify STONITH halted prior production VM(s)
-        env:
-          GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
-          GCP_ZONE: ${{ env.GCP_ZONE }}
-        run: |
-          # Mirror of release.yml's verify-step for PR previews. Give
-          # STONITH-by-tunnel-delete 120s to work on well-behaved old
-          # prod VMs (their cloudflared exits → dd-register poweroffs
-          # → GCP TERMINATED → cleanup.yml reaps). After the timeout,
-          # force-delete any remaining RUNNING prod VMs so we don't
-          # leak compute indefinitely.
-          NEW_VM=$(gcloud compute instances list \
-            --project="$GCP_PROJECT_ID" \
-            --filter="labels.devopsdefender=managed AND labels.dd_env=production" \
-            --format="value(name)" --sort-by=~creationTimestamp | head -1)
-          echo "new VM: $NEW_VM"
-          SURVIVORS=""
-          for i in $(seq 1 24); do
-            SURVIVORS=$(gcloud compute instances list \
-              --project="$GCP_PROJECT_ID" \
-              --filter="labels.devopsdefender=managed AND labels.dd_env=production AND status=RUNNING" \
-              --format="value(name)" \
-              | grep -vx "$NEW_VM" || true)
-            if [ -z "$SURVIVORS" ]; then
-              echo "STONITH verified — only $NEW_VM running in prod"
-              exit 0
-            fi
-            echo "  still running besides $NEW_VM: $(echo "$SURVIVORS" | tr '\n' ' ')"
-            echo "  waiting for STONITH poweroff... (${i}/24)"
-            sleep 5
-          done
-          echo "::warning::STONITH-by-tunnel-delete timed out in prod; force-deleting:"
-          echo "$SURVIVORS"
-          # shellcheck disable=SC2086
-          gcloud compute instances delete $SURVIVORS \
-            --project="$GCP_PROJECT_ID" --zone="$GCP_ZONE" --quiet || true
-          echo "zombies reaped; $NEW_VM is the only production VM"
diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
index b238c85..7d55bbc 100644
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -1,13 +1,18 @@
 name: Release
 
-# Build the static musl binary, publish it as a GitHub release asset,
-# and (on PRs) deploy it to an ephemeral per-PR preview. Replaces the
-# Docker build+push pipeline — easyenclave fetches the asset directly
-# via its github_release workload source.
+# One workflow to rule them all: build the static musl binary, publish
+# it as a GitHub release asset, and deploy it to either the PR preview
+# (per-PR ephemeral CP at pr-N.domain) or production (app.domain). Both
+# paths cascade into a relaunch of the matching dd-local agent VM on
+# the tdx2 host, and the Release run only goes green when that agent
+# re-registers with the freshly-deployed CP.
 #
-# PR:             pre-release tagged pr-{sha12}, then full PR-preview deploy.
-# push to main:   rolling `latest` release (no deploy — that's production)
-# push v* tag:    versioned release (no deploy)
+# Paths:
+#   pull_request        → build → deploy-preview → dd-local-preview relaunch
+#   push main           → build → deploy-production → dd-local-prod relaunch
+#   push v*             → build only (versioned release, no deploy)
+#   workflow_dispatch   → build → deploy-production (rollback tool;
+#                          release_tag input picks which tag to deploy)
 
 on:
   push:
@@ -18,10 +23,18 @@ on:
   pull_request:
     paths-ignore:
       - "README.md"
+  workflow_dispatch:
+    inputs:
+      release_tag:
+        description: 'Release tag to deploy to production (rollback tool; default: latest)'
+        required: false
+        default: 'latest'
 
 concurrency:
   group: dd-release-${{ github.ref }}
-  cancel-in-progress: true
+  # PR pushes cancel old runs. Main / tag / manual dispatch queue —
+  # we never want to cancel an in-progress prod deploy.
+  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
 
 permissions:
   contents: write
@@ -31,10 +44,6 @@ permissions:
   id-token: write
   attestations: write
 
-env:
-  DD_DOMAIN: ${{ vars.DD_CF_DOMAIN || 'devopsdefender.com' }}
-  GCP_ZONE: us-central1-c
-
 jobs:
   build:
     runs-on: ubuntu-latest
@@ -79,10 +88,7 @@ jobs:
       #  `https://github.com/devopsdefender/dd/.github/workflows/release.yml@<ref>`).
       # The attestation is stored on the repo's /attestations endpoint
       # and retrievable via `gh attestation verify` or the REST API.
-      #
-      # For now we're tracking (not enforcing) — the CP will eventually
-      # use this to verify that a registering agent's artifact came
-      # from this workflow. Skipped on fork PRs (they lack id-token).
+      # Skipped on fork PRs (they lack id-token).
       - name: Attest devopsdefender binary
         if: github.event_name != 'pull_request' || github.event.pull_request.head.repo.full_name == github.repository
         uses: actions/attest-build-provenance@v2
@@ -117,227 +123,52 @@ jobs:
             | tail -n +12 \
             | xargs -rI{} gh release delete {} --yes --cleanup-tag
 
-  # Deploy the freshly-built binary to the PR's ephemeral preview.
-  # Each PR gets its own env at pr-{N}.{domain} with DD_ENV=pr-{N}
-  # (hostname-isolated, no OAuth — browser access via /auth/pat).
-  # main/v* produce releases that production-deploy picks up separately.
+  # Per-PR ephemeral preview at pr-{N}.{domain}. No OAuth (browser login
+  # via /auth/pat). Cascades into dd-local-preview relaunch.
   deploy-preview:
     if: github.event_name == 'pull_request'
     needs: build
-    runs-on: ubuntu-latest
-    environment: staging
     permissions:
       contents: read
       id-token: write
       pull-requests: write
-    env:
-      DD_ENV: pr-${{ github.event.number }}
-      DD_HOSTNAME: pr-${{ github.event.number }}.${{ vars.DD_CF_DOMAIN || 'devopsdefender.com' }}
-    steps:
-      - uses: actions/checkout@v4
-
-      - uses: google-github-actions/auth@v2
-        with:
-          workload_identity_provider: 'projects/654815109728/locations/global/workloadIdentityPools/github-actions-pool/providers/github-provider'
-          service_account: 'easyenclave-staging-ci@eestaging.iam.gserviceaccount.com'
-      - uses: google-github-actions/setup-gcloud@v2
-
-      - name: Create TDX VM (boots from easyenclave, fetches dd from GitHub releases)
-        env:
-          GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
-          CLOUDFLARE_API_TOKEN: ${{ secrets.DD_CP_CF_API_TOKEN }}
-          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.DD_CP_CF_ACCOUNT_ID }}
-          CLOUDFLARE_ZONE_ID: ${{ secrets.DD_CP_CF_ZONE_ID }}
-          # OAuth env vars intentionally omitted — gcp-deploy.sh sees
-          # empty DD_GITHUB_CLIENT_ID and skips them in the workload
-          # spec. dd-web then disables /auth/github/* and serves
-          # /auth/pat for browser access.
-          #
-          # Intel Trust Authority — optional. When the secret is set,
-          # the CP mints its own ITA token at startup and verifies
-          # agent-supplied tokens on /register. DD_ITA_REQUIRED stays
-          # false (default) so unsigned agents still register.
-          DD_ITA_API_KEY: ${{ secrets.DD_ITA_API_KEY }}
-          DD_RELEASE_TAG: ${{ needs.build.outputs.tag }}
-        run: scripts/gcp-deploy.sh
-
-      - name: Wait for agent health (streams serial console)
-        env:
-          AGENT_URL: https://${{ env.DD_HOSTNAME }}
-          GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
-          GCP_ZONE: ${{ env.GCP_ZONE }}
-        run: |
-          VM_NAME=$(gcloud compute instances list \
-            --project="$GCP_PROJECT_ID" \
-            --filter="labels.devopsdefender=managed AND labels.dd_env=${DD_ENV}" \
-            --format="value(name)" --sort-by=~creationTimestamp | head -1)
-          if [ -z "$VM_NAME" ]; then
-            echo "::error::no dd-${DD_ENV} VM found — gcp-deploy.sh must have failed"
-            exit 1
-          fi
-          echo "Watching VM: $VM_NAME (zone: $GCP_ZONE)"
-
-          LAST_LINES=0
-          for i in $(seq 1 60); do
-            # Stream serial console so boot failures (DHCP hang, GitHub
-            # release fetch error, cloudflared exit, etc.) are visible
-            # without shelling into GCP.
-            gcloud compute instances get-serial-port-output "$VM_NAME" \
-              --project="$GCP_PROJECT_ID" --zone="$GCP_ZONE" 2>/dev/null \
-              > /tmp/serial.log || true
-            TOTAL_LINES=$(wc -l < /tmp/serial.log)
-            if [ "$TOTAL_LINES" -gt "$LAST_LINES" ]; then
-              tail -n +$((LAST_LINES + 1)) /tmp/serial.log \
-                | sed 's/^/[serial] /'
-              LAST_LINES=$TOTAL_LINES
-            fi
-
-            if grep -qE "FATAL|Kernel panic|Invalid ELF header|/bin/sh: can't access tty" /tmp/serial.log; then
-              echo "::error::boot failed — serial log shows fatal pattern"
-              exit 1
-            fi
-
-            # /health via the Cloudflare tunnel tests the full chain:
-            # VM boot → easyenclave init → github_release fetch of dd +
-            # cloudflared → cloudflared tunnel up.
-            if curl -fsS "${AGENT_URL}/health" >/dev/null 2>&1; then
-              echo "Agent healthy at ${AGENT_URL}"
-              exit 0
-            fi
-            echo "  waiting for tunnel... (${i}/60)"
-            sleep 5
-          done
-          echo "::error::Agent not healthy within 5 minutes"
-          echo "--- final serial tail ---"
-          tail -80 /tmp/serial.log | sed 's/^/[serial] /'
-          exit 1
-
-      - name: Verify NEW VM via TDX attestation
-        env:
-          AGENT_URL: https://${{ env.DD_HOSTNAME }}
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        run: |
-          # /cp/attest proves the freshly-deployed VM is serving the tunnel
-          # (stale tunnels point at old VMs that 404 on this endpoint).
-          # Auth: GITHUB_TOKEN via Bearer — the CP's /repos/{owner}/dd probe
-          # accepts any token with repo access. No OIDC audience wiring.
-          NONCE=$(openssl rand -base64 16)
-
-          # 60 × 10s = 10 min. New VM has to boot, fetch cloudflared
-          # and dd from GitHub releases, start, and bring its tunnel up.
-          for attempt in $(seq 1 60); do
-            BODY=$(curl -sG -w '\n%{http_code}' \
-              -H "Authorization: Bearer ${GITHUB_TOKEN}" \
-              --data-urlencode "nonce=${NONCE}" \
-              "${AGENT_URL}/cp/attest" || echo $'\n000')
-            CODE=$(echo "$BODY" | tail -n1)
-            JSON=$(echo "$BODY" | sed '$d')
-            if [ "$CODE" = "200" ]; then
-              QUOTE_B64=$(echo "$JSON" | jq -r '.quote_b64 // empty')
-              if [ -n "$QUOTE_B64" ] && [ "$QUOTE_B64" != "null" ]; then
-                # MRTD = 48 bytes at offset 184 in TDX quote v4.
-                # If it's non-zero, attestation actually worked.
-                MRTD=$(echo "$QUOTE_B64" | base64 -d \
-                  | dd bs=1 skip=184 count=48 status=none | xxd -p -c 48)
-                if [ -n "$MRTD" ] && [ "$MRTD" != "$(printf '00%.0s' {1..48})" ]; then
-                  echo "NEW VM verified — MRTD: $MRTD"
-                  exit 0
-                fi
-                echo "  /cp/attest 200 but MRTD empty/zero, retrying... (${attempt}/60)"
-              else
-                echo "  /cp/attest 200 but no quote_b64, retrying... (${attempt}/60)"
-              fi
-            else
-              echo "  /cp/attest returned HTTP ${CODE}, retrying... (${attempt}/60)"
-            fi
-            sleep 10
-          done
-          echo "::error::/cp/attest never returned a valid quote — stale tunnel or new VM never came up"
-          exit 1
-
-      - name: Verify STONITH halted prior VM(s) in this env
-        env:
-          GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
-          GCP_ZONE: ${{ env.GCP_ZONE }}
-        run: |
-          # STONITH (dd-register deletes the old tunnel → old cloudflared
-          # exits → old dd-register poweroffs the VM) is the ONLY cleanup
-          # mechanism. Scoped to this PR's env — previews are
-          # hostname-isolated from each other, so this only reaps prior
-          # deploys of the same PR (re-pushes).
-          NEW_VM=$(gcloud compute instances list \
-            --project="$GCP_PROJECT_ID" \
-            --filter="labels.devopsdefender=managed AND labels.dd_env=${DD_ENV}" \
-            --format="value(name)" --sort-by=~creationTimestamp | head -1)
-          echo "new VM: $NEW_VM"
-
-          # Give STONITH-by-tunnel-delete 120s to work on well-behaved
-          # old VMs (their cloudflared exits → dd-register poweroffs).
-          # After that, force-delete any remaining survivors: they're
-          # zombies whose dd-register failed before creating a tunnel
-          # (e.g. CF auth error at boot — see src/cp.rs
-          # which now kernel_poweroff's on init failure, so this is a
-          # safety net for pre-fix zombies and any future init failure
-          # modes we haven't handled).
-          SURVIVORS=""
-          for i in $(seq 1 24); do
-            SURVIVORS=$(gcloud compute instances list \
-              --project="$GCP_PROJECT_ID" \
-              --filter="labels.devopsdefender=managed AND labels.dd_env=${DD_ENV} AND status=RUNNING" \
-              --format="value(name)" \
-              | grep -vx "$NEW_VM" || true)
-            if [ -z "$SURVIVORS" ]; then
-              echo "STONITH verified — only $NEW_VM running in dd_env=${DD_ENV}"
-              exit 0
-            fi
-            echo "  still running besides $NEW_VM: $(echo "$SURVIVORS" | tr '\n' ' ')"
-            echo "  waiting for STONITH poweroff... (${i}/24)"
-            sleep 5
-          done
-          echo "::warning::STONITH-by-tunnel-delete timed out; force-deleting zombies:"
-          echo "$SURVIVORS"
-          # shellcheck disable=SC2086
-          gcloud compute instances delete $SURVIVORS \
-            --project="$GCP_PROJECT_ID" --zone="$GCP_ZONE" --quiet || true
-          echo "zombies reaped; $NEW_VM is the only $DD_ENV VM"
-
-      - name: Comment preview URL on PR
-        uses: actions/github-script@v7
-        with:
-          script: |
-            const url = `https://${process.env.DD_HOSTNAME}`;
-            const body = [
-              `### DD preview ready`,
-              ``,
-              `**URL:** ${url}`,
-              ``,
-              `Browser login: paste \`gh auth token\` output at ${url}/auth/pat`,
-              ``,
-              `CLI / curl: \`curl -H "Authorization: Bearer $(gh auth token)" ${url}/\``,
-              ``,
-              `Register endpoint for a local agent: \`wss://${process.env.DD_HOSTNAME}/register\``,
-            ].join('\n');
-
-            // Update existing bot comment if present, else create.
-            const { data: comments } = await github.rest.issues.listComments({
-              owner: context.repo.owner,
-              repo: context.repo.repo,
-              issue_number: context.issue.number,
-            });
-            const marker = '### DD preview ready';
-            const existing = comments.find(c => c.user.type === 'Bot' && c.body && c.body.includes(marker));
-            if (existing) {
-              await github.rest.issues.updateComment({
-                owner: context.repo.owner,
-                repo: context.repo.repo,
-                comment_id: existing.id,
-                body,
-              });
-            } else {
-              await github.rest.issues.createComment({
-                owner: context.repo.owner,
-                repo: context.repo.repo,
-                issue_number: context.issue.number,
-                body,
-              });
-            }
+    uses: ./.github/workflows/deploy-cp.yml
+    with:
+      env: pr-${{ github.event.number }}
+      hostname: pr-${{ github.event.number }}.${{ vars.DD_CF_DOMAIN || 'devopsdefender.com' }}
+      gcp_environment: staging
+      workload_identity_provider: 'projects/654815109728/locations/global/workloadIdentityPools/github-actions-pool/providers/github-provider'
+      service_account: 'easyenclave-staging-ci@eestaging.iam.gserviceaccount.com'
+      release_tag: ${{ needs.build.outputs.tag }}
+      oauth_enabled: false
+      comment_on_pr: true
+      ref: ${{ github.event.pull_request.head.ref }}
+    secrets: inherit
+
+  # Production deploy at app.{domain}. Fires on push-to-main OR on a
+  # manual workflow_dispatch (rollback to a specific release_tag).
+  # Tag pushes (v*) intentionally do not auto-deploy — they just
+  # publish the artifact. Cascades into dd-local-prod relaunch.
+  deploy-production:
+    if: >-
+      (github.event_name == 'push' && github.ref == 'refs/heads/main')
+      || github.event_name == 'workflow_dispatch'
+    needs: build
+    permissions:
+      contents: read
+      id-token: write
+      # Unused here (comment_on_pr is false), but a called workflow cannot
+      # request permissions the caller has not granted, so keep it.
+      pull-requests: write
+    uses: ./.github/workflows/deploy-cp.yml
+    with:
+      env: production
+      hostname: app.${{ vars.DD_CF_DOMAIN || 'devopsdefender.com' }}
+      gcp_environment: production
+      workload_identity_provider: 'projects/779946350556/locations/global/workloadIdentityPools/github-actions-pool/providers/github-provider'
+      service_account: 'easyenclave-production-ci@easyenclave.iam.gserviceaccount.com'
+      release_tag: ${{ inputs.release_tag || 'latest' }}
+      oauth_enabled: true
+      comment_on_pr: false
+      ref: main
+    secrets: inherit
diff --git a/.github/workflows/retire-staging.yml b/.github/workflows/retire-staging.yml
deleted file mode 100644
index dbf4205..0000000
--- a/.github/workflows/retire-staging.yml
+++ /dev/null
@@ -1,98 +0,0 @@
-name: Retire Staging
-
-# One-shot cleanup of the shared `app-staging.{domain}` env.
-# Per-PR preview envs (deploy-preview in release.yml) fully replace
-# the old shared staging, so this workflow reaps whatever's left:
-#   - any `dd_env=staging` VMs (RUNNING or TERMINATED)
-#   - any CF tunnels named `dd-staging-*`
-#   - the CF CNAME for `app-staging.{domain}`
-#
-# Idempotent — skips silently if nothing to delete. Run manually once
-# via the Actions UI, then this workflow can be deleted.
-
-on:
-  workflow_dispatch:
-
-permissions:
-  contents: read
-
-env:
-  DD_DOMAIN: ${{ vars.DD_CF_DOMAIN || 'devopsdefender.com' }}
-  GCP_ZONE: us-central1-c
-
-jobs:
-  retire:
-    runs-on: ubuntu-latest
-    environment: staging
-    permissions:
-      contents: read
-      id-token: write
-    steps:
-      - uses: google-github-actions/auth@v2
-        with:
-          workload_identity_provider: 'projects/654815109728/locations/global/workloadIdentityPools/github-actions-pool/providers/github-provider'
-          service_account: 'easyenclave-staging-ci@eestaging.iam.gserviceaccount.com'
-      - uses: google-github-actions/setup-gcloud@v2
-
-      - name: Delete staging VMs
-        env:
-          GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
-        run: |
-          VMS=$(gcloud compute instances list \
-            --project="$GCP_PROJECT_ID" \
-            --filter="labels.devopsdefender=managed AND labels.dd_env=staging" \
-            --format="value(name)")
-          if [ -z "$VMS" ]; then
-            echo "No dd-staging VMs to delete."
-          else
-            echo "Deleting: $(echo "$VMS" | tr '\n' ' ')"
-            # shellcheck disable=SC2086
-            gcloud compute instances delete $VMS \
-              --project="$GCP_PROJECT_ID" --zone="$GCP_ZONE" --quiet
-          fi
-
-      - name: Delete CF tunnels with dd-staging- prefix
-        env:
-          CF_API_TOKEN: ${{ secrets.DD_CP_CF_API_TOKEN }}
-          CF_ACCOUNT_ID: ${{ secrets.DD_CP_CF_ACCOUNT_ID }}
-        run: |
-          resp=$(curl -fsS \
-            -H "Authorization: Bearer ${CF_API_TOKEN}" \
-            "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/cfd_tunnel?is_deleted=false&per_page=200")
-          ids=$(echo "$resp" | jq -r \
-            '.result[] | select(.name | startswith("dd-staging-")) | .id')
-          if [ -z "$ids" ]; then
-            echo "No CF tunnels with prefix dd-staging-"
-          else
-            for id in $ids; do
-              echo "Deleting tunnel $id"
-              curl -fsS -X DELETE \
-                -H "Authorization: Bearer ${CF_API_TOKEN}" \
-                "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/cfd_tunnel/${id}/connections" \
-                >/dev/null || true
-              curl -fsS -X DELETE \
-                -H "Authorization: Bearer ${CF_API_TOKEN}" \
-                "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/cfd_tunnel/${id}" \
-                >/dev/null || echo "::warning::tunnel $id delete failed"
-            done
-          fi
-
-      - name: Delete CNAME for app-staging
-        env:
-          CF_API_TOKEN: ${{ secrets.DD_CP_CF_API_TOKEN }}
-          CF_ZONE_ID: ${{ secrets.DD_CP_CF_ZONE_ID }}
-        run: |
-          host="app-staging.${DD_DOMAIN}"
-          record_id=$(curl -fsS \
-            -H "Authorization: Bearer ${CF_API_TOKEN}" \
-            "https://api.cloudflare.com/client/v4/zones/${CF_ZONE_ID}/dns_records?type=CNAME&name=${host}" \
-            | jq -r '.result[0].id // empty')
-          if [ -z "$record_id" ]; then
-            echo "No CNAME for ${host}"
-          else
-            echo "Deleting CNAME record $record_id (${host})"
-            curl -fsS -X DELETE \
-              -H "Authorization: Bearer ${CF_API_TOKEN}" \
-              "https://api.cloudflare.com/client/v4/zones/${CF_ZONE_ID}/dns_records/${record_id}" \
-              >/dev/null
-          fi
diff --git a/README.md b/README.md
index 289d19c..f8a1df1 100644
--- a/README.md
+++ b/README.md
@@ -39,7 +39,7 @@ The `devopsdefender` binary ships as a **GitHub release asset** — not an OCI i
 
 `cloudflared` is also pulled directly from `cloudflare/cloudflared`'s GitHub releases as a fetch-only boot workload — no bundling in our image, no Dockerfile step.
 
-Per-VM configuration (CF credentials, GitHub OAuth, the workload spec itself) is passed to easyenclave at boot via **GCE instance metadata** (`ee-config` attribute), read by `easyenclave::init::fetch_gce_metadata_config()` and applied as env vars. `scripts/gcp-deploy.sh` builds the spec and invokes `gcloud compute instances create --image-family=easyenclave-staging --metadata-from-file=ee-config=...`.
+Per-VM configuration (CF credentials, GitHub OAuth, the workload spec itself) is passed to easyenclave at boot via **GCE instance metadata** (`ee-config` attribute), read by `easyenclave::init::fetch_gce_metadata_config()` and applied as env vars. The CP-deploy step in `.github/workflows/deploy-cp.yml` builds the spec and invokes `gcloud compute instances create --image-family=easyenclave-staging --metadata-from-file=ee-config=...`.
 
 ## CI/CD
 
@@ -48,16 +48,18 @@ PR              → pre-release tagged pr-{sha12}, then ephemeral preview at pr-
 branch deleted  → pr-teardown.yml deletes the preview's VM, CF tunnel, and DNS
 push to main    → rolling `latest` release, then auto-deploy to production
 push v* tag     → versioned release (no auto-deploy)
-manual          → production-deploy.yml promotes any existing tag
+manual dispatch → redeploy any existing tag to production (rollback tool)
 ```
 
-Each PR gets its own isolated env at `pr-{N}.{domain}` with `DD_ENV=pr-{N}` — no more shared staging tier. `.github/workflows/release.yml` builds the static musl binary, publishes it as a GitHub release asset, deploys the PR's preview VM, and posts the URL back to the PR. The preview VM is verified via:
+Every path lives in `.github/workflows/release.yml`: one `build` job, then either `deploy-preview` (PR) or `deploy-production` (main / dispatch), both calling the reusable `deploy-cp.yml` with env-specific inputs. Each cascades into a relaunch of the matching local agent VM (`dd-local-preview` or `dd-local-prod`) on the tdx2 host, and the Release run only goes green when that agent re-registers with the freshly-deployed CP. Verifications along the way:
 
 1. `/health` via the Cloudflare tunnel
 2. `/cp/attest` returning a real TDX MRTD (cryptographic proof the freshly-deployed VM is running — old VMs don't have the endpoint and return 404)
-3. No other `dd-pr-{N}-*` VM is RUNNING after deploy (STONITH must have halted the previous instance of this PR)
+3. Dashboard `/` returning HTTP 200 under a Bearer PAT
+4. No other `dd-{env}-*` VM is RUNNING after deploy (STONITH must have halted the previous instance)
+5. The matching local agent (`dd-local-preview` or `dd-local-prod`) re-registers with the new CP within 5 min
 
-Browser access to a PR preview goes through `/auth/pat` (paste a GitHub PAT, validated against `DD_OWNER`). OAuth is only wired for production, which `production-deploy.yml` still targets at `app.{domain}`.
+Browser access to a PR preview goes through `/auth/pat` (paste a GitHub PAT, validated against `DD_OWNER`). OAuth is only wired for production, at `app.{domain}`.
 
 ## STONITH
 
diff --git a/apps/_infra/dd-relaunch.sh b/apps/_infra/dd-relaunch.sh
new file mode 100755
index 0000000..55d380d
--- /dev/null
+++ b/apps/_infra/dd-relaunch.sh
@@ -0,0 +1,52 @@
+#!/usr/bin/env bash
+# dd-relaunch.sh — destroy and recreate one local TDX agent VM.
+#
+# Invoked over SSH by .github/actions/relaunch-agent during a Release
+# cascade. Refreshes the whole apps/ tree from the caller's ref (the
+# PR branch or main) so local-agents.sh and the workload specs match
+# what the caller authored. Tears down the existing VM + overlay,
+# runs local-agents.sh to redefine, and starts the VM.
+#
+#   dd-relaunch.sh prod    https://app.devopsdefender.com    main
+#   dd-relaunch.sh preview https://pr-N.devopsdefender.com   feat/some-pr
+#
+# DD_PAT and DD_ITA_API_KEY must be set in the environment.
+
+set -euo pipefail
+
+KIND="${1?usage: dd-relaunch.sh <prod|preview> <cp-url> [ref]}"
+CP="${2?cp url required}"
+REF="${3:-main}"
+: "${DD_PAT?DD_PAT must be set}"
+: "${DD_ITA_API_KEY?DD_ITA_API_KEY must be set}"
+
+case "$KIND" in
+  prod|preview) ;;
+  *) echo "unknown kind: $KIND (want prod|preview)" >&2; exit 2 ;;
+esac
+
+cd /home/tdx2/src/dd
+
+# Refresh the infra scripts + apps/ tree from the caller's ref. Limited
+# checkout so a dirty working tree elsewhere doesn't block the deploy.
+# This script is already in memory, so the refresh takes effect on the
+# *next* invocation.
+git fetch --quiet origin "$REF"
+git checkout --quiet "origin/$REF" -- apps/
+echo "dd-relaunch: refreshed apps/ from origin/$REF"
+
+vm="dd-local-$KIND"
+overlay="/var/lib/libvirt/images/$vm.qcow2"
+
+virsh destroy "$vm" 2>/dev/null || true
+virsh undefine "$vm" --managed-save --snapshots-metadata 2>/dev/null || true
+rm -f "$overlay"
+
+# Redefine via local-agents.sh; "" skips the other slot.
+case "$KIND" in
+  prod)    ./apps/_infra/local-agents.sh ""  "$CP" ;;
+  preview) ./apps/_infra/local-agents.sh "$CP" "" ;;
+esac
+
+virsh start "$vm"
+echo "relaunched $vm against $CP"
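Once `virsh start` returns, the VM is only booting. The Release cascade still has to wait for `dd-local-{kind}` to re-register with the CP inside its 5-minute budget. A minimal sketch of that wait as a generic poll-with-deadline helper follows. The names `poll_until` and `agent_registered` are hypothetical, and the real check in `.github/actions/relaunch-agent` (which queries `/api/agents`) is not shown in this part of the patch, so the probe below is only a stand-in.

```bash
#!/usr/bin/env bash
# Sketch only: poll_until is a hypothetical helper, not part of this patch.
# Usage: poll_until <budget-seconds> <interval-seconds> <command> [args...]
# Re-runs the command until it succeeds or the budget is spent.
poll_until() {
  local budget="$1" interval="$2"
  shift 2
  local deadline=$((SECONDS + budget))
  until "$@"; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      return 1   # budget exhausted, command never succeeded
    fi
    sleep "$interval"
  done
  return 0
}

# Stand-in probe (assumption): succeeds once a marker file exists. A real
# probe would query the CP's /api/agents for the freshly started VM.
agent_registered() {
  [ -e "$1" ]
}
```

With a 5-minute budget and a 10-second interval this would be called as `poll_until 300 10 agent_registered /path/to/marker`. A non-zero return is what should fail the blocking cascade.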
diff --git a/scripts/local-agents.sh b/apps/_infra/local-agents.sh
similarity index 95%
rename from scripts/local-agents.sh
rename to apps/_infra/local-agents.sh
index 17c61fa..20b772a 100755
--- a/scripts/local-agents.sh
+++ b/apps/_infra/local-agents.sh
@@ -12,11 +12,11 @@
 # Usage:
 #   export DD_PAT="$(gh auth token)"
 #   export DD_ITA_API_KEY="$(cat ~/.secrets/ita_api_key)"
-#   ./scripts/local-agents.sh https://pr-106.devopsdefender.com https://app.devopsdefender.com
+#   ./apps/_infra/local-agents.sh https://pr-106.devopsdefender.com https://app.devopsdefender.com
 #
 # Pass "" for either URL to skip defining that VM:
-#   ./scripts/local-agents.sh "" https://app.devopsdefender.com   # prod only
-#   ./scripts/local-agents.sh https://pr-N.devopsdefender.com ""  # preview only
+#   ./apps/_infra/local-agents.sh "" https://app.devopsdefender.com   # prod only
+#   ./apps/_infra/local-agents.sh https://pr-N.devopsdefender.com ""  # preview only
 #
 # After: virsh start dd-local-preview && virsh start dd-local-prod
 
@@ -255,3 +255,8 @@ echo
 echo "watch registration (Ctrl-] to exit):"
 [ -n "$PREVIEW_CP" ] && echo "  virsh console dd-local-preview"
 [ -n "$PROD_CP"    ] && echo "  virsh console dd-local-prod"
+
+# Explicit 0 — the tail `[ -n "$PROD_CP" ] && …` returns 1 when
+# PROD_CP="" (preview-only), bubbling up as the script exit status
+# and tripping set -e in dd-relaunch.sh. Force success.
+exit 0
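The exit-status behaviour described in the comment above can be reproduced in isolation. The sketch below uses two hypothetical functions (neither exists in `local-agents.sh`) whose bodies mirror the script's tail: the first ends on the `[ -n ... ] && echo` list, the second adds an explicit success status.

```bash
#!/usr/bin/env bash
# Hypothetical reproduction of the exit-status issue; function names are
# illustrative and do not exist in local-agents.sh.

# Last command is a `[ ... ] && ...` list: with an empty argument the test
# fails, the echo is skipped, and the function returns 1.
tail_without_exit() {
  local prod_cp="$1"
  [ -n "$prod_cp" ] && echo "  virsh console dd-local-prod"
}

# Same tail, followed by an explicit success status (the script uses
# `exit 0`; `return 0` is the function-level equivalent).
tail_with_exit() {
  local prod_cp="$1"
  [ -n "$prod_cp" ] && echo "  virsh console dd-local-prod"
  return 0
}
```

Calling `tail_without_exit ""` yields status 1 even though nothing went wrong, which is enough to abort a caller running under `set -e`. `tail_with_exit ""` yields 0.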
diff --git a/apps/cloudflared/workload.json b/apps/cloudflared/workload.json
new file mode 100644
index 0000000..1b2270a
--- /dev/null
+++ b/apps/cloudflared/workload.json
@@ -0,0 +1,8 @@
+{
+  "app_name": "cloudflared",
+  "github_release": {
+    "repo": "cloudflare/cloudflared",
+    "asset": "cloudflared-linux-amd64",
+    "rename": "cloudflared"
+  }
+}
diff --git a/apps/dd-agent/workload.json.tmpl b/apps/dd-agent/workload.json.tmpl
new file mode 100644
index 0000000..a0e6d04
--- /dev/null
+++ b/apps/dd-agent/workload.json.tmpl
@@ -0,0 +1,22 @@
+{
+  "app_name": "dd-agent",
+  "github_release": {
+    "repo": "devopsdefender/dd",
+    "asset": "devopsdefender",
+    "tag": "latest"
+  },
+  "cmd": ["devopsdefender", "agent"],
+  "env": [
+    "DD_MODE=agent",
+    "DD_CP_URL=${DD_CP_URL}",
+    "DD_PAT=${DD_PAT}",
+    "DD_ITA_API_KEY=${DD_ITA_API_KEY}",
+    "DD_ITA_BASE_URL=https://api.trustauthority.intel.com",
+    "DD_ITA_JWKS_URL=https://portal.trustauthority.intel.com/certs",
+    "DD_ITA_ISSUER=https://portal.trustauthority.intel.com",
+    "DD_OWNER=devopsdefender",
+    "DD_ENV=${DD_ENV}",
+    "DD_VM_NAME=${DD_VM_NAME}",
+    "DD_PORT=8080"
+  ]
+}
diff --git a/apps/dd-management/workload.json.tmpl b/apps/dd-management/workload.json.tmpl
new file mode 100644
index 0000000..e8fbe17
--- /dev/null
+++ b/apps/dd-management/workload.json.tmpl
@@ -0,0 +1,29 @@
+{
+  "app_name": "dd-management",
+  "github_release": {
+    "repo": "devopsdefender/dd",
+    "asset": "devopsdefender",
+    "tag": "${DD_RELEASE_TAG}"
+  },
+  "cmd": ["devopsdefender"],
+  "env": [
+    "DD_MODE=management",
+    "DD_CF_API_TOKEN=${CLOUDFLARE_API_TOKEN}",
+    "DD_CF_ACCOUNT_ID=${CLOUDFLARE_ACCOUNT_ID}",
+    "DD_CF_ZONE_ID=${CLOUDFLARE_ZONE_ID}",
+    "DD_CF_DOMAIN=${DD_DOMAIN}",
+    "DD_HOSTNAME=${DD_HOSTNAME}",
+    "DD_ENV=${DD_ENV}",
+    "DD_OWNER=devopsdefender",
+    "DD_REGISTER_PORT=8081",
+    "DD_OIDC_AUDIENCE=dd-web",
+    "DD_PORT=8080",
+    "DD_GITHUB_CLIENT_ID=${DD_GITHUB_CLIENT_ID}",
+    "DD_GITHUB_CLIENT_SECRET=${DD_GITHUB_CLIENT_SECRET}",
+    "DD_GITHUB_CALLBACK_URL=${DD_GITHUB_CALLBACK_URL}",
+    "DD_ITA_API_KEY=${DD_ITA_API_KEY}",
+    "DD_ITA_BASE_URL=${DD_ITA_BASE_URL}",
+    "DD_ITA_JWKS_URL=${DD_ITA_JWKS_URL}",
+    "DD_ITA_ISSUER=${DD_ITA_ISSUER}"
+  ]
+}
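The `.tmpl` files carry `${VAR}` placeholders that are filled in before the spec is baked. How this patch renders them is not visible in this part of the diff, so the following is only a sketch of one way to do it in pure bash. It substitutes an explicit allow-list of variable names and leaves every other `$` sequence untouched, which matters for templates whose embedded shell uses constructs like `$((i+1))`. `render_template` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch only: render_template is hypothetical, not the patch's renderer.
# Usage: render_template <template-text> <VAR_NAME>...
# Replaces each literal ${VAR_NAME} with the value of that shell variable.
render_template() {
  local text="$1"
  shift
  local name pat rep
  for name in "$@"; do
    pat="\${$name}"      # the literal placeholder, e.g. ${DD_ENV}
    rep="${!name-}"      # indirect lookup; empty if the variable is unset
    text=${text//"$pat"/"$rep"}
  done
  printf '%s\n' "$text"
}
```

For example, `render_template "$(cat apps/dd-management/workload.json.tmpl)" DD_ENV DD_HOSTNAME` would fill only those two placeholders. The sketch does not JSON-escape values, so a value containing a double quote or backslash would produce invalid JSON.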
diff --git a/apps/mount-models/workload.json b/apps/mount-models/workload.json
new file mode 100644
index 0000000..94111d8
--- /dev/null
+++ b/apps/mount-models/workload.json
@@ -0,0 +1,7 @@
+{
+  "app_name": "mount-models",
+  "cmd": [
+    "/bin/busybox", "sh", "-c",
+    "mkdir -p /var/lib/easyenclave/ollama && mount /dev/vdc /var/lib/easyenclave/ollama && echo mount-models: ok; sleep inf"
+  ]
+}
diff --git a/apps/nv/workload.json b/apps/nv/workload.json
new file mode 100644
index 0000000..047ed3d
--- /dev/null
+++ b/apps/nv/workload.json
@@ -0,0 +1,7 @@
+{
+  "app_name": "nv",
+  "cmd": [
+    "/bin/busybox", "sh", "-c",
+    "/sbin/insmod /lib/modules/7.0.0-14-generic/kernel/nvidia-580srv-open/nvidia.ko NVreg_OpenRmEnableUnsupportedGpus=1 2>&1 && echo nv: loaded || echo nv: failed; sleep inf"
+  ]
+}
diff --git a/apps/ollama/workload.preview.json b/apps/ollama/workload.preview.json
new file mode 100644
index 0000000..8455622
--- /dev/null
+++ b/apps/ollama/workload.preview.json
@@ -0,0 +1,7 @@
+{
+  "app_name": "ollama",
+  "cmd": [
+    "/bin/busybox", "sh", "-c",
+    "until [ -x /var/lib/easyenclave/bin/dd-podman ]; do sleep 2; done\nexec /var/lib/easyenclave/bin/dd-podman run --rm --name ollama --network=host -v /var/lib/easyenclave/ollama:/root/.ollama -e OLLAMA_HOST=127.0.0.1:11434 docker.io/ollama/ollama:latest serve"
+  ]
+}
diff --git a/apps/ollama/workload.prod.json b/apps/ollama/workload.prod.json
new file mode 100644
index 0000000..eae4a9a
--- /dev/null
+++ b/apps/ollama/workload.prod.json
@@ -0,0 +1,7 @@
+{
+  "app_name": "ollama",
+  "cmd": [
+    "/bin/busybox", "sh", "-c",
+    "until [ -x /var/lib/easyenclave/bin/dd-podman ]; do sleep 2; done\nexec /var/lib/easyenclave/bin/dd-podman run --rm --name ollama --network=host --device=/dev/nvidia0 --device=/dev/nvidiactl --device=/dev/nvidia-uvm -v /var/lib/easyenclave/ollama:/root/.ollama -e OLLAMA_HOST=127.0.0.1:11434 docker.io/ollama/ollama:latest serve"
+  ]
+}
diff --git a/apps/openclaw/workload.json.tmpl b/apps/openclaw/workload.json.tmpl
new file mode 100644
index 0000000..6f9087d
--- /dev/null
+++ b/apps/openclaw/workload.json.tmpl
@@ -0,0 +1,7 @@
+{
+  "app_name": "openclaw",
+  "cmd": [
+    "/bin/busybox", "sh", "-c",
+    "echo 'openclaw: waiting for ollama on 127.0.0.1:11434...'\ni=0\nuntil /bin/busybox wget -q -T 3 -O- http://127.0.0.1:11434/api/tags >/dev/null 2>&1; do\n  i=$((i+1))\n  if [ $((i % 6)) -eq 0 ]; then echo \"openclaw: still waiting for ollama ($i tries, ${i}x5s elapsed)\"; fi\n  sleep 5\ndone\necho 'openclaw: ollama responding, pulling model ${MODEL}'\n/var/lib/easyenclave/bin/dd-podman exec ollama ollama pull ${MODEL} 2>&1\necho 'openclaw: model pulled, launching gateway'\nexec /var/lib/easyenclave/bin/dd-podman exec ollama ollama launch openclaw --model ${MODEL} --yes"
+  ]
+}
diff --git a/apps/podman-bootstrap/workload.json b/apps/podman-bootstrap/workload.json
new file mode 100644
index 0000000..5a797e4
--- /dev/null
+++ b/apps/podman-bootstrap/workload.json
@@ -0,0 +1,7 @@
+{
+  "app_name": "podman-bootstrap",
+  "cmd": [
+    "/bin/busybox", "sh", "-c",
+    "set -e\nBIN=/var/lib/easyenclave/bin\nSRC=$BIN/podman-linux-amd64\nuntil [ -x $SRC/usr/local/bin/podman ]; do sleep 1; done\n# If there's a vdc scratch disk, wait for mount-models to actually\n# mount it before we write files under /var/lib/easyenclave/ollama —\n# otherwise our writes land on the rootfs tmpfs and get shadowed the\n# moment vdc is mounted. On VMs without vdc (GCP CP preview) there's\n# no mount-models workload and this check short-circuits.\nif [ -b /dev/vdc ]; then\n  until mountpoint -q /var/lib/easyenclave/ollama 2>/dev/null; do sleep 1; done\nfi\nmkdir -p /var/lib/easyenclave/ollama\ncp -f $SRC/usr/local/bin/* $BIN/\ncp -f $SRC/usr/local/lib/podman/conmon $BIN/\ncp -f $SRC/usr/local/lib/podman/netavark $BIN/ 2>/dev/null || true\ncp -f $SRC/usr/local/lib/podman/aardvark-dns $BIN/ 2>/dev/null || true\ncp -f $SRC/usr/local/lib/podman/rootlessport $BIN/ 2>/dev/null || true\nmkdir -p /var/lib/easyenclave/ollama/.podman/storage /var/lib/easyenclave/ollama/.podman/runroot\n# /dev/shm is where podman puts its per-container POSIX shm lock\n# file (libpod_lock). EE's guest rootfs may not mount tmpfs on\n# /dev/shm; without it, podman fails 'failed to create 2048 locks\n# in /libpod_lock: no such file or directory'. mkdir + mount idempotently.\nif ! mountpoint -q /dev/shm 2>/dev/null; then\n  mkdir -p /dev/shm\n  mount -t tmpfs -o size=64M tmpfs /dev/shm 2>/dev/null || true\nfi\n# Pick storage driver based on substrate. vdc-backed ext4 supports\n# native overlay (fast + space-efficient). Without vdc (GCP CP\n# preview, any guest running on tmpfs rootfs), overlay-on-tmpfs\n# errors out, so fall back to vfs (slower, full copy per layer, but\n# works on any filesystem).\nif mountpoint -q /var/lib/easyenclave/ollama; then\n  DRIVER=overlay\nelse\n  DRIVER=vfs\nfi\n# Write containers.conf on vdc (writable). /etc is RO on EE so we\n# can't put it where podman looks by default. helper_binaries_dir\n# tells podman where we staged conmon/netavark/aardvark-dns/… —\n# podman probes those at startup even with --network=host.\nPOL=/var/lib/easyenclave/ollama/.podman/policy.json\n# Minimum viable signature policy: trust anything. EE's attestation\n# story happens one layer up (image digest pinned by the spec we\n# baked); podman's own signature checking would duplicate that.\nprintf '%s' '{\"default\":[{\"type\":\"insecureAcceptAnything\"}]}' > $POL\n# Podman's containers-common looks for policy.json at hardcoded\n# paths (/etc/containers/, $HOME/.config/containers/). /etc and\n# /root are both RO on EE, so build a fake HOME under\n# /var/lib/easyenclave/.home (writable) and set HOME there in the\n# dd-podman wrapper.\nHOME_DIR=/var/lib/easyenclave/.home\nmkdir -p $HOME_DIR/.config/containers\ncp -f $POL $HOME_DIR/.config/containers/policy.json\nCONF=/var/lib/easyenclave/ollama/.podman/containers.conf\nprintf '%s\\n' '[engine]' 'helper_binaries_dir = [\"/var/lib/easyenclave/bin\"]' > $CONF\nmkdir -p $HOME_DIR/tmp\nprintf '%s\\n' '#!/bin/sh' \"export HOME=$HOME_DIR\" \"export TMPDIR=$HOME_DIR/tmp\" \"export CONTAINERS_CONF=$CONF\" \"exec /var/lib/easyenclave/bin/podman --conmon=/var/lib/easyenclave/bin/conmon --runtime=/var/lib/easyenclave/bin/crun --storage-driver=$DRIVER --root=/var/lib/easyenclave/ollama/.podman/storage --runroot=/var/lib/easyenclave/ollama/.podman/runroot --cgroup-manager=cgroupfs \\\"\\$@\\\"\" > $BIN/dd-podman\nchmod +x $BIN/dd-podman\nls -la $CONF $POL $BIN/dd-podman 2>&1 || true\ncat $CONF\necho podman-bootstrap: v2 ok driver=$DRIVER conf=$CONF policy=$POL"
+  ]
+}
diff --git a/apps/podman-static/workload.json b/apps/podman-static/workload.json
new file mode 100644
index 0000000..939125d
--- /dev/null
+++ b/apps/podman-static/workload.json
@@ -0,0 +1,7 @@
+{
+  "app_name": "podman-static",
+  "github_release": {
+    "repo": "mgoltzsche/podman-static",
+    "asset": "podman-linux-amd64.tar.gz"
+  }
+}
diff --git a/scripts/dd-relaunch.sh b/scripts/dd-relaunch.sh
deleted file mode 100755
index bdf1d8d..0000000
--- a/scripts/dd-relaunch.sh
+++ /dev/null
@@ -1,53 +0,0 @@
-#!/usr/bin/env bash
-# dd-relaunch.sh — destroy and recreate one local TDX agent VM.
-#
-# Invoked over SSH by .github/workflows/local-agents.yml after a
-# Release / Production Deploy succeeds. Pulls the current main of dd
-# (so this script and local-agents.sh are always the latest), tears
-# down the existing VM + overlay, runs scripts/local-agents.sh to
-# redefine, and starts the VM.
-#
-#   dd-relaunch.sh prod    https://app.devopsdefender.com
-#   dd-relaunch.sh preview https://pr-N.devopsdefender.com
-#
-# DD_PAT and DD_ITA_API_KEY must be set in the environment.
-
-set -euo pipefail
-
-KIND="${1?usage: dd-relaunch.sh <prod|preview> <cp-url>}"
-CP="${2?cp url required}"
-: "${DD_PAT?DD_PAT must be set}"
-: "${DD_ITA_API_KEY?DD_ITA_API_KEY must be set}"
-
-case "$KIND" in
-  prod|preview) ;;
-  *) echo "unknown kind: $KIND (want prod|preview)" >&2; exit 2 ;;
-esac
-
-cd /home/tdx2/src/dd
-
-# Pull the latest scripts. Limit the checkout to the two scripts so a
-# dirty working tree elsewhere doesn't block the deploy. The relaunch
-# script itself has already been read into memory by bash, so the
-# update takes effect on the *next* invocation.
-git fetch --quiet origin main
-git checkout --quiet origin/main -- scripts/local-agents.sh scripts/dd-relaunch.sh
-
-vm="dd-local-$KIND"
-overlay="/var/lib/libvirt/images/$vm.qcow2"
-
-virsh destroy "$vm" 2>/dev/null || true
-virsh undefine "$vm" --managed-save --snapshots-metadata 2>/dev/null || true
-rm -f "$overlay"
-
-# Redefine via local-agents.sh; "" skips the other slot.
-case "$KIND" in
-  prod)    ./scripts/local-agents.sh ""  "$CP" ;;
-  preview) ./scripts/local-agents.sh "$CP" "" ;;
-esac
-
-virsh start "$vm"
-echo "relaunched $vm against $CP"
-
-# ollama deploy + pull + query is driven from the workflow's HTTPS step
-# on ubuntu-latest, not here — see .github/workflows/local-agents.yml.
diff --git a/scripts/gcp-deploy.sh b/scripts/gcp-deploy.sh
deleted file mode 100755
index 96eed6b..0000000
--- a/scripts/gcp-deploy.sh
+++ /dev/null
@@ -1,177 +0,0 @@
-#!/bin/bash
-# gcp-deploy.sh — Create a TDX management VM on GCP that boots from a
-# sealed easyenclave image and runs dd management as a native process.
-#
-# Both the devopsdefender binary and cloudflared are fetched straight
-# from their GitHub releases by easyenclave's github_release workload
-# source — no OCI registry, no Dockerfile. Cloudflared is a fetch-only
-# boot workload: its binary lands in /var/lib/easyenclave/bin (now on
-# PATH) so dd-register can shell out to `cloudflared` by name.
-#
-# Agent-side mirror: a local TDX guest with a vfio-pci-passed GPU can
-# register against the CP this script deploys by using the same
-# easyenclave `github_release` workload source for the devopsdefender
-# binary, with `DD_REGISTER_URL=wss://{hostname}/register`. See the
-# local-GPU demo notes in the commit trail.
-#
-# Called by .github/workflows/{staging,production}-deploy.yml. Requires
-# gcloud CLI authenticated via Workload Identity Federation.
-#
-# Required env vars (set by the workflow):
-#   GCP_PROJECT_ID          — GCP project where the VM lives
-#   GCP_ZONE                — GCP zone (e.g. us-central1-c)
-#   DD_ENV                  — staging, production, or pr-{num} (ephemeral per-PR)
-#   DD_DOMAIN               — Public domain (e.g. devopsdefender.com)
-#   CLOUDFLARE_API_TOKEN    — CF API token (dd-register uses it)
-#   CLOUDFLARE_ACCOUNT_ID   — CF account ID
-#   CLOUDFLARE_ZONE_ID      — CF zone ID
-#
-# Optional env vars:
-#   DD_HOSTNAME             — public hostname override. If unset, derived
-#                             from DD_ENV (production → app.$DOMAIN,
-#                             anything else → app-staging.$DOMAIN). Set
-#                             explicitly for per-PR envs (pr-42.$DOMAIN).
-#   DD_GITHUB_CLIENT_ID     — GitHub OAuth client ID. If unset, dd-web
-#                             disables OAuth login and only PAT auth works.
-#                             Per-PR envs leave this unset.
-#   DD_GITHUB_CLIENT_SECRET — GitHub OAuth client secret (paired with above)
-#   DD_GITHUB_CALLBACK_URL  — OAuth callback, default https://{hostname}/auth/github/callback
-#   EE_IMAGE_FAMILY         — easyenclave GCP image family
-#   EE_IMAGE_PROJECT        — project hosting the image
-#   DD_RELEASE_TAG          — GitHub release tag on devopsdefender/dd
-#                             (defaults to 'latest'; PRs override with pr-{sha12})
-#   VM_MACHINE_TYPE         — default c3-standard-4
-#   VM_DISK_SIZE            — default 10GB
-
-set -euo pipefail
-
-# ── easyenclave image family ──────────────────────────────────────────────
-#   easyenclave-staging → rolling main, rotates on every push (5 kept)
-#   easyenclave-stable  → v* tags, kept forever
-EE_IMAGE_FAMILY="${EE_IMAGE_FAMILY:-easyenclave-staging}"
-EE_IMAGE_PROJECT="${EE_IMAGE_PROJECT:-easyenclave}"
-DD_RELEASE_TAG="${DD_RELEASE_TAG:-latest}"
-
-VM_NAME="dd-${DD_ENV}-$(date +%s)"
-VM_MACHINE_TYPE="${VM_MACHINE_TYPE:-c3-standard-4}"
-VM_DISK_SIZE="${VM_DISK_SIZE:-10GB}"
-
-if [ -z "${DD_HOSTNAME:-}" ]; then
-  if [ "${DD_ENV}" = "production" ]; then
-    DD_HOSTNAME="app.${DD_DOMAIN}"
-  else
-    DD_HOSTNAME="app-staging.${DD_DOMAIN}"
-  fi
-fi
-DD_GITHUB_CLIENT_ID="${DD_GITHUB_CLIENT_ID:-}"
-DD_GITHUB_CLIENT_SECRET="${DD_GITHUB_CLIENT_SECRET:-}"
-DD_GITHUB_CALLBACK_URL="${DD_GITHUB_CALLBACK_URL:-https://${DD_HOSTNAME}/auth/github/callback}"
-
-# Intel Trust Authority — mandatory. DD_ITA_API_KEY must be set in the
-# workflow (from secrets.DD_ITA_API_KEY). The CP will refuse to start
-# without one. Everything else has a default.
-if [ -z "${DD_ITA_API_KEY:-}" ]; then
-  echo "DD_ITA_API_KEY is required (configure secrets.DD_ITA_API_KEY)" >&2
-  exit 1
-fi
-DD_ITA_BASE_URL="${DD_ITA_BASE_URL:-https://api.trustauthority.intel.com}"
-DD_ITA_JWKS_URL="${DD_ITA_JWKS_URL:-https://portal.trustauthority.intel.com/certs}"
-DD_ITA_ISSUER="${DD_ITA_ISSUER:-https://portal.trustauthority.intel.com}"
-
-# ── Build the workload spec ──────────────────────────────────────────────
-# Two boot workloads:
-#   1. cloudflared — fetch-only. easyenclave downloads cloudflare's
-#      static binary from their GitHub release, symlinks it as
-#      `cloudflared`, and exits the deploy as "completed". The binary
-#      sits on PATH for dd-register to spawn.
-#   2. dd-management — fetches the devopsdefender binary from our own
-#      release and runs it. dd-register + dd-web both live in this
-#      single process (DD_MODE=management).
-EE_BOOT_WORKLOADS=$(jq -c -n \
-  --arg dd_tag         "$DD_RELEASE_TAG" \
-  --arg cf_token       "$CLOUDFLARE_API_TOKEN" \
-  --arg cf_account     "$CLOUDFLARE_ACCOUNT_ID" \
-  --arg cf_zone        "$CLOUDFLARE_ZONE_ID" \
-  --arg domain         "$DD_DOMAIN" \
-  --arg hostname       "$DD_HOSTNAME" \
-  --arg env            "$DD_ENV" \
-  --arg gh_client_id   "$DD_GITHUB_CLIENT_ID" \
-  --arg gh_client_secret "$DD_GITHUB_CLIENT_SECRET" \
-  --arg gh_callback    "$DD_GITHUB_CALLBACK_URL" \
-  --arg ita_api_key    "$DD_ITA_API_KEY" \
-  --arg ita_base_url   "$DD_ITA_BASE_URL" \
-  --arg ita_jwks_url   "$DD_ITA_JWKS_URL" \
-  --arg ita_issuer     "$DD_ITA_ISSUER" \
-  '[
-    {
-      "github_release": {
-        "repo": "cloudflare/cloudflared",
-        "asset": "cloudflared-linux-amd64",
-        "rename": "cloudflared"
-      },
-      "app_name": "cloudflared"
-    },
-    {
-      "github_release": {
-        "repo": "devopsdefender/dd",
-        "asset": "devopsdefender",
-        "tag": $dd_tag
-      },
-      "cmd": ["devopsdefender"],
-      "app_name": "dd-management",
-      "env": (
-        [
-          "DD_MODE=management",
-          ("DD_CF_API_TOKEN="   + $cf_token),
-          ("DD_CF_ACCOUNT_ID="  + $cf_account),
-          ("DD_CF_ZONE_ID="     + $cf_zone),
-          ("DD_CF_DOMAIN="      + $domain),
-          ("DD_HOSTNAME="       + $hostname),
-          ("DD_ENV="            + $env),
-          "DD_OWNER=devopsdefender",
-          "DD_REGISTER_PORT=8081",
-          "DD_OIDC_AUDIENCE=dd-web",
-          "DD_PORT=8080"
-        ]
-        + (if $gh_client_id == "" then [] else [
-          ("DD_GITHUB_CLIENT_ID="     + $gh_client_id),
-          ("DD_GITHUB_CLIENT_SECRET=" + $gh_client_secret),
-          ("DD_GITHUB_CALLBACK_URL="  + $gh_callback)
-        ] end)
-        + [
-          ("DD_ITA_API_KEY="          + $ita_api_key),
-          ("DD_ITA_BASE_URL="         + $ita_base_url),
-          ("DD_ITA_JWKS_URL="         + $ita_jwks_url),
-          ("DD_ITA_ISSUER="           + $ita_issuer)
-        ]
-      )
-    }
-  ]')
-
-# ── Wrap into ee-config ───────────────────────────────────────────────────
-jq -c -n \
-  --arg workloads "$EE_BOOT_WORKLOADS" \
-  '{ "EE_BOOT_WORKLOADS": $workloads, "EE_OWNER": "devopsdefender" }' \
-  > /tmp/ee-config.json
-
-trap 'rm -f /tmp/ee-config.json' EXIT
-
-# ── Create the VM ─────────────────────────────────────────────────────────
-gcloud compute instances create "$VM_NAME" \
-  --project="$GCP_PROJECT_ID" \
-  --zone="$GCP_ZONE" \
-  --machine-type="$VM_MACHINE_TYPE" \
-  --confidential-compute-type=TDX \
-  --maintenance-policy=TERMINATE \
-  --boot-disk-size="$VM_DISK_SIZE" \
-  --image-family="$EE_IMAGE_FAMILY" \
-  --image-project="$EE_IMAGE_PROJECT" \
-  --metadata-from-file=ee-config=/tmp/ee-config.json \
-  --labels=devopsdefender=managed,dd_env="${DD_ENV}" \
-  --tags=dd-management
-
-echo "VM: $VM_NAME"
-echo "  image:    family $EE_IMAGE_FAMILY ($EE_IMAGE_PROJECT)"
-echo "  hostname: $DD_HOSTNAME"
-echo "  dd release: $DD_RELEASE_TAG"
-echo "  workload: dd management"
diff --git a/scripts/ollama-deploy.sh b/scripts/ollama-deploy.sh
deleted file mode 100755
index 4d4babf..0000000
--- a/scripts/ollama-deploy.sh
+++ /dev/null
@@ -1,327 +0,0 @@
-#!/usr/bin/env bash
-# ollama-deploy.sh — run ollama + OpenClaw inside a DD agent VM as
-# podman containers. No ollama binary on the guest rootfs (that's
-# dynamically linked and fails on EE's busybox rootfs with
-# `libstdc++.so.6: cannot open shared object file`). Instead:
-#
-#   1. Fetch static podman (mgoltzsche/podman-static tarball) as a
-#      fetch-only DD workload.
-#   2. One-shot bootstrap via /exec — flatten the tarball's nested
-#      bin dir into /var/lib/easyenclave/bin and write a minimal
-#      /etc/containers/containers.conf (cgroup_manager=cgroupfs so
-#      we don't need systemd).
-#   3. Deploy the ollama container as a long-running workload
-#      (podman run --net=host ...). Prod also passes the three
-#      nvidia device nodes for H100 access.
-#   4. Pull the right-sized model via `podman exec ollama ollama pull`.
-#   5. Launch OpenClaw (a bridge from messaging apps to coding
-#      agents; subcommand of ollama, npm-installed on first run) as
-#      a second long-running workload using the same container.
-#
-#   ollama-deploy.sh <kind> <cp_url>
-#     kind:    prod | preview
-#     cp_url:  https://app.devopsdefender.com | https://pr-N.devopsdefender.com
-#
-# Requires DD_PAT in the environment (the workflow's GITHUB_TOKEN).
-
-set -euo pipefail
-
-KIND="${1?usage: ollama-deploy.sh <prod|preview> <cp_url>}"
-CP_URL="${2?cp_url required}"
-: "${DD_PAT?}"
-
-case "$KIND" in
-  prod)
-    MODEL="llama3.1:8b"
-    # GPU passthrough. /dev/nvidia-uvm appears once CUDA is touched;
-    # the nv-insmod boot workload in scripts/local-agents.sh loads
-    # the kernel module, so the device nodes exist by this point.
-    GPU_FLAGS='["--device=/dev/nvidia0","--device=/dev/nvidiactl","--device=/dev/nvidia-uvm"]'
-    ;;
-  preview)
-    MODEL="qwen2.5:0.5b"
-    GPU_FLAGS='[]'
-    ;;
-  *) echo "unknown kind: $KIND" >&2; exit 2 ;;
-esac
-
-VM_NAME="dd-local-$KIND"
-AUTH=(-H "Authorization: Bearer $DD_PAT")
-
-echo "== ollama-deploy $VM_NAME (model=$MODEL, cp=$CP_URL) =="
-
-# ── 1. Discover the fresh agent registration on the CP ─────────────
-# last_seen > started_at_iso filters out stale entries from the VM
-# generation we just destroyed during `virsh destroy`.
-started_at_iso="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
-echo "  waiting for a fresh ${VM_NAME} registration (last_seen > ${started_at_iso})"
-agent_host=""
-for i in $(seq 1 60); do
-  agent_host=$(curl -fsS "${AUTH[@]}" "$CP_URL/api/agents" 2>/dev/null \
-    | jq -r --arg vm "$VM_NAME" --arg since "$started_at_iso" '
-        [.[] | select(.vm_name==$vm and .status=="healthy" and .last_seen > $since)]
-        | sort_by(.last_seen) | reverse | .[0].hostname // empty' 2>/dev/null || true)
-  if [ -n "$agent_host" ] && [ "$agent_host" != "null" ]; then
-    break
-  fi
-  sleep 10
-done
-if [ -z "$agent_host" ] || [ "$agent_host" = "null" ]; then
-  echo "ERROR: $VM_NAME never appeared in CP fleet" >&2
-  exit 1
-fi
-echo "  agent: https://$agent_host"
-
-# ── 2. Wait for Cloudflare DNS to propagate ────────────────────────
-echo "  waiting for DNS on $agent_host..."
-for i in $(seq 1 30); do
-  if getent hosts "$agent_host" >/dev/null 2>&1; then
-    echo "  DNS resolved"
-    break
-  fi
-  sleep 5
-done
-
-agent() { curl -fsS --max-time 300 "${AUTH[@]}" "https://$agent_host$1" "${@:2}"; }
-
-# ── 3. Fetch podman-static (fetch-only DD workload) ────────────────
-# Tarball unpacks to /var/lib/easyenclave/bin/podman-linux-amd64/
-# with usr/local/bin/{podman,crun,conmon,netavark,...}.
-# NOTE: omit `tag` — EE treats `tag: null` as "GET /releases/latest"
-# (the real newest release), while `tag: "latest"` is a literal tag
-# lookup and 404s on repos like mgoltzsche/podman-static that version
-# their tags as v5.7.1 rather than with a rolling "latest" ref.
-echo "  POST /deploy podman-static..."
-A_SPEC=$(jq -c -n '{
-  app_name: "podman-static",
-  github_release: {
-    repo: "mgoltzsche/podman-static",
-    asset: "podman-linux-amd64.tar.gz"
-  }
-}')
-agent /deploy -H 'Content-Type: application/json' -d "$A_SPEC" | jq -c '.' || true
-
-echo "  waiting for podman binary to appear..."
-podman_path="/var/lib/easyenclave/bin/podman-linux-amd64/usr/local/bin/podman"
-for i in $(seq 1 60); do
-  resp=$(agent /exec -H 'Content-Type: application/json' \
-    -d "$(jq -c -n --arg p "$podman_path" '{cmd:["/bin/busybox","sh","-c",("test -x " + $p + " && echo found")],timeout_secs:5}')" \
-    2>/dev/null || true)
-  if echo "$resp" | grep -q found; then
-    echo "  podman unpacked"
-    break
-  fi
-  sleep 5
-done
-
-# ── 4. Bootstrap: stage podman's helper binaries ───────────────────
-# mgoltzsche's tarball layout:
-#   usr/local/bin/                podman, crun, runc, fuse-overlayfs,
-#                                 fusermount3, pasta, pasta.avx2
-#   usr/local/lib/podman/         conmon, netavark, aardvark-dns,
-#                                 rootlessport, catatonit
-# EE's guest rootfs has BOTH /usr AND /etc mounted read-only. The
-# only writable paths are under /var/lib/easyenclave (on the
-# persistent vdc ext4 disk) and /run/tmp-style tmpfs locations. So
-# we cannot write a containers.conf anywhere podman looks for one,
-# and we cannot cp conmon into any of podman's hardcoded search
-# dirs. Every path has to be on the podman CLI directly.
-#
-# We DO stage the helpers into /var/lib/easyenclave/bin so the
-# container workload's `cmd[0]` can reach `podman`, and the
-# --conmon / --runtime / --root / --runroot flags on the `podman`
-# command (see step 5) point podman at the rest.
-echo "  bootstrapping podman (staging binaries to writable dirs)..."
-bootstrap_sh='set -e
-BIN=/var/lib/easyenclave/bin
-SRC=$BIN/podman-linux-amd64
-cp -f $SRC/usr/local/bin/* $BIN/
-cp -f $SRC/usr/local/lib/podman/conmon $BIN/
-cp -f $SRC/usr/local/lib/podman/netavark $BIN/ 2>/dev/null || true
-cp -f $SRC/usr/local/lib/podman/aardvark-dns $BIN/ 2>/dev/null || true
-cp -f $SRC/usr/local/lib/podman/rootlessport $BIN/ 2>/dev/null || true
-mkdir -p /var/lib/easyenclave/containers/storage /var/lib/easyenclave/containers/runroot
-echo podman-bootstrap: ok'
-boot_resp=$(agent /exec -H 'Content-Type: application/json' \
-  -d "$(jq -c -n --arg s "$bootstrap_sh" '{cmd:["/bin/busybox","sh","-c",$s],timeout_secs:30}')")
-if ! echo "$boot_resp" | jq -e '.exit_code == 0' >/dev/null 2>&1; then
-  echo "ERROR: podman bootstrap failed"
-  echo "$boot_resp" | jq .
-  exit 1
-fi
-echo "  bootstrap: $(echo "$boot_resp" | jq -r '.stdout // ""' | tail -1)"
-
-# ── 5. Launch the ollama container (long-running workload) ─────────
-# --net=host  : ollama listens on guest's 127.0.0.1:11434.
-# --name      : so we can `podman exec ollama ...` by name.
-# --cgroup-manager=cgroupfs: matches containers.conf, still required
-#               on the command line because podman doesn't always
-#               pick it up from the engine section when invoked
-#               outside systemd.
-# Volume      : /var/lib/easyenclave/ollama is the persistent vdc
-#               ext4 disk (mounted by the mount-models boot workload
-#               in local-agents.sh); doubles as ollama's model cache
-#               and openclaw's npm prefix.
-echo "  POST /deploy ollama container..."
-# Every writable path (--root, --runroot, --conmon, --runtime) is
-# on the CLI because EE's /etc and /usr are read-only — podman
-# can't fall back on /etc/containers/containers.conf the way it
-# normally does. Storage lives on the persistent vdc disk so the
-# 900 MB ollama image pull survives VM relaunches.
-# --cgroup-manager=cgroupfs because there's no systemd in the guest.
-# --network=host so ollama's :11434 binds on the VM's loopback,
-# reachable from other EE workloads (like openclaw) and via /exec.
-OLLAMA_SPEC=$(jq -c -n --argjson gpu "$GPU_FLAGS" '{
-  app_name: "ollama",
-  cmd: ([
-    "/var/lib/easyenclave/bin/podman",
-    "--conmon=/var/lib/easyenclave/bin/conmon",
-    "--runtime=/var/lib/easyenclave/bin/crun",
-    "--root=/var/lib/easyenclave/containers/storage",
-    "--runroot=/var/lib/easyenclave/containers/runroot",
-    "--cgroup-manager=cgroupfs",
-    "run",
-    "--rm", "--name", "ollama",
-    "--network=host"
-  ] + $gpu + [
-    "-v", "/var/lib/easyenclave/ollama:/root/.ollama",
-    "-e", "OLLAMA_HOST=127.0.0.1:11434",
-    "docker.io/ollama/ollama:latest",
-    "serve"
-  ])
-}')
-agent /deploy -H 'Content-Type: application/json' -d "$OLLAMA_SPEC" | jq -c '.' || true
-
-# ── 6. Wait for ollama HTTP to come up inside the container ────────
-# `podman exec ollama ollama list` exits 0 once the server is ready.
-# First run has to pull ~900 MB of container image, so allow plenty.
-echo "  waiting for ollama to be ready (first run pulls the image)..."
-ollama_ready=0
-for i in $(seq 1 120); do
-  resp=$(agent /exec -H 'Content-Type: application/json' \
-    -d '{"cmd":["/var/lib/easyenclave/bin/podman","--root=/var/lib/easyenclave/containers/storage","--runroot=/var/lib/easyenclave/containers/runroot","--cgroup-manager=cgroupfs","exec","ollama","ollama","list"],"timeout_secs":15}' \
-    2>/dev/null || true)
-  if echo "$resp" | jq -e '.exit_code == 0' >/dev/null 2>&1; then
-    echo "  ollama responding"
-    ollama_ready=1
-    break
-  fi
-  sleep 10
-done
-if [ "$ollama_ready" = "0" ]; then
-  echo "ERROR: ollama container never became ready (20 min timeout)"
-  echo "  most recent /exec response:"
-  echo "$resp" | jq .
-  echo "  last 30 lines of 'podman ps -a' + 'podman logs ollama':"
-  agent /exec -H 'Content-Type: application/json' \
-    -d '{"cmd":["/var/lib/easyenclave/bin/podman","--root=/var/lib/easyenclave/containers/storage","--runroot=/var/lib/easyenclave/containers/runroot","ps","-a"],"timeout_secs":10}' | jq -r '.stdout // .stderr // ""'
-  agent /exec -H 'Content-Type: application/json' \
-    -d '{"cmd":["/var/lib/easyenclave/bin/podman","--root=/var/lib/easyenclave/containers/storage","--runroot=/var/lib/easyenclave/containers/runroot","logs","ollama"],"timeout_secs":10}' 2>&1 | jq -r '.stdout // .stderr // ""' | tail -30
-  exit 1
-fi
-
-# ── 7. Pull the model ──────────────────────────────────────────────
-echo "  pulling $MODEL (this can take a few minutes)..."
-pull_resp=$(agent /exec -H 'Content-Type: application/json' \
-  -d "$(jq -c -n --arg m "$MODEL" '{
-    cmd:["/var/lib/easyenclave/bin/podman","--root=/var/lib/easyenclave/containers/storage","--runroot=/var/lib/easyenclave/containers/runroot","--cgroup-manager=cgroupfs","exec","ollama","ollama","pull",$m],
-    timeout_secs:1800
-  }')")
-if ! echo "$pull_resp" | jq -e '.exit_code == 0' >/dev/null 2>&1; then
-  echo "ERROR: ollama pull $MODEL failed"
-  echo "$pull_resp" | jq .
-  exit 1
-fi
-echo "  pull: $(echo "$pull_resp" | jq -r '.stdout // "(no stdout)"' | tail -3)"
-
-# ── 8. Launch OpenClaw ─────────────────────────────────────────────
-# `ollama launch openclaw` installs via npm on first run if missing
-# and then stays foreground, so we register it as a second long-
-# running workload. --yes accepts the install prompt non-interactively.
-echo "  POST /deploy openclaw..."
-OPENCLAW_SPEC=$(jq -c -n --arg m "$MODEL" '{
-  app_name: "openclaw",
-  cmd: [
-    "/var/lib/easyenclave/bin/podman",
-    "--root=/var/lib/easyenclave/containers/storage",
-    "--runroot=/var/lib/easyenclave/containers/runroot",
-    "--cgroup-manager=cgroupfs",
-    "exec", "ollama",
-    "ollama", "launch", "openclaw",
-    "--model", $m,
-    "--yes"
-  ]
-}')
-agent /deploy -H 'Content-Type: application/json' -d "$OPENCLAW_SPEC" | jq -c '.' || true
-
-# ── 9. Confirm openclaw is up ─ three probes, weakest → strongest ──
-# (a) EE lists `openclaw` in /health — proves the workload was
-#     accepted by the in-VM runtime. Flips green on fork, before
-#     npm-install finishes, so on its own it's weak.
-# (b) GET http://127.0.0.1:18789/healthz (the OpenClaw gateway HTTP
-#     endpoint). Docs: https://docs.openclaw.ai/gateway/health.
-#     200 with valid JSON = gateway has bound its port and is
-#     serving. The ollama container runs with --net=host so the
-#     loopback is the VM's loopback; we curl through `podman exec`
-#     so we hit the in-container curl (EE's busybox lacks one).
-# (c) `openclaw agent --message "ping"` — the documented one-shot
-#     CLI. Goes through the running gateway, hands the prompt to
-#     the loaded model, returns the assistant reply. Exit 0 AND
-#     non-empty stdout = the full ollama → openclaw → model path
-#     works end-to-end. The reply gets echoed into the workflow
-#     log as proof of life.
-echo "  confirming openclaw workload is registered with EE..."
-for i in $(seq 1 30); do
-  list=$(agent /health 2>/dev/null || true)
-  if echo "$list" | jq -e '.deployments // [] | index("openclaw")' >/dev/null 2>&1; then
-    echo "  openclaw: registered"
-    break
-  fi
-  sleep 5
-done
-
-echo "  waiting for openclaw gateway on http://127.0.0.1:18789/healthz..."
-openclaw_live=0
-for i in $(seq 1 60); do
-  resp=$(agent /exec -H 'Content-Type: application/json' \
-    -d '{"cmd":["/var/lib/easyenclave/bin/podman","--root=/var/lib/easyenclave/containers/storage","--runroot=/var/lib/easyenclave/containers/runroot","--cgroup-manager=cgroupfs","exec","ollama","curl","-fsS","http://127.0.0.1:18789/healthz"],"timeout_secs":10}' \
-    2>/dev/null || true)
-  if echo "$resp" | jq -e '.exit_code == 0' >/dev/null 2>&1; then
-    echo "  openclaw: /healthz 200"
-    echo "$resp" | jq -r '.stdout // ""' | head -c 200 | sed 's/^/    /'
-    echo
-    openclaw_live=1
-    break
-  fi
-  sleep 5
-done
-
-if [ "$openclaw_live" != "1" ]; then
-  echo "ERROR: openclaw /healthz never returned 200 (gateway didn't come up within 5 min)"
-  echo "  last /exec response:"
-  echo "$resp" | jq -c '.' | head -c 500
-  exit 1
-fi
-
-echo "  sending a round-trip prompt: 'ping'"
-chat=$(agent /exec -H 'Content-Type: application/json' \
-  -d '{"cmd":["/var/lib/easyenclave/bin/podman","--root=/var/lib/easyenclave/containers/storage","--runroot=/var/lib/easyenclave/containers/runroot","--cgroup-manager=cgroupfs","exec","ollama","openclaw","agent","--message","ping","--thinking","low"],"timeout_secs":120}' \
-  2>/dev/null || true)
-reply=$(echo "$chat" | jq -r '.stdout // ""')
-if [ -z "$reply" ] || ! echo "$chat" | jq -e '.exit_code == 0' >/dev/null 2>&1; then
-  echo "ERROR: openclaw agent --message didn't return a reply"
-  echo "  raw: $(echo "$chat" | jq -c '.' | head -c 500)"
-  exit 1
-fi
-echo
-echo "=== openclaw replied ==="
-echo "$reply"
-echo "========================"
-
-echo
-echo "=== agent fleet summary ==="
-echo "  agent:    https://$agent_host"
-echo "  model:    $MODEL"
-echo "  ollama:   podman container 'ollama' on host net, :11434"
-echo "  openclaw: http://127.0.0.1:18789 (gateway), replied to round-trip ping"
-echo "==========================="

From 2f7194142238d428451f110f4ed4faf09a58cf67 Mon Sep 17 00:00:00 2001
From: Alex Newman <posix4e@gmail.com>
Date: Sat, 18 Apr 2026 20:24:49 +0000
Subject: [PATCH 2/2] feat(apps): wire ollama+openclaw into
 dd-local-{preview,prod} + apps/README.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Agent VMs boot the full container stack now — every PR push and every
main merge exercises podman + ollama + openclaw end-to-end. Preview
runs CPU inference with qwen2.5:0.5b; prod runs GPU inference with
qwen2.5:7b on the H100.

Changes:
  - apps/_infra/local-agents.sh: replace the inline `jq -c -n` workload
    literals with a `bake()` helper that reads from
    apps/<name>/workload.{json,json.tmpl}. The workload set grows from
    {nv, mount-models, cloudflared, dd-agent} to
    {nv (prod only), mount-models, podman-static, podman-bootstrap,
     ollama (workload.{prod,preview}.json), openclaw, cloudflared,
     dd-agent}.

  - apps/podman-bootstrap: install the wrapper as `podman` (not
    `dd-podman`) so bare `podman ps` from PATH reaches the right
    storage root instead of erroring with `mkdir /var/lib/containers:
    read-only file system`. Raw binary moves to .podman-raw; dd-podman
    becomes a symlink for back-compat.

  - deploy-cp.yml + local-agents.sh bake() now pass envsubst a
    restricted var list — only the uppercase `${VAR}` references the
    template actually declares. Lowercase shell locals ($i, $((…)))
    inside openclaw's `until` loop are no longer eaten.

  - apps/README.md: canonical reference for the workload spec, lifecycle
    matrix (CP / preview agent / prod agent), and a "deploying your own"
    walkthrough. Main README.md points at it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
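Reviewer note, illustrative only and not part of the change: the
restricted envsubst list in bake() comes from a grep | sort | tr
pipeline. The sketch below runs that pipeline on a throwaway template to
show which names would reach envsubst. `list_template_vars`, `demo_tmpl`
and the template contents are hypothetical, and the `|| true` guard is an
added assumption for callers running under `set -e -o pipefail`; bake()
itself does not have it.

```bash
#!/usr/bin/env bash
# Sketch only: mirrors the variable-list pipeline used by bake().
# list_template_vars is a hypothetical name, not a helper from this patch.
list_template_vars() {
  # grep exits 1 when nothing matches; "|| true" (an assumption added
  # here) keeps that from failing a pipefail-enabled caller.
  { grep -oE '\$\{[A-Z_][A-Z0-9_]*\}' "$1" || true; } \
    | LC_ALL=C sort -u | tr -d '\n'
}

# Hypothetical template: one uppercase placeholder next to shell locals.
demo_tmpl=$(mktemp)
cat > "$demo_tmpl" <<'EOF'
{"app_name":"demo","cmd":["/bin/busybox","sh","-c","i=0; until [ $i -ge 3 ]; do i=$((i+1)); done; echo ${MODEL} ${MODEL}"]}
EOF

# Only the brace-delimited uppercase reference is collected, once.
demo_vars=$(list_template_vars "$demo_tmpl")
printf '%s\n' "$demo_vars"
rm -f "$demo_tmpl"
```

The printed list is what bake() would hand to envsubst as its
SHELL-FORMAT argument, so `$i` and `$((i+1))` never become candidates
for substitution.
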
 .github/workflows/deploy-cp.yml     |  13 +++-
 README.md                           |  17 +----
 apps/README.md                      | 108 +++++++++++++++++++++++++++
 apps/_infra/local-agents.sh         | 111 +++++++++++++++++-----------
 apps/podman-bootstrap/workload.json |   2 +-
 5 files changed, 188 insertions(+), 63 deletions(-)
 create mode 100644 apps/README.md
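
Reviewer note, illustrative only: the Ordering section of apps/README.md
has dependents self-sequence with bare `until ...; do sleep N; done`
loops. A bounded variant might look like the sketch below, assuming a
shell with `local` (bash, or busybox ash). `wait_for` and its arguments
are hypothetical; the workload specs in this series inline unbounded
loops and do not use such a helper.

```bash
#!/usr/bin/env bash
# Sketch only: poll a command until it succeeds or a retry budget runs out.
# Usage: wait_for <max_tries> <interval_secs> <command> [args...]
wait_for() {
  local tries=$1 interval=$2 n=0
  shift 2
  until "$@"; do
    n=$((n + 1))
    if [ "$n" -ge "$tries" ]; then
      echo "wait_for: gave up after $n tries: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Hypothetical usage, mirroring the podman wrapper prerequisite:
#   wait_for 60 2 test -x /var/lib/easyenclave/bin/podman
```

Whether a hard failure is preferable to waiting indefinitely depends on
how EE treats a workload that exits non-zero, which this series does not
specify.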

diff --git a/.github/workflows/deploy-cp.yml b/.github/workflows/deploy-cp.yml
index d21265c..772d750 100644
--- a/.github/workflows/deploy-cp.yml
+++ b/.github/workflows/deploy-cp.yml
@@ -116,13 +116,18 @@ jobs:
           : "${DD_ITA_API_KEY:?set DD_ITA_API_KEY via secrets.DD_ITA_API_KEY}"
           export DD_GITHUB_CALLBACK_URL="${DD_GITHUB_CALLBACK_URL:-https://${DD_HOSTNAME}/auth/github/callback}"
 
-          # Bake a workload template: envsubst ${VAR} placeholders and
-          # strip any "KEY=" env entries that ended up with empty values
-          # (e.g. OAuth creds in non-prod envs).
+          # Bake a workload template: substitute ${VAR} placeholders
+          # and strip "KEY=" env entries that ended up with empty values
+          # (e.g. OAuth creds in non-prod envs). envsubst is restricted
+          # to the uppercase ${VAR} refs the template actually declares
+          # so shell locals inside cmd strings ($i, $((…)), etc.)
+          # aren't eaten.
           bake() {
             case "$1" in
               *.json.tmpl)
-                envsubst < "$1" \
+                local vars
+                vars=$(grep -oE '\$\{[A-Z_][A-Z0-9_]*\}' "$1" | sort -u | tr -d '\n')
+                envsubst "$vars" < "$1" \
                   | jq -c 'if .env then .env |= map(select(test("^[^=]+=.+"))) else . end'
                 ;;
               *.json)
diff --git a/README.md b/README.md
index f8a1df1..575ec1f 100644
--- a/README.md
+++ b/README.md
@@ -22,22 +22,7 @@ The sealed enclave runtime is [EasyEnclave](https://github.com/easyenclave/easye
 
 Every fleet VM boots from a sealed easyenclave image published by [easyenclave/easyenclave](https://github.com/easyenclave/easyenclave/releases). No cloud-init, no stock Ubuntu, no runtime `apt-get install`. The TDX VM's rootfs is the latest image in the `easyenclave-staging` (or `-stable`) family, attestable against a single UKI SHA256.
 
-The `devopsdefender` binary ships as a **GitHub release asset** — not an OCI image. Easyenclave fetches it directly via its `github_release` boot workload source:
-
-```json
-{
-  "github_release": {
-    "repo": "devopsdefender/dd",
-    "asset": "devopsdefender",
-    "tag": "latest"
-  },
-  "cmd": ["devopsdefender"],
-  "app_name": "dd-management",
-  "env": ["DD_MODE=management", ...]
-}
-```
-
-`cloudflared` is also pulled directly from `cloudflare/cloudflared`'s GitHub releases as a fetch-only boot workload — no bundling in our image, no Dockerfile step.
+Every workload is a JSON spec consumed by easyenclave's `DeployRequest`. Boot-time and runtime-deployed workloads share one schema; both the `devopsdefender` binary and `cloudflared` ship as **GitHub release assets** — not OCI images — and easyenclave fetches them via its `github_release` source. The full set of specs and a guide to writing your own lives in [`apps/README.md`](apps/README.md).
 
 Per-VM configuration (CF credentials, GitHub OAuth, the workload spec itself) is passed to easyenclave at boot via **GCE instance metadata** (`ee-config` attribute), read by `easyenclave::init::fetch_gce_metadata_config()` and applied as env vars. The CP-deploy step in `.github/workflows/deploy-cp.yml` builds the spec and invokes `gcloud compute instances create --image-family=easyenclave-staging --metadata-from-file=ee-config=...`.
 
diff --git a/apps/README.md b/apps/README.md
new file mode 100644
index 0000000..12d98d0
--- /dev/null
+++ b/apps/README.md
@@ -0,0 +1,108 @@
+# apps/ — workload specs
+
+This directory is DD's canonical reference for **how to deploy a workload**. Every directory here except `_infra/` is one workload: a process easyenclave runs inside a TDX-sealed VM. The specs are both the live deployment configuration and the worked example for operators writing their own.
+
+## Layout
+
+```
+apps/
+  <name>/
+    workload.json          # literal spec
+    workload.json.tmpl     # spec with ${VAR} placeholders (baked at deploy time)
+  _infra/                  # host-side scripts; not a deployable workload
+```
+
+## What a workload looks like
+
+A **workload** is a JSON object consumed by easyenclave's `DeployRequest` (see `src/easyenclave/src/workload.rs`). Minimum shape:
+
+```json
+{
+  "app_name": "myapp",
+  "cmd": ["/bin/busybox", "sh", "-c", "echo hello; sleep inf"]
+}
+```
+
+Add `github_release` to fetch a binary asset directly from a GitHub release, with no OCI registry and no Dockerfile. The asset lands in `/var/lib/easyenclave/bin/`, where `cmd` can spawn it. The example below omits `cmd`, so it is a fetch-only workload:
+
+```json
+{
+  "app_name": "cloudflared",
+  "github_release": {
+    "repo": "cloudflare/cloudflared",
+    "asset": "cloudflared-linux-amd64",
+    "rename": "cloudflared"
+  }
+}
+```
+
+Add `env` to inject config:
+
+```json
+{
+  "env": ["MY_ENDPOINT=https://api.example.com", "DEBUG=1"]
+}
+```
+
+## Templates
+
+Files ending in `.json.tmpl` carry `${VAR}` placeholders. At bake time:
+
+1. `envsubst` substitutes every uppercase `${VAR}` that appears in the template using the caller's environment.
+2. `jq` drops env-array entries whose value ended up empty (so you can make OAuth creds / optional secrets conditional by just leaving them unset).
+3. The result is a plain `workload.json` ready for EE.
+
+Only variables the template references as uppercase `${VAR}` get substituted, so shell locals like `$i` or `$((n+1))` inside `cmd` strings are left alone. The bake helper is duplicated inline in two places so both lifecycle points behave identically:
+
+- `.github/workflows/deploy-cp.yml` (CI, for CP workloads)
+- `apps/_infra/local-agents.sh` (tdx2 host, for agent VMs)
+
+## Where each workload runs
+
+| workload | CP VM | agent VM (preview) | agent VM (prod) |
+|---|---|---|---|
+| `cloudflared` | ✅ | ✅ | ✅ |
+| `dd-management` | ✅ | | |
+| `dd-agent` | | ✅ | ✅ |
+| `mount-models` | | ✅ | ✅ |
+| `nv` | | | ✅ (GPU insmod) |
+| `podman-static` | | ✅ | ✅ |
+| `podman-bootstrap` | | ✅ | ✅ |
+| `ollama` | | ✅ (CPU, preview.json) | ✅ (GPU, prod.json) |
+| `openclaw` | | ✅ (qwen2.5:0.5b) | ✅ (qwen2.5:7b) |
+
+CP stays slim: just `cloudflared` + `dd-management`. Containerised LLM serving lives on agent VMs where the `vdc` ext4 disk holds models + image storage.
+
+## Ordering
+
+EasyEnclave spawns boot workloads concurrently — there's no declared dependency graph. Dependents self-sequence by polling for their prerequisites. Worked example from this tree:
+
+- `podman-bootstrap` waits for `podman-static`'s tarball (`until [ -x $SRC/usr/local/bin/podman ]; do sleep 1; done`).
+- `ollama`'s cmd waits for the wrapper (`until [ -x /var/lib/easyenclave/bin/podman ]; do sleep 2; done`).
+- `openclaw`'s cmd waits for ollama's HTTP endpoint (`until wget -q -O- http://127.0.0.1:11434/api/tags; do sleep 5; done`) before pulling the model and launching the gateway.
+
+The cost is a few seconds of polling at boot. In exchange, the sequencing is easy to reason about and needs no workload-runner changes.
+
+## Deploying your own
+
+1. Copy an existing folder as a starting point:
+   ```
+   cp -r apps/cloudflared apps/myapp
+   $EDITOR apps/myapp/workload.json
+   ```
+2. Decide where it runs:
+   - **CP VM**: add a `bake apps/myapp/workload.json` line to the workload-building `run:` step in `.github/workflows/deploy-cp.yml`.
+   - **Agent VM**: add the same call to `apps/_infra/local-agents.sh` in `build_config_iso()`.
+   - **Ad-hoc, runtime-only**: POST the baked JSON to `/deploy` on a running agent:
+     ```
+     curl -H "Authorization: Bearer $DD_PAT" \
+          -H "Content-Type: application/json" \
+          -d @apps/myapp/workload.json \
+          https://<agent-host>/deploy
+     ```
+
+## Reference
+
+- Schema source of truth: [`src/easyenclave/src/workload.rs`](../src/easyenclave/src/workload.rs) — the `DeployRequest` struct EE deserializes on `/deploy`.
+- CP deploy caller: [`.github/workflows/deploy-cp.yml`](../.github/workflows/deploy-cp.yml) — inline `bake()` + CP workload set.
+- Agent VM builder: [`apps/_infra/local-agents.sh`](_infra/local-agents.sh) — inline `bake()` + agent workload set per kind.
diff --git a/apps/_infra/local-agents.sh b/apps/_infra/local-agents.sh
index 20b772a..9d879e0 100755
--- a/apps/_infra/local-agents.sh
+++ b/apps/_infra/local-agents.sh
@@ -31,10 +31,42 @@ fi
 : "${DD_PAT?set DD_PAT (e.g. DD_PAT=\$(gh auth token))}"
 : "${DD_ITA_API_KEY?set DD_ITA_API_KEY}"
 
+# Resolve repo root regardless of invoking CWD — the workload specs
+# under apps/<name>/ need absolute paths so bake() can find them.
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
+
 IMG_DIR=/var/lib/libvirt/images
 BASE="$IMG_DIR/easyenclave-local.qcow2"
 BASE_DOMAIN="easyenclave-local"
 
+# Render one workload spec. Matches the helper inlined in
+# .github/workflows/deploy-cp.yml — same envsubst + empty-entry strip,
+# so boot-time (config.iso) and runtime (/deploy) see identical JSON.
+#
+# envsubst is restricted to the ALL-CAPS `${VAR}` references that
+# appear in the template itself. Lowercase `$i`, `${i}`, and bare
+# `$((…))` arithmetic inside shell cmd strings are left alone —
+# otherwise envsubst would eat shell locals in openclaw's `until`
+# loop and produce broken scripts.
+bake() {
+  case "$1" in
+    *.json.tmpl)
+      local vars
+      vars=$(grep -oE '\$\{[A-Z_][A-Z0-9_]*\}' "$1" | sort -u | tr -d '\n')
+      envsubst "$vars" < "$1" \
+        | jq -c 'if .env then .env |= map(select(test("^[^=]+=.+"))) else . end'
+      ;;
+    *.json)
+      jq -c . "$1"
+      ;;
+    *)
+      echo "local-agents.sh: unknown workload file type: $1" >&2
+      return 1
+      ;;
+  esac
+}
+
 [ -r "$BASE" ] || { echo "missing $BASE" >&2; exit 1; }
 virsh dominfo "$BASE_DOMAIN" >/dev/null 2>&1 || {
   echo "base libvirt domain '$BASE_DOMAIN' not defined — rebuild the EE image first" >&2
@@ -58,51 +90,46 @@ build_config_iso() {
   tmp=$(mktemp -d)
   trap "rm -rf $tmp" RETURN
 
-  # EE reads `agent.env` from the config disk (dotenv: KEY=VALUE per
-  # line). EE_BOOT_WORKLOADS is a JSON-encoded array of workload
-  # specs. The first entry on the GPU VM insmods the nvidia driver
-  # so it's ready by the time the dd-agent comes up.
-  local nv_workload="null"
+  # Boot workload chain (EE spawns concurrently; each uses `until`
+  # loops to self-sequence):
+  #   nv             — insmod nvidia driver (prod only, first so the
+  #                    device nodes exist by the time ollama runs)
+  #   mount-models   — mount /dev/vdc at /var/lib/easyenclave/ollama
+  #   podman-static  — fetch the podman binary tarball into /var/lib/easyenclave/bin
+  #   podman-bootstrap — stage binaries, write containers.conf + policy.json,
+  #                    install /var/lib/easyenclave/bin/podman as the wrapper
+  #                    (symlinked from dd-podman for back-compat)
+  #   ollama         — run docker.io/ollama/ollama:latest serve via the wrapper
+  #   openclaw       — wait for ollama, pull $MODEL, launch openclaw gateway
+  #   cloudflared    — fetch cloudflared binary (dd-register spawns it)
+  #   dd-agent       — run devopsdefender agent, register with CP, serve workloads
+  #
+  # Prod gets the GPU model; preview gets the tiny CPU-friendly one.
+  local model ollama_spec
   if [ "$with_gpu" = "yes" ]; then
-    nv_workload=$(jq -c -n '{
-      app_name:"nv",
-      cmd:["/bin/busybox","sh","-c",
-           "/sbin/insmod /lib/modules/7.0.0-14-generic/kernel/nvidia-580srv-open/nvidia.ko NVreg_OpenRmEnableUnsupportedGpus=1 2>&1 && echo nv: loaded || echo nv: failed; sleep inf"]
-    }')
+    model="qwen2.5:7b"
+    ollama_spec="$REPO_ROOT/apps/ollama/workload.prod.json"
+  else
+    model="qwen2.5:0.5b"
+    ollama_spec="$REPO_ROOT/apps/ollama/workload.preview.json"
   fi
 
-  # Mount the persistent models disk (vdc) at /var/lib/easyenclave/ollama
-  # before ollama might try to use it. Pre-formatted ext4 on the host.
-  local mount_workload
-  mount_workload=$(jq -c -n '{
-    app_name:"mount-models",
-    cmd:["/bin/busybox","sh","-c",
-         "mkdir -p /var/lib/easyenclave/ollama && mount /dev/vdc /var/lib/easyenclave/ollama && echo mount-models: ok; sleep inf"]
-  }')
-
   local workloads
-  workloads=$(jq -c -n \
-    --argjson nv "$nv_workload" \
-    --argjson mount "$mount_workload" \
-    --arg cp "$cp" --arg pat "$DD_PAT" --arg ita "$DD_ITA_API_KEY" \
-    --arg env "$env" --arg vm "dd-local-$name" '[
-      $nv,
-      $mount,
-      {"app_name":"cloudflared",
-       "github_release":{"repo":"cloudflare/cloudflared","asset":"cloudflared-linux-amd64","rename":"cloudflared"}},
-      {"app_name":"dd-agent",
-       "github_release":{"repo":"devopsdefender/dd","asset":"devopsdefender","tag":"latest"},
-       "cmd":["devopsdefender","agent"],
-       "env":[
-         "DD_MODE=agent",
-         ("DD_CP_URL=" + $cp), ("DD_PAT=" + $pat), ("DD_ITA_API_KEY=" + $ita),
-         "DD_ITA_BASE_URL=https://api.trustauthority.intel.com",
-         "DD_ITA_JWKS_URL=https://portal.trustauthority.intel.com/certs",
-         "DD_ITA_ISSUER=https://portal.trustauthority.intel.com",
-         "DD_OWNER=devopsdefender", ("DD_ENV=" + $env), ("DD_VM_NAME=" + $vm),
-         "DD_PORT=8080"
-       ]}
-    ] | map(select(. != null))')
+  workloads=$({
+    [ "$with_gpu" = "yes" ] && bake "$REPO_ROOT/apps/nv/workload.json"
+    bake "$REPO_ROOT/apps/mount-models/workload.json"
+    bake "$REPO_ROOT/apps/podman-static/workload.json"
+    bake "$REPO_ROOT/apps/podman-bootstrap/workload.json"
+    bake "$ollama_spec"
+    MODEL="$model" bake "$REPO_ROOT/apps/openclaw/workload.json.tmpl"
+    bake "$REPO_ROOT/apps/cloudflared/workload.json"
+    DD_CP_URL="$cp" \
+      DD_PAT="$DD_PAT" \
+      DD_ITA_API_KEY="$DD_ITA_API_KEY" \
+      DD_ENV="$env" \
+      DD_VM_NAME="dd-local-$name" \
+      bake "$REPO_ROOT/apps/dd-agent/workload.json.tmpl"
+  } | jq -cs '.')
 
   {
     echo "EE_OWNER=devopsdefender"
@@ -112,7 +139,7 @@ build_config_iso() {
   # ext4 — EE rootfs has no iso9660 module.
   truncate -s 4M "$out"
   mkfs.ext4 -q -d "$tmp" "$out"
-  echo "  wrote $out (env=$env, gpu=$with_gpu)"
+  echo "  wrote $out (env=$env, gpu=$with_gpu, model=$model)"
 }
 
 build_overlay() {
diff --git a/apps/podman-bootstrap/workload.json b/apps/podman-bootstrap/workload.json
index 5a797e4..690421b 100644
--- a/apps/podman-bootstrap/workload.json
+++ b/apps/podman-bootstrap/workload.json
@@ -2,6 +2,6 @@
   "app_name": "podman-bootstrap",
   "cmd": [
     "/bin/busybox", "sh", "-c",
-    "set -e\nBIN=/var/lib/easyenclave/bin\nSRC=$BIN/podman-linux-amd64\nuntil [ -x $SRC/usr/local/bin/podman ]; do sleep 1; done\n# If there's a vdc scratch disk, wait for mount-models to actually\n# mount it before we write files under /var/lib/easyenclave/ollama —\n# otherwise our writes land on the rootfs tmpfs and get shadowed the\n# moment vdc is mounted. On VMs without vdc (GCP CP preview) there's\n# no mount-models workload and this check short-circuits.\nif [ -b /dev/vdc ]; then\n  until mountpoint -q /var/lib/easyenclave/ollama 2>/dev/null; do sleep 1; done\nfi\nmkdir -p /var/lib/easyenclave/ollama\ncp -f $SRC/usr/local/bin/* $BIN/\ncp -f $SRC/usr/local/lib/podman/conmon $BIN/\ncp -f $SRC/usr/local/lib/podman/netavark $BIN/ 2>/dev/null || true\ncp -f $SRC/usr/local/lib/podman/aardvark-dns $BIN/ 2>/dev/null || true\ncp -f $SRC/usr/local/lib/podman/rootlessport $BIN/ 2>/dev/null || true\nmkdir -p /var/lib/easyenclave/ollama/.podman/storage /var/lib/easyenclave/ollama/.podman/runroot\n# /dev/shm is where podman puts its per-container POSIX shm lock\n# file (libpod_lock). EE's guest rootfs may not mount tmpfs on\n# /dev/shm; without it, podman fails 'failed to create 2048 locks\n# in /libpod_lock: no such file or directory'. mkdir + mount idempotently.\nif ! mountpoint -q /dev/shm 2>/dev/null; then\n  mkdir -p /dev/shm\n  mount -t tmpfs -o size=64M tmpfs /dev/shm 2>/dev/null || true\nfi\n# Pick storage driver based on substrate. vdc-backed ext4 supports\n# native overlay (fast + space-efficient). Without vdc (GCP CP\n# preview, any guest running on tmpfs rootfs), overlay-on-tmpfs\n# errors out, so fall back to vfs (slower, full copy per layer, but\n# works on any filesystem).\nif mountpoint -q /var/lib/easyenclave/ollama; then\n  DRIVER=overlay\nelse\n  DRIVER=vfs\nfi\n# Write containers.conf on vdc (writable). /etc is RO on EE so we\n# can't put it where podman looks by default. 
helper_binaries_dir\n# tells podman where we staged conmon/netavark/aardvark-dns/… —\n# podman probes those at startup even with --network=host.\nPOL=/var/lib/easyenclave/ollama/.podman/policy.json\n# Minimum viable signature policy: trust anything. EE's attestation\n# story happens one layer up (image digest pinned by the spec we\n# baked); podman's own signature checking would duplicate that.\nprintf '%s' '{\"default\":[{\"type\":\"insecureAcceptAnything\"}]}' > $POL\n# Podman's containers-common looks for policy.json at hardcoded\n# paths (/etc/containers/, $HOME/.config/containers/). /etc and\n# /root are both RO on EE, so build a fake HOME under\n# /var/lib/easyenclave/.home (writable) and set HOME there in the\n# dd-podman wrapper.\nHOME_DIR=/var/lib/easyenclave/.home\nmkdir -p $HOME_DIR/.config/containers\ncp -f $POL $HOME_DIR/.config/containers/policy.json\nCONF=/var/lib/easyenclave/ollama/.podman/containers.conf\nprintf '%s\\n' '[engine]' 'helper_binaries_dir = [\"/var/lib/easyenclave/bin\"]' > $CONF\nmkdir -p $HOME_DIR/tmp\nprintf '%s\\n' '#!/bin/sh' \"export HOME=$HOME_DIR\" \"export TMPDIR=$HOME_DIR/tmp\" \"export CONTAINERS_CONF=$CONF\" \"exec /var/lib/easyenclave/bin/podman --conmon=/var/lib/easyenclave/bin/conmon --runtime=/var/lib/easyenclave/bin/crun --storage-driver=$DRIVER --root=/var/lib/easyenclave/ollama/.podman/storage --runroot=/var/lib/easyenclave/ollama/.podman/runroot --cgroup-manager=cgroupfs \\\"\\$@\\\"\" > $BIN/dd-podman\nchmod +x $BIN/dd-podman\nls -la $CONF $POL $BIN/dd-podman 2>&1 || true\ncat $CONF\necho podman-bootstrap: v2 ok driver=$DRIVER conf=$CONF policy=$POL"
+    "set -e\nBIN=/var/lib/easyenclave/bin\nSRC=$BIN/podman-linux-amd64\nuntil [ -x $SRC/usr/local/bin/podman ]; do sleep 1; done\n# Wait for mount-models to mount /dev/vdc before writing under\n# /var/lib/easyenclave/ollama — otherwise writes land on tmpfs\n# and get shadowed the moment vdc is mounted. On VMs without vdc\n# (e.g. CP previews with no models disk) this check short-circuits.\nif [ -b /dev/vdc ]; then\n  until mountpoint -q /var/lib/easyenclave/ollama 2>/dev/null; do sleep 1; done\nfi\nmkdir -p /var/lib/easyenclave/ollama\n# Stage helpers first (conmon, netavark, crun, etc.).\nfor f in $SRC/usr/local/bin/*; do\n  name=$(basename $f)\n  case $name in\n    podman) cp -f $f $BIN/.podman-raw ;;\n    *)      cp -f $f $BIN/ ;;\n  esac\ndone\ncp -f $SRC/usr/local/lib/podman/conmon $BIN/\ncp -f $SRC/usr/local/lib/podman/netavark $BIN/ 2>/dev/null || true\ncp -f $SRC/usr/local/lib/podman/aardvark-dns $BIN/ 2>/dev/null || true\ncp -f $SRC/usr/local/lib/podman/rootlessport $BIN/ 2>/dev/null || true\nmkdir -p /var/lib/easyenclave/ollama/.podman/storage /var/lib/easyenclave/ollama/.podman/runroot\n# /dev/shm holds podman's per-container POSIX shm lock file\n# (libpod_lock). EE may not mount tmpfs there; without it, podman\n# fails `failed to create 2048 locks in /libpod_lock`. Idempotent.\nif ! mountpoint -q /dev/shm 2>/dev/null; then\n  mkdir -p /dev/shm\n  mount -t tmpfs -o size=64M tmpfs /dev/shm 2>/dev/null || true\nfi\n# Pick storage driver: overlay on vdc-backed ext4; vfs elsewhere\n# (overlay-on-tmpfs errors out).\nif mountpoint -q /var/lib/easyenclave/ollama; then\n  DRIVER=overlay\nelse\n  DRIVER=vfs\nfi\nPOL=/var/lib/easyenclave/ollama/.podman/policy.json\nprintf '%s' '{\"default\":[{\"type\":\"insecureAcceptAnything\"}]}' > $POL\n# /etc and /root are RO on EE. Build a writable fake HOME for\n# policy.json + podman's default lookups.\nHOME_DIR=/var/lib/easyenclave/.home\nmkdir -p $HOME_DIR/.config/containers $HOME_DIR/tmp\ncp -f $POL $HOME_DIR/.config/containers/policy.json\nCONF=/var/lib/easyenclave/ollama/.podman/containers.conf\nprintf '%s\\n' '[engine]' 'helper_binaries_dir = [\"/var/lib/easyenclave/bin\"]' > $CONF\n# Wrapper installed as $BIN/podman so bare `podman ps` (from PATH)\n# reaches the right storage root + driver. Raw binary lives at\n# $BIN/.podman-raw. $BIN/dd-podman stays as a back-compat symlink\n# since openclaw's workload calls dd-podman by name.\nprintf '%s\\n' '#!/bin/sh' \"export HOME=$HOME_DIR\" \"export TMPDIR=$HOME_DIR/tmp\" \"export CONTAINERS_CONF=$CONF\" \"exec $BIN/.podman-raw --conmon=$BIN/conmon --runtime=$BIN/crun --storage-driver=$DRIVER --root=/var/lib/easyenclave/ollama/.podman/storage --runroot=/var/lib/easyenclave/ollama/.podman/runroot --cgroup-manager=cgroupfs \\\"\\$@\\\"\" > $BIN/podman\nchmod +x $BIN/podman\nln -sf podman $BIN/dd-podman\nls -la $CONF $POL $BIN/podman $BIN/dd-podman $BIN/.podman-raw 2>&1 || true\necho podman-bootstrap: ok driver=$DRIVER conf=$CONF policy=$POL"
   ]
 }