Closed
42 changes: 37 additions & 5 deletions .github/actions/relaunch-agent/action.yml
@@ -1,9 +1,9 @@
name: Relaunch local TDX agent
description: >-
-SSH into the tdx2 host and recreate the matching dd-local-{kind} libvirt
-domain against the given CP url, pulling scripts from the given git ref.
-Shared between Local Agents (push/PR/dispatch) and Deploy CP (cascading
-relaunch after a successful CP deploy).
+SSH into the tdx2 host, recreate the matching dd-local-{kind} libvirt
+domain against the given CP url (pulling apps/ from the given git ref),
+then block until the agent re-registers with the CP. A release is "done"
+only when this action succeeds end-to-end.

inputs:
kind:
@@ -68,4 +68,36 @@ runs:
ssh-keyscan -H "$HOST" >> ~/.ssh/known_hosts 2>/dev/null
ssh -o BatchMode=yes -o StrictHostKeyChecking=yes \
-i ~/.ssh/id_ed25519 "tdx2@$HOST" \
-"DD_PAT='$DD_PAT' DD_ITA_API_KEY='$DD_ITA_API_KEY' /home/tdx2/src/dd/scripts/dd-relaunch.sh '$KIND' '$URL' '$REF'"
+"DD_PAT='$DD_PAT' DD_ITA_API_KEY='$DD_ITA_API_KEY' /home/tdx2/src/dd/apps/_infra/dd-relaunch.sh '$KIND' '$URL' '$REF'"
+
+# Block until the freshly-booted agent VM registers with the CP.
+# This is the "I can see the local agent deployment worked" signal
+# that gates the whole release. 5-min budget covers a cold VM boot
+# (~60s) + cloudflared tunnel (~30s) + agent startup + register —
+# plenty of headroom. Doesn't probe openclaw/ollama readiness —
+# that first-boot pays a 30-min npm-install tax and isn't part
+# of the release gate.
+- name: Verify agent registered with CP
+shell: bash
+env:
+URL: ${{ inputs.url }}
+DD_PAT: ${{ inputs.dd-pat }}
+KIND: ${{ inputs.kind }}
+run: |
+vm="dd-local-$KIND"
+started_at=$(date -u +%Y-%m-%dT%H:%M:%SZ)
+AUTH=(-H "Authorization: Bearer $DD_PAT")
+for i in $(seq 1 30); do
+host=$(curl -fsS --max-time 10 "${AUTH[@]}" "$URL/api/agents" 2>/dev/null \
+| jq -r --arg since "$started_at" --arg vm "$vm" '
+[.[] | select(.vm_name==$vm and .status=="healthy" and .last_seen > $since)]
+| sort_by(.last_seen) | reverse | .[0].hostname // empty' 2>/dev/null || true)
+if [ -n "$host" ] && [ "$host" != "null" ]; then
+echo "$vm registered at https://$host"
+exit 0
+fi
+echo " waiting for $vm to register with $URL... (${i}/30)"
+sleep 10
+done
+echo "::error::$vm never registered with $URL within 5 min"
+exit 1
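The verify step's retry loop can be read as a generic "poll until a probe prints a value" helper. The sketch below is a hypothetical refactor, not code from this PR: `poll_until` and its argument order are illustrative names, and the probe would be a function wrapping the existing `curl ... | jq ...` pipeline. It keeps the loop's success rule (non-empty and not the literal `null`). One thing it makes visible: each attempt can spend up to 10 s in `curl --max-time 10` on top of the 10 s sleep, so 30 attempts may take up to about 10 minutes of wall-clock time, not the 5 minutes the error message reports.

```shell
# Hypothetical helper (not part of this PR): run a probe command until it
# prints something other than "" or "null", or the attempts run out.
# Usage: poll_until <attempts> <interval-seconds> <probe command...>
poll_until() {
  local attempts=$1 interval=$2 i=1 out
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # A failing probe counts as "not ready yet", like the workflow's `|| true`.
    out=$("$@" 2>/dev/null || true)
    if [ -n "$out" ] && [ "$out" != null ]; then
      printf '%s\n' "$out"
      return 0
    fi
    echo "  waiting... ($i/$attempts)" >&2
    # Unlike the workflow loop, skip the sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  return 1
}
```

In the step itself this might look like `host=$(poll_until 30 10 probe_agents)`, where `probe_agents` is a hypothetical function holding the curl/jq pipeline. Dropping the final sleep also shaves 10 s off the failure path.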
97 changes: 76 additions & 21 deletions .github/workflows/deploy-cp.yml
@@ -1,18 +1,17 @@
name: Deploy CP

# Reusable workflow: provision the CP TDX VM on GCP, wait for it to be
-# healthy, verify attestation + dashboard + STONITH, and cascade a
-# relaunch of the matching dd-local agent VM. Called from release.yml
-# (preview path) and production-deploy.yml (prod path) with different
-# inputs — both paths share this exact set of verification steps, so
-# preview CI exercises the same code that prod runs.
+# healthy, verify attestation + dashboard + STONITH, then cascade a
+# relaunch of the matching dd-local agent VM and block until it
+# re-registers. Called from release.yml's deploy-preview (PR path) and
+# deploy-production (main / dispatch path) with env-specific inputs —
+# both paths share this exact set of verification steps so every PR
+# exercises the prod deploy code.
#
# GitHub Actions allows ≤4 levels of workflow_call nesting. Today's
-# chain is `release.yml → deploy-cp.yml` (2) and
-# `production-deploy.yml → deploy-cp.yml` (2) — deep enough headroom
-# that we can still call one more reusable workflow below us if needed.
-# The agent-relaunch cascade uses a composite action (same-job, no
-# nesting) to keep that headroom.
+# chain is `release.yml → deploy-cp.yml` (2). The agent-relaunch
+# cascade uses a composite action (same-job, no nesting) to keep
+# headroom for future wrapping.

on:
workflow_call:
@@ -95,15 +94,75 @@ jobs:
CLOUDFLARE_ACCOUNT_ID: ${{ secrets.DD_CP_CF_ACCOUNT_ID }}
CLOUDFLARE_ZONE_ID: ${{ secrets.DD_CP_CF_ZONE_ID }}
# OAuth only in environments that have these set (production).
-# When empty, gcp-deploy.sh omits the workload env vars →
-# dd-web disables /auth/github/* and serves /auth/pat only.
+# Empty placeholder values get stripped below before baking the
+# workload spec, so dd-web disables /auth/github/* and serves
+# /auth/pat only in those envs.
DD_GITHUB_CLIENT_ID: ${{ inputs.oauth_enabled && (vars.DD_GITHUB_CLIENT_ID || secrets.DD_GITHUB_CLIENT_ID) || '' }}
DD_GITHUB_CALLBACK_URL: ${{ inputs.oauth_enabled && vars.DD_GITHUB_CALLBACK_URL || '' }}
DD_GITHUB_CLIENT_SECRET: ${{ inputs.oauth_enabled && secrets.DD_GITHUB_CLIENT_SECRET || '' }}
# ITA — optional. When set, the CP mints + verifies quotes.
DD_ITA_API_KEY: ${{ secrets.DD_ITA_API_KEY }}
DD_RELEASE_TAG: ${{ inputs.release_tag }}
-run: scripts/gcp-deploy.sh
+EE_IMAGE_FAMILY: easyenclave-staging
+EE_IMAGE_PROJECT: easyenclave
+VM_MACHINE_TYPE: c3-standard-4
+VM_DISK_SIZE: 10GB
+DD_ITA_BASE_URL: https://api.trustauthority.intel.com
+DD_ITA_JWKS_URL: https://portal.trustauthority.intel.com/certs
+DD_ITA_ISSUER: https://portal.trustauthority.intel.com
+run: |
+set -euo pipefail
+
+VM_NAME="dd-${DD_ENV}-$(date +%s)"
+: "${DD_ITA_API_KEY:?set DD_ITA_API_KEY via secrets.DD_ITA_API_KEY}"
+export DD_GITHUB_CALLBACK_URL="${DD_GITHUB_CALLBACK_URL:-https://${DD_HOSTNAME}/auth/github/callback}"
+
+# Bake a workload template: envsubst ${VAR} placeholders and
+# strip any "KEY=" env entries that ended up with empty values
+# (e.g. OAuth creds in non-prod envs).
+bake() {
+case "$1" in
+*.json.tmpl)
+envsubst < "$1" \
+| jq -c 'if .env then .env |= map(select(test("^[^=]+=.+"))) else . end'
+;;
+*.json)
+jq -c . "$1"
+;;
+*)
+echo "::error::unknown workload file type: $1" >&2
+return 1
+;;
+esac
+}
+
+# Boot workloads come from apps/<name>/workload.{json,json.tmpl}.
+# cloudflared fetches the binary onto PATH; dd-management runs
+# devopsdefender in DD_MODE=management (CP + dashboard).
+EE_BOOT_WORKLOADS=$({
+bake apps/cloudflared/workload.json
+bake apps/dd-management/workload.json.tmpl
+} | jq -cs '.')
+
+jq -c -n \
+--arg workloads "$EE_BOOT_WORKLOADS" \
+'{ "EE_BOOT_WORKLOADS": $workloads, "EE_OWNER": "devopsdefender" }' \
+> /tmp/ee-config.json
+
+gcloud compute instances create "$VM_NAME" \
+--project="$GCP_PROJECT_ID" \
+--zone="$GCP_ZONE" \
+--machine-type="$VM_MACHINE_TYPE" \
+--confidential-compute-type=TDX \
+--maintenance-policy=TERMINATE \
+--boot-disk-size="$VM_DISK_SIZE" \
+--image-family="$EE_IMAGE_FAMILY" \
+--image-project="$EE_IMAGE_PROJECT" \
+--metadata-from-file=ee-config=/tmp/ee-config.json \
+--labels=devopsdefender=managed,dd_env="${DD_ENV}" \
+--tags=dd-management
+
+rm -f /tmp/ee-config.json
+echo "VM: $VM_NAME ($DD_HOSTNAME, release $DD_RELEASE_TAG)"

- name: Wait for agent health (streams serial console)
env:
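The `bake()` helper above depends on `envsubst` expanding blank (or unset) variables to empty strings. Assuming the template carries an entry such as `DD_GITHUB_CLIENT_ID=${DD_GITHUB_CLIENT_ID}`, a non-OAuth env ends up with `DD_GITHUB_CLIENT_ID=`, which the jq `test("^[^=]+=.+")` filter then drops. Here is a pure-shell sketch of the same keep/drop rule. `keep_nonempty_env` is a hypothetical name, and the sketch treats entries as single-line strings.

```shell
# Hypothetical shell equivalent of the jq filter
#   .env |= map(select(test("^[^=]+=.+")))
# Prints each argument that has a non-empty key AND a non-empty value.
keep_nonempty_env() {
  local entry
  for entry in "$@"; do
    case $entry in
      =*) ;;                            # empty key: dropped
      *=?*) printf '%s\n' "$entry" ;;   # KEY=VALUE with non-empty VALUE: kept
      *) ;;                             # "KEY=" or no "=" at all: dropped
    esac
  done
}
```

For example, `keep_nonempty_env DD_GITHUB_CLIENT_ID= DD_ENV=pr-42` prints only `DD_ENV=pr-42`. The step exports a default `DD_GITHUB_CALLBACK_URL` before baking. If the template references that variable, the entry is never empty and survives the filter even in envs without OAuth.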
@@ -287,15 +346,11 @@ jobs:
}

# Cascade a relaunch of the matching dd-local-{env} libvirt domain
-# on the tdx2 host. Preview runs dd-local-preview against the PR's
-# CP; prod runs dd-local-prod against app.devopsdefender.com.
-# Non-blocking (`continue-on-error`) because the openclaw boot
-# chain inside dd-local-preview can take 30 min on first boot —
-# we want PR status reflecting the CP deploy, with the agent
-# relaunch as a signal-only exercise until vdc is warm.
+# on the tdx2 host, then block on it registering with the freshly-
+# deployed CP. This is the gate: a release is "done" only when the
+# local agent is back online talking to the new CP.
- name: Relaunch dd-local-${{ inputs.env == 'production' && 'prod' || 'preview' }}
if: inputs.relaunch_agent
-continue-on-error: true
uses: ./.github/actions/relaunch-agent
with:
kind: ${{ inputs.env == 'production' && 'prod' || 'preview' }}
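The step name and the `kind` input both use the `cond && a || b` expression idiom as a ternary. It only behaves like one because `'prod'` is truthy. If the middle operand could evaluate to a falsy value (`''`, `0`, `false`, `null`), the expression would silently fall through to the `||` branch. Here is a hypothetical shell mirror of the intended mapping (`local_agent_domain` is an illustrative name):

```shell
# Hypothetical mirror of: inputs.env == 'production' && 'prod' || 'preview'
# Maps a deploy-cp.yml `env` input to the dd-local-{kind} libvirt domain name.
local_agent_domain() {
  local kind=preview            # any env other than "production"
  if [ "$1" = production ]; then
    kind=prod
  fi
  printf 'dd-local-%s\n' "$kind"
}
```

Any `env` value other than `production` maps to `dd-local-preview`.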
47 changes: 0 additions & 47 deletions .github/workflows/local-agents.yml

This file was deleted.

53 changes: 0 additions & 53 deletions .github/workflows/production-deploy.yml

This file was deleted.

71 changes: 52 additions & 19 deletions .github/workflows/release.yml
@@ -1,13 +1,18 @@
name: Release

-# Build the static musl binary, publish it as a GitHub release asset,
-# and (on PRs) deploy it to an ephemeral per-PR preview. Replaces the
-# Docker build+push pipeline — easyenclave fetches the asset directly
-# via its github_release workload source.
+# One workflow to rule them all: build the static musl binary, publish
+# it as a GitHub release asset, and deploy it to either the PR preview
+# (per-PR ephemeral CP at pr-N.domain) or production (app.domain). Both
+# paths cascade into a relaunch of the matching dd-local agent VM on
+# the tdx2 host, and the Release run only goes green when that agent
+# re-registers with the freshly-deployed CP.
#
-# PR: pre-release tagged pr-{sha12}, then full PR-preview deploy.
-# push to main: rolling `latest` release (no deploy — that's production)
-# push v* tag: versioned release (no deploy)
+# Paths:
+# pull_request → build → deploy-preview → dd-local-preview relaunch
+# push main → build → deploy-production → dd-local-prod relaunch
+# push v* → build only (versioned release, no deploy)
+# workflow_dispatch → build → deploy-production (rollback tool;
+# release_tag input picks which tag to deploy)

on:
push:
@@ -18,10 +23,18 @@ on:
pull_request:
paths-ignore:
- "README.md"
+workflow_dispatch:
+inputs:
+release_tag:
+description: 'Release tag to deploy to production (rollback tool; default: latest)'
+required: false
+default: 'latest'

concurrency:
group: dd-release-${{ github.ref }}
-cancel-in-progress: true
+# PR pushes cancel old runs. Main / tag / manual dispatch queue —
+# we never want to cancel an in-progress prod deploy.
+cancel-in-progress: ${{ github.event_name == 'pull_request' }}

permissions:
contents: write
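The concurrency block keys runs on `github.ref` and sets `cancel-in-progress` only for `pull_request` events. Here is a hypothetical shell mirror of that decision (`release_concurrency` is an illustrative name). It prints the group name and the cancel flag.

```shell
# Hypothetical mirror of release.yml's concurrency block:
#   group: dd-release-${{ github.ref }}
#   cancel-in-progress: ${{ github.event_name == 'pull_request' }}
# Usage: release_concurrency <event_name> <ref>  ->  "<group> <cancel>"
release_concurrency() {
  local cancel=false
  if [ "$1" = pull_request ]; then
    cancel=true
  fi
  printf 'dd-release-%s %s\n' "$2" "$cancel"
}
```

A `workflow_dispatch` run started from `main` shares the `dd-release-refs/heads/main` group with pushes to `main`. A manual rollback therefore waits behind an in-flight prod deploy instead of cancelling it. Under GitHub's concurrency semantics, however, a group holds at most one running and one pending run. A newer queued run still replaces an older pending one even with `cancel-in-progress: false`.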
@@ -75,10 +88,7 @@ jobs:
# `https://github.com/devopsdefender/dd/.github/workflows/release.yml@<ref>`).
# The attestation is stored on the repo's /attestations endpoint
# and retrievable via `gh attestation verify` or the REST API.
-#
-# For now we're tracking (not enforcing) — the CP will eventually
-# use this to verify that a registering agent's artifact came
-# from this workflow. Skipped on fork PRs (they lack id-token).
+# Skipped on fork PRs (they lack id-token).
- name: Attest devopsdefender binary
if: github.event_name != 'pull_request' || github.event.pull_request.head.repo.full_name == github.repository
uses: actions/attest-build-provenance@v2
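The attestation step's `if:` runs on every non-PR event and on PRs whose head repository is this repository. Fork PRs are skipped because, as the comment notes, they lack an `id-token`. Here is a hypothetical shell mirror of the condition (`should_attest` is an illustrative name). It returns 0 to attest and 1 to skip.

```shell
# Hypothetical mirror of the attest step's condition:
#   github.event_name != 'pull_request'
#   || github.event.pull_request.head.repo.full_name == github.repository
# Usage: should_attest <event_name> <head_repo_full_name> <repository>
should_attest() {
  [ "$1" != pull_request ] || [ "$2" = "$3" ]
}
```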
@@ -113,13 +123,8 @@ jobs:
| tail -n +12 \
| xargs -rI{} gh release delete {} --yes --cleanup-tag

-# Deploy the freshly-built binary to the PR's ephemeral preview.
-# Each PR gets its own env at pr-{N}.{domain} with DD_ENV=pr-{N}
-# (hostname-isolated, no OAuth — browser access via /auth/pat).
-# main/v* produce releases that production-deploy picks up separately.
-#
-# Body lives in deploy-cp.yml — same workflow prod uses, so every PR
-# exercises the prod deploy path.
+# Per-PR ephemeral preview at pr-{N}.{domain}. No OAuth (browser login
+# via /auth/pat). Cascades into dd-local-preview relaunch.
deploy-preview:
if: github.event_name == 'pull_request'
needs: build
@@ -139,3 +144,31 @@ jobs:
comment_on_pr: true
ref: ${{ github.event.pull_request.head.ref }}
secrets: inherit

# Production deploy at app.{domain}. Fires on push-to-main OR on a
# manual workflow_dispatch (rollback to a specific release_tag).
# Tag pushes (v*) intentionally do not auto-deploy — they just
# publish the artifact. Cascades into dd-local-prod relaunch.
deploy-production:
if: >-
(github.event_name == 'push' && github.ref == 'refs/heads/main')
|| github.event_name == 'workflow_dispatch'
needs: build
permissions:
contents: read
id-token: write
# Granted (though unused — comment_on_pr=false here) so the
# permissions intersection with deploy-cp.yml's job matches.
pull-requests: write
uses: ./.github/workflows/deploy-cp.yml
with:
env: production
hostname: app.${{ vars.DD_CF_DOMAIN || 'devopsdefender.com' }}
gcp_environment: production
workload_identity_provider: 'projects/779946350556/locations/global/workloadIdentityPools/github-actions-pool/providers/github-provider'
service_account: 'easyenclave-production-ci@easyenclave.iam.gserviceaccount.com'
release_tag: ${{ inputs.release_tag || 'latest' }}
oauth_enabled: true
comment_on_pr: false
ref: main
secrets: inherit
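The header's Paths table and the two deploy jobs' `if:` conditions can be cross-checked with a small model. This is a hypothetical sketch (`release_plan` is an illustrative name). It prints which jobs run for a given event and ref, plus the tag that `deploy-production` receives through `inputs.release_tag || 'latest'`. The `inputs` context is populated only for `workflow_dispatch`, so a push to `main` always deploys `latest`. Push refs other than `main` and `v*` tags are not modeled, because the `on.push` filters are collapsed in this diff.

```shell
# Hypothetical model of release.yml routing (not part of this PR).
# Usage: release_plan <event_name> <ref> [dispatch_release_tag]
# Prints the jobs that run, plus the production tag when one is deployed.
release_plan() {
  local event=$1 ref=$2 dispatch_tag=${3:-}
  case "$event" in
    pull_request)
      echo "build deploy-preview" ;;
    workflow_dispatch)
      # inputs.release_tag || 'latest'
      echo "build deploy-production tag=${dispatch_tag:-latest}" ;;
    push)
      case "$ref" in
        refs/heads/main) echo "build deploy-production tag=latest" ;;
        refs/tags/v*) echo "build" ;;   # versioned release, no deploy
        *) return 1 ;;                  # not modeled here
      esac ;;
    *)
      return 1 ;;
  esac
}
```

For example, `release_plan workflow_dispatch refs/heads/main v1.1.0` prints `build deploy-production tag=v1.1.0`, which corresponds to the rollback path described in the header.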