Status: Accepted

Date: 2026-04-13

Supersedes (partially): ADR-015. The single-VM alternative is kept as a fallback but is no longer the preferred cost-control path.
The full team deployment on Azure Container Apps (ADR-012) runs 13 Container Apps, 3 Jobs, PostgreSQL Flexible Server, ACR Basic, Key Vault, Log Analytics, and a VNet. Running 24×7, it consumes the full personal 50 EUR/month VS-Enterprise credit within ~10 days. We need the full ACA topology, not a single-VM substitute, because:
- CI/CD, Playwright smoke tests, and the external demo URL all target ACA
- Contributors validate PRs against the same infrastructure that compliance tests run on
- The demo schedule (Mon–Fri 07:00–20:00 Europe/Berlin) already matches our working hours
ADR-015 documented a single-VM fallback, but running the stack on ACA during working hours and scaling everything down outside them is strictly better: no drift between "personal dev" and "team deployment", no cloud-init / systemd maintenance, and an estimated cost of ~€32/month, which fits within the 50 EUR credit.
Implement a scheduled off-hours scale-down of the entire rg-mvhd-dev resource group
via GitHub Actions cron:
- Stop action — daily 18:00 UTC (= 20:00 Europe/Berlin CEST / 19:00 CET):
  - `az containerapp update --min-replicas 0 --max-replicas 0` for all 13 Container Apps
  - `az postgres flexible-server stop --name pg-mvhd-dev`
- Start action — Monday–Friday 05:00 UTC (= 07:00 Europe/Berlin CEST / 06:00 CET):
  - `az postgres flexible-server start` (waits for Ready)
  - Restore original `min-replicas` / `max-replicas` per app (baked into the workflow)
  - Re-run `mvhd-bootstrap-job` because Vault loses in-memory state on pod restart (CLAUDE.md gotcha #1)
  - Smoke-check that the UI URL returns HTTP 200
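The final smoke-check step can be pictured as a bounded polling loop. The following is a minimal sketch, not the workflow's actual implementation: the function names, the 30-attempt limit, and the 10-second delay are illustrative assumptions, and the real UI URL would be supplied by the caller.

```python
import time
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a GET on url, or 0 if no response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses raise HTTPError; a 502 is expected during cold start.
        return exc.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout.
        return 0


def wait_for_200(url, fetch=http_status, attempts=30, delay_s=10.0, sleep=time.sleep):
    """Poll url until it answers 200; return False once attempts are exhausted.

    attempts and delay_s are hypothetical defaults, not values from the workflow.
    """
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # no pause after the final attempt
    return False
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without network access; a workflow step would call `wait_for_200(ui_url)` and fail the job when it returns `False`.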
| App | min | max |
|---|---|---|
| mvhd-neo4j, mvhd-keycloak, mvhd-vault, mvhd-nats, mvhd-identityhub, mvhd-issuerservice, mvhd-tenant-mgr, mvhd-provision-mgr | 1 | 1 |
| mvhd-neo4j-proxy, mvhd-controlplane, mvhd-dp-fhir, mvhd-dp-omop | 1 | 2 |
| mvhd-ui | 1 | 3 |
All apps scale to 0/0 off-hours; the start action restores these values.
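One way to keep the replica table and both schedule actions in sync is to derive the `az containerapp update` invocations from a single mapping. This is a hedged sketch rather than the content of the workflow: the helper names are hypothetical, the commands are only built as argument lists (nothing is executed), and the 0/0 values for the stop action are taken directly from the text above.

```python
RESOURCE_GROUP = "rg-mvhd-dev"

# On-hours (min, max) replicas per app, transcribed from the table above.
ON_HOURS_REPLICAS = {}
for _app in ("mvhd-neo4j", "mvhd-keycloak", "mvhd-vault", "mvhd-nats",
             "mvhd-identityhub", "mvhd-issuerservice",
             "mvhd-tenant-mgr", "mvhd-provision-mgr"):
    ON_HOURS_REPLICAS[_app] = (1, 1)
for _app in ("mvhd-neo4j-proxy", "mvhd-controlplane", "mvhd-dp-fhir", "mvhd-dp-omop"):
    ON_HOURS_REPLICAS[_app] = (1, 2)
ON_HOURS_REPLICAS["mvhd-ui"] = (1, 3)


def scale_command(app, min_replicas, max_replicas):
    """Build the argument list for one `az containerapp update` call."""
    return ["az", "containerapp", "update",
            "--name", app,
            "--resource-group", RESOURCE_GROUP,
            "--min-replicas", str(min_replicas),
            "--max-replicas", str(max_replicas)]


def stop_commands():
    """Off-hours: every app goes to 0/0."""
    return [scale_command(app, 0, 0) for app in ON_HOURS_REPLICAS]


def start_commands():
    """On-hours: every app returns to its value from the table."""
    return [scale_command(app, lo, hi) for app, (lo, hi) in ON_HOURS_REPLICAS.items()]
```

Each list could be passed to `subprocess.run(cmd, check=True)`; keeping construction separate from execution makes a dry run trivial.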
At West Europe list prices, April 2026:
| Component | On-hours (Mon–Fri 07–20) | Off-hours | Monthly |
|---|---|---|---|
| ACA compute (13 apps, ~1.0 vCPU avg) | ~286 h | 0 (scaled) | ~€18 |
| PostgreSQL Flexible Server B1ms | ~286 h | stopped (free) | ~€4 |
| ACR Basic | 24×7 | 24×7 | ~€4 |
| Log Analytics (pay-as-you-go) | usage-based | minimal | ~€3 |
| VNet + private endpoints + public IP | 24×7 | 24×7 | ~€3 |
| GitHub Actions (scale jobs) | ~25 runs/month | — | €0 |
| Total | | | ~€32/month |
This leaves a comfortable 36% margin (~€18) under the 50 EUR credit.
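The figures in the table can be rechecked with a few lines of arithmetic. The sketch below assumes 22 weekdays per month, which is the value that reproduces the ~286 on-hours in the table; the per-component amounts are the rounded estimates from the table, not billing data.

```python
HOURS_PER_DAY = 20 - 7          # 07:00-20:00 window
WEEKDAYS_PER_MONTH = 22         # assumption: typical month, reproduces ~286 h
CREDIT_EUR = 50

on_hours = HOURS_PER_DAY * WEEKDAYS_PER_MONTH

# Rounded monthly estimates in EUR, copied from the table above.
monthly_eur = {
    "ACA compute": 18,
    "PostgreSQL Flexible Server B1ms": 4,
    "ACR Basic": 4,
    "Log Analytics": 3,
    "VNet + private endpoints + public IP": 3,
    "GitHub Actions": 0,
}

total = sum(monthly_eur.values())
margin = (CREDIT_EUR - total) / CREDIT_EUR

print(f"on-hours per month: {on_hours} h")   # 286 h
print(f"total: ~EUR {total}/month")          # 32
print(f"margin under credit: {margin:.0%}")  # 36%
```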
- No infrastructure drift: team deployment, personal dev, and CI all target the same ACA topology
- Eliminates cloud-init / systemd / custom VM image maintenance
- ~€32/month vs. ~€53/month (ADR-015 VM path) → more room under the 50 EUR cap
- Free cold-start recovery for stateful services via native ACA lifecycle
- Weekend safety: on Saturday/Sunday the env stays at 0 replicas so no one can accidentally burn credit running expensive queries
- Cold-start delay: ~60–90 s for Neo4j to accept connections, ~30 s for Keycloak. Users hitting the URL at 07:00 sharp may see 502s for up to ~2 minutes. Workflow waits for `mvhd-ui` HTTP 200 before declaring success.
- Vault re-bootstrap required every morning: Vault is in-memory (gotcha #1). The start workflow must re-run `mvhd-vault-bootstrap` before the UI is usable. Adds ~30 s to morning startup.
- Neo4j is ephemeral on ACA and must be re-seeded every morning (superseded by ADR-017). The `mvhd-neo4j` Container App now mounts Azure Files at `/data` and `/logs`, so the knowledge graph survives revision restarts and the morning scale-up no longer wipes the database. The seed job runs as a one-time bootstrap rather than a daily re-population. The seed job talks to Neo4j via Bolt (cypher-shell) on port 7687, so the `mvhd-neo4j` ingress only needs `transport: Tcp, targetPort: 7687`; no `additionalPortMappings` required.
- DST drift: GitHub Actions cron is UTC-only. 18:00 UTC = 20:00 Europe/Berlin in summer (CEST), 19:00 local in winter (CET). Same ±1 h drift as ADR-014.
- Accidental manual scaling: if a contributor manually scales an app during the on-hours window, the stop job will still set min=max=0 at 18:00 UTC (20:00 CEST / 19:00 CET) regardless. This is intentional: the schedule is authoritative.
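The DST drift noted above is easy to verify by converting the two fixed UTC cron times to Europe/Berlin on a summer and a winter date. This is a small illustrative check using the standard-library `zoneinfo` module; the helper name and the sample dates are arbitrary choices, and the host is assumed to have time zone data available.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

BERLIN = ZoneInfo("Europe/Berlin")


def berlin_time(utc_hour, year, month, day):
    """Render a fixed UTC hour as local Europe/Berlin time on the given date."""
    utc = datetime(year, month, day, utc_hour, tzinfo=timezone.utc)
    return utc.astimezone(BERLIN).strftime("%H:%M %Z")


# Stop job, cron fixed at 18:00 UTC
print(berlin_time(18, 2026, 7, 1))    # 20:00 CEST
print(berlin_time(18, 2026, 1, 15))   # 19:00 CET

# Start job, cron fixed at 05:00 UTC
print(berlin_time(5, 2026, 7, 1))     # 07:00 CEST
print(berlin_time(5, 2026, 1, 15))    # 06:00 CET
```

In winter both jobs therefore run one hour earlier in local time, so the on-hours window shifts to 06:00–19:00 rather than changing length.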
If scale-down causes persistent problems (e.g., Keycloak cold-start corrupts state),
disable the cron in .github/workflows/aca-schedule.yml and manually run the start
job. The VM fallback in ADR-015 remains provisioned as an option.
- Workflow: `.github/workflows/aca-schedule.yml`
- Team deployment: ADR-012: Azure Container Apps
- Weekly reset: ADR-014: Weekly Demo Reset
- VM fallback: ADR-015: Single-VM Dev Deployment
- Vault-in-memory constraint: `CLAUDE.md` gotcha #1; resolution pending in ADR-017 (Vault file backend on Azure Files)
- Neo4j ephemeral on ACA: resolved by ADR-017 (Azure Files volumes mounted at `/data` and `/logs`)
- Persistent storage decision: ADR-017: Persistent Storage for Stateful Services on ACA