Kubernetes cluster on Hetzner Cloud (ash) running k3s, provisioned via
hetzner-k3s CLI.
Hosts isolated dotCMS environments at TENANT-ENV.botcms.cloud.
| Component | Details |
|---|---|
| k3s | v1.32.0+k3s1 |
| Cilium | CNI |
| Caddy | Ingress — HA (2 replicas), on-demand TLS, CNAME + ConfigMap routing, sticky sessions |
| Valkey | Caddy cert storage shared across replicas (caddy-storage-redis plugin) |
| CloudNativePG (CNPG) | Shared Postgres cluster — one DB + role per tenant environment |
| OpenSearch operator | Shared OpenSearch cluster — per-tenant users/indices |
| csi-s3 + geesefs | S3-backed ReadWriteMany storage via Wasabi |
| Descheduler | Bin-packing — evicts pods from underutilized nodes every 5 min |
| Prometheus + Grafana + Loki | Observability stack at observe.botcms.cloud |
| Headlamp | Kubernetes UI at manage.botcms.cloud |
| Control Plane | dotCMS provisioning app at control.botcms.cloud |

| Pool | Type | Count | RAM |
|---|---|---|---|
| master1/2/3 | cpx21 | 3 | 4 GB |
| medium workers | cpx31 | 4 | 8 GB |

- `kubectl`, `helm`, `envsubst` (gettext), `curl`
- `.env` sourced with credentials (see `.env.example`)
- Wildcard DNS: `*.botcms.cloud` → cluster LB IP
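
A minimal sketch of checking the tool prerequisites above by hand. The function name `missing_tools` is illustrative and not part of this repo; `./deploy.sh --dry-run` remains the supported way to validate prerequisites.

```shell
# Print the name of every listed tool that is not on PATH.
# Hypothetical helper: the tool list mirrors the prerequisites above.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing="$(missing_tools kubectl helm envsubst curl)"
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names print on one line.
  echo "Missing prerequisites:" $missing >&2
fi
```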

| Phase | Component | Details |
|---|---|---|
| 1 | Helm repos | |
| 2 | Namespaces | |
| 3 | Cilium CNI | |
| 5 | Caddy ingress | On-demand TLS via `cname_router` plugin |
| 6 | Wildcard DNS | `*.botcms.cloud` → LB IP |
| 7 | CNPG operator | |
| 8 | OpenSearch operator | |
| 9 | OpenSearch cluster | Shared 3-node cluster |
| 10 | CSI-S3 | Wasabi-backed geesefs storage class |
| 11 | Postgres cluster | Shared CNPG cluster |
| 12 | Monitoring | Prometheus + Grafana + Loki |
| 13 | Descheduler | |
| 14 | Valkey | Caddy cert storage |

Note: Phase 4 (cert-manager) was removed — Caddy handles all TLS directly via ACME.

    ./deploy.sh --dry-run     # validate prereqs, print plan
    ./deploy.sh --phase 4     # run only phase 4
    ./deploy.sh --skip 3,4    # skip phases 3 and 4

Custom image (`dotcms/caddy-cname`) with `cname_router` and `caddy-storage-redis` plugins.

Routing resolution order:

- Tenant lookup: subdomain parsed as `<org>-<env>`, verified via headless service DNS, proxied to pod IP (sticky via `lb_session` cookie).
- ConfigMap lookup: any ConfigMap in the `caddy-ingress` namespace with label `botcms.cloud/type=caddy-route` whose `data.hostname` matches the request is proxied to `data.clusterip-svc:data.service-port`. Used for internal services (control plane, etc.); no Caddyfile change needed.

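
The tenant lookup step can be sketched as a small shell function. This is an illustration only, not the `cname_router` implementation: the function name is hypothetical, and it assumes the environment is the part after the last hyphen, so org names may themselves contain hyphens. The real plugin may split differently.

```shell
# Split "<org>-<env>.<base-domain>" into "org env".
# Assumption: env is everything after the LAST hyphen of the subdomain.
parse_tenant_host() {
  host="$1"
  base="${2:-botcms.cloud}"
  sub="${host%".$base"}"
  # Reject hosts that are not under the base domain.
  [ "$sub" != "$host" ] || return 1
  case "$sub" in
    *.*) return 1 ;;   # nested subdomain, not a tenant host
    *-*) ;;            # has the required hyphen
    *)   return 1 ;;   # no hyphen, e.g. "observe"
  esac
  org="${sub%-*}"
  envname="${sub##*-}"
  [ -n "$org" ] && [ -n "$envname" ] || return 1
  printf '%s %s\n' "$org" "$envname"
}

parse_tenant_host acme-prod.botcms.cloud   # prints "acme prod"
```

In this sketch, a host with no hyphen in its subdomain, such as `observe.botcms.cloud`, returns non-zero, which corresponds to falling through to the ConfigMap lookup.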
Adding a new service route:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: route-myservice
      namespace: caddy-ingress
      labels:
        botcms.cloud/type: caddy-route
    data:
      hostname: myservice.botcms.cloud
      clusterip-svc: myservice.mynamespace.svc.cluster.local
      service-port: "8080"

Tenant manifests live under `kustomize/`:

    kustomize/
      dotcms-base/          canonical Deployment, Services, HPA, PDB, PVC, CaddyRoute
      tenants/INSTANCE/     per-tenant overlay, generated by the Control Plane worker

The Control Plane worker generates overlays automatically when provisioning environments. Manual generation:

    TENANT_ID=acme ENV_ID=prod DOTCMS_IMAGE=mirror.gcr.io/dotcms/dotcms:LTS-24.10 \
      ./generate-tenant-overlay.sh
    kubectl apply -k kustomize/tenants/acme-prod/

- Shared 3-node cluster in `postgres` namespace
- One database + role per environment (`TENANT-ENV`)
- Image: `dotcms/cnpg-postgresql:18` (PG 18 + pgvector + pgvectorscale)
- Backups: continuous WAL + daily base backups to Wasabi, 30-day retention
- Endpoint: `postgres-rw.postgres.svc.cluster.local:5432`

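
To show how the endpoint and the per-environment naming fit together, here is a hedged sketch that builds a connection URI. The helper name is hypothetical, and it assumes the database is literally named `<tenant>-<env>`, following the `TENANT-ENV` convention above. Credentials are deliberately left out.

```shell
# Build a libpq-style connection URI for one tenant environment.
# Assumption: the database name is "<tenant>-<env>".
tenant_db_uri() {
  printf 'postgresql://postgres-rw.postgres.svc.cluster.local:5432/%s-%s\n' "$1" "$2"
}

tenant_db_uri acme prod
# prints "postgresql://postgres-rw.postgres.svc.cluster.local:5432/acme-prod"
```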
Database-driven provisioning app at https://control.botcms.cloud.
- Auth: Google OAuth (NextAuth v5)
- DB: `dotcms_cloud_control` database in the shared CNPG cluster
- Worker: background polling loop that provisions/patches/stops/decommissions tenant environments

See `control-plane/README.md` for deployment details.

- Grafana: https://observe.botcms.cloud
- Headlamp: https://manage.botcms.cloud (not currently running)

Get the Grafana admin password:

    kubectl get secret -n monitoring kube-prometheus-stack-grafana \
      -o jsonpath='{.data.admin-password}' | base64 -d

| Variable | Purpose |
|---|---|
| `KUBECONFIG` | Path to kubeconfig (default: `./kubeconfig`) |
| `HCLOUD_TOKEN` | Hetzner Cloud API token |
| `WASABI_ACCESS_KEY` / `WASABI_SECRET_KEY` | S3 credentials |
| `WASABI_REGION` | e.g. `us-east-1` |
| `WASABI_BUCKET` | CNPG WAL + backups |
| `WASABI_S3FUSE_BUCKET` | dotCMS assets (csi-s3) |
| `WASABI_LOKI_BUCKET` | Loki log storage |
| `ACME_EMAIL` | Let's Encrypt email |
| `BASE_DOMAIN` | e.g. `botcms.cloud` |
| `OPENSEARCH_ADMIN_USER` / `OPENSEARCH_ADMIN_PASSWORD` | OpenSearch admin |
| `GRAFANA_ADMIN_PASSWORD` | Grafana admin password |
| `DOTCMS_IMAGE` | e.g. `mirror.gcr.io/dotcms/dotcms:latest` |
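
A minimal sketch of checking that the variables above are set after sourcing `.env`. The helper name is hypothetical, and which variables count as required is an assumption here. `KUBECONFIG` is omitted because it has a default.

```shell
# Print every listed variable that is unset or empty.
# Pass only trusted, literal variable names: the lookup uses eval.
missing_env() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

missing_env HCLOUD_TOKEN WASABI_ACCESS_KEY WASABI_SECRET_KEY WASABI_REGION \
  WASABI_BUCKET WASABI_S3FUSE_BUCKET WASABI_LOKI_BUCKET ACME_EMAIL BASE_DOMAIN \
  OPENSEARCH_ADMIN_USER OPENSEARCH_ADMIN_PASSWORD GRAFANA_ADMIN_PASSWORD DOTCMS_IMAGE
```

An empty result means every listed variable has a value.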