wezell/hetzner-k3s


dotCMS k3s — Multi-Tenant Hetzner Infrastructure

Kubernetes cluster on Hetzner Cloud (ash) running k3s, provisioned via hetzner-k3s CLI. Hosts isolated dotCMS environments at TENANT-ENV.botcms.cloud.

Stack

| Component | Details |
|---|---|
| k3s | v1.32.0+k3s1 |
| Cilium | CNI |
| Caddy | Ingress — HA (2 replicas), on-demand TLS, CNAME + ConfigMap routing, sticky sessions |
| Valkey | Caddy cert storage shared across replicas (caddy-storage-redis plugin) |
| CloudNativePG (CNPG) | Shared Postgres cluster — one DB + role per tenant environment |
| OpenSearch operator | Shared OpenSearch cluster — per-tenant users/indices |
| csi-s3 + geesefs | S3-backed ReadWriteMany storage via Wasabi |
| Descheduler | Bin-packing — evicts pods from underutilized nodes every 5 min |
| Prometheus + Grafana + Loki | Observability stack at observe.botcms.cloud |
| Headlamp | Kubernetes UI at manage.botcms.cloud |
| Control Plane | dotCMS provisioning app at control.botcms.cloud |

Node Layout

| Pool | Type | Count | RAM |
|---|---|---|---|
| master1/2/3 | cpx21 | 3 | 4 GB |
| medium workers | cpx31 | 4 | 8 GB |

Prerequisites

  • kubectl, helm, envsubst (gettext), curl
  • .env sourced with credentials (see .env.example)
  • Wildcard DNS: *.botcms.cloud → cluster LB IP
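
A pre-flight check along these lines can catch a missing tool before deploy.sh runs (the `need` helper is illustrative, not something the repo ships):

```shell
# Illustrative pre-flight check for the tools listed above.
# `need` is a hypothetical helper, not part of deploy.sh.
need() {
  command -v "$1" >/dev/null 2>&1
}

missing=""
for tool in kubectl helm envsubst curl; do
  need "$tool" || missing="$missing $tool"
done

[ -z "$missing" ] || echo "missing tools:$missing"
```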

deploy.sh — Infrastructure Phases

| Phase | Step | Notes |
|---|---|---|
| 1 | Helm repos | |
| 2 | Namespaces | |
| 3 | Cilium CNI | |
| 5 | Caddy ingress | on-demand TLS via cname_router plugin |
| 6 | Wildcard DNS | `*.botcms.cloud` → LB IP |
| 7 | CNPG operator | |
| 8 | OpenSearch operator | |
| 9 | OpenSearch cluster | shared 3-node cluster |
| 10 | CSI-S3 | Wasabi-backed geesefs storage class |
| 11 | Postgres cluster | shared CNPG cluster |
| 12 | Monitoring | Prometheus + Grafana + Loki |
| 13 | Descheduler | |
| 14 | Valkey | Caddy cert storage |

Note: Phase 4 (cert-manager) was removed — Caddy handles all TLS directly via ACME.

```shell
./deploy.sh --dry-run       # validate prereqs, print plan
./deploy.sh --phase 7       # run only phase 7
./deploy.sh --skip 3,9      # skip phases 3 and 9
```

Caddy — Ingress & Routing

Custom image (dotcms/caddy-cname) with cname_router and caddy-storage-redis plugins.

Routing resolution order:

  1. Tenant lookup — subdomain parsed as <org>-<env>, verified via headless service DNS, proxied to pod IP (sticky via lb_session cookie).
  2. ConfigMap lookup — any ConfigMap in caddy-ingress namespace with label botcms.cloud/type=caddy-route whose data.hostname matches the request is proxied to data.clusterip-svc:data.service-port. Used for internal services (control plane, etc.) — no Caddyfile change needed.
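
The tenant lookup in step 1 amounts to splitting the leftmost DNS label into tenant and environment. A sketch of that parsing, assuming the split happens at the last hyphen (hostname here is illustrative):

```shell
# Sketch of the tenant-lookup parsing, assuming the subdomain splits
# on its last hyphen into <org>-<env> (e.g. acme-prod.botcms.cloud).
host="acme-prod.botcms.cloud"
sub=${host%%.*}      # leftmost DNS label: acme-prod
org=${sub%-*}        # acme
env=${sub##*-}       # prod
echo "$org / $env"   # the plugin then resolves the env's headless service
```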

Adding a new service route:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: route-myservice
  namespace: caddy-ingress
  labels:
    botcms.cloud/type: caddy-route
data:
  hostname: myservice.botcms.cloud
  clusterip-svc: myservice.mynamespace.svc.cluster.local
  service-port: "8080"
```
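
Rather than hand-editing the manifest per service, it can be templated from the shell. A hypothetical helper (not part of the repo; names follow the example above):

```shell
# Hypothetical generator for a caddy-route ConfigMap.
# Arguments: name, hostname, in-cluster service DNS name, port.
make_route() {
  name=$1 host=$2 svc=$3 port=$4
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: route-${name}
  namespace: caddy-ingress
  labels:
    botcms.cloud/type: caddy-route
data:
  hostname: ${host}
  clusterip-svc: ${svc}
  service-port: "${port}"
EOF
}

# Usage (against a live cluster):
# make_route myservice myservice.botcms.cloud \
#   myservice.mynamespace.svc.cluster.local 8080 | kubectl apply -f -
```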

Kustomize — Tenant Manifests

```
kustomize/
  dotcms-base/          canonical Deployment, Services, HPA, PDB, PVC, CaddyRoute
  tenants/INSTANCE/     per-tenant overlay — generated by the Control Plane worker
```

The Control Plane worker generates overlays automatically when provisioning environments. Manual generation:

```shell
TENANT_ID=acme ENV_ID=prod DOTCMS_IMAGE=mirror.gcr.io/dotcms/dotcms:LTS-24.10 \
  ./generate-tenant-overlay.sh
kubectl apply -k kustomize/tenants/acme-prod/
```

PostgreSQL — CloudNativePG

  • Shared 3-node cluster in postgres namespace
  • One database + role per environment (TENANT-ENV)
  • Image: dotcms/cnpg-postgresql:18 (PG 18 + pgvector + pgvectorscale)
  • Backups: continuous WAL + daily base backups to Wasabi, 30-day retention
  • Endpoint: postgres-rw.postgres.svc.cluster.local:5432
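
Given the one-database-and-role-per-environment convention, a tenant environment's DSN can be assembled like this (the `TENANT-ENV` naming is taken from the list above; password lookup, e.g. from a Kubernetes Secret, is omitted):

```shell
# Sketch: build a per-environment Postgres DSN, assuming the database
# and role are both named TENANT-ENV (e.g. acme-prod). Password handling
# is intentionally left out.
TENANT_ID=acme
ENV_ID=prod
PGHOST=postgres-rw.postgres.svc.cluster.local
DB="${TENANT_ID}-${ENV_ID}"
DSN="postgresql://${DB}@${PGHOST}:5432/${DB}"
echo "$DSN"
```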

Control Plane

Database-driven provisioning app at https://control.botcms.cloud.

  • Auth: Google OAuth (NextAuth v5)
  • DB: dotcms_cloud_control database in the shared CNPG cluster
  • Worker: background polling loop — provisions/patches/stops/decommissions tenant environments

See control-plane/README.md for deployment details.

Monitoring

  • Grafana: https://observe.botcms.cloud
  • Headlamp: https://manage.botcms.cloud (not currently running)

```shell
# Get Grafana admin password
kubectl get secret -n monitoring kube-prometheus-stack-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d
```

Required .env Variables

| Variable | Purpose |
|---|---|
| KUBECONFIG | Path to kubeconfig (default: ./kubeconfig) |
| HCLOUD_TOKEN | Hetzner Cloud API token |
| WASABI_ACCESS_KEY / WASABI_SECRET_KEY | S3 credentials |
| WASABI_REGION | e.g. us-east-1 |
| WASABI_BUCKET | CNPG WAL + backups |
| WASABI_S3FUSE_BUCKET | dotCMS assets (csi-s3) |
| WASABI_LOKI_BUCKET | Loki log storage |
| ACME_EMAIL | Let's Encrypt email |
| BASE_DOMAIN | e.g. botcms.cloud |
| OPENSEARCH_ADMIN_USER / OPENSEARCH_ADMIN_PASSWORD | OpenSearch admin |
| GRAFANA_ADMIN_PASSWORD | Grafana admin password |
| DOTCMS_IMAGE | e.g. mirror.gcr.io/dotcms/dotcms:latest |
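
A minimal .env sketch matching the table (all values are placeholders; .env.example in the repo is the authoritative list):

```shell
# Placeholder values only: copy from .env.example and fill in real credentials.
export KUBECONFIG=./kubeconfig
export HCLOUD_TOKEN=changeme
export WASABI_ACCESS_KEY=changeme
export WASABI_SECRET_KEY=changeme
export WASABI_REGION=us-east-1
export WASABI_BUCKET=cnpg-backups          # bucket names are illustrative
export WASABI_S3FUSE_BUCKET=dotcms-assets
export WASABI_LOKI_BUCKET=loki-logs
export ACME_EMAIL=ops@example.com
export BASE_DOMAIN=botcms.cloud
export OPENSEARCH_ADMIN_USER=admin
export OPENSEARCH_ADMIN_PASSWORD=changeme
export GRAFANA_ADMIN_PASSWORD=changeme
export DOTCMS_IMAGE=mirror.gcr.io/dotcms/dotcms:latest
```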
