The Aphex Pipeline Infrastructure provides a production-ready GitOps platform built on ArgoCD and Tekton, with a layered cert-manager architecture. It enables zero-touch deployment, centralized authentication, self-service repository onboarding, and reliable certificate management.
Kind (Development):
- Use for local development and testing
- Requires Docker and Kind CLI
- Simulated GPU support (RuntimeClass only)
- Fast iteration and experimentation
- Bootstrap:
./bootstrap.sh --deployment kind
K3s (Production):
- Use for production deployments with GPU workloads
- Requires Ubuntu 22.04+, NVIDIA drivers, nvidia-container-toolkit
- Real GPU support via NVIDIA GPU Operator
- Gateway API with external DNS
- Bootstrap:
./bootstrap.sh --deployment k3s
Infrastructure:
- Kind: nginx-ingress-controller, simulated GPU
- K3s: Gateway API, NVIDIA GPU Operator, External DNS
ArgoCD Applications:
- Kind: Uses `platform/base/argocd/apps` → references `platform/deployments/kind/*`
- K3s: Uses `platform/deployments/k3s/argocd/apps` → references `platform/deployments/k3s/*`
Additional Apps (K3s only):
- platform-gpu-operator: Real NVIDIA GPU support
- platform-gateway: Gateway API for advanced routing
- platform-external-dns: Automatic DNS record management
This platform provides shared CI/CD infrastructure with complete tenant isolation. Teams can onboard repositories through RepoBinding CRDs, which automatically provisions isolated namespaces with RBAC, network policies, AppProject boundaries, and pipeline resources. The platform serves as the foundation for automated deployments and infrastructure management.
The platform implements a layered cert-manager deployment that avoids the classic "webhook chicken-and-egg" problem through sync waves and PostSync validation, removing the need for manual intervention and preventing timing-related failures.
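The layering can be pictured with ArgoCD annotations. The sketch below is illustrative only (the wave numbers match the troubleshooting section later in this document, but the Application spec is abbreviated and the names are assumptions, not the platform's actual manifests):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-cert-manager
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "10"   # Wave 10: install cert-manager before certificates (wave 20) and ingress (wave 30)
spec:
  # source/destination omitted for brevity
  syncPolicy:
    automated: {}
---
# A PostSync hook Job gates later waves until the webhook actually answers:
apiVersion: batch/v1
kind: Job
metadata:
  name: cert-manager-webhook-readiness
  namespace: cert-manager
  annotations:
    argocd.argoproj.io/hook: PostSync
```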
For detailed architecture, see architecture.md.
The bootstrap script achieves complete platform convergence automatically by generating all secrets, creating the cluster, and waiting for full platform functionality without manual steps. The dispatcher routes to deployment-specific bootstrap scripts based on the --deployment flag.
For deployment procedures, see operations.md.
A tenant is a product team with complete isolation and dedicated resources:
- Namespace: Isolated Kubernetes namespace with RBAC boundaries
- Service Account: Least-privilege access with role-based permissions
- Resource Quotas: CPU, memory, and storage limits
- Network Policies: Traffic isolation with ingress exceptions
- AppProject: ArgoCD project isolation with scoped destinations
- EventListener: Tekton webhook handler for GitHub integration
- Pipeline Resources: Access to shared catalog and custom pipelines
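Two of these per-tenant resources can be sketched as follows. This is a hedged illustration only: the namespace name, limits, and the ingress-controller namespace label are assumptions, not the platform's actual defaults.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: my-tenant          # hypothetical tenant namespace
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    requests.storage: 20Gi
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-with-ingress-exception
  namespace: my-tenant
spec:
  podSelector: {}               # applies to all pods in the namespace
  policyTypes: [Ingress]
  ingress:
    - from:                     # allow traffic only from the ingress controller
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
```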
All platform services use centralized authentication via Authentik and Dex:
ArgoCD UI: https://argocd.home.local - Click "Login via Dex"
Tekton Dashboard: https://tekton.home.local - Authenticate via Dex/Authentik
Authentik UI: https://auth.home.local - Direct login with admin credentials
For detailed authentication procedures, see operations.md.
Use the Authentik web UI at https://auth.home.local. Create users and assign to groups (admins or engineering). Users can authenticate immediately without pod restarts.
The platform uses two domain strategies for different purposes:
arbiter-dev.com (Public Domain):
- Organization webhook endpoints accessible from the internet
- GitHub webhooks can reach these endpoints
- SSL/TLS termination at Cloudflare edge
- Example: `acme-corp.arbiter-dev.com`
home.local (Local Domain):
- Authentication and platform services accessible only within home network
- ArgoCD, Authentik, Dex, Tekton Dashboard
- Example: `argocd.home.local`, `auth.home.local`
This separation ensures webhook endpoints are publicly accessible while keeping platform administration services private.
Webhooks use Cloudflare Tunnels to reach the cluster without exposing ports. The cloudflared pod maintains an outbound connection to Cloudflare, so no inbound ports are opened on your network.
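The cloudflared side of this can be pictured with a minimal config. Everything here is an assumption for illustration (tunnel ID placeholder, hostname, and backend service name); the platform provisions the real values per organization:

```yaml
# /etc/cloudflared/config.yml (illustrative sketch)
tunnel: TUNNEL_ID_PLACEHOLDER
credentials-file: /etc/cloudflared/credentials.json
ingress:
  # Route the public webhook hostname to an in-cluster EventListener service
  - hostname: acme-corp.arbiter-dev.com
    service: http://el-github-listener.org-acme-corp.svc.cluster.local:8080
  # Everything else gets a 404
  - service: http_status:404
```

Because cloudflared dials out to Cloudflare, this works without any port-forwarding or firewall changes on the home network.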
For detailed webhook flow and networking architecture, see architecture.md.
Organization deletion follows this sequence:
- Delete DNS CNAME record from Cloudflare
- Cleanup active tunnel connections via Cloudflare API
- Delete tunnel from Cloudflare
- Delete ClusterRoleBinding for EventListener
- Delete ClusterSecretStore (cluster-scoped)
- Delete organization namespace (cascades all resources including ESO RBAC)
This ensures complete cleanup with no orphaned resources in Cloudflare or Kubernetes.
Each organization automatically receives a ClusterSecretStore that enables centralized secret management:
1. Create secrets in your organization namespace:

```bash
kubectl create secret generic org-secrets \
  -n org-my-org \
  --from-literal=github-token=ghp_xxx \
  --from-literal=database-password=mypass
```

2. Label your application namespace:

```bash
kubectl label namespace my-app aphex.dev/org=my-org
```

3. Create an ExternalSecret in your application namespace:

```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: my-app-secrets
  namespace: my-app
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: org-my-org-store
    kind: ClusterSecretStore
  target:
    name: my-app-secrets
  data:
    - secretKey: github_token
      remoteRef:
        key: org-secrets
        property: github-token
```
The External Secrets Operator automatically syncs secrets from org-secrets to your application namespace.
Benefits:
- Centralized secret management per organization
- No need to duplicate secrets across namespaces
- Automatic synchronization and rotation support
- Namespace isolation via label selectors
For detailed API documentation, see api.md.
Each pipeline automatically receives an ArgoCD AppProject that enforces security boundaries:
Scoping Rules:
- Destinations: Only `{pipelineName}` and `{pipelineName}-*` namespaces
- Source Repositories: Only the specific GitHub repository
- Cluster Resources: None (empty whitelist)
- Namespace Resources: All resources within scoped namespaces
Benefits:
- Prevents cross-pipeline Application deployments
- Enforces namespace boundaries
- Restricts source repositories
- Prevents cluster-scoped resource creation
Lifecycle: Created during RepoBinding provisioning, deleted when RepoBinding is deleted.
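Applied to a hypothetical pipeline named `my-pipeline` bound to `acme-corp/my-application`, the scoping rules above would look roughly like this (a sketch, not the controller's actual generated manifest):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: my-pipeline             # matches {pipelineName}
  namespace: argocd
spec:
  destinations:                 # only the pipeline's own namespaces
    - namespace: my-pipeline
      server: https://kubernetes.default.svc
    - namespace: my-pipeline-*
      server: https://kubernetes.default.svc
  sourceRepos:                  # only the bound repository
    - https://github.com/acme-corp/my-application
  clusterResourceWhitelist: []  # no cluster-scoped resources allowed
  namespaceResourceWhitelist:   # any resource, but only inside scoped namespaces
    - group: "*"
      kind: "*"
```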
For detailed AppProject API, see api.md.
For detailed webhook troubleshooting procedures, see operations.md.
For user management procedures, see operations.md.
```bash
kubectl get secret authentik-secrets -n auth-system \
  -o jsonpath='{.data.admin-password}' | base64 -d
```
Bootstrap never prints secrets to stdout for security.
The platform uses real hostnames for OIDC authentication:
- Browser redirects require reachable URLs
- Internal Kubernetes DNS (`*.svc.cluster.local`) won't work
- Configure DNS for `*.home.local` in your router or hosts file
Create a RepoBinding resource:
```yaml
apiVersion: aphex.io/v1alpha1
kind: RepoBinding
metadata:
  name: my-repo-binding
  namespace: platform-system
spec:
  aphexOrg: "acme-corp"
  repoOrg: "acme-corp"
  repoName: "my-application"
  pipelineName: "cdktf-deploy-pipeline"
  templateRef: "cdktf-deploy-trigger-template"
```
For detailed onboarding procedures, see operations.md.
After RepoBinding reaches Ready phase, get webhook configuration from RepoBinding status and configure in GitHub repository Settings → Webhooks.
For webhook configuration details, see operations.md.
```bash
# List recent PipelineRuns
kubectl get pipelineruns -n my-app --sort-by=.metadata.creationTimestamp

# Get PipelineRun details
kubectl describe pipelinerun <name> -n my-app

# View logs for all tasks
kubectl logs -n my-app -l tekton.dev/pipelineRun=<name>

# Stream logs in real-time
kubectl logs -n my-app -l tekton.dev/pipelineRun=<name> -f
```
Create a PipelineRun manually:
```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: test-run
  namespace: my-app
spec:
  pipelineRef:
    name: my-pipeline
  params:
    - name: git-url
      value: "https://github.com/acme-corp/my-application"
    - name: git-revision
      value: "main"
  workspaces:
    - name: shared-data
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi
```
The platform is self-upgrading via GitOps:
- Update component versions in `platform/manifests`
- Commit and push to Git
- ArgoCD detects changes and syncs automatically
- Monitor application health in ArgoCD UI
Check the layered deployment status:
```bash
# Wave 10: cert-manager installation
kubectl get pods -n cert-manager
kubectl get job cert-manager-webhook-readiness -n cert-manager
kubectl logs job/cert-manager-webhook-readiness -n cert-manager

# Wave 20: Certificate creation
kubectl get clusterissuer selfsigned-issuer
kubectl get certificates -A

# Wave 30: Ingress resources
kubectl get ingress -A
```
Check the cert-manager webhook validation:
```bash
# Check webhook readiness job
kubectl get job cert-manager-webhook-readiness -n cert-manager
kubectl logs job/cert-manager-webhook-readiness -n cert-manager

# Check webhook endpoints
kubectl get endpoints cert-manager-webhook -n cert-manager

# Check webhook configuration
kubectl get validatingwebhookconfiguration cert-manager-webhook \
  -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | base64 -d | openssl x509 -text -noout

# List all applications with status
kubectl get applications -n argocd

# Get detailed application status
kubectl describe application platform-cert-manager -n argocd

# Check sync waves and ordering
kubectl get applications -n argocd \
  -o custom-columns="NAME:.metadata.name,WAVE:.metadata.annotations.argocd\.argoproj\.io/sync-wave,STATUS:.status.sync.status,HEALTH:.status.health.status"
```
Use port-forward to bypass ingress and authentication:
```bash
# Port-forward to ArgoCD server
kubectl port-forward svc/argocd-server -n argocd 8080:443

# Get admin password
kubectl get secret argocd-initial-admin-secret -n argocd \
  -o jsonpath='{.data.password}' | base64 -d

# Access at https://localhost:8080
# Username: admin, Password: (from above)
```
If authentication is completely broken:
```bash
# Delete auth-system pods to restart
kubectl delete pods -n auth-system --all

# Check Config Sync Job status
kubectl get job auth-config-sync -n auth-system
kubectl logs job/auth-config-sync -n auth-system

# Verify Dex scaling
kubectl get deployment dex -n auth-system
```
The platform is designed for portability:
- Deploy ingress controller to real cluster
- Configure real DNS for `*.home.local` (or your domain)
- Update ClusterIssuer for Let's Encrypt (optional)
- Run bootstrap script with the `--use-existing` flag
- Platform converges identically to Kind deployment
The only differences are ingress controller and DNS configuration.
This usually means ArgoCD can't access the Git repository:
- Verify repository URL is correct and accessible
- Check if repository is private (may need access tokens)
- Ensure network connectivity from cluster to GitHub
Check the layered cert-manager deployment:
- Verify cert-manager pods are Running
- Check webhook readiness job completed successfully
- Verify ClusterIssuer is ready before certificates
Verify OIDC configuration:
- Ensure hostnames are reachable from browser
- Check DNS configuration for `*.home.local`
- Verify ingress controller is accessible
- Never use `*.svc.cluster.local` URLs for browser redirects
Check ingress and TLS configuration:
- Verify ingress controller is running
- Check certificate status (`kubectl get certificates -A`)
- Verify DNS resolution from your device
- Check ingress resource configuration
Source
- `platform/bootstrap/bootstrap.sh` - Bootstrap implementation and troubleshooting
- `platform/cert-manager/webhook-readiness-hook.yaml` - cert-manager validation
- `platform/auth/` - Authentication system components
- `platform/argocd/apps/` - ArgoCD application definitions
## Operational Questions
### What should I do if my repository isn't triggering pipelines?
**Diagnosis**:
```bash
# Check if EventListener exists
kubectl get eventlistener -n <tenant-namespace>
# Check EventListener logs for webhook events
kubectl logs -n <tenant-namespace> -l eventlistener=github-listener | grep "webhook"
# Check if Ingress exists
kubectl get ingress -n <tenant-namespace>
# Check RepoBinding status
kubectl describe repobinding <name> -n pipeline-system
```
Common Issues:
- EventListener not running (check onboarding controller logs)
- Ingress not configured correctly (check Ingress controller installation)
- GitHub webhook not configured (check GitHub webhook settings)
- Webhook secret mismatch (check RepoBinding status for correct secret)
Diagnosis:
```bash
# Get PipelineRun status
kubectl get pipelinerun <name> -n <tenant-namespace>

# Get detailed status
kubectl describe pipelinerun <name> -n <tenant-namespace>

# Get pod logs
kubectl logs -n <tenant-namespace> -l tekton.dev/pipelineRun=<name>

# Check pod events
kubectl get events -n <tenant-namespace> --sort-by='.lastTimestamp'
```
Common Issues:
- Git clone failure: Check repository access
- CDKTF synth failure: Check Node.js dependencies and syntax
- CDKTF deploy failure: Check Terraform state and permissions
- RBAC denial: Check service account permissions
The platform upgrades itself via ArgoCD when manifests change in Git:
```bash
# Update component manifests in Git
vi platform/platform-controller/controller-deployment.yaml  # Update image tag

# Commit changes
git add .
git commit -m "Update onboarding controller to v1.1.0"
git push

# ArgoCD will automatically sync and update the controller
# Watch sync status
kubectl get application -n argocd -w
```
Delete the RepoBinding:
```bash
kubectl delete repobinding <name> -n pipeline-system
```
Or manually delete the namespace:
```bash
kubectl delete namespace <tenant-namespace>
```
ArgoCD provides several advantages:
- Better UI for visualizing sync status
- More mature and widely adopted
- Better support for App of Apps pattern
- Easier to troubleshoot sync issues
- Strong community support
Tekton is the standard pipeline engine for Kubernetes-native CI/CD and provides:
- Native Kubernetes integration
- Reusable Tasks and Pipelines
- Strong community support
- Cloud-native design
- Better integration with Tekton Triggers for webhooks
Isolation is achieved through multiple layers:
- Kubernetes Namespaces: Each tenant gets dedicated namespace
- RBAC: Service accounts scoped to tenant namespace only
- Network Policies: Restrict inter-namespace communication
- Resource Quotas: Prevent resource exhaustion
- Terraform State: Isolated per tenant
The platform uses a root ArgoCD Application that manages child Applications for each component layer:
- platform-root: Manages all child Applications
- platform-crds: CRDs and foundational resources
- platform-infrastructure: Namespaces and RBAC
- platform-controllers: Onboarding controller
- platform-catalog: Tekton tasks, pipelines, triggers
This provides better separation of concerns, independent lifecycle management, and clearer troubleshooting.
Yes! You can deploy separate clusters for different environments:
- Development cluster (smaller, fewer resources)
- Staging cluster (production-like)
- Production cluster (larger, more resources)
Each cluster is independent with its own tenants and ArgoCD Applications.
Webhook secrets are generated by the Onboarding Controller using cryptographic randomness and stored in Kubernetes Secrets in the tenant namespace. EventListeners reference these secrets for webhook signature validation.
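Signature validation itself follows GitHub's standard HMAC-SHA256 scheme (the `X-Hub-Signature-256` header). The github ClusterInterceptor performs this check for the platform; the sketch below only illustrates the mechanism and is not platform code:

```python
import hashlib
import hmac

def verify_github_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check a GitHub X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids timing side channels when comparing signatures
    return hmac.compare_digest(expected, signature_header)

# Illustrative values only: the real secret lives in the webhook-<tenant-name> Secret
secret = b"example-webhook-secret"
body = b'{"ref":"refs/heads/main"}'
sig = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
assert verify_github_signature(secret, body, sig)
assert not verify_github_signature(secret, b"tampered", sig)
```

A delivery whose signature does not match the shared secret is rejected before any PipelineRun is created.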
Terraform backend credentials are stored in per-tenant Secrets. For Kubernetes backend, no external credentials are needed. The platform uses the Kubernetes backend by default for simplicity.
No. RBAC ensures service accounts can only access resources in their own namespace. Network policies prevent cross-namespace network access.
No. RBAC denies access to platform namespaces (platform-system, argocd, tekton-pipelines, etc.). Only the onboarding controller has permissions to create resources in platform namespaces.
```bash
# Delete existing secret
kubectl delete secret webhook-<tenant-name> -n <tenant-namespace>

# Delete and recreate RepoBinding to regenerate secret
kubectl delete repobinding <name> -n platform-system
kubectl apply -f repobinding.yaml

# Get new webhook secret from RepoBinding status
kubectl get repobinding <name> -n platform-system -o yaml

# Update GitHub webhook with new secret
```

```bash
# View PipelineRuns for a tenant
kubectl get pipelineruns -n <tenant-namespace>

# View events for a tenant
kubectl get events -n <tenant-namespace> --sort-by='.lastTimestamp'

# View EventListener logs for a tenant
kubectl logs -n <tenant-namespace> -l eventlistener=github-listener
```
Common Issues:
- ArgoCD cannot access Git repository (check repo-server logs)
- Sync policy not configured (check Application spec)
- Manifest errors in Git (check Application status)
- ArgoCD controller not running (check argocd namespace)
Resolution:
```bash
# Check Application sync status
kubectl get application platform-root -n argocd

# Check ArgoCD controller logs
kubectl logs -n argocd -l app.kubernetes.io/name=argocd-application-controller --tail=100

# Manually trigger sync
kubectl patch application platform-root -n argocd --type merge -p '{"operation":{"initiatedBy":{"username":"admin"},"sync":{"revision":"HEAD"}}}'
```
Common Issues:
- Controller not running (check pod status)
- Controller lacks RBAC permissions (check controller logs)
- RepoBinding validation failed (check RepoBinding status)
- Tekton Triggers not installed (check tekton-pipelines namespace)
Resolution:
```bash
# Check controller logs
kubectl logs -n platform-system -l app=platform-controller --tail=100

# Check controller pod status
kubectl get pods -n platform-system -l app=platform-controller

# Restart controller if needed
kubectl rollout restart deployment platform-controller -n platform-system
```
Common Issues:
1. Missing Core Interceptors: EventListener needs ClusterInterceptors (github, gitlab, cel, etc.) to validate webhooks

```bash
# Check if ClusterInterceptors exist
kubectl get clusterinterceptors

# If missing, install Core Interceptors
kubectl apply -f https://infra.tekton.dev/tekton-releases/triggers/previous/v0.34.0/interceptors.yaml
```

2. Missing Cluster-scoped RBAC: EventListener needs read permissions for ClusterInterceptor and ClusterTriggerBinding

```bash
# Check if ClusterRole exists
kubectl get clusterrole pipeline-runner-<tenant-name>

# If missing, delete and recreate RepoBinding
kubectl delete repobinding <name> -n platform-system
kubectl apply -f repobinding.yaml
```

3. Webhook Secret Missing: EventListener needs webhook secret for signature validation

```bash
# Check if secret exists
kubectl get secret webhook-<tenant-name> -n <tenant-namespace>
```
ClusterInterceptors are cluster-scoped Tekton Triggers resources that provide webhook validation and filtering capabilities. The Core Interceptors include:
- github: Validates GitHub webhook signatures and filters events
- gitlab: Validates GitLab webhook signatures and filters events
- cel: Evaluates CEL expressions for custom filtering
- bitbucket: Validates Bitbucket webhook signatures
- slack: Validates Slack webhook signatures
EventListeners reference ClusterInterceptors to validate incoming webhooks before creating PipelineRuns.
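A hedged sketch of how an EventListener wires in the github ClusterInterceptor. The names here (tenant namespace, secret, binding, and the trigger template besides `cdktf-deploy-trigger-template`) are illustrative, not the platform's actual generated manifests:

```yaml
apiVersion: triggers.tekton.dev/v1beta1
kind: EventListener
metadata:
  name: github-listener
  namespace: my-tenant                   # hypothetical tenant namespace
spec:
  serviceAccountName: pipeline-runner
  triggers:
    - name: on-push
      interceptors:
        - ref:
            name: github                 # cluster-scoped Core Interceptor
            kind: ClusterInterceptor
          params:
            - name: secretRef            # webhook secret for signature validation
              value:
                secretName: webhook-my-tenant
                secretKey: secret
            - name: eventTypes
              value: ["push"]
      bindings:
        - ref: github-push-binding       # hypothetical TriggerBinding
      template:
        ref: cdktf-deploy-trigger-template
```

Only deliveries that pass the interceptor chain reach the binding and template that create the PipelineRun.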
EventListener pods run with the tenant's pipeline-runner ServiceAccount and need to read cluster-scoped Tekton Triggers resources (ClusterInterceptor, ClusterTriggerBinding). These resources are cluster-scoped and cannot be accessed via namespace-scoped Roles.
The onboarding controller provisions a ClusterRole with read-only permissions for these resources and binds it to the tenant's ServiceAccount. This follows the principle of least privilege - tenants can only read cluster-scoped Tekton Triggers resources, not modify them.
Common Issues:
- Ingress not accessible from GitHub (check Ingress configuration)
- Webhook secret mismatch (check RepoBinding status)
- EventListener not running (check pod status)
- GitHub webhook not configured (check GitHub webhook settings)
Resolution:
```bash
# Check EventListener logs
kubectl logs -n <tenant-namespace> -l eventlistener=github-listener --tail=100

# Check Ingress configuration
kubectl get ingress -n <tenant-namespace> -o yaml

# Check GitHub webhook delivery logs
# Go to GitHub repository Settings → Webhooks → Recent Deliveries
```
After bootstrap completes and ArgoCD syncs the auth system:
1. Ensure DNS is configured so that `auth.home.local` resolves to your Ingress controller's IP
2. Open Authentik UI: `https://auth.home.local`
3. Accept the certificate warning (if using self-signed certificates)
4. Retrieve the admin password:

```bash
kubectl get secret authentik-secrets -n auth-system \
  -o jsonpath='{.data.admin-password}' | base64 -d
```

5. Login with username `admin` and the password from step 4
1. Login to Authentik UI at `https://auth.home.local`
2. Navigate to Directory → Users
3. Click Create
4. Fill in user details (username, email, name, password)
5. Assign user to groups (`admins` or `engineering`)
6. Click Create
Users can authenticate immediately - no pod restarts or configuration changes required.
platform-admins group:
- Full access to ArgoCD (can create, update, delete applications)
- Full access to Tekton Dashboard (can create, update, delete pipelines)
- Full CRUD access to all platform CRDs in all namespaces
- Can create and delete namespaces
- Superuser access to Authentik UI (can manage users and groups)
platform-operators group:
- Full access to platform CRDs (cannot create/delete namespaces)
- Can read logs and events in all namespaces for troubleshooting
- Read-only access to ArgoCD and Tekton Dashboard
platform-engineering group:
- Can create, read, update platform CRDs (no delete permissions)
- Restricted to user-* and team-* namespaces only
- Cannot access platform system namespaces
- Read-only access to ArgoCD and Tekton Dashboard
Symptom: "Cannot reach Dex" during login
1. Check Dex pod status:

```bash
kubectl get pods -n auth-system -l app=dex
kubectl logs -n auth-system -l app=dex
```

2. Verify Dex Ingress:

```bash
kubectl get ingress -n auth-system dex
curl -v https://dex.home.local/.well-known/openid-configuration
```

3. Check DNS resolution:

```bash
nslookup dex.home.local
```
Symptom: "Permission Denied" after successful login
1. Check user's group membership in Authentik UI

2. Verify RBAC permissions:

```bash
kubectl auth can-i create pipelines.platform.dev \
  --as=user@platform.local --as-group=platform-engineering -n user-alice
```

3. Check token claims:

```bash
# Decode JWT token to verify groups claim
# (Use jwt.io or similar tool to decode token)
```
Symptom: Break-glass access needed
```bash
# Use certificate-based admin access
kubectl --kubeconfig /etc/kubernetes/admin.conf get pods -n auth-system

# Fix OIDC issues using admin access
kubectl --kubeconfig /etc/kubernetes/admin.conf rollout restart deployment/dex -n auth-system
```
DNS configuration is required for user browsers to reach services via hostnames. OIDC authentication requires redirect URIs that browsers can reach.
Option 1: Router/Pi-hole DNS (Recommended)
Add A records in your home router or Pi-hole:
```
auth.home.local   → 192.168.1.100
dex.home.local    → 192.168.1.100
argocd.home.local → 192.168.1.100
tekton.home.local → 192.168.1.100
```
Replace 192.168.1.100 with your Ingress controller's IP address.
Option 2: Hosts File
Add entries to /etc/hosts on each device:
```bash
# Linux/macOS
sudo nano /etc/hosts

# Add these lines:
192.168.1.100 auth.home.local
192.168.1.100 dex.home.local
192.168.1.100 argocd.home.local
192.168.1.100 tekton.home.local
```
Find your Ingress controller IP:
```bash
kubectl get svc -n ingress-nginx ingress-nginx-controller
```
If you're using self-signed certificates (the simplest option for homelab), browsers will show certificate warnings. This is expected behavior.
To proceed:
- Chrome: Click "Advanced" → "Proceed to auth.home.local (unsafe)"
- Firefox: Click "Advanced" → "Accept the Risk and Continue"
- Safari: Click "Show Details" → "visit this website"
To avoid warnings, use Let's Encrypt with DNS-01 challenge (requires DNS provider API access). See .kiro/docs/operations.md for setup instructions.
Common Issues:
1. DNS not configured: Browser cannot reach `dex.home.local` or `auth.home.local`

```bash
# Test DNS resolution
nslookup auth.home.local
nslookup dex.home.local
```

2. User not in correct group: Check user group membership in Authentik UI
   - Navigate to Directory → Users → Select user → Groups tab
   - Ensure user is in the `admins` or `engineering` group

3. Redirect URI mismatch: Check Dex config and Authentik OIDC provider

```bash
# Check Authentik OIDC discovery
curl https://auth.home.local/application/o/dex/.well-known/openid-configuration

# Check Dex OIDC discovery
curl https://dex.home.local/.well-known/openid-configuration
```

4. Services not ready: Check pod status

```bash
kubectl get pods -n auth-system
kubectl get pods -n argocd
kubectl get pods -n tekton-pipelines
```
Common Issues:
1. Authentik not ready yet: Dex should start with replicas=0 and only scale to 1 after Authentik is configured

```bash
# Check Dex replicas
kubectl get deployment dex -n auth-system -o jsonpath='{.spec.replicas}'
# Should be 0 initially, then 1 after Config Sync Job completes
```

2. Invalid Dex configuration: Check ConfigMap syntax

```bash
kubectl get configmap dex-config -n auth-system -o yaml
```

3. Missing RBAC permissions: Check ServiceAccount and Role

```bash
kubectl get serviceaccount dex -n auth-system
kubectl get role dex -n auth-system
```
Resolution:
```bash
# Check Dex logs
kubectl logs -n auth-system deployment/dex

# Check Config Sync Job status
kubectl get job auth-config-sync -n auth-system
kubectl logs -n auth-system job/auth-config-sync
```
The Config Sync Job orchestrates the integration between Authentik and Dex by:
- Waiting for Authentik to be fully ready
- Reading the Dex client secret from Kubernetes Secrets
- Updating Authentik's OIDC provider with the client secret via API
- Verifying Authentik's OIDC discovery endpoint is working
- Scaling Dex from 0 to 1 replica (starts Dex now that Authentik is ready)
- Waiting for Dex to be fully ready
- Verifying Dex's OIDC discovery endpoint is working
This provides deterministic, observable, and retryable convergence without relying on timing-based hacks.
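The wait-and-verify pattern behind those steps can be sketched in shell. This is an assumed illustration of the pattern, not the actual Job script; the probe commands in the comments are hypothetical:

```shell
#!/bin/sh
# Retry a readiness check until it succeeds or a bounded number of attempts expires.
wait_for() {
  desc=$1; shift
  i=0
  until "$@"; do
    i=$((i + 1))
    [ "$i" -ge 30 ] && { echo "timeout waiting for $desc" >&2; return 1; }
    sleep 2
  done
  echo "$desc is ready"
}

# In the real Job the checks would be curl/kubectl probes, for example:
#   wait_for "Authentik discovery" curl -fsS https://auth.home.local/application/o/dex/.well-known/openid-configuration
#   wait_for "Dex discovery" curl -fsS https://dex.home.local/.well-known/openid-configuration
wait_for "demo check" true
```

Because each step either succeeds, retries, or fails loudly, the Job is observable and safely re-runnable.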
```bash
# Delete existing Job
kubectl delete job auth-config-sync -n auth-system

# ArgoCD will recreate the Job automatically
# Or manually apply:
kubectl apply -f platform/auth/config-sync/job.yaml

# Watch Job progress
kubectl logs -n auth-system -l app=auth-config-sync -f
```
Via Authentik UI (Recommended):
1. Login to Authentik UI
2. Navigate to Directory → Users → `admin`
3. Click Set password
4. Enter new password
5. Click Update
6. Update Kubernetes Secret:

```bash
kubectl create secret generic authentik-secrets \
  -n auth-system \
  --from-literal=secret-key="$(kubectl get secret authentik-secrets -n auth-system -o jsonpath='{.data.secret-key}' | base64 -d)" \
  --from-literal=admin-password="NEW_PASSWORD" \
  --dry-run=client -o yaml | kubectl apply -f -
```
1. Generate new secret:

```bash
NEW_SECRET=$(openssl rand -base64 32)
```

2. Update Kubernetes Secret:

```bash
kubectl create secret generic dex-secrets \
  -n auth-system \
  --from-literal=client-secret="$NEW_SECRET" \
  --dry-run=client -o yaml | kubectl apply -f -
```

3. Update Authentik OIDC provider:
   - Login to Authentik UI
   - Navigate to Applications → Providers → Dex OIDC Provider
   - Update Client Secret field
   - Click Update

4. Restart Dex:

```bash
kubectl rollout restart deployment/dex -n auth-system
```
1. Create new group in Authentik UI:
   - Navigate to Directory → Groups → Create
   - Enter group name (e.g., `developers`)
   - Click Create

2. Update ArgoCD RBAC policy:
   - Edit `platform/integrations/argocd-rbac-policy.yaml`
   - Add group mapping and permissions
   - Commit and push to Git
   - ArgoCD syncs changes automatically

3. Update Tekton RBAC policy:
   - Edit `platform/integrations/tekton-rbac.yaml`
   - Create ClusterRole with desired permissions
   - Create ClusterRoleBinding for new group
   - Commit and push to Git
   - ArgoCD syncs changes automatically
1. Create GitHub OAuth App:
   - Go to GitHub Settings → Developer settings → OAuth Apps
   - Click New OAuth App
   - Application name: `Platform Services`
   - Homepage URL: `https://auth.home.local`
   - Authorization callback URL: `https://auth.home.local/source/oauth/callback/github/`
   - Copy Client ID and Client Secret

2. Configure in Authentik UI:
   - Navigate to Directory → Federation & Social login
   - Click Create → GitHub
   - Enter Client ID and Client Secret
   - Configure organization/team filtering (optional)
   - Map GitHub teams to Authentik groups
   - Click Create

3. Test:
   - Logout of Authentik
   - Click Login with GitHub on login page
   - Authorize application
   - User is created in Authentik with mapped groups
Bootstrap never prints secret values to stdout or logs by default. This prevents accidental exposure in CI/CD logs, terminal history, or shared screens.
To retrieve secrets after bootstrap:
```bash
# Authentik admin password
kubectl get secret authentik-secrets -n auth-system \
  -o jsonpath='{.data.admin-password}' | base64 -d

# Dex client secret
kubectl get secret dex-secrets -n auth-system \
  -o jsonpath='{.data.client-secret}' | base64 -d

# Authentik API token
kubectl get secret authentik-api-token -n auth-system \
  -o jsonpath='{.data.token}' | base64 -d
```
For local debugging only, use the --show-secrets flag:
```bash
./platform/bootstrap/bootstrap.sh --show-secrets
```
Never use --show-secrets in production, CI/CD, or shared environments.
All authentication system components are managed by ArgoCD via GitOps. To make changes:
1. Update manifests in Git:

```bash
# Example: Update Authentik image version
vi platform/auth/authentik/server-deployment.yaml

# Commit changes
git add .
git commit -m "Update Authentik to v2024.2.2"
git push
```

2. ArgoCD detects and syncs changes automatically:

```bash
# Watch ArgoCD sync status
kubectl get application platform-auth -n argocd -w
```

3. Verify changes:

```bash
kubectl get pods -n auth-system
kubectl get application platform-auth -n argocd
```
No manual kubectl apply required - ArgoCD handles all deployments and updates.
An Ingress controller is required for OIDC authentication flows because:
- Browser-reachable URLs: OIDC requires redirect URIs that user browsers can reach (e.g., `https://dex.home.local/callback`)
- TLS termination: OIDC requires HTTPS for security
- Hostname-based routing: Different services need different hostnames (auth.home.local, dex.home.local, argocd.home.local)
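Putting those three requirements together, a minimal Ingress for Dex might look like the sketch below. The issuer annotation, TLS secret name, and backend port are assumptions consistent with the examples in this document (5556 is Dex's conventional port), not the platform's actual manifest:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dex
  namespace: auth-system
  annotations:
    cert-manager.io/cluster-issuer: selfsigned-issuer  # from the cert-manager layer
spec:
  ingressClassName: nginx
  tls:
    - hosts: [dex.home.local]      # TLS termination at the ingress
      secretName: dex-tls
  rules:
    - host: dex.home.local         # hostname-based routing
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: dex
                port:
                  number: 5556
```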
Install nginx-ingress (recommended for homelab):
```bash
# For Kind clusters
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml

# For bare-metal clusters
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/baremetal/deploy.yaml
```
Common Issues:
1. ArgoCD auto-sync disabled: Check Application sync policy

```bash
kubectl get application platform-auth -n argocd -o yaml | grep -A 5 syncPolicy
```

2. Application in error state: Check Application status

```bash
kubectl describe application platform-auth -n argocd
```

3. Invalid YAML syntax: Check for manifest errors in Git

```bash
kubectl get application platform-auth -n argocd -o json | \
  jq '.status.conditions[] | select(.type=="SyncError")'
```
Resolution:
```bash
# Manually trigger sync
kubectl patch application platform-auth -n argocd \
  --type merge -p '{"operation":{"initiatedBy":{"username":"admin"},"sync":{}}}'

# Or use ArgoCD UI
# https://argocd.home.local → Applications → platform-auth → Sync
```
Check Job status:
```bash
kubectl get job auth-config-sync -n auth-system
kubectl logs -n auth-system -l app=auth-config-sync
```
Common Issues:
1. Authentik not ready: Job waits for Authentik to be ready before proceeding

```bash
kubectl get pods -n auth-system -l app=authentik
kubectl logs -n auth-system -l app=authentik
```

2. OIDC provider not found: Authentik Blueprint did not create the OIDC provider

```bash
# Check if Blueprint ConfigMap exists
kubectl get configmap authentik-blueprints -n auth-system

# Check Authentik logs for Blueprint application
kubectl logs -n auth-system -l app=authentik | grep -i blueprint
```

3. API token invalid: Authentik API token has insufficient permissions

```bash
# Verify API token exists
kubectl get secret authentik-api-token -n auth-system

# Re-run bootstrap to recreate token
./platform/bootstrap/bootstrap.sh
```
Resolution: Fix the underlying issue and re-run the Job:
```bash
kubectl delete job auth-config-sync -n auth-system
# ArgoCD will recreate the Job automatically
```
Archon reads all Markdown files under .kiro/docs/ from this public GitHub repository. Documentation follows the contract defined in CLAUDE.md.
Update the relevant files under .kiro/docs/ and ensure changes are grounded in code. Include "Source" references to relevant files. Follow the 6-file structure (overview, architecture, operations, api, data-models, faq).
Follow the Archon documentation contract in CLAUDE.md:
- Keep sections small and focused (400-800 tokens)
- Use clear, direct language
- Maintain provenance (reference source files)
- No hallucinations (only document what exists)
- Avoid duplication (link instead of repeating)
- Use descriptive, specific headings
Source
- CLAUDE.md
- .kiro/steering/archon-docs.md
- README.md
- .kiro/specs/argocd-tekton-platform/design.md
- .kiro/specs/argocd-tekton-platform/requirements.md
- .kiro/specs/dex-authentication-platform/design.md
- .kiro/specs/dex-authentication-platform/requirements.md
- .kiro/docs/operations.md
- .kiro/docs/api.md
- platform/auth/README.md
- platform/auth/secrets/README.md
- platform/auth/ingress/README.md
- platform/auth/config-sync/README.md