This document provides context and guidelines for AI agents working in this repository. The project is a "home-ops" Infrastructure-as-Code (IaC) repository managing a Kubernetes cluster using Flux CD, Talos Linux, and SOPS for secret management.
- Task (`task`): The primary entry point for all operations. Usage: `task <task_name>`.
- Flux CD: Manages Kubernetes resources via GitOps.
- Talos Linux: The underlying OS for the Kubernetes nodes.
- SOPS/Age: Used for encrypting secrets. `age.key` is required (do not commit!).
- Kustomize: Used for Kubernetes manifest composition.
- Pre-commit: Enforces linting and formatting.
Before submitting changes, ensure all validations pass.
Run the configuration task which renders templates, checks secrets, and validates manifests:
`task configure`

Note: This may ask for confirmation to overwrite files.
- Validate Kubernetes Manifests: Runs `kubeconform` against all manifests in `kubernetes/flux` and `kubernetes/apps` (including Kustomize builds): `task kubernetes:kubeconform`
- Check for Broken Kustomize References: Scans `kustomization.yaml` files for missing file references: `python3 scripts/find_mistakes.py` (requires `fd` installed)
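
To make the broken-reference check concrete, here is a minimal sketch of the idea. It is not the contents of `scripts/find_mistakes.py`, which this document does not show. The function names `listed_resources` and `missing_references` are hypothetical, and the parser only understands simple block lists under a top-level `resources:` key.

```python
from pathlib import Path


def listed_resources(kustomization_text: str) -> list[str]:
    """Return entries under the top-level `resources:` key (simple block lists only)."""
    entries: list[str] = []
    in_resources = False
    for raw in kustomization_text.splitlines():
        line = raw.split(" #", 1)[0].rstrip()  # drop trailing comments
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if not line.startswith((" ", "-")):
            # Any other top-level key starts or ends the resources block.
            in_resources = line.strip() == "resources:"
            continue
        item = line.strip()
        if in_resources and item.startswith("- "):
            entries.append(item[2:].strip().strip("'\""))
    return entries


def missing_references(kustomization_file: Path) -> list[str]:
    """Return resource entries that do not exist next to the kustomization file."""
    base = kustomization_file.parent
    missing = []
    for entry in listed_resources(kustomization_file.read_text()):
        if "://" in entry:  # remote resources cannot be checked on disk
            continue
        if not (base / entry).exists():
            missing.append(entry)
    return missing
```

Calling `missing_references(Path("kubernetes/apps/<namespace>/<app_name>/kustomization.yaml"))` returns an empty list when every local reference resolves.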
Run all pre-commit hooks to check for YAML syntax, whitespace, and secrets:
`pre-commit run --all-files`

Hooks include: yamllint, trailing-whitespace, end-of-file-fixer, gitleaks.
- `task --list`: List all available tasks.
- `task reconcile`: Force Flux to reconcile the cluster.
When modifying Talos configuration (e.g., patches in talos/patches/), apply changes to each control plane node:
- Generate new configuration: `task talos:generate-config`
- Apply to individual nodes:
  - `task talos:apply-node IP=192.168.69.110  # k8s-0`
  - `task talos:apply-node IP=192.168.69.111  # k8s-1`
  - `task talos:apply-node IP=192.168.69.112  # k8s-2`
- Monitor changes: `kubectl get nodes` and `kubectl get pods -n kube-system`
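
The generate-then-apply sequence above could be scripted roughly as follows. This is a sketch under assumptions: the helper names `build_commands` and `apply_all` are hypothetical, and the node IPs are copied from the steps above. It defaults to a dry run so nothing touches the cluster unless explicitly requested.

```python
import subprocess

# Control plane node IPs as listed in the steps above.
CONTROL_PLANE_NODES = {
    "k8s-0": "192.168.69.110",
    "k8s-1": "192.168.69.111",
    "k8s-2": "192.168.69.112",
}


def build_commands(nodes: dict[str, str]) -> list[list[str]]:
    """Return the task invocations: one generate step, then one apply per node."""
    commands = [["task", "talos:generate-config"]]
    commands += [["task", "talos:apply-node", f"IP={ip}"] for ip in nodes.values()]
    return commands


def apply_all(nodes: dict[str, str], dry_run: bool = True) -> None:
    """Print each command; run it only when dry_run is False, stopping at the first failure."""
    for argv in build_commands(nodes):
        print(" ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(argv, check=True)
```

`apply_all(CONTROL_PLANE_NODES)` only prints the four commands. Passing `dry_run=False` executes them in order and aborts before the next node if one apply fails.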
Static pods (kube-apiserver, kube-controller-manager, kube-scheduler, etcd) are configured via Talos patches:
- Location: `talos/patches/controller/`
- Critical: Always set both `requests` and `limits` to prevent OOM kills (exit code 137).
- Example: `apiserver-resources.yaml` sets the kube-apiserver memory limit to 2Gi.
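
An OOM kill also shows up in pod status as a last termination with reason `OOMKilled` or exit code 137. The sketch below scans the JSON printed by `kubectl get pods -n kube-system -o json` for such containers. The function name `oom_killed_containers` is hypothetical; the fields it reads (`containerStatuses`, `lastState.terminated`, `restartCount`) come from the Kubernetes Pod status API.

```python
def oom_killed_containers(pod_list: dict) -> list[tuple[str, str, int]]:
    """Return (pod, container, restart_count) for containers last terminated by an OOM kill."""
    hits = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            # lastState.terminated is absent for containers that never restarted.
            terminated = status.get("lastState", {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled" or terminated.get("exitCode") == 137:
                hits.append((pod_name, status["name"], status.get("restartCount", 0)))
    return hits
```

One way to feed it is `json.load(sys.stdin)` in a small wrapper, with the kubectl output piped in.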
- Check for OOM kills in kernel logs: `talosctl -n <node-ip> dmesg | grep -i "oom\|kill"`
- Check pod restart counts: `kubectl get pods -n kube-system`
- Check resource usage: `kubectl top nodes`

- Formatting:
  - Indentation: 2 spaces.
  - No tabs.
  - No trailing whitespace.
  - Line length: No hard limit (disabled in `.yamllint.yaml`), but keep it readable.
- Linter Rules (`.yamllint.yaml`):
  - `truthy`: Use `"true"`, `"false"`, or `"on"` (quoted strings preferred for booleans in K8s to avoid type confusion).
  - `comments`: at least 1 space from content.
- Structure:
  - Applications go in `kubernetes/apps/<namespace>/<app_name>`.
  - Cluster config goes in `kubernetes/flux`.
  - Use `kustomization.yaml` to aggregate resources.
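
As a rough illustration of the formatting rules above (2-space indentation, no tabs, no trailing whitespace), the following sketch flags offending lines. It is not a replacement for yamllint, which remains the authoritative check, and the function name `style_problems` is hypothetical. It can misreport multi-line block scalars whose content is legitimately indented by an odd number of spaces.

```python
def style_problems(text: str) -> list[str]:
    """Return human-readable findings for tabs, trailing whitespace, and odd indentation."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if "\t" in line:
            problems.append(f"line {number}: tab character")
        if line != line.rstrip():
            problems.append(f"line {number}: trailing whitespace")
        indent = len(line) - len(line.lstrip(" "))
        # Blank lines are skipped; indentation must be a multiple of 2 spaces.
        if line.strip() and indent % 2 != 0:
            problems.append(f"line {number}: indentation is not a multiple of 2 spaces")
    return problems
```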
- NEVER commit unencrypted secrets.
- Use SOPS with Age encryption.
- Secrets should be in files named `*.sops.yaml` or similar, which are ignored by some linters but checked by `gitleaks`.
- If you need to add a secret, ask the user to handle the encryption or use the `task bootstrap:secrets` flow if appropriate.
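
Before staging a `*.sops.yaml` file, a quick heuristic can catch a secret that was never encrypted. SOPS-encrypted YAML carries a top-level `sops:` metadata key and values of the form `ENC[AES256_GCM,data:...]`. The sketch below checks for both. The function name `looks_sops_encrypted` is hypothetical, and this supplements `gitleaks` rather than replacing it.

```python
import re

# Encrypted SOPS values start with this marker.
ENC_VALUE = re.compile(r"ENC\[AES256_GCM,data:")


def looks_sops_encrypted(text: str) -> bool:
    """True if the YAML text has a top-level `sops:` key and at least one ENC[...] value."""
    has_metadata = any(line.startswith("sops:") for line in text.splitlines())
    return has_metadata and bool(ENC_VALUE.search(text))
```

A file that fails this check should not be committed until the user has handled the encryption.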
- Bash: Use `set -o errexit` and `set -o pipefail`. Validate inputs.
- Python: Follow basic PEP8. Used mostly for utility scripts in `scripts/`.
- Tasks should fail fast.
- When writing scripts, ensure exit codes are passed through correctly.
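
For Python utility scripts, failing fast and passing exit codes through could look like the sketch below. The names `run_passthrough` and `main`, and the usage string, are illustrative assumptions rather than code from this repository.

```python
import subprocess
import sys


def run_passthrough(argv: list[str]) -> int:
    """Run a command and return its exit code unchanged."""
    completed = subprocess.run(argv)
    return completed.returncode


def main(argv: list[str]) -> int:
    # Validate inputs before doing any work, and fail fast on bad usage.
    if not argv:
        print("usage: wrapper.py <command> [args...]", file=sys.stderr)
        return 2
    return run_passthrough(argv)
```

Ending the script with `sys.exit(main(sys.argv[1:]))` lets the calling task see the wrapped command's exit code, so a failure is not masked as success.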
- Safety First: Do not run `task configure` or `task bootstrap:*` without understanding the impact, as they can overwrite files.
- Context: Always check `Taskfile.yaml` and included taskfiles to understand how commands are constructed.
- Verification: After modifying YAML files, always run `task kubernetes:kubeconform` and `pre-commit run --all-files` to verify validity.
- Files: When creating new apps, follow the pattern of existing apps in `kubernetes/apps`.
- Cluster Diagnostics: When the user asks about cluster status, errors, or logs, ALWAYS use `kubectl` to gather information before responding. Use commands like:
  - `kubectl get kustomizations.kustomize.toolkit.fluxcd.io -A` to see kustomization status
  - `kubectl describe kustomization <name> -n <namespace>` for detailed errors
  - `kubectl logs -n flux-system deploy/kustomize-controller` for controller logs
- Git Operations: NEVER commit or push changes automatically. Always wait for explicit user approval before running any `git commit` or `git push` commands. Present a summary of changes and ask the user if they want to proceed with committing/pushing.
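
For the Cluster Diagnostics guideline above, the JSON form of the kustomization listing (`kubectl get kustomizations.kustomize.toolkit.fluxcd.io -A -o json`) can be summarized with a small helper. This is a sketch: the function name `not_ready` is hypothetical, and it assumes each Flux Kustomization reports a `Ready` condition under `status.conditions`.

```python
def not_ready(kustomizations: dict) -> list[tuple[str, str, str]]:
    """Return (namespace, name, message) for Kustomizations whose Ready condition is not "True"."""
    failing = []
    for item in kustomizations.get("items", []):
        meta = item["metadata"]
        conditions = item.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None or ready.get("status") != "True":
            message = ready.get("message", "") if ready else "no Ready condition reported"
            failing.append((meta["namespace"], meta["name"], message))
    return failing
```

Each returned entry is a candidate for a follow-up `kubectl describe kustomization <name> -n <namespace>`.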