@@ -1,21 +1,34 @@
---
name: gke-app-onboarding
description: >-
Onboards applications to GKE, covering containerization, deployment
manifests, and migration. Use when onboarding or deploying an application to
GKE for the first time, or containerizing an app for GKE. Don't use for
general GKE cluster administration or upgrades (use gke-basics or
gke-upgrades instead).
---

# GKE App Onboarding

This reference provides workflows for containerizing and deploying applications to GKE for the first time.
This reference provides workflows for containerizing and deploying applications
to GKE for the first time.

> **MCP Tools:** `apply_k8s_manifest`, `get_k8s_resource`, `get_k8s_rollout_status`, `get_k8s_logs`, `describe_k8s_resource`
> **MCP Tools:** `apply_k8s_manifest`, `get_k8s_resource`,
> `get_k8s_rollout_status`, `get_k8s_logs`, `describe_k8s_resource`

## Workflow

### 1. App Assessment

Before containerizing, assess the application:

- **Language & Framework**: Identify the tech stack
- **Dependencies**: List required libraries and external services
- **Configuration**: How is the app configured? (env vars, config files, secrets)
- **Statefulness**: Does it need persistent storage? (databases, file storage)
- **Networking**: Port mapping and protocol (HTTP, gRPC, TCP)
- **Health endpoints**: Does the app expose health check endpoints?
- **Language & Framework**: Identify the tech stack
- **Dependencies**: List required libraries and external services
- **Configuration**: How is the app configured? (env vars, config files,
secrets)
- **Statefulness**: Does it need persistent storage? (databases, file storage)
- **Networking**: Port mapping and protocol (HTTP, gRPC, TCP)
- **Health endpoints**: Does the app expose health check endpoints?
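
As a rough illustration of how the assessment answers carry into later steps, the sketch below maps each item onto a Kubernetes field. Everything specific in it is an assumption made for the example: the names `my-app-config`, `my-app-secrets`, and `my-app-data`, the `LOG_LEVEL` and `DB_PASSWORD` keys, port `8080`, and the `/healthz` path are hypothetical, and the Secret and PersistentVolumeClaim are assumed to already exist.

```yaml
# Hypothetical sketch: every name, key, port, and path is a placeholder.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-config
data:
  LOG_LEVEL: "info"              # Configuration: non-secret settings
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app-assessment-demo
  labels:
    app: my-app
spec:
  containers:
    - name: my-app
      image: "<REGION>-docker.pkg.dev/<PROJECT>/<REPO>/<IMAGE>:<TAG>"
      ports:
        - containerPort: 8080    # Networking: port the app listens on
          protocol: TCP
      envFrom:
        - configMapRef:
            name: my-app-config  # Configuration: env vars from the ConfigMap
      env:
        - name: DB_PASSWORD      # Configuration: secret value, not baked into the image
          valueFrom:
            secretKeyRef:
              name: my-app-secrets
              key: db-password
      readinessProbe:            # Health endpoints: only if the app exposes one
        httpGet:
          path: /healthz
          port: 8080
      volumeMounts:
        - name: data             # Statefulness: only if persistent storage is needed
          mountPath: /var/lib/my-app
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-app-data
```

A bare Pod is used here only to keep the mapping short; a real workload would normally put the same container spec inside a Deployment or StatefulSet.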

### 2. Containerization

@@ -38,14 +51,18 @@ ENTRYPOINT ["/server"]
```

**Best practices:**
- Use multi-stage builds to keep production images small
- Use distroless or minimal base images to reduce attack surface
- Run as non-root user
- Log to `stdout` and `stderr` for Cloud Logging collection

- Use multi-stage builds to keep production images small
- Use distroless or minimal base images to reduce attack surface
- Run as non-root user
- Log to `stdout` and `stderr` for Cloud Logging collection

**Alternatives:**
- **Cloud Native Buildpacks** — auto-detect language and build without a Dockerfile: `pack build <image> --builder gcr.io/buildpacks/builder:latest`
- **Skaffold** — development workflow tool for iterating on containerized apps: `skaffold dev`

- **Cloud Native Buildpacks** — auto-detect language and build without a
Dockerfile: `pack build <image> --builder gcr.io/buildpacks/builder:latest`
- **Skaffold** — development workflow tool for iterating on containerized
apps: `skaffold dev`

### 3. Image Management

@@ -60,7 +77,8 @@ docker build -t <REGION>-docker.pkg.dev/<PROJECT>/<REPO>/<IMAGE>:<TAG> .
docker push <REGION>-docker.pkg.dev/<PROJECT>/<REPO>/<IMAGE>:<TAG>
```

**Vulnerability scanning**: Enable automatic scanning in Artifact Registry to detect issues in base images and dependencies.
**Vulnerability scanning**: Enable automatic scanning in Artifact Registry to
detect issues in base images and dependencies.

```bash
# Check scan results
@@ -127,10 +145,12 @@ spec:
```

**Checklist for manifests:**
- Resource requests and limits set
- Liveness and readiness probes configured
- At least 2 replicas for production
- Service type appropriate (ClusterIP for internal, use Gateway API for external)

- Resource requests and limits set
- Liveness and readiness probes configured
- At least 2 replicas for production
- Service type appropriate (ClusterIP for internal, use Gateway API for
external)
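
The checklist above can be sketched as a minimal Deployment and Service pair. This is an illustrative assumption, not the manifest from the skill itself: the name `my-app`, port `8080`, the `/healthz` probe path, and the CPU and memory figures are placeholders to be replaced with values measured for the real application.

```yaml
# Hypothetical sketch: names, ports, probe paths, and resource sizes are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2                    # at least 2 replicas for production
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: "<REGION>-docker.pkg.dev/<PROJECT>/<REPO>/<IMAGE>:<TAG>"
          ports:
            - containerPort: 8080
          resources:             # requests and limits both set
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          readinessProbe:        # gates traffic until the app is ready
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 10
          livenessProbe:         # restarts the container if it stops responding
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: ClusterIP                # internal only; external traffic goes through Gateway API
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```

Sharing one path between both probes keeps the sketch short; applications that distinguish "alive" from "ready to serve" would usually expose separate endpoints.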

### 5. Deploy

@@ -154,7 +174,10 @@ kubectl get pods -l app=my-app
## Next Steps

Once the application is running on GKE:
- Configure autoscaling — see [gke-scaling.md](./gke-scaling.md)
- Set up observability — see [gke-observability.md](./gke-observability.md)
- Harden security — see [gke-security.md](./gke-security.md)
- Configure reliability (PDBs, topology spread) — see [gke-reliability.md](./gke-reliability.md)

- Configure autoscaling — see [gke-scaling.md](../gke-scaling/SKILL.md)
- Set up observability — see
[gke-observability.md](../gke-observability/SKILL.md)
- Harden security — see [gke-security.md](../gke-security/SKILL.md)
- Configure reliability (PDBs, topology spread) — see
[gke-reliability.md](../gke-reliability/SKILL.md)
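
For orientation, two of these follow-ups might look roughly like the sketch below: a PodDisruptionBudget and a CPU-based HorizontalPodAutoscaler for a Deployment labelled `app: my-app`. The object names, replica bounds, and the 70% utilization target are assumptions chosen for illustration; the linked skills remain the place to look for recommended values.

```yaml
# Hypothetical sketch: names, replica bounds, and thresholds are placeholders.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1                # keep at least one pod up during voluntary disruptions
  selector:
    matchLabels:
      app: my-app
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # percentage of the container's CPU request
```

Utilization-based scaling is computed against the pod's CPU request, so it depends on resource requests being set on the Deployment.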
@@ -1,8 +1,19 @@
---
name: gke-backup-dr
description: >-
Configures Backup for GKE and disaster recovery plans. Use when configuring
GKE backup policies, setting up disaster recovery, or restoring GKE clusters.
Don't use for generic database backups or persistent volume configuration
(use gke-storage instead).
---

# GKE Backup & Disaster Recovery

This reference provides workflows for protecting stateful workloads on GKE using Backup for GKE.
This reference provides workflows for protecting stateful workloads on GKE using
Backup for GKE.

> **MCP Tools:** `get_cluster`, `update_cluster`. **CLI-only:** `gcloud container backup-restore *`
> **MCP Tools:** `get_cluster`, `update_cluster`. **CLI-only:** `gcloud
> container backup-restore *`

## Workflows

@@ -38,9 +49,11 @@ gcloud container backup-restore backup-plans create <PLAN_NAME> \
```

**Options:**
- `--all-namespaces` — back up everything
- `--included-namespaces=<ns1>,<ns2>` — back up specific namespaces
- `--backup-encryption-key=<KEY>` — encrypt with Customer-Managed Encryption Key (CMEK)

- `--all-namespaces` — back up everything
- `--included-namespaces=<ns1>,<ns2>` — back up specific namespaces
- `--backup-encryption-key=<KEY>` — encrypt with Customer-Managed Encryption
Key (CMEK)

### 3. Create a Manual Backup

@@ -79,8 +92,11 @@ gcloud container backup-restore restores create <RESTORE_NAME> \

## Best Practices

1. **Automate backups**: Always use a cron schedule for production workloads
2. **Test restores regularly**: Restore to a separate namespace or cluster to verify data integrity
3. **Cross-region DR**: Store backups in a different region or configure cross-region restore plans
4. **Encrypt backups**: Use CMEK for compliance and security requirements
5. **Scope backups**: Back up specific namespaces rather than the entire cluster when possible to reduce restore complexity
1. **Automate backups**: Always use a cron schedule for production workloads
2. **Test restores regularly**: Restore to a separate namespace or cluster to
verify data integrity
3. **Cross-region DR**: Store backups in a different region or configure
cross-region restore plans
4. **Encrypt backups**: Use CMEK for compliance and security requirements
5. **Scope backups**: Back up specific namespaces rather than the entire
cluster when possible to reduce restore complexity
71 changes: 43 additions & 28 deletions skills/cloud/gke-basics/SKILL.md
@@ -1,11 +1,22 @@
---
name: gke-basics
description: "Plan, create, and configure production-ready Google Kubernetes Engine (GKE) clusters using the golden path Autopilot configuration. Covers Day-0 checklist, Autopilot vs Standard, networking (private clusters, VPC-native, Gateway API), security (Workload Identity, Secret Manager, RBAC hardening), observability, scaling, cost optimization, and AI/ML inference. WHEN: create GKE cluster, provision GKE environment, design GKE networking, secure GKE, optimize GKE cost, GKE autoscaling, GKE inference, GKE upgrade, GKE observability, GKE multi-tenancy, GKE batch, GKE HPC, GKE compute class."
description: >-
Plans, creates, and configures production-ready GKE clusters using the golden
path Autopilot configuration. Covers Day-0 checklist, Autopilot vs Standard,
networking, security, observability, scaling, cost optimization, and AI/ML
inference. Use when creating GKE clusters, provisioning GKE environments,
designing GKE networking, securing GKE, optimizing GKE cost, autoscaling, or
upgrading. Don't use if specialized skills for security, networking, scaling,
cost, storage, or upgrades are more applicable (use gke-security,
gke-networking, gke-scaling, gke-cost, gke-storage, or gke-upgrades instead).
---

# Google Kubernetes Engine (GKE) Basics

GKE is a managed Kubernetes platform on Google Cloud for deploying, scaling, and operating containerized applications. This skill defaults to the **golden path Autopilot configuration** — see [gke-golden-path.md](./references/gke-golden-path.md) for defaults, rules, and guardrails.
GKE is a managed Kubernetes platform on Google Cloud for deploying, scaling, and
operating containerized applications. This skill defaults to the **golden path
Autopilot configuration** — see [gke-golden-path](../gke-golden-path/SKILL.md)
for defaults, rules, and guardrails.

## Quick Start

@@ -19,31 +30,35 @@ kubectl create deployment hello-server \

## Reference Directory

Load the relevant reference based on trigger keywords. Prefer the most specific match; if ambiguous, ask the user to clarify.

| Scenario | Trigger Keywords | Reference |
|----------|-----------------|-----------|
| Core Concepts | Autopilot vs Standard, architecture, pricing, what is GKE | [core-concepts.md](./references/core-concepts.md) |
| Golden Path & Defaults | golden path, Day-0 checklist, production defaults, cluster defaults | [gke-golden-path.md](./references/gke-golden-path.md) |
| Cluster Creation | create cluster, new cluster, provision GKE | [gke-cluster-creation.md](./references/gke-cluster-creation.md) |
| Networking | private cluster, VPC, subnet, Gateway API, DNS, ingress, egress, datapath | [gke-networking.md](./references/gke-networking.md) |
| Security & IAM | Workload Identity, Secret Manager, RBAC, Binary Auth, hardening, audit, gVisor, IAM roles | [gke-security.md](./references/gke-security.md) |
| Scaling | HPA, VPA, autoscaler, autoscaling, NAP, scale pods, scale nodes | [gke-scaling.md](./references/gke-scaling.md) |
| Compute Classes | ComputeClass, machine family, Spot fallback, GPU node pool, node selection | [gke-compute-classes.md](./references/gke-compute-classes.md) |
| Cost | cost, savings, Spot VMs, rightsizing, CUD, optimize spend, budget | [gke-cost.md](./references/gke-cost.md) |
| AI/ML Inference | inference, model serving, LLM, GPU, TPU, GIQ, vLLM | [gke-inference.md](./references/gke-inference.md) |
| Upgrades | upgrade, maintenance window, release channel, patching, version | [gke-upgrades.md](./references/gke-upgrades.md) |
| Observability | monitoring, logging, Prometheus, Grafana, metrics, alerts, dashboards | [gke-observability.md](./references/gke-observability.md) |
| Multi-tenancy | multi-tenant, namespace isolation, team access, enterprise, RBAC planning | [gke-multitenancy.md](./references/gke-multitenancy.md) |
| Batch & HPC | batch, HPC, job queue, high performance, MPI, parallel | [gke-batch-hpc.md](./references/gke-batch-hpc.md) |
| App Onboarding | containerize, deploy app, Dockerfile, onboard, migrate to GKE | [gke-app-onboarding.md](./references/gke-app-onboarding.md) |
| Backup & DR | backup, restore, disaster recovery, CMEK | [gke-backup-dr.md](./references/gke-backup-dr.md) |
| Storage | storage, PVC, persistent volume, StorageClass, Filestore, GCS FUSE | [gke-storage.md](./references/gke-storage.md) |
| Reliability | PDB, health probe, liveness, readiness, topology spread, graceful shutdown | [gke-reliability.md](./references/gke-reliability.md) |
| Client Libraries | client library, client-go, kubernetes python, kubernetes java, kubernetes SDK | [client-library-usage.md](./references/client-library-usage.md) |
| Infrastructure as Code | Terraform, IaC, HCL, infrastructure as code | [iac-usage.md](./references/iac-usage.md) |
| MCP Server | MCP tools, MCP server, MCP setup | [mcp-usage.md](./references/mcp-usage.md) |
| CLI / Tools | gcloud, kubectl, commands, how to | [cli-reference.md](./references/cli-reference.md) |
| Production Audit | production readiness, compliance, golden path check | [gke-cluster-creation.md](./references/gke-cluster-creation.md) |
Load the relevant reference based on trigger keywords. Prefer the most specific
match; if ambiguous, ask the user to clarify. If a referenced sibling skill
(pointing to `..`) is not installed or cannot be accessed, inform the user that
they may need to install that specific skill (e.g., `gke-networking`), and fall
back to your general GKE knowledge.

Scenario | Trigger Keywords | Reference
---------------------- | ----------------------------------------------------------------------------------------- | ---------
Core Concepts | Autopilot vs Standard, architecture, pricing, what is GKE | [core-concepts.md](./references/core-concepts.md)
Golden Path & Defaults | golden path, Day-0 checklist, production defaults, cluster defaults | [gke-golden-path](../gke-golden-path/SKILL.md)
Cluster Creation | create cluster, new cluster, provision GKE | [gke-cluster-creation](../gke-cluster-creation/SKILL.md)
Networking | private cluster, VPC, subnet, Gateway API, DNS, ingress, egress, datapath | [gke-networking](../gke-networking/SKILL.md)
Security & IAM | Workload Identity, Secret Manager, RBAC, Binary Auth, hardening, audit, gVisor, IAM roles | [gke-security](../gke-security/SKILL.md)
Scaling | HPA, VPA, autoscaler, autoscaling, NAP, scale pods, scale nodes | [gke-scaling](../gke-scaling/SKILL.md)
Compute Classes | ComputeClass, machine family, Spot fallback, GPU node pool, node selection | [gke-compute-classes](../gke-compute-classes/SKILL.md)
Cost | cost, savings, Spot VMs, rightsizing, CUD, optimize spend, budget | [gke-cost](../gke-cost/SKILL.md)
AI/ML Inference | inference, model serving, LLM, GPU, TPU, GIQ, vLLM | [gke-inference](../gke-inference/SKILL.md)
Upgrades | upgrade, maintenance window, release channel, patching, version | [gke-upgrades](../gke-upgrades/SKILL.md)
Observability | monitoring, logging, Prometheus, Grafana, metrics, alerts, dashboards | [gke-observability](../gke-observability/SKILL.md)
Multi-tenancy | multi-tenant, namespace isolation, team access, enterprise, RBAC planning | [gke-multitenancy](../gke-multitenancy/SKILL.md)
Batch & HPC | batch, HPC, job queue, high performance, MPI, parallel | [gke-batch-hpc](../gke-batch-hpc/SKILL.md)
App Onboarding | containerize, deploy app, Dockerfile, onboard, migrate to GKE | [gke-app-onboarding](../gke-app-onboarding/SKILL.md)
Backup & DR | backup, restore, disaster recovery, CMEK | [gke-backup-dr](../gke-backup-dr/SKILL.md)
Storage | storage, PVC, persistent volume, StorageClass, Filestore, GCS FUSE | [gke-storage](../gke-storage/SKILL.md)
Reliability | PDB, health probe, liveness, readiness, topology spread, graceful shutdown | [gke-reliability](../gke-reliability/SKILL.md)
Client Libraries | client library, client-go, kubernetes python, kubernetes java, kubernetes SDK | [client-library-usage.md](./references/client-library-usage.md)
Infrastructure as Code | Terraform, IaC, HCL, infrastructure as code | [iac-usage.md](./references/iac-usage.md)
MCP Server | MCP tools, MCP server, MCP setup | [mcp-usage.md](./references/mcp-usage.md)
CLI / Tools | gcloud, kubectl, commands, how to | [cli-reference.md](./references/cli-reference.md)
Production Audit | production readiness, compliance, golden path check | [gke-cluster-creation](../gke-cluster-creation/SKILL.md)

*If you need product information not found in these references, use the Developer Knowledge MCP server `search_documents` tool.*