
Commit 55b4f6f

doc: Add k8s documentation for common commands and cluster management (#183)
1 parent abc0187

3 files changed

Lines changed: 177 additions & 2 deletions


docs/kubernetes.md

Lines changed: 175 additions & 0 deletions
@@ -0,0 +1,175 @@
# Kubernetes resources

(kubectl has been aliased to k in these examples: `alias k=kubectl`)

## Connecting to a pod

Find the pod you want to connect to:

```shell
k get pods
```

Note the name of the pod, then run `exec`:

```shell
k exec -it <pod name> -- sh
```

You should now be connected to the pod. _Note that this won't work if the container doesn't have a shell binary in it._

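If the pod runs more than one container, `exec` needs to know which one; the `-c` flag picks the container (the container name below is a placeholder):

```shell
# list the containers in the pod, then exec into a specific one
k get pod <pod name> -o jsonpath='{.spec.containers[*].name}'
k exec -it <pod name> -c <container name> -- sh
```
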
## Logs

You can stream the logs from a pod directly to your local machine:

```shell
k logs <pod id> -f
```

You can also stream the logs from an entire group of pods (such as a deployment):

```shell
k logs deployment/something -f --since=5m
```

Another useful tool is `stern`, which makes it easy to stream nicely formatted logs from multiple pods at once:

```shell
stern <partial pod id>
```

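If a container crashed and was restarted, the live stream won't show what went wrong; the `--previous` flag prints the logs from the last terminated instance of the container:

```shell
# logs from the previous (crashed) instance of the container
k logs <pod id> --previous
```
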
## Port Forwarding

You can use port forwarding between your local machine and a pod or service.
This can be helpful for debugging, for example by allowing you to `curl` a service directly from your local machine.

Find pods:

```shell
k get pods
```

Find services:

```shell
k get services
```

Forward a port to a specific pod:

```shell
k port-forward pod/something-cf99bd9d6-wx9h4 8080:3000
curl localhost:8080
```

_Or_ forward a port to a service:

```shell
k port-forward service/something 8080:80
curl localhost:8080
```

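`port-forward` blocks the terminal it runs in; a minimal sketch of backgrounding it for a one-off check (service name taken from the example above):

```shell
# start the tunnel in the background, test, then clean up
k port-forward service/something 8080:80 &
PF_PID=$!
sleep 2  # give the tunnel a moment to come up
curl localhost:8080
kill $PF_PID
```
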
## Useful utils

- Stern - tail logs from many containers at once
  - `brew install stern`
  - `stern <service name>`
- Kustomize - used during deployment to build and overlay k8s manifests
  - `brew install kustomize`
- Lens - desktop k8s dashboard / GUI
  - [https://k8slens.dev/](https://k8slens.dev/)
- Autocomplete - completion of commands and cluster resources is available for most shells (see the snippet after this list for making it persistent)
  - bash - `source <(k completion bash)`
  - zsh - `source <(k completion zsh)`
- Ingress Nginx kubectl [plugin](https://kubernetes.github.io/ingress-nginx/kubectl-plugin/) (useful for introspecting the nginx conf and debugging issues)
  - Install [krew](https://github.com/GoogleContainerTools/krew)
  - `k krew install ingress-nginx`
  - `k ingress-nginx --help`

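To make completion persist across sessions, you can load it from your shell's rc file; a sketch for zsh (assuming a default setup where `compinit` is already loaded):

```shell
# load kubectl completion in every new zsh session and apply it to the k alias
echo 'source <(kubectl completion zsh)' >> ~/.zshrc
echo 'alias k=kubectl' >> ~/.zshrc
echo 'compdef k=kubectl' >> ~/.zshrc
```
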
## Useful commands

- Start a bash shell (or any other command) inside an already-running pod
  - `k exec -it <pod name> -- bash`
  - _Note that if bash is not installed in the container you may need to start another shell like sh._
- Start an arbitrary container in the cluster
  - This can be very useful for starting a container within a namespace to be able to connect to other services
  - `k run -it --rm debug --image=ubuntu -- bash` (`--rm` deletes the pod when you exit)
- Inspect deployments / pods / services
  - Useful for quickly checking issues (env var / image tag) with Kubernetes objects
  - You can also specify a service or a deployment instead of a pod
  - `k describe pod <pod name>`
- Restart the pods in a deployment (for example, after changing a volume-mounted `configmap`)
  - As long as there is more than one pod, this will do a rolling restart, which means you should keep serving traffic as normal
  - `k rollout restart deployment <deployment name>`

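You can watch a restart (or any other rollout) complete from the command line; `rollout status` blocks until the new pods are ready or the rollout fails:

```shell
k rollout status deployment <deployment name>
```
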
## How do I replace a node in the cluster with zero downtime?

Occasionally you may need to replace a node, for example to pre-empt AWS rebooting it when they need to do maintenance.

This process can vary a bit depending on the amount of free resources you have available in your cluster.

**If you have enough free resources to be able to move all the pods from the node you want to replace to other nodes, it is as simple as telling k8s to drain the node and then terminating it:**

- `k get nodes`
- `k drain --ignore-daemonsets <node name>`
  - If it complains that it wants to delete local data for a pod, verify that this is okay and add the `--delete-local-data` flag (renamed `--delete-emptydir-data` in newer kubectl versions) - This should be fine for pods like `coredns`, `metrics-server`, etc., as the data is ephemeral.
- Then terminate the instance in the AWS Console. A new one will come up to replace it automatically.

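Before terminating the drained node, it can be worth watching the evicted pods get rescheduled and become ready on the remaining nodes (a quick sanity check, not required):

```shell
# -o wide shows which node each pod landed on
k get pods -A -o wide --watch
```
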
**If you are trying to keep your overhead as low as possible and your cluster wouldn't be able to accommodate reallocating the pods from the node, there are a couple of extra steps:**

- Stop the cluster autoscaler from trying to control the cluster size during the process
  - `k scale deployments/cluster-autoscaler -n kube-system --replicas=0`
- Find the name and desired capacity of the auto scaling group you want to change
  - `aws autoscaling describe-auto-scaling-groups --output text --query 'AutoScalingGroups[].[AutoScalingGroupName,DesiredCapacity]'`
- Find the instance id of the node you want to terminate
  - `aws ec2 describe-instances --output text --query 'Reservations[].Instances[].[InstanceId, PrivateDnsName]' --filters "Name=tag:aws:autoscaling:groupName,Values=<asg name>"`
- Bring up one new node
  - `aws autoscaling set-desired-capacity --auto-scaling-group-name <asg name> --desired-capacity <previous desired capacity +1>`
- Wait until the new node appears in the list; the total number of nodes should match the new desired capacity
  - `k get nodes`
- Drain pods onto other nodes in the cluster
  - `k drain --ignore-daemonsets <node name>`
  - If it complains that it wants to delete local data for a pod, verify that this is okay and add the `--delete-local-data` flag (renamed `--delete-emptydir-data` in newer kubectl versions) - This should be fine for pods like `coredns`, `metrics-server`, etc., as the data is ephemeral.
- Terminate the old instance and reduce the desired capacity
  - `aws autoscaling terminate-instance-in-auto-scaling-group --should-decrement-desired-capacity --instance-id <instance id starting with "i-">`
  - Wait for the instance to disappear from the node list
- Re-enable the cluster autoscaler
  - `k scale deployments/cluster-autoscaler -n kube-system --replicas=1`

*If you decide you don't want to terminate an instance after all, but you have already drained it, note that the cluster won't schedule any new pods onto that node until you uncordon it:*

- `k uncordon <node name>`

## How do I upgrade a cluster to a new version of EKS?

Occasionally you may need to upgrade an EKS cluster. This is usually a pretty painless process, and there's a ton of documentation online about it.

As part of this process you will need to upgrade the cluster itself as well as some core components: Kubernetes runs various applications as deployments or daemonsets in the `kube-system` namespace, such as `coredns`, `kube-proxy`, and the AWS VPC CNI plugin, `aws-node`.

This document has great instructions on upgrading all of the different pieces, including listing the appropriate versions of the core components for each version of Kubernetes:

[https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html](https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html)

When doing this from Terraform you should be able to go into the tf and change the version of the cluster. This starts an in-place upgrade rather than tearing down the cluster and rebuilding it. The upgrade makes the cluster inaccessible through the AWS console for about 20 minutes, ***though everything in the cluster should continue to work normally, serve traffic, etc.***

The process should be:

- Update the API version number in Terraform
- Update the AMI for the ASG in eks.tf to the EKS-optimized AMI for the corresponding version of EKS, and apply Terraform
  - See this page: [https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html](https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html)
  - This should update the worker group, but not affect any of the running nodes
- Update any core components if necessary, as described in the AWS update-cluster documentation
- Run terraform apply
- Drain and remove the old nodes from the cluster. New ones will come up in their place with the new AMI
  - `k get nodes`
  - `k drain --ignore-daemonsets <node name>`
  - Then terminate the instance in the AWS Console
  - Do the drain/delete process one node at a time, and wait for a new node to be available before moving on to the next. This will prevent any traffic from being lost.

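A quick way to confirm the versions before and after each step (the cluster name is a placeholder):

```shell
# control plane version as reported by AWS
aws eks describe-cluster --name <cluster name> --query 'cluster.version' --output text

# kubelet version on each node (VERSION column)
k get nodes
```
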
## More resources

[https://kubernetes.io/docs/reference/kubectl/cheatsheet/](https://kubernetes.io/docs/reference/kubectl/cheatsheet/)

[https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application/#debugging-pods](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application/#debugging-pods)

templates/terraform/environments/prod/main.tf

Lines changed: 1 addition & 1 deletion
```diff
@@ -101,7 +101,7 @@ module "prod" {

   # Logging configuration
   logging_type = "<% index .Params `loggingType` %>"
-  <% if ne (index .Params `loggingType`) "kibana" %># <% end %>logging_es_version = "7.7"
+  <% if ne (index .Params `loggingType`) "kibana" %># <% end %>logging_es_version = "7.9"
   <% if ne (index .Params `loggingType`) "kibana" %># <% end %>logging_az_count = "2"
   <% if ne (index .Params `loggingType`) "kibana" %># <% end %>logging_es_instance_type = "m5.large.elasticsearch"
   <% if ne (index .Params `loggingType`) "kibana" %># <% end %>logging_es_instance_count = "2" # Must be a multiple of the az count
```

templates/terraform/environments/stage/main.tf

Lines changed: 1 addition & 1 deletion
```diff
@@ -120,7 +120,7 @@ module "stage" {

   # Logging configuration
   logging_type = "<% index .Params `loggingType` %>"
-  <% if ne (index .Params `loggingType`) "kibana" %># <% end %>logging_es_version = "7.7"
+  <% if ne (index .Params `loggingType`) "kibana" %># <% end %>logging_es_version = "7.9"
   <% if ne (index .Params `loggingType`) "kibana" %># <% end %>logging_create_service_role = true # Set this to false if you need to create more than one ES cluster in an AWS account
   <% if ne (index .Params `loggingType`) "kibana" %># <% end %>logging_az_count = "1"
   <% if ne (index .Params `loggingType`) "kibana" %># <% end %>logging_es_instance_type = "t2.medium.elasticsearch"
```
