<p>Kubernetes has been super useful for me: it lets me define my
infrastructure as YAML files, given that the cluster is already set up.
I have only ever used k3s to set up a cluster, and I have mixed
feelings about it. Here they are.</p>
<p>TL;DR: If you don't know how to deploy Kubernetes from scratch,
should you be using Kubernetes at all?</p>
<h2 id="single-node-is-simple">✅ Single node is simple</h2>
<p>To set up a single-node cluster, you simply run one command, like it
says on their website: <code>curl -sfL https://get.k3s.io | sh -</code>,
and it works pretty well. The only thing is that it uses a lot of
memory, around 0.75 GiB, and I am not sure whether that is all useful
or just garbage that has not been cleaned up (let's rewrite it in Rust
instead of Go, anyone?). Ideally a single-node k3s cluster should not
use much more memory than installing Docker on the system, but it does.
It's OK though, memory is not expensive at all nowadays anyway.</p>
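<p>If you want to put a number on it yourself, here is a quick sketch
of how I checked (assuming k3s was installed as a systemd service,
which the install script sets up by default):</p>
<pre><code>$ sudo systemctl status k3s | grep -i memory
$ # or ask the kernel for the server process's resident set directly
$ ps -C k3s -o rss=,comm= | awk '{printf "%.2f GiB %s\n", $1/1048576, $2}'</code></pre>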
<h2 id="multinode-is-simple-and-secure">✅ Multinode is simple and
secure</h2>
<p>Spinning up a multinode cluster is simple too: pass
<code>--cluster-init</code> to the first node and set a join token so
that no other node can join without knowing the secret.</p>
<pre><code>curl -sfL https://get.k3s.io | K3S_TOKEN=SECRET sh -s - server --cluster-init</code></pre>
<p>Then for the next nodes, pass in the secret and the API server
address of the first node:</p>
<pre><code>curl -sfL https://get.k3s.io | K3S_TOKEN=SECRET sh -s - server \
  --server https://<ip or hostname of server1>:6443</code></pre>
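<p>Once the other servers have joined, you can sanity-check the cluster
from any node with the kubectl that k3s bundles:</p>
<pre><code>$ sudo k3s kubectl get nodes
# all of the servers should show up with STATUS Ready</code></pre>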
<h2 id="issue-1-secrets-encryption">😕 Issue 1: secrets encryption</h2>
<p>For some reason, by default k3s does not encrypt Kubernetes Secret
resources at rest; they are just put in the database unencrypted. This
is not the worst thing, depending on your threat model, but why not
just have them encrypted anyway? Luckily it is just one cluster
initialization parameter:</p>
<pre><code>curl -sfL https://get.k3s.io | K3S_TOKEN=$secret sh -s - server --cluster-init --secrets-encryption</code></pre>
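<p>k3s also ships a subcommand to check that encryption at rest is
actually on (at least on the versions I have used):</p>
<pre><code>$ sudo k3s secrets-encrypt status
# should report the encryption status as Enabled</code></pre>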
<h2 id="issue-2-default-lb">😕 Issue 2: Default LB</h2>
<p>One of the most widely used load balancers in the Kubernetes
ecosystem is MetalLB. In its layer 2 mode it is a failover load
balancer: instead of distributing incoming requests round-robin across
backends, it uses layer 2 address advertisements so that a single
backend node owns the service IP and serves requests at any given time;
if that node fails, another backend picks up the L2 advertisements. In
my use case this is exactly what I am looking for, since I don't want
to manage IP addresses outside of Kubernetes.</p>
<p>One drawback to this method of load balancing is that if a backend
is overloaded, either serving requests or pushing the maximum amount of
traffic its network link can handle, it can start failing over, further
degrading an already degraded service. This is less likely to happen
with round-robin load balancing. It should not be a problem for most
k3s use cases though, since k3s is geared towards small deployments
that would not be handling a lot of traffic.</p>
<p>By default k3s ships with a load balancer called <a
href="https://github.com/k3s-io/klipper-lb">ServiceLB</a>. It does not
seem that useful: it is essentially a ClusterIP served through ports on
each node. I want to deal specifically with IP addresses, so I needed
to install MetalLB and disable ServiceLB when spinning up the
cluster.</p>
<p>Installing k3s so far:</p>
<pre><code>$ curl -sfL https://get.k3s.io | K3S_TOKEN=$secret sh -s - server --cluster-init --secrets-encryption --disable=servicelb
$ <install metal LB CRDs></code></pre>
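<p>For reference, a minimal sketch of what that MetalLB step might look
like. The manifest URL pins a version I picked for illustration, and
the address range is a made-up example; adjust both for your
network:</p>
<pre><code>$ kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.8/config/manifests/metallb-native.yaml
$ # then tell MetalLB which addresses it is allowed to hand out,
$ # and that it should announce them over layer 2
$ kubectl apply -f - <<EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.240-192.168.1.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  ipAddressPools:
  - default-pool
EOF</code></pre>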
<h2 id="issue-3-bundled-traefik">😕 Issue 3: Bundled Traefik</h2>
<p>Traefik is yet another HTTP(S) reverse proxy, packaged here as an
“ingress” implementation. I have no issues with it so far, but I do
have an issue with the fact that it is bundled with k3s. In my use case
specifically, I want to serve both public services and private services
through the same cluster (meaning both public and private IP addresses
assigned to the cluster).</p>
<p>You cannot securely use the same reverse proxy for both: even though
a request came from the public internet through the public IP address,
it still hits the same load balancer as the internal services. That
means a well-crafted HTTP request can reach private services that were
never meant to be exposed to the public internet (see spoofing the Host
header).</p>
<p>It may be possible to separate this traffic within one instance, but
it would take a lot of effort and would probably be error prone and a
maintenance nightmare too. An easier way is to run two separate
instances of Traefik, each exposed on its own MetalLB IP address (one
public, one private), so the same ingress is never serving both public
and private requests. Since I need to run two instances of Traefik, and
there is no single k3s parameter that says “deploy two”, let me just
deploy both manually with CRDs.</p>
<p>Installing k3s so far:</p>
<pre><code>$ curl -sfL https://get.k3s.io | K3S_TOKEN=$secret sh -s - server --cluster-init --secrets-encryption --disable=servicelb --disable=traefik
$ <install metal LB CRDs>
$ <install Traefik CRDs for the first instance>
$ <install Traefik CRDs for the second instance></code></pre>
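<p>For illustration, here is roughly how the two instances could be
deployed with the Traefik Helm chart, each pinned to its own MetalLB
address. The release names, namespaces, ingress class names, and IPs
are all placeholders I made up; treat this as a sketch, not the one
true way:</p>
<pre><code>$ helm repo add traefik https://traefik.github.io/charts
$ # public-facing instance
$ helm install traefik-public traefik/traefik \
    --namespace traefik-public --create-namespace \
    --set ingressClass.name=traefik-public \
    --set service.spec.loadBalancerIP=203.0.113.10
$ # private/internal instance
$ helm install traefik-private traefik/traefik \
    --namespace traefik-private --create-namespace \
    --set ingressClass.name=traefik-private \
    --set service.spec.loadBalancerIP=10.0.0.10</code></pre>
<p>Each Ingress then selects which proxy serves it via its
<code>ingressClassName</code>, so a private service never ends up
behind the public IP.</p>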
<h2 id="issue-4-unencrypted-cni-by-default">😕 Issue 4: Unencrypted CNI
by default</h2>
<p>This is not the worst issue, but inter-node communication is carried
by default over the VXLAN protocol, which is pretty much L2 packets
tunneled over UDP to the node they need to reach, where the kernel
sends them on their merry L2 way. The issue is that this is not
encrypted, so a network eavesdropper can see all the traffic. You would
be wrong to say that because we are using HTTPS into the cluster all of
this is encrypted anyway, because the ingress controller actually
decrypts the traffic before forwarding it to the node that serves the
request.</p>
<p>If your threat model says that an attacker who can eavesdrop on the
network between your k3s nodes means it is already game over, then you
are fine. But why not just encrypt it anyway? It may cost something
like 2% of network bandwidth, which is not a lot, and the WireGuard
protocol is implemented in the Linux kernel, so there should not be
much latency or CPU overhead either.</p>
<pre><code>$ curl -sfL https://get.k3s.io | K3S_TOKEN=$secret sh -s - server --cluster-init --secrets-encryption --disable=servicelb --disable=traefik --flannel-backend=wireguard-native
$ <install metal LB CRDs>
$ <install Traefik CRDs for the first instance>
$ <install Traefik CRDs for the second instance></code></pre>
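<p>To convince yourself the tunnel is actually in use, you can inspect
the WireGuard interface that flannel creates on each node (it was named
<code>flannel-wg</code> on the version I tried):</p>
<pre><code>$ sudo wg show flannel-wg
# the peers listed should be your other nodes, and the transfer
# counters should climb as pods talk across nodes</code></pre>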
<h2 id="issue-5-flannel-does-not-work-after-a-while">😭 Issue 5: Flannel
does not work after a while</h2>
<p>I was testing the case of my 3-node cluster losing one node while
still being able to function, since a majority still exists. etcd was
able to switch leaders just fine, but when I brought the node back up,
something weird happened. A few of the pods that the cluster was
serving on that node did not come back up, and instead had a status of
Error or Unknown!</p>
<p>Describing the pods showed an error message:</p>
| 112 | +<pre><code>"Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"longhorn-manager-2d6cw_longhorn-system(fe3d388b-4766-4585-9cf2-ab677ca2f6e9)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"longhorn-manager-2d6cw_longhorn-system(fe3d388b-4766-4585-9cf2-ab677ca2f6e9)\\\": rpc error: code = Unknown desc = failed to setup network for sandbox \\\"91e16400f64262f5ab20bb4c83c8adb72a85ddc45ddebb483a02a26012869d4c\\\": plugin type=\\\"flannel\\\" failed (add): failed to load flannel 'subnet.env' file: open /run/flannel/subnet.env: no such file or directory. Check the flannel pod log for this node.\"" pod="longhorn-system/longhorn-manager-2d6cw" podUID="fe3d388b-4766-4585-9cf2-ab677ca2f6e9"</code></pre> |
<p>This is a scary message. Why was flannel not coming up? I checked,
and there is no file called <code>/run/flannel/subnet.env</code> on the
node, even though there should be. I checked the other two nodes, and
yes, they have this file. So does that mean I cannot restart a node at
all, or else that node will never be usable again? I really don't want
to have to delete the node and add another one every time something
happens! (Maybe I should do this anyway as immutable deployments, but I
digress.)</p>
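<p>In case it helps anyone hitting the same thing, this is roughly how
I compared the nodes (the hostnames are placeholders):</p>
<pre><code>$ for node in node1 node2 node3; do
    echo "== $node =="
    ssh "$node" cat /run/flannel/subnet.env
  done
# healthy nodes print FLANNEL_NETWORK/FLANNEL_SUBNET lines;
# the broken one fails with "No such file or directory"</code></pre>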
<p>What if I just don't use flannel at all, and install the well-loved
Calico instead? Luckily there is precedent for this: you just install
k3s without a CNI, and bolt one on later.</p>
<pre><code>$ curl -sfL https://get.k3s.io | K3S_TOKEN=$secret sh -s - server --cluster-init --secrets-encryption --disable=servicelb --disable=traefik --flannel-backend=none
$ <install metal LB CRDs>
$ <install Traefik CRDs for the first instance>
$ <install Traefik CRDs for the second instance>
$ <install Calico CRD></code></pre>
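<p>As a sketch, that last step might look like this with the Tigera
operator. The version is pinned arbitrarily for illustration, and the
pod CIDR must match what k3s hands out (10.42.0.0/16 by default):</p>
<pre><code>$ kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/tigera-operator.yaml
$ kubectl apply -f - <<EOF
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
    - cidr: 10.42.0.0/16
      encapsulation: VXLAN
EOF</code></pre>
<p>And since dropping flannel also drops the WireGuard encryption from
Issue 4, Calico has its own WireGuard option to get that back (enabled
through its FelixConfiguration resource).</p>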
<h2 id="conclusion">Conclusion 🤔</h2>
<p>k3s markets itself as a simple way to deploy Kubernetes, but this is
only true if you have a simple use case. For everything else, it pretty
much pushes you towards deploying the components of your cluster from
scratch, with every edge case not well thought out. This is not a bad
thing though, because I frequently wonder: if you do not know how to
deploy your own Kubernetes cluster from scratch, should you be given
the privilege of using Kubernetes at all? If one thing stops working,
you will have to learn how anyway.</p>