Commit 5e2a413

Automated build from blog source
1 parent 7b9b8da commit 5e2a413

2 files changed

Lines changed: 137 additions & 0 deletions

File tree

tech_blog.html

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ <h1 id="titl"></h1>
 
 <script>
 const blogs_map = [
+[7, "The joys and pains of k3s", "14 Mar 2026"],
 [6, "My first open source contribution", "29 Sep 2025"],
 [5, "What you dont know (in networking) can hurt you", "28 Aug 2025"],
 [4, "Be careful of Python for loops (or Python in general)", "29 Jul 2025"],

tech_blog/7.html

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
<p>Kubernetes has been super useful for me: it lets me have my
infrastructure as YAML files, given that the cluster is set up. I have
only ever used k3s to set up a cluster, and I have mixed feelings. Here
they are.</p>
<p>TL;DR: if you don't know how to deploy Kubernetes from scratch,
should you be using Kubernetes at all?</p>
<h2 id="single-node-is-simple">✅ Single node is simple</h2>
<p>To set up a single-node cluster, you just run one command, as it
says on their website: <code>curl -sfL https://get.k3s.io | sh -</code>,
and it works pretty well. The only catch is that it uses a lot of
memory, around 0.75 GB; I am not sure whether that is all working
memory or just garbage that has not been cleaned up (let's rewrite it
in Rust instead of Go, anyone?). Ideally a single-node k3s cluster
should not use much more memory than installing Docker on the system,
but it does. It's OK, memory is not expensive at all nowadays
anyway.</p>
<h2 id="multinode-is-simple-and-secure">✅ Multinode is simple and
secure</h2>
<p>Spinning up a multinode cluster is simple: pass
<code>--cluster-init</code> to the first node and create a join secret
so that no other node can join without knowing it.</p>
<pre><code>curl -sfL https://get.k3s.io | K3S_TOKEN=SECRET sh -s - server --cluster-init</code></pre>
<p>Then for the next nodes, pass in the secret and the API server
address of the first node:</p>
<pre><code>curl -sfL https://get.k3s.io | K3S_TOKEN=SECRET sh -s - server \
--server https://&lt;ip or hostname of server1&gt;:6443</code></pre>
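<p>Once all the nodes are up, a quick sanity check (k3s bundles
kubectl as a subcommand, so this works on any server node):</p>
<pre><code># all three nodes should show up and eventually report Ready
$ sudo k3s kubectl get nodes</code></pre>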
<h2 id="issue-1-secrets-encryption">😕 Issue 1: secrets encryption</h2>
<p>For some reason, by default k3s does not encrypt Kubernetes Secret
resources; they are just put in the database unencrypted. This is not
the worst thing depending on your threat model, but why not just have
them encrypted anyway? Luckily it is just one cluster initialization
parameter:</p>
<pre><code>curl -sfL https://get.k3s.io | K3S_TOKEN=$secret sh -s - server --cluster-init --secrets-encryption</code></pre>
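<p>k3s ships a <code>secrets-encrypt</code> subcommand you can use to
confirm this actually took effect; a quick sketch, run on a server
node:</p>
<pre><code># should report encryption as enabled; Secrets are then written
# to the datastore with an encryption prefix instead of plaintext
$ sudo k3s secrets-encrypt status</code></pre>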
<h2 id="issue-2-default-lb">😕 Issue 2: Default LB</h2>
<p>One of the most widely used load balancers in the Kubernetes
ecosystem is MetalLB. In layer 2 mode it is a failover load balancer:
instead of connecting incoming requests round robin to various
backends, it uses layer 2 IP address advertisements to decide which
backend is serving a given IP at a time; if that backend fails, another
one picks up the L2 advertisements. In my use case this is exactly
what I am looking for, since I don't want to manage IP addresses
outside of Kubernetes.</p>
<p>One of the drawbacks to this method of load balancing is that if a
backend is overloaded, either serving requests or pushing the maximum
amount of network traffic its link can handle, it can start failing
over, possibly degrading the service further. This is less likely to
happen with round-robin load balancing. It should not be a problem for
most k3s use cases, though, since k3s is geared towards small
deployments that would not be handling a lot of traffic.</p>
<p>By default k3s ships with a load balancer called <a
href="https://github.com/k3s-io/klipper-lb">ServiceLB</a>. It does not
seem that useful; it is essentially a ClusterIP served through a
NodePort. I want to deal specifically with IP addresses, so I needed
to install MetalLB and disable ServiceLB when spinning up the
cluster.</p>
<p>Installing k3s so far:</p>
<pre><code>$ curl -sfL https://get.k3s.io | K3S_TOKEN=$secret sh -s - server --cluster-init --secrets-encryption --disable=servicelb
$ &lt;install metal LB CRDs&gt;</code></pre>
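<p>For reference, the MetalLB side is just two small custom resources;
a minimal sketch, assuming MetalLB is installed in the
<code>metallb-system</code> namespace and that the pool name and
address range below (both made up) suit your LAN:</p>
<pre><code>apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lan-pool              # hypothetical pool name
  namespace: metallb-system
spec:
  addresses:
  - 10.0.0.240-10.0.0.250     # hypothetical free range on the LAN
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement         # announce the pool over L2 (ARP/NDP)
metadata:
  name: lan-l2
  namespace: metallb-system
spec:
  ipAddressPools:
  - lan-pool</code></pre>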
<h2 id="issue-3-bundled-traefik">😕 Issue 3: Bundled Traefik</h2>
<p>Traefik is yet another HTTP(S) reverse proxy, modeled as an
“ingress” implementation. I have no issues with it so far, but I do
have an issue with the fact that it is bundled with k3s. In my use
case specifically, I want to serve both public and private services
through the same cluster (which means both public and private IP
addresses assigned to the cluster).</p>
<p>You cannot securely use the same reverse proxy for both, since
there is no way to differentiate the traffic: even though a request
came from the public internet through the public IP address, it still
hits the same load balancer as the internal services, which means a
well-crafted HTTP request can reach private services never intended to
be exposed to the public internet (see spoofing the Host header).</p>
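<p>Concretely, the attack is as simple as this sketch (the public IP
and the internal hostname are both made up):</p>
<pre><code># point a request at the public IP but name an internal vhost;
# a shared proxy will happily route it to the private service
$ curl -k https://203.0.113.10/ -H &quot;Host: internal.example.lan&quot;</code></pre>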
<p>It may be possible to separate this traffic, but it would be a lot
of effort, probably error prone, and a maintenance nightmare too. An
easier way is to use two separate instances of Traefik, each exposed
on its own MetalLB IP address (one public, one private), so the same
ingress is not serving both public and private requests. Since I need
to run two instances of Traefik, and there is no single parameter in
k3s to say “deploy two”, let me just deploy both manually with
CRDs.</p>
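<p>The way I would pin each instance to its own IP is through its
Service: MetalLB lets you select the pool with an annotation. A sketch
for the public instance (the names, namespace, and pool are all
hypothetical):</p>
<pre><code>apiVersion: v1
kind: Service
metadata:
  name: traefik-public        # one Service per Traefik instance
  namespace: traefik-public
  annotations:
    metallb.universe.tf/address-pool: public-pool   # MetalLB pool selection
spec:
  type: LoadBalancer
  selector:
    app: traefik-public
  ports:
  - name: websecure
    port: 443
    targetPort: 8443          # wherever this instance listens</code></pre>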
<p>Installing k3s so far:</p>
<pre><code>$ curl -sfL https://get.k3s.io | K3S_TOKEN=$secret sh -s - server --cluster-init --secrets-encryption --disable=servicelb --disable=traefik
$ &lt;install metal LB CRDs&gt;
$ &lt;install Traefik CRDs for the first instance&gt;
$ &lt;install Traefik CRDs for the second instance&gt;</code></pre>
<h2 id="issue-4-unencrypted-cni-by-default">😕 Issue 4: Unencrypted CNI
by default</h2>
<p>This is not the worst, but inter-node communication is done by
default using the VXLAN protocol, which is pretty much L2 packets
tunneled over UDP to the node they need to go to, where the kernel
sends them on their merry L2 way. The issue is that this is not
encrypted, so a network eavesdropper can see all the traffic. You
would be wrong to say that since we are using HTTPS in the cluster,
all of this is encrypted anyway: the ingress controller actually
decrypts the packets before sending them on to the node that serves
them.</p>
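<p>You can see this for yourself; a sketch, assuming flannel's default
VXLAN port of 8472 and a node interface named <code>eth0</code>:</p>
<pre><code># watch the raw inter-node tunnel traffic; the inner packets,
# including any plain-HTTP pod-to-pod traffic, are visible
$ sudo tcpdump -ni eth0 udp port 8472</code></pre>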
<p>If your threat model says that an attacker who can eavesdrop on the
network between your k3s nodes means it is already game over, you are
fine, but why not just encrypt it anyway? It may take 2% network
bandwidth overhead or whatever, which is not a lot, and the WireGuard
protocol is implemented in the Linux kernel anyway, so there should
not be much latency or CPU overhead either.</p>
<pre><code>$ curl -sfL https://get.k3s.io | K3S_TOKEN=$secret sh -s - server --cluster-init --secrets-encryption --disable=servicelb --disable=traefik --flannel-backend=wireguard-native
$ &lt;install metal LB CRDs&gt;
$ &lt;install Traefik CRDs for the first instance&gt;
$ &lt;install Traefik CRDs for the second instance&gt;</code></pre>
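<p>A quick way to confirm the backend switch took effect (a sketch; I
believe flannel names the interface <code>flannel-wg</code>, but that
is worth double-checking on your nodes):</p>
<pre><code># list WireGuard interfaces and peers; each of the other nodes
# should appear as a peer with a recent handshake
$ sudo wg show</code></pre>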
<h2 id="issue-5-flannel-does-not-work-after-a-while">😭 Issue 5: Flannel
does not work after a while</h2>
<p>I was testing the case of my 3-node cluster losing one node and
still being able to function, since a majority still exists. etcd was
able to switch leaders just fine, but when I brought the node back up,
something weird happened. A few of the pods that the cluster was
serving on that node did not come back up, but instead had a status of
either Error or Unknown!</p>
<p>Describing the pods showed an error message:</p>
<pre><code>&quot;Error syncing pod, skipping&quot; err=&quot;failed to \&quot;CreatePodSandbox\&quot; for \&quot;longhorn-manager-2d6cw_longhorn-system(fe3d388b-4766-4585-9cf2-ab677ca2f6e9)\&quot; with CreatePodSandboxError: \&quot;Failed to create sandbox for pod \\\&quot;longhorn-manager-2d6cw_longhorn-system(fe3d388b-4766-4585-9cf2-ab677ca2f6e9)\\\&quot;: rpc error: code = Unknown desc = failed to setup network for sandbox \\\&quot;91e16400f64262f5ab20bb4c83c8adb72a85ddc45ddebb483a02a26012869d4c\\\&quot;: plugin type=\\\&quot;flannel\\\&quot; failed (add): failed to load flannel &#39;subnet.env&#39; file: open /run/flannel/subnet.env: no such file or directory. Check the flannel pod log for this node.\&quot;&quot; pod=&quot;longhorn-system/longhorn-manager-2d6cw&quot; podUID=&quot;fe3d388b-4766-4585-9cf2-ab677ca2f6e9&quot;</code></pre>
<p>This is a scary message. Why was flannel not coming up? I checked,
and there is no file called <code>/run/flannel/subnet.env</code> on
the node, even though there should be. I checked the other two nodes,
and yes, they have this file. So does that mean I cannot restart a
node at all, or else that node will never be usable again? I really
don't want to have to delete the node and add another one every time
something happens! (Maybe I should do this anyway as immutable
deployments, but I digress.)</p>
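<p>For context, on a healthy node that file is tiny, something like
this sketch (the values are illustrative; the network is k3s's default
cluster CIDR):</p>
<pre><code># /run/flannel/subnet.env on a healthy node (illustrative values)
FLANNEL_NETWORK=10.42.0.0/16
FLANNEL_SUBNET=10.42.1.1/24
FLANNEL_MTU=1420
FLANNEL_IPMASQ=true</code></pre>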
<p>What if I just don't use flannel at all, and install the well-loved
Calico instead? Luckily there is precedent for this: you just install
k3s without a CNI, and bolt one on later.</p>
<pre><code>$ curl -sfL https://get.k3s.io | K3S_TOKEN=$secret sh -s - server --cluster-init --secrets-encryption --disable=servicelb --disable=traefik --flannel-backend=none
$ &lt;install metal LB CRDs&gt;
$ &lt;install Traefik CRDs for the first instance&gt;
$ &lt;install Traefik CRDs for the second instance&gt;
$ &lt;install Calico CRD&gt;</code></pre>
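<p>The Calico part, for reference, boils down to roughly this sketch,
assuming the Tigera operator is already applied; the pod CIDR has to
match k3s's default of <code>10.42.0.0/16</code>. (I believe the k3s
docs also recommend <code>--disable-network-policy</code> when
bringing your own CNI, since Calico enforces network policy
itself.)</p>
<pre><code>apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
    - cidr: 10.42.0.0/16      # must match the k3s cluster CIDR
      encapsulation: VXLAN    # Calico's own encapsulation choice</code></pre>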
<h2 id="conclusion">Conclusion 🤔</h2>
<p>k3s markets itself as a simple way to deploy Kubernetes, but this
is only true if you have a simple use case; beyond that, it pretty
much pushes you towards deploying all the components of your cluster
from scratch, with plenty of edge cases not well thought out. This is
not a bad thing, though, because I frequently wonder: if you do not
know how to deploy your own Kubernetes cluster from scratch anyway,
should you be given the privilege to use Kubernetes at all? If one
thing stops working, you will have to learn how to anyway.</p>
