docs: add how-to guide for debugging Kubernetes charms #2498
tonyandrewmeyer wants to merge 3 commits into
Thanks for compiling this! I need to review in more detail, but have taken a first pass.
I think it would be easier for people to orient themselves if we move "Common failure modes" nearer the beginning of the doc - probably after "Know which container you’re looking at". I think that section is a great quick reference and should be (slightly) expanded by migrating other content from around the doc.
I've commented on the pieces I think we should move.
My thinking is that we should make a cleaner split between the why and the how. If you already know why you need to be reading a particular section, there should be minimal intro text. Get right into the how. But if you don't know which section you should be reading, "Common failure modes" points you in the right direction and helps you understand why.
Let me know if you'd like to discuss this suggestion together. I'm also very happy to experiment with different structures if that would help.
```{tip}
If [`Container.can_connect()`](ops.Container.can_connect) returns `False` or your charm raises [`ops.pebble.ConnectionError`](ops.pebble.ConnectionError), the charm container cannot reach the workload's Pebble over that socket. This usually means the workload container hasn't started yet (no [`PebbleReadyEvent`](ops.PebbleReadyEvent) has fired) -- look at the pod first (see [](#k8s-inspect-the-pod)), not at your charm code.
```
Suggest migrating this tip
(k8s-debug-from-charm-container)=
## Debug from the charm container

Many production workload images are stripped down to just the application -- with no shell or utilities -- so `juju ssh --container` lands you nowhere useful. You can still run Pebble commands against that workload from the charm container, because the workload's socket is mounted there:
Suggest migrating most of this
(k8s-inspect-the-pod)=
## Inspect the pod at the Kubernetes layer

When a unit is stuck before Pebble is even reachable -- the container is `waiting`, the image won't pull, or the pod won't schedule -- the answer is below Juju, at the Kubernetes layer. Juju puts each model in its own namespace, and names each unit's pod `<app>-<unit-number>`.
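
As a rough illustration of the naming rule above, the following Python sketch derives a pod name from a unit name and assembles a `kubectl describe pod` command without running it. The helper names `pod_name` and `describe_pod_command` are hypothetical, and the sketch assumes the namespace has the same name as the Juju model.

```python
def pod_name(unit: str) -> str:
    """Convert a Juju unit name such as 'myapp/0' to the pod name 'myapp-0'."""
    app, sep, number = unit.partition("/")
    if not sep or not app or not number.isdigit():
        raise ValueError(f"not a unit name: {unit!r}")
    return f"{app}-{number}"


def describe_pod_command(model: str, unit: str) -> list[str]:
    """Build (but do not run) a kubectl command that describes the unit's pod.

    Assumption: the Kubernetes namespace is named after the Juju model.
    """
    return ["kubectl", "--namespace", model, "describe", "pod", pod_name(unit)]
```

The resulting list could be passed to `subprocess.run` or joined into a single string for pasting into a shell.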
Suggest migrating the first sentence
(k8s-common-failure-modes)=
## Common failure modes

| Symptom | Where to look |
| --- | --- |
| Charm stuck in `maintenance`/`waiting`; `can_connect()` is `False` | The workload container hasn't started -- `kubectl describe pod` for image-pull or scheduling errors ([](#k8s-inspect-the-pod)). |
| Service shows `backoff` or `error` | `pebble logs` for the crash output, then `pebble changes` / `pebble tasks` for the start failure ([](#k8s-pebble-cli)). |
| Config change has no effect on the running process | The charm added a layer but didn't [`replan`](#run-workloads-with-a-charm-kubernetes-replan); confirm with `pebble plan` and `pebble services`. |
| Charm raises `ConnectionError` mid-handler | The workload's Pebble became unreachable -- guard Pebble calls with `try`/`except` rather than `can_connect()` ([](ops.Container.can_connect)). |
| `pebble_custom_notice` never fires | Confirm the notice was recorded with `pebble notices`; check the `key` your handler matches on ([](#k8s-pebble-cli)). |
| Workload won't go ready despite running | A health check is failing -- `pebble checks` and `pebble check <name> --refresh` ([](#k8s-pebble-cli)). |
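
For the `ConnectionError` and `replan` rows, a minimal sketch of guarding Pebble calls with `try`/`except` might look like the following. The helper name `push_layer` is hypothetical, `container` is expected to behave like an `ops.Container`, and the import fallback is an assumption added only so that the sketch runs where `ops` is not installed.

```python
try:
    from ops.pebble import ConnectionError as PebbleConnectionError
except ImportError:  # illustration-only fallback when ops is not installed
    class PebbleConnectionError(Exception):
        """Stand-in for ops.pebble.ConnectionError."""


def push_layer(container, label, layer) -> bool:
    """Add a layer and replan; return False if Pebble is unreachable."""
    try:
        # Adding a layer alone does not restart services; replan applies it.
        container.add_layer(label, layer, combine=True)
        container.replan()
    except PebbleConnectionError:
        # Pebble became unreachable partway through; let the caller decide.
        return False
    return True
```

A handler could treat a `False` result as a cue to set a waiting status and return, rather than checking `can_connect()` up front and still risking a failure mid-handler.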
This PR adds a follow-on guide to the recent how-to for debugging, specifically focused on K8s charms and using Pebble.
At the recent sprints, we received a couple of comments that more information was needed for debugging in this specific case, so this PR addresses them.
The main focus is on Pebble, with a little on K8s directly, but it stops short of being a guide for debugging K8s itself.
Preview
Fixes #2489