apecloud · weicao · May 15, 2026
diff --git a/docs/addon-vcluster-cluster-delete-convergence-guide.md b/docs/addon-vcluster-cluster-delete-convergence-guide.md
@@ -0,0 +1,175 @@
+# vcluster KB Cluster Delete Convergence
+
+> **Audience**: addon test authors writing per-run cleanup logic that deletes a KB Cluster CR and waits for the namespace to empty
+> **Status**: stable
+> **Applies to**: any KB addon test running inside vcluster (Loft / Sealos / hand-rolled) backed by a shared host K8s
+> **Companion docs**:
+> - [`addon-vcluster-bounded-convergence-window-guide.md`](addon-vcluster-bounded-convergence-window-guide.md) — chaos pod-delete + replacement window; this guide is the parallel for CR delete + finalizer chain
+> - [`addon-terminating-archive-before-force-finalizer-guide.md`](addon-terminating-archive-before-force-finalizer-guide.md) — never force / patch finalizer before evidence
+
+When a test deletes a KB `Cluster` CR (`kubectl delete cluster <name>` or
+`terminationPolicy: Delete`), the host-side teardown of pods, PVCs,
+InstanceSet, and namespace mappings inside vcluster is much slower than the
+same operation on single-tier K8s. If your test's `cleanup wait` is too
+short, you will misclassify slow-but-converging cleanup as a finalizer
+deadlock or a KB controller bug.
+
+## Symptom
+
+After `kubectl delete cluster <name>`:
+
+- The Cluster object stays in `Deleting` with the finalizer
+  `cluster.kubeblocks.io/finalizer`.
+- The Component condition `wait for the workloads to be deleted` keeps
+  looping for more than 3 min.
+- Pods are in phase `Failed/Error` with `deletionTimestamp` set and no
+  finalizer, but are not yet garbage-collected.
+- PVCs are `Terminating` with the `kubernetes.io/pvc-protection` finalizer.
+- InstanceSet shows `delete OK` in the controller log, but the object is
+  still visible from `kubectl get`.
+
+In single-tier k3d these typically clear within 2-3 min. In vcluster the
+same chain commonly takes 10-15 min, occasionally up to 25 min.
+
+## Mechanism (engine-neutral)
+
+```
+T0 : kubectl delete cluster X → Cluster has deletionTimestamp + cluster.kubeblocks.io/finalizer
+T0+: KB cluster-controller sees Cluster Deleting → cascades delete to Component → Component cascades to InstanceSet
+T0+: InstanceSet's owner-deletion handler issues delete for owned Pods, PVCs, headless Service
+T0+: vcluster syncer sees the vcluster-side Pod deletionTimestamp → propagates delete to host Pod
+T0+30s..3min: host kubelet runs preStop / SIGTERM / SIGKILL; host Pod enters Terminating
+T0+: host container runtime stops containers (engine flush, fsync); pod object GC
+T0+: vcluster syncer sees host Pod gone → propagates back to vcluster, removes vcluster Pod object
+T0+: vcluster InstanceSet's controller observes Pod gone → InstanceSet finalizer cleared → Component finalizer cleared → Cluster finalizer cleared
+T0+10..15min (typical): Cluster object actually disappears
+T0+25..30min (worst case): same but with engine taking longer to flush / fsync, or syncer GC running behind
+```
+
+Every "→" that crosses between vcluster and host is an async syncer hop with
+seconds to minutes of lag. Because the chain is long, the lag compounds: the
+multiplier vs single-tier k3d is 4-8x.
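+
+As a rough sanity check, the multiplier can be applied directly to a
+single-tier timeout. This is a minimal sketch; `scale_wait` is a hypothetical
+helper name and is not part of any existing test script.
+
+```bash
+# scale_wait BASE_SECONDS MULTIPLIER -> prints BASE_SECONDS * MULTIPLIER
+scale_wait() {
+  local base="$1" mult="$2"
+  echo $((base * mult))
+}
+
+# 180s single-tier cleanup wait at the low and high end of the 4-8x range:
+scale_wait 180 4   # 720
+scale_wait 180 8   # 1440
+```
+
+The high end (1440s) sits just under a 1500s (25 min) wait, so a 25 min
+window covers the whole 4-8x range for a 180s single-tier baseline.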
+
+## Recommended cleanup-wait baseline
+
+| Test step | Single-tier k3d | vcluster |
+|---|---|---|
+| Per-run `cleanup wait` after `K delete cluster` | 180s | **≥1500s (25 min)** |
+| Cross-test namespace clean check | 300s | **≥1800s (30 min)** |
+| Soak teardown after EXIT trap | 600s | **≥1800s (30 min)** |
+
+Concrete recommended snippet:
+
+```bash
+# After cleanup wait, check residual pods managed by KubeBlocks.
+# CRITICAL: preserve kubectl rc. `2>/dev/null | wc -l` would silently
+# convert an API timeout / RBAC denial / NotFound into "0 pods" and
+# misclassify env failure as a clean cleanup. Also: under `set -e`,
+# `stdout=$(K ...)` would exit before `pods_rc=$?` runs, so use the
+# `if cmd; then ok; else rc=$?; fi` form so the failing branch keeps
+# the rc. Use a per-iteration stderr file to avoid clobbering on retry.
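+# Runs inside a function (`local`, `return`); `K` is assumed to be the
+# test's kubectl wrapper.
+local deadline=$((SECONDS + 1500))   # 25 min; named to avoid confusion with the `cd` builtin
+local iter=0
+local pods="" pods_rc=0
+local cleanup_evd_dir="${EVD:-/tmp}/cleanup-wait"
+mkdir -p "$cleanup_evd_dir"
+while [ "$SECONDS" -lt "$deadline" ]; do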
+  iter=$((iter + 1))
+  local stderr_file="$cleanup_evd_dir/get-pod-$(printf '%04d' "$iter").err"
+  local stdout
+  if stdout=$(K get pod -n "$NS" \
+        -l "app.kubernetes.io/instance=$cluster" --no-headers \
+        2>"$stderr_file"); then
+    pods_rc=0
+  else
+    pods_rc=$?
+  fi
+  if [ "$pods_rc" -ne 0 ]; then
+    # API error: do NOT treat as "clean". Wait and retry; per-iter file
+    # is kept for evidence (no clobbering across retries).
+    echo "cleanup wait iter=$iter: kubectl rc=$pods_rc stderr=$(head -c 200 "$stderr_file")"
+    sleep 15
+    continue
+  fi
+  pods=$(printf '%s' "$stdout" | grep -c . || true)
+  [ "$pods" = "0" ] && break
+  sleep 15
+done
+if [ "$pods_rc" -ne 0 ]; then
+  echo "cleanup wait: API not healthy at the end (rc=$pods_rc) — route as env"
+  return 2
+fi
+echo "cleanup wait done. residual pods=$pods (rc=0 verified, iters=$iter)"
+```
+
+Three-track verdict (rc + stderr + observed count) avoids the silent-fallback
+trap. If `pods_rc != 0` at the end of the window, treat as environment, not
+"clean". See `addon-kubectl-pipeline-evidence-integrity-guide.md` for the
+general principle.
+
+If `pods > 0` (with rc=0) after 25 min, route to DevOps with the residual
+object names and the time window (per the escalation path in
+`addon-vcluster-bounded-convergence-window-guide.md`). Do NOT use
+`--force --grace-period=0` and do NOT patch the finalizer: either one
+destroys the diagnostic evidence and may corrupt cluster state.
+
+## When NOT to assume cleanup is stuck
+
+In vcluster, a cleanup that looks stuck usually still converges. Before escalating:
+
+1. Run the cleanup wait for at least the recommended baseline.
+2. Check `kubectl get events` for `delete ... successful` lines from
+   InstanceSet — proves the controller side has fired.
+3. Check `kubectl logs deploy/kb-kubeblocks -n kb-system` for `wait for the
+   workloads to be deleted` looping repeatedly with no error — proves KB is
+   waiting on InstanceSet, not deadlocked.
+4. If all of the above are normal, **wait longer**. Cluster will converge.
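+
+The log check in step 3 can be sketched as a small filter over the
+controller log. This is a hedged illustration: `kb_wait_loop_status` is a
+hypothetical helper, and matching the word `error` is a crude heuristic that
+assumes the controller log marks failures with that word.
+
+```bash
+# Reads controller log lines on stdin and prints one word:
+#   errors  - at least one line mentions "error" (case-insensitive)
+#   waiting - the "wait for the workloads to be deleted" message appears
+#             and no line mentions an error
+#   idle    - neither of the above
+kb_wait_loop_status() {
+  local log waits errs
+  log=$(cat)
+  # grep -c exits 1 on zero matches but still prints 0, hence `|| true`.
+  waits=$(printf '%s\n' "$log" | grep -c 'wait for the workloads to be deleted' || true)
+  errs=$(printf '%s\n' "$log" | grep -ci 'error' || true)
+  if [ "$errs" -gt 0 ]; then
+    echo "errors"
+  elif [ "$waits" -gt 0 ]; then
+    echo "waiting"
+  else
+    echo "idle"
+  fi
+}
+
+# Example usage (the 10m window is an arbitrary choice):
+#   kubectl logs deploy/kb-kubeblocks -n kb-system --since=10m | kb_wait_loop_status
+```
+
+A result of `waiting` matches the "KB is waiting on InstanceSet" case above;
+`errors` or `idle` means the log needs a manual read before waiting longer.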
+
+## When the cleanup IS stuck
+
+Stuck signals (all must still hold more than 25 min after the delete):
+
+- Cluster.deletionTimestamp + finalizer present
+- AND no progress from `kubectl get pod -n <ns>` for >10 min straight
+- AND host-side Pod (resolvable via `<pod>-x-<vc-ns>-x-<vc-name>` mapping)
+  is also stuck (request DevOps to read-only check)
+- AND KB controller log shows no recent reconciliation activity for that
+  cluster
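+
+To hand DevOps a concrete host-side object name, the mapping in the third
+signal can be composed mechanically. This is a minimal sketch:
+`host_pod_name` is a hypothetical helper, and it assumes the plain
+`<pod>-x-<vc-ns>-x-<vc-name>` pattern applies. The syncer may shorten very
+long names, so treat the result as a starting point for the lookup.
+
+```bash
+# host_pod_name POD VC_NS VC_NAME -> prints <pod>-x-<vc-ns>-x-<vc-name>
+host_pod_name() {
+  printf '%s-x-%s-x-%s\n' "$1" "$2" "$3"
+}
+
+# Example with made-up names:
+host_pod_name mysql-0 demo vc1   # mysql-0-x-demo-x-vc1
+```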
+
+In that case use the escalation packet:
+
+```text
+Environment blocker (vcluster cleanup stuck):
+- target: vcluster <name>, ns <ns>, cluster <cluster>
+- cluster delete time: <ts>
+- residual: pod-X phase Failed, PVC data-pod-X Terminating
+- finalizer present: cluster.kubeblocks.io/finalizer
+- controller log: "wait for the workloads to be deleted" looping >25 min
+- ruled out: delete not propagating (InstanceSet "delete OK" present in controller log)
+- exact action: read-only check host-side pod state + node kubelet + container runtime; if stuck, restart kubelet on affected node (host-side) without force-deleting vcluster objects
+- work continuing: other lanes / other vcluster
+```
+
+## Source observations
+
+- 2026-05-15: Henry's batch 4 self-induced kill of an in-flight cluster
+  (~25 min observed) was initially misclassified as a finalizer deadlock.
+  Mason's read-only investigation
+  (`mysql-cleanup-residual-chaos-b4-73664-mason-readonly-20260515T114316Z.tar.gz`,
+  sha `1e9d7fec0b904c1a797ee3684c4f179cc13d1f5f2c966b20ed3065c9932cb58b`)
+  showed that the delete event DID propagate through the syncer at T0+0, the
+  host kubelet entered its Killing path at T0+0, the host Pod was actually
+  GC'd at T0+4..13 min, and the vcluster mapping cleared at T0+13 min. No
+  deadlock, just slow convergence.
+- Henry's batch 4 / batch 5 / batch 6 / soak scripts updated to use 1500s
+  cleanup wait baseline after this finding.
+
+## Related skills / docs
+
+- `skills/soak-test-classification/SKILL.md` — classify long-run findings;
+  cleanup-stuck-but-eventually-converged is `external-environmental-cascade`,
+  not invariant break
+- `docs/addon-vcluster-bounded-convergence-window-guide.md` — chaos pod-delete
+  side of the same vcluster syncer multiplier story
+- `docs/addon-terminating-archive-before-force-finalizer-guide.md` — never
+  force-finalize before evidence