docs: add runner port-forward staleness keepalive (stop-gap) guide #126
Open
weicao wants to merge 1 commit into
Methodology body covers:
- Why kubectl port-forward can be "process alive but TCP stream dead"
- 6 hard rules: the monitor only probes and restarts (no proxying, no retries); INTERVAL and FAIL_THRESHOLD are separate settings; a precise pkill pattern (no collateral kill of other port-forwards); monitor liveness is logged to a file; the keepalive is classified as runner-harness only (not product / addon / KB fix evidence); when the in-cluster pattern is available, prefer it and retire the keepalive
- 5-point PR review checklist
- 3 anti-pattern vs. correct-pattern pairs
- Explicit framing as a stop-gap; the preferred structural fix is addon-runner-incluster-vcluster-access-pattern-guide.md

Appendix A is the OceanBase enterprise addon case: an N=3 attempt that hit port-forward staleness, a 1-sample keepalive landing (PID 87376, 3 restarts), and a later structural migration that retired the monitor. Explicit boundary: the keepalive evidence is a single sample and is not extrapolated to permanent elimination of staleness.
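The probe-only monitor and its two separate knobs can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the guide's actual monitor: `version_probe_argv`, `probe_ok`, `step`, `run_monitor`, the kubeconfig argument and the log line format are all hypothetical. The probe shells out to `kubectl get --raw /version` with `--request-timeout` plus an outer subprocess deadline, so a port-forward whose process is alive but whose stream hangs still counts as a failed probe.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


def version_probe_argv(kubeconfig: str, request_timeout_s: int = 5) -> List[str]:
    # Hypothetical kubeconfig whose server points at the local port-forward (assumption)
    return ["kubectl", "--kubeconfig", kubeconfig,
            f"--request-timeout={request_timeout_s}s", "get", "--raw", "/version"]


def probe_ok(argv: Sequence[str], timeout_s: float = 10.0) -> bool:
    """True only if the probe exits 0 within timeout_s; a hung stream counts as failure."""
    try:
        result = subprocess.run(list(argv), stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=timeout_s)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0


@dataclass
class MonitorState:
    consecutive_failures: int = 0
    restarts: int = 0


def step(state: MonitorState, ok: bool, fail_threshold: int) -> bool:
    """Fold one probe result into the state; return True when a restart is due."""
    if ok:
        state.consecutive_failures = 0
        return False
    state.consecutive_failures += 1
    if state.consecutive_failures < fail_threshold:
        return False
    state.consecutive_failures = 0  # count afresh after the restart
    state.restarts += 1
    return True


def run_monitor(probe: Callable[[], bool], restart: Callable[[], None],
                interval_s: float, fail_threshold: int, log_path: str,
                max_cycles: Optional[int] = None) -> MonitorState:
    """Probe every interval_s; restart only after fail_threshold failures in a row.
    The monitor never proxies or retries the runner's own requests."""
    state = MonitorState()
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        cycle += 1
        ok = probe()
        restarted = step(state, ok, fail_threshold)
        if restarted:
            restart()
        # Append one line per cycle; a log that stops growing means the monitor died
        with open(log_path, "a", encoding="utf-8") as log:
            stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
            log.write(f"{stamp} cycle={cycle} ok={ok} restarted={restarted} "
                      f"restarts={state.restarts}\n")
        time.sleep(interval_s)
    return state


if __name__ == "__main__" and len(sys.argv) > 1:
    # One-shot usage sketch: python pf_monitor.py <kubeconfig>
    print("probe ok" if probe_ok(version_probe_argv(sys.argv[1])) else "probe failed")
```

Keeping INTERVAL and FAIL_THRESHOLD separate means a single transient failure never triggers a restart, while worst-case detection latency stays roughly INTERVAL × FAIL_THRESHOLD plus probe timeouts. The restart hook is deliberately left to the caller.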
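The precise-pattern rule can be illustrated with a small Python helper, again a hypothetical sketch rather than the guide's code. It assumes every port-forward is launched with the fixed argument order `kubectl port-forward <target> <local>:<remote> [flags]`; `ere_escape`, `port_forward_pattern` and `dry_run` are illustrative names. `pkill -f` matches an extended regular expression against the full command line, and for the constructs used here (escaped literals and a `( |$)` group) Python's `re` behaves the same way, so `dry_run` previews which command lines the pattern would hit.

```python
import re
from typing import Iterable, List

# Metacharacters of POSIX extended regular expressions (the syntax pkill -f uses).
_ERE_SPECIAL = set(".[]()*+?{}|^$\\")


def ere_escape(text: str) -> str:
    """Backslash-escape ERE metacharacters so the text matches literally."""
    return "".join("\\" + ch if ch in _ERE_SPECIAL else ch for ch in text)


def port_forward_pattern(target: str, local_port: int, remote_port: int) -> str:
    """Match exactly one port-forward, assuming the (hypothetical) fixed argv layout
    `kubectl port-forward <target> <local>:<remote> [further args]`."""
    return (f"kubectl port-forward {ere_escape(target)} "
            f"{local_port}:{remote_port}( |$)")


def dry_run(cmdlines: Iterable[str], pattern: str) -> List[str]:
    """Preview which command lines the pattern would hit before running pkill -f."""
    return [cmd for cmd in cmdlines if re.search(pattern, cmd)]
```

Running `pgrep -f` with the same pattern before `pkill -f` gives the equivalent preview against live processes. The trailing `( |$)` keeps `8443:443` from also matching `8443:4430`, and the literal target followed by a space keeps `svc/vc` from matching `svc/vc-2`.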
Comment from the PR author (Contributor):
Blocking for merge:
The stop-gap framing is important and should stay: the port-forward keepalive is transitional, and new or long-running test paths should prefer host-runner + in-cluster vcluster API access.
weicao pushed a commit that referenced this pull request on May 17, 2026
…llout guide
Lessons captured from the 2026-05-18 SQL Server PITR PR #126 backport, second-round validation: after reverting to stock + delay, the backport image was still in the containerd cache on node3, so re-sideloading was not needed. Documents the probe pod recipe (nsenter + crictl images filter) and when re-sideload is actually required (node restart + GC / explicit rmi / node rebuild). This avoids unnecessary DevOps round-trips in multi-round agent loops.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
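The probe-pod check from this commit can be sketched as follows, assuming a privileged probe pod with `hostPID: true` scheduled on the node in question, so that `nsenter --target 1` enters the node's own namespaces where `crictl` is installed. `node_probe_argv`, `image_cached` and any image names used with them are hypothetical; the parser only relies on `crictl images` printing a header row followed by whitespace-separated IMAGE and TAG columns.

```python
from typing import List


def node_probe_argv(image_ref: str) -> List[str]:
    """Command to run inside a privileged, hostPID probe pod (assumption):
    enter PID 1's namespaces on the node and list images matching image_ref."""
    return ["nsenter", "--target", "1", "--mount", "--uts", "--ipc",
            "--net", "--pid", "--", "crictl", "images", image_ref]


def image_cached(crictl_images_output: str, repo: str, tag: str) -> bool:
    """True if `crictl images` output lists repo:tag, i.e. re-sideload is unnecessary."""
    lines = crictl_images_output.strip().splitlines()
    for line in lines[1:]:  # skip the IMAGE / TAG / IMAGE ID / SIZE header row
        fields = line.split()
        if len(fields) >= 2 and (fields[0], fields[1]) == (repo, tag):
            return True
    return False
```

A True result corresponds to the node3 situation described above: the image survived the revert, so re-sideloading can be skipped.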
Summary
New methodology doc addon-runner-portforward-staleness-keepalive-guide.md covering workstation-side kubectl port-forward silent TCP staleness (process alive, stream dead) and a lightweight /version probe + auto-restart stop-gap monitor. The doc explicitly frames the keepalive as transitional; the preferred structural fix is addon-runner-incluster-vcluster-access-pattern-guide.md.
Body (generic methodology, version-agnostic, no engine binding):
… precise pkill -f pattern that won't kill other port-forwards on the workstation; monitor liveness logged; framed as runner-harness only (NOT product / addon / KB / vcluster fix evidence); when the in-cluster runner pattern is feasible, prefer it and retire the keepalive.
Cross-references: addon-runner-incluster-vcluster-access-pattern-guide.md, addon-runner-openapi-schema-fetch-brittleness-guide.md, addon-test-runner-cadence-discipline-guide.md, addon-evidence-discipline-guide.md
Appendix A is the OceanBase enterprise addon case:
(A.1) The N=3 attempt RUN_ID pitr-runtime-runner-hardening-N3-... exceeded the 1800s T1 budget because of workstation port-forward staleness; the remote side was healthy throughout.
(A.2) Keepalive landing in RUN_ID pitr-runtime-pf-keepalive-N3-...: monitor PID 87376 alive for 47m21s with 3 auto-restarts (05:33:26Z / 05:34:13Z / 05:38:51Z); T1 PASSed where the prior attempt failed.
(A.3) A subsequent migration to the in-cluster pattern retired the monitor.
Explicit boundary: the keepalive result is a 1-sample landing and is not extrapolated to permanent immunity.
SKILL-INDEX.md updated: added an entry under ### 5. 改造 runner / 工具链 ("5. Rework runner / toolchain").
Test plan
🤖 Generated with Claude Code