docs(controller): add root-cause vs fail-fast fix direction guide #240
Closed
weicao wants to merge 1 commit into
Problem
Reviewing controller bug fixes this week exposed a recurring failure mode: a PR adds an early `fail-fast` check at the entry that originally hit the bug, the symptom disappears in that one scenario, but the root cause in the controller state machine remains. Any other entry that later writes the same illegal state reproduces the bug.

Field example: PR #10254 v1 (`27fa72416`) added schema validation in `pkg/operations/reconfigure.go` so the Reconfigure Ops entry rejects an invalid request before writing `ComponentParameter.Spec.Desired`. That stops the Valkey false-success scenario from that one entry, but nothing prevents another writer (Cluster API, controller reconcile, or a future ops type) from putting the same illegal value into `Desired` and reaching the same false-success state. After three rounds of pushback from westonnnn, v3 (`08a7c6482`) moved the protection into the ComponentParameter controller's own processing of `Spec.Desired`, where it covers every entry.

If we do not write down this lesson, the team will keep accepting fail-fast PRs as a "fix" when they are actually short-term workarounds, and the root-cause work gets pushed off indefinitely.
Scope
New short guide `docs/controller/addon-controller-root-cause-vs-fail-fast-fix-direction-guide.md` (91 lines, one topic per file per `addon-docs-writing`).
Why
Two hard rules need to land in the team workflow:
The guide expresses both as a decision tree, an anti-pattern table, and a concrete case appendix drawn from PR #10254 v1-v6.
Verification
`addon-docs-writing` 3Q self-check passed (jargon removed, 60-second readability, first body paragraph states the problem).
[...] (`addon-controller-patch-identity-preservation-guide.md`, `addon-controller-pr-author-motivation-and-review-channel-guide.md`, `addon-controller-diagnostic-pr-scope-guide.md`) without rule duplication.
Related
`docs/controller/` subdirectory, one topic per file.