Skip to content

docs: reconfigure SET GLOBAL persist race methodology + MariaDB case#236

Merged
weicao merged 1 commit into
mainfrom
helen/reconfigure-set-global-persist-race-20260519
May 19, 2026
Merged

docs: reconfigure SET GLOBAL persist race methodology + MariaDB case#236
weicao merged 1 commit into
mainfrom
helen/reconfigure-set-global-persist-race-20260519

Conversation

@weicao
Copy link
Copy Markdown
Contributor

@weicao weicao commented May 19, 2026

Summary

Sediment a false-success class that any addon team with restart-prone topologies and runtime-only reconfigure semantics may hit.

1. Engine-neutral methodology guide

  • docs/troubleshoot/addon-reconfigure-set-global-persist-race-guide.md explains the case where reconfigureAction only runs runtime-apply statements such as SET GLOBAL, CONFIG SET, or ALTER SYSTEM without persisting to disk, and a later engine process restart wipes the runtime state back to chart defaults.

Documents:

  • trigger conditions: runtime-only action, engine restart in the window, and stale startup config
  • symptom pattern: OpsRequest succeeds but live engine values read chart defaults after switchover / OOMKill / node drain / default rolling restart
  • common misjudgments to avoid
  • two-layer defense: complete ParametersDefinition + persisted override path loaded on engine startup
  • verify gate: PD dynamic hit, no rolling restart, all pods reflect new values, forced process restart preserves values

2. MariaDB case appendix

  • docs/cases/mariadb/mariadb-reconfigure-set-global-without-persist-race-case.md keeps only the MariaDB-specific evidence: semisync missing ParametersDefinition, switchover / promote restart, stale ConfigMap-mounted my.cnf, evidence table, timeline, four chart fixes, boundary, and artifact hashes.

Index updates

  • docs/troubleshoot/README.md adds the new methodology guide entry.
  • docs/SKILL-INDEX.md bumps MariaDB case count 15 → 16.

Test plan

  • git diff --check origin/main...HEAD
  • Changed markdown links resolve
  • Methodology page is engine-neutral; MariaDB details are in the case appendix
  • Methodology is at the 150-line budget; case is under 120 lines
  • README and SKILL-INDEX updated
  • Public hygiene clean for PR body, commit message, and changed docs

@weicao
Copy link
Copy Markdown
Contributor Author

weicao commented May 19, 2026

Docs gate HOLD for now. The topic is useful and the split (methodology + MariaDB case) is the right direction, but this PR is not merge-ready yet.

Blockers:

  1. Broken markdown links in changed docs.

    • docs/troubleshoot/addon-reconfigure-set-global-persist-race-guide.md links to ../addon-evidence-discipline-guide.md and ../addon-bounded-eventual-convergence-guide.md, but those files now live under ../test/ from this guide.
    • docs/cases/mariadb/mariadb-reconfigure-set-global-without-persist-race-case.md links to ../../addon-evidence-discipline-guide.md; that should also point at the current docs/test/ location.
    • PR body currently says markdown links are clean, so please update the test plan after fixing.
  2. Case doc is too long and repeats the methodology doc.

    • docs/cases/mariadb/mariadb-reconfigure-set-global-without-persist-race-case.md is 147 lines, above the default case target (<120).
    • The case repeats generic sections already covered by the guide: symptoms, evidence checklist, verify gate, and cross-addon lessons. Keep the case to MariaDB-specific root cause, evidence table, fix path, boundary, and artifact hashes; link back to the guide for the generic method.
  3. Standard intro metadata is incomplete.

    • The methodology guide has Audience and Status, but misses Applies to, Applies to KB version, and Affected by version skew.
    • The case doc misses Affected by version skew.
    • Please use the standard metadata block from addon-docs-writing so future readers know the version boundary.
  4. Public hygiene still has stale PR / commit wording.

    • PR body line about Claude Code / Codex agents should be neutral. Say “human engineers and agents” instead.
    • Commit body still says docs/SKILL-INDEX.md: bump mariadb case count 13 → 14, while the PR body and file state are 15 → 16.
    • Commit body also contains the same public-attribution literal wording. Please amend the commit message and re-run the hygiene grep across origin/main..HEAD.
  5. The guide needs the required problem paragraph before the plain-language section.

    • Right after the H1 + metadata, add one short paragraph that states the concrete problem: OpsRequest can succeed while runtime values revert after an engine process restart because the action only applied runtime state and did not persist it.
    • Then keep ## 先用白话理解这篇文档 as the next section.

After you fix and force-push, I’ll rerun: git diff --check, changed-doc link check, line budget, commit/PR-body public hygiene, and README/SKILL-INDEX consistency.

@weicao weicao force-pushed the helen/reconfigure-set-global-persist-race-20260519 branch from a5e225f to 4001551 Compare May 19, 2026 06:16
Add an engine-neutral troubleshooting guide for reconfigure actions that only apply runtime state, plus a MariaDB semisync case that shows how a process-level restart can make OpsRequest success revert to chart defaults.

The guide documents trigger conditions, symptom checks, evidence collection, a two-layer defense with ParametersDefinition and persisted override files, and a verify gate. The MariaDB case keeps the engine-specific evidence table, timeline, chart fix path, boundaries, and artifact hashes.

Update the troubleshoot README and MariaDB case count in SKILL-INDEX.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant