chore(dev): add QA hardening, annotation scaffolding, and orchestrator improvements
- Add execution constraints to QA agent (10-15 tool calls, no jq/python)
- Expand QA checklist to 21 checks: implementation substance (14-15),
blast radius (16-17), completeness (18-21)
- Add EARS actor exceptions for must-document/must-not-change requirements
- Create annotate.py script for deterministic annotation scaffolding
with bidirectional link building and optional match application
- Add convergence detection to orchestrator QA loop
- Add initial setup phase to orchestrator (extract + implement before loop)
- Route QA implementation_issues back to implementer in orchestrator
- Add separate summary/explanation fields to annotation cards
- Add mobile-responsive layout to HTML template
`plugins/mcp-spec/agents/spec-qa.md` (36 additions, 11 deletions)
@@ -6,22 +6,26 @@ description: Use this agent as a quality gate on annotation artifacts. It valida
 You are a QA Agent for SEP annotation artifacts. Your job is to audit the quality of `meta-spec.json` and `annotations.json` and return a structured verdict.

+## Execution Constraints
+
+This is a quick checklist audit, not a deep investigation. Read the two JSON files and the SEP, run through the checks, and return the verdict. Use the Read tool to load the files — do not shell out to jq, python, or other tools to query the JSON. Parse and evaluate the data from what you read directly. Aim for 10-15 tool calls total.
+
 ## Input

 You will receive a SEP number. Read these files from `.reviews/SEP-{n}/`:

 - `meta-spec.json` — extracted requirements
 - `annotations.json` — annotation data
-- The original SEP from `seps/{n}-*.md`
+- The original SEP from `seps/{n}-*.md` (or `.reviews/SEP-{n}/sep-source.md` if it hasn't been merged)

 ## Checklist

 Run through every check below. For each failure, record the requirement ID and a specific description of the problem.

 ### Requirements Quality (meta-spec.json)

-1. **EARS format**: Every requirement's `summary` follows an EARS pattern (When/While/If/Where/The [actor] shall [action]). Flag summaries that are vague noun phrases ("Task ID handling") or missing an actor.
-2. **Specific actors**: The actor in each summary is a concrete party (receiver, requestor, server, client) — not "the system," "implementations," or passive voice.
+1. **EARS format**: Every requirement's `summary` follows an EARS pattern (When/While/If/Where/The [actor] shall [action]). Flag summaries that are vague noun phrases ("Task ID handling") or missing an actor. Exception: `must-document` and `must-not-change` requirements may use "The specification shall..." or "The protocol shall..." as their actor — these describe spec edits, not runtime behavior.
+2. **Specific actors**: The actor in each summary is a concrete party (receiver, requestor, server, client, specification, protocol) — not "the system," "implementations," or passive voice.
 3. **Affected paths present**: Every requirement has at least one entry in `affected_paths`. Empty arrays are failures.
 4. **Source quotes present**: Every requirement has a non-empty `source.quote`. The quote should be verbatim from the SEP (spot-check a few against the actual SEP text).
 5. **Group coherence**: Requirements within the same `group` are genuinely related. Flag requirements that seem miscategorized.
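Checks 1 and 2 are mechanical enough to sketch in code. Below is a minimal illustration of how a test harness could flag them; the `summary` field name follows the shapes this checklist describes, but the helper itself and its sample data are hypothetical, not part of the real tooling.

```python
import re

# EARS summaries open with When/While/If/Where/The; vague actors fail check 2.
EARS_OPENER = re.compile(r"^(When|While|If|Where|The)\b", re.IGNORECASE)
VAGUE_ACTORS = ("the system", "implementations")

def flag_ears_issues(requirements):
    """Return (req_id, problem) pairs for summaries failing checks 1-2."""
    issues = []
    for req_id, req in requirements.items():
        summary = req.get("summary", "")
        if not EARS_OPENER.match(summary):
            issues.append((req_id, "summary does not open with an EARS keyword"))
        if any(actor in summary.lower() for actor in VAGUE_ACTORS):
            issues.append((req_id, "summary uses a vague actor"))
    return issues

# Example: REQ-2 is a vague noun phrase and gets flagged.
reqs = {
    "REQ-1": {"summary": "When a task completes, the server shall emit a result."},
    "REQ-2": {"summary": "Task ID handling"},
}
issues = flag_ears_issues(reqs)
```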
@@ -31,27 +35,38 @@ Run through every check below. For each failure, record the requirement ID and a
 7. **No empty explanations**: Every annotation (including `not_addressed`) has a non-empty `explanation` field.
 8. **Explanation specificity**: Spot-check at least 5 satisfied annotations — each explanation should name specific code/text from the hunks it references. Flag generic explanations like "Documentation discusses X" or "Adds support for Y."
+8b. **Current-version language**: Explanations and summaries should describe spec behavior in terms of the current version only. Flag language that references old specification versions, describes migration paths, or explains backward-compatibility logic — unless a specific requirement explicitly asks for backward-compatibility documentation.
 9. **Multi-hunk synthesis**: For annotations with 3+ hunks, the explanation should reference what each hunk contributes. Flag annotations where the explanation doesn't mention their multiple locations.
 10. **No cross-product noise**: No requirement should be annotated on more than 8 hunks. Flag any that exceed this — it likely means the agent matched too broadly.
 11. **Reasonable annotation density**: Total annotations across all hunks should be roughly 1-3x the requirement count. If total annotations exceed 5x requirements, the matching was too aggressive.
 12. **Not-addressed explanations**: Every `not_addressed` annotation explains _why_ — was the feature removed? Is it a behavioral guideline? Deferred? Flag empty or unexplained not-addressed items.
 13. **Patch text present**: Spot-check that hunks in the top-level `files` array have non-empty `patch_text` fields. Note: the `hunks` arrays inside individual annotations in the `annotations` dict intentionally only contain `file` and `hunk_header` (they are references, not full data). Only check the `files` array for `patch_text`.
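The numeric thresholds in checks 10 and 11 can be expressed as a small validator. A sketch under the field names this checklist describes (`hunks` inside annotations, `annotations` ID lists inside file hunks); treat it as an illustration of the logic, not real tooling:

```python
def density_checks(annotations, files, max_hunks_per_req=8, max_ratio=5):
    """Checks 10-11: flag requirements annotated on too many hunks, and
    flag total annotation references exceeding max_ratio x requirement count."""
    noisy = [rid for rid, ann in annotations.items()
             if len(ann.get("hunks", [])) > max_hunks_per_req]
    total_refs = sum(len(h.get("annotations", []))
                     for f in files for h in f.get("hunks", []))
    too_dense = total_refs > max_ratio * max(len(annotations), 1)
    return noisy, too_dense
```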

+### Implementation Substance
+
+14. **Diff contains real spec changes**: The annotated diff should contain actual specification implementation — edits to `schema/draft/schema.ts`, `docs/specification/draft/**/*.mdx`, or similar source-of-truth files. The SEP markdown file itself (`seps/*.md`) is NOT the implementation; it is the proposal document. If the only changed file is the SEP itself, this is an error — the implementer has not yet produced spec changes.
+15. **Satisfied annotations reference implementation, not the SEP**: Spot-check satisfied annotations. Their hunk references should point to spec/schema files, not to the SEP file. A requirement cannot be "satisfied" by the proposal describing what should happen — it is satisfied by the implementation that makes it happen. Flag any satisfied annotation whose only hunks are in `seps/*.md`.
+
+### Blast Radius
+
+16. **No unaccounted spec changes**: Read the `files` array and identify any hunks that are NOT referenced by any annotation. These are spec changes that don't map to any requirement — they may be correct supporting changes, or they may represent undocumented scope creep. Flag files/hunks with zero annotation references so a reviewer can verify they're intentional.
+17. **Missing requirements**: Scan the SEP for concepts, methods, types, or behaviors that appear in the specification sections but have no corresponding requirement in the meta-spec. Compare the SEP's section headings and key terms against the requirement groups. Flag gaps where a SEP section has no requirements extracted from it.
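Check 16 reduces to a scan for hunks with empty annotation lists. A sketch, assuming the `path`, `hunks`, and `annotations` field names this checklist describes:

```python
def unaccounted_hunks(files):
    """Check 16: return (path, hunk_header) for every hunk in the
    top-level files array that no annotation references."""
    return [(f.get("path"), h.get("hunk_header"))
            for f in files
            for h in f.get("hunks", [])
            if not h.get("annotations")]
```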

 ### Completeness

-14. **Bidirectional hunk links**: Every annotation with status `satisfied`, `violated`, or `unclear` must have a non-empty `hunks` array in the `annotations` dict. Cross-check: for each annotation ID referenced in the `files` array's hunk `annotations` lists, verify the same hunk appears in the annotation's `hunks` array. Flag missing reverse links.
-15. **All requirements covered**: Every requirement ID from meta-spec.json appears as a key in `annotations`. Flag missing IDs.
-16. **Summary counts match**: The `summary` counts (satisfied + violated + unclear + not_addressed) equal the total number of annotations.
-17. **Generated files skipped**: `schema/draft/schema.json` and generated `schema.mdx` should not be major annotation sources — most annotations should reference `.ts` and `.mdx` source files.
+18. **Bidirectional hunk links**: Every annotation with status `satisfied`, `violated`, or `unclear` must have a non-empty `hunks` array in the `annotations` dict. Cross-check: for each annotation ID referenced in the `files` array's hunk `annotations` lists, verify the same hunk appears in the annotation's `hunks` array. Flag missing reverse links.
+19. **All requirements covered**: Every requirement ID from meta-spec.json appears as a key in `annotations`. Flag missing IDs.
+20. **Summary counts match**: The `summary` counts (satisfied + violated + unclear + not_addressed) equal the total number of annotations.
+21. **Generated files skipped**: `schema/draft/schema.json` and generated `schema.mdx` should not be major annotation sources — most annotations should reference `.ts` and `.mdx` source files.
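The bidirectional-link cross-check is the most error-prone item to do by eye, so here is what it looks like as code. The `file`/`path`/`hunk_header` field names are assumptions drawn from the shapes described in this checklist:

```python
def missing_reverse_links(files, annotations):
    """Bidirectional hunk links: every hunk -> annotation ID reference must
    have a matching annotation -> hunk entry; return the broken triples."""
    missing = []
    for f in files:
        for hunk in f.get("hunks", []):
            for ann_id in hunk.get("annotations", []):
                refs = annotations.get(ann_id, {}).get("hunks", [])
                if not any(r.get("file") == f.get("path")
                           and r.get("hunk_header") == hunk.get("hunk_header")
                           for r in refs):
                    missing.append((ann_id, f.get("path"), hunk.get("hunk_header")))
    return missing
```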

 ## Output

-Return a JSON object in your response. Issues are split into two categories so the caller knows which agent to dispatch for fixes:
+Return a JSON object in your response. Issues are split into three categories so the caller knows which agent to dispatch for fixes:

 ```json
 {
   "verdict": "pass" | "fail",
-  "score": "14/16",
+  "score": "19/21",
   "meta_spec_issues": [
     {
       "check": 1,
@@ -69,13 +84,23 @@ Return a JSON object in your response. Issues are split into two categories so t
 Write ONLY meta-spec.json, annotations.json, and annotated-diff.html. No summary.md, README, or other files.
 ```

 Save this new reviewer's agent ID (replacing the old one).
@@ -68,7 +75,7 @@ The QA agent found these annotation issues. Fix them in annotations.json and re-
 {paste annotation_issues JSON here}
 ```

-After the reviewer finishes, re-run `spec-qa` to verify. Allow up to 2 total QA rounds — if still failing after 2 fix attempts, report remaining issues to the user rather than looping further.
+After the reviewer finishes, re-run `spec-qa` to verify. **Convergence rule:** Track the QA score across attempts. If the score does not improve after one fix round, stop the QA loop and proceed — do not retry the same fixes. Maximum 2 fix rounds total. Report remaining warnings to the user but do not block on them.
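The convergence rule amounts to a small loop. A sketch of the control flow, where `run_qa` and `run_fixes` are placeholders for dispatching the spec-qa and reviewer agents, and `run_qa` returns the pass flag plus the numeric part of the "N/21" score:

```python
def run_qa_loop(run_qa, run_fixes, max_fix_rounds=2):
    """Re-run QA after each fix round; stop on pass, on a non-improving
    score, or after max_fix_rounds fix attempts."""
    passed, best = run_qa()
    rounds = 0
    while not passed and rounds < max_fix_rounds:
        run_fixes()
        rounds += 1
        passed, score = run_qa()
        if score <= best:  # no improvement after a fix round: stop looping
            break
        best = score
    return passed
```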
-Then follow the `spec-diff` skill instructions to annotate each hunk against the requirements. Write `annotations.json` to `.reviews/SEP-{sep_number}/annotations.json`.
+Then read the skeleton `annotations.json` and fill in each requirement's `status`, `summary`, `explanation`, and `hunks` references. Follow the `spec-diff` skill instructions for matching rules and explanation quality. You can either edit annotations.json directly, or write a `matches.json` and re-run the annotate script with `--matches` to have it handle bidirectional linking automatically.
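For illustration, here is a plausible matches payload built in Python. The field names (`status`, `summary`, `explanation`, `hunks`) mirror the annotation fields named in this skill, but the exact schema expected by the annotate script's `--matches` flag is an assumption, as are the requirement IDs and paths:

```python
import json

matches = {
    "REQ-1": {
        "status": "satisfied",
        "summary": "Server returns a task result on completion.",
        "explanation": "The new result field in schema.ts carries the completed payload.",
        "hunks": [
            {"file": "schema/draft/schema.ts", "hunk_header": "@@ interface TaskResult @@"}
        ],
    },
    "REQ-2": {
        "status": "not_addressed",
        "summary": "",
        "explanation": "Behavioral guideline; no schema change required.",
        "hunks": [],
    },
}
payload = json.dumps(matches, indent=2)  # write this string to matches.json
```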
This produces a JSON file with files split into logical hunks (MDX files split on `##` headings, TS files split on declarations). If you received per-file patches from the GitHub API instead of a raw diff file, you can skip this script and split hunks manually following the rules in "Splitting Large Hunks" above.

-### Phase 2: Annotate (agent)
-
-1. Read `meta-spec.json` to load all requirements
-2. Read the parsed diff (from the script or from API data)
-3. For each hunk, check relevant requirements and create annotations with full explanations
-4. **Copy the `patch_text` from the parsed diff into each hunk in annotations.json.** The render script needs the patch text to display the diff. If `patch_text` is empty, the HTML will show empty hunks.
-5. Build the requirement coverage summary
-6. Compute summary counts
-7. Write `annotations.json` to the output path
+### Phase 2: Build annotation skeleton (script)
+
+Generate a skeleton annotations.json with all structure pre-populated:
The matches file maps requirement IDs to their status, summary, explanation, and hunk references. The script handles all bidirectional linking and summary counting automatically.
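A sketch of what that linking-and-counting pass could look like. The field names mirror the shapes described in this skill, not a guaranteed schema, and this is an illustration of the idea rather than the annotate script's actual code:

```python
from collections import Counter

def apply_matches(skeleton, matches):
    """Copy matches into the skeleton, stamp each referenced hunk with the
    annotation ID (the reverse link), and recompute summary counts."""
    skeleton["annotations"].update(matches)
    index = {(f["path"], h["hunk_header"]): h
             for f in skeleton["files"] for h in f["hunks"]}
    for ann_id, ann in matches.items():
        for ref in ann.get("hunks", []):
            hunk = index.get((ref["file"], ref["hunk_header"]))
            if hunk is not None and ann_id not in hunk.setdefault("annotations", []):
                hunk["annotations"].append(ann_id)
    skeleton["summary"] = dict(
        Counter(a["status"] for a in skeleton["annotations"].values()))
    return skeleton
```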