
Commit bb8a8d3 (parent: d095845)

chore(dev): add QA hardening, annotation scaffolding, and orchestrator improvements

- Add execution constraints to QA agent (10-15 tool calls, no jq/python)
- Expand QA checklist to 21 checks: implementation substance (14-15), blast radius (16-17), completeness (18-21)
- Add EARS actor exceptions for must-document/must-not-change requirements
- Create annotate.py script for deterministic annotation scaffolding with bidirectional link building and optional match application
- Add convergence detection to orchestrator QA loop
- Add initial setup phase to orchestrator (extract + implement before loop)
- Route QA implementation_issues back to implementer in orchestrator
- Add separate summary/explanation fields to annotation cards
- Add mobile-responsive layout to HTML template

9 files changed: 352 additions & 31 deletions


plugins/mcp-spec/agents/spec-qa.md

Lines changed: 36 additions & 11 deletions
````diff
@@ -6,22 +6,26 @@ description: Use this agent as a quality gate on annotation artifacts. It valida
 
 You are a QA Agent for SEP annotation artifacts. Your job is to audit the quality of `meta-spec.json` and `annotations.json` and return a structured verdict.
 
+## Execution Constraints
+
+This is a quick checklist audit, not a deep investigation. Read the two JSON files and the SEP, run through the checks, and return the verdict. Use the Read tool to load the files — do not shell out to jq, python, or other tools to query the JSON. Parse and evaluate the data from what you read directly. Aim for 10-15 tool calls total.
+
 ## Input
 
 You will receive a SEP number. Read these files from `.reviews/SEP-{n}/`:
 
 - `meta-spec.json` — extracted requirements
 - `annotations.json` — annotation data
-- The original SEP from `seps/{n}-*.md`
+- The original SEP from `seps/{n}-*.md` (or `.reviews/SEP-{n}/sep-source.md` if it hasn't been merged)
 
 ## Checklist
 
 Run through every check below. For each failure, record the requirement ID and a specific description of the problem.
 
 ### Requirements Quality (meta-spec.json)
 
-1. **EARS format**: Every requirement's `summary` follows an EARS pattern (When/While/If/Where/The [actor] shall [action]). Flag summaries that are vague noun phrases ("Task ID handling") or missing an actor.
-2. **Specific actors**: The actor in each summary is a concrete party (receiver, requestor, server, client) — not "the system," "implementations," or passive voice.
+1. **EARS format**: Every requirement's `summary` follows an EARS pattern (When/While/If/Where/The [actor] shall [action]). Flag summaries that are vague noun phrases ("Task ID handling") or missing an actor. Exception: `must-document` and `must-not-change` requirements may use "The specification shall..." or "The protocol shall..." as their actor — these describe spec edits, not runtime behavior.
+2. **Specific actors**: The actor in each summary is a concrete party (receiver, requestor, server, client, specification, protocol) — not "the system," "implementations," or passive voice.
 3. **Affected paths present**: Every requirement has at least one entry in `affected_paths`. Empty arrays are failures.
 4. **Source quotes present**: Every requirement has a non-empty `source.quote`. The quote should be verbatim from the SEP (spot-check a few against the actual SEP text).
 5. **Group coherence**: Requirements within the same `group` are genuinely related. Flag requirements that seem miscategorized.
@@ -31,27 +35,38 @@ Run through every check below. For each failure, record the requirement ID and a
 
 7. **No empty explanations**: Every annotation (including `not_addressed`) has a non-empty `explanation` field.
 8. **Explanation specificity**: Spot-check at least 5 satisfied annotations — each explanation should name specific code/text from the hunks it references. Flag generic explanations like "Documentation discusses X" or "Adds support for Y."
+8b. **Current-version language**: Explanations and summaries should describe spec behavior in terms of the current version only. Flag language that references old specification versions, describes migration paths, or explains backward-compatibility logic — unless a specific requirement explicitly asks for backward-compatibility documentation.
 9. **Multi-hunk synthesis**: For annotations with 3+ hunks, the explanation should reference what each hunk contributes. Flag annotations where the explanation doesn't mention their multiple locations.
 10. **No cross-product noise**: No requirement should be annotated on more than 8 hunks. Flag any that exceed this — it likely means the agent matched too broadly.
 11. **Reasonable annotation density**: Total annotations across all hunks should be roughly 1-3x the requirement count. If total annotations exceed 5x requirements, the matching was too aggressive.
 12. **Not-addressed explanations**: Every `not_addressed` annotation explains _why_ — was the feature removed? Is it a behavioral guideline? Deferred? Flag empty or unexplained not-addressed items.
 13. **Patch text present**: Spot-check that hunks in the top-level `files` array have non-empty `patch_text` fields. Note: the `hunks` arrays inside individual annotations in the `annotations` dict intentionally only contain `file` and `hunk_header` (they are references, not full data). Only check the `files` array for `patch_text`.
 
+### Implementation Substance
+
+14. **Diff contains real spec changes**: The annotated diff should contain actual specification implementation — edits to `schema/draft/schema.ts`, `docs/specification/draft/**/*.mdx`, or similar source-of-truth files. The SEP markdown file itself (`seps/*.md`) is NOT the implementation; it is the proposal document. If the only changed file is the SEP itself, this is an error — the implementer has not yet produced spec changes.
+15. **Satisfied annotations reference implementation, not the SEP**: Spot-check satisfied annotations. Their hunk references should point to spec/schema files, not to the SEP file. A requirement cannot be "satisfied" by the proposal describing what should happen — it is satisfied by the implementation that makes it happen. Flag any satisfied annotation whose only hunks are in `seps/*.md`.
+
+### Blast Radius
+
+16. **No unaccounted spec changes**: Read the `files` array and identify any hunks that are NOT referenced by any annotation. These are spec changes that don't map to any requirement — they may be correct supporting changes, or they may represent undocumented scope creep. Flag files/hunks with zero annotation references so a reviewer can verify they're intentional.
+17. **Missing requirements**: Scan the SEP for concepts, methods, types, or behaviors that appear in the specification sections but have no corresponding requirement in the meta-spec. Compare the SEP's section headings and key terms against the requirement groups. Flag gaps where a SEP section has no requirements extracted from it.
+
 ### Completeness
 
-14. **Bidirectional hunk links**: Every annotation with status `satisfied`, `violated`, or `unclear` must have a non-empty `hunks` array in the `annotations` dict. Cross-check: for each annotation ID referenced in the `files` array's hunk `annotations` lists, verify the same hunk appears in the annotation's `hunks` array. Flag missing reverse links.
-15. **All requirements covered**: Every requirement ID from meta-spec.json appears as a key in `annotations`. Flag missing IDs.
-16. **Summary counts match**: The `summary` counts (satisfied + violated + unclear + not_addressed) equal the total number of annotations.
-17. **Generated files skipped**: `schema/draft/schema.json` and generated `schema.mdx` should not be major annotation sources — most annotations should reference `.ts` and `.mdx` source files.
+18. **Bidirectional hunk links**: Every annotation with status `satisfied`, `violated`, or `unclear` must have a non-empty `hunks` array in the `annotations` dict. Cross-check: for each annotation ID referenced in the `files` array's hunk `annotations` lists, verify the same hunk appears in the annotation's `hunks` array. Flag missing reverse links.
+19. **All requirements covered**: Every requirement ID from meta-spec.json appears as a key in `annotations`. Flag missing IDs.
+20. **Summary counts match**: The `summary` counts (satisfied + violated + unclear + not_addressed) equal the total number of annotations.
+21. **Generated files skipped**: `schema/draft/schema.json` and generated `schema.mdx` should not be major annotation sources — most annotations should reference `.ts` and `.mdx` source files.
 
 ## Output
 
-Return a JSON object in your response. Issues are split into two categories so the caller knows which agent to dispatch for fixes:
+Return a JSON object in your response. Issues are split into three categories so the caller knows which agent to dispatch for fixes:
 
 ```json
 {
   "verdict": "pass" | "fail",
-  "score": "14/16",
+  "score": "19/21",
   "meta_spec_issues": [
     {
       "check": 1,
@@ -69,13 +84,23 @@ Return a JSON object in your response. Issues are split into two categories so t
       "affected": ["TAD-001", "TAD-002", "AUA-001", "..."],
       "fix_hint": "Add explanations stating why each requirement is not covered (removed feature, behavioral guideline, deferred, etc.)"
     }
+  ],
+  "implementation_issues": [
+    {
+      "check": 14,
+      "severity": "error",
+      "description": "Diff only contains the SEP file itself — no spec/schema implementation found",
+      "affected": [],
+      "fix_hint": "The spec-implementer must run to produce actual edits to schema/draft/schema.ts and docs/specification/draft/ files"
+    }
   ]
 }
 ```
 
 - **verdict**: `pass` if no errors (warnings are okay), `fail` if any errors exist
 - **severity**: `error` = must fix before the review is usable, `warning` = should fix but doesn't block
-- **meta_spec_issues**: Problems with `meta-spec.json` (checks 1-6) — these need the meta-spec to be updated before re-annotating
-- **annotation_issues**: Problems with `annotations.json` (checks 7-16) — these can be fixed by resuming the reviewer
+- **meta_spec_issues**: Problems with `meta-spec.json` (checks 1-6) — fix the meta-spec before re-annotating
+- **annotation_issues**: Problems with `annotations.json` (checks 7-13) — resume the reviewer to fix
+- **implementation_issues**: Problems with what was implemented (checks 14-17) — the implementer needs to run or re-run
 - **fix_hint**: Actionable instruction the fixing agent can follow
 - Only include checks that found issues — omit passing checks
````

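The mechanical Completeness checks added above (18-20) lend themselves to a scripted audit. A minimal sketch, assuming a simplified annotations.json shape (`annotations` dict with `status`/`hunks`, per-hunk `annotations` lists in `files`, and a `summary` count object); the real schema may differ:

```python
# Sketch of checks 18-20 against an assumed annotations.json shape.
def check_completeness(annotations_doc, requirement_ids):
    issues = []
    anns = annotations_doc["annotations"]

    # Check 19: every requirement ID appears as a key in `annotations`.
    missing = [rid for rid in requirement_ids if rid not in anns]
    if missing:
        issues.append({"check": 19, "description": f"missing IDs: {missing}"})

    # Check 18: non-not_addressed annotations need hunks, and every forward
    # link from the files array needs a matching reverse link.
    for rid, ann in anns.items():
        if ann["status"] in ("satisfied", "violated", "unclear") and not ann["hunks"]:
            issues.append({"check": 18, "description": f"{rid} has no hunks"})
    for f in annotations_doc["files"]:
        for hunk in f["hunks"]:
            for rid in hunk.get("annotations", []):
                refs = anns.get(rid, {}).get("hunks", [])
                if not any(r["file"] == f["path"] and r["hunk_header"] == hunk["hunk_header"]
                           for r in refs):
                    issues.append({"check": 18,
                                   "description": f"{rid} missing reverse link in {f['path']}"})

    # Check 20: summary counts equal the total number of annotations.
    if sum(annotations_doc["summary"].values()) != len(anns):
        issues.append({"check": 20, "description": "summary counts do not match annotation total"})

    return issues
```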
plugins/mcp-spec/agents/spec-reviewer.md

Lines changed: 16 additions & 1 deletion
````diff
@@ -35,7 +35,22 @@ You may be resumed by the orchestrator with a list of annotation issues from the
 4. Re-render the HTML via the render script
 5. Return a summary of what you fixed
 
-Do not re-run the full pipeline — only fix the specific issues identified.
+Do not re-run the full pipeline — only fix the specific issues identified. Use the render script to re-render after fixes:
+
+```
+python3 plugins/mcp-spec/skills/spec-render/scripts/render.py .reviews/SEP-{n}/meta-spec.json .reviews/SEP-{n}/annotations.json .reviews/SEP-{n}/annotated-diff.html
+```
+
+## Output Constraints
+
+Write ONLY these files to `.reviews/SEP-{n}/`:
+
+- `meta-spec.json`
+- `annotations.json`
+- `annotated-diff.html` (via render script)
+- `pr-diff.txt`, `parsed-diff.json`, `matches.json` (intermediate artifacts)
+
+Do not create summary.md, README.md, QA-FIXES.md, or any other supplementary files.
 
 ## Output
 
````

plugins/mcp-spec/skills/spec-annotate/SKILL.md

Lines changed: 8 additions & 1 deletion
````diff
@@ -52,6 +52,13 @@ If `meta_spec_issues` contains errors:
 ```
 Re-annotate SEP-{sep_number}. Mode: validator. {commit_range if provided, else "PR mode."}
 The meta-spec was updated to fix QA issues. Re-annotate the diff against it and re-render.
+
+Use the pre-built scripts — do NOT write HTML manually or create custom Python scripts:
+- python3 plugins/mcp-spec/skills/spec-diff/scripts/parse_diff.py (parse diff)
+- python3 plugins/mcp-spec/skills/spec-diff/scripts/annotate.py (build skeleton)
+- python3 plugins/mcp-spec/skills/spec-render/scripts/render.py (render HTML)
+
+Write ONLY meta-spec.json, annotations.json, and annotated-diff.html. No summary.md, README, or other files.
 ```
 
 Save this new reviewer's agent ID (replacing the old one).
@@ -68,7 +75,7 @@ The QA agent found these annotation issues. Fix them in annotations.json and re-
 {paste annotation_issues JSON here}
 ```
 
-After the reviewer finishes, re-run `spec-qa` to verify. Allow up to 2 total QA rounds — if still failing after 2 fix attempts, report remaining issues to the user rather than looping further.
+After the reviewer finishes, re-run `spec-qa` to verify. **Convergence rule:** Track the QA score across attempts. If the score does not improve after one fix round, stop the QA loop and proceed — do not retry the same fixes. Maximum 2 fix rounds total. Report remaining warnings to the user but do not block on them.
 
 ### Step 5: Report
````

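The convergence rule above can be sketched as a small loop. This is a hypothetical illustration: `run_qa`, `run_fixes`, and the score format are assumptions, not the orchestrator's actual interface.

```python
# Hypothetical convergence loop for the QA fix cycle (function names assumed).
def qa_loop(run_qa, run_fixes, max_rounds=2):
    """Re-run QA after fixes, stopping when the score stops improving."""
    verdict = run_qa()  # assumed to return e.g. {"verdict": "fail", "score": "17/21"}
    best = score_value(verdict["score"])
    for _ in range(max_rounds):
        if verdict["verdict"] == "pass":
            break
        run_fixes(verdict)              # dispatch issues to the fixing agent
        verdict = run_qa()
        current = score_value(verdict["score"])
        if current <= best:             # no improvement: stop, do not retry the same fixes
            break
        best = current
    return verdict

def score_value(score):
    """Turn a "passed/total" score string into a comparable fraction."""
    passed, total = score.split("/")
    return int(passed) / int(total)
```

Remaining warnings would then be reported to the user without blocking, as the rule above specifies.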
plugins/mcp-spec/skills/spec-annotation-workflow/SKILL.md

Lines changed: 9 additions & 2 deletions
````diff
@@ -69,15 +69,22 @@ Since the script runs instantly, extraction can begin immediately while the diff
 
 ### Step 4: Annotate the diff (requires steps 2 & 3 complete)
 
-If you saved the diff to a file, parse it with the script first:
+If you saved the diff to a file, parse and scaffold it with the scripts:
 
 ```bash
+# Parse and split hunks
 python3 plugins/mcp-spec/skills/spec-diff/scripts/parse_diff.py \
   .reviews/SEP-{sep_number}/pr-diff.txt \
   .reviews/SEP-{sep_number}/parsed-diff.json
+
+# Build annotation skeleton (all requirements as not_addressed, patch_text included, generated files excluded)
+python3 plugins/mcp-spec/skills/spec-diff/scripts/annotate.py \
+  .reviews/SEP-{sep_number}/meta-spec.json \
+  .reviews/SEP-{sep_number}/parsed-diff.json \
+  .reviews/SEP-{sep_number}/annotations.json
 ```
 
-Then follow the `spec-diff` skill instructions to annotate each hunk against the requirements. Write `annotations.json` to `.reviews/SEP-{sep_number}/annotations.json`.
+Then read the skeleton `annotations.json` and fill in each requirement's `status`, `summary`, `explanation`, and `hunks` references. Follow the `spec-diff` skill instructions for matching rules and explanation quality. You can either edit annotations.json directly, or write a `matches.json` and re-run the annotate script with `--matches` to have it handle bidirectional linking automatically.
 
 ### Step 5: Render HTML
 
````

plugins/mcp-spec/skills/spec-diff/SKILL.md

Lines changed: 33 additions & 9 deletions
````diff
@@ -155,15 +155,39 @@ python3 plugins/mcp-spec/skills/spec-diff/scripts/parse_diff.py <diff_file> <par
 
 This produces a JSON file with files split into logical hunks (MDX files split on `##` headings, TS files split on declarations). If you received per-file patches from the GitHub API instead of a raw diff file, you can skip this script and split hunks manually following the rules in "Splitting Large Hunks" above.
 
-### Phase 2: Annotate (agent)
-
-1. Read `meta-spec.json` to load all requirements
-2. Read the parsed diff (from the script or from API data)
-3. For each hunk, check relevant requirements and create annotations with full explanations
-4. **Copy the `patch_text` from the parsed diff into each hunk in annotations.json.** The render script needs the patch text to display the diff. If `patch_text` is empty, the HTML will show empty hunks.
-5. Build the requirement coverage summary
-6. Compute summary counts
-7. Write `annotations.json` to the output path
+### Phase 2: Build annotation skeleton (script)
+
+Generate a skeleton annotations.json with all structure pre-populated:
+
+```bash
+python3 plugins/mcp-spec/skills/spec-diff/scripts/annotate.py \
+  <meta_spec.json> <parsed_diff.json> <output_annotations.json>
+```
+
+This produces a valid annotations.json with:
+
+- All requirements pre-populated as `not_addressed` with empty summary/explanation
+- Files array with `patch_text` copied from the parsed diff
+- Generated files excluded
+- Bidirectional link infrastructure ready
+
+### Phase 3: Fill annotations (agent)
+
+Read the skeleton annotations.json and for each requirement:
+
+1. Read the hunks in the `files` array to find which ones address this requirement
+2. Update the annotation's `status`, `summary`, and `explanation`
+3. Add hunk references to the annotation's `hunks` array
+4. Add the requirement ID to each referenced hunk's `annotations` list in the `files` array
+
+Alternatively, write a `matches.json` file and re-run the script to apply it:
+
+```bash
+python3 plugins/mcp-spec/skills/spec-diff/scripts/annotate.py \
+  <meta_spec.json> <parsed_diff.json> <output.json> --matches <matches.json>
+```
+
+The matches file maps requirement IDs to their status, summary, explanation, and hunk references. The script handles all bidirectional linking and summary counting automatically.
 
 ### Deduplication
 
````

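A matches.json passed via `--matches` might look like the fragment below. The field names are an assumption inferred from the description in the diff above ("status, summary, explanation, and hunk references"), not the script's documented schema, and the requirement ID and hunk header are placeholders:

```json
{
  "REQ-001": {
    "status": "satisfied",
    "summary": "Server generates task IDs",
    "explanation": "The schema hunk adds the field this requirement describes.",
    "hunks": [
      {"file": "schema/draft/schema.ts", "hunk_header": "@@ -120,6 +120,18 @@"}
    ]
  }
}
```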