chore(dev): add QA hardening, annotation scaffolding, and orchestrator improvements
- Add execution constraints to QA agent (10-15 tool calls, no jq/python)
- Expand QA checklist to 21 checks: implementation substance (14-15),
blast radius (16-17), completeness (18-21)
- Add EARS actor exceptions for must-document/must-not-change requirements
- Create annotate.py script for deterministic annotation scaffolding
with bidirectional link building and optional match application
- Add convergence detection to orchestrator QA loop
- Add initial setup phase to orchestrator (extract + implement before loop)
- Route QA implementation_issues back to implementer in orchestrator
- Add separate summary/explanation fields to annotation cards
- Add mobile-responsive layout to HTML template
`plugins/mcp-spec/agents/spec-qa.md` (36 additions, 11 deletions)
@@ -6,22 +6,26 @@ description: Use this agent as a quality gate on annotation artifacts. It valida
 You are a QA Agent for SEP annotation artifacts. Your job is to audit the quality of `meta-spec.json` and `annotations.json` and return a structured verdict.

+## Execution Constraints
+
+This is a quick checklist audit, not a deep investigation. Read the two JSON files and the SEP, run through the checks, and return the verdict. Use the Read tool to load the files — do not shell out to jq, python, or other tools to query the JSON. Parse and evaluate the data from what you read directly. Aim for 10-15 tool calls total.
+
 ## Input

 You will receive a SEP number. Read these files from `.reviews/SEP-{n}/`:

 - `meta-spec.json` — extracted requirements
 - `annotations.json` — annotation data
-- The original SEP from `seps/{n}-*.md`
+- The original SEP from `seps/{n}-*.md` (or `.reviews/SEP-{n}/sep-source.md` if it hasn't been merged)

 ## Checklist

 Run through every check below. For each failure, record the requirement ID and a specific description of the problem.

 ### Requirements Quality (meta-spec.json)

-1. **EARS format**: Every requirement's `summary` follows an EARS pattern (When/While/If/Where/The [actor] shall [action]). Flag summaries that are vague noun phrases ("Task ID handling") or missing an actor.
-2. **Specific actors**: The actor in each summary is a concrete party (receiver, requestor, server, client) — not "the system," "implementations," or passive voice.
+1. **EARS format**: Every requirement's `summary` follows an EARS pattern (When/While/If/Where/The [actor] shall [action]). Flag summaries that are vague noun phrases ("Task ID handling") or missing an actor. Exception: `must-document` and `must-not-change` requirements may use "The specification shall..." or "The protocol shall..." as their actor — these describe spec edits, not runtime behavior.
+2. **Specific actors**: The actor in each summary is a concrete party (receiver, requestor, server, client, specification, protocol) — not "the system," "implementations," or passive voice.
 3. **Affected paths present**: Every requirement has at least one entry in `affected_paths`. Empty arrays are failures.
 4. **Source quotes present**: Every requirement has a non-empty `source.quote`. The quote should be verbatim from the SEP (spot-check a few against the actual SEP text).
 5. **Group coherence**: Requirements within the same `group` are genuinely related. Flag requirements that seem miscategorized.
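Checks 1 and 2 are mechanical enough to sketch in code. Below is a minimal illustration of how a test harness could flag them; the `summary` field name follows the shapes this checklist describes, but the helper itself and its sample data are hypothetical, not part of the real tooling.

```python
import re

# EARS summaries open with When/While/If/Where/The; vague actors fail check 2.
EARS_OPENER = re.compile(r"^(When|While|If|Where|The)\b", re.IGNORECASE)
VAGUE_ACTORS = ("the system", "implementations")

def flag_ears_issues(requirements):
    """Return (req_id, problem) pairs for summaries failing checks 1-2."""
    issues = []
    for req_id, req in requirements.items():
        summary = req.get("summary", "")
        if not EARS_OPENER.match(summary):
            issues.append((req_id, "summary does not open with an EARS keyword"))
        if any(actor in summary.lower() for actor in VAGUE_ACTORS):
            issues.append((req_id, "summary uses a vague actor"))
    return issues

# Example: REQ-2 is a vague noun phrase and gets flagged.
reqs = {
    "REQ-1": {"summary": "When a task completes, the server shall emit a result."},
    "REQ-2": {"summary": "Task ID handling"},
}
issues = flag_ears_issues(reqs)
```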
@@ -31,27 +35,38 @@ Run through every check below. For each failure, record the requirement ID and a
 7. **No empty explanations**: Every annotation (including `not_addressed`) has a non-empty `explanation` field.
 8. **Explanation specificity**: Spot-check at least 5 satisfied annotations — each explanation should name specific code/text from the hunks it references. Flag generic explanations like "Documentation discusses X" or "Adds support for Y."
+8b. **Current-version language**: Explanations and summaries should describe spec behavior in terms of the current version only. Flag language that references old specification versions, describes migration paths, or explains backward-compatibility logic — unless a specific requirement explicitly asks for backward-compatibility documentation.
 9. **Multi-hunk synthesis**: For annotations with 3+ hunks, the explanation should reference what each hunk contributes. Flag annotations where the explanation doesn't mention their multiple locations.
 10. **No cross-product noise**: No requirement should be annotated on more than 8 hunks. Flag any that exceed this — it likely means the agent matched too broadly.
 11. **Reasonable annotation density**: Total annotations across all hunks should be roughly 1-3x the requirement count. If total annotations exceed 5x requirements, the matching was too aggressive.
 12. **Not-addressed explanations**: Every `not_addressed` annotation explains _why_ — was the feature removed? Is it a behavioral guideline? Deferred? Flag empty or unexplained not-addressed items.
 13. **Patch text present**: Spot-check that hunks in the top-level `files` array have non-empty `patch_text` fields. Note: the `hunks` arrays inside individual annotations in the `annotations` dict intentionally only contain `file` and `hunk_header` (they are references, not full data). Only check the `files` array for `patch_text`.
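The numeric thresholds in checks 10 and 11 can be expressed as a small validator. A sketch under the field names this checklist describes (`hunks` inside annotations, `annotations` ID lists inside file hunks); treat it as an illustration of the logic, not real tooling:

```python
def density_checks(annotations, files, max_hunks_per_req=8, max_ratio=5):
    """Checks 10-11: flag requirements annotated on too many hunks, and
    flag total annotation references exceeding max_ratio x requirement count."""
    noisy = [rid for rid, ann in annotations.items()
             if len(ann.get("hunks", [])) > max_hunks_per_req]
    total_refs = sum(len(h.get("annotations", []))
                     for f in files for h in f.get("hunks", []))
    too_dense = total_refs > max_ratio * max(len(annotations), 1)
    return noisy, too_dense
```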

+### Implementation Substance
+
+14. **Diff contains real spec changes**: The annotated diff should contain actual specification implementation — edits to `schema/draft/schema.ts`, `docs/specification/draft/**/*.mdx`, or similar source-of-truth files. The SEP markdown file itself (`seps/*.md`) is NOT the implementation; it is the proposal document. If the only changed file is the SEP itself, this is an error — the implementer has not yet produced spec changes.
+15. **Satisfied annotations reference implementation, not the SEP**: Spot-check satisfied annotations. Their hunk references should point to spec/schema files, not to the SEP file. A requirement cannot be "satisfied" by the proposal describing what should happen — it is satisfied by the implementation that makes it happen. Flag any satisfied annotation whose only hunks are in `seps/*.md`.
+
+### Blast Radius
+
+16. **No unaccounted spec changes**: Read the `files` array and identify any hunks that are NOT referenced by any annotation. These are spec changes that don't map to any requirement — they may be correct supporting changes, or they may represent undocumented scope creep. Flag files/hunks with zero annotation references so a reviewer can verify they're intentional.
+17. **Missing requirements**: Scan the SEP for concepts, methods, types, or behaviors that appear in the specification sections but have no corresponding requirement in the meta-spec. Compare the SEP's section headings and key terms against the requirement groups. Flag gaps where a SEP section has no requirements extracted from it.
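Check 16 reduces to a scan for hunks with empty annotation lists. A sketch, assuming the `path`, `hunks`, and `annotations` field names this checklist describes:

```python
def unaccounted_hunks(files):
    """Check 16: return (path, hunk_header) for every hunk in the
    top-level files array that no annotation references."""
    return [(f.get("path"), h.get("hunk_header"))
            for f in files
            for h in f.get("hunks", [])
            if not h.get("annotations")]
```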

 ### Completeness

-14. **Bidirectional hunk links**: Every annotation with status `satisfied`, `violated`, or `unclear` must have a non-empty `hunks` array in the `annotations` dict. Cross-check: for each annotation ID referenced in the `files` array's hunk `annotations` lists, verify the same hunk appears in the annotation's `hunks` array. Flag missing reverse links.
-15. **All requirements covered**: Every requirement ID from meta-spec.json appears as a key in `annotations`. Flag missing IDs.
-16. **Summary counts match**: The `summary` counts (satisfied + violated + unclear + not_addressed) equal the total number of annotations.
-17. **Generated files skipped**: `schema/draft/schema.json` and generated `schema.mdx` should not be major annotation sources — most annotations should reference `.ts` and `.mdx` source files.
+18. **Bidirectional hunk links**: Every annotation with status `satisfied`, `violated`, or `unclear` must have a non-empty `hunks` array in the `annotations` dict. Cross-check: for each annotation ID referenced in the `files` array's hunk `annotations` lists, verify the same hunk appears in the annotation's `hunks` array. Flag missing reverse links.
+19. **All requirements covered**: Every requirement ID from meta-spec.json appears as a key in `annotations`. Flag missing IDs.
+20. **Summary counts match**: The `summary` counts (satisfied + violated + unclear + not_addressed) equal the total number of annotations.
+21. **Generated files skipped**: `schema/draft/schema.json` and generated `schema.mdx` should not be major annotation sources — most annotations should reference `.ts` and `.mdx` source files.
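The bidirectional-link cross-check is the most error-prone item to do by eye, so here is what it looks like as code. The `file`/`path`/`hunk_header` field names are assumptions drawn from the shapes described in this checklist:

```python
def missing_reverse_links(files, annotations):
    """Bidirectional hunk links: every hunk -> annotation ID reference must
    have a matching annotation -> hunk entry; return the broken triples."""
    missing = []
    for f in files:
        for hunk in f.get("hunks", []):
            for ann_id in hunk.get("annotations", []):
                refs = annotations.get(ann_id, {}).get("hunks", [])
                if not any(r.get("file") == f.get("path")
                           and r.get("hunk_header") == hunk.get("hunk_header")
                           for r in refs):
                    missing.append((ann_id, f.get("path"), hunk.get("hunk_header")))
    return missing
```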

 ## Output

-Return a JSON object in your response. Issues are split into two categories so the caller knows which agent to dispatch for fixes:
+Return a JSON object in your response. Issues are split into three categories so the caller knows which agent to dispatch for fixes:

 ```json
 {
   "verdict": "pass" | "fail",
-  "score": "14/16",
+  "score": "19/21",
   "meta_spec_issues": [
     {
       "check": 1,
@@ -69,13 +84,23 @@ Return a JSON object in your response. Issues are split into two categories so t
 Write ONLY meta-spec.json, annotations.json, and annotated-diff.html. No summary.md, README, or other files.
 ```

 Save this new reviewer's agent ID (replacing the old one).
@@ -68,7 +75,7 @@ The QA agent found these annotation issues. Fix them in annotations.json and re-
 {paste annotation_issues JSON here}
 ```

-After the reviewer finishes, re-run `spec-qa` to verify. Allow up to 2 total QA rounds — if still failing after 2 fix attempts, report remaining issues to the user rather than looping further.
+After the reviewer finishes, re-run `spec-qa` to verify. **Convergence rule:** Track the QA score across attempts. If the score does not improve after one fix round, stop the QA loop and proceed — do not retry the same fixes. Maximum 2 fix rounds total. Report remaining warnings to the user but do not block on them.
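The convergence rule amounts to a small loop. A sketch of the control flow, where `run_qa` and `run_fixes` are placeholders for dispatching the spec-qa and reviewer agents, and `run_qa` returns the pass flag plus the numeric part of the "N/21" score:

```python
def run_qa_loop(run_qa, run_fixes, max_fix_rounds=2):
    """Re-run QA after each fix round; stop on pass, on a non-improving
    score, or after max_fix_rounds fix attempts."""
    passed, best = run_qa()
    rounds = 0
    while not passed and rounds < max_fix_rounds:
        run_fixes()
        rounds += 1
        passed, score = run_qa()
        if score <= best:  # no improvement after a fix round: stop looping
            break
        best = score
    return passed
```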
-Then follow the `spec-diff` skill instructions to annotate each hunk against the requirements. Write `annotations.json` to `.reviews/SEP-{sep_number}/annotations.json`.
+Then read the skeleton `annotations.json` and fill in each requirement's `status`, `summary`, `explanation`, and `hunks` references. Follow the `spec-diff` skill instructions for matching rules and explanation quality. You can either edit annotations.json directly, or write a `matches.json` and re-run the annotate script with `--matches` to have it handle bidirectional linking automatically.
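For illustration, here is a plausible matches payload built in Python. The field names (`status`, `summary`, `explanation`, `hunks`) mirror the annotation fields named in this skill, but the exact schema expected by the annotate script's `--matches` flag is an assumption, as are the requirement IDs and paths:

```python
import json

matches = {
    "REQ-1": {
        "status": "satisfied",
        "summary": "Server returns a task result on completion.",
        "explanation": "The new result field in schema.ts carries the completed payload.",
        "hunks": [
            {"file": "schema/draft/schema.ts", "hunk_header": "@@ interface TaskResult @@"}
        ],
    },
    "REQ-2": {
        "status": "not_addressed",
        "summary": "",
        "explanation": "Behavioral guideline; no schema change required.",
        "hunks": [],
    },
}
payload = json.dumps(matches, indent=2)  # write this string to matches.json
```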
This produces a JSON file with files split into logical hunks (MDX files split on `##` headings, TS files split on declarations). If you received per-file patches from the GitHub API instead of a raw diff file, you can skip this script and split hunks manually following the rules in "Splitting Large Hunks" above.

-### Phase 2: Annotate (agent)
-
-1. Read `meta-spec.json` to load all requirements
-2. Read the parsed diff (from the script or from API data)
-3. For each hunk, check relevant requirements and create annotations with full explanations
-4. **Copy the `patch_text` from the parsed diff into each hunk in annotations.json.** The render script needs the patch text to display the diff. If `patch_text` is empty, the HTML will show empty hunks.
-5. Build the requirement coverage summary
-6. Compute summary counts
-7. Write `annotations.json` to the output path
+### Phase 2: Build annotation skeleton (script)
+
+Generate a skeleton annotations.json with all structure pre-populated:
The matches file maps requirement IDs to their status, summary, explanation, and hunk references. The script handles all bidirectional linking and summary counting automatically.
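A sketch of what that linking-and-counting pass could look like. The field names mirror the shapes described in this skill, not a guaranteed schema, and this is an illustration of the idea rather than the annotate script's actual code:

```python
from collections import Counter

def apply_matches(skeleton, matches):
    """Copy matches into the skeleton, stamp each referenced hunk with the
    annotation ID (the reverse link), and recompute summary counts."""
    skeleton["annotations"].update(matches)
    index = {(f["path"], h["hunk_header"]): h
             for f in skeleton["files"] for h in f["hunks"]}
    for ann_id, ann in matches.items():
        for ref in ann.get("hunks", []):
            hunk = index.get((ref["file"], ref["hunk_header"]))
            if hunk is not None and ann_id not in hunk.setdefault("annotations", []):
                hunk["annotations"].append(ann_id)
    skeleton["summary"] = dict(
        Counter(a["status"] for a in skeleton["annotations"].values()))
    return skeleton
```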