From 324aea949516883c2c9b942260e575f56da2afb4 Mon Sep 17 00:00:00 2001
From: Bryan Finster
Date: Wed, 13 May 2026 11:21:48 -0500
Subject: [PATCH 1/3] feat(semantic-scan): add /semantic-scan skill and command for detecting logical duplication
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds a new skill that detects business logic reimplemented multiple times across architectural layers — a gap that linters (syntactic) and domain-review (single instance) cannot catch.

Builds a persistent computation-register.json, updated incrementally via git diff, and clusters entries semantically to surface duplicate domain concepts with file:line references and canonical location suggestions.

Co-Authored-By: Claude Sonnet 4.6 (1M context)
---
 docs/specs/semantic-duplication-scan.md | 236 +++++++++++
 .../non-trivial-function.ts | 10 +
 .../domain-layer.ts | 10 +
 .../presentation-layer.tsx | 19 +
 evals/fixtures/sds-empty-scope/README.md | 15 +
 .../fixtures/sds-fileline-accuracy/README.md | 14 +
 .../fixtures/sds-fileline-accuracy/source.ts | 8 +
 .../fixtures/sds-full-flag-override/README.md | 16 +
 evals/fixtures/sds-idempotency/README.md | 16 +
 .../sds-incremental-0-changed/README.md | 16 +
 .../sds-incremental-5-of-100/README.md | 18 +
 .../sds-incremental-deleted-file/README.md | 16 +
 .../sds-incremental-trivial-changed/README.md | 17 +
 .../sds-layer-domain/domain-function.ts | 7 +
 .../infra-function.ts | 12 +
 .../presentation-function.tsx | 13 +
 .../sds-layer-unknown/ambiguous-function.ts | 10 +
 evals/fixtures/sds-missing-commit/README.md | 15 +
 .../sds-mode-full-no-register/README.md | 10 +
 evals/fixtures/sds-mode-incremental/README.md | 10 +
 evals/fixtures/sds-no-canonical/README.md | 20 +
 evals/fixtures/sds-no-duplicates/README.md | 21 +
 .../sds-permissions-failure/README.md | 15 +
 .../fixtures/sds-prefilter-trivial/README.md | 26 ++
 .../trivial-functions.ts | 54 +++
 .../sds-scoped-cross-scope-notice/README.md | 17 +
 .../README.md |
22 +
 evals/fixtures/sds-shallow-clone/README.md | 19 +
 .../fixtures/sds-subdirectory-scope/README.md | 19 +
 plans/semantic-duplication-scan.md | 400 ++++++++++++++++++
 plugins/agentic-dev-team/CLAUDE.md | 1 +
 .../commands/semantic-scan.md | 22 +
 .../knowledge/agent-registry.md | 1 +
 .../skills/semantic-duplication-scan/SKILL.md | 329 ++++++++++++++
 34 files changed, 1454 insertions(+)
 create mode 100644 docs/specs/semantic-duplication-scan.md
 create mode 100644 evals/fixtures/sds-annotation-schema/non-trivial-function.ts
 create mode 100644 evals/fixtures/sds-duplicate-with-canonical/domain-layer.ts
 create mode 100644 evals/fixtures/sds-duplicate-with-canonical/presentation-layer.tsx
 create mode 100644 evals/fixtures/sds-empty-scope/README.md
 create mode 100644 evals/fixtures/sds-fileline-accuracy/README.md
 create mode 100644 evals/fixtures/sds-fileline-accuracy/source.ts
 create mode 100644 evals/fixtures/sds-full-flag-override/README.md
 create mode 100644 evals/fixtures/sds-idempotency/README.md
 create mode 100644 evals/fixtures/sds-incremental-0-changed/README.md
 create mode 100644 evals/fixtures/sds-incremental-5-of-100/README.md
 create mode 100644 evals/fixtures/sds-incremental-deleted-file/README.md
 create mode 100644 evals/fixtures/sds-incremental-trivial-changed/README.md
 create mode 100644 evals/fixtures/sds-layer-domain/domain-function.ts
 create mode 100644 evals/fixtures/sds-layer-infrastructure/infra-function.ts
 create mode 100644 evals/fixtures/sds-layer-presentation/presentation-function.tsx
 create mode 100644 evals/fixtures/sds-layer-unknown/ambiguous-function.ts
 create mode 100644 evals/fixtures/sds-missing-commit/README.md
 create mode 100644 evals/fixtures/sds-mode-full-no-register/README.md
 create mode 100644 evals/fixtures/sds-mode-incremental/README.md
 create mode 100644 evals/fixtures/sds-no-canonical/README.md
 create mode 100644 evals/fixtures/sds-no-duplicates/README.md
 create mode 100644 evals/fixtures/sds-permissions-failure/README.md
 create
mode 100644 evals/fixtures/sds-prefilter-trivial/README.md
 create mode 100644 evals/fixtures/sds-prefilter-trivial/trivial-functions.ts
 create mode 100644 evals/fixtures/sds-scoped-cross-scope-notice/README.md
 create mode 100644 evals/fixtures/sds-semanticscanignore-removes-entries/README.md
 create mode 100644 evals/fixtures/sds-shallow-clone/README.md
 create mode 100644 evals/fixtures/sds-subdirectory-scope/README.md
 create mode 100644 plans/semantic-duplication-scan.md
 create mode 100644 plugins/agentic-dev-team/commands/semantic-scan.md
 create mode 100644 plugins/agentic-dev-team/skills/semantic-duplication-scan/SKILL.md

diff --git a/docs/specs/semantic-duplication-scan.md b/docs/specs/semantic-duplication-scan.md
new file mode 100644
index 0000000..d3272bb
--- /dev/null
+++ b/docs/specs/semantic-duplication-scan.md
@@ -0,0 +1,236 @@
+# Spec: Semantic Duplication Scan
+
+## Intent Description
+
+This feature adds a `/semantic-scan` skill and command that detects business logic reimplemented multiple times across different architectural layers. Traditional linters and existing review agents catch syntactic similarity or single-instance layer violations — they cannot detect when the same domain calculation independently appears in multiple places with different variable names and structure.
+
+The skill runs in two modes. On first run (**full scan**), it scans the codebase, annotates each non-trivial computation function with a structured semantic description, and writes a persistent `computation-register.json` to the project root. On subsequent runs (**incremental scan**), it uses `git diff` to identify only changed files, re-annotates those files, and updates the register.
+
+A **trivial function** — one that performs no calculations and does not modify state — is excluded from the register regardless of language.
Concretely: a function is trivial if it contains no arithmetic operators, no boolean logic, no branching constructs, no assignments to variables outside its own scope, and no calls to higher-order collection operations (map, filter, reduce).
+
+Layer identification is inferred from each function's **coupling profile** (what it imports, what concerns it touches, what it depends on) rather than directory conventions. Annotation is language-agnostic.
+
+After building or updating the register, the skill runs a clustering pass to find semantic duplicates across layers. For each cluster, it suggests the most likely canonical location but marks all canonical designations as requiring human confirmation — because the canonical may not yet exist.
+
+Integration with `/code-review` is out of scope for this slice.
+
+## User-Facing Behavior
+
+```gherkin
+Feature: Semantic Duplication Scan
+
+  Background:
+    Given a project with source files in any language
+
+  Scenario: First-time full scan creates a computation register
+    Given no computation-register.json exists in the project root
+    When the developer runs /semantic-scan
+    Then computation-register.json is created at the project root
+    And it contains one entry per non-trivial computation identified
+    And each entry contains: file path, function name, line number, inferred layer, structured semantic description, prompt version, and HEAD commit hash
+    And functions that perform no calculations and do not modify state are excluded
+    And the developer sees progress output as each file is annotated: "Annotating [N/total] "
+
+  Scenario: Annotation failure is reported, not silenced
+    Given a full scan is running
+    And annotation fails for one file due to a transient LLM error
+    When the scan completes
+    Then the register is written with all successfully annotated files
+    And the register includes a scanErrors array identifying the failed file and error reason
+    And the developer sees a warning: "Warning: 1 file could not be annotated.
Re-run /semantic-scan to retry."
+    And the scan exits with code 0
+
+  Scenario: Incremental scan re-annotates only changed files
+    Given computation-register.json exists with a lastScanCommit recorded
+    And that commit exists in the full git history
+    And 5 source files have been modified or added since that commit
+    And 95 source files have not changed
+    When the developer runs /semantic-scan
+    Then only the 5 changed files are re-annotated
+    And the 95 unchanged entries are preserved exactly
+    And lastScanCommit is updated to HEAD
+
+  Scenario: Incremental scan with no changed files updates only lastScanCommit
+    Given computation-register.json exists with a lastScanCommit recorded
+    And no source files have changed since that commit
+    When the developer runs /semantic-scan
+    Then no entries in the register are modified
+    And lastScanCommit is updated to HEAD
+    And the output reports "No changes since last scan — register up to date"
+
+  Scenario: Deleted file is removed from the register
+    Given computation-register.json exists with an entry for src/pricing/calculator.js
+    And src/pricing/calculator.js has been deleted since lastScanCommit
+    When the developer runs /semantic-scan
+    Then the register entry for src/pricing/calculator.js is removed
+    And lastScanCommit is updated to HEAD
+
+  Scenario: --full flag forces full re-scan regardless of lastScanCommit
+    Given computation-register.json exists with a valid lastScanCommit
+    And no source files have changed since that commit
+    When the developer runs /semantic-scan --full
+    Then all files in scope are re-annotated
+    And lastScanCommit is updated to HEAD
+
+  Scenario: Shallow clone detected in incremental mode
+    Given computation-register.json exists with a lastScanCommit recorded
+    And the repository is a shallow clone
+    When the developer runs /semantic-scan
+    Then the scan exits with a non-zero code
+    And the output reports the exact string: "Shallow clone detected — semantic-scan requires full history for
incremental mode. Run with --full to override."
+
+  Scenario: Register cannot be written due to file system permissions
+    Given the project root directory is not writable by the current user
+    When the developer runs /semantic-scan
+    Then the scan exits with a non-zero code
+    And the output reports the exact path that could not be written and the OS-level error
+
+  Scenario: Semantic duplicate detected across inferred layers
+    Given a source file in the domain layer containing a function that applies a percentage discount to a base price
+    And a source file in the presentation layer containing a function that independently computes a discounted total using the same inputs
+    When the developer runs /semantic-scan
+    Then a duplicate cluster is reported containing both functions
+    And the domain-layer function is identified as the suggested canonical in the format "canonical: suggested — requires human confirmation"
+    And the presentation-layer function is listed with its file:line reference
+
+  Scenario: Canonical does not exist in any registered copy
+    Given source files in three different modules, each containing a function computing the same domain concept
+    And all three functions import infrastructure-specific dependencies
+    When the developer runs /semantic-scan
+    Then a duplicate cluster is reported containing all three functions with their file:line references
+    And the output includes "canonical: none — a new domain-layer implementation may be required"
+
+  Scenario: No semantic duplicates detected
+    Given all source files contain computations that express distinct domain concepts
+    When the developer runs /semantic-scan
+    Then no duplicate clusters are reported
+    And the output confirms "No semantic duplication detected"
+
+  Scenario: Scan scoped to a subdirectory — existing out-of-scope entries preserved
+    Given computation-register.json exists with entries for files both inside and outside src/pricing
+    When the developer runs /semantic-scan src/pricing
+    Then only files under src/pricing are re-annotated in this pass
+    And entries for files outside src/pricing are unchanged in the register
+    And lastScanCommit is updated to HEAD
+
+  Scenario: Scoped scan cluster includes out-of-scope entries — user is notified
+    Given computation-register.json exists with entries for files inside and outside src/pricing
+    And a duplicate cluster spans one entry in src/pricing and one entry outside src/pricing
+    When the developer runs /semantic-scan src/pricing
+    Then the duplicate cluster is reported
+    And the output includes "Note: this cluster includes 1 entry outside the scoped path — run without scope argument to see full context"
+
+  Scenario: Ignore configuration removes previously-registered entries
+    Given computation-register.json exists with entries for files under src/legacy/
+    And a .semanticscanignore file lists src/legacy/
+    When the developer runs /semantic-scan
+    Then files under src/legacy/ are not annotated
+    And existing register entries for files under src/legacy/ are removed from the register
+
+  Scenario: No computation units found in scope on first run
+    Given no computation-register.json exists
+    And no source files in scope contain non-trivial computations
+    When the developer runs /semantic-scan
+    Then no register is created
+    And the output reports "No computation units found to analyze"
+
+  Scenario: Incremental scan with changed files all trivial after pre-filter
+    Given computation-register.json exists with entries from a prior scan
+    And 3 source files have changed since lastScanCommit
+    And all 3 changed files contain only trivial functions after pre-filter
+    When the developer runs /semantic-scan
+    Then no entries are added or modified in the register
+    And the output reports "No new computation units found in changed files — register unchanged"
+
+  Scenario: lastScanCommit not in git history
+    Given computation-register.json records a lastScanCommit that no longer exists in history
+    When the
developer runs /semantic-scan
+    Then the skill falls back to full-scan mode
+    And the output warns "lastScanCommit not found in history — running full scan"
+```
+
+## Architecture Specification
+
+### New Components
+
+| Component | Location | Notes |
+|-----------|----------|-------|
+| Skill | `plugins/agentic-dev-team/skills/semantic-duplication-scan/SKILL.md` | Defines scan procedure |
+| Command | `plugins/agentic-dev-team/commands/semantic-scan.md` | User-invocable entry point |
+| Register | `computation-register.json` in user's project root | Created per-project, not shipped with plugin |
+
+### Register Entry Schema
+
+```json
+{
+  "file": "src/checkout/order-service.ts",
+  "function": "applyDiscount",
+  "layer": "domain",
+  "semanticDescription": {
+    "verb": "calculates",
+    "domainConcept": "discounted price",
+    "inputs": ["base price", "discount rate"],
+    "outputConcept": "price after discount applied"
+  },
+  "promptVersion": "1.0",
+  "commitHash": "abc123"
+}
+```
+
+### Layer Inference Rules
+
+| Coupling profile | Inferred layer |
+|-----------------|---------------|
+| Imports DB clients, ORMs, HTTP clients, message brokers | `infrastructure` |
+| Imports rendering primitives, formats for display, accesses DOM/templates | `presentation` |
+| Depends only on domain types and pure functions | `domain` |
+| Orchestrates domain + infrastructure without owning rules | `application` |
+| Cannot be determined from coupling profile | `unknown` |
+
+### Process Flow
+
+1. Mode detection — check for register; full if absent, incremental if present
+2. Pre-flight — in incremental mode, check `git rev-parse --is-shallow-repository`; exit if shallow
+3. Scope resolution — apply path argument, then `.semanticscanignore`
+4. File selection — incremental: `git diff --name-only <lastScanCommit>`; full: glob all source files
+5. Pre-filter — exclude test files, config, generated code, and trivial functions (no LLM call)
+6.
Annotation — batch files to LLM (Haiku, file-level): extract non-trivial computations with structured semantic descriptions and inferred layer; emit progress per file; collect failures in `scanErrors`
+7. Register update — merge new entries, remove stale/ignored entries, sort for idempotency, update `lastScanCommit`
+8. Clustering — shard by layer pair, Sonnet per shard; shard further by first domainConcept token if shard exceeds 50k tokens
+9. Canonical suggestion — score by infrastructure coupling; escalate ambiguous clusters to Opus; output `canonical:`-prefixed verdicts
+10. Report — structured output per cluster with `file:line` references and cross-scope notices
+
+### Model Routing
+
+| Step | Model | Reason |
+|------|-------|--------|
+| Annotation + layer inference | Haiku | High volume, structured schema |
+| Clustering | Sonnet | Cross-entry semantic grouping |
+| Canonical scoring (ambiguous) | Opus | Judgment under uncertainty |
+
+### Constraints
+
+- Diagnostic only — no code changes suggested
+- Canonical designations are always suggestions; human confirmation required
+- Register lives in the user's project, not the plugin directory
+- Shallow clone blocks incremental mode; `--full` overrides
+
+## Acceptance Criteria
+
+1. `computation-register.json` is valid JSON and human-readable
+2. Idempotency: two runs with no code changes produce structurally identical register output (same entries, same semantic descriptions, same file:line references; `lastScanCommit` excluded from comparison)
+3. Incremental mode never re-annotates files whose paths are not in `git diff --name-only <lastScanCommit>`
+4. Functions containing no arithmetic operators, boolean logic, branching constructs, assignments to variables outside their own scope, or higher-order collection operations are absent from the register
+5. Shallow clone in incremental mode exits non-zero and outputs the exact string: "Shallow clone detected — semantic-scan requires full history for incremental mode.
Run with --full to override."
+6. Exit code 0 on scan success regardless of whether duplicates were found; non-zero only on scan failure or pre-flight error
+7. All report findings include `file:line` references pointing to the first line of the identified function
+8. Every duplicate cluster report uses a consistent `canonical:` prefix: "canonical: suggested — requires human confirmation" or "canonical: none — a new domain-layer implementation may be required"
+9. Annotation failures are never silent: any file that could not be annotated is reported in a scan summary with the error reason, and the register includes a `scanErrors` array for that run
+
+## Consistency Gate
+
+- [x] Intent is unambiguous — trivial function, semantic description format, layer inference, and canonical-as-decision are all defined
+- [x] Every behavior in the intent has at least one BDD scenario
+- [x] Architecture constrains without over-engineering — no speculative features
+- [x] Terminology is consistent across all four artifacts
+- [x] No contradictions between artifacts
diff --git a/evals/fixtures/sds-annotation-schema/non-trivial-function.ts b/evals/fixtures/sds-annotation-schema/non-trivial-function.ts
new file mode 100644
index 0000000..aa262d4
--- /dev/null
+++ b/evals/fixtures/sds-annotation-schema/non-trivial-function.ts
@@ -0,0 +1,10 @@
+// SKILL: semantic-duplication-scan
+// EXPECTED: register entry with all required fields including promptVersion
+// EXPECTED FIELDS: file, function, layer, semanticDescription.verb,
+// semanticDescription.domainConcept, semanticDescription.inputs,
+// semanticDescription.outputConcept, promptVersion, commitHash, line
+
+// Non-trivial: arithmetic computation, no external imports → layer: domain
+function applyDiscount(basePrice: number, discountRate: number): number {
+  return basePrice * (1 - discountRate);
+}
diff --git a/evals/fixtures/sds-duplicate-with-canonical/domain-layer.ts
b/evals/fixtures/sds-duplicate-with-canonical/domain-layer.ts
new file mode 100644
index 0000000..4fe62f2
--- /dev/null
+++ b/evals/fixtures/sds-duplicate-with-canonical/domain-layer.ts
@@ -0,0 +1,10 @@
+// SKILL: semantic-duplication-scan
+// LAYER: domain (expected)
+// ROLE: canonical copy — no infrastructure imports
+
+function applyDiscount(basePrice: number, discountRate: number): number {
+  if (discountRate < 0 || discountRate > 1) {
+    throw new Error("discountRate must be between 0 and 1");
+  }
+  return basePrice * (1 - discountRate);
+}
diff --git a/evals/fixtures/sds-duplicate-with-canonical/presentation-layer.tsx b/evals/fixtures/sds-duplicate-with-canonical/presentation-layer.tsx
new file mode 100644
index 0000000..398a28a
--- /dev/null
+++ b/evals/fixtures/sds-duplicate-with-canonical/presentation-layer.tsx
@@ -0,0 +1,19 @@
+// SKILL: semantic-duplication-scan
+// LAYER: presentation (expected)
+// ROLE: leaked duplicate — imports React (presentation coupling)
+// EXPECTED: flagged as duplicate of domain-layer.ts::applyDiscount
+// EXPECTED canonical: domain-layer.ts::applyDiscount
+// EXPECTED output: "canonical: suggested domain-layer.ts:5 — requires human confirmation"
+
+import React from "react";
+
+interface PriceDisplayProps {
+  basePrice: number;
+  salePercent: number;
+}
+
+function PriceDisplay({ basePrice, salePercent }: PriceDisplayProps): React.ReactNode {
+  // Same business rule as applyDiscount — reimplemented in presentation layer
+  const discountedPrice = basePrice * (1 - salePercent / 100);
+  return ${discountedPrice.toFixed(2)};
+}
diff --git a/evals/fixtures/sds-empty-scope/README.md b/evals/fixtures/sds-empty-scope/README.md
new file mode 100644
index 0000000..fa2c776
--- /dev/null
+++ b/evals/fixtures/sds-empty-scope/README.md
@@ -0,0 +1,15 @@
+# Fixture: sds-empty-scope
+
+**Skill**: semantic-duplication-scan
+**Scenario**: First-time scan finds only trivial functions — no register created
+
+## Setup
+
+- No
`computation-register.json` exists
+- All source files in scope contain only trivial functions (getters, pass-throughs)
+
+## Expected Behavior
+
+- No `computation-register.json` is created
+- Output: `No computation units found to analyze`
+- Exit code: 0
diff --git a/evals/fixtures/sds-fileline-accuracy/README.md b/evals/fixtures/sds-fileline-accuracy/README.md
new file mode 100644
index 0000000..772f424
--- /dev/null
+++ b/evals/fixtures/sds-fileline-accuracy/README.md
@@ -0,0 +1,14 @@
+# Fixture: sds-fileline-accuracy
+
+**Skill**: semantic-duplication-scan
+**Scenario**: file:line references point to the first line of each function definition
+
+## Source File
+
+`source.ts` — function `calculateTax` begins at line 6 (after a 3-line header comment, a blank line, and a 1-line doc comment).
+
+## Expected Behavior
+
+- Register entry for `calculateTax` has `line: 6`
+- Report output references `source.ts:6`
+- If the file has not changed since annotation, no staleness note is appended
diff --git a/evals/fixtures/sds-fileline-accuracy/source.ts b/evals/fixtures/sds-fileline-accuracy/source.ts
new file mode 100644
index 0000000..e15d794
--- /dev/null
+++ b/evals/fixtures/sds-fileline-accuracy/source.ts
@@ -0,0 +1,8 @@
+// SKILL: semantic-duplication-scan
+// EXPECTED: register entry for calculateTax has line: 6
+// NOTE: function definition begins at line 6 (lines 1-3 are this comment block, line 4 is blank, line 5 is the doc comment)
+
+// Tax calculation for US domestic orders
+function calculateTax(subtotal: number, taxRate: number): number {
+  return subtotal * taxRate;
+}
diff --git a/evals/fixtures/sds-full-flag-override/README.md b/evals/fixtures/sds-full-flag-override/README.md
new file mode 100644
index 0000000..0b08307
--- /dev/null
+++ b/evals/fixtures/sds-full-flag-override/README.md
@@ -0,0 +1,16 @@
+# Fixture: sds-full-flag-override
+
+**Skill**: semantic-duplication-scan
+**Scenario**: --full flag forces full re-scan even when register and history are intact
+
+## Setup
+
+- Register exists with valid
`lastScanCommit`
+- No source files have changed since that commit
+- Command: `/semantic-scan --full`
+
+## Expected Behavior
+
+- Shallow-clone check is skipped
+- All files in scope are re-annotated (not just diff'd files)
+- `lastScanCommit` updated to HEAD
diff --git a/evals/fixtures/sds-idempotency/README.md b/evals/fixtures/sds-idempotency/README.md
new file mode 100644
index 0000000..0b373b7
--- /dev/null
+++ b/evals/fixtures/sds-idempotency/README.md
@@ -0,0 +1,16 @@
+# Fixture: sds-idempotency
+
+**Skill**: semantic-duplication-scan
+**Scenario**: Running the scan twice with no code changes produces structurally identical output
+
+## Setup
+
+Any codebase with at least 3 non-trivial functions.
+
+## Expected Behavior
+
+- Run 1 produces `computation-register.json` with entries sorted by `file` then `function`
+- No code changes made
+- Run 2 produces a register where all entries have identical `file`, `function`, `layer`, `semanticDescription`, `promptVersion` fields
+- `lastScanCommit` is excluded from the idempotency comparison (it changes between runs)
+- Exit code: 0 both runs
diff --git a/evals/fixtures/sds-incremental-0-changed/README.md b/evals/fixtures/sds-incremental-0-changed/README.md
new file mode 100644
index 0000000..de69b98
--- /dev/null
+++ b/evals/fixtures/sds-incremental-0-changed/README.md
@@ -0,0 +1,16 @@
+# Fixture: sds-incremental-0-changed
+
+**Skill**: semantic-duplication-scan
+**Scenario**: Incremental scan with no changed files since lastScanCommit
+
+## Setup
+
+- Register exists with valid entries
+- `git diff --name-only <lastScanCommit>` returns empty
+
+## Expected Behavior
+
+- No entries modified or re-annotated
+- `lastScanCommit` updated to HEAD
+- Output: `No changes since last scan — register up to date`
+- Exit code: 0
diff --git a/evals/fixtures/sds-incremental-5-of-100/README.md
b/evals/fixtures/sds-incremental-5-of-100/README.md
new file mode 100644
index 0000000..13c6e5e
--- /dev/null
+++ b/evals/fixtures/sds-incremental-5-of-100/README.md
@@ -0,0 +1,18 @@
+# Fixture: sds-incremental-5-of-100
+
+**Skill**: semantic-duplication-scan
+**Scenario**: Incremental scan re-annotates only the files changed since lastScanCommit
+
+## Setup
+
+- Register has 100 entries across 100 source files
+- 5 files have been modified since `lastScanCommit`
+- 95 files are unchanged
+
+## Expected Behavior
+
+- Only 5 files are passed to the annotation LLM
+- 95 entries in the register are preserved exactly (no re-annotation)
+- `lastScanCommit` updated to HEAD
+- Exit code: 0
+- Progress output: `Annotating [1/5] ` ... `Annotating [5/5] `
diff --git a/evals/fixtures/sds-incremental-deleted-file/README.md b/evals/fixtures/sds-incremental-deleted-file/README.md
new file mode 100644
index 0000000..25fde82
--- /dev/null
+++ b/evals/fixtures/sds-incremental-deleted-file/README.md
@@ -0,0 +1,16 @@
+# Fixture: sds-incremental-deleted-file
+
+**Skill**: semantic-duplication-scan
+**Scenario**: Incremental scan removes entries for deleted files
+
+## Setup
+
+- Register has an entry for `src/pricing/calculator.js`
+- `src/pricing/calculator.js` was deleted since `lastScanCommit`
+- `git diff` returns `src/pricing/calculator.js` as a deleted file
+
+## Expected Behavior
+
+- Register entry for `src/pricing/calculator.js` is removed
+- `lastScanCommit` updated to HEAD
+- Exit code: 0
diff --git a/evals/fixtures/sds-incremental-trivial-changed/README.md b/evals/fixtures/sds-incremental-trivial-changed/README.md
new file mode 100644
index 0000000..5c9e293
--- /dev/null
+++ b/evals/fixtures/sds-incremental-trivial-changed/README.md
@@ -0,0 +1,17 @@
+# Fixture: sds-incremental-trivial-changed
+
+**Skill**: semantic-duplication-scan
+**Scenario**: Incremental scan where all changed files are trivial after pre-filter
+
+## Setup
+
+- `computation-register.json` exists with entries from a prior scan
+- 3 files have changed since `lastScanCommit`
+- All 3 changed files contain only
trivial functions after pre-filter
+
+## Expected Behavior
+
+- No entries added or modified in the register
+- Output: `No new computation units found in changed files — register unchanged`
+- Exit code: 0
+- `lastScanCommit` is NOT updated (no successful annotation occurred)
diff --git a/evals/fixtures/sds-layer-domain/domain-function.ts b/evals/fixtures/sds-layer-domain/domain-function.ts
new file mode 100644
index 0000000..6a40d36
--- /dev/null
+++ b/evals/fixtures/sds-layer-domain/domain-function.ts
@@ -0,0 +1,7 @@
+// SKILL: semantic-duplication-scan
+// EXPECTED layer: "domain"
+// REASON: no external imports, pure computation using only primitive types
+
+function calculateOrderTotal(lineItems: Array<{ quantity: number; unitPrice: number }>): number {
+  return lineItems.reduce((sum, item) => sum + item.quantity * item.unitPrice, 0);
+}
diff --git a/evals/fixtures/sds-layer-infrastructure/infra-function.ts b/evals/fixtures/sds-layer-infrastructure/infra-function.ts
new file mode 100644
index 0000000..e55629b
--- /dev/null
+++ b/evals/fixtures/sds-layer-infrastructure/infra-function.ts
@@ -0,0 +1,12 @@
+// SKILL: semantic-duplication-scan
+// EXPECTED layer: "infrastructure"
+// REASON: imports pg (PostgreSQL client) — database coupling
+
+import { Pool } from "pg";
+
+const pool = new Pool();
+
+async function getUserById(userId: string): Promise> {
+  const result = await pool.query("SELECT * FROM users WHERE id = $1", [userId]);
+  return result.rows[0];
+}
diff --git a/evals/fixtures/sds-layer-presentation/presentation-function.tsx b/evals/fixtures/sds-layer-presentation/presentation-function.tsx
new file mode 100644
index 0000000..bcbe966
--- /dev/null
+++ b/evals/fixtures/sds-layer-presentation/presentation-function.tsx
@@ -0,0 +1,13 @@
+// SKILL: semantic-duplication-scan
+// EXPECTED layer: "presentation"
+// REASON: imports React component — rendering/display coupling
+
+import React from "react";
+
+function formatCurrencyDisplay(amount: number,
currency: string): React.ReactNode {
+  const formatted = new Intl.NumberFormat("en-US", {
+    style: "currency",
+    currency,
+  }).format(amount);
+  return {formatted};
+}
diff --git a/evals/fixtures/sds-layer-unknown/ambiguous-function.ts b/evals/fixtures/sds-layer-unknown/ambiguous-function.ts
new file mode 100644
index 0000000..8a75081
--- /dev/null
+++ b/evals/fixtures/sds-layer-unknown/ambiguous-function.ts
@@ -0,0 +1,10 @@
+// SKILL: semantic-duplication-scan
+// EXPECTED layer: "unknown"
+// REASON: imports a custom utility with no clear infrastructure or presentation coupling
+
+import { formatValue } from "./internal-utils";
+
+function processMetric(raw: number, scale: number): string {
+  const scaled = raw * scale;
+  return formatValue(scaled);
+}
diff --git a/evals/fixtures/sds-missing-commit/README.md b/evals/fixtures/sds-missing-commit/README.md
new file mode 100644
index 0000000..25f2205
--- /dev/null
+++ b/evals/fixtures/sds-missing-commit/README.md
@@ -0,0 +1,15 @@
+# Fixture: sds-missing-commit
+
+**Skill**: semantic-duplication-scan
+**Scenario**: lastScanCommit not in git history (e.g., after rebase)
+
+## Setup
+
+- `computation-register.json` has `lastScanCommit: "abc123def456"`
+- That commit hash no longer exists in the git history
+
+## Expected Behavior
+
+- Output: `lastScanCommit not found in history — running full scan`
+- Falls back to full-scan mode (does not exit non-zero)
+- Proceeds to annotate all source files
diff --git a/evals/fixtures/sds-mode-full-no-register/README.md b/evals/fixtures/sds-mode-full-no-register/README.md
new file mode 100644
index 0000000..59e8693
--- /dev/null
+++ b/evals/fixtures/sds-mode-full-no-register/README.md
@@ -0,0 +1,10 @@
+# Fixture: sds-mode-full-no-register
+
+**Skill**: semantic-duplication-scan
+**Scenario**: No register exists → full-scan mode selected automatically
+
+## Expected Behavior
+
+- Full-scan mode: glob all source files in scope
+- No pre-flight shallow-clone check (only runs in
incremental mode)
+- Proceeds to annotation of all source files found
diff --git a/evals/fixtures/sds-mode-incremental/README.md b/evals/fixtures/sds-mode-incremental/README.md
new file mode 100644
index 0000000..361b40a
--- /dev/null
+++ b/evals/fixtures/sds-mode-incremental/README.md
@@ -0,0 +1,10 @@
+# Fixture: sds-mode-incremental
+
+**Skill**: semantic-duplication-scan
+**Scenario**: Register with valid lastScanCommit → incremental mode selected
+
+## Expected Behavior
+
+- Incremental mode: `git diff --name-only <lastScanCommit>` used for file selection
+- Pre-flight shallow-clone check runs
+- Only diff'd files are re-annotated
diff --git a/evals/fixtures/sds-no-canonical/README.md b/evals/fixtures/sds-no-canonical/README.md
new file mode 100644
index 0000000..40f6c3a
--- /dev/null
+++ b/evals/fixtures/sds-no-canonical/README.md
@@ -0,0 +1,20 @@
+# Fixture: sds-no-canonical
+
+**Skill**: semantic-duplication-scan
+**Scenario**: All copies of a duplicated concept are infrastructure-coupled — no clear canonical
+
+## Setup
+
+Three source files, each computing "order total", each importing infrastructure-specific dependencies:
+
+1. `src/api/order-handler.ts` — imports Express `Request`/`Response` → infrastructure
+2. `src/db/order-repo.ts` — imports `pg` Pool → infrastructure
+3. `src/queue/order-processor.ts` — imports RabbitMQ client → infrastructure
+
+All three compute the same sum of (quantity × unit price) for line items.
+ +## Expected Behavior + +- One duplicate cluster reported with all three file:line references +- Output: `canonical: none — a new domain-layer implementation may be required` +- All three entries listed with their `file:line` references diff --git a/evals/fixtures/sds-no-duplicates/README.md b/evals/fixtures/sds-no-duplicates/README.md new file mode 100644 index 0000000..569b95c --- /dev/null +++ b/evals/fixtures/sds-no-duplicates/README.md @@ -0,0 +1,21 @@ +# Fixture: sds-no-duplicates + +**Skill**: semantic-duplication-scan +**Scenario**: Clustering finds no duplicates when all concepts are semantically distinct + +## Register State (pre-populated for clustering test) + +Five entries with distinct canonicalized domainConcepts: + +1. `applyDiscount` — domain — "discounted price" +2. `calculateTax` — domain — "tax amount" +3. `validateCouponCode` — application — "coupon validity" +4. `computeShippingWeight` — domain — "shipping weight" +5. `formatReceiptLine` — presentation — "receipt line text" + +## Expected Behavior + +After clustering: +- No duplicate clusters reported +- Output: `No semantic duplication detected` +- Exit code: 0 diff --git a/evals/fixtures/sds-permissions-failure/README.md b/evals/fixtures/sds-permissions-failure/README.md new file mode 100644 index 0000000..a41bdf1 --- /dev/null +++ b/evals/fixtures/sds-permissions-failure/README.md @@ -0,0 +1,15 @@ +# Fixture: sds-permissions-failure + +**Skill**: semantic-duplication-scan +**Scenario**: Register write fails due to file system permissions + +## Setup + +The project root directory is not writable by the current user (e.g., `chmod 555 .`). 
+ +## Expected Behavior + +- Exit code: non-zero +- Output reports the exact path that could not be written AND the OS-level error +- Example: `Cannot write computation-register.json: EACCES: permission denied, open '/project/computation-register.json'` +- No partial register written diff --git a/evals/fixtures/sds-prefilter-trivial/README.md b/evals/fixtures/sds-prefilter-trivial/README.md new file mode 100644 index 0000000..a7ec66c --- /dev/null +++ b/evals/fixtures/sds-prefilter-trivial/README.md @@ -0,0 +1,26 @@ +# Fixture: sds-prefilter-trivial + +**Skill**: semantic-duplication-scan +**Scenario**: Pre-filter correctly excludes trivial functions + +## Purpose + +Verifies that the skill's pre-filter step excludes all trivial functions from the computation register, producing no register entries for a file containing only trivial code. + +## Expected Behavior + +When `/semantic-scan` is run against a project containing only `trivial-functions.ts`: + +- No `computation-register.json` is created +- Output: `"No computation units found to analyze"` +- Exit code: 0 + +## Why Each Function Is Trivial + +| Function | Reason excluded | +|----------|----------------| +| `getUserName` | Getter — reads and returns a field, no computation | +| `logUser` | Pass-through delegator — no transformation | +| `identity` | Identity function — returns input unchanged | +| `OrderItem` constructor | Only assigns parameters to fields | +| `Pricing.basePrice` getter | Single-expression property accessor | diff --git a/evals/fixtures/sds-prefilter-trivial/trivial-functions.ts b/evals/fixtures/sds-prefilter-trivial/trivial-functions.ts new file mode 100644 index 0000000..b49edad --- /dev/null +++ b/evals/fixtures/sds-prefilter-trivial/trivial-functions.ts @@ -0,0 +1,54 @@ +// SKILL: semantic-duplication-scan +// EXPECTED: no register entries created — all functions below are trivial +// EXPECTED OUTPUT: "No computation units found to analyze" +// +// A trivial function performs no 
calculations and does not modify state. +// The skill must exclude all of the following patterns. + +interface User { + id: string; + name: string; + role: string; + price: number; +} + +// Getter — reads and returns a field, no computation +function getUserName(user: User): string { + return user.name; +} + +// Pass-through delegator — calls one function with the same args, no transformation +function logUser(user: User): void { + console.log(user); +} + +// Identity function — returns its input unchanged +function identity<T>(value: T): T { + return value; +} + +// Constructor-style initializer — only assigns parameters to fields +class OrderItem { + id: string; + quantity: number; + unitPrice: number; + + constructor(id: string, quantity: number, unitPrice: number) { + this.id = id; + this.quantity = quantity; + this.unitPrice = unitPrice; + } +} + +// Single-expression property accessor +class Pricing { + private _basePrice: number; + + constructor(price: number) { + this._basePrice = price; + } + + get basePrice(): number { + return this._basePrice; + } +} diff --git a/evals/fixtures/sds-scoped-cross-scope-notice/README.md b/evals/fixtures/sds-scoped-cross-scope-notice/README.md new file mode 100644 index 0000000..86d3914 --- /dev/null +++ b/evals/fixtures/sds-scoped-cross-scope-notice/README.md @@ -0,0 +1,17 @@ +# Fixture: sds-scoped-cross-scope-notice + +**Skill**: semantic-duplication-scan +**Scenario**: Scoped scan produces a cluster that includes an out-of-scope entry + +## Setup + +Register has: +- `src/pricing/discount.ts::applyDiscount` (domain, inside scope `src/pricing`) +- `src/checkout/cart.ts::computeDiscountedTotal` (presentation, outside scope) + +Both compute the same domain concept. Scan invoked with `/semantic-scan src/pricing`.
+ +## Expected Behavior + +- Duplicate cluster is reported containing both entries +- Output includes: `Note: this cluster includes 1 entry outside the scoped path — run without scope argument to see full context` diff --git a/evals/fixtures/sds-semanticscanignore-removes-entries/README.md b/evals/fixtures/sds-semanticscanignore-removes-entries/README.md new file mode 100644 index 0000000..8cc5442 --- /dev/null +++ b/evals/fixtures/sds-semanticscanignore-removes-entries/README.md @@ -0,0 +1,22 @@ +# Fixture: sds-semanticscanignore-removes-entries + +**Skill**: semantic-duplication-scan +**Scenario**: .semanticscanignore causes previously-registered entries to be removed + +## Setup + +Register has entries for: +- `src/legacy/old-pricing.ts::calculatePrice` +- `src/domain/pricing.ts::applyDiscount` + +`.semanticscanignore` contains: +``` +src/legacy/ +``` + +## Expected Behavior + +- `src/legacy/old-pricing.ts` is not re-annotated +- Register entry for `src/legacy/old-pricing.ts::calculatePrice` is **removed** +- Entry for `src/domain/pricing.ts::applyDiscount` is unchanged +- Exit code: 0 diff --git a/evals/fixtures/sds-shallow-clone/README.md b/evals/fixtures/sds-shallow-clone/README.md new file mode 100644 index 0000000..a6eddcf --- /dev/null +++ b/evals/fixtures/sds-shallow-clone/README.md @@ -0,0 +1,19 @@ +# Fixture: sds-shallow-clone + +**Skill**: semantic-duplication-scan +**Scenario**: Shallow clone blocks incremental mode + +## Setup + +1. A `computation-register.json` exists with a valid `lastScanCommit` +2. The git repository is a shallow clone (`git clone --depth 1 ...`) + +## Expected Behavior + +- Exit code: non-zero +- Output exact string: `Shallow clone detected — semantic-scan requires full history for incremental mode. Run with --full to override.` +- No modifications to the register + +## Override + +Running `/semantic-scan --full` on a shallow clone should skip this check and proceed with full-scan mode. 
diff --git a/evals/fixtures/sds-subdirectory-scope/README.md b/evals/fixtures/sds-subdirectory-scope/README.md new file mode 100644 index 0000000..17033bf --- /dev/null +++ b/evals/fixtures/sds-subdirectory-scope/README.md @@ -0,0 +1,19 @@ +# Fixture: sds-subdirectory-scope + +**Skill**: semantic-duplication-scan +**Scenario**: Scoped scan only re-annotates files inside the given path + +## Setup + +Register has entries for: +- `src/pricing/discount.ts` (inside scope) +- `src/checkout/cart.ts` (outside scope) + +Command: `/semantic-scan src/pricing` + +## Expected Behavior + +- Only `src/pricing/discount.ts` is re-annotated +- Entry for `src/checkout/cart.ts` is unchanged in the register +- `lastScanCommit` updated to HEAD +- Exit code: 0 diff --git a/plans/semantic-duplication-scan.md b/plans/semantic-duplication-scan.md new file mode 100644 index 0000000..b68f8ff --- /dev/null +++ b/plans/semantic-duplication-scan.md @@ -0,0 +1,400 @@ +# Plan: Semantic Duplication Scan + +**Created**: 2026-05-11 +**Branch**: main +**Status**: implemented + +## Goal + +Add a `/semantic-scan` skill and command that detects business logic reimplemented multiple times across architectural layers. The skill builds a persistent `computation-register.json` in the user's project, incrementally updated via `git diff`, and runs a clustering pass to surface semantic duplicates with `file:line` references. Canonical designation is always a suggestion requiring human confirmation. This addresses the gap where linters catch syntactic duplication and `domain-review` catches single-instance layer violations, but no existing tool detects the same domain calculation independently reimplemented in multiple layers. 
+ + ## Acceptance Criteria + +- [ ] `computation-register.json` is valid JSON and human-readable +- [ ] Idempotency: two runs with no code changes produce structurally identical register output (same entries, same semantic descriptions, same file:line references; `lastScanCommit` and `scanTimestamp` fields excluded from comparison) +- [ ] Incremental mode never re-annotates files whose paths are not in `git diff --name-only <lastScanCommit>` +- [ ] Functions containing no arithmetic operators, boolean logic, branching constructs, assignments to variables outside their own scope, or higher-order collection operations (map, filter, reduce) are absent from the register +- [ ] Shallow clone in incremental mode exits non-zero and outputs the exact string: "Shallow clone detected — semantic-scan requires full history for incremental mode. Run with --full to override." +- [ ] Exit code 0 on scan success regardless of whether duplicates were found; non-zero only on scan failure or pre-flight error +- [ ] All report findings include `file:line` references pointing to the first line of the identified function +- [ ] Every duplicate cluster report uses a consistent `canonical:` prefix: "canonical: suggested — requires human confirmation" or "canonical: none — a new domain-layer implementation may be required" +- [ ] Annotation failures are never silent: any file that could not be annotated is reported in a scan summary with the error reason, and the register includes a `scanErrors` array for that run + +## Performance Guidelines + +Tracked as benchmarks in `evals/fixtures/sds-benchmark/`, not binary acceptance criteria.
+ +- Incremental scan on ≤20 changed files: target under 90 seconds +- Full scan of the `sds-benchmark-500` fixture: target under 10 minutes +- False positive target: fewer than 20% of flagged clusters on the `sds-benchmark-500` fixture; ground truth determined by fixture annotations + +## User-Facing Behavior + +```gherkin +Feature: Semantic Duplication Scan + + Background: + Given a project with source files in any language + + Scenario: First-time full scan creates a computation register + Given no computation-register.json exists in the project root + When the developer runs /semantic-scan + Then computation-register.json is created at the project root + And it contains one entry per non-trivial computation identified + And each entry contains: file path, function name, inferred layer, structured semantic description, prompt version, and HEAD commit hash + And functions containing no arithmetic, boolean logic, branching, external assignments, or higher-order collection operations are excluded + And the developer sees progress output as each file is annotated: "Annotating [N/total] " + + Scenario: Annotation failure is reported, not silenced + Given a full scan is running + And annotation fails for one file due to a transient LLM error + When the scan completes + Then the register is written with all successfully annotated files + And the register includes a scanErrors array identifying the failed file and error reason + And the developer sees a warning: "Warning: 1 file could not be annotated. Re-run /semantic-scan to retry." 
+ And the scan exits with code 0 (partial success is not a failure) + + Scenario: Incremental scan re-annotates only changed files + Given computation-register.json exists with a lastScanCommit recorded + And that commit exists in the full git history + And 5 source files have been modified or added since that commit + And 95 source files have not changed + When the developer runs /semantic-scan + Then only the 5 changed files are re-annotated + And the 95 unchanged entries are preserved exactly + And lastScanCommit is updated to HEAD + + Scenario: Incremental scan with no changed files updates only lastScanCommit + Given computation-register.json exists with a lastScanCommit recorded + And no source files have changed since that commit + When the developer runs /semantic-scan + Then no entries in the register are modified + And lastScanCommit is updated to HEAD + And the output reports "No changes since last scan — register up to date" + + Scenario: Deleted file is removed from the register + Given computation-register.json exists with an entry for src/pricing/calculator.js + And src/pricing/calculator.js has been deleted since lastScanCommit + When the developer runs /semantic-scan + Then the register entry for src/pricing/calculator.js is removed + And lastScanCommit is updated to HEAD + + Scenario: --full flag forces full re-scan regardless of lastScanCommit + Given computation-register.json exists with a valid lastScanCommit + And no source files have changed since that commit + When the developer runs /semantic-scan --full + Then all files in scope are re-annotated + And lastScanCommit is updated to HEAD + + Scenario: Shallow clone detected in incremental mode + Given computation-register.json exists with a lastScanCommit recorded + And the repository is a shallow clone + When the developer runs /semantic-scan + Then the scan exits with a non-zero code + And the output reports the exact string: "Shallow clone detected — semantic-scan requires full history for 
incremental mode. Run with --full to override." + + Scenario: Register cannot be written due to file system permissions + Given the project root directory is not writable by the current user + When the developer runs /semantic-scan + Then the scan exits with a non-zero code + And the output reports the exact path that could not be written and the OS-level error + + Scenario: Semantic duplicate detected across inferred layers + Given a source file in the domain layer containing a function that applies a percentage discount to a base price + And a source file in the presentation layer containing a function that independently computes a discounted total using the same inputs + When the developer runs /semantic-scan + Then a duplicate cluster is reported containing both functions + And the domain-layer function is identified as the suggested canonical in the format "canonical: suggested — requires human confirmation" + And the presentation-layer function is listed with its file:line reference + + Scenario: Canonical does not exist in any registered copy + Given source files in three different layers each containing a function computing the same domain concept + And all three functions import infrastructure-specific dependencies + When the developer runs /semantic-scan + Then a duplicate cluster is reported containing all three functions with their file:line references + And the output includes "canonical: none — a new domain-layer implementation may be required" + + Scenario: No semantic duplicates detected + Given all source files contain computations that express distinct domain concepts + When the developer runs /semantic-scan + Then no duplicate clusters are reported + And the output confirms "No semantic duplication detected" + + Scenario: Scan scoped to a subdirectory — existing out-of-scope entries preserved + Given computation-register.json exists with entries for files both inside and outside src/pricing + When the developer runs /semantic-scan src/pricing + 
Then only files under src/pricing are re-annotated in this pass + And entries for files outside src/pricing are unchanged in the register + And lastScanCommit is updated to HEAD + + Scenario: Scoped scan cluster includes out-of-scope entries — user is notified + Given computation-register.json exists with entries for files inside and outside src/pricing + And a duplicate cluster spans one entry in src/pricing and one entry outside src/pricing + When the developer runs /semantic-scan src/pricing + Then the duplicate cluster is reported + And the output includes "Note: this cluster includes 1 entry outside the scoped path — run without scope argument to see full context" + + Scenario: Ignore configuration removes previously-registered entries + Given computation-register.json exists with entries for files under src/legacy/ + And a .semanticscanignore file lists src/legacy/ + When the developer runs /semantic-scan + Then files under src/legacy/ are not annotated + And existing register entries for files under src/legacy/ are removed from the register + + Scenario: No computation units found in scope on first run + Given no computation-register.json exists + And no source files in scope contain non-trivial computations + When the developer runs /semantic-scan + Then no register is created + And the output reports "No computation units found to analyze" + + Scenario: Incremental scan with changed files all trivial after pre-filter + Given computation-register.json exists with entries from a prior scan + And 3 source files have changed since lastScanCommit + And all 3 changed files contain only trivial functions after pre-filter + When the developer runs /semantic-scan + Then no entries are added or modified in the register + And the output reports "No new computation units found in changed files — register unchanged" + + Scenario: lastScanCommit not in git history + Given computation-register.json records a lastScanCommit that no longer exists in history + When the 
developer runs /semantic-scan + Then the skill falls back to full-scan mode + And the output warns "lastScanCommit not found in history — running full scan" +``` + +## Steps + +### Step 1: Scaffold skill with trivial-function definition and pre-filter rules + +**Complexity**: standard +**RED**: Create eval fixture `evals/fixtures/sds-prefilter-trivial` — a source file containing only trivial functions (getters, pass-through delegators, identity functions, constructors that only assign parameters to fields). Expected output from skill: empty register, "No computation units found to analyze." +**GREEN**: Create `plugins/agentic-dev-team/skills/semantic-duplication-scan/SKILL.md` with: + - Frontmatter: `name`, `description`, `role: worker`, `user-invocable: true` + - Overview section referencing the spec at `docs/specs/semantic-duplication-scan.md` + - Trivial function definition: "A function is trivial if it contains no arithmetic operators (+, -, *, /, %, **), no boolean logic operators (&&, ||, !, not, and, or), no branching constructs (if, else, switch, ternary, match), no assignments to variables outside its own scope, and no calls to higher-order collection operations (map, filter, reduce, flatMap). Getters, pass-through delegators, identity functions, and constructors that only assign parameters to fields are trivial." + - Pre-filter rules: exclude `*.test.*`, `*.spec.*`, `__tests__/`, `*.generated.*`, `*.pb.*`, `dist/`, `build/`, `.semanticscanignore` patterns, and trivial functions per definition. No LLM call at this stage. +**REFACTOR**: Verify definition is unambiguous across TypeScript, Python, and Go examples. 
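As a minimal sketch of Step 1's path-based exclusions, the rules can be expressed in TypeScript. It assumes repository-relative paths with `/` separators and `.semanticscanignore` globs anchored at the project root, which simplifies gitignore-style matching. The names `isExcluded`, `globToRegExp`, and `parseIgnoreFile` are hypothetical, and the trivial-function check is deliberately not modeled here.

```typescript
// Hypothetical sketch of Step 1's path-based pre-filter. Paths are assumed to be
// repository-relative with "/" separators; the trivial-function check is not covered.

// Built-in exclusions listed in Step 1, written as regular expressions.
const BUILT_IN_EXCLUSIONS: RegExp[] = [
  /(^|\/)[^\/]*\.test\.[^\/]+$/, // *.test.*
  /(^|\/)[^\/]*\.spec\.[^\/]+$/, // *.spec.*
  /(^|\/)__tests__\//, // __tests__/
  /(^|\/)[^\/]*\.generated\.[^\/]+$/, // *.generated.*
  /(^|\/)[^\/]*\.pb\.[^\/]+$/, // *.pb.*
  /(^|\/)dist\//, // dist/
  /(^|\/)build\//, // build/
];

// Simplified glob translation: "**" crosses directories, "*" and "?" do not,
// and a trailing "/" matches everything beneath that directory. Patterns are
// anchored at the repository root (an assumption; gitignore semantics differ).
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*" && glob.charAt(i + 1) === "*") {
      source += ".*";
      i++; // skip the second "*"
    } else if (ch === "*") {
      source += "[^/]*";
    } else if (ch === "?") {
      source += "[^/]";
    } else {
      source += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(glob.endsWith("/") ? `^${source}` : `^${source}$`);
}

// One glob per line; blank lines are ignored.
function parseIgnoreFile(contents: string): RegExp[] {
  return contents
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "")
    .map(globToRegExp);
}

// True if the path is removed by a built-in rule or an ignore pattern.
function isExcluded(path: string, ignorePatterns: RegExp[] = []): boolean {
  return [...BUILT_IN_EXCLUSIONS, ...ignorePatterns].some((re) => re.test(path));
}
```

Matching on path segments (`(^|/)build/`) rather than substrings keeps a directory such as `src/rebuild/` from being excluded by accident.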
+**Files**: `plugins/agentic-dev-team/skills/semantic-duplication-scan/SKILL.md`, `evals/fixtures/sds-prefilter-trivial` +**Commit**: `feat: add semantic-duplication-scan skill scaffold with pre-filter rules` + +--- + +### Step 2: Add annotation procedure, register schema, and prompt versioning + +**Complexity**: standard +**RED**: Create eval fixture `evals/fixtures/sds-annotation-schema` — a file with one non-trivial computation function. Expected output: a register entry with all required fields populated and valid, `promptVersion` present and non-empty. +**GREEN**: Add to SKILL.md: + - Register entry schema with `promptVersion` field + - `domainConcept` canonicalization: lowercase, strip articles (a, an, the), normalize verb to infinitive + - Annotation prompt (pinned verbatim): "Describe only what this function computes in domain business terms using the JSON schema. Do not reference the implementation language, variable names, or data structure types. Use lowercase infinitive verbs." + - Batching: one Haiku call per file; all non-trivial functions in a single call + - Progress: emit "Annotating [N/total] \" to stderr before each file's LLM call + - Partial failure: if a file's LLM call fails, record `{file, error}` in `scanErrors` and continue; warn at end: "Warning: N files could not be annotated. Re-run /semantic-scan to retry." + - `promptVersion` re-annotation: entries whose `promptVersion` differs from SKILL.md current version are treated as stale and re-annotated on the next scan pass that touches that file +**REFACTOR**: Confirm canonicalization produces stable output for phrasing variants of the same concept. 
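The `domainConcept` canonicalization rule from Step 2 can be sketched as one named function. This assumes whitespace-separated words, and the small `INFINITIVES` table is a hypothetical stand-in for real verb lemmatization, which the plan does not specify.

```typescript
// Hypothetical sketch of the domainConcept canonicalization from Step 2:
// lowercase, strip articles, normalize verbs to the infinitive.

const ARTICLES = new Set(["a", "an", "the"]);

// Stand-in for a real lemmatizer (an assumption): a few inflected verb forms
// mapped to their infinitive. A production version needs broader coverage.
const INFINITIVES: Record<string, string> = {
  applies: "apply",
  applied: "apply",
  applying: "apply",
  calculates: "calculate",
  calculated: "calculate",
  calculating: "calculate",
  computes: "compute",
  computed: "compute",
  computing: "compute",
};

function canonicalizeDomainConcept(concept: string): string {
  return concept
    .toLowerCase()
    .split(/\s+/) // also collapses runs of whitespace
    .filter((word) => word !== "" && !ARTICLES.has(word))
    .map((word) => INFINITIVES[word] ?? word)
    .join(" ");
}
```

Keeping the rule in a single function matters because, per the Risks section, changing the canonicalization rule is the one event that forces full re-annotation.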
+**Files**: `plugins/agentic-dev-team/skills/semantic-duplication-scan/SKILL.md`, `evals/fixtures/sds-annotation-schema` +**Commit**: `feat: add annotation procedure, register schema, and prompt versioning` + +--- + +### Step 3: Add layer inference from coupling profile + +**Complexity**: standard +**RED**: Create eval fixtures: + - `evals/fixtures/sds-layer-infrastructure` — function importing `pg`, `redis`, `axios`; expected: `infrastructure` + - `evals/fixtures/sds-layer-domain` — function with no external imports; expected: `domain` + - `evals/fixtures/sds-layer-presentation` — function importing a React component; expected: `presentation` + - `evals/fixtures/sds-layer-unknown` — function with ambiguous imports; expected: `unknown` +**GREEN**: Add layer inference table to SKILL.md annotation section (inferred in the same Haiku call): + + | Coupling profile | Inferred layer | + |-----------------|---------------| + | Imports DB clients, ORMs, HTTP clients, message brokers | `infrastructure` | + | Imports rendering primitives, formats for display, accesses DOM/templates | `presentation` | + | Depends only on domain types and pure functions, no external imports | `domain` | + | Orchestrates domain + infrastructure without owning business rules | `application` | + | Cannot be determined from coupling profile | `unknown` | + +**REFACTOR**: None needed. 
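The plan infers layers inside the annotation call, so the following is only a deterministic approximation of the Step 3 table, useful for reasoning about the fixtures. The package lists, the `domain` directory convention for relative imports, and the rule that importing both domain code and an infrastructure client means `application` are all assumptions; the last one cannot see whether a function owns business rules.

```typescript
// Hypothetical, deterministic approximation of Step 3's layer table.
// The plan infers layers with the annotation model; package lists are illustrative.

type Layer = "domain" | "application" | "presentation" | "infrastructure" | "unknown";
type ImportKind = "infrastructure" | "presentation" | "domain" | "other";

const INFRASTRUCTURE_PACKAGES = new Set(["pg", "redis", "axios", "amqplib", "mongoose", "@prisma/client"]);
const PRESENTATION_PACKAGES = new Set(["react", "react-dom", "vue"]);

// "@scope/pkg/sub" -> "@scope/pkg"; "pkg/sub" -> "pkg".
function packageName(specifier: string): string {
  const parts = specifier.split("/");
  if (specifier.startsWith("@")) return parts.slice(0, 2).join("/");
  return parts[0] ?? specifier;
}

function classifyImport(specifier: string): ImportKind {
  if (specifier.startsWith(".")) {
    // Relative import: treated as domain only if it points into a "domain" directory.
    return specifier.split("/").includes("domain") ? "domain" : "other";
  }
  const pkg = packageName(specifier);
  if (INFRASTRUCTURE_PACKAGES.has(pkg)) return "infrastructure";
  if (PRESENTATION_PACKAGES.has(pkg)) return "presentation";
  return "other";
}

function inferLayer(imports: string[]): Layer {
  const kinds = new Set(imports.map(classifyImport));
  // Simplification: "orchestrates domain + infrastructure" is approximated by
  // importing both; ownership of business rules is not visible from imports.
  if (kinds.has("infrastructure") && kinds.has("domain") && !kinds.has("presentation")) {
    return "application";
  }
  if (kinds.has("infrastructure")) return "infrastructure";
  if (kinds.has("presentation")) return "presentation";
  if (kinds.has("other")) return "unknown"; // e.g. "./internal-utils"
  return "domain"; // no imports, or only domain-layer imports
}
```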
+**Files**: `plugins/agentic-dev-team/skills/semantic-duplication-scan/SKILL.md`, `evals/fixtures/sds-layer-*` +**Commit**: `feat: add layer inference rules to annotation procedure` + +--- + +### Step 4: Add mode detection, pre-flight checks, and --full flag + +**Complexity**: standard +**RED**: Create eval fixtures: + - `evals/fixtures/sds-shallow-clone` — register exists, shallow clone; expected: exit non-zero, exact error string + - `evals/fixtures/sds-mode-full-no-register` — no register; expected: full-scan mode + - `evals/fixtures/sds-mode-incremental` — register with valid `lastScanCommit`; expected: incremental mode + - `evals/fixtures/sds-full-flag-override` — register exists, `--full` passed; expected: all files re-annotated + - `evals/fixtures/sds-missing-commit` — `lastScanCommit` not in history; expected: fallback to full scan, warning emitted + - `evals/fixtures/sds-permissions-failure` — project root not writable; expected: exit non-zero, path + OS error +**GREEN**: Add to SKILL.md: + - Mode detection: full if no register, incremental if register present + - Pre-flight (incremental only): `git rev-parse --is-shallow-repository`; if `true`, emit exact error string and exit non-zero + - `--full` flag: skip shallow-clone check, force full-scan mode + - Missing `lastScanCommit`: warn "lastScanCommit not found in history — running full scan" and fall back to full mode + - Write-failure: if register cannot be written, emit path + OS error and exit non-zero +**REFACTOR**: None needed. 
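Step 4's ordering can be captured as a pure decision function. This is a minimal sketch that assumes the caller gathers the inputs beforehand. The field names are hypothetical. The git commands in the comments are real, but using `git merge-base --is-ancestor` to test whether `lastScanCommit` is still in history is a suggested approach, not something the plan prescribes.

```typescript
// Hypothetical sketch of Step 4's mode selection as a pure function.

interface ScanContext {
  registerExists: boolean;
  fullFlag: boolean;
  // `git rev-parse --is-shallow-repository` printed "true"
  isShallow: boolean;
  // `git merge-base --is-ancestor <lastScanCommit> HEAD` exited 0
  lastScanCommitInHistory: boolean;
}

type ModeDecision =
  | { kind: "full"; warning?: string }
  | { kind: "incremental" }
  | { kind: "error"; message: string };

// Exact strings from the spec.
const SHALLOW_CLONE_ERROR =
  "Shallow clone detected — semantic-scan requires full history for incremental mode. Run with --full to override.";
const MISSING_COMMIT_WARNING = "lastScanCommit not found in history — running full scan";

function decideMode(ctx: ScanContext): ModeDecision {
  // --full, or no register yet: full scan, and the shallow-clone check is skipped.
  if (ctx.fullFlag || !ctx.registerExists) return { kind: "full" };
  // Checked before the history lookup, because a shallow clone can make
  // lastScanCommit look missing.
  if (ctx.isShallow) return { kind: "error", message: SHALLOW_CLONE_ERROR };
  if (!ctx.lastScanCommitInHistory) return { kind: "full", warning: MISSING_COMMIT_WARNING };
  return { kind: "incremental" };
}
```

An ancestry check is used instead of an object-existence check because a commit orphaned by a rebase can remain in the object store until garbage collection.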
+**Files**: `plugins/agentic-dev-team/skills/semantic-duplication-scan/SKILL.md`, `evals/fixtures/sds-*` +**Commit**: `feat: add mode detection, pre-flight checks, and --full flag` + +--- + +### Step 5: Add incremental scan, register update, and idempotency + +**Complexity**: standard +**RED**: Create eval fixtures: + - `evals/fixtures/sds-incremental-5-of-100` — 100 entries, 5 changed files; expected: 5 re-annotated, 95 unchanged, `lastScanCommit` updated + - `evals/fixtures/sds-incremental-0-changed` — no files changed; expected: no entries modified, `lastScanCommit` updated, "No changes since last scan — register up to date" + - `evals/fixtures/sds-incremental-deleted-file` — entry exists for a deleted file; expected: entry removed + - `evals/fixtures/sds-idempotency` — two runs with identical source; expected: structurally identical register (excluding `lastScanCommit` and `scanTimestamp`) +**GREEN**: Add to SKILL.md: + - File selection: `git diff --name-only <lastScanCommit>` filtered to source files in scope + - Apply `.semanticscanignore` and pre-filter to the diff result + - Deleted files: entries whose `file` path no longer exists are removed + - Merge strategy: replace entries for re-annotated paths; preserve all others + - Idempotency: sort register entries by `file` then `function` before writing; `domainConcept` canonicalization ensures consistent field values + - `promptVersion` staleness: re-annotate any entry whose `promptVersion` differs from current on the next scan that touches that file + - Update `lastScanCommit` to HEAD after successful write +**REFACTOR**: Verify sort order is stable across naming conventions.
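The following is a minimal sketch of the Step 5 file selection and merge. It assumes the changed-file list comes from `git diff --name-status <lastScanCommit>`, which, unlike `--name-only`, distinguishes deletions. The `RegisterEntry` fields are a hypothetical subset of the schema. Renames are treated as a deletion plus an addition, which sidesteps, but does not test, the rename gap listed under Risks.

```typescript
// Hypothetical sketch of Step 5: classify changed paths, then merge and sort.
// Input is assumed to be the output of `git diff --name-status <lastScanCommit>`;
// paths that git would quote (unusual characters) are not handled here.

interface RegisterEntry {
  file: string;
  function: string;
  layer: string;
  domainConcept: string;
  promptVersion: string;
}

interface ChangedPaths {
  changed: string[]; // added, modified, or rename/copy targets: re-annotate
  deleted: string[]; // removed or renamed away: drop their entries
}

function parseNameStatus(output: string): ChangedPaths {
  const result: ChangedPaths = { changed: [], deleted: [] };
  for (const line of output.split("\n")) {
    if (line.trim() === "") continue;
    const parts: (string | undefined)[] = line.split("\t");
    const [status = "", first, second] = parts;
    if (first === undefined) continue;
    switch (status.charAt(0)) {
      case "D":
        result.deleted.push(first);
        break;
      case "R": // rename: the old path disappears, the new path needs annotation
        result.deleted.push(first);
        if (second !== undefined) result.changed.push(second);
        break;
      case "C": // copy: the source is unchanged, only the copy is new
        if (second !== undefined) result.changed.push(second);
        break;
      default: // A, M, T, ...
        result.changed.push(first);
    }
  }
  return result;
}

// Compare by UTF-16 code units: locale-independent, unlike localeCompare.
function compareEntries(a: RegisterEntry, b: RegisterEntry): number {
  if (a.file !== b.file) return a.file < b.file ? -1 : 1;
  if (a.function !== b.function) return a.function < b.function ? -1 : 1;
  return 0;
}

function mergeRegister(
  existing: RegisterEntry[],
  reannotated: RegisterEntry[],
  reannotatedFiles: Set<string>, // every file re-annotated this pass, even if it now yields no entries
  removedFiles: Set<string>, // deleted paths plus newly ignored paths
): RegisterEntry[] {
  // Replace entries for re-annotated files, drop removed files, keep the rest.
  const kept = existing.filter((e) => !reannotatedFiles.has(e.file) && !removedFiles.has(e.file));
  return [...kept, ...reannotated].sort(compareEntries);
}
```

Comparing with `<` instead of `localeCompare` keeps the sort order identical across machines and locales, which is what the idempotency criterion depends on.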
+**Files**: `plugins/agentic-dev-team/skills/semantic-duplication-scan/SKILL.md`, `evals/fixtures/sds-incremental-*`, `evals/fixtures/sds-idempotency` +**Commit**: `feat: add incremental scan, register update, and idempotency` + +--- + +### Step 6a: Add clustering with register partitioning + +**Complexity**: complex +**RED**: Create eval fixture `evals/fixtures/sds-no-duplicates` — register with 5 entries each with distinct canonicalized `domainConcept` values. Expected: no clusters, "No semantic duplication detected." +**GREEN**: Add clustering procedure to SKILL.md: + - Partition strategy: shard register by layer pair (domain×presentation, domain×infrastructure, application×presentation) as separate Sonnet calls + - Token budget: if a shard exceeds 50,000 tokens, further shard by first normalized token of `domainConcept` + - Sonnet clustering prompt (pinned verbatim): "Group these entries by semantic equivalence — entries that compute the same domain concept regardless of implementation differences. Return clusters as JSON arrays of entry IDs. Two entries belong in the same cluster only if both would need to change if the underlying business rule changed." + - Progress: emit "Clustering [layer-pair]: domain × presentation..." before each Sonnet call + - Cross-shard reconciliation: after per-shard clustering, run a lightweight merge pass on cluster representatives to catch cross-shard equivalents, keeping the merge input under the 50k threshold +**REFACTOR**: Confirm shard boundaries don't suppress cross-shard duplicates. 
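The Step 6a partitioning can be sketched as follows. It assumes roughly four characters per token in place of a real tokenizer, and `ClusterInput`, `shardByLayerPair`, and `splitOverBudget` are hypothetical names. The actual Sonnet calls and the cross-shard merge pass are omitted.

```typescript
// Hypothetical sketch of Step 6a's sharding. Token counts are estimated at
// ~4 characters per token, an assumption standing in for a real tokenizer.

type Layer = "domain" | "application" | "presentation" | "infrastructure" | "unknown";

interface ClusterInput {
  id: string;
  layer: Layer;
  domainConcept: string; // already canonicalized
}

const LAYER_PAIRS: ReadonlyArray<readonly [Layer, Layer]> = [
  ["domain", "presentation"],
  ["domain", "infrastructure"],
  ["application", "presentation"],
];

const TOKEN_BUDGET = 50000;

function estimateTokens(entries: ClusterInput[]): number {
  return Math.ceil(JSON.stringify(entries).length / 4);
}

// Over-budget shards are split by the first token of domainConcept.
// A sub-shard can still exceed the budget; that case is not handled here.
function splitOverBudget(members: ClusterInput[]): ClusterInput[][] {
  if (estimateTokens(members) <= TOKEN_BUDGET) return [members];
  const groups = new Map<string, ClusterInput[]>();
  for (const entry of members) {
    const key = entry.domainConcept.split(" ")[0] ?? "";
    const group = groups.get(key) ?? [];
    group.push(entry);
    groups.set(key, group);
  }
  return Array.from(groups.values());
}

// Keyed by the label used in the progress message, e.g. "domain × presentation".
function shardByLayerPair(entries: ClusterInput[]): Map<string, ClusterInput[][]> {
  const shards = new Map<string, ClusterInput[][]>();
  for (const [left, right] of LAYER_PAIRS) {
    const members = entries.filter((e) => e.layer === left || e.layer === right);
    shards.set(`${left} × ${right}`, splitOverBudget(members));
  }
  return shards;
}
```

Written out this way, it is easy to see that with only the three listed layer pairs, `unknown` entries and `application` × `infrastructure` pairs never reach a clustering call.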
+**Files**: `plugins/agentic-dev-team/skills/semantic-duplication-scan/SKILL.md`, `evals/fixtures/sds-no-duplicates` +**Commit**: `feat: add clustering with register partitioning` + +--- + +### Step 6b: Add canonical scoring and duplicate report + +**Complexity**: complex +**RED**: Create eval fixture `evals/fixtures/sds-duplicate-with-canonical` — domain-layer `applyDiscount` (no infrastructure imports) and presentation-layer `computeFinalPrice` (imports render helper). Expected: duplicate cluster, domain entry as canonical in format "canonical: suggested \ — requires human confirmation", cross-scope notice if applicable. +**GREEN**: Add to SKILL.md: + - Canonical scoring: rank by layer (`domain` > `application` > `presentation` > `infrastructure` > `unknown`); within same layer, rank by count of infrastructure imports (fewer = higher rank) + - Ambiguity predicate: if top two candidates tie on layer rank AND differ by ≤1 infrastructure import → escalate to Opus; emit "Resolving ambiguous canonical for cluster: \..." + - Opus prompt: "Given these N entries computing the same domain concept, which is the most appropriate canonical location? Consider domain purity, reusability, and least coupling to delivery mechanism." + - Output canonical: + - Clear winner: "canonical: suggested \ — requires human confirmation" + - No winner: "canonical: none — a new domain-layer implementation may be required" + - Cross-scope notice (scoped runs): "Note: this cluster includes N entry/entries outside the scoped path — run without scope argument to see full context" (use "entry" for N=1, "entries" for N>1) + - `--no-opus` flag: skip Opus escalation; report ambiguous clusters as "canonical: ambiguous — human review required" +**REFACTOR**: None needed. 
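A sketch of the Step 6b scoring follows, under one stated assumption. The plan does not define when there is "no winner", so this version reports no canonical when the best-ranked candidate is still `infrastructure` or `unknown`, or imports infrastructure. Checking that before the ambiguity predicate keeps an all-infrastructure cluster, like the `sds-no-canonical` fixture, from being escalated to Opus. `Candidate`, `chooseCanonical`, and `crossScopeNotice` are hypothetical names.

```typescript
// Hypothetical sketch of Step 6b's canonical scoring and cross-scope notice.

type Layer = "domain" | "application" | "presentation" | "infrastructure" | "unknown";

interface Candidate {
  id: string; // e.g. "src/domain/pricing.ts::applyDiscount"
  layer: Layer;
  infrastructureImports: number;
}

type CanonicalVerdict =
  | { kind: "suggested"; id: string }
  | { kind: "ambiguous"; ids: [string, string] } // escalate to Opus, or report as ambiguous with --no-opus
  | { kind: "none" };

// Lower is better: domain > application > presentation > infrastructure > unknown.
const LAYER_RANK: Record<Layer, number> = {
  domain: 0,
  application: 1,
  presentation: 2,
  infrastructure: 3,
  unknown: 4,
};

function chooseCanonical(candidates: Candidate[]): CanonicalVerdict {
  // Best layer first; within a layer, fewer infrastructure imports first.
  const ranked = [...candidates].sort(
    (a, b) => LAYER_RANK[a.layer] - LAYER_RANK[b.layer] || a.infrastructureImports - b.infrastructureImports,
  );
  const first: Candidate | undefined = ranked[0];
  const second: Candidate | undefined = ranked[1];
  // Assumption: no suitable canonical if even the best copy is coupled to infrastructure.
  if (
    first === undefined ||
    first.layer === "infrastructure" ||
    first.layer === "unknown" ||
    first.infrastructureImports > 0
  ) {
    return { kind: "none" };
  }
  // Ambiguity predicate from the plan: same layer rank and at most one import apart.
  if (
    second !== undefined &&
    LAYER_RANK[first.layer] === LAYER_RANK[second.layer] &&
    Math.abs(first.infrastructureImports - second.infrastructureImports) <= 1
  ) {
    return { kind: "ambiguous", ids: [first.id, second.id] };
  }
  return { kind: "suggested", id: first.id };
}

// Scoped-run notice: "entry" for one, "entries" otherwise; exact wording from the plan.
function crossScopeNotice(outOfScopeCount: number): string | undefined {
  if (outOfScopeCount === 0) return undefined;
  const noun = outOfScopeCount === 1 ? "entry" : "entries";
  return `Note: this cluster includes ${outOfScopeCount} ${noun} outside the scoped path — run without scope argument to see full context`;
}
```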
+**Files**: `plugins/agentic-dev-team/skills/semantic-duplication-scan/SKILL.md`, `evals/fixtures/sds-duplicate-with-canonical` +**Commit**: `feat: add canonical scoring and duplicate report` + +--- + +### Step 6c: Add no-canonical handling and file:line accuracy + +**Complexity**: standard +**RED**: Create eval fixtures: + - `evals/fixtures/sds-no-canonical` — three infrastructure-coupled files computing the same concept. Expected: cluster with "canonical: none — a new domain-layer implementation may be required" + - `evals/fixtures/sds-fileline-accuracy` — fixture with known function positions; expected: reported line numbers match first line of each function definition +**GREEN**: Add to SKILL.md: + - `file:line` references point to the first line of the function definition (not the body) + - If a file has changed since annotation, append staleness note: "(line may have shifted — re-run scan to refresh)" +**REFACTOR**: None needed. +**Files**: `plugins/agentic-dev-team/skills/semantic-duplication-scan/SKILL.md`, `evals/fixtures/sds-no-canonical`, `evals/fixtures/sds-fileline-accuracy` +**Commit**: `feat: add no-canonical handling and file:line accuracy` + +--- + +### Step 7: Add scoping, ignore configuration, and empty-scan handling + +**Complexity**: standard +**RED**: Create eval fixtures: + - `evals/fixtures/sds-subdirectory-scope` — register with entries inside and outside `src/pricing`; scan scoped to `src/pricing`. Expected: only `src/pricing` files re-annotated; out-of-scope entries unchanged + - `evals/fixtures/sds-scoped-cross-scope-notice` — cluster spans `src/pricing` and `src/checkout`; scan scoped to `src/pricing`. Expected: cluster reported with cross-scope notice + - `evals/fixtures/sds-semanticscanignore-removes-entries` — register with `src/legacy/` entries; `.semanticscanignore` lists `src/legacy/`. Expected: entries removed + - `evals/fixtures/sds-empty-scope` — no register, all files trivial. 
Expected: "No computation units found to analyze", no register created + - `evals/fixtures/sds-incremental-trivial-changed` — register exists, 3 changed files all trivial. Expected: "No new computation units found in changed files — register unchanged" +**GREEN**: Add to SKILL.md: + - Subdirectory scoping: path argument as prefix filter; out-of-scope register entries preserved + - `.semanticscanignore`: one glob per line; newly-ignored path entries removed from register on next scan + - Empty scan (first run): "No computation units found to analyze"; register not created + - Incremental trivial-changed: "No new computation units found in changed files — register unchanged" +**REFACTOR**: None needed. +**Files**: `plugins/agentic-dev-team/skills/semantic-duplication-scan/SKILL.md`, `evals/fixtures/sds-*` +**Commit**: `feat: add scoping, ignore configuration, and empty-scan handling` + +--- + +### Step 8: Create the command file + +**Complexity**: trivial +**RED**: Confirm `/semantic-scan` is not discoverable without a command file (not listed in `/help` output). +**GREEN**: Create `plugins/agentic-dev-team/commands/semantic-scan.md` with frontmatter `argument-hint: "[path] [--full] [--no-opus]"` and body delegating to the skill. +**REFACTOR**: None needed. +**Files**: `plugins/agentic-dev-team/commands/semantic-scan.md` +**Commit**: `feat: add /semantic-scan command entry point` + +--- + +### Step 9: Update agent registry and CLAUDE.md quick reference + +**Complexity**: trivial +**RED**: Run `/agent-audit` — expect structural compliance failures for missing registry entries. 
+**GREEN**: + - Append to `knowledge/agent-registry.md` Skills Registry table: `| Semantic Duplication Scan | skills/semantic-duplication-scan/SKILL.md | ~TBD | Orchestrator, Software Engineer |` + - Append to `plugins/agentic-dev-team/CLAUDE.md` Slash Commands table: `| /semantic-scan | commands/semantic-scan.md | worker | Build computation register and detect semantic duplicates across architectural layers |` + - Update `~TBD` token count after Steps 1–8 are complete +**REFACTOR**: None needed. +**Files**: `plugins/agentic-dev-team/knowledge/agent-registry.md`, `plugins/agentic-dev-team/CLAUDE.md` +**Commit**: `docs: register semantic-scan skill and command in registry and CLAUDE.md` + +--- + +## Complexity Classification + +| Rating | Criteria | Review depth | +|--------|----------|--------------| +| `trivial` | Single-file rename, config change, typo fix, documentation-only | Skip inline review; covered by final `/code-review` | +| `standard` | New function, test, module, or behavioral change within existing patterns | Spec-compliance + relevant quality agents | +| `complex` | Architectural change, security-sensitive, cross-cutting concern, new abstraction | Full agent suite including opus-tier agents | + +## Pre-PR Quality Gate + +- [ ] All eval fixtures pass `/agent-eval` +- [ ] `/agent-audit` passes with no structural compliance failures +- [ ] `/code-review` passes on all new files +- [ ] SKILL.md token count estimated and added to registry entry +- [ ] CLAUDE.md quick reference updated +- [ ] `evals/fixtures/sds-benchmark/` fixture and README created for performance guideline tracking + +## Risks & Open Questions + +- **Clustering token budget**: Documented threshold (50,000 tokens per layer-pair shard). Fallback: shard further by `domainConcept` first token. Known limit, not a silent failure. +- **`promptVersion` drift**: Entries with old `promptVersion` are re-annotated on their next incremental scan pass. 
Full re-annotation required only if the `domainConcept` canonicalization rule changes. +- **Language coverage**: Trivial-function definition applied by Haiku at annotation time, reducing language-specific risk vs. AST-based approaches. Higher-order function patterns (map/filter/reduce) explicitly listed. +- **Register commit convention**: Committing recommended for team projects; `.gitignore` acceptable for solo. Neither enforced. SKILL.md will include guidance. +- **`scanErrors` lifecycle**: Cleared when the next scan successfully re-annotates the previously-failed files. `--full` clears all prior `scanErrors` by re-attempting every file in scope. +- **Renamed files**: Rename edge case (old-path entry removed, new-path entry created) not covered by a scenario in this slice. Tracked as a known gap for a follow-on. +- **Higher-order function classification**: `map`/`filter`/`reduce` chains are non-trivial; they are explicitly listed among the constructs that disqualify a function from the trivial-function definition. + +## Plan Review Summary + +Four reviewers ran across two revision cycles. All four passed on the final revision. + +| Reviewer | Verdict | Top Finding | +|----------|---------|-------------| +| Acceptance Test Critic | **approve** | 9 acceptance criteria binary-verifiable; rename edge case known gap | +| Design & Architecture Critic | **approve** | Canonicalization routine should be single named function; promptVersion + sharding resolve blockers | +| UX Critic | **approve** | All blockers resolved: progress feedback, partial failure, cross-scope notice, canonical phrasing, --retry-failed removed | +| Strategic Critic | **approve** | Correct problem fit, clean reversibility, incremental delivery after steps 2 and 6 | diff --git a/plugins/agentic-dev-team/CLAUDE.md b/plugins/agentic-dev-team/CLAUDE.md index 93941ef..488d46e 100644 --- a/plugins/agentic-dev-team/CLAUDE.md +++ b/plugins/agentic-dev-team/CLAUDE.md @@ -88,6 +88,7 @@ User-invocable workflows in `.claude/commands/`.
All review commands are execute | `/harness-audit` | `commands/harness-audit.md` | orchestrator | Analyze harness effectiveness and flag stale components | | `/version` | `commands/version.md` | worker | Report the installed plugin version | | `/benchmark` | `commands/benchmark.md` | worker | Capture runtime performance metrics (Core Web Vitals, resource sizes) and compare against baselines | +| `/semantic-scan` | `commands/semantic-scan.md` | worker | Build computation register and detect semantic duplicates across architectural layers | | `/help` | `commands/help.md` | worker | List all available slash commands with descriptions | ## Request Processing Flow diff --git a/plugins/agentic-dev-team/commands/semantic-scan.md b/plugins/agentic-dev-team/commands/semantic-scan.md new file mode 100644 index 0000000..930eee7 --- /dev/null +++ b/plugins/agentic-dev-team/commands/semantic-scan.md @@ -0,0 +1,22 @@ +--- +name: semantic-scan +description: >- + Build a computation register and detect semantic duplicates across architectural + layers. Finds business logic reimplemented multiple times in different layers — + the same domain calculation independently appearing in domain services, client + adapters, and presentation components. Runs incrementally (git-diff-based) after + the first scan. Produces a structured duplicate report with file:line references + and canonical location suggestions. +argument-hint: "[path] [--full] [--no-opus]" +user-invocable: true +--- + +Apply the guidelines defined in skills/semantic-duplication-scan/SKILL.md to the current task. Read the skill file and follow its process flow, pre-filter rules, annotation procedure, clustering strategy, and report format. + +## Flags + +- `[path]` — Optional subdirectory to scope the scan. Only files under this path are re-annotated; out-of-scope register entries are preserved. +- `--full` — Force full-scan mode regardless of whether a register exists. Bypasses the shallow-clone pre-flight check. 
Use when `lastScanCommit` is stale or after a major restructuring. +- `--no-opus` — Skip Opus canonical resolution for ambiguous clusters. Ambiguous clusters are reported as `canonical: ambiguous — human review required` instead of triggering an Opus call. Use in cost-sensitive environments (CI, personal accounts). + +Apply this skill to: $ARGUMENTS diff --git a/plugins/agentic-dev-team/knowledge/agent-registry.md b/plugins/agentic-dev-team/knowledge/agent-registry.md index 78d663f..cc82747 100644 --- a/plugins/agentic-dev-team/knowledge/agent-registry.md +++ b/plugins/agentic-dev-team/knowledge/agent-registry.md @@ -82,6 +82,7 @@ Skills are reusable knowledge modules in `.claude/skills/` that agents reference | Docker Image Create | `skills/docker-image-create/SKILL.md` | 800 | Platform Engineer, Software Engineer | | Docker Image Audit | `skills/docker-image-audit/SKILL.md` | 750 | Orchestrator (inline review), Platform Engineer, Security Engineer | | Performance Benchmark | `skills/performance-benchmark/SKILL.md` | 800 | QA Engineer, Platform Engineer, `/benchmark` command | +| Semantic Duplication Scan | `skills/semantic-duplication-scan/SKILL.md` | ~4,500 | Orchestrator, Software Engineer, Architect | ## Subagent Prompt Templates diff --git a/plugins/agentic-dev-team/skills/semantic-duplication-scan/SKILL.md b/plugins/agentic-dev-team/skills/semantic-duplication-scan/SKILL.md new file mode 100644 index 0000000..5e1fc04 --- /dev/null +++ b/plugins/agentic-dev-team/skills/semantic-duplication-scan/SKILL.md @@ -0,0 +1,329 @@ +--- +name: semantic-duplication-scan +description: >- + Detect business logic reimplemented in multiple architectural layers. Builds a + persistent computation-register.json by annotating non-trivial computation + functions with structured semantic descriptions, then clusters entries to + surface duplicate domain concepts. Runs in full-scan mode on first use, + incremental (git-diff-based) mode on subsequent runs. 
Use when the user wants + to find logical duplication that linters and diff-scoped review agents miss — + the same domain calculation independently reimplemented across layers. +role: worker +user-invocable: true +--- + +# Semantic Duplication Scan + +## Overview + +Detect business logic that has been reimplemented multiple times across different architectural layers. Unlike linters (which detect syntactic similarity) or `domain-review` (which catches single-instance layer violations), this skill detects semantic equivalence — the same domain calculation independently appearing in domain services, client adapters, and presentation components with different variable names and structure. + +## Annotation Prompt Version + +**promptVersion**: `1.0` + +When this version changes, any register entry with a different `promptVersion` is treated as stale and re-annotated on the next scan pass that touches that file. + +--- + +## Pre-Filter Rules + +**Apply before any LLM call. No model invocation at this stage.** + +### Trivial Function Definition + +A function is **trivial** — and must be excluded from the register — if it meets ALL of the following: + +- Contains **no arithmetic operators**: `+`, `-`, `*`, `/`, `%`, `**` +- Contains **no boolean logic operators**: `&&`, `||`, `!`, `not`, `and`, `or` +- Contains **no branching constructs**: `if`, `else`, `switch`, `case`, `ternary` (`?:`), `match` +- Contains **no assignments to variables outside its own scope** (no external state mutation) +- Contains **no calls to higher-order collection operations**: `map`, `filter`, `reduce`, `flatMap`, `forEach`, `find`, `some`, `every`, or language equivalents + +Trivial patterns (always excluded): +- Getters: read and return a field with no transformation +- Pass-through delegators: call one function with the same arguments, return the result unchanged +- Identity functions: return the input unchanged +- Constructors / initializers that only assign parameters to instance fields + +If 
every file in scope contains **only trivial functions** (on a first run; incremental runs use the message in Step 5), output: + +``` +No computation units found to analyze +``` + +and do not create or modify the register. + +### File Exclusion Patterns + +Exclude the following from annotation regardless of content: + +``` +*.test.* +*.spec.* +__tests__/ +*.test-d.* +*.generated.* +*.pb.* +*.d.ts +dist/ +build/ +.next/ +coverage/ +``` + +Also exclude any path matching a pattern listed in `.semanticscanignore` (one glob per line) if that file exists in the project root. + +--- + +## Process Flow + +### Step 1 — Mode Detection + +Check for `computation-register.json` in the project root: +- **Absent** → full-scan mode +- **Present** → incremental mode + +### Step 2 — Pre-Flight (Incremental Mode Only) + +If the `--full` flag was passed: skip this pre-flight entirely and force full-scan mode. + +Run: `git rev-parse --is-shallow-repository` + +If output is `true`: +- Output the exact string: `Shallow clone detected — semantic-scan requires full history for incremental mode. Run with --full to override.` +- Exit non-zero + +If `lastScanCommit` in the register is not found in git history (for example, `git cat-file -e <lastScanCommit>^{commit}` exits non-zero): +- Output: `lastScanCommit not found in history — running full scan` +- Switch to full-scan mode + +### Step 3 — Scope Resolution + +1. If a path argument was provided (e.g., `/semantic-scan src/pricing`), use it as a prefix filter: only consider files whose paths start with the argument +2. Apply `.semanticscanignore` patterns: exclude any file matching a listed glob +3. Apply file exclusion patterns from the Pre-Filter Rules above + +### Step 4 — File Selection + +**Full-scan mode**: Glob all source files in the resolved scope. + +**Incremental mode**: Run `git diff --name-only <lastScanCommit> HEAD` to list the files changed since the last scan, then filter to files in the resolved scope. 
+ +If the git diff result is empty (no files changed since `lastScanCommit`): +1. Update `lastScanCommit` to HEAD in the register and write the updated register +2. Output: `No changes since last scan — register up to date` +3.
Exit 0 — do not proceed to annotation or clustering + +### Step 5 — Pre-Filter + +For each selected file, identify non-trivial computation functions using the Trivial Function Definition above. Apply without an LLM call — use structural heuristics (presence of operators, branches, higher-order calls). + +If no non-trivial functions remain after filtering: + +- **First run**: `No computation units found to analyze` → exit 0, no register created +- **Incremental run**: `No new computation units found in changed files — register unchanged` → exit 0, register not modified + +### Step 6 — Annotation (Haiku, file-level batching) + +For each file with non-trivial functions: + +1. Emit progress to stderr: `Annotating [N/total] ` +2. Send all non-trivial functions from the file in a single Haiku call using the pinned prompt below +3. If the call fails, record `{file, error}` in `scanErrors` and continue — do not abort + +**Annotation prompt (pinned — do not paraphrase):** + +``` +You are a semantic annotation assistant. For each function below, produce a JSON object describing what it computes in pure domain business terms. + +Use this schema: +{ + "function": "", + "layer": "", + "semanticDescription": { + "verb": "", + "domainConcept": "", + "inputs": ["", ...], + "outputConcept": "" + } +} + +Layer inference rules — infer from what the function imports and uses: +- "infrastructure": imports DB clients, ORMs, HTTP clients, message brokers (pg, redis, axios, fetch, prisma, mongoose, etc.) +- "presentation": imports rendering primitives, formats for display, accesses DOM or templates (React, Vue, Svelte, JSX, HTML templates, etc.) 
+- "domain": depends only on domain types and pure functions, no external imports +- "application": orchestrates domain and infrastructure without owning business rules +- "unknown": cannot be determined from available context + +domainConcept rules: +- Use lowercase +- Remove articles: a, an, the +- Normalize the verb to infinitive form +- Example: "calculates the discounted price" → domainConcept: "discounted price", verb: "calculate" + +Describe only what the function computes in domain terms. Do not reference the implementation language, variable names, or data structure types. + +Functions: + +``` + +**Canonicalize `domainConcept` and `verb` after receiving the response:** +1. Lowercase `domainConcept` +2. Strip leading/trailing articles from `domainConcept`: `a `, `an `, `the ` +3. Normalize the `verb` field to infinitive (e.g., "calculates" → "calculate", "computing" → "compute") + +### Step 7 — Register Update + +Build a register entry for each annotated function: + +```json +{ + "file": "", + "function": "", + "layer": "", + "semanticDescription": { + "verb": "", + "domainConcept": "", + "inputs": ["", ...], + "outputConcept": "" + }, + "promptVersion": "1.0", + "commitHash": "", + "line": +} +``` + +**Merge strategy:** +- Replace all entries whose `file` matches a re-annotated file +- Remove entries whose `file` no longer exists on disk +- Remove entries for files matching `.semanticscanignore` patterns +- Preserve all other entries unchanged + +**Idempotency:** Sort the full entry list by `file` ascending, then `function` ascending before writing. + +**Write the register.** If the write fails (permissions, disk full): +- Output the exact path that could not be written and the OS error +- Exit non-zero + +**Update `lastScanCommit`** to the current HEAD commit hash as part of the same write, so it advances only when the register is written successfully. + +**Report partial failures.** After writing (only if `scanErrors` is non-empty): +- N=1: `Warning: 1 file could not be annotated.
Re-run /semantic-scan to retry.` +- N>1: `Warning: N files could not be annotated. Re-run /semantic-scan to retry.` + +Exit code 0 — partial success is not a failure. + +--- + +## Clustering + +### Token Budget and Partitioning + +The full register may be too large to cluster in a single call, so shard it by layer pair before sending it to Sonnet: +- `domain × presentation` +- `domain × infrastructure` +- `application × presentation` +- `application × infrastructure` +- `domain × application` + +For each shard: if the shard exceeds **50,000 tokens**, further shard by the first normalized token of `domainConcept` (split alphabetically into sub-groups). + +After per-shard clustering, run a **cross-shard reconciliation pass**: send only the cluster representatives (one entry per cluster, the top-ranked candidate) back to Sonnet to catch equivalents that landed in different shards. Keep this merge input under 50,000 tokens. + +### Clustering Call (Sonnet) + +For each shard, emit progress: `Clustering [layer-pair]: × ...` + +**Clustering prompt (pinned — do not paraphrase):** + +``` +Group these register entries by semantic equivalence — entries that compute the same domain concept regardless of implementation differences. Return clusters as JSON arrays of entry IDs (use "file:function" as the ID). + +Two entries belong in the same cluster ONLY IF both would need to change if the underlying business rule changed. + +Input entries: + + +Return only: +{ "clusters": [["file1:funcA", "file2:funcB"], ...] } + +Entries that are semantically unique should not appear in any cluster. +``` + +--- + +## Canonical Scoring + +For each cluster with 2 or more entries: + +**Layer rank** (listed from most to least canonical): +1. `domain` +2. `application` +3. `presentation` +4. `infrastructure` +5. `unknown` + +**Within the same layer rank:** count infrastructure imports in the function's file; fewer imports ranks as more canonical.
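The layer rank and import-count tie-break above, plus the ambiguity predicate that follows, reduce to a sort and one comparison. This is a minimal TypeScript sketch, not the skill's implementation: the `Candidate` shape, the assumption that infrastructure-import counts are precomputed per file, the helper names (`layerRank`, `rankCandidates`, `isAmbiguous`), and the example paths are all hypothetical.

```typescript
// Hypothetical candidate shape; `infraImports` is assumed to be counted
// beforehand from the candidate's file, since the register does not store it.
type Layer = "domain" | "application" | "presentation" | "infrastructure" | "unknown";

interface Candidate {
  id: string;           // "file:function", matching the clustering entry IDs
  layer: Layer;
  infraImports: number; // infrastructure imports in the candidate's file
}

// Most canonical first, mirroring the layer rank list.
const LAYER_ORDER: readonly Layer[] = ["domain", "application", "presentation", "infrastructure", "unknown"];

// 0 = most canonical layer, 4 = least canonical.
function layerRank(layer: Layer): number {
  return LAYER_ORDER.indexOf(layer);
}

// Better layer first; within the same layer, fewer infrastructure imports first.
function rankCandidates(cluster: readonly Candidate[]): Candidate[] {
  return [...cluster].sort(
    (a, b) => layerRank(a.layer) - layerRank(b.layer) || a.infraImports - b.infraImports,
  );
}

// Ambiguous when the top two share a layer and differ by at most one import.
function isAmbiguous(ranked: readonly Candidate[]): boolean {
  if (ranked.length < 2) return false;
  const [first, second] = ranked;
  return first.layer === second.layer && Math.abs(first.infraImports - second.infraImports) <= 1;
}

// Usage with two hypothetical entries from one cluster.
const ranked = rankCandidates([
  { id: "src/ui/PriceTag.tsx:formatTotal", layer: "presentation", infraImports: 1 },
  { id: "src/pricing/discount.ts:applyDiscount", layer: "domain", infraImports: 0 },
]);
console.log(ranked[0].id, isAmbiguous(ranked)); // src/pricing/discount.ts:applyDiscount false
```

Keeping the order in a single array means a change to the rank list is a one-line edit, and the comparator never encodes layer names directly.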
+ +**Ambiguity predicate:** If the top two candidates tie on layer rank AND differ by ≤1 infrastructure import → escalate to Opus. + +Before the Opus call, emit: `Resolving ambiguous canonical for cluster: ...` + +**Opus prompt:** + +``` +Given these N entries that all compute the same domain concept, which one is the most appropriate canonical location for the business rule? Consider: domain purity, reusability, and least coupling to delivery mechanism. Return the file:function ID of the preferred canonical and a one-sentence rationale. + +Entries: + +``` + +If `--no-opus` flag was passed: skip Opus escalation. Report ambiguous clusters as: +``` +canonical: ambiguous — human review required +``` + +### Canonical Verdict Output + +- Clear winner (domain or application layer, no tie): `canonical: suggested — requires human confirmation` +- No winner (all infrastructure/unknown, or tie unresolved): `canonical: none — a new domain-layer implementation may be required` + +**Cross-scope notice** (scoped runs only): If a cluster contains entries outside the scoped path: +``` +Note: this cluster includes entry/entries outside the scoped path — run without scope argument to see full context +``` +(Use "entry" when N=1, "entries" when N>1.) + +--- + +## Report + +After clustering is complete: + +**Duplicates found:** + +``` +## Semantic Duplication Report + +### Cluster: + + - : [] ← inferred canonical (or all entries if no canonical) + - : [] + + canonical: suggested — requires human confirmation + [Note: this cluster includes N entry/entries outside the scoped path ...] + +--- +``` + +**No duplicates:** + +``` +No semantic duplication detected +``` + +**`file:line` accuracy:** Line numbers point to the first line of the function definition. 
If the file has been modified since annotation, append: `(line may have shifted — re-run scan to refresh)` + From e872244f91b1368109aac4db71b540bde9440b94 Mon Sep 17 00:00:00 2001 From: Bryan Finster Date: Thu, 14 May 2026 11:47:20 -0500 Subject: [PATCH 2/3] feat(agent-create): add agent-create skill, official agent template, and schema validation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds the agent-create skill for producing new sub-agent files following the official Claude Code sub-agent schema with token-efficiency budgets (≤ 40 lines for review agents, ≤ 75 for team agents). Extends /agent-add to cover both review and team agents by delegating to the skill. Adds the canonical agent template with all 15 official frontmatter fields documented, and extends claude-setup-review to validate agent frontmatter schema compliance. Co-Authored-By: Claude Sonnet 4.6 (1M context) --- docs/specs/agent-create.md | 194 ++++++++ evals/fixtures/aca-existing-file/README.md | 36 ++ evals/fixtures/aca-invalid-name/README.md | 41 ++ evals/fixtures/aca-no-tools/README.md | 31 ++ .../aca-plugin-unsupported-field/README.md | 33 ++ evals/fixtures/aca-registry-update/README.md | 46 ++ .../aca-review-body-over-budget/README.md | 26 ++ .../fixtures/aca-review-body-valid/README.md | 34 ++ .../aca-review-preamble-rejected/README.md | 33 ++ evals/fixtures/aca-scope-overlap/README.md | 36 ++ evals/fixtures/aca-team-body-valid/README.md | 30 ++ evals/fixtures/aca-valid-name/README.md | 25 + .../fixtures/aca-validation-failure/README.md | 29 ++ plans/agent-create.md | 435 ++++++++++++++++++ .../agents/claude-setup-review.md | 57 ++- .../agentic-dev-team/commands/agent-add.md | 198 +------- .../knowledge/agent-registry.md | 1 + .../skills/agent-create/SKILL.md | 346 ++++++++++++++ .../templates/agents/agent-template.md | 143 ++++++ 19 files changed, 1586 insertions(+), 188 deletions(-) create mode 100644 docs/specs/agent-create.md create mode 
100644 evals/fixtures/aca-existing-file/README.md create mode 100644 evals/fixtures/aca-invalid-name/README.md create mode 100644 evals/fixtures/aca-no-tools/README.md create mode 100644 evals/fixtures/aca-plugin-unsupported-field/README.md create mode 100644 evals/fixtures/aca-registry-update/README.md create mode 100644 evals/fixtures/aca-review-body-over-budget/README.md create mode 100644 evals/fixtures/aca-review-body-valid/README.md create mode 100644 evals/fixtures/aca-review-preamble-rejected/README.md create mode 100644 evals/fixtures/aca-scope-overlap/README.md create mode 100644 evals/fixtures/aca-team-body-valid/README.md create mode 100644 evals/fixtures/aca-valid-name/README.md create mode 100644 evals/fixtures/aca-validation-failure/README.md create mode 100644 plans/agent-create.md create mode 100644 plugins/agentic-dev-team/skills/agent-create/SKILL.md create mode 100644 plugins/agentic-dev-team/templates/agents/agent-template.md diff --git a/docs/specs/agent-create.md b/docs/specs/agent-create.md new file mode 100644 index 0000000..cd31819 --- /dev/null +++ b/docs/specs/agent-create.md @@ -0,0 +1,194 @@ +# Spec: Create Agent + +## Intent Description + +This feature adds a `agent-create` skill that produces new Claude Code sub-agent files following the official sub-agent schema (from `templates/agents/agent-template.md`) and enforcing token-efficiency rules. It handles both **review agents** (narrow scope, JSON output, read-only tools, runs on every code-review pass) and **team agents** (broader scope, prose output, action tools, loaded on demand). + +Token efficiency is a first-class constraint. Review agents are loaded 8–10 simultaneously on every `/code-review` run; at ~12 tokens per line, a 40-line body costs ~480 tokens — exceeding this multiplied across ten agents adds thousands of tokens of overhead per review. Team agents are loaded one at a time and allow up to 75 lines. 
The skill enforces these budgets at generation time and prevents the patterns that inflate them: role-playing preambles, description restatement, prose where bullets suffice, and template sections copied over when inapplicable. + +The skill is user-invocable directly. `/agent-add` is updated to delegate to it, extending its scope from review-only to all agent types. No new command file is created — the skill is the interface. + +## User-Facing Behavior + +```gherkin +Feature: Create Agent Skill + + Background: + Given the agentic-dev-team plugin is installed + + Scenario: Creates a valid agent file from name and description + Given the user provides an agent name and description + When the agent-create skill runs + Then a file is written to plugins/agentic-dev-team/agents/.md + And the frontmatter name matches ^[a-z][a-z0-9-]*$ + And the frontmatter description is non-empty + And the model field is one of: haiku, sonnet, opus, inherit + And running agent-eval with claude-setup-review against the file returns status "pass" + + Scenario: Review agent body stays within the line budget + Given the agent type is "review" + When the skill generates the agent body + Then the body is 40 lines or fewer + And the body contains an Output JSON block + And the body contains a Skip section + And the body contains a Detect section + + Scenario: Team agent body stays within the line budget + Given the agent type is "team" + When the skill generates the agent body + Then the body is 75 lines or fewer + And the body contains a Responsibilities section + And the body does not contain an Output JSON block + + Scenario: Body contains no token-wasting patterns + Given the skill has generated any agent body + Then the body does not begin with "You are a" or "You are an" + And the body does not restate the frontmatter description verbatim + And the body contains none of the template placeholder strings: "your-agent-name", "One-sentence description", "# Agent Name" + And no single 
detection rule or responsibility spans more than two lines + + Scenario: User is prompted for tools when none are provided + Given the user has not specified a tools list + When the skill processes the request + Then the skill prompts: "Which tools does this agent need?" + And the prompt includes examples: "Read, Grep, Glob (read-only) | add Edit, Write (file changes) | add Bash (shell) | add Skill (skill invocation) | add Agent (spawn subagents)" + And the skill waits for the user's selection before proceeding + + Scenario: Plugin-unsupported fields are absent by default + Given the user has not explicitly requested hooks, mcpServers, or permissionMode + When the skill generates the frontmatter + Then hooks is absent from the frontmatter + And mcpServers is absent from the frontmatter + And permissionMode is absent from the frontmatter + + Scenario: Plugin-unsupported fields warned when explicitly requested + Given the user explicitly requests a hooks, mcpServers, or permissionMode field + When the skill processes the request + Then the skill emits: "hooks/mcpServers/permissionMode are silently ignored for plugin agents — move the file to .claude/agents/ if you need them to take effect" + And the field is included if the user confirms + And the field is omitted if the user declines + + Scenario: Skill refuses to overwrite an existing agent silently + Given an agent file already exists at plugins/agentic-dev-team/agents/.md + When the user invokes the skill with the same name + Then the skill reports the existing file path and its current description + And asks for explicit confirmation before overwriting + And does not write until the user confirms + + Scenario: Skill rejects an invalid name + Given the user provides a name containing uppercase letters, spaces, or special characters + When the skill processes the name + Then the skill rejects it with an error describing the rule + And suggests the corrected kebab-case form + And does not write any file + + 
Scenario: Duplicate scope is flagged before writing + Given an existing agent already covers the same detection domain + When the skill identifies the overlap + Then the skill reports the overlapping agent name and shared scope + And asks whether to continue or cancel + And does not write until the user decides + + Scenario: Agent registry is updated after successful creation + Given a new agent has been written successfully + When the skill completes + Then knowledge/agent-registry.md contains a new row for the agent + And the row includes the agent name, file path, model tier, and a description + And plugins/agentic-dev-team/CLAUDE.md contains a new row for the agent in the appropriate table +``` + +## Architecture Specification + +### New Component + +| Component | Location | +|-----------|----------| +| Skill | `plugins/agentic-dev-team/skills/agent-create/SKILL.md` | + +No new command file. The skill is `user-invocable: true` and is also delegated to by the updated `/agent-add` command. + +### Modified Component + +| Component | Change | +|-----------|--------| +| `commands/agent-add.md` | Replace hardcoded inline template with delegation to `skills/agent-create/SKILL.md`; extend scope to cover team agents in addition to review agents | + +### Reads + +- `templates/agents/agent-template.md` — field reference and permitted values +- `plugins/agentic-dev-team/agents/*.md` — scope-overlap check +- `plugins/agentic-dev-team/knowledge/agent-registry.md` — to append after creation +- `plugins/agentic-dev-team/CLAUDE.md` — to append after creation + +### Writes + +- `plugins/agentic-dev-team/agents/.md` — the new agent +- `plugins/agentic-dev-team/knowledge/agent-registry.md` — new row (append only) +- `plugins/agentic-dev-team/CLAUDE.md` — new row in the appropriate agents table (append only) + +### Process Flow + +1. **Parse arguments** — name, type (review|team), description, tools, model +2. 
**Prompt for missing required fields** — if name absent: ask; if type absent: ask; if tools absent: display tool-selection prompt with examples; if description absent: ask +3. **Validate name** — must match `^[a-z][a-z0-9-]*$`; reject with correction suggestion if not +4. **Check conflict** — if `agents/.md` exists: show its description and ask to confirm overwrite +5. **Check scope overlap** — scan existing agents for `description` and `## Detect` content; flag conceptual overlap and confirm with user +6. **Warn on plugin-unsupported fields** — if hooks/mcpServers/permissionMode requested: emit warning; confirm before including +7. **Apply defaults** — review: `tools: Read, Grep, Glob`, `model: haiku`; team: `model: sonnet` +8. **Generate frontmatter** — only official fields from template; validate all values +9. **Generate body** — type-specific structure; enforce token-efficiency rules; enforce line budget +10. **Validate** — run `agent-eval --agent claude-setup-review` against generated content; abort if status is not `pass` +11. **Present draft** — display full file; confirm before writing +12. **Write file** +13. **Update registry** — append row to `knowledge/agent-registry.md` +14. 
**Update CLAUDE.md** — append row to the appropriate agents table + +### Body Structure by Type + +| Section | Review agent | Team agent | +|---------|-------------|-----------| +| Output JSON block | Required | Absent | +| Status/Severity/Confidence lines | Required | Absent | +| Skip | Required | Optional | +| Detect | Required | Absent | +| Ignore | Required | Absent | +| Responsibilities | Absent | Required | +| Output discipline | Absent | Optional | +| Skills | Absent | Optional | + +### Token-Efficiency Rules (normative) + +| Rule | Applies to | +|------|-----------| +| Body ≤ 40 lines | Review agents | +| Body ≤ 75 lines | Team agents | +| No opener beginning "You are a" or "You are an" | All | +| No verbatim restatement of `description` field | All | +| Detection rules and responsibilities: ≤ 2 lines each | All | +| Knowledge file reference: one line only (`Read knowledge/X.md before starting`) | All | +| Omit template sections not applicable to this agent | All | +| No inline template comments in output | All | + +## Acceptance Criteria + +1. Generated agent file returns status `pass` from `agent-eval --agent claude-setup-review` +2. Review agent body is ≤ 40 lines +3. Team agent body is ≤ 75 lines +4. Body contains no opener matching `^You are an? ` +5. Body does not contain the frontmatter `description` value verbatim +6. Body contains none of the strings: `your-agent-name`, `One-sentence description`, `# Agent Name` +7. No bullet point in the body spans more than two lines +8. When tools are not provided, the skill prompts with the standard tool-selection examples before proceeding +9. `hooks`, `mcpServers`, and `permissionMode` are absent from generated frontmatter unless the user explicitly confirmed after the plugin warning +10. A file matching an existing agent path is not overwritten without explicit user confirmation +11. A name not matching `^[a-z][a-z0-9-]*$` is rejected with a correction suggestion; no file is written +12. 
After successful creation, `knowledge/agent-registry.md` contains a new row with name, file path, model tier, and description +13. After successful creation, `plugins/agentic-dev-team/CLAUDE.md` contains a new row in the appropriate agents table + +## Consistency Gate + +- [x] Intent is unambiguous — line budgets justified with token arithmetic, agent types distinguished by structure, tool prompt UX specified +- [x] Every behavior in the intent has at least one BDD scenario +- [x] Architecture constrains without over-engineering — no new command file, skill is user-invocable directly +- [x] Terminology is consistent across all four artifacts +- [x] No contradictions between artifacts diff --git a/evals/fixtures/aca-existing-file/README.md b/evals/fixtures/aca-existing-file/README.md new file mode 100644 index 0000000..488c989 --- /dev/null +++ b/evals/fixtures/aca-existing-file/README.md @@ -0,0 +1,36 @@ +# Fixture: aca-existing-file + +**Skill**: agent-create +**Scenario**: Agent file already exists — skill reports it and asks for confirmation + +## Setup + +A file exists at `plugins/agentic-dev-team/agents/import-cycle-review.md` with: +```yaml +--- +name: import-cycle-review +description: Old version — detects basic import cycles +--- +``` + +## Input + +- name: `import-cycle-review` +- type: `review` +- description: `New improved version for detecting circular import dependencies` + +## Expected Behavior + +Skill emits: `plugins/agentic-dev-team/agents/import-cycle-review.md already exists (description: Old version — detects basic import cycles)` + +Then asks: `Overwrite? (yes/no)` + +- On `no`: emits `Cancelled. Existing agent: plugins/agentic-dev-team/agents/import-cycle-review.md — Old version — detects basic import cycles` and stops. No file written or modified. 
+- On `yes`: skill continues to generation + +## Failure Conditions + +- File written without asking → FAIL +- Wrong file path or description in the message → FAIL +- Wrong question text → FAIL +- Skill continues after `no` → FAIL diff --git a/evals/fixtures/aca-invalid-name/README.md b/evals/fixtures/aca-invalid-name/README.md new file mode 100644 index 0000000..054b717 --- /dev/null +++ b/evals/fixtures/aca-invalid-name/README.md @@ -0,0 +1,41 @@ +# Fixture: aca-invalid-name + +**Skill**: agent-create +**Scenario**: Invalid names are rejected with rule + correction; no file written + +## Test Cases + +### Case 1: Uppercase letters +- Input name: `CodeQuality` +- Expected error: `Name must match ^[a-z][a-z0-9-]*$ — use lowercase letters, digits, and hyphens only` +- Expected suggestion: `Did you mean: code-quality?` +- Expected: no file written, skill stops + +### Case 2: Name starting with digit +- Input name: `3d-renderer-review` +- Expected error: `Name must match ^[a-z][a-z0-9-]*$ — use lowercase letters, digits, and hyphens only` +- Expected suggestion: a corrected form starting with a letter (e.g., `renderer-3d-review` or similar) +- Expected: no file written, skill stops + +### Case 3: Name with spaces +- Input name: `my review agent` +- Expected error: same rule message +- Expected suggestion: `my-review-agent` +- Expected: no file written, skill stops + +### Case 4: Empty string +- Input name: `""` +- Expected: same rule message, no correction offered +- Expected: no file written, skill stops + +## Kebab-Case Correction Algorithm + +Transform input to kebab-case suggestion: +1. Insert a hyphen at each lowercase-to-uppercase boundary (so `CodeQuality` becomes `Code-Quality`), then lowercase all characters +2. Replace runs of non-alphanumeric characters with a single hyphen +3. Strip leading/trailing hyphens +4.
If the result starts with a digit, there is no purely mechanical fix (prepending a hyphen and then stripping it is a no-op): suggest a reordered form that starts with a letter, such as `renderer-3d-review`, or ask the user for a valid name + +Example: `CodeQuality` → `code-quality` +Example: `my Review Agent` → `my-review-agent` +Example: `3d-renderer` → starts with a digit → suggest a reordered form such as `renderer-3d`, or ask the user for a valid name diff --git a/evals/fixtures/aca-no-tools/README.md b/evals/fixtures/aca-no-tools/README.md new file mode 100644 index 0000000..52c12f0 --- /dev/null +++ b/evals/fixtures/aca-no-tools/README.md @@ -0,0 +1,31 @@ +# Fixture: aca-no-tools + +**Skill**: agent-create +**Scenario**: Tools not provided — skill emits exact two-line prompt before proceeding + +## Input + +- name: `import-cycle-review` +- type: `review` +- description: `Detects circular import dependencies` +- tools: (not provided) + +## Expected Behavior + +Before any generation or file writing, the skill emits exactly: + +``` +Which tools does this agent need? + Read, Grep, Glob (read-only) | add Edit, Write (file changes) | add Bash (shell) | add Skill (skill invocation) | add Agent (spawn subagents) +``` + +Line 1 is `Which tools does this agent need?` with no leading spaces. +Line 2 starts with exactly two spaces. + +The skill does NOT proceed until the user responds to this prompt.
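The exact-text expectations in this fixture can be checked mechanically. A minimal TypeScript sketch (TypeScript because the other fixtures in this patch are `.ts` files), assuming the eval harness has the skill's output as a plain string; `checkToolPrompt` and `PromptCheck` are hypothetical names, not part of any existing eval tooling. It cannot tell whether the skill waited for the user's answer; that still needs a transcript review.

```typescript
// Hypothetical eval helper: the two pinned prompt lines from this fixture.
const TOOL_PROMPT_LINE_1 = "Which tools does this agent need?";
const TOOL_PROMPT_LINE_2 =
  "  Read, Grep, Glob (read-only) | add Edit, Write (file changes) | add Bash (shell) | add Skill (skill invocation) | add Agent (spawn subagents)";

interface PromptCheck {
  ok: boolean;
  reason?: string;
}

// Checks text-level failure conditions only; "proceeded before the response"
// has to be judged from the conversation, not from the output text.
function checkToolPrompt(transcript: string): PromptCheck {
  const lines = transcript.split(/\r?\n/);
  // Exact match: no leading spaces and no reworded text on line 1.
  const first = lines.indexOf(TOOL_PROMPT_LINE_1);
  if (first === -1) return { ok: false, reason: "line 1 missing or altered" };
  // Exact match: line 2 starts with exactly two spaces and is unchanged.
  if (lines[first + 1] !== TOOL_PROMPT_LINE_2) {
    return { ok: false, reason: "line 2 missing or altered" };
  }
  // A second occurrence violates "Prompt emitted more than once".
  if (lines.indexOf(TOOL_PROMPT_LINE_1, first + 1) !== -1) {
    return { ok: false, reason: "prompt emitted more than once" };
  }
  return { ok: true };
}
```

Comparing whole lines for equality, rather than using substring search, is what enforces both the zero-indent rule on line 1 and the two-space rule on line 2.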
+ +## Failure Conditions + +- Any variation in prompt text → FAIL +- Skill proceeds to generation before prompt response → FAIL +- Prompt emitted more than once → FAIL diff --git a/evals/fixtures/aca-plugin-unsupported-field/README.md b/evals/fixtures/aca-plugin-unsupported-field/README.md new file mode 100644 index 0000000..d6b3ebf --- /dev/null +++ b/evals/fixtures/aca-plugin-unsupported-field/README.md @@ -0,0 +1,33 @@ +# Fixture: aca-plugin-unsupported-field + +**Skill**: agent-create +**Scenario**: User requests a plugin-unsupported field — exact warning emitted + +## Input + +- name: `db-reader` +- type: `review` +- description: `Validates database queries` +- hooks: `PreToolUse: [...validate query...]` (user explicitly requested this) + +## Expected Behavior + +Skill emits exactly: +``` +hooks/mcpServers/permissionMode are silently ignored for plugin agents — move the file to .claude/agents/ if you need them to take effect +``` + +Then asks: `Include anyway? (yes/no)` + +- On `no`: generated frontmatter omits `hooks`; skill continues normally +- On `yes`: generated frontmatter includes the `hooks` field as requested + +## Note + +The warning covers all three fields (hooks, mcpServers, permissionMode) even if only one was requested. + +## Failure Conditions + +- Warning text varies from the exact string above → FAIL +- Field included without asking → FAIL +- Field omitted without emitting warning → acceptable only if field was never requested diff --git a/evals/fixtures/aca-registry-update/README.md b/evals/fixtures/aca-registry-update/README.md new file mode 100644 index 0000000..503b459 --- /dev/null +++ b/evals/fixtures/aca-registry-update/README.md @@ -0,0 +1,46 @@ +# Fixture: aca-registry-update + +**Skill**: agent-create +**Scenario**: After successful agent creation, registry and CLAUDE.md are updated + +## Test Case A: Review Agent + +Agent `unused-import-review` (type: review, model: haiku) created successfully. 
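Both expected rows in this fixture follow from the agent's name, type, model, and description plus the Model → Tier Mapping table at the end of the fixture. A minimal TypeScript sketch, assuming each registry file is plain Markdown with one table under each heading; `registryRow`, `targetHeading`, and `appendRow` are hypothetical helper names, and escaping `|` inside a description is an assumption the fixture does not specify.

```typescript
// Tier labels copied from this fixture's Model → Tier Mapping table.
type Model = "haiku" | "sonnet" | "opus" | "inherit";
const TIER: Record<Model, string> = { haiku: "small", sonnet: "mid", opus: "frontier", inherit: "mid" };

interface NewAgent {
  name: string;
  type: "review" | "team";
  model: Model;
  description: string;
}

// Row format: | name | file path | model tier | description |
function registryRow(agent: NewAgent): string {
  // Assumption: a literal "|" in a description is escaped so it cannot split the row.
  const description = agent.description.replace(/\|/g, "\\|");
  return `| ${agent.name} | agents/${agent.name}.md | ${TIER[agent.model]} | ${description} |`;
}

function targetHeading(agent: NewAgent): string {
  return agent.type === "review" ? "Review Agents" : "Team Agents";
}

// Appends `row` after the last row of the first table under the matching heading.
// Throws if the heading or its table is missing, so the caller writes nothing.
function appendRow(markdown: string, heading: string, row: string): string {
  const lines = markdown.split("\n");
  const h = lines.findIndex((l) => l.startsWith("#") && l.includes(heading));
  if (h === -1) throw new Error(`Missing heading containing "${heading}"`);
  // Skip prose between the heading and its table; a new heading means no table.
  let i = h + 1;
  while (i < lines.length && !lines[i].startsWith("|")) {
    if (lines[i].startsWith("#")) throw new Error(`No table under "${heading}"`);
    i++;
  }
  if (i === lines.length) throw new Error(`No table under "${heading}"`);
  // Walk to the end of the contiguous table and insert after its last row.
  while (i < lines.length && lines[i].startsWith("|")) i++;
  lines.splice(i, 0, row);
  return lines.join("\n");
}
```

Splicing a single new line after the table, instead of regenerating the table, leaves every existing row byte-identical, which covers the "Existing rows modified" failure condition.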
+ +Expected: `knowledge/agent-registry.md` contains a new row in the Review Agents table: +``` +| unused-import-review | agents/unused-import-review.md | small | Detects unused import statements | +``` + +Expected: `plugins/agentic-dev-team/CLAUDE.md` contains a new row in the Review Agents table for `unused-import-review`. + +Expected: NO row added to the Team Agents table in either file. + +## Test Case B: Team Agent + +Agent `schema-planner` (type: team, model: sonnet) created successfully. + +Expected: `knowledge/agent-registry.md` contains a new row in the Team Agents table: +``` +| schema-planner | agents/schema-planner.md | mid | Plans database schema migrations | +``` + +Expected: `plugins/agentic-dev-team/CLAUDE.md` contains a new row in the Team Agents table. + +Expected: NO row added to the Review Agents table in either file. + +## Failure Conditions + +- Row added to wrong table → FAIL +- Existing rows modified → FAIL +- Row format deviates from `| | | | |` → FAIL +- Either file not updated → FAIL + +## Model → Tier Mapping + +| Model | Tier Label | +|-------|-----------| +| haiku | small | +| sonnet | mid | +| opus | frontier | +| inherit | mid | diff --git a/evals/fixtures/aca-review-body-over-budget/README.md b/evals/fixtures/aca-review-body-over-budget/README.md new file mode 100644 index 0000000..904d630 --- /dev/null +++ b/evals/fixtures/aca-review-body-over-budget/README.md @@ -0,0 +1,26 @@ +# Fixture: aca-review-body-over-budget + +**Skill**: agent-create +**Scenario**: Review agent body exceeds 40 lines — trim diff shown before write gate + +## Setup + +A review agent spec that would naturally generate 45 lines of body +(e.g., many detection rules with verbose descriptions). + +## Expected Behavior + +1. Skill emits: `Body is 45 lines — 5 lines over the 40-line budget for review agents` +2. Skill lists each removed/collapsed item, each prefixed with `- ` (dash space) +3. Skill emits: `Approve this trim? (yes/no)` +4.
+**No file is written until user answers "yes"** + +On `yes`: trimmed body is written +On `no`: skill emits `Options: (a) reduce spec scope and regenerate, (b) accept 45 lines and proceed without trimming` + +## Failure Conditions + +- File written before trim approval → FAIL +- Trim message text deviates from exact format → FAIL +- Items removed without being listed → FAIL +- Required sections (Output JSON, ## Skip, ## Detect, ## Ignore) removed during trim → FAIL diff --git a/evals/fixtures/aca-review-body-valid/README.md b/evals/fixtures/aca-review-body-valid/README.md new file mode 100644 index 0000000..dd9c6f8 --- /dev/null +++ b/evals/fixtures/aca-review-body-valid/README.md @@ -0,0 +1,34 @@ +# Fixture: aca-review-body-valid + +**Skill**: agent-create +**Scenario**: Review agent body generation produces valid, token-efficient output + +## Input + +- name: `unused-import-review` +- type: `review` +- description: `Detects unused import statements` +- tools: `Read, Grep, Glob` +- model: `haiku` + +## Expected Body Properties + +1. **Line count**: ≤ 40 lines (all content after, and excluding, the closing `---` of the frontmatter, blank lines included; exactly 40 is permitted) +2. **Required sections** (in order): `# Unused Import Review`, Output JSON block, Status/Severity/Confidence lines, `Model tier:`, `Context needs:`, `## Skip`, `## Detect`, `## Ignore` +3. **No "You are a/an" opener**: no line matches `^You are an? ` (case-insensitive) +4. **No description restatement**: body does not contain "Detects unused import statements" verbatim (whitespace-normalized) +5. **No placeholder text**: no `your-agent-name`, `One-sentence description`, `# Agent Name` +6.
**Bullet length**: no bullet spans more than two lines + +## Expected Output JSON Block + +```json +{"status": "pass|warn|fail|skip", "issues": [{"severity": "error|warning|suggestion", "confidence": "high|medium|none", "file": "", "line": 0, "message": "", "suggestedFix": ""}], "summary": ""} +``` + +## Failure Conditions + +- Body > 40 lines → FAIL +- Missing any required section → FAIL +- "You are a" opener → FAIL +- Description restated verbatim → FAIL diff --git a/evals/fixtures/aca-review-preamble-rejected/README.md b/evals/fixtures/aca-review-preamble-rejected/README.md new file mode 100644 index 0000000..19b6cfc --- /dev/null +++ b/evals/fixtures/aca-review-preamble-rejected/README.md @@ -0,0 +1,33 @@ +# Fixture: aca-review-preamble-rejected + +**Skill**: agent-create +**Scenario**: Role-playing opener ("You are an expert...") must not appear in generated body + +## Expected Behavior + +When generating ANY review or team agent body, the skill must never produce a +line matching `^You are an? ` (case-insensitive). + +This fixture validates the anti-pattern rule is enforced at generation time, +not as a post-hoc check. + +## Examples of Forbidden Lines + +``` +You are an expert code reviewer. +You are a security specialist. +You are an experienced TypeScript developer. +``` + +## Examples of Acceptable Openers + +``` +# Import Cycle Review +Review agent for detecting circular imports. +Detects unused import statements in JavaScript and TypeScript files. +``` + +## Failure Conditions + +- Any generated body line matches `^You are an? 
` (case-insensitive) → FAIL +- Must FAIL even if this appears mid-body, not just as the first line diff --git a/evals/fixtures/aca-scope-overlap/README.md b/evals/fixtures/aca-scope-overlap/README.md new file mode 100644 index 0000000..f1f0015 --- /dev/null +++ b/evals/fixtures/aca-scope-overlap/README.md @@ -0,0 +1,36 @@ +# Fixture: aca-scope-overlap + +**Skill**: agent-create +**Scenario**: New agent description overlaps with an existing agent's scope + +## Setup + +Existing agent `dependency-review` has: +- description: `Detects circular dependencies between modules` +- `## Detect` section: mentions circular imports, dependency cycles, import chains + +## Input + +- name: `import-cycle-review` +- type: `review` +- description: `Detect circular import dependencies` + +## Expected Behavior + +Skill emits the exact format: +`Possible overlap with dependency-review: . Continue anyway? (yes/no)` + +Where `` describes the shared concept (e.g., "both detect circular dependency patterns"). + +- On `no`: skill stops, no file written +- On `yes`: skill continues to generation + +## Note + +This check is advisory. The overlap threshold is ~60% topical similarity. A false positive here causes a UX inconvenience, not a correctness error — the user can always continue. + +## Failure Conditions + +- Overlap not detected when descriptions clearly overlap → FAIL +- Message format deviates from `Possible overlap with : . Continue anyway? 
(yes/no)` → FAIL +- File written after user answers `no` → FAIL diff --git a/evals/fixtures/aca-team-body-valid/README.md b/evals/fixtures/aca-team-body-valid/README.md new file mode 100644 index 0000000..433e861 --- /dev/null +++ b/evals/fixtures/aca-team-body-valid/README.md @@ -0,0 +1,30 @@ +# Fixture: aca-team-body-valid + +**Skill**: agent-create +**Scenario**: Team agent body generation produces valid, token-efficient output + +## Input + +- name: `schema-planner` +- type: `team` +- description: `Plans database schema migrations` +- tools: `Read, Grep, Glob, Edit, Write, Bash` +- model: `sonnet` + +## Expected Body Properties + +1. **Line count**: ≤ 75 lines +2. **Required section**: `## Responsibilities` +3. **Absent sections**: no Output JSON block, no `## Skip`, no `## Detect`, no `## Ignore` +4. **No "You are a/an" opener** +5. **No description restatement verbatim** +6. **No placeholder text** +7. **Each responsibility**: ≤ 2 lines, action-oriented + +## Failure Conditions + +- Body > 75 lines → FAIL +- Missing `## Responsibilities` → FAIL +- Contains Output JSON block → FAIL +- Contains `## Skip` → FAIL +- "You are a" opener → FAIL diff --git a/evals/fixtures/aca-valid-name/README.md b/evals/fixtures/aca-valid-name/README.md new file mode 100644 index 0000000..b11e82f --- /dev/null +++ b/evals/fixtures/aca-valid-name/README.md @@ -0,0 +1,25 @@ +# Fixture: aca-valid-name + +**Skill**: agent-create +**Scenario**: Valid name passes validation and skill proceeds + +## Input + +- name: `code-quality-review` +- type: `review` +- description: `Detects code quality violations including long functions and deep nesting` + +## Expected Behavior + +- Name `code-quality-review` matches `^[a-z][a-z0-9-]*$` — passes validation +- Skill proceeds to next step (tool selection or generation) +- No error message emitted about the name +- No file is written at this stage (fixture covers validation only) + +## Edge Cases Covered + +| Name | Expected | +|------|----------| +| 
`code-quality-review` | PASS — valid kebab-case | +| `a` | PASS — single lowercase letter | +| `my-agent-123` | PASS — letters, digits, hyphens | diff --git a/evals/fixtures/aca-validation-failure/README.md b/evals/fixtures/aca-validation-failure/README.md new file mode 100644 index 0000000..3703eff --- /dev/null +++ b/evals/fixtures/aca-validation-failure/README.md @@ -0,0 +1,29 @@ +# Fixture: aca-validation-failure + +**Skill**: agent-create +**Scenario**: Generated agent fails /agent-audit — skill shows error, preserves inputs, offers recovery + +## Setup + +A generated agent file that is missing the `description` frontmatter field +(which agent-audit requires), simulating a validation failure. + +## Expected Behavior + +1. Skill runs `/agent-audit plugins/agentic-dev-team/agents/.md` +2. `/agent-audit` returns errors (e.g., missing description) +3. Skill emits the raw `/agent-audit` output verbatim +4. Skill emits: `All your inputs are preserved.` +5. Skill emits: `(a) auto-correct and re-validate (b) cancel` +6. **No file confirmed as written until user chooses (a) and re-validation passes** + +On `(b)`: skill deletes the temporary file, makes no changes, stops. 
+On `(a)`: skill applies minimal corrections, re-runs /agent-audit, then: + - Second pass succeeds → continue to write gate + - Second pass fails → emit new /agent-audit output, emit "All your inputs are preserved.", emit the same menu again + +## Failure Conditions + +- Skill stops silently after second failure → FAIL +- Inputs lost after validation failure → FAIL +- Menu text deviates from `(a) auto-correct and re-validate (b) cancel` → FAIL diff --git a/plans/agent-create.md b/plans/agent-create.md new file mode 100644 index 0000000..767a616 --- /dev/null +++ b/plans/agent-create.md @@ -0,0 +1,435 @@ +# Plan: Agent Create Skill + +**Created**: 2026-05-13 +**Branch**: feat/semantic-duplication-scan +**Status**: implemented + +## Goal + +Add an `agent-create` skill that produces new Claude Code sub-agent files following the official sub-agent schema, enforcing token-efficiency budgets (≤ 40 lines review body, ≤ 75 lines team body), guiding tool selection, conflict detection, and registry updates. Update `/agent-add` to delegate to this skill and extend it from review-only to all agent types. + +The skill is user-invocable directly (Skill tool or natural language) and listed in the Skills Registry only — no new command file. `/agent-add` remains the user-facing slash command; this skill is its implementation. + +**Relationship to `agent-skill-authoring`**: that skill is a *reference* (meta-patterns, anti-patterns, registration checklist). This skill is *procedural* — it automates the creation workflow. `agent-create` references `agent-skill-authoring` for conventions and anti-patterns; it does not replace it. + +**Out of scope**: token-efficiency enforcement for skill files and command files. Distinct artifact types tracked separately. + +## Acceptance Criteria + +**"Body" definition**: all content after and excluding the closing `---` of the YAML frontmatter block, including blank lines. The limit is inclusive (40 lines is permitted; 41 is not). 
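Because the body definition is precise, the mechanical acceptance criteria in this plan can be scripted. A minimal TypeScript sketch under stated assumptions: the file starts with a `---` frontmatter block, a single trailing newline at end of file is not a body line, and `extractBody`, `budgetMessage`, and `lintBody` are hypothetical names rather than existing tooling. The budget message writes its dash as `\u2014` so the string matches the pinned text exactly. Semantic checks (scope overlap, bullet length) are deliberately left out.

```typescript
// Hypothetical helpers mirroring this plan's mechanical acceptance criteria.
const BUDGET = { review: 40, team: 75 } as const;
type AgentType = keyof typeof BUDGET;

// Body = everything after (and excluding) the closing `---`, blank lines included.
function extractBody(file: string): string[] {
  // Assumption: one trailing newline at end of file is not a body line.
  const text = file.endsWith("\n") ? file.slice(0, -1) : file;
  const lines = text.split("\n");
  if (lines[0] !== "---") throw new Error("file does not start with YAML frontmatter");
  const close = lines.indexOf("---", 1);
  if (close === -1) throw new Error("frontmatter is never closed");
  return lines.slice(close + 1);
}

// Returns null within budget (the limit is inclusive), else the pinned trim message.
function budgetMessage(bodyLines: number, type: AgentType): string | null {
  const limit = BUDGET[type];
  if (bodyLines <= limit) return null;
  const over = bodyLines - limit;
  const unit = over === 1 ? "line" : "lines";
  return `Body is ${bodyLines} lines \u2014 ${over} ${unit} over the ${limit}-line budget for ${type} agents`;
}

// Collapse whitespace runs to one space and trim, per the normalization rule.
const normalize = (s: string): string => s.replace(/\s+/g, " ").trim();

const PLACEHOLDERS = ["your-agent-name", "One-sentence description", "# Agent Name"];

function lintBody(body: string[], description: string): string[] {
  const problems: string[] = [];
  // Per-line and case-insensitive: the opener is forbidden anywhere, not just line 1.
  if (body.some((line) => /^You are an? /i.test(line))) problems.push("role-playing opener");
  const joined = body.join("\n");
  const desc = normalize(description);
  if (desc.length > 0 && normalize(joined).includes(desc)) {
    problems.push("description restated verbatim");
  }
  for (const p of PLACEHOLDERS) {
    if (joined.includes(p)) problems.push(`placeholder text: ${p}`);
  }
  return problems;
}
```

Returning `null` at exactly the limit mirrors the inclusive boundary in the Gherkin scenarios: a 40-line review body passes silently, while 41 lines yields the singular "1 line over" message.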
+ +- [ ] Generated agent file passes `/agent-audit` structural compliance checks +- [ ] Review agent body is ≤ 40 lines +- [ ] Team agent body is ≤ 75 lines +- [ ] No line in the body matches `^You are an? ` (case-insensitive, per-line) +- [ ] Body does not contain the frontmatter `description` value verbatim. **Normalization**: collapse all runs of whitespace (space, tab, newline) to a single ASCII space and trim both ends of each string before the substring check. +- [ ] Body contains none of: `your-agent-name`, `One-sentence description`, `# Agent Name` +- [ ] No bullet point spans more than two lines +- [ ] When tools not provided, skill emits exact two-line prompt before proceeding: line 1 `Which tools does this agent need?`; line 2 ` Read, Grep, Glob (read-only) | add Edit, Write (file changes) | add Bash (shell) | add Skill (skill invocation) | add Agent (spawn subagents)` +- [ ] When `hooks`, `mcpServers`, or `permissionMode` requested, skill emits exact warning: `hooks/mcpServers/permissionMode are silently ignored for plugin agents — move the file to .claude/agents/ if you need them to take effect`; field absent unless user confirms +- [ ] Existing agent path not overwritten without confirmation; on decline skill reports path + description and stops +- [ ] Name not matching `^[a-z][a-z0-9-]*$` rejected with rule stated and kebab-case correction suggested; no file written +- [ ] When body exceeds line budget, skill emits "Body is N lines — X lines over the Y-line budget for Z agents", followed by a list of removed/collapsed items each prefixed with `- ` (dash space), followed by the exact prompt `Approve this trim? (yes/no)` — all before any file is written. On "no": skill emits "Options: (a) reduce spec scope and regenerate, (b) accept N lines and proceed without trimming" and waits. 
+- [ ] When `/agent-audit` fails, skill emits the raw `/agent-audit` output verbatim, then emits "All your inputs are preserved.", then emits the exact menu `(a) auto-correct and re-validate (b) cancel` and waits. On second failure after auto-correct: surfaces the same menu again rather than stopping silently. +- [ ] After success, `knowledge/agent-registry.md` contains a new row in the table whose heading contains "Review Agents" (review type) or "Team Agents" (team type). Row format: `| | | | |` +- [ ] After success, `plugins/agentic-dev-team/CLAUDE.md` contains a new row in the table whose heading contains "Review Agents" (review type) or "Team Agents" (team type). If the target heading is absent from CLAUDE.md, the skill emits an error identifying the missing heading and stops without modifying the file. + +## User-Facing Behavior + +```gherkin +Feature: Agent Create Skill + + Background: + Given the agentic-dev-team plugin is installed + + Scenario: Creates a valid agent file from name and description + Given the user provides name "import-cycle-review" and description "Detects circular import dependencies" + And tools: Read, Grep, Glob + When the agent-create skill runs + Then a file is written to plugins/agentic-dev-team/agents/import-cycle-review.md + And the frontmatter name is "import-cycle-review" + And the frontmatter description is "Detects circular import dependencies" + And the model field is "haiku" + And running /agent-audit against the file returns no errors + + Scenario: Review agent body stays within the line budget + Given the agent type is "review" + When the skill generates the agent body + Then the body line count is 40 or fewer + And the body contains an Output JSON block + And the body contains a "## Skip" section + And the body contains a "## Detect" section + + Scenario: Review agent body does not include team agent sections + Given the agent type is "review" + When the skill generates the agent body + Then the body does not contain a 
"## Responsibilities" section + + Scenario: Review agent body at exactly 40 lines passes the budget without trim prompt + Given a review agent spec that generates exactly 40 lines of body + When the skill runs the budget check + Then no message matching "Body is N lines" is emitted + And no "Approve this trim?" prompt is shown + And the write gate proceeds normally + + Scenario: Review agent body at 41 lines triggers visible trim before write gate + Given a review agent spec that would generate 41 lines of body + When the skill generates the body + Then the skill emits the exact string: "Body is 41 lines — 1 line over the 40-line budget for review agents" + And emits a list of removed/collapsed items each prefixed with "- " + And emits the exact prompt: "Approve this trim? (yes/no)" + And no file is written until the user answers "yes" + + Scenario: Team agent body at exactly 75 lines passes the budget without trim prompt + Given a team agent spec that generates exactly 75 lines of body + When the skill runs the budget check + Then no message matching "Body is N lines" is emitted + And no "Approve this trim?" prompt is shown + And the write gate proceeds normally + + Scenario: Team agent body at 76 lines triggers visible trim before write gate + Given a team agent spec that would generate 76 lines of body + When the skill generates the body + Then the skill emits the exact string: "Body is 76 lines — 1 line over the 75-line budget for team agents" + And emits a list of removed/collapsed items each prefixed with "- " + And emits the exact prompt: "Approve this trim? (yes/no)" + And no file is written until the user answers "yes" + + Scenario: User declines trim — skill offers two follow-up options + Given a review agent spec that would generate 44 lines of body + And the skill has emitted the trim list and "Approve this trim? 
(yes/no)" + And the user answers "no" + When the skill processes the decline + Then the skill emits: "Options: (a) reduce spec scope and regenerate, (b) accept 44 lines and proceed without trimming" + And waits for the user's choice + And no file is written until the user chooses + + Scenario: Team agent body stays within the line budget + Given the agent type is "team" + When the skill generates the agent body + Then the body line count is 75 or fewer + And the body contains a "## Responsibilities" section + And the body does not contain an Output JSON block + And the body does not contain a "## Skip" section + And the body does not contain a "## Detect" section + + Scenario: Body contains no token-wasting patterns + Given the skill has generated any agent body + Then no line matches "^You are a" or "^You are an" (case-insensitive) + And the body does not contain the frontmatter description value verbatim + And the body contains none of: "your-agent-name", "One-sentence description", "# Agent Name" + And no bullet point spans more than two lines + + Scenario: User is prompted for tools when none are provided + Given the user has not specified tools + When the skill processes the request + Then the skill emits exactly: + """ + Which tools does this agent need? 
+ Read, Grep, Glob (read-only) | add Edit, Write (file changes) | add Bash (shell) | add Skill (skill invocation) | add Agent (spawn subagents) + """ + And does not proceed until the user responds + + Scenario: Plugin-unsupported fields are absent by default + Given the user has not requested hooks, mcpServers, or permissionMode + When the skill generates the frontmatter + Then hooks, mcpServers, and permissionMode are each absent + + Scenario: Plugin-unsupported field emits exact warning + Given the user requests the "hooks" field + When the skill processes the request + Then the skill emits: "hooks/mcpServers/permissionMode are silently ignored for plugin agents — move the file to .claude/agents/ if you need them to take effect" + And hooks is absent if the user declines + And hooks is present if the user confirms + + Scenario: Skill refuses to overwrite silently + Given plugins/agentic-dev-team/agents/import-cycle-review.md exists with description "Old version" + When the user invokes the skill with name "import-cycle-review" + Then the skill reports: "plugins/agentic-dev-team/agents/import-cycle-review.md already exists (description: Old version)" + And asks "Overwrite? (yes/no)" + And no file is written until the user answers "yes" + And if the user answers "no" the skill stops with no changes + + Scenario: Skill rejects a name with uppercase letters + Given the user provides name "ImportCycleReview" + When the skill processes the name + Then it emits: "Name must match ^[a-z][a-z0-9-]*$ — use lowercase letters, digits, and hyphens only" + And suggests: "Did you mean: import-cycle-review?" 
+ And no file is written + + Scenario: Skill rejects a name starting with a digit + Given the user provides name "3d-renderer-review" + When the skill processes the name + Then it rejects the name with the rule and a suggested correction + And no file is written + + Scenario: Duplicate scope is flagged before writing + Given "dependency-review" has description "Detects circular dependencies between modules" + And the user creates an agent with description "Detect circular import dependencies" + When the skill scans for scope overlap + Then it emits: "Possible overlap with dependency-review: both detect circular dependency patterns. Continue anyway? (yes/no)" + And does not write until the user responds + And if the user responds "yes" the skill proceeds to generation + And if the user responds "no" the skill stops with no changes + + Scenario: Validation failure preserves inputs and offers recovery + Given the generated file fails /agent-audit (e.g. missing description) + When the validation gate runs + Then the skill emits the raw /agent-audit output verbatim + And emits: "All your inputs are preserved." + And emits the exact menu: "(a) auto-correct and re-validate (b) cancel" + And does not write the file + + Scenario: Second auto-correct failure resurfaces the menu + Given auto-correct was chosen after a validation failure + And the auto-corrected file still fails /agent-audit + When the second validation attempt fails + Then the skill emits the new /agent-audit output + And emits "All your inputs are preserved." 
+ And emits the exact menu "(a) auto-correct and re-validate (b) cancel" again + And does not write the file + + Scenario: Registry updated with correct row after review agent creation + Given "import-cycle-review" (review type) is written successfully + When the skill completes + Then knowledge/agent-registry.md has a new row with name, file path, model tier, and description + And no row was added for import-cycle-review to the Team Agents section + + Scenario: CLAUDE.md updated in correct table for each agent type + Given a review agent "import-cycle-review" is written successfully + Then CLAUDE.md has a new row in the Review Agents table for import-cycle-review + And the Team Agents table is unchanged + + Given a team agent "schema-planner" is written successfully + Then CLAUDE.md has a new row in the Team Agents table for schema-planner + And the Review Agents table is unchanged +``` + +## Steps + +### Step 0: Spike — verify /agent-audit accepts a single file path argument + +**Complexity**: trivial +**RED**: No fixture — investigation gate. +**GREEN**: Run `/agent-audit plugins/agentic-dev-team/agents/domain-review.md` and observe output. + - **Outcome A (pass)**: agent-audit accepts a single file path and returns structured output → document exact invocation; use in Step 7 GREEN as written. + - **Outcome B (fail)**: agent-audit does not accept a file path or requires a directory → Step 7 must be revised: write generated content to `.claude/staging/.md`, run `/agent-audit .claude/staging/`, check result, delete staging file. Update Step 7 GREEN before implementing it. + - **Outcome C (command not available)**: agent-audit is unavailable in the target context → AC 1 must be reclassified to use `claude-setup-review` directly against the generated file; document this deviation and update Step 7 accordingly. + +Spike result must be recorded in the Risks section before Step 1 begins. Implementation must not begin until Step 0 is resolved. +**REFACTOR**: None. 
+**Files**: update Risks section of this plan with finding; update Step 7 GREEN section +**Commit**: none — spike only, no code + +--- + +### Step 1: Scaffold skill with input parsing, name validation, and type-based defaults + +**Complexity**: standard +**RED**: Create eval fixtures: + - `evals/fixtures/aca-valid-name/README.md` — name `code-quality-review`, type `review`, description given. Expected: passes validation, proceeds. + - `evals/fixtures/aca-invalid-name/README.md` — name `CodeQuality`. Expected: emits exact error + suggestion `code-quality`; no file. Additional cases: `3d-review` (digit start → rejected), `my--review` (double hyphen → accepted per regex). +**GREEN**: Create `plugins/agentic-dev-team/skills/agent-create/SKILL.md`: + - Frontmatter: `name: agent-create`, description, `role: worker`, `user-invocable: true` + - Opening note: `agent-create` automates the procedure; reference `agent-skill-authoring` for conventions and anti-patterns + - Parse: name, type (review|team), description, tools, model + - **Name validation (hard gate)**: must match `^[a-z][a-z0-9-]*$`; emit exact error + kebab-case suggestion; exit immediately, no file written + - **Type detection**: infer from keywords if absent ("review/audit/check/validate/detect/scan" → review; "engineer/architect/manager/writer/planner/designer" → team); ask if ambiguous + - **Defaults**: review → `tools: Read, Grep, Glob`, `model: haiku`; team → `model: sonnet` +**REFACTOR**: Verify edge cases: empty string, all-digit, leading hyphen. +**Files**: `plugins/agentic-dev-team/skills/agent-create/SKILL.md`, eval fixtures +**Commit**: `feat: scaffold agent-create skill with input parsing and name validation` + +--- + +### Step 2: Add tool-selection prompt + +**Complexity**: standard +**RED**: Create `evals/fixtures/aca-no-tools/README.md` — no tools provided. Expected: exact two-line prompt emitted before any generation. Failure: any other prompt text or proceeding without prompting. 
+**GREEN**: Add to SKILL.md: if tools not provided, emit (pinned verbatim): +``` +Which tools does this agent need? + Read, Grep, Glob (read-only) | add Edit, Write (file changes) | add Bash (shell) | add Skill (skill invocation) | add Agent (spawn subagents) +``` +Wait for response. If tools provided: warn on unrecognised names (not error). +**REFACTOR**: None. +**Files**: `plugins/agentic-dev-team/skills/agent-create/SKILL.md`, `evals/fixtures/aca-no-tools/README.md` +**Commit**: `feat: add tool-selection prompt to agent-create skill` + +--- + +### Step 3: Add conflict detection, scope-overlap check, and frontmatter generation + +**Complexity**: standard +**RED**: Create eval fixtures: + - `evals/fixtures/aca-existing-file/README.md` — file exists. Expected: exact format "…already exists (description: …)" + "Overwrite? (yes/no)"; no write without "yes". + - `evals/fixtures/aca-scope-overlap/README.md` — overlapping descriptions. Expected exact format: "Possible overlap with : . Continue anyway? (yes/no)" where `` is the registry name of the overlapping agent. +**GREEN**: Add to SKILL.md: + - **Conflict**: glob `agents/.md`; if exists, read description, emit exact format, require "yes"; on "no" stop with no changes + - **Scope overlap** (review): compare new description against existing `description` + first 20 lines of `## Detect`; report if ≥ 60% topical overlap; advisory only + - **Scope overlap** (team): compare descriptions only + - **Frontmatter generation**: emit only official fields; apply defaults; omit fields with no value +**REFACTOR**: None. +**Files**: `plugins/agentic-dev-team/skills/agent-create/SKILL.md`, eval fixtures +**Commit**: `feat: add conflict detection, scope-overlap, and frontmatter generation` + +--- + +### Step 4: Add plugin-unsupported field handling + +**Complexity**: standard +**RED**: Create `evals/fixtures/aca-plugin-unsupported-field/README.md` — user requests `hooks`. 
Expected: exact warning string; field absent on decline; field present on confirm. +**GREEN**: Add to SKILL.md: if `hooks`/`mcpServers`/`permissionMode` requested, emit pinned warning: +``` +hooks/mcpServers/permissionMode are silently ignored for plugin agents — move the file to .claude/agents/ if you need them to take effect +``` +Include field only on confirm; omit on decline. +**REFACTOR**: None. +**Files**: `plugins/agentic-dev-team/skills/agent-create/SKILL.md`, eval fixture +**Commit**: `feat: add plugin-unsupported field handling` + +--- + +### Step 5: Add review agent body generation with token-efficiency rules + +**Complexity**: complex +**RED**: Create eval fixtures: + - `evals/fixtures/aca-review-body-valid/README.md` — description "Detects unused import statements". Expected: body ≤ 40 lines; Output JSON block, `## Skip`, `## Detect`, `## Ignore` present; no "You are a…" line; no placeholder text. + - `evals/fixtures/aca-review-body-over-budget/README.md` — spec requiring 45 lines. Expected: trim message shown; items listed with `- ` prefix; exact prompt "Approve this trim? (yes/no)" before any write. + - `evals/fixtures/aca-review-preamble-rejected/README.md` — describes a generation that opens "You are an expert reviewer". Expected: pattern must not appear in output. +**GREEN**: Add review body generation to SKILL.md: + - Required order: `# Title`, Output JSON block (exact schema), Status/Severity/Confidence one-liners, `## Skip`, `## Detect`, `## Ignore` + - Anti-patterns: no `^You are an?` opener; title ≠ description verbatim; detect rules ≤ 2 lines; knowledge ref = 1 line; Skip = 1–3 bullets; Ignore = 1 sentence + - **Budget gate**: if > 40 lines, emit exact string "Body is N lines — X lines over the 40-line budget for review agents"; emit list of removed/collapsed items prefixed with `- `; emit exact prompt "Approve this trim? 
(yes/no)"; if "yes": trim and continue to write gate; if "no": emit "Options: (a) reduce spec scope and regenerate, (b) accept N lines and proceed without trimming" and wait; on (a): return to generation; on (b): proceed to write gate with untrimmed body + - Trimmable content (in priority order): blank separator lines between sections, wordy multi-line bullets collapsed to one line; **protected** (never trim): Output JSON block, section headings (`## Skip`, `## Detect`, `## Ignore`), the single required bullet under each section + - Present full draft before write +**REFACTOR**: Verify trim never removes Output JSON block, section headers, or required sections. +**Files**: `plugins/agentic-dev-team/skills/agent-create/SKILL.md`, eval fixtures +**Commit**: `feat: add review agent body generation with token-efficiency rules` + +--- + +### Step 6: Add team agent body generation with token-efficiency rules + +**Complexity**: standard +**RED**: Create `evals/fixtures/aca-team-body-valid/README.md` — description "Plans database schema migrations". Expected: body ≤ 75 lines; `## Responsibilities` present; no Output JSON block, no `## Skip`, `## Detect`, `## Ignore`; no "You are a…". +**GREEN**: Add team body generation to SKILL.md: + - Required: `# Title`, `## Responsibilities`; optional: `## Output Discipline`, `## Skills`, `## Process` + - Same anti-pattern rules; same trim-with-diff gate at 75 lines + - Responsibilities: ≤ 2 lines each, action-oriented; Skills section: name + 1-line invocation context +**REFACTOR**: None. 
+**Files**: `plugins/agentic-dev-team/skills/agent-create/SKILL.md`, eval fixture +**Commit**: `feat: add team agent body generation with token-efficiency rules` + +--- + +### Step 7: Add /agent-audit validation gate with recovery flow + +**Complexity**: standard +*(Update the GREEN section with the confirmed invocation from Step 0 before implementing)* +**RED**: Create `evals/fixtures/aca-validation-failure/README.md` — generated agent missing `description`. Expected: raw /agent-audit output emitted verbatim; then "All your inputs are preserved."; then exact menu "(a) auto-correct and re-validate (b) cancel" offered; no file written. +**GREEN**: Add to SKILL.md: + - Run `/agent-audit ` (or Step 0's confirmed fallback) against generated content + - On errors: emit raw `/agent-audit` output verbatim; emit "All your inputs are preserved."; emit exact menu "(a) auto-correct and re-validate (b) cancel"; on cancel no changes; on auto-correct: fix issues, re-run validation, then: + - If second attempt passes: proceed to write gate + - If second attempt fails: emit new `/agent-audit` output verbatim; emit "All your inputs are preserved."; emit the same menu again (no silent stop) + - On pass: present full file to user; write on confirmation +**REFACTOR**: None. +**Files**: `plugins/agentic-dev-team/skills/agent-create/SKILL.md`, eval fixture +**Commit**: `feat: add /agent-audit validation gate with recovery flow` + +--- + +### Step 8: Add registry and CLAUDE.md update procedures + +**Complexity**: standard +**RED**: Create `evals/fixtures/aca-registry-update/README.md` — review agent and team agent each created. Expected per type: correct registry table updated, correct CLAUDE.md table updated, other table unchanged. +**GREEN**: Add to SKILL.md: + - Model → tier: haiku → small, sonnet → mid, opus → frontier + - Before appending: locate the target table in each file by searching for a heading that contains "Review Agents" or "Team Agents" (case-insensitive). 
If the heading is not found, emit an error: "Cannot update : heading containing ' Agents' not found. Update manually." and stop without modifying the file. + - Review agents: append row to `knowledge/agent-registry.md` Review Agents table; append row to `CLAUDE.md` Review Agents table. Row format: `| | | | |` + - Team agents: append row to `knowledge/agent-registry.md` Team Agents table; append row to `CLAUDE.md` Team Agents table. Same row format. + - Append-only; never edit existing rows; confirm both updates in completion report +**REFACTOR**: None. +**Files**: `plugins/agentic-dev-team/skills/agent-create/SKILL.md`, eval fixture +**Commit**: `feat: add registry and CLAUDE.md update procedures` + +--- + +### Step 9: Update agent-add command to delegate to agent-create skill + +**Complexity**: standard +**RED**: Confirm `commands/agent-add.md` has hardcoded inline template scoped to review agents only. Invoking it with a team-agent description currently produces review-agent output — this is the failure. +**GREEN**: Rewrite `commands/agent-add.md`: + - Delegate all implementation to `skills/agent-create/SKILL.md` + - Update description to cover both agent types + - Add `--type review|team` to argument-hint + - Remove hardcoded inline template (now in the skill) + - Preserve `--name`, `--tier`, `--context`, `--lang`, `--dry` as pass-throughs +**REFACTOR**: None. +**Files**: `plugins/agentic-dev-team/commands/agent-add.md` +**Commit**: `refactor(agent-add): delegate to agent-create skill, extend to team agents` + +--- + +### Step 10: Register skill in Skills Registry + +**Complexity**: trivial +**RED**: `/agent-audit` flags missing Skills Registry entry. +**GREEN**: Append to `knowledge/agent-registry.md` **Skills Registry** table (not Slash Commands): + `| Agent Create | skills/agent-create/SKILL.md | ~TBD | Orchestrator, Software Engineer, all team agents |` + Update `~TBD` after Steps 1–9 complete. 
+ Do NOT add to Slash Commands table — no command file exists; skill is invoked via agent-add or natural language. +**REFACTOR**: None. +**Files**: `plugins/agentic-dev-team/knowledge/agent-registry.md` +**Commit**: `docs: register agent-create skill in Skills Registry` + +--- + +## Complexity Classification + +| Rating | Criteria | Review depth | +|--------|----------|--------------| +| `trivial` | Single-file change, config, docs-only | Skip inline review | +| `standard` | New behavior within existing patterns | Spec-compliance + quality agents | +| `complex` | Cross-cutting concern, new abstraction | Full agent suite | + +## Pre-PR Quality Gate + +- [ ] Step 0 spike result documented; Step 7 GREEN updated with confirmed invocation +- [ ] All eval fixtures pass manual validation +- [ ] `/agent-audit` passes on all new and modified files +- [ ] `/code-review` passes on all files +- [ ] agent-create token count estimated and added to registry entry +- [ ] `/agent-add` tested end-to-end for review and team agent creation + +## Risks & Open Questions + +- **agent-audit interface** (resolved — Outcome A): `/agent-audit` accepts a single file path argument directly ("A specific file path: audit that file only"). Step 7 uses `/agent-audit plugins/agentic-dev-team/agents/.md`. No staging path needed. +- **Scope-overlap reliability**: 60% overlap threshold is LLM judgment; false positives are advisory-only (user can continue), so this is a UX inconvenience, not a blocker. +- **Trim correctness**: Trim must never remove required structural sections. Trimmable = blank separators, wordy bullets. Protected = Output JSON block, section headers (`## Skip`, `## Detect`, `## Ignore`, `## Responsibilities`). If budget cannot be met without removing protected content, surface to user. +- **Type inference**: Keyword inference is a suggestion with correction opportunity; never applied silently. 
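Step 1's name gate and keyword-based type detection are mechanical enough to sketch in TypeScript, the language of the eval fixtures. This is a minimal sketch under stated assumptions, not the skill's implementation: `isValidName`, `suggestName`, `mentionsAny`, and `inferType` are hypothetical names; a word counts as a keyword hit when it starts with the keyword (so `detects` matches `detect`); inference succeeds only when exactly one keyword list matches; and the suggestion splits camelCase before lowercasing, an assumption needed to turn `CodeQuality` into the `code-quality` suggestion the `aca-invalid-name` fixture expects.

```typescript
// Hypothetical sketch of the agent-create Step 1 checks; names are illustrative.
const NAME_PATTERN = /^[a-z][a-z0-9-]*$/;

type AgentType = "review" | "team";

const REVIEW_KEYWORDS = ["review", "audit", "check", "validate", "detect", "scan", "lint"];
const TEAM_KEYWORDS = ["engineer", "architect", "manager", "writer", "planner", "designer", "specialist"];

// Hard gate: the whole name must match the pattern.
function isValidName(candidate: string): boolean {
  return NAME_PATTERN.test(candidate);
}

// Returns a kebab-case suggestion, or undefined when no valid suggestion exists.
function suggestName(raw: string): string | undefined {
  let s = raw
    .replace(/([a-z0-9])([A-Z])/g, "$1-$2") // assumption: split camelCase, CodeQuality -> Code-Quality
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of other characters into one hyphen
    .replace(/^-+|-+$/g, ""); // strip leading and trailing hyphens
  if (/^[0-9]/.test(s)) {
    s = s.replace(/^[0-9-]+/, ""); // drop leading digits and adjacent hyphens
  }
  return s !== "" && isValidName(s) ? s : undefined;
}

// True when any word starts with one of the keywords ("detects" matches "detect").
function mentionsAny(words: string[], keywords: string[]): boolean {
  return words.some((w) => keywords.some((k) => w.indexOf(k) === 0));
}

// Infers the agent type; undefined means "ask the user" (neither or both lists matched).
function inferType(description: string): AgentType | undefined {
  const words = description.toLowerCase().split(/[^a-z]+/);
  const review = mentionsAny(words, REVIEW_KEYWORDS);
  const team = mentionsAny(words, TEAM_KEYWORDS);
  if (review === team) {
    return undefined;
  }
  return review ? "review" : "team";
}
```

Applying the skill's digit-stripping rule, `3d-review` is rejected with the suggestion `d-review`, while `my--review` passes the regex unchanged, as the Step 1 fixture notes expect. An all-digit name such as `123` reduces to an empty string, so no suggestion is offered.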
+ +## Plan Review Summary (First Pass) + +Four reviewers ran; all returned `needs-revision` on the first draft. Key changes in this revision: + +| Reviewer | Top Blocker | Resolution | +|----------|-------------|-----------| +| Acceptance Test Critic | agent-eval AC unverifiable; body undefined; exact strings missing | Use `/agent-audit` instead; define body; pin all exact strings; split registry AC | +| Design Critic | Overlap with agent-skill-authoring; command file contradiction | Relationship clarified (reference vs procedure); no command file, Skills Registry only | +| UX Critic | Trim invisible before write gate; validation failure loses inputs | Trim shows diff before gate; validation failure preserves inputs + offers recovery | +| Strategic Critic | Same agent-skill-authoring overlap; scope boundary unstated | Relationship stated; out-of-scope (skills/commands) explicitly named | + +## Plan Review Summary (Final) + +Four reviewers ran across four revision cycles. All four passed on the final revision. + +| Reviewer | Verdict | Resolution across passes | +|----------|---------|--------------------------| +| Acceptance Test Critic | **approve** | 7 blockers (pass 1) → 4 blockers (pass 2) → 1 blocker + 3 step issues (pass 3) → approved (pass 4). Key resolutions: /agent-audit replaces agent-eval; body definition pinned; all exact strings defined; trim diff format; scope-overlap message pinned; team boundary scenarios added. | +| Design & Architecture Critic | **approve** (pass 2) | agent-skill-authoring relationship clarified (reference vs procedure); command file contradiction resolved (Skills Registry only). | +| UX Critic | **approve** (pass 2) | Trim visibility: diff shown before write gate; validation failure: inputs preserved + exact recovery menu. | +| Strategic Critic | **approve** (pass 1) | agent-skill-authoring overlap, scope boundary, and incremental delivery noted; addressed in plan goal section.
| + +**One remaining warning**: Step 6 has no over-budget team-body eval fixture. The shared trim logic is covered by Step 5's fixture and both AC + scenarios are present; this is a gap for independent team-path eval coverage, not a correctness issue. diff --git a/plugins/agentic-dev-team/agents/claude-setup-review.md b/plugins/agentic-dev-team/agents/claude-setup-review.md index 5aa6a69..312fd5a 100644 --- a/plugins/agentic-dev-team/agents/claude-setup-review.md +++ b/plugins/agentic-dev-team/agents/claude-setup-review.md @@ -1,6 +1,6 @@ --- name: claude-setup-review -description: CLAUDE.md completeness, rules, skills, and path accuracy +description: CLAUDE.md completeness, rules, skills, path accuracy, and agent frontmatter schema compliance tools: Read, Grep, Glob model: haiku --- @@ -14,22 +14,17 @@ Output JSON: ``` Status: pass=complete config, warn=gaps, fail=critical missing -Severity: error=blocks AI effectiveness, warning=reduces quality, suggestion=enhancement -Confidence: high=mechanical fix (add missing section, fix broken path); medium=content exists but needs restructuring; none=requires human judgment (project-specific conventions) - -Model tier: small -Context needs: project-structure +Severity: error=blocks AI effectiveness or violates required schema; warning=reduces quality or uses unsupported field; suggestion=enhancement or style improvement +Confidence: high=mechanical fix (add missing field, fix invalid value); medium=content exists but needs restructuring; none=requires human judgment ## Skip Return `{"status": "skip", "issues": [], "summary": "Not a Claude Code project"}` when: -- No CLAUDE.md, `.claude/` directory, or `.clinerules` file exists +- No CLAUDE.md, `.claude/` directory, agent files, or `.clinerules` file exists - Target is clearly not a Claude Code-enabled project -## Detect - -CLAUDE.md: +## Detect — CLAUDE.md - Missing or malformed - No project overview @@ -56,6 +51,46 @@ Accuracy: - Documented structure doesn't match actual 
project - Commands don't work +## Detect — Agent frontmatter schema + +Apply to every `.md` file found in `agents/` directories within the target. Check against the official Claude Code sub-agent specification. + +### Required fields + +- **`name`** — must be present; must match `^[a-z][a-z0-9-]*$` (lowercase letters, digits, hyphens only, starting with a letter). Flag missing `name` as error. Flag names containing uppercase, spaces, or special characters as error. +- **`description`** — must be present and non-empty. Flag missing or empty `description` as error. + +### Optional fields with constrained values + +Check these only when present: + +- **`model`** — must be one of: `sonnet`, `opus`, `haiku`, `inherit`, or a recognized full Claude model ID (the whole value must match `^claude-[a-z]+-[0-9]+(-[0-9]+)*$`). Flag any other value as error. +- **`memory`** — must be one of: `user`, `project`, `local`. Flag any other value as error. +- **`background`** — must be `true` or `false`. Flag any other value as error. +- **`effort`** — must be one of: `low`, `medium`, `high`, `xhigh`, `max`. Flag any other value as error. +- **`isolation`** — must be `worktree`. Flag any other value as error. +- **`color`** — must be one of: `red`, `blue`, `green`, `yellow`, `purple`, `orange`, `pink`, `cyan`. Flag any other value as error. +- **`maxTurns`** — must be a positive integer. Flag non-integer, zero, or negative values as error. + +### Plugin-unsupported fields + +Flag as **warning** (not error) when present in an agent file that ships as part of a plugin (i.e., lives in a plugin's `agents/` directory): + +- **`hooks`** — silently ignored for plugin agents; has no effect +- **`mcpServers`** — silently ignored for plugin agents; has no effect +- **`permissionMode`** — silently ignored for plugin agents; has no effect + +Suggested fix for each: "This field is ignored for plugin agents. Move the agent file to `.claude/agents/` or `~/.claude/agents/` if you need this field to take effect."
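The required-field, constrained-value, and plugin-unsupported rules above reduce to mechanical checks. Below is a minimal TypeScript sketch, assuming the YAML frontmatter has already been parsed into a plain object (no parser is shown) and using the hypothetical names `checkFrontmatter` and `FrontmatterIssue`. It anchors the model-ID pattern so the whole value must match, and it leaves out the `tools` guidance and unknown-field suggestions.

```typescript
// Hypothetical sketch of the claude-setup-review frontmatter checks.
// Assumes the YAML frontmatter was already parsed into a plain object.
type Severity = "error" | "warning" | "suggestion";

interface FrontmatterIssue {
  severity: Severity;
  field: string;
  message: string;
}

const AGENT_NAME = /^[a-z][a-z0-9-]*$/;
const MODEL_ALIASES = ["sonnet", "opus", "haiku", "inherit"];
const MODEL_ID = /^claude-[a-z]+-[0-9]+(-[0-9]+)*$/; // anchored: the whole value must match
const ENUM_FIELDS: { [field: string]: string[] } = {
  memory: ["user", "project", "local"],
  effort: ["low", "medium", "high", "xhigh", "max"],
  isolation: ["worktree"],
  color: ["red", "blue", "green", "yellow", "purple", "orange", "pink", "cyan"],
};
const PLUGIN_IGNORED = ["hooks", "mcpServers", "permissionMode"];

function checkFrontmatter(fm: { [key: string]: unknown }, isPluginAgent: boolean): FrontmatterIssue[] {
  const issues: FrontmatterIssue[] = [];
  const flag = (field: string, message: string): void => {
    issues.push({ severity: "error", field, message });
  };

  // Required fields.
  const agentName = fm["name"];
  if (typeof agentName !== "string" || !AGENT_NAME.test(agentName)) {
    flag("name", "missing or not kebab-case");
  }
  const description = fm["description"];
  if (typeof description !== "string" || description.trim() === "") {
    flag("description", "missing or empty");
  }

  // Optional fields with constrained values, checked only when present.
  const model = fm["model"];
  if (model !== undefined) {
    const m = String(model);
    if (MODEL_ALIASES.indexOf(m) === -1 && !MODEL_ID.test(m)) {
      flag("model", "unrecognized model: " + m);
    }
  }
  for (const field of Object.keys(ENUM_FIELDS)) {
    const value = fm[field];
    if (value !== undefined && ENUM_FIELDS[field].indexOf(String(value)) === -1) {
      flag(field, "must be one of: " + ENUM_FIELDS[field].join(", "));
    }
  }
  const background = fm["background"];
  if (background !== undefined && typeof background !== "boolean") {
    flag("background", "must be true or false");
  }
  const maxTurns = fm["maxTurns"];
  if (maxTurns !== undefined && !(typeof maxTurns === "number" && maxTurns > 0 && maxTurns % 1 === 0)) {
    flag("maxTurns", "must be a positive integer");
  }

  // Plugin-unsupported fields are warnings, not errors.
  if (isPluginAgent) {
    for (const field of PLUGIN_IGNORED) {
      if (field in fm) {
        issues.push({ severity: "warning", field, message: "ignored for plugin agents" });
      }
    }
  }
  return issues;
}
```

Anchoring is the main design choice here: an unanchored pattern would accept any `model` value that merely contains a model-ID-shaped substring, such as `my-claude-sonnet-4-alias`.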
+ +### `tools` field guidance + +- Flag as **suggestion** if `Skill` appears in `tools` AND the agent appears to use skills for context-loading rather than runtime invocation: "Consider using the `skills` frontmatter field to preload skill content at startup instead of listing `Skill` in tools. Use `Skill` in tools when the agent needs to invoke skills dynamically at runtime." +- Do not flag `Skill` in tools as an error — it is a valid tool name for runtime skill invocation. + +### Unknown frontmatter fields + +Flag as **suggestion** any top-level frontmatter key that is not in the official field list (`name`, `description`, `tools`, `disallowedTools`, `model`, `permissionMode`, `maxTurns`, `skills`, `mcpServers`, `hooks`, `memory`, `background`, `effort`, `isolation`, `color`, `initialPrompt`). These fields are ignored by Claude Code and may indicate a typo or a skill field accidentally placed in an agent file. + ## Ignore -Code quality, tests, domain modeling (handled by other agents) +Code quality, tests, domain modeling (handled by other agents). Agent body content (system prompt quality) is out of scope for this agent. diff --git a/plugins/agentic-dev-team/commands/agent-add.md b/plugins/agentic-dev-team/commands/agent-add.md index 7cd44ec..b05e4a5 100644 --- a/plugins/agentic-dev-team/commands/agent-add.md +++ b/plugins/agentic-dev-team/commands/agent-add.md @@ -1,189 +1,33 @@ --- name: agent-add description: >- - Scaffold a new review agent from a description or URL. Use this whenever - the user wants to add a new review agent, detect a new category of code - issue, or says things like "add an agent for X", "create a reviewer for Y", - "I want to check for Z in code reviews". Also use when given a URL to a - coding standard or best-practices guide that should become a review agent. + Create a new Claude Code agent file (review or team type) following the + official sub-agent schema and token-efficiency budgets. 
Use when the user + wants to add a new review agent, detect a new category of code issue, create + a team agent persona, or says things like "add an agent for X", "create a + reviewer for Y", "new team agent for Z". Also use when given a URL to a + coding standard that should become a review agent. argument-hint: >- - [--name ] - [--tier small|mid|frontier] - [--context diff-only|full-file|project-structure] + [--name ] [--type review|team] + [--tier small|mid|frontier] [--context diff-only|full-file|project-structure] [--lang ] [--dry] user-invocable: true allowed-tools: Read, Write, Edit, Grep, Glob, WebFetch, Skill(agent-audit *) --- -# Agent Add +Apply the guidelines defined in skills/agent-create/SKILL.md to the current +task. Read the skill file and follow its steps exactly. -Role: implementation. This skill scaffolds new review agent files — it -generates compliant agent definitions from a description or reference URL. +If `$ARGUMENTS` starts with `http://` or `https://`, fetch the URL with +WebFetch first and extract the relevant guidance, then use that content as +the agent description. -You have been invoked with the `/agent-add` skill. Generate a new review -agent `.md` file that passes eval compliance checks, then update all -required documentation. +Pass these flags through to the skill as context: +- `--name ` → set agent name (skips name prompt) +- `--type review|team` → set agent type (skips type prompt) +- `--tier small|mid|frontier` → maps to model: small→haiku, mid→sonnet, frontier→opus +- `--context diff-only|full-file|project-structure` → sets `Context needs:` field +- `--lang ` → adds language scope declaration to the body +- `--dry` → show generated content without writing to disk or updating registry -## Implementation constraints - -1. 
**Follow the agent template exactly.** Every generated agent must - have: frontmatter, Output JSON block, Status/Severity lines, Model - tier, Context needs, `## Skip`, `## Detect`, `## Ignore` — in that - order. -2. **Do not invent detection rules.** Derive rules from the user's - description or URL content. If the description is vague, ask before - guessing. -3. **Respect scope boundaries.** Check existing agents for overlap - before generating. Warn the user if the new agent's scope conflicts - with an existing one. -4. **Always update documentation.** Steps 7–9 are mandatory, not - optional. A new agent that is not registered and documented does not - exist as far as the rest of the system is concerned. -5. **Be concise.** Detection rules should be short phrases, not - paragraphs. Skip/Ignore sections should be one-liners where possible. - -## Parse Arguments - -Arguments: $ARGUMENTS - -Required: description or URL (`$0`) — either a text description of what -the agent should review, or a URL to fetch guidance from. - -Optional: - -- `--name `: Agent name in kebab-case ending in `-review` - (derived from description if omitted) -- `--tier small|mid|frontier`: Model tier (default: `small`) -- `--context diff-only|full-file|project-structure`: Context needs - (default: `diff-only`) -- `--lang `: Comma-separated file extensions for - language-specific scope (e.g., `js,ts,jsx,tsx`) -- `--dry`: Preview the generated content without writing to disk - -## Steps - -### 1. Parse input - -- If `$0` starts with `http://` or `https://`, fetch it with WebFetch - and extract the review focus from the page content. -- Otherwise, treat `$0` as a text description of the agent's purpose. - -### 2. Derive agent name - -If `--name` was not provided: - -- Extract key concept from the description (e.g., "React hook - violations" → `react-hook`) -- Append `-review` if not already present -- Convert to kebab-case - -### 3. 
Check for scope overlap - -Read all files in `.claude/agents/*.md` that declare `Model tier:`. -For each existing agent: - -- Compare the `## Detect` section topics against the new agent's - intended scope -- If overlap is found, warn the user: - `⚠ Possible overlap with : ` -- Continue unless the user cancels - -### 4. Generate agent file - -Build the agent `.md` using this exact template: - -```markdown ---- -name: -description: -tools: Read, Grep, Glob -model: ---- - -# - -<If --lang provided> -Scope: <Language> files only (<extensions>). -Skip this agent entirely if the project has no <language> files. -</If> - -Output JSON: -\```json -{"status": "pass|warn|fail|skip", "issues": [...], "summary": ""} -\``` - -Status: pass=<no issues>, warn=<minor concerns>, fail=<critical issues> -Severity: error=<must fix>, warning=<should fix>, suggestion=<consider> - -Model tier: <tier> -Context needs: <context> - -## Skip - -Return `{"status": "skip", "issues": [], "summary": "<reason>"}` when: -- <inapplicability condition 1> -- <inapplicability condition 2> - -## Detect - -<Category 1>: -- <specific pattern to flag> -- <specific pattern to flag> - -## Ignore - -<What other agents handle> (handled by other agents) -``` - -Map `--tier` to frontmatter `model:`: small→haiku, mid→sonnet, -frontier→opus. - -### 5. Write or preview - -- If `--dry` was passed, display the generated content and stop. -- Otherwise, write to `.claude/agents/<name>.md`. - -### 6. Run eval audit - -Run `/agent-audit .claude/agents/<name>.md --fix` to validate compliance. -If any checks fail after auto-fix, report the remaining issues. - -### 7. Update CLAUDE.md - -Add a row to the Review Agents table in `.claude/CLAUDE.md`, inserted -alphabetically by agent name: - -```text -| <name> | `agents/<name>.md` | <tier> | <short focus description> | -``` - -Also add an entry to the Orchestrator Model Routing Table if the new -agent's tier or name warrants a distinct row. - -### 8. 
Update docs/agent_info.md - -Add a row to the Review Agents table in `docs/agent_info.md`, inserted -alphabetically by agent name: - -```text -| `<name>` | [`<name>.md`](../.claude/agents/<name>.md) | <model> | <short focus description> | -``` - -### 9. Update docs/team-structure.md - -Add the new review agent to the Review Agent Dispatch Mermaid diagram -under the appropriate trigger condition. If no existing trigger matches, -add a new edge from `CO[Orchestrator]` to the new agent node. - -### 10. Report - -```text -Agent created: .claude/agents/<name>.md -Model tier: <tier> -Context needs: <context> -Eval audit: PASS|WARN (details) -Documentation updated: - - .claude/CLAUDE.md (Review Agents table) - - docs/agent_info.md (Review Agents table) - - docs/team-structure.md (dispatch diagram) -``` +Apply this skill to: $ARGUMENTS diff --git a/plugins/agentic-dev-team/knowledge/agent-registry.md b/plugins/agentic-dev-team/knowledge/agent-registry.md index cc82747..de230cd 100644 --- a/plugins/agentic-dev-team/knowledge/agent-registry.md +++ b/plugins/agentic-dev-team/knowledge/agent-registry.md @@ -83,6 +83,7 @@ Skills are reusable knowledge modules in `.claude/skills/` that agents reference | Docker Image Audit | `skills/docker-image-audit/SKILL.md` | 750 | Orchestrator (inline review), Platform Engineer, Security Engineer | | Performance Benchmark | `skills/performance-benchmark/SKILL.md` | 800 | QA Engineer, Platform Engineer, `/benchmark` command | | Semantic Duplication Scan | `skills/semantic-duplication-scan/SKILL.md` | ~4,500 | Orchestrator, Software Engineer, Architect | +| Agent Create | `skills/agent-create/SKILL.md` | ~2,100 | Orchestrator, Software Engineer, all team agents | ## Subagent Prompt Templates diff --git a/plugins/agentic-dev-team/skills/agent-create/SKILL.md b/plugins/agentic-dev-team/skills/agent-create/SKILL.md new file mode 100644 index 0000000..c4a8946 --- /dev/null +++ b/plugins/agentic-dev-team/skills/agent-create/SKILL.md @@ -0,0 +1,346 
@@ +--- +name: agent-create +description: >- + Create new Claude Code sub-agent files following the official schema and + token-efficiency budgets. Handles both review agents (JSON output, read-only + tools, ≤ 40-line body) and team agents (prose output, action tools, ≤ 75-line + body). Use when the user says "add an agent", "create a reviewer for X", + "new team agent for Y", or when /agent-add is invoked. Validates against + /agent-audit before writing. Updates the agent registry and CLAUDE.md after + success. +role: worker +user-invocable: true +--- + +# Agent Create + +Automates production of Claude Code sub-agent files that pass schema validation +and stay within token-efficiency budgets. For conventions, anti-patterns, and +registration checklists, see `skills/agent-skill-authoring/SKILL.md`. + +## Constraints + +- Do not write any file until validation passes and the user confirms the draft +- Name validation is a hard gate — exit immediately if the name is invalid +- Never include `hooks`, `mcpServers`, or `permissionMode` without explicit user + confirmation after the plugin warning +- Body line budgets are hard limits enforced at generation time; proposed trims are + shown to the user before any file is written +- Registry and CLAUDE.md updates are append-only; never edit existing rows + +--- + +## Step 1 — Parse Arguments + +Accept these inputs (from arguments or interactive prompts): + +| Input | Required | Notes | +|-------|----------|-------| +| `name` | yes | file stem of the new agent | +| `type` | yes | `review` or `team`; inferred in Step 3 if omitted | +| `description` | yes | one-line summary for frontmatter | +| `tools` | no | comma-separated tool list | +| `model` | no | haiku \| sonnet \| opus \| inherit | +| `--tier small\|mid\|frontier` | no | maps to model: small→haiku, mid→sonnet, frontier→opus | +| `--context diff-only\|full-file\|project-structure` | no | sets `Context needs:` field in review body | +| `--lang <exts>` | no | adds language scope line to review body (e.g.
`Scope: .ts, .tsx files only`) | +| `--dry` | no | display generated content without writing file or updating registry | + +If `--tier` was provided, map to model: `small` → `haiku`, `mid` → `sonnet`, `frontier` → `opus`. This overrides any explicit `model` argument. + +--- + +## Step 2 — Validate Name (hard gate) + +The name must match `^[a-z][a-z0-9-]*$` exactly. + +If it does not: +1. Emit: `Name must match ^[a-z][a-z0-9-]*$ — use lowercase letters, digits, and hyphens only` +2. Compute a kebab-case correction: + - Insert a hyphen wherever a lowercase letter or digit is followed by an uppercase letter (`CodeQuality` → `Code-Quality`) + - Lowercase all characters + - Replace runs of non-alphanumeric characters with a single hyphen + - Strip leading/trailing hyphens + - If result starts with a digit: strip leading digits and any adjacent hyphens from the front; if the result is then valid, use it; if empty or still invalid, skip the suggestion +3. If a valid correction exists, emit: `Did you mean: <corrected-name>?` +4. **Stop immediately. Do not write any file.** + +--- + +## Step 3 — Detect Agent Type + +If `type` was not provided: + +- Scan `description` for keywords: + - `review`, `audit`, `check`, `validate`, `detect`, `scan`, `lint` → infer `review` + - `engineer`, `architect`, `manager`, `writer`, `planner`, `designer`, `specialist` → infer `team` +- If keywords from exactly one list match, state the inferred type (the user can correct it) and continue +- If keywords from both lists match or none match, ask: `Agent type: review or team?` + +--- + +## Step 4 — Prompt for Missing Tools + +If `tools` was not provided, emit exactly: + +``` +Which tools does this agent need? + Read, Grep, Glob (read-only) | add Edit, Write (file changes) | add Bash (shell) | add Skill (skill invocation) | add Agent (spawn subagents) +``` + +Wait for the user's selection before continuing. 
+ +If tools were provided, validate each against known Claude Code tool names +(`Read`, `Grep`, `Glob`, `Bash`, `Edit`, `Write`, `Agent`, `Skill`, +`WebFetch`, `WebSearch`, `NotebookRead`, `NotebookEdit`).
Flag unknown names +as a warning (not an error — custom tools are allowed). + +--- + +## Step 5 — Apply Defaults + +| Setting | Review default | Team default | +|---------|---------------|-------------| +| `tools` | `Read, Grep, Glob` | none; use the Step 4 selection | +| `model` | `haiku` | `sonnet` | + +Only apply a default when the value was not specified by the user. + +--- + +## Step 6 — Check for Existing File + +Glob `plugins/agentic-dev-team/agents/<name>.md`. + +If the file exists: +1. Read its `description` frontmatter field +2. Emit: `plugins/agentic-dev-team/agents/<name>.md already exists (description: <existing-description>)` +3. Ask: `Overwrite? (yes/no)` +4. On `no`: emit `Cancelled. Existing agent: plugins/agentic-dev-team/agents/<name>.md — <existing-description>` and **stop with no changes** +5. On `yes`: continue + +--- + +## Step 7 — Check Scope Overlap + +For review agents, scan existing agents for topical overlap: + +1. Read `description` frontmatter of all files in `plugins/agentic-dev-team/agents/` +2. For each existing agent, also read the first 20 lines of its `## Detect` section if present +3. If the LLM judges ≥ 60% topical overlap between the new description and an existing agent's scope, emit: + + `Possible overlap with <agent-name>: <one-sentence description of shared concept>. Continue anyway? (yes/no)` + +4. On `no`: stop with no changes +5. On `yes`: continue +6. This check is advisory — the user can always continue + +For team agents: compare descriptions only (no `## Detect` scan). + +--- + +## Step 8 — Handle Plugin-Unsupported Fields + +If the user has requested `hooks`, `mcpServers`, or `permissionMode`, emit: + +``` +hooks/mcpServers/permissionMode are silently ignored for plugin agents — move the file to .claude/agents/ if you need them to take effect +``` + +Then ask: `Include anyway?
(yes/no)` + +- On `no`: omit the field from generated frontmatter +- On `yes`: include the field as requested + +Do not emit this warning for fields the user did not request. + +--- + +## Step 9 — Generate Frontmatter + +Emit only official fields with non-empty values. Use this structure: + +```yaml +--- +name: <name> +description: <description> +tools: <comma-separated tool list> +model: <model> +[any additional fields the user requested and confirmed] +--- +``` + +Do not include `hooks`, `mcpServers`, or `permissionMode` unless the user +confirmed their inclusion in Step 8. + +--- + +## Step 10 — Generate Body + +### Review Agent Body Structure (required order) + +If `--context` was provided, use it for the `Context needs:` field. Otherwise infer a sensible default from the description (simple detectors → `diff-only`; agents that need full file context → `full-file`; agents that need project structure → `project-structure`). + +If `--lang` was provided, insert a language scope line immediately after the title: `Scope: <exts> files only. Skip if no <exts> files are present.` + +```markdown +# <Title Case Name> + +[Scope: <exts> files only. Skip if no <exts> files are present.] 
+ +Output JSON: + +```json +{"status": "pass|warn|fail|skip", "issues": [{"severity": "error|warning|suggestion", "confidence": "high|medium|none", "file": "", "line": 0, "message": "", "suggestedFix": ""}], "summary": ""} +``` + +Status: pass=<condition>, warn=<condition>, fail=<condition> +Severity: error=<condition>, warning=<condition>, suggestion=<condition> +Confidence: high=<condition>, medium=<condition>, none=<condition> + +Model tier: <small|mid|frontier> +Context needs: <diff-only|full-file|project-structure> + +## Skip + +Return `{"status": "skip", ...}` when: +- <inapplicability condition> + +## Detect + +<Category>: +- <specific pattern to flag> + +## Ignore + +<what other agents handle> (handled by other agents) +``` + +### Team Agent Body Structure (required order) + +```markdown +# <Title Case Name> + +## Responsibilities + +- <action-oriented responsibility> + +[## Output Discipline] (optional) + +[## Skills] (optional — list skill name + one-line invocation context) + +[## Process] (optional) +``` + +### Token-Efficiency Rules (both types) + +Apply these rules when generating the body: + +1. **No opener**: no line may match `^You are an? ` (case-insensitive) +2. **No description restatement**: title must not contain the `description` field value verbatim (whitespace-normalized) +3. **No placeholder text**: body must not contain `your-agent-name`, `One-sentence description`, or `# Agent Name` +4. **Bullet length**: no single bullet point may span more than two lines +5. **Knowledge file reference**: one line only — `Read knowledge/X.md before starting` — no prose explanation +6. **Review Skip section**: 1–3 bullet conditions, no prose explanation +7. **Review Ignore section**: one sentence listing what other agents handle +8. **Skills section (team)**: skill name + one-line invocation context only + +### Line Budget Gate + +After generating the body, count all lines (including blank lines). + +**Review agents**: if line count > 40: +1. 
Emit: `Body is N lines — X lines over the 40-line budget for review agents` +2. List each removed/collapsed item, each prefixed with `- ` (dash space) +3. Emit: `Approve this trim? (yes/no)` +4. On `yes`: apply trim and continue +5. On `no`: emit `Options: (a) reduce spec scope and regenerate, (b) accept N lines and proceed without trimming` and wait + +**Team agents**: same gate with budget of 75 and label `team agents`. + +**Trimmable content** (in priority order): +- Blank separator lines between sections (but not between bullets) +- Multi-line bullets collapsed to one line +- Wordy bullet text shortened to the essential action + +**Protected content** (never trim): +- Output JSON block +- Section headings (`## Skip`, `## Detect`, `## Ignore`, `## Responsibilities`) +- The closing `---` of any required section + +--- + +## Step 11 — Run /agent-audit Validation Gate + +Write the generated content to disk, then invoke the agent-audit skill: +`Skill(agent-audit plugins/agentic-dev-team/agents/<name>.md)` + +**If the audit returns errors:** +1. Emit the raw `/agent-audit` output verbatim +2. Emit: `All your inputs are preserved.` +3. Emit: `(a) auto-correct and re-validate (b) cancel` +4. On `(b)`: delete the file, make no changes, stop +5. On `(a)`: apply the minimal corrections, re-run `/agent-audit` once more + - If the second run passes: continue to Step 12 + - If the second run also fails: emit new `/agent-audit` output verbatim; emit `All your inputs are preserved.`; emit `(a) auto-correct and re-validate (b) cancel` again (no silent stop) + +**If the audit passes:** continue to Step 12. + +--- + +## Step 12 — Present Draft and Confirm Write + +Display the complete generated file content to the user. + +**If `--dry` was passed**: display the content and stop. Do not write any file and do not update the registry or CLAUDE.md. + +Otherwise, ask: `Write this file to plugins/agentic-dev-team/agents/<name>.md? 
(yes/no)` + +On `no`: delete the file written in Step 11, make no other changes, stop. +On `yes`: the file is already on disk from Step 11; no re-write needed unless the user modified the draft. + +--- + +## Step 13 — Update Agent Registry + +Locate the table in `knowledge/agent-registry.md` whose heading contains +`Review Agents` (for review type) or `Team Agents` (for team type). + +If the heading is not found: emit +`Cannot update knowledge/agent-registry.md: heading containing '<type> Agents' not found. Update manually.` +and stop without modifying the file. + +Map model to tier label: `haiku` → `small`, `sonnet` → `mid`, `opus` → `frontier`, `inherit` → `mid`. + +Append a row to the correct table: + +``` +| <name> | agents/<name>.md | <tier-label> | <description> | +``` + +--- + +## Step 14 — Update CLAUDE.md + +Locate the table in `plugins/agentic-dev-team/CLAUDE.md` whose heading +contains `Review Agents` (review type) or `Team Agents` (team type). + +If the heading is not found: emit +`Cannot update CLAUDE.md: heading containing '<type> Agents' not found. Update manually.` +and stop without modifying the file. + +Append a row to the correct table. Confirm both updates in the completion +report. + +--- + +## Completion Report + +``` +Agent created: plugins/agentic-dev-team/agents/<name>.md +Type: <review|team> +Model: <model> (<tier-label>) +Body: <N> lines +Validation: PASS (/agent-audit) +Registry updated: knowledge/agent-registry.md (<type> Agents table) +CLAUDE.md updated: <type> Agents table +``` diff --git a/plugins/agentic-dev-team/templates/agents/agent-template.md b/plugins/agentic-dev-team/templates/agents/agent-template.md new file mode 100644 index 0000000..4365c14 --- /dev/null +++ b/plugins/agentic-dev-team/templates/agents/agent-template.md @@ -0,0 +1,143 @@ +--- +# REQUIRED — unique identifier +# Format: lowercase letters, digits, and hyphens only (e.g. 
code-reviewer, db-reader) +# Must match the filename convention; used by hooks as agent_type +name: your-agent-name + +# REQUIRED — tells Claude when to delegate to this agent +# Write a clear, specific description. Include "Use proactively" to encourage automatic delegation. +description: >- + One-sentence description of what this agent does and when to use it. + Add "Use proactively when X" to encourage automatic delegation. + +# OPTIONAL — tools this agent can use +# If omitted, the agent inherits ALL tools from the main conversation. +# Use this as an allowlist to restrict what the agent can do. +# +# Common tools: Read, Grep, Glob, Bash, Edit, Write, Agent, Skill +# Tool restrictions: Agent(worker, researcher) limits which subagents can be spawned +# Bash restrictions: Bash(npx *) limits Bash to matching commands only +# +# Note: To preload skill content at startup, use the `skills` field below +# instead of listing Skill here. Listing Skill here allows runtime invocation +# but does not inject skill content into context automatically. +tools: Read, Grep, Glob + +# OPTIONAL — tools to deny (removed from inherited or specified list) +# Use when you want to inherit most tools but block a few specific ones. +# disallowedTools: Write, Edit + +# OPTIONAL — model to use +# Aliases: sonnet | opus | haiku | inherit +# Full model ID: claude-opus-4-7 | claude-sonnet-4-6 | claude-haiku-4-5-20251001 +# inherit: use the same model as the main conversation (default if omitted) +# +# Routing guidance: +# haiku — high-volume, structured extraction, simple classification +# sonnet — balanced capability and speed; most agents +# opus — frontier reasoning, security analysis, architectural judgment +model: sonnet + +# OPTIONAL — permission mode +# WARNING: Ignored for plugin agents (silently has no effect). +# Only works for project agents (.claude/agents/) or user agents (~/.claude/agents/). 
+# Values: default | acceptEdits | auto | dontAsk | bypassPermissions | plan
+# permissionMode: default
+
+# OPTIONAL — maximum agentic turns before the agent stops
+# maxTurns: 10
+
+# OPTIONAL — skills to preload into this agent's context at startup
+# The full content of each skill is injected before the agent runs.
+# This is different from listing Skill in tools (which allows runtime invocation).
+# Use skill names as they appear in the skills registry.
+# skills:
+#   - api-conventions
+#   - error-handling-patterns
+
+# OPTIONAL — MCP servers available to this agent
+# WARNING: Ignored for plugin agents (silently has no effect).
+# Only works for project agents (.claude/agents/) or user agents (~/.claude/agents/).
+# mcpServers:
+#   - playwright:
+#       type: stdio
+#       command: npx
+#       args: ["-y", "@playwright/mcp@latest"]
+
+# OPTIONAL — lifecycle hooks scoped to this agent
+# WARNING: Ignored for plugin agents (silently has no effect).
+# Only works for project agents (.claude/agents/) or user agents (~/.claude/agents/).
+# hooks:
+#   PreToolUse:
+#     - matcher: "Bash"
+#       hooks:
+#         - type: command
+#           command: "./scripts/validate-command.sh"
+
+# OPTIONAL — persistent memory directory that survives across conversations
+# Values: user | project | local
+# user: ~/.claude/agent-memory/<name>/ (all projects)
+# project: .claude/agent-memory/<name>/ (this project, check in to VCS)
+# local: .claude/agent-memory-local/<name>/ (this project, do NOT check in)
+# memory: project
+
+# OPTIONAL — always run this agent as a background task (non-blocking)
+# Default: false (agent runs in foreground, blocking the main conversation)
+# background: false
+
+# OPTIONAL — effort level for this agent's model calls
+# Overrides the session effort level while this agent is active.
+# Values: low | medium | high | xhigh | max (available levels depend on model)
+# effort: medium
+
+# OPTIONAL — run agent in an isolated git worktree
+# Set to "worktree" to give the agent a temporary copy of the repository.
+# The worktree is cleaned up automatically if the agent makes no changes.
+# Values: worktree
+# isolation: worktree
+
+# OPTIONAL — display color in the task list and transcript
+# Values: red | blue | green | yellow | purple | orange | pink | cyan
+# color: blue
+
+# OPTIONAL — auto-submitted first turn when this agent runs as the main session
+# (via --agent flag or `agent` setting in settings.json)
+# Commands and skills in this prompt are processed.
+# initialPrompt: "Load the project context and summarize the current state."
+---
+
+# Agent Name
+
+One-sentence description of what this agent is for and what distinguishes it.
+
+## Responsibilities
+
+- Primary responsibility
+- Secondary responsibility
+- What it explicitly does NOT do (scope boundary)
+
+## Process
+
+Step-by-step description of how this agent approaches its task.
+
+1. First step
+2. Second step
+3. Third step
+
+## Output Format
+
+Describe the expected output format. For review agents:
+
+```json
+{"status": "pass|warn|fail|skip", "issues": [{"severity": "error|warning|suggestion", "confidence": "high|medium|none", "file": "", "line": 0, "message": "", "suggestedFix": ""}], "summary": ""}
+```
+
+## Skip
+
+Return skip status when:
+
+- Condition that makes this agent inapplicable
+
+## Ignore
+
+What this agent explicitly does not check (handled by other agents or out of scope).

From 7f93ef34834d597710c8a3a245ddefaea234f73c Mon Sep 17 00:00:00 2001
From: Bryan Finster <bryan.finster@gmail.com>
Date: Thu, 14 May 2026 11:50:30 -0500
Subject: [PATCH 3/3] fix(agent-create): move --dry check before file write; fix CLAUDE.md description

Fixes structural defect where --dry flag was checked in Step 12 after
Step 11 had already written the file to disk.
--dry now short-circuits at the top of Step 11 before any disk write or
audit invocation.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
---
 plugins/agentic-dev-team/CLAUDE.md                    |  2 +-
 plugins/agentic-dev-team/skills/agent-create/SKILL.md | 10 ++++------
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/plugins/agentic-dev-team/CLAUDE.md b/plugins/agentic-dev-team/CLAUDE.md
index 488d46e..c076030 100644
--- a/plugins/agentic-dev-team/CLAUDE.md
+++ b/plugins/agentic-dev-team/CLAUDE.md
@@ -65,7 +65,7 @@ User-invocable workflows in `.claude/commands/`. All review commands are execute
 | `/review-agent` | `commands/review-agent.md` | worker | Run a single review agent (used for inline checkpoints) |
 | `/agent-audit` | `commands/agent-audit.md` | orchestrator | Audit agents/commands/hooks for structural compliance |
 | `/agent-eval` | `commands/agent-eval.md` | orchestrator | Run eval fixtures, grade accuracy, detect regressions |
-| `/agent-add` | `commands/agent-add.md` | implementation | Scaffold a new review agent with eval compliance and doc updates |
+| `/agent-add` | `commands/agent-add.md` | implementation | Create a new review or team agent following the official schema with token-efficiency budgets |
 | `/agent-remove` | `commands/agent-remove.md` | implementation | Remove an agent and all its registry entries and doc references |
 | `/add-plugin` | `commands/add-plugin.md` | implementation | Install a plugin and register it in settings.json |
 | `/apply-fixes` | `commands/apply-fixes.md` | implementation | Apply correction prompts from `/code-review` output |
diff --git a/plugins/agentic-dev-team/skills/agent-create/SKILL.md b/plugins/agentic-dev-team/skills/agent-create/SKILL.md
index c4a8946..1108327 100644
--- a/plugins/agentic-dev-team/skills/agent-create/SKILL.md
+++ b/plugins/agentic-dev-team/skills/agent-create/SKILL.md
@@ -271,7 +271,9 @@ After generating the body, count all lines (including blank lines).
 
 ## Step 11 — Run /agent-audit Validation Gate
 
-Write the generated content to disk, then invoke the agent-audit skill:
+**If `--dry` was passed**: display the complete generated file content to the user and stop. Do not write any file, do not run validation, do not update the registry or CLAUDE.md.
+
+Otherwise: write the generated content to disk, then invoke the agent-audit skill:
 `Skill(agent-audit plugins/agentic-dev-team/agents/<name>.md)`
 
 **If the audit returns errors:**
@@ -289,11 +291,7 @@ Write the generated content to disk, then invoke the agent-audit skill:
 
 ## Step 12 — Present Draft and Confirm Write
 
-Display the complete generated file content to the user.
-
-**If `--dry` was passed**: display the content and stop. Do not write any file and do not update the registry or CLAUDE.md.
-
-Otherwise, ask: `Write this file to plugins/agentic-dev-team/agents/<name>.md? (yes/no)`
+Ask: `Write this file to plugins/agentic-dev-team/agents/<name>.md? (yes/no)`
 
 On `no`: delete the file written in Step 11, make no other changes, stop.
 On `yes`: the file is already on disk from Step 11; no re-write needed unless the user modified the draft.
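The two deterministic checks in `skills/agent-create/SKILL.md`, the Step 2 name gate with its kebab-case suggestion and the Step 10 line budget gate, can be sketched in Python as follows. This is an illustrative sketch only, not part of the patch series: the function names (`is_valid_name`, `suggest_name`, `lines_over_budget`) are hypothetical, "alphanumeric" is assumed to mean ASCII `a-z` and `0-9` after lowercasing, and a single trailing newline is assumed not to count as an extra body line.

```python
import re
from typing import Optional

# Pattern from Step 2; fullmatch() is used so a trailing newline cannot
# slip past the `$` anchor the way it can with re.match().
NAME_RE = re.compile(r"[a-z][a-z0-9-]*")

# Step 10 line budgets (blank lines count toward the total).
LINE_BUDGETS = {"review": 40, "team": 75}


def is_valid_name(name: str) -> bool:
    """True if the name passes the Step 2 hard gate."""
    return NAME_RE.fullmatch(name) is not None


def suggest_name(raw: str) -> Optional[str]:
    """Kebab-case correction per Step 2, or None when no valid suggestion exists."""
    # Lowercase all characters.
    name = raw.lower()
    # Replace runs of non-alphanumeric characters with a single hyphen.
    # Assumption (hypothetical): "alphanumeric" means ASCII a-z and 0-9.
    name = re.sub(r"[^a-z0-9]+", "-", name)
    # Strip leading/trailing hyphens.
    name = name.strip("-")
    # If the result starts with a digit, strip leading digits and adjacent hyphens.
    if name[:1].isdigit():
        name = re.sub(r"^[0-9-]+", "", name)
    # Suggest only if the result is now valid; an empty string never is.
    return name if is_valid_name(name) else None


def lines_over_budget(body: str, agent_type: str) -> int:
    """Lines over the budget for 'review' or 'team' agents; 0 when within budget."""
    # splitlines() keeps blank lines but ignores a single trailing newline.
    line_count = len(body.splitlines())
    return max(0, line_count - LINE_BUDGETS[agent_type])


if __name__ == "__main__":
    for raw in ["Code Reviewer", "2fast_Agent", "123"]:
        if not is_valid_name(raw):
            print(raw, "->", suggest_name(raw))
```

Running the script prints `Code Reviewer -> code-reviewer`, `2fast_Agent -> fast-agent` and `123 -> None`; the last case shows the "skip the suggestion" branch, where stripping leading digits leaves nothing valid.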