diff --git a/.prompts/upgrade-from-citypaul-dotfiles.md b/.prompts/upgrade-from-citypaul-dotfiles.md index fb79f20..49d1f9b 100644 --- a/.prompts/upgrade-from-citypaul-dotfiles.md +++ b/.prompts/upgrade-from-citypaul-dotfiles.md @@ -131,7 +131,7 @@ The `/setup` command generates the hook with only the branches relevant to the d **What they have:** A systematic CI/CD failure diagnosis skill with hypothesis-first approach, environment delta analysis, and anti-patterns (no blind retries). -**Why it matters:** CI failures are a common pain point. We have a devops-sre-engineer agent but no specific skill for diagnosing CI failures methodically. This would prevent the common anti-pattern of "just re-run it." +**Why it matters:** CI failures are a common pain point. We have a ops-engineer agent but no specific skill for diagnosing CI failures methodically. This would prevent the common anti-pattern of "just re-run it." **Action:** Create `skills/ci-debugging.md` covering: - Hypothesis-first diagnosis @@ -145,7 +145,7 @@ The `/setup` command generates the hook with only the branches relevant to the d - Network and filesystem differences - Local reproduction steps - Anti-patterns: blind retries, adding retries to "flaky" tests, speculative fix pushes -- Integration with devops-sre-engineer agent +- Integration with ops-engineer agent ### 9. Add a `test-design-reviewer` skill diff --git a/GETTING-STARTED.md b/GETTING-STARTED.md index f59dbf4..23e936a 100644 --- a/GETTING-STARTED.md +++ b/GETTING-STARTED.md @@ -68,7 +68,7 @@ Bug fixes typically need only the Software Engineer. The QA Engineer loads after /architect Review the current service topology for scalability concerns ``` -The Architect may pull in the Security Engineer or DevOps/SRE Engineer for cross-cutting concerns. +The Architect may pull in the Security Engineer or Ops Engineer for cross-cutting concerns. 
### API Design @@ -87,7 +87,7 @@ The Architect may pull in the Security Engineer or DevOps/SRE Engineer for cross ### Pipeline and Deployment ``` -/devops-sre-engineer Design the CI/CD pipeline for the new microservice +/ops-engineer Design the CI/CD pipeline for the new microservice ``` ## Available Agents and Skills diff --git a/docs/agent_info.md b/docs/agent_info.md index 88a3017..6f2a3c4 100644 --- a/docs/agent_info.md +++ b/docs/agent_info.md @@ -17,7 +17,7 @@ Each team agent file in `agents/` specifies a role's persona, behavior, collabor | Product Manager | [`product-manager.md`](../plugins/agentic-dev-team/agents/product-manager.md) | Requirements clarification, prioritization, stakeholder alignment | | Technical Writer | [`tech-writer.md`](../plugins/agentic-dev-team/agents/tech-writer.md) | Documentation, terminology consistency, style enforcement | | Security Engineer | [`security-engineer.md`](../plugins/agentic-dev-team/agents/security-engineer.md) | Security analysis, threat modeling, compliance | -| DevOps/SRE Engineer | [`devops-sre-engineer.md`](../plugins/agentic-dev-team/agents/devops-sre-engineer.md) | Pipeline, deployment, reliability, observability | +| Ops Engineer | [`ops-engineer.md`](../plugins/agentic-dev-team/agents/ops-engineer.md) | Pipeline, deployment, reliability, observability | | ADR Author | [`adr.md`](../plugins/agentic-dev-team/agents/adr.md) | Creates and manages Architecture Decision Records | ## Review Agents @@ -48,9 +48,21 @@ Review agents run as sub-agents during Phase 3 inline checkpoints and full `/cod To add a new review agent, use `/agent-add`. See [Add a Review Agent](#add-a-review-agent) below. -## Plan Review Personas +## Prompt Templates -Plan review personas are subagent prompt templates that critically challenge implementation plans during Phase 2, before the human gate. They run **in parallel** and return structured verdicts. Unlike review agents (which check code), these check the plan itself. 
+Prompt templates in `prompts/` are subagent dispatches used by the Orchestrator at various pipeline phases. They are never invoked directly by the user. + +### Implementation & Review Dispatch + +| Template | File | Purpose | +| --- | --- | --- | +| Implementer | [`implementer.md`](../plugins/agentic-dev-team/prompts/implementer.md) | Phase 3 implementation dispatch with TDD and status protocol | +| Spec Reviewer | [`spec-reviewer.md`](../plugins/agentic-dev-team/prompts/spec-reviewer.md) | Stage 1 spec compliance review with status protocol | +| Quality Reviewer | [`quality-reviewer.md`](../plugins/agentic-dev-team/prompts/quality-reviewer.md) | Stage 2 dispatcher that selects review agents by what changed | + +### Plan Review Personas + +Plan review personas critically challenge implementation plans during Phase 2, before the human gate. They run **in parallel** and return structured verdicts. Unlike review agents (which check code), these check the plan itself. | Persona | File | Focus | | --- | --- | --- | diff --git a/docs/diagrams/team-agents.svg b/docs/diagrams/team-agents.svg index 79cb873..d401ea8 100644 --- a/docs/diagrams/team-agents.svg +++ b/docs/diagrams/team-agents.svg @@ -1,4 +1,4 @@ - + @@ -64,6 +64,14 @@ Product Manager + + + + + + ADR Author + + @@ -86,11 +94,11 @@ Security Engineer - + - DevOps / SRE + Ops Engineer @@ -106,8 +114,8 @@ - - reports to - - collaborates + + reports to + + collaborates \ No newline at end of file diff --git a/docs/skills.md b/docs/skills.md index f995b46..8e809e4 100644 --- a/docs/skills.md +++ b/docs/skills.md @@ -29,6 +29,7 @@ Used by all agents to ensure output correctness: | --- | --- | --- | | Quality Gate Pipeline | [`quality-gate-pipeline.md`](../plugins/agentic-dev-team/skills/quality-gate-pipeline/SKILL.md) | Unified quality gate: self-validation, verification evidence, review-correction loops | | Governance & Compliance | 
[`governance-compliance.md`](../plugins/agentic-dev-team/skills/governance-compliance/SKILL.md) | Audit trail, quality assurance layers, ethics principles | +| Static Analysis Integration | [`static-analysis-integration/SKILL.md`](../plugins/agentic-dev-team/skills/static-analysis-integration/SKILL.md) | Pre-pass stage for /code-review that runs available static analysis tools before dispatching AI agents | ### Development Discipline Skills @@ -44,6 +45,7 @@ Enforce rigorous development practices: | Test Design Reviewer | [`test-design-reviewer.md`](../plugins/agentic-dev-team/skills/test-design-reviewer/SKILL.md) | Test quality patterns and anti-patterns | | Browser Testing | [`browser-testing.md`](../plugins/agentic-dev-team/skills/browser-testing/SKILL.md) | Playwright-based browser QA for visual verification | | Feature File Validation | [`feature-file-validation.md`](../plugins/agentic-dev-team/skills/feature-file-validation/SKILL.md) | Gherkin quality, determinism, implementation independence, test automation coverage | +| Receiving Code Review | [`receiving-code-review/SKILL.md`](../plugins/agentic-dev-team/skills/receiving-code-review/SKILL.md) | Behavioral constraints for review reception: verification gates, YAGNI check, anti-performative-agreement | ### Research & Design Skills @@ -54,6 +56,7 @@ Used during the Research phase to explore alternatives and stress-test designs: | Competitive Analysis | [`competitive-analysis.md`](../plugins/agentic-dev-team/skills/competitive-analysis/SKILL.md) | Gap analysis against external tools, plugins, or feature sets | | Design Interrogation | [`design-interrogation.md`](../plugins/agentic-dev-team/skills/design-interrogation/SKILL.md) | Stress-test design decisions before planning | | Design It Twice | [`design-it-twice.md`](../plugins/agentic-dev-team/skills/design-it-twice/SKILL.md) | Generate parallel alternative interfaces via sub-agents | +| Domain Analysis | 
[`domain-analysis/SKILL.md`](../plugins/agentic-dev-team/skills/domain-analysis/SKILL.md) | Strategic DDD health assessment: bounded contexts, event flows, value stream, friction report | ### Technical Skills @@ -70,6 +73,7 @@ Domain knowledge for implementation work: | Docker Image Create | [`docker-image-create/SKILL.md`](../plugins/agentic-dev-team/skills/docker-image-create/SKILL.md) | Generate production Dockerfiles with multi-stage builds, slim/distroless bases | | Docker Image Audit | [`docker-image-audit/SKILL.md`](../plugins/agentic-dev-team/skills/docker-image-audit/SKILL.md) | Audit Dockerfiles and images with hadolint, Trivy, Grype; structured severity report | | Performance Benchmark | [`performance-benchmark/SKILL.md`](../plugins/agentic-dev-team/skills/performance-benchmark/SKILL.md) | Runtime performance measurement: Core Web Vitals, resource sizes, baseline comparison, performance budgets, trend tracking | +| JS Project Init | [`js-project-init/README.md`](../plugins/agentic-dev-team/skills/js-project-init/README.md) | Initialize JS project with ES modules, prettier, eslint, vitest, editorconfig | ### Subagent Prompt Templates @@ -79,8 +83,7 @@ Concrete templates in `prompts/` for reproducible subagent dispatch: | --- | --- | --- | | Implementer | [`implementer.md`](../plugins/agentic-dev-team/prompts/implementer.md) | Phase 3 implementation dispatch with TDD enforcement | | Spec Reviewer | [`spec-reviewer.md`](../plugins/agentic-dev-team/prompts/spec-reviewer.md) | Two-stage review gate 1: does code match spec? | -| Quality Reviewer | [`quality-reviewer.md`](../plugins/agentic-dev-team/prompts/quality-reviewer.md) | Two-stage review gate 2: is code high quality? 
| -| Plan Reviewer | [`plan-reviewer.md`](../plugins/agentic-dev-team/prompts/plan-reviewer.md) | Phase 2 automated pre-check before human review | +| Quality Reviewer | [`quality-reviewer.md`](../plugins/agentic-dev-team/prompts/quality-reviewer.md) | Stage 2 dispatcher: selects and runs specialized review agents based on what changed | | Plan Review — Acceptance | [`plan-review-acceptance.md`](../plugins/agentic-dev-team/prompts/plan-review-acceptance.md) | Criteria verifiability, scenario completeness, error paths, TDD traceability | | Plan Review — Design | [`plan-review-design.md`](../plugins/agentic-dev-team/prompts/plan-review-design.md) | Coupling, abstraction quality, structural risks, pattern consistency | | Plan Review — UX | [`plan-review-ux.md`](../plugins/agentic-dev-team/prompts/plan-review-ux.md) | User journey, error experience, cognitive load, accessibility | diff --git a/docs/specs/multi-platform-support-suggestions.md b/docs/specs/multi-platform-support-suggestions.md new file mode 100644 index 0000000..464c056 --- /dev/null +++ b/docs/specs/multi-platform-support-suggestions.md @@ -0,0 +1,126 @@ +# Multi-Platform Support: Suggestions (Research Only) + +**Date**: 2026-04-16 +**Source**: Competitive analysis against [obra/superpowers](https://github.com/obra/superpowers) which supports 6 platforms. +**Status**: Research document — no implementation planned without further decision. + +## Current State + +A platform dependency audit found that **97.4% of plugin files (151 of ~155) have Claude Code-specific dependencies**. 
Only 4 knowledge files are fully portable as pure markdown: + +- `knowledge/review-rubric.md` +- `knowledge/owasp-detection.md` +- `knowledge/domain-modeling.md` +- `knowledge/architecture-assessment.md` + +### Claude Code Features We Depend On + +| Feature | Files affected | Portability barrier | +|---------|---------------|-------------------| +| **Agent tool** (subagent dispatch with model override, `isolation: "worktree"`) | Orchestrator, build, triage commands | No equivalent in Cursor, Codex, Gemini CLI. This is the hardest barrier. | +| **Hooks** (PreToolUse, PostToolUse) | 8 bash scripts + settings.json | No hook system in other platforms. Would need middleware. | +| **`allowed-tools:` frontmatter** (scoped tool permissions) | All 29 command files | Claude Code security feature with no cross-platform equivalent. | +| **`tools:` frontmatter** in agents | All 30 agent files | Tool names (Read, Write, Edit, Glob, Grep, Bash, Agent) are Claude Code-specific. | +| **`model:` frontmatter** in agents | All 30 agent files + 9 templates | Model names (haiku, sonnet, opus) map to Claude models. Other platforms may use different models. | +| **Plugin manifest** (`.claude-plugin/plugin.json`) | 1 file | Different format per platform. | +| **Skill frontmatter** (`user-invocable`, `role`) | All 32 skill files | Claude Code skill discovery mechanism. | + +## How Superpowers Does It + +superpowers supports 6 platforms with these strategies: + +1. **Platform-specific manifests**: `.claude-plugin/plugin.json`, `.cursor-plugin/`, `.codex/`, `.opencode/`, `gemini-extension.json`, `AGENTS.md` (Copilot). Each manifest points to the same skill files. + +2. **SessionStart hook with platform detection**: A single hook fires on session start, detects which platform is running, and injects skill awareness. The hook has platform-specific variants (`hooks.json` for Claude Code, `hooks-cursor.json` for Cursor). + +3. 
**Skills as the portable unit**: Skills are plain markdown with YAML frontmatter. The frontmatter is minimal (name, description). No `tools:`, `model:`, or `allowed-tools:` — those concepts don't exist in superpowers' skills. + +4. **Graceful degradation**: The `executing-plans` skill is a fallback for platforms without subagent support. Instead of dispatching parallel subagents, the agent executes plan steps inline. superpowers explicitly states: "Superpowers performs significantly better with subagent access." + +5. **Single agent, not a fleet**: superpowers has 1 agent (code-reviewer). Our 30 agents and 19 review agents are a much larger portability surface. + +## Portability Assessment + +### What is portable today (no changes needed) + +| Layer | Files | Notes | +|-------|-------|-------| +| Knowledge files (4) | review-rubric, owasp-detection, domain-modeling, architecture-assessment | Pure markdown reference. Portable as-is. | +| Skill content (32) | All SKILL.md files | The *content* of skills (patterns, guidelines, procedures) is platform-agnostic markdown. The *frontmatter* is Claude Code-specific. | +| Prompt templates (4+) | plan-review-*.md, future implementer/spec/quality | Pure prompt text. Portable as-is. | +| Knowledge file content (2) | agent-registry, review-template | Content is portable but references Claude Code tool names. | + +### What requires an adapter layer + +| Layer | Files | Adaptation needed | +|-------|-------|-----------------| +| Agent files (30) | All agents | Strip `tools:` and `model:` frontmatter for other platforms, or map to platform equivalents. | +| Command files (29) | All commands | Strip `allowed-tools:` for platforms without scoped permissions. Degrade gracefully. | +| Skill frontmatter (32) | All skills | Simplify to name + description only for non-Claude platforms. | +| Plugin manifest (1) | `.claude-plugin/plugin.json` | Create parallel manifests for each platform. 
| + +### What cannot be ported without redesign + +| Feature | Why | Impact | +|---------|-----|--------| +| **Multi-agent orchestration** (Agent tool + model routing) | Other platforms have no equivalent to dispatching subagents with model override and worktree isolation. | The entire orchestrator workflow (Research → Plan → Implement with parallel subagents, inline review checkpoints, model-routed review agents) would need a fallback. | +| **Hook-based guards** (8 scripts) | Hooks are a Claude Code-specific runtime feature. Pre-tool guards, TDD enforcement, and real-time review checks have no equivalent elsewhere. | Safety rails and automated quality checks would be manual-only. | +| **Scoped tool permissions** | `allowed-tools:` restricts what each command can do. Other platforms don't have this security model. | Commands would have full tool access on other platforms — less secure. | + +## Suggested Approach + +### Option 1: Minimal adapter (Low effort, limited reach) + +Add platform manifests that point to the existing skills and knowledge files. Accept that orchestration, hooks, and review agents don't work on other platforms. The plugin becomes a "knowledge library" on non-Claude platforms. + +**Effort**: Small (create manifest files + SessionStart hook per platform) +**Value**: Low (most of the plugin's value IS the orchestration) + +### Option 2: Graceful degradation (Medium effort, moderate reach) + +Like superpowers' approach: add an `executing-plans` fallback mode where the agent works inline instead of dispatching subagents. Hooks degrade to "manual checklist" instructions. Review agents degrade to a single inline review pass. + +**What this looks like**: +1. Add platform manifests (`.cursor-plugin/`, `.codex/`, etc.) +2. Create a SessionStart hook that detects platform and sets a `PLATFORM` variable +3. Create a `skills/inline-execution/SKILL.md` fallback for platforms without Agent tool +4. 
Modify the orchestrator to check platform and switch between multi-agent and inline modes +5. Convert hook-based guards to skill-based checklists (manual enforcement) +6. Strip `tools:`, `model:`, `allowed-tools:` from agent/command files on other platforms (or make them optional with sensible defaults) + +**Effort**: Large +**Value**: Moderate (the inline fallback is significantly less capable than the full orchestration) +**Risk**: Maintaining two execution paths (multi-agent + inline) doubles the testing and maintenance burden. Every new feature needs both paths. + +### Option 3: Platform abstraction layer (High effort, full reach) + +Create an abstraction layer that maps platform-specific capabilities to a common interface. Agent dispatch, tool access, and hooks are abstracted behind platform adapters. + +**Effort**: XL +**Value**: High (full functionality on all platforms) +**Risk**: Over-engineering. The abstraction layer becomes its own maintenance burden. Claude Code is our primary platform — optimizing for 5 other platforms that may never have equivalent features is speculative. + +## Recommendation + +**Do not pursue multi-platform support now.** The cost-benefit analysis doesn't justify it: + +- 97% of our files have Claude Code dependencies +- Our core value proposition (multi-agent orchestration, review agent fleet, hook-based guards) depends on Claude Code features that other platforms don't have +- superpowers can be multi-platform because it's a workflow discipline tool (14 skills, 1 agent). We're an orchestration platform (30 agents, 29 commands, 8 hooks). The portability surface is fundamentally different. + +**Revisit when**: +1. Other platforms add subagent dispatch (Cursor is most likely to get this) +2. A significant user base requests it +3. 
A platform-agnostic agent dispatch standard emerges + +**Quick win available now**: Extract the 4 portable knowledge files + skill content (without frontmatter) into a standalone "reference library" that other tools can consume. This is useful for teams that want our detection patterns and rubrics without the orchestration. Effort: Small. + +## Per-Platform Effort Estimates + +| Platform | Effort | Biggest risk | +|----------|--------|-------------| +| **Cursor** | L | No subagent dispatch; hooks require cursor-specific format; closest to Claude Code in capability | +| **Codex (OpenAI)** | XL | Fundamentally different agent model; AGENTS.md format; no hooks; different model names | +| **OpenCode** | L | Smaller platform; limited docs on plugin capabilities | +| **Gemini CLI** | XL | Different model family entirely (Gemini not Claude); extension format; no subagent dispatch | +| **GitHub Copilot CLI** | XL | Least mature agent platform; minimal plugin system | diff --git a/docs/specs/plugin-restructure.md b/docs/specs/plugin-restructure.md deleted file mode 100644 index 19ba331..0000000 --- a/docs/specs/plugin-restructure.md +++ /dev/null @@ -1,108 +0,0 @@ -# Specification: Plugin Directory Restructure - -**Created**: 2026-04-02 -**Status**: approved - -## Intent Description - -**What**: Restructure the repository to separate marketplace from plugin. Move all plugin components into `plugins/agentic-dev-team/`, move hook registrations into a plugin-level `settings.json`, split CLAUDE.md into a plugin-facing config (ships with plugin) and a dev-facing config (stays at root). Remove `dev-setup.sh` in favor of `claude plugin install --scope project` from the local path. - -**Why**: The repo conflates marketplace and plugin. Dev artifacts ship with the plugin, hooks are in a user config location, and the structure can't host multiple plugins. This follows the recommended marketplace structure and ensures hooks distribute with the plugin. 
- -**Scope**: Directory restructure + hook registration move + CLAUDE.md split. No behavioral changes to any agent, command, skill, or hook. - -## User-Facing Behavior - -```gherkin -Feature: Plugin directory restructure - - Scenario: Plugin installs from marketplace with correct structure - Given the marketplace.json source points to ./plugins/agentic-dev-team - When a user runs claude plugin install agentic-dev-team - Then the plugin is installed with agents, commands, skills, and hooks - And hook registrations from the plugin's settings.json are active - - Scenario: Hooks are registered via plugin settings.json - Given the plugin has a settings.json at plugins/agentic-dev-team/settings.json - When the plugin is installed - Then PreToolUse and PostToolUse hooks are active - And no .claude/settings.json is required for hook registration - - Scenario: Dev artifacts are not shipped with the plugin - Given docs, plans, evals, and reports directories exist at the marketplace root - When the plugin is installed from plugins/agentic-dev-team - Then only plugin components are installed - And dev artifacts at the marketplace root are not included - - Scenario: Plugin CLAUDE.md ships with plugin - Given plugins/agentic-dev-team/CLAUDE.md contains the orchestration pipeline config - When the plugin is installed - Then users see the pipeline config as project instructions - - Scenario: Dev CLAUDE.md guides plugin development - Given the root CLAUDE.md contains development instructions - When a developer works on the marketplace repo - Then they see instructions for adding agents, testing, and contributing - - Scenario: Local plugin testing without symlinks - Given a developer wants to test plugin changes locally - When they run claude plugin install --scope project ./plugins/agentic-dev-team - Then the plugin is installed from the local path - And dev-setup.sh is no longer needed - - Scenario: release-please updates the correct files - Given release-please config points to 
plugins/agentic-dev-team/.claude-plugin/plugin.json - When a release PR is created - Then the version is updated in the plugin's plugin.json - And the version is updated in .claude-plugin/marketplace.json at the repo root -``` - -## Architecture Specification - -**Components moved** (root → `plugins/agentic-dev-team/`): -- `agents/`, `commands/`, `skills/`, `hooks/`, `knowledge/`, `prompts/`, `templates/` -- `CLAUDE.md` (pipeline config) → `plugins/agentic-dev-team/CLAUDE.md` -- `install.sh` → `plugins/agentic-dev-team/install.sh` -- `.claude-plugin/plugin.json` → `plugins/agentic-dev-team/.claude-plugin/plugin.json` - -**Components created**: -- `plugins/agentic-dev-team/settings.json` — hook registrations from `.claude/settings.json` -- Root `CLAUDE.md` — rewritten as dev instructions (based on current `.claude/CLAUDE.md`) - -**Components updated at marketplace root**: -- `.claude-plugin/marketplace.json` — source: `"./"` → `"./plugins/agentic-dev-team"` -- `.claude/settings.json` — stripped to `enabledPlugins` only -- `release-please-config.json` — extra-files paths updated -- `README.md` — updated paths and install instructions -- `GETTING-STARTED.md` — updated paths - -**Components removed**: -- `dev-setup.sh` — replaced by `claude plugin install --scope project ./plugins/agentic-dev-team` -- `.claude/CLAUDE.md` — content merged into new root `CLAUDE.md` - -**Components that stay at marketplace root**: -- `.claude-plugin/marketplace.json` -- `docs/`, `plans/`, `evals/`, `reports/`, `memory/` -- `README.md`, `GETTING-STARTED.md`, `LICENSE` -- `release-please-config.json`, `.release-please-manifest.json` -- `.gitignore`, `.github/` - -**release-please extra-files**: -- `plugins/agentic-dev-team/.claude-plugin/plugin.json` (version field) -- `.claude-plugin/marketplace.json` (stays at root, JSONPath `$.plugins[0].version`) - -## Acceptance Criteria - -- [ ] All plugin components live under `plugins/agentic-dev-team/` -- [ ] `marketplace.json` stays at root, 
source points to `./plugins/agentic-dev-team` -- [ ] `plugins/agentic-dev-team/settings.json` contains all hook registrations -- [ ] `.claude/settings.json` contains only `enabledPlugins` (no hooks) -- [ ] `plugins/agentic-dev-team/.claude-plugin/plugin.json` exists with correct version -- [ ] `plugins/agentic-dev-team/CLAUDE.md` contains the orchestration pipeline config -- [ ] Root `CLAUDE.md` contains development instructions for the repo -- [ ] `release-please-config.json` extra-files point to new plugin.json path -- [ ] Dev artifacts remain at repo root, not shipped with plugin -- [ ] `dev-setup.sh` is removed -- [ ] `.claude/CLAUDE.md` is removed (content in root CLAUDE.md) -- [ ] `git mv` used for moves to preserve history -- [ ] All internal relative paths within the plugin still resolve diff --git a/docs/specs/pre-commit-review.md b/docs/specs/pre-commit-review.md deleted file mode 100644 index a2cfa6c..0000000 --- a/docs/specs/pre-commit-review.md +++ /dev/null @@ -1,95 +0,0 @@ -# Specification: Automated Pre-Commit Code Review - -**Created**: 2026-04-02 -**Status**: approved - -## Intent Description - -**What**: Run the `/code-review --changed` flow automatically before every commit, blocking the commit until review agents have passed. Uses a temp file gate — the review writes a `.review-passed` file with a hash of staged files, and the pre-commit hook only allows commits when the hash matches. - -**Why**: Currently, code review only happens when explicitly invoked. Making it automatic ensures every commit has been reviewed, catching issues before they enter git history. Warnings are surfaced to the user at review time for a case-by-case decision rather than pre-configured as pass or block. - -**Scope**: Pre-commit automation of the existing review flow only. Does not change `/code-review` itself, add new review agents, or modify the review output format. 
- -## User-Facing Behavior - -```gherkin -Feature: Automated pre-commit code review - - Scenario: First commit attempt is blocked with review instruction - Given changed files are staged for commit - And no .review-passed file exists for the current staged files - When the user or agent attempts to commit - Then the commit is blocked - And the hook instructs Claude to run /code-review --changed - - Scenario: Commit proceeds after passing review - Given Claude has run /code-review --changed and it returned pass or warn - And a .review-passed file exists with a hash matching the staged files - When the user or agent attempts to commit - Then the commit proceeds normally - And the .review-passed file is deleted after the commit - - Scenario: Commit remains blocked after failing review - Given Claude has run /code-review --changed and it returned fail - And no .review-passed file was written - When the user or agent attempts to commit again - Then the commit is blocked - And the failure details are displayed - - Scenario: Staged files change after review passes - Given a .review-passed file exists from a prior review - And the user stages additional files after the review - When the user or agent attempts to commit - Then the commit is blocked because the staged file hash no longer matches - And the hook instructs Claude to re-run /code-review --changed - - Scenario: Warnings are surfaced for human decision - Given the review returned warn status - When the review completes - Then the warning details are displayed to the user - And a .review-passed file is written (warnings do not block) - And the user decides per-warning whether to fix or proceed - - Scenario: Review can be bypassed - Given the user needs to commit without review - When the user commits with --no-verify - Then the commit proceeds without review -``` - -## Architecture Specification - -**Components affected**: -1. 
`hooks/pre-commit-review.sh` — rewrite as a `PreToolUse` hook on `Bash` that detects `git commit` commands, checks for `.review-passed` with matching staged-file hash, blocks (exit 2) if missing, allows if matching -2. `.claude/settings.json` — add the new hook to `PreToolUse` matcher for `Bash` -3. `/code-review` command (`commands/code-review.md`) — add a step: when invoked with `--changed` and review passes (pass or warn), write `.review-passed` containing the hash of reviewed files -4. `.gitignore` — add `.review-passed` - -**Gate mechanism**: -- Hash = sorted list of staged file paths piped through `shasum` -- `.review-passed` contains the hash string -- Hook computes hash of current staged files and compares to file contents -- Match → allow commit (exit 0). Mismatch or missing → block (exit 2) -- Post-commit: hook or commit success deletes `.review-passed` - -**File scope**: The hook checks all staged files, not just JS/TS. The review agents themselves decide which files are relevant to their scope. - -**Bypass**: `git commit --no-verify` skips all hooks including this one. This is the standard git bypass mechanism. 
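The gate mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the hook's actual implementation: it uses Python's `hashlib` (SHA-1, the default algorithm of `shasum`) in place of the `shasum` CLI, and the names `staged_hash`, `gate_exit_code`, and `record_review` are hypothetical. The exact byte format of the hashed listing (one path per line) is also an assumption, since the spec only says "sorted list of staged file paths".

```python
import hashlib
from pathlib import Path
from typing import Iterable

ALLOW = 0  # exit code that lets the commit proceed
BLOCK = 2  # exit code the spec uses to block the tool call


def staged_hash(staged_paths: Iterable[str]) -> str:
    """Hash the sorted staged file paths (assumed format: one path per line)."""
    listing = "".join(f"{path}\n" for path in sorted(staged_paths))
    # SHA-1 stands in for the `shasum` CLI here; this is an assumption.
    return hashlib.sha1(listing.encode("utf-8")).hexdigest()


def record_review(staged_paths: Iterable[str], marker: Path) -> None:
    """Write the marker file after a review that returned pass or warn."""
    marker.write_text(staged_hash(staged_paths) + "\n", encoding="utf-8")


def gate_exit_code(staged_paths: Iterable[str], marker: Path) -> int:
    """Return ALLOW if the marker matches the current staged set, else BLOCK."""
    if not marker.exists():
        return BLOCK  # no review recorded for any staged set
    recorded = marker.read_text(encoding="utf-8").strip()
    return ALLOW if recorded == staged_hash(staged_paths) else BLOCK
```

In a real hook the staged paths would presumably come from something like `git diff --cached --name-only`; the sketch takes them as a plain list so the comparison logic can be exercised without a repository. Because the hash covers paths only, restaging a file with different contents would not invalidate the marker, which follows from the spec's wording rather than from this sketch.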
- -**Dependencies**: -- Claude Code `PreToolUse` hook system (exit 2 = block) -- `shasum` (available on macOS and Linux) -- Existing `/code-review --changed --json` command - -## Acceptance Criteria - -- [ ] A `PreToolUse` hook on `Bash` detects `git commit` commands and blocks them (exit 2) when no valid `.review-passed` file exists -- [ ] The hook computes a hash of staged file paths and compares to `.review-passed` contents -- [ ] `/code-review --changed` writes `.review-passed` with the staged-file hash when review returns pass or warn -- [ ] The commit proceeds on the second attempt when `.review-passed` hash matches staged files -- [ ] Staging new files after review invalidates the gate (hash mismatch → re-review required) -- [ ] `.review-passed` is deleted after a successful commit -- [ ] `.review-passed` is in `.gitignore` -- [ ] `git commit --no-verify` bypasses the review gate -- [ ] The old advisory-only `pre-commit-review.sh` is replaced -- [ ] All changed files are checked, not just JS/TS diff --git a/docs/specs/release-please.md b/docs/specs/release-please.md deleted file mode 100644 index e9e6521..0000000 --- a/docs/specs/release-please.md +++ /dev/null @@ -1,91 +0,0 @@ -# Specification: Automated Release Management with release-please - -**Created**: 2026-04-02 -**Status**: approved - -## Intent Description - -**What**: Add a GitHub Actions workflow that uses release-please to automate version bumping, CHANGELOG generation, and GitHub Release creation for the agentic-dev-team plugin repository. - -**Why**: The repository currently has no release process — no git tags, no CHANGELOG, and no GitHub Releases. The husky pre-commit hook auto-increments a patch version on every commit, which produces meaningless version numbers. Adopting release-please will derive version bumps from conventional commit messages, produce a human-readable CHANGELOG, and create tagged GitHub Releases automatically when a release PR is merged. 
- -**Scope**: Release-please integration only. No other CI workflows. No publishing beyond GitHub Releases. Husky is fully removed since it exists only for the auto-bump hook. - -## User-Facing Behavior - -```gherkin -Feature: Automated release management with release-please - - Scenario: Release PR is created from conventional commits - Given the main branch has new commits since the last release - And at least one commit uses a releasable prefix (feat, fix, refactor, perf) - When a push to main triggers the release-please workflow - Then release-please opens or updates a "release PR" on GitHub - And the PR title contains the next version number - And the PR body contains a generated CHANGELOG - - Scenario: GitHub Release is created when release PR is merged - Given a release-please PR exists and is approved - When the release PR is merged to main - Then a GitHub Release is created with the new version tag - And the release notes contain the CHANGELOG entries - And plugin.json version is updated to match the release version - And marketplace.json version is updated to match the release version - - Scenario: Version bump follows conventional commit semantics - Given commits since the last release include a "feat:" commit - Then the minor version is incremented - Given commits since the last release include only "fix:" commits - Then the patch version is incremented - Given commits since the last release include a breaking change - Then the major version is incremented - - Scenario: Non-releasable commits are excluded from CHANGELOG - Given the main branch has only non-releasable commits (docs, chore, ci) - When a push to main triggers the release-please workflow - Then no release PR is created or updated - And docs commits do not appear in the CHANGELOG - - Scenario: Husky and auto-bump hook are fully removed - When release-please owns versioning - Then the .husky directory no longer exists - And husky is removed from devDependencies - And package.json is removed 
from the repository - And plugin.json version is only updated by release-please -``` - -## Architecture Specification - -**Components affected**: -1. `.github/workflows/release-please.yml` — new GitHub Actions workflow -2. `release-please-config.json` — configuration (changelog sections, extra-files) -3. `.release-please-manifest.json` — tracks current version, seeded from current plugin.json -4. `.claude-plugin/plugin.json` — version managed by release-please -5. `.claude-plugin/marketplace.json` — version managed by release-please via `extra-files` with JSONPath (`.plugins[0].version`) -6. `.husky/` — entire directory removed -7. `package.json` — removed (only existed for husky devDependency) - -**Interfaces**: -- release-please reads conventional commits from git history on `main` -- release-please writes version to `plugin.json` and `marketplace.json` via `extra-files` -- Trigger: `push` to `main` branch - -**Constraints**: -- Version source of truth: `plugin.json` -- `docs:` commits excluded from CHANGELOG (configure `changelog-sections`) -- Initial version seeded from current `plugin.json` value -- Workflow permissions: `contents: write`, `pull-requests: write` - -**Dependencies**: -- `google-github-actions/release-please-action` v4 - -## Acceptance Criteria - -- [ ] A push to `main` with releasable commits creates/updates a release PR -- [ ] Merging the release PR creates a GitHub Release with a semver tag (e.g., `v1.3.1`) -- [ ] The release PR updates `plugin.json` and `marketplace.json` versions -- [ ] CHANGELOG excludes `docs:`, `chore:`, and `ci:` commits -- [ ] Non-releasable commits do not trigger a version bump or release PR -- [ ] `.husky/` directory is fully removed -- [ ] `package.json` is removed -- [ ] Initial release version is seeded from current `plugin.json` version diff --git a/docs/specs/specs-persist-output-to-file.md b/docs/specs/specs-persist-output-to-file.md deleted file mode 100644 index a1e9f85..0000000 --- 
a/docs/specs/specs-persist-output-to-file.md +++ /dev/null @@ -1,90 +0,0 @@ -# Spec: Specs command persists output to file - -## Intent Description - -The `/specs` command produces four specification artifacts (Intent, BDD scenarios, Architecture notes, Acceptance Criteria) through a collaborative loop with the user. Currently, the results exist only in the conversation — they are lost when the session ends and cannot be referenced by downstream commands (`/plan`, `/build`, spec-compliance-review). - -This change makes `/specs` persist its output to `docs/specs/` as a structured markdown file, so that: -- `/plan` can read the spec artifacts when generating implementation steps -- `spec-compliance-review` can diff implementation against the written spec -- Specs survive session boundaries -- The output guardrail ("write to files, not chat") is honored - -## User-Facing Behavior - -```gherkin -Feature: Specs command persists output to file - - Scenario: Specs output is saved after consistency gate passes - Given the user runs /specs with a feature description - And all four artifacts pass the consistency gate - When the specification is finalized - Then a markdown file should be created at docs/specs/.md - And the file should contain all four artifacts (Intent, BDD scenarios, Architecture notes, Acceptance Criteria) - And the consistency gate verdict should be included - - Scenario: File name is derived from the feature description - Given the user runs /specs with "user login with MFA" - When the specification is finalized - Then the output file should be named docs/specs/user-login-with-mfa.md - - Scenario: Existing spec file is not overwritten without confirmation - Given a spec file already exists at docs/specs/user-login-with-mfa.md - When the user runs /specs for the same feature - Then the user should be asked whether to overwrite or create a versioned file -``` - -## Architecture Specification - -**Components affected:** -- 
`plugins/agentic-dev-team/skills/specs.md` — add file output instructions at the end of the workflow - -**No new files created in the plugin.** The `docs/specs/` directory is created in the consuming project at runtime. - -**Output format:** A single markdown file with H2 sections for each artifact: - -```markdown -# Spec: - -## Intent Description -... - -## User-Facing Behavior -... - -## Architecture Specification -... - -## Acceptance Criteria -... - -## Consistency Gate -- [x] Intent is unambiguous -- [x] Every behavior has a corresponding BDD scenario -- [x] Architecture constrains without over-engineering -- [x] Terminology consistent across artifacts -- [x] No contradictions between artifacts -``` - -**Constraints:** -- File path: `docs/specs/.md` -- Slugify: lowercase, hyphens for spaces, strip special characters -- The skill already says "no code during specification phase" — file output is an artifact, not implementation - -## Acceptance Criteria - -1. After the consistency gate passes, a file exists at `docs/specs/.md` containing all four artifacts -2. The file includes the consistency gate checklist with pass/fail marks -3. If `docs/specs/` does not exist, it is created -4. If a file with the same slug already exists, the user is prompted before overwriting -5. 
The file path is printed to chat so the user can find it - -## Consistency Gate - -- [x] Intent is unambiguous -- [x] Every behavior has a corresponding BDD scenario -- [x] Architecture constrains without over-engineering -- [x] Terminology consistent across artifacts -- [x] No contradictions between artifacts - -**Gate: PASS** diff --git a/docs/specs/superpowers-gap-closure.md b/docs/specs/superpowers-gap-closure.md new file mode 100644 index 0000000..d49e564 --- /dev/null +++ b/docs/specs/superpowers-gap-closure.md @@ -0,0 +1,1028 @@ +# Spec: Superpowers Gap Closure + +Closes gaps identified in the [competitive analysis](../../reports/competitive-analysis-2026-04-15.md) against [obra/superpowers](https://github.com/obra/superpowers). Seven core implementation slices (A-G) plus three platform support slices (H1-H3). Multi-platform research document at [docs/specs/multi-platform-support-suggestions.md](multi-platform-support-suggestions.md). + +## Dependency Graph + +``` +A (anti-rationalization knowledge) +├── F (TDD skill depth) — cross-references A's knowledge file +E (subagent status codes + prompt templates) +├── G (worktree setup) — uses BLOCKED status + implementer.md +B (code review reception) — independent +C (skill authoring enhancements) — independent (touches agent-eval) +D (debugging supporting files) — independent +H1 (Windows hooks) — independent +H2 (Gemini CLI support) — independent +H3 (OpenAI Codex support) — independent +``` + +## Cross-Slice Notes + +- **Canonical term**: "anti-rationalization" is the standard term across all slices. Do not use "rationalization prevention" or "rationalization bulletproofing" as synonyms — those are technique names within the concept. +- **Merge targets**: `knowledge/agent-registry.md` and `CLAUDE.md` are modified by multiple slices (A, B, D, E, F, G). Implement in dependency order to avoid merge conflicts. +- **Implementer prompt cross-deps**: Slice E creates `prompts/implementer.md`. 
Slice F's testing-anti-patterns reference should be mentioned in the implementer prompt. Slice G adds worktree setup to the implementer prompt. Implement E → F's implementer reference → G. + +--- + +## Slice A: Anti-Rationalization Knowledge + +### Intent Description + +Create a shared knowledge file that catalogs LLM anti-rationalization patterns — the plausible excuses agents generate to skip hard steps across all skills. Currently, the TDD skill and systematic-debugging skill each have their own rationalization tables, but other skills (Quality Gate Pipeline, verification evidence, code review) lack this defense. The knowledge file becomes a reusable reference that any skill can point to, and the existing TDD/debugging tables remain in place as domain-specific supplements that the knowledge file cross-references by link. + +This slice modifies only the Quality Gate Pipeline skill — adding an anti-rationalization reference to the existing Phase 2 "Red Flag Language" block. The TDD skill cross-reference is Slice F's responsibility. 
+ +### User-Facing Behavior + +```gherkin +Feature: Anti-rationalization knowledge file + + Scenario: Agent encounters rationalization during TDD + Given an agent is following the TDD skill + When the agent generates an excuse to skip writing a test first + Then the excuse matches a pattern in the anti-rationalization knowledge file + And the agent recognizes it as rationalization and restarts from RED + + Scenario: Agent encounters rationalization during quality gate + Given an agent is running the Quality Gate Pipeline Phase 2 + When the agent generates red-flag language like "should work now" or "I believe" + Then the agent detects the language as an anti-rationalization signal + And the agent pauses to verify before claiming completion + + Scenario: New skill references anti-rationalization knowledge + Given a developer is authoring a new skill + When the skill has steps that agents commonly skip + Then the developer can reference the anti-rationalization knowledge file + And add domain-specific rationalizations to a table within the new skill + + Scenario: Anti-rationalization knowledge covers cross-cutting patterns + Given the anti-rationalization knowledge file exists + Then it contains at minimum these categories: + | category | + | Skipping verification | + | Skipping tests | + | Scope expansion | + | Premature completion claims | + | Process shortcuts | + And each pattern includes the excuse text and a reality counter + + Scenario: Unlisted rationalization is still caught + Given the anti-rationalization knowledge file exists + When an agent generates an excuse not explicitly listed + Then the catch-all rule applies: "If the excuse isn't listed here, it's still an excuse" + And the agent treats it as rationalization and follows the skill's restart protocol + + Scenario: Knowledge file cross-references domain-specific tables + Given the anti-rationalization knowledge file exists + Then it links to the TDD skill's rationalization table for test-specific 
patterns + And it links to the systematic-debugging skill's rationalization table for debugging-specific patterns + And it does NOT duplicate those tables' contents +``` + +### Architecture Specification + +**Components affected**: +- New file: `plugins/agentic-dev-team/knowledge/anti-rationalization.md` +- Modified: `plugins/agentic-dev-team/skills/quality-gate-pipeline/SKILL.md` — add anti-rationalization reference to the existing Phase 2 "Red Flag Language" block (lines 99-101) and to the Phase 1 "Hallucination Detection Signals" section +- Modified: `plugins/agentic-dev-team/knowledge/agent-registry.md` — register the new knowledge file + +**Interfaces**: The knowledge file is a passive reference document. Skills reference it with a markdown link. Agents load it on demand when they need to cross-check excuses. + +**Constraints**: +- Do NOT duplicate the existing TDD or systematic-debugging rationalization tables into the knowledge file. Those tables are domain-specific and stay where they are. The knowledge file covers cross-cutting patterns only and links to those tables. +- Keep the file under 600 tokens — it's loaded on demand but should stay lean. +- Do NOT create a "new Red Flag Language section" — add the reference to the existing block. + +**Dependencies**: None — pure documentation addition. 
+ +### Acceptance Criteria + +- [ ] `knowledge/anti-rationalization.md` exists with at least 5 categories of rationalization patterns +- [ ] Each pattern has: excuse text, reality counter, which skills it commonly appears in +- [ ] Knowledge file includes a catch-all statement at the top: "If the excuse isn't listed here, it's still an excuse" +- [ ] Knowledge file cross-references TDD and debugging tables by link rather than duplicating their patterns +- [ ] `quality-gate-pipeline/SKILL.md` existing Phase 2 "Red Flag Language" block references the knowledge file +- [ ] `quality-gate-pipeline/SKILL.md` Phase 1 "Hallucination Detection Signals" references the knowledge file +- [ ] `knowledge/agent-registry.md` includes the new file in the Knowledge Files table +- [ ] The knowledge file is under 600 tokens +- [ ] Existing TDD and debugging rationalization tables are NOT modified (Slice F handles TDD cross-ref) + +--- + +## Slice B: Code Review Reception Skill + +### Intent Description + +Create a new skill that defines behavioral constraints for how agents respond to code review feedback — whether from `/code-review`, `/apply-fixes`, or human reviewers. Currently, agents blindly accept all review findings and implement every suggestion without critical evaluation. This is a known LLM failure mode: performative agreement ("You're absolutely right!") followed by uncritical implementation of suggestions that may be wrong, unnecessary, or scope-expanding. The skill enforces technical verification before implementing any suggestion, mandates reasoned pushback when a suggestion would make the code worse, and includes a YAGNI gate to prevent gold-plating in response to reviews. + +Human feedback has higher authority than agent-generated feedback — the agent can push back with reasoning but defers to the human's final decision after one round. Tone is concise for both audiences. 
+ +### User-Facing Behavior + +```gherkin +Feature: Code review reception discipline + + Scenario: Agent receives a valid review finding + Given an agent has received code review feedback + And the finding is technically correct and addresses a real issue + When the agent evaluates the finding + Then the agent verifies the finding against the actual code + And implements the fix with verification evidence + + Scenario: Agent receives an incorrect review finding + Given an agent has received code review feedback + And the finding is technically incorrect or based on a misunderstanding + When the agent evaluates the finding + Then the agent states why the finding is incorrect with specific code references + And does NOT implement the suggested change + And does NOT use performative agreement language + + Scenario: Agent receives a valid but YAGNI suggestion + Given an agent has received code review feedback + And the finding suggests adding capability beyond current requirements + When the agent evaluates the finding + Then the agent identifies it as scope expansion + And declines to implement with a YAGNI justification + And logs the suggestion for future consideration + + Scenario: Agent receives a subjective style preference + Given an agent has received code review feedback + And the finding is a style preference not backed by project conventions + When the agent evaluates the finding + Then the agent checks project conventions and linting rules + And only implements if a convention or rule supports the change + + Scenario: Performative agreement language is blocked + Given an agent is about to respond to code review feedback + When the agent drafts a response containing phrases like: + | phrase | + | You're absolutely right | + | Great catch | + | Of course, I should have | + | That's a good point | + Then the agent replaces the performative language + And responds with technical evaluation only + + Scenario: Agent verifies before implementing any suggestion + 
Given an agent has received a review suggestion to change code + When the agent decides to implement the suggestion + Then the agent first reads the relevant code to verify the issue exists + And confirms the suggested fix would not introduce regressions + And only then applies the change + + Scenario: Agent receives feedback from a human reviewer + Given an agent has received code review feedback from a human + And the agent believes the feedback is incorrect + When the agent evaluates the finding + Then the agent states its technical reasoning concisely + And defers to the human's final decision after one round of pushback + + Scenario: Agent receives ambiguous feedback + Given an agent has received code review feedback + And the finding is ambiguous — neither clearly correct nor clearly incorrect + When the agent evaluates the finding + Then the agent escalates the ambiguous finding to the human for clarification + And does NOT guess at the reviewer's intent +``` + +### Architecture Specification + +**Components affected**: +- New file: `plugins/agentic-dev-team/skills/receiving-code-review/SKILL.md` +- Modified: `plugins/agentic-dev-team/knowledge/agent-registry.md` — register the new skill +- Modified: `plugins/agentic-dev-team/CLAUDE.md` — add to skills quick reference count and skills-by-phase table (Review phase) + +**Interfaces**: This skill is behavioral — loaded by any agent that receives review feedback. Primary consumer contexts (agents running within these commands load the skill): +- The `/apply-fixes` command +- The implementer subagent (receives inline review checkpoint results during `/build`) +- Any agent in a Phase 3 review-correction loop + +No modifications to those commands are needed — they are listed as consumer contexts, not files to edit. 
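The "Performative agreement language is blocked" scenario above can be pictured as a simple phrase check on a drafted response. This is a minimal sketch, not part of the skill: the function name `find_performative_phrases` is hypothetical, the phrase list is copied from the scenario table, and a real skill is behavioral guidance rather than executable code.

```python
# Phrases copied from the scenario table; the skill may list more.
BANNED_PHRASES = (
    "You're absolutely right",
    "Great catch",
    "Of course, I should have",
    "That's a good point",
)


def find_performative_phrases(draft: str) -> list:
    """Return the banned phrases that appear in a drafted review response."""
    # Normalize curly apostrophes and case so trivial variants still match.
    normalized = draft.replace("\u2019", "'").lower()
    return [phrase for phrase in BANNED_PHRASES if phrase.lower() in normalized]


draft = "Great catch! The null check on line 12 is missing."
print(find_performative_phrases(draft))  # ['Great catch']
```

A non-empty result would signal that the response should be rewritten as a technical evaluation only.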
+ +**Constraints**: +- The skill is behavioral only — it constrains how agents respond to feedback, it does not modify review output format or review agent behavior +- It must not conflict with the Quality Gate Pipeline Phase 3 review-correction loop. That loop says "fix critical/major defects." This skill adds: "but verify each finding is real before fixing it" +- Keep under 500 tokens — it's a discipline skill, not a technique manual + +**Dependencies**: Quality Gate Pipeline (Phase 3 review-correction loop) + +### Acceptance Criteria + +- [ ] `skills/receiving-code-review/SKILL.md` exists with frontmatter +- [ ] Skill includes a banned-phrases list for performative agreement +- [ ] Skill includes a verification-before-implementation gate +- [ ] Skill includes a YAGNI gate for scope-expanding suggestions +- [ ] Skill includes a rationalization prevention table (agents rationalize agreeing too, not just skipping) +- [ ] Skill distinguishes human feedback (higher authority, defer after one pushback) from agent feedback (full technical challenge) +- [ ] Skill includes guidance for ambiguous findings: escalate to human, don't guess +- [ ] Tone guidance: concise technical evaluation for both audiences, no emotional language +- [ ] `knowledge/agent-registry.md` includes the new skill +- [ ] `CLAUDE.md` quick reference updated with skill count and skills-by-phase table +- [ ] Skill is under 500 tokens +- [ ] Skill does not conflict with Quality Gate Pipeline Phase 3 + +--- + +## Slice C: Skill Authoring Enhancements (Pressure Testing + CSO) + +### Intent Description + +Enhance the existing `agent-skill-authoring` skill with two additions drawn from superpowers' `writing-skills` methodology. First, **pressure testing** — a structured process for testing whether a skill's instructions hold up under adversarial conditions (the agent is deep in implementation, eager to deliver, and generating rationalizations). 
Pressure scenarios are saved as eval fixtures and integrated into `/agent-eval`. Second, **Claude Search Optimization (CSO)** refinement — the existing skill already has guidance about description optimization, but this needs strengthening with concrete examples and a pass/fail checklist. + +### User-Facing Behavior + +```gherkin +Feature: Skill authoring pressure testing and description optimization + + Scenario: Author pressure-tests a new skill + Given a developer has written a new skill + When the developer follows the skill authoring guide + Then the guide instructs them to run the task WITHOUT the skill first + And observe natural failure modes + And write pressure scenarios that probe each failure mode + And verify the skill prevents each failure when loaded + + Scenario: Pressure scenario catches a skill weakness + Given a skill has been written with constraints + And a pressure scenario simulates an agent rationalizing around a constraint + When the pressure scenario is executed + Then the skill either prevents the rationalization or the weakness is identified + And the author strengthens the skill to close the gap + + Scenario: Pressure scenarios are saved as eval fixtures + Given a developer has written pressure scenarios for a skill + When the developer follows the authoring guide + Then the scenarios are saved in the evals directory alongside agent eval fixtures + And each scenario specifies the skill, the adversarial condition, and expected behavior + + Scenario: agent-eval runs pressure scenarios against skills + Given pressure scenario fixtures exist for a skill + When the user runs /agent-eval + Then the eval framework executes each pressure scenario + And reports whether the skill prevented the adversarial behavior + + Scenario: Malformed pressure fixture is reported + Given a pressure scenario fixture exists but has invalid format + When the user runs /agent-eval + Then the eval framework reports a parse error for the malformed fixture + And 
identifies which fixture file failed and what is wrong with it + + Scenario: Skill description follows CSO guidelines + Given a developer is writing a skill description + When the developer follows the authoring guide + Then the description contains ONLY triggering conditions + And the description does NOT summarize the skill's workflow or steps + And the description does NOT list the skill's internal structure + + Scenario: CSO checklist catches a bad description + Given a skill has a description that summarizes its workflow + When the author runs the CSO checklist + Then the checklist flags the description as problematic + And suggests rewriting to focus on when/why to trigger +``` + +### Architecture Specification + +**Components affected**: +- Modified: `plugins/agentic-dev-team/skills/agent-skill-authoring/SKILL.md` — expand "Apply TDD to skill-writing itself" into a pressure testing procedure and strengthen "Optimize skill descriptions for triggering" into a CSO checklist +- Modified: `plugins/agentic-dev-team/commands/agent-eval.md` — add pressure scenario fixture support alongside agent eval fixtures +- Fixture location: `evals/pressure/` directory, alongside existing `evals/` fixtures + +**Interfaces**: No new interfaces. The authoring skill is consumed by anyone authoring skills. The eval command gains a new fixture type. + +**Constraints**: +- The existing skill already has the "Apply TDD to skill-writing itself" and "Optimize skill descriptions for triggering" sections. Enhance these in place — do not create parallel sections. +- Keep total skill file under 1,200 tokens (currently ~990 tokens per registry) +- Do not add a separate knowledge file for this — the guidance belongs inline in the authoring skill + +**Dependencies**: None for the skill edit. The eval integration depends on the existing `/agent-eval` command structure. 
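As a rough illustration of the malformed-fixture scenario, the sketch below validates one pressure fixture and reports which file failed and why. It rests on assumptions the spec does not make: fixtures are stored as JSON, and the field names (`skill`, `adversarial_condition`, `expected_behavior`, `pass_criteria`) are hypothetical stand-ins for the four items a scenario must specify.

```python
import json

# Hypothetical field names; the spec only states what a fixture must describe.
REQUIRED_FIELDS = ("skill", "adversarial_condition", "expected_behavior", "pass_criteria")


def validate_fixture(name: str, raw: str) -> list:
    """Return human-readable problems for one fixture; an empty list means valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Report the file name and the parser's own message.
        return [f"{name}: parse error at line {exc.lineno}: {exc.msg}"]
    if not isinstance(data, dict):
        return [f"{name}: expected a JSON object at the top level"]
    problems = []
    for field in REQUIRED_FIELDS:
        value = data.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{name}: missing or empty field '{field}'")
    return problems


good = json.dumps({
    "skill": "test-driven-development",
    "adversarial_condition": "agent is 80% through implementation and wants to skip verification",
    "expected_behavior": "agent runs the verification step before claiming completion",
    "pass_criteria": "verification evidence is present in the final response",
})
print(validate_fixture("skip-verification.json", good))  # []
print(validate_fixture("broken.json", "{not json"))
```

Returning a list of problems rather than raising lets an eval run report every malformed fixture in one pass instead of stopping at the first.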
+ +### Acceptance Criteria + +- [ ] "Apply TDD to skill-writing itself" section expanded with a concrete pressure testing procedure: (1) run without skill, (2) catalog failure modes, (3) write pressure scenarios, (4) verify skill prevents each failure +- [ ] At least 3 example pressure scenarios included as templates (e.g., "agent is 80% through implementation and wants to skip the verification step") +- [ ] Pressure scenario fixture format defined (skill name, adversarial condition, expected agent behavior, pass/fail criteria) +- [ ] Fixtures saved to `evals/pressure/` directory +- [ ] `commands/agent-eval.md` updated to mention skill pressure scenarios alongside agent eval fixtures +- [ ] Eval framework reports parse errors for malformed pressure fixtures +- [ ] "Optimize skill descriptions for triggering" expanded into a CSO checklist with pass/fail criteria +- [ ] CSO checklist includes at least 2 "good" and 2 "bad" description examples +- [ ] Total skill file stays under 1,200 tokens +- [ ] `knowledge/agent-registry.md` updated if `/agent-eval` description changes + +--- + +## Slice D: Systematic Debugging Supporting Files + +### Intent Description + +Add three supporting reference files to the systematic-debugging skill directory that provide concrete, reusable techniques agents can load on demand during debugging. Currently, the skill defines a solid 4-phase process but is procedural — it tells agents *what* to do (investigate, trace, hypothesize) without providing detailed *how-to* techniques. We'll add the three highest-value ones: root-cause tracing (backward call-chain analysis), condition-based waiting (replacing arbitrary sleeps/timeouts in tests with polling), and a test polluter finder (language-agnostic bisection algorithm for identifying which test pollutes shared state). 
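Two of the techniques named above, condition-based waiting and polluter bisection, can be sketched in a few lines. This is an illustrative sketch only: the names `wait_until`, `find_polluter`, and `victim_fails_after` are hypothetical, the reference files themselves are meant to stay language-agnostic, and the bisection assumes a single polluting test that acts alone.

```python
import time
from typing import Callable, Optional, Sequence


def wait_until(condition: Callable[[], bool], description: str,
               timeout: float = 5.0, interval: float = 0.05) -> None:
    """Poll `condition` until it is true, or raise with a descriptive message."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"timed out after {timeout}s waiting for: {description}")
        time.sleep(interval)


def find_polluter(candidates: Sequence[str],
                  victim_fails_after: Callable[[Sequence[str]], bool]) -> Optional[str]:
    """Bisect the tests that run before the victim down to a single polluter.

    `victim_fails_after(subset)` is a caller-supplied hook that runs `subset`
    followed by the victim test and reports whether the victim failed.
    """
    remaining = list(candidates)
    if not remaining or not victim_fails_after(remaining):
        return None  # the victim does not fail after these tests
    while len(remaining) > 1:
        mid = len(remaining) // 2
        first_half = remaining[:mid]
        # Keep whichever half still reproduces the failure.
        remaining = first_half if victim_fails_after(first_half) else remaining[mid:]
    return remaining[0]


# Simulated suite in which test_d pollutes shared state.
tests = ["test_a", "test_b", "test_c", "test_d", "test_e"]
print(find_polluter(tests, lambda subset: "test_d" in subset))  # test_d
```

In a real project the `victim_fails_after` hook would shell out to the project's own test runner, which is why the algorithm is kept separate from any runner-specific command.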
+ +### User-Facing Behavior + +```gherkin +Feature: Systematic debugging supporting reference files + + Scenario: Agent uses root-cause tracing during investigation + Given an agent is in Phase 2 (Investigate) of systematic debugging + And the failure involves a value that is wrong at the point of use + When the agent loads the root-cause-tracing reference + Then the agent traces backward through the call chain from symptom to origin + And identifies the layer where the value first diverges from expected + + Scenario: Agent replaces arbitrary timeout with condition-based waiting + Given an agent is debugging a flaky test + And the test uses sleep or setTimeout to wait for an async condition + When the agent loads the condition-based-waiting reference + Then the agent replaces the arbitrary wait with a polling pattern + And the polling pattern has a timeout ceiling and descriptive error on timeout + + Scenario: Agent identifies a test polluter + Given an agent is debugging a test that passes in isolation but fails in suite + And the failure is caused by shared state pollution from another test + When the agent loads the find-polluter reference + Then the agent uses bisection to identify which prior test pollutes the state + And the bisection narrows to the specific polluting test + + Scenario: Supporting files are loaded on demand only + Given the systematic-debugging skill is loaded + When the agent enters Phase 2 and needs a specific technique + Then only the relevant supporting file is loaded + And other supporting files remain unloaded to conserve context +``` + +### Architecture Specification + +**Components affected**: +- New file: `plugins/agentic-dev-team/skills/systematic-debugging/root-cause-tracing.md` +- New file: `plugins/agentic-dev-team/skills/systematic-debugging/condition-based-waiting.md` +- New file: `plugins/agentic-dev-team/skills/systematic-debugging/find-polluter.md` +- Modified: `plugins/agentic-dev-team/skills/systematic-debugging/SKILL.md` 
— add "Supporting References" section in Phase 2 that links to the three files with guidance on when to load each +- Modified: `plugins/agentic-dev-team/knowledge/agent-registry.md` — note the supporting files exist under the skill entry + +**Interfaces**: Supporting files are passive markdown references loaded by the agent on demand. The main SKILL.md links to them with triggering conditions (e.g., "Load root-cause-tracing.md when the failure involves a wrong value at the point of use"). + +**Constraints**: +- Each supporting file should be under 400 tokens — they're technique references, not full skills +- The `find-polluter.md` describes the bisection algorithm in language-agnostic terms, not an executable script (projects use different test runners) +- Do not restructure the existing 4-phase process — the supporting files augment Phase 2, they don't replace it + +**Dependencies**: Systematic Debugging skill (existing) + +### Acceptance Criteria + +- [ ] `skills/systematic-debugging/root-cause-tracing.md` exists with backward tracing technique +- [ ] `skills/systematic-debugging/condition-based-waiting.md` exists with polling pattern replacing arbitrary waits +- [ ] `skills/systematic-debugging/find-polluter.md` exists with language-agnostic bisection algorithm +- [ ] Each supporting file is under 400 tokens +- [ ] Main `SKILL.md` Phase 2 section includes a "Supporting References" block linking to each file with when-to-load guidance +- [ ] `knowledge/agent-registry.md` updated to note supporting files +- [ ] `find-polluter.md` is language-agnostic (describes algorithm, not a shell script) +- [ ] Existing 4-phase process is unchanged + +--- + +## Slice E: Subagent Status Codes + +### Intent Description + +Introduce a structured 4-status-code protocol for subagent reporting: `DONE`, `DONE_WITH_CONCERNS`, `NEEDS_CONTEXT`, and `BLOCKED`. Currently, subagents return free-form text and the orchestrator must parse success/failure from unstructured output. 
This slice adds clear, parseable signals for subagent outcomes and creates the three missing prompt templates (`implementer.md`, `spec-reviewer.md`, `quality-reviewer.md`) with full behavioral content. The four existing plan review templates are also updated to adopt the status protocol. + +The status protocol uses two output formats depending on template type: +- **Markdown status block**: Used by implementer, spec-reviewer, and quality-reviewer (new templates) +- **JSON `"status"` field**: Used by plan review templates (existing JSON output format, `"status"` added alongside existing `"verdict"`) + +### User-Facing Behavior + +```gherkin +Feature: Subagent structured status codes + + Scenario: Subagent completes work successfully + Given the orchestrator has dispatched a subagent for a task + When the subagent finishes the work and all verification passes + Then the subagent returns status DONE + And includes verification evidence in its response + + Scenario: Subagent completes with reservations + Given the orchestrator has dispatched a subagent for a task + When the subagent finishes the work but has concerns about the approach + Then the subagent returns status DONE_WITH_CONCERNS + And includes the completed work plus a list of specific concerns + And the orchestrator reviews the concerns before accepting the work + + Scenario: Orchestrator handles DONE_WITH_CONCERNS + Given a subagent returned DONE_WITH_CONCERNS with a list of concerns + When the orchestrator receives the status + Then the orchestrator evaluates each concern + And decides per-concern: accept the work as-is, re-dispatch with guidance, or escalate to user + And logs the decision for each concern + + Scenario: Subagent needs more context from parent + Given the orchestrator has dispatched a subagent for a task + When the subagent cannot complete because it lacks information + And the missing information is available in the parent context + Then the subagent returns status NEEDS_CONTEXT + And 
specifies exactly what information is needed + And the orchestrator re-dispatches with the additional context + + Scenario: Subagent is blocked by an external dependency + Given the orchestrator has dispatched a subagent for a task + When the subagent cannot proceed due to an unresolvable dependency + Then the subagent returns status BLOCKED + And describes the blocking dependency + And the orchestrator escalates to the user + + Scenario: Orchestrator handles NEEDS_CONTEXT with re-dispatch + Given a subagent returned NEEDS_CONTEXT requesting file contents + When the orchestrator receives the status + Then the orchestrator gathers the requested context + And re-dispatches the same subagent prompt with added context + And does NOT treat NEEDS_CONTEXT as a failure + + Scenario: Orchestrator handles BLOCKED with user escalation + Given a subagent returned BLOCKED citing an external service dependency + When the orchestrator receives the status + Then the orchestrator presents the blocker to the user + And pauses the task until the user provides direction + + Scenario: NEEDS_CONTEXT re-dispatch is capped + Given a subagent has returned NEEDS_CONTEXT twice for the same task + When the subagent returns NEEDS_CONTEXT a third time + Then the orchestrator escalates to the user instead of re-dispatching + And reports what context was requested across all three attempts + + Scenario: Orchestrator receives unrecognized status + Given a subagent returns a status code not in the protocol + When the orchestrator parses the response + Then the orchestrator treats the unrecognized status as BLOCKED + And escalates to the user with the raw subagent output + + Scenario: Plan reviewer returns status via JSON + Given the orchestrator dispatches a plan review subagent + When the reviewer completes its review + Then the JSON output includes both "verdict" and "status" fields + And the mapping is: + | verdict | warnings | status | + | approve | 0 | DONE | + | approve | 1+ | 
DONE_WITH_CONCERNS | + | needs-revision | any | DONE_WITH_CONCERNS | +``` + +### Architecture Specification + +**Components affected**: +- Modified: `plugins/agentic-dev-team/agents/orchestrator.md` — add "Subagent Status Protocol" section defining the 4 codes, orchestrator response table, and two output formats +- New file: `plugins/agentic-dev-team/prompts/implementer.md` — full implementer behavioral content + markdown status block +- New file: `plugins/agentic-dev-team/prompts/spec-reviewer.md` — full spec review behavioral content + markdown status block +- New file: `plugins/agentic-dev-team/prompts/quality-reviewer.md` — full quality review behavioral content + markdown status block +- Modified: `plugins/agentic-dev-team/prompts/plan-review-acceptance.md` — add `"status"` field to JSON output +- Modified: `plugins/agentic-dev-team/prompts/plan-review-design.md` — add `"status"` field to JSON output +- Modified: `plugins/agentic-dev-team/prompts/plan-review-ux.md` — add `"status"` field to JSON output +- Modified: `plugins/agentic-dev-team/prompts/plan-review-strategic.md` — add `"status"` field to JSON output +- Modified: `plugins/agentic-dev-team/commands/build.md` — update step 4 to handle NEEDS_CONTEXT and BLOCKED status from subagents +- Modified: `plugins/agentic-dev-team/CLAUDE.md` — update "Multi-Agent Collaboration Protocol" to reference the status protocol +- Modified: `plugins/agentic-dev-team/knowledge/agent-registry.md` — update prompt template entries + +**Interfaces**: + +Markdown status block (new templates): +``` +## Status +**Result**: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED +**Concerns**: [list, if DONE_WITH_CONCERNS] +**Needs**: [specific info needed, if NEEDS_CONTEXT] +**Blocker**: [description, if BLOCKED] +``` + +JSON status field (plan review templates — added to existing output): +```json +{ + "reviewer": "plan-review-*", + "verdict": "approve | needs-revision", + "status": "DONE | DONE_WITH_CONCERNS", + ... 
+} +``` + +Orchestrator response table: +| Status | Orchestrator action | +|--------|-------------------| +| DONE | Accept work, proceed | +| DONE_WITH_CONCERNS | Review concerns, decide: accept / re-dispatch with guidance / escalate | +| NEEDS_CONTEXT | Gather info, re-dispatch (max 2 re-dispatches before escalating) | +| BLOCKED | Escalate to user immediately | +| Unrecognized | Treat as BLOCKED, escalate with raw output | + +**Constraints**: +- NEEDS_CONTEXT re-dispatch has a max of 2 attempts — after that, escalate to user +- The two output formats (markdown block and JSON field) are both documented in the orchestrator's status protocol section +- This does not change model routing — status codes are orthogonal to model selection +- Plan review templates keep their existing `"verdict"` field — `"status"` is additive + +**Dependencies**: Orchestrator agent, build command, all subagent prompt templates + +### Acceptance Criteria + +- [ ] Orchestrator agent has a "Subagent Status Protocol" section defining all 4 codes +- [ ] Orchestrator response table maps each status to a concrete action, including unrecognized status +- [ ] `prompts/implementer.md` created with full implementer behavioral content + markdown status block +- [ ] `prompts/spec-reviewer.md` created with full spec review behavioral content + markdown status block +- [ ] `prompts/quality-reviewer.md` created with full quality review behavioral content + markdown status block +- [ ] `prompts/plan-review-acceptance.md` updated with `"status"` JSON field +- [ ] `prompts/plan-review-design.md` updated with `"status"` JSON field +- [ ] `prompts/plan-review-ux.md` updated with `"status"` JSON field +- [ ] `prompts/plan-review-strategic.md` updated with `"status"` JSON field +- [ ] All 7 prompt templates use the documented status format (markdown or JSON as appropriate) +- [ ] `commands/build.md` step 4 handles NEEDS_CONTEXT and BLOCKED +- [ ] CLAUDE.md "Multi-Agent Collaboration Protocol" references the 
status protocol +- [ ] NEEDS_CONTEXT re-dispatch capped at 2 attempts +- [ ] Both output formats (markdown block + JSON field) documented in orchestrator +- [ ] `knowledge/agent-registry.md` prompt template entries updated + +--- + +## Slice F: TDD Skill Depth + +### Intent Description + +Deepen the TDD skill with two additions: (1) a "Testing Anti-Patterns" supporting reference file covering common mock/test anti-patterns that agents fall into, and (2) a cross-reference from the existing rationalization table to the anti-rationalization knowledge file (from Slice A). The cross-reference augments the existing catch-all line (currently "If you catch yourself composing an excuse not on this list, it's still an excuse") by adding a link to the knowledge file for cross-cutting patterns. + +The testing-anti-patterns reference should also be mentioned in the implementer prompt template (created in Slice E) so subagents have access to it during implementation. + +### User-Facing Behavior + +```gherkin +Feature: TDD skill depth enhancements + + Scenario: Agent encounters a testing anti-pattern + Given an agent is writing tests during the RED phase + When the agent writes a test that mocks a dependency without understanding its contract + Then the testing-anti-patterns reference identifies this as "mocking without understanding" + And the agent rewrites the test to use the real dependency or a properly understood fake + + Scenario: Agent encounters mock-tests-mock anti-pattern + Given an agent is writing a test + When the test primarily asserts that a mock was called with expected arguments + Then the testing-anti-patterns reference identifies this as "testing mock behavior" + And the agent rewrites to test observable outcomes instead of call patterns + + Scenario: Agent falls into sunk cost trap + Given an agent wrote implementation code before writing a test + And the agent has invested significant context in the implementation + When the agent realizes it violated TDD + 
Then the sunk cost rationalization entry in the TDD table triggers + And the agent deletes the implementation and restarts from RED + And does NOT rationalize keeping the code "as a reference" + + Scenario: TDD skill cross-references anti-rationalization knowledge + Given the TDD skill's rationalization prevention table exists + And the anti-rationalization knowledge file exists + When an agent encounters a rationalization not in the TDD table + Then the catch-all line directs the agent to the knowledge file for cross-cutting patterns + And the catch-all rule still applies: the unlisted excuse is still an excuse +``` + +### Architecture Specification + +**Components affected**: +- New file: `plugins/agentic-dev-team/skills/test-driven-development/testing-anti-patterns.md` — supporting reference file +- Modified: `plugins/agentic-dev-team/skills/test-driven-development/SKILL.md` — augment the existing catch-all line with a cross-reference to `knowledge/anti-rationalization.md`, add "Supporting References" section linking to testing-anti-patterns.md +- Modified: `plugins/agentic-dev-team/prompts/implementer.md` (created in Slice E) — mention testing-anti-patterns as a loadable reference during RED phase + +**Interfaces**: The testing-anti-patterns file is a passive reference loaded on demand during the RED phase when the agent needs guidance on test quality. + +**Constraints**: +- Do NOT duplicate content from the existing rationalization table — add the cross-reference only +- The existing catch-all line at the end of the rationalization table is augmented to: "If you catch yourself composing an excuse not on this list, it's still an excuse. See also [anti-rationalization patterns](../../knowledge/anti-rationalization.md) for cross-cutting patterns beyond TDD." 
+- The testing-anti-patterns file covers test-writing anti-patterns (mock abuse, test-only methods, testing implementation) — not TDD process violations (those are already in the main skill) +- Keep supporting file under 400 tokens +- The existing SKILL.md already has the Iron Law, 12-entry rationalization table, red flags, and anti-pattern section. Changes should be minimal additions, not rewrites. + +**Dependencies**: Slice A (anti-rationalization knowledge file for cross-reference), Slice E (implementer.md creation) + +### Acceptance Criteria + +- [ ] `skills/test-driven-development/testing-anti-patterns.md` exists with at least 5 anti-patterns +- [ ] Anti-patterns cover: testing mock behavior, test-only production methods, mocking without understanding contract, incomplete mocks, integration tests as afterthought +- [ ] Each anti-pattern has: name, description, why it's harmful, what to do instead +- [ ] Main `SKILL.md` has a "Supporting References" section linking to testing-anti-patterns.md +- [ ] Main `SKILL.md` catch-all line augmented with cross-reference to `knowledge/anti-rationalization.md` +- [ ] `prompts/implementer.md` mentions testing-anti-patterns as a loadable reference +- [ ] Supporting file is under 400 tokens +- [ ] No existing content in the main SKILL.md is duplicated or removed + +--- + +## Slice G: Git Worktree Language-Specific Setup + +### Intent Description + +Enhance the worktree creation workflow so that after a git worktree is created for a subagent, dependency installation and baseline test verification happen before implementation begins. Currently, `isolation: "worktree"` creates a clean worktree but the subagent starts implementing immediately — if dependencies aren't installed, the first test run fails for the wrong reason (missing deps, not missing feature). This slice adds a setup step to the implementer prompt template. 
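As a rough illustration of the file-presence detection this slice calls for, the sketch below maps project indicators to the install commands named in the scenarios. The function name `detect_install_command`, the precedence order when several indicators are present, and the `None` result for a `package.json` without a recognized lock file are assumptions; the spec does not settle them. Python projects are left out because the spec does not name an install command for them.

```python
from pathlib import Path
from typing import Optional

# Lock file -> install command, as listed in the Node.js scenario.
NODE_LOCKFILES = [
    ("package-lock.json", "npm ci"),
    ("yarn.lock", "yarn install"),
    ("pnpm-lock.yaml", "pnpm install"),
    ("bun.lockb", "bun install"),
]

# Single-file indicators -> install command, as listed in the other scenarios.
# The order of this list is an assumption, not part of the spec.
SIMPLE_INDICATORS = [
    ("go.mod", "go mod download"),
    ("Cargo.toml", "cargo build"),
    ("pom.xml", "mvn install -DskipTests"),
    ("build.gradle", "gradle build -x test"),
    ("build.gradle.kts", "gradle build -x test"),
]


def detect_install_command(root: Path) -> Optional[str]:
    """Return an install command based on file presence only, or None."""
    if (root / "package.json").is_file():
        for lock_file, command in NODE_LOCKFILES:
            if (root / lock_file).is_file():
                return command
        # Assumption: the spec does not say what to do without a lock file.
        return None
    for indicator, command in SIMPLE_INDICATORS:
        if (root / indicator).is_file():
            return command
    # .NET projects are detected by extension rather than a fixed file name.
    if any(root.glob("*.csproj")) or any(root.glob("*.sln")):
        return "dotnet restore"
    # Unknown project type: the caller skips installation with a warning.
    return None
```

A `None` result corresponds to the "unknown project type" scenario, where the subagent skips installation and proceeds with a warning.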
+ +### User-Facing Behavior + +```gherkin +Feature: Git worktree language-specific setup + + Scenario: Worktree setup detects Node.js project + Given the orchestrator dispatches a subagent with worktree isolation + And the project root contains a package.json + When the worktree is created + Then the subagent runs the appropriate install command in the worktree + And the install command is determined by lock file presence: + | lock file | command | + | package-lock.json | npm ci | + | yarn.lock | yarn install | + | pnpm-lock.yaml | pnpm install | + | bun.lockb | bun install | + + Scenario: Worktree setup detects Python project + Given the orchestrator dispatches a subagent with worktree isolation + And the project root contains a requirements.txt or pyproject.toml + When the worktree is created + Then the subagent installs dependencies in the worktree + + Scenario: Worktree setup detects Go project + Given the orchestrator dispatches a subagent with worktree isolation + And the project root contains a go.mod + When the worktree is created + Then the subagent runs go mod download in the worktree + + Scenario: Worktree setup detects Rust project + Given the orchestrator dispatches a subagent with worktree isolation + And the project root contains a Cargo.toml + When the worktree is created + Then the subagent runs cargo build in the worktree + + Scenario: Worktree setup detects Java Maven project + Given the orchestrator dispatches a subagent with worktree isolation + And the project root contains a pom.xml + When the worktree is created + Then the subagent runs mvn install -DskipTests in the worktree + + Scenario: Worktree setup detects Java Gradle project + Given the orchestrator dispatches a subagent with worktree isolation + And the project root contains a build.gradle or build.gradle.kts + When the worktree is created + Then the subagent runs gradle build -x test in the worktree + + Scenario: Worktree setup detects dotnet project + Given the orchestrator 
dispatches a subagent with worktree isolation + And the project root contains a .csproj or .sln file + When the worktree is created + Then the subagent runs dotnet restore in the worktree + + Scenario: Worktree setup runs baseline tests + Given the worktree has been created and dependencies installed + When the setup step completes + Then the subagent runs the project's test suite as a baseline + And verifies all existing tests pass before beginning implementation + And if baseline tests fail the subagent returns BLOCKED status + + Scenario: Worktree setup for unknown project type + Given the orchestrator dispatches a subagent with worktree isolation + And no recognized project files are found + When the worktree is created + Then the subagent skips dependency installation + And proceeds directly to implementation with a warning + + Scenario: Dependency installation fails + Given the orchestrator dispatches a subagent with worktree isolation + And the project root contains a package.json + When the worktree is created + And the dependency install command fails + Then the subagent returns BLOCKED status + And includes the install error output in the blocker description +``` + +### Architecture Specification + +**Components affected**: +- Modified: `plugins/agentic-dev-team/agents/orchestrator.md` — update Phase 3 "Subagent dispatch" section to reference the worktree setup protocol +- Modified: `plugins/agentic-dev-team/prompts/implementer.md` (created in Slice E) — add a "Worktree Setup" section at the top of the implementation flow that runs before RED phase +- New file: `plugins/agentic-dev-team/knowledge/worktree-setup.md` — reference table mapping project indicators to setup commands, loaded by the implementer prompt + +**Interfaces**: The worktree setup is a pre-implementation step within the subagent. 
It is NOT a hook or separate script — it's instructions within the implementer prompt that the subagent follows after the worktree is created but before starting TDD. + +**Constraints**: +- The setup step must be fast — install + baseline test should add minimal overhead +- If baseline tests fail, the subagent returns `BLOCKED` (from Slice E's status protocol) rather than attempting to fix pre-existing failures +- If dependency install fails, the subagent returns `BLOCKED` with the error output +- Language detection uses file presence only (package.json, go.mod, etc.) — no heuristics or LLM judgment +- The knowledge file is a simple lookup table, not a decision tree + +**Dependencies**: Slice E (subagent status codes — BLOCKED status for baseline/install failures, implementer.md creation) + +### Acceptance Criteria + +- [ ] `knowledge/worktree-setup.md` exists with a detection table mapping project indicators to install + test commands +- [ ] Detection covers: Node.js (npm/yarn/pnpm/bun), Python, Go, Rust, .NET, Java (Maven/Gradle) +- [ ] `prompts/implementer.md` includes a "Worktree Setup" section that runs before RED +- [ ] Setup runs dependency install then baseline test suite +- [ ] Baseline test failure returns BLOCKED status (not failure, not NEEDS_CONTEXT) +- [ ] Dependency install failure returns BLOCKED status with error output +- [ ] Unknown project type skips setup with a warning +- [ ] `agents/orchestrator.md` Phase 3 references the worktree setup protocol +- [ ] `knowledge/agent-registry.md` updated with the new knowledge file + +--- + +## Slice H1: Windows Hooks Support + +### Intent Description + +Enable the plugin's 8 bash hooks to work on Windows by adding a cross-platform shim and fixing platform-specific path issues. On Windows, bash is available via Git for Windows (near-universal on dev machines). 
The approach: keep all hooks as bash scripts, add a `.cmd` wrapper that locates and delegates to bash (same pattern as superpowers), and fix hardcoded `/tmp/`/`$TMPDIR` references. A Windows prerequisite checker (`install.ps1`) replaces `install.sh` for Windows users. + +### User-Facing Behavior + +```gherkin +Feature: Windows hooks support + + Scenario: Hooks execute on Windows via Git for Windows bash + Given the plugin is installed on a Windows machine + And Git for Windows is installed (providing bash.exe on PATH) + When Claude Code triggers a PreToolUse or PostToolUse hook + Then the hook command invokes the run-hook.cmd shim + And the shim locates bash.exe and delegates to the .sh script + And the hook executes successfully with correct output + + Scenario: Shim locates bash from Git for Windows default path + Given bash is NOT on the system PATH + And Git for Windows is installed at the default location + When run-hook.cmd is invoked + Then the shim checks "C:\Program Files\Git\bin\bash.exe" + And uses it to execute the hook script + + Scenario: Shim locates bash from WSL + Given bash is NOT on the system PATH + And Git for Windows is NOT installed + And WSL is available + When run-hook.cmd is invoked + Then the shim uses wsl.exe to execute the hook script + + Scenario: Shim fails gracefully when no bash is available + Given bash is NOT on the system PATH + And Git for Windows is NOT installed + And WSL is NOT available + When run-hook.cmd is invoked + Then the shim exits with an error message explaining bash is required + And suggests installing Git for Windows + + Scenario: Hooks use platform-agnostic temp directory + Given a hook needs a temporary file + When the hook references a temp directory + Then it uses ${TMPDIR:-${TEMP:-/tmp}} instead of hardcoded /tmp/ + And the path resolves correctly on both Unix and Windows + + Scenario: Windows prerequisite checker validates environment + Given a user runs install.ps1 on Windows + Then the script checks for: 
+ | prerequisite | check | + | bash | bash.exe on PATH or Git for Windows installed | + | jq | jq.exe on PATH | + | git | git.exe on PATH | + And reports which prerequisites are missing with install instructions + + Scenario: Hooks work unchanged on macOS and Linux + Given the plugin is installed on macOS or Linux + When Claude Code triggers a hook + Then the hook command invokes bash directly as before + And the run-hook.cmd shim is not used + And no behavior changes from the current implementation +``` + +### Architecture Specification + +**Components affected**: +- New file: `plugins/agentic-dev-team/hooks/run-hook.cmd` — Windows shim that locates bash.exe and delegates (~20 lines) +- New file: `plugins/agentic-dev-team/install.ps1` — Windows prerequisite checker +- Modified: `plugins/agentic-dev-team/hooks/tdd-guard.sh` — fix `TMPDIR` reference (use `${TMPDIR:-${TEMP:-/tmp}}`) +- Modified: `plugins/agentic-dev-team/hooks/version-check.sh` — fix hardcoded `/tmp/` path (use `${TMPDIR:-${TEMP:-/tmp}}`) +- Modified: `plugins/agentic-dev-team/settings.json` — document Windows hook invocation pattern + +**Interfaces**: The `run-hook.cmd` shim is invoked by Claude Code on Windows instead of `bash` directly. It takes the hook script path as an argument and passes stdin through. Exit codes are preserved. + +**Constraints**: +- Do NOT rewrite hooks in PowerShell or Node.js — keep bash, add shim +- Do NOT require WSL — Git for Windows is the primary target +- All 9 existing hooks must continue working unchanged on macOS/Linux +- Only 2 existing `.sh` files are modified (TMPDIR fixes) +- `jq` is a hard dependency — Windows installer must check for it + +**Dependencies**: None — independent of all other slices. 
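The lookup order the shim is expected to follow (PATH, then the Git for Windows default location, then WSL) can be sketched as below. This is only an illustration in Python; the real shim is a `.cmd` script. The name `locate_bash`, the injectable `which` and `exists` parameters, and the returned labels are hypothetical and exist only to make the order easy to check.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional, Tuple

# Default install location named in the spec's scenarios.
GIT_BASH_DEFAULT = r"C:\Program Files\Git\bin\bash.exe"


def _path_exists(path: str) -> bool:
    return Path(path).exists()


def locate_bash(
    which: Callable[[str], Optional[str]] = shutil.which,
    exists: Callable[[str], bool] = _path_exists,
) -> Optional[Tuple[str, str]]:
    """Return (source, executable) for the first usable bash, or None."""
    # 1. bash already on PATH.
    on_path = which("bash")
    if on_path:
        return ("path", on_path)
    # 2. Git for Windows at its default location.
    if exists(GIT_BASH_DEFAULT):
        return ("git-for-windows", GIT_BASH_DEFAULT)
    # 3. WSL as the last resort.
    wsl = which("wsl")
    if wsl:
        return ("wsl", wsl)
    # Nothing found: the shim should exit with an error mentioning
    # bash and Git for Windows.
    return None
```

Passing `which` and `exists` as parameters keeps the order testable on any platform without touching the real filesystem.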
+ +### Acceptance Criteria + +- [ ] `hooks/run-hook.cmd` exists and locates bash via: (1) PATH, (2) Git for Windows default, (3) WSL +- [ ] Shim exits with clear error if no bash found +- [ ] Shim passes stdin, arguments, and exit codes through correctly +- [ ] `install.ps1` checks for bash, jq, and git on Windows +- [ ] `install.ps1` provides install instructions for each missing prerequisite +- [ ] `tdd-guard.sh` uses `${TMPDIR:-${TEMP:-/tmp}}` instead of hardcoded paths +- [ ] `version-check.sh` uses `${TMPDIR:-${TEMP:-/tmp}}` instead of hardcoded `/tmp/` +- [ ] All 9 hooks pass on macOS/Linux with no behavior change + +--- + +## Slice H2: Gemini CLI Platform Support + +### Intent Description + +Add Gemini CLI as a supported platform. Gemini CLI has a native extension system with skills (`SKILL.md` with same frontmatter format), agents, hooks (`hooks/hooks.json`), commands (TOML format), and context files (`GEMINI.md`). Our skills and knowledge files are reusable as-is. The approach: create a `gemini-extension.json` manifest and `GEMINI.md` context file, add TOML commands for key workflows, and document capability limitations (no multi-agent orchestration, no model routing, no tool scoping). 
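The manifest this slice introduces is expected to carry four string fields: `name`, `version`, `description`, and `contextFileName`. A minimal sketch of a check for those fields follows; the function name `missing_manifest_fields` is hypothetical, and the sketch assumes nothing about the manifest beyond those four keys.

```python
import json
from typing import List

# Field names taken from the slice's acceptance criteria.
REQUIRED_STRING_FIELDS = ("name", "version", "description", "contextFileName")


def missing_manifest_fields(manifest_text: str) -> List[str]:
    """Return required fields that are absent or not strings."""
    manifest = json.loads(manifest_text)  # raises ValueError on invalid JSON
    if not isinstance(manifest, dict):
        # A non-object manifest has none of the required fields.
        return list(REQUIRED_STRING_FIELDS)
    return [
        field
        for field in REQUIRED_STRING_FIELDS
        if not isinstance(manifest.get(field), str)
    ]
```

An empty result means all four fields are present as strings. A syntax-only check would not catch a missing key, which is why the fields are checked explicitly here.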
+ +### User-Facing Behavior + +```gherkin +Feature: Gemini CLI platform support + + Scenario: Plugin is discoverable as a Gemini CLI extension + Given the plugin repository contains a gemini-extension.json manifest + When a user installs the extension in Gemini CLI + Then Gemini CLI loads the manifest and discovers the extension + And the GEMINI.md context file is loaded into the session + + Scenario: Skills are loaded in Gemini CLI + Given the plugin is installed as a Gemini CLI extension + When Gemini CLI scans the skills/ directory + Then it discovers all SKILL.md files with name and description frontmatter + And skills are available for implicit and explicit invocation + + Scenario: Knowledge files are accessible + Given the plugin is installed as a Gemini CLI extension + When an agent or skill references a knowledge file + Then the knowledge file is readable as a standard markdown reference + + Scenario: Hooks are loaded from Gemini CLI hooks format + Given the plugin is installed as a Gemini CLI extension + And a hooks/hooks-gemini.json file exists + When Gemini CLI loads hook configuration + Then compatible hooks execute normally + And hooks that depend on Claude Code-specific stdin format are skipped with warnings + + Scenario: GEMINI.md provides platform-specific context + Given the plugin is installed as a Gemini CLI extension + When a session starts + Then GEMINI.md is loaded with plugin philosophy, team organization, and skill registry + And it does NOT reference Claude Code-specific features + + Scenario: Orchestration degrades to inline execution + Given the plugin is installed in Gemini CLI + When a multi-agent workflow is triggered + Then GEMINI.md instructs inline execution as the fallback + And warns that multi-agent orchestration requires Claude Code + + Scenario: Commands are available as Gemini CLI TOML commands + Given the plugin is installed as a Gemini CLI extension + When Gemini CLI scans the commands-gemini/ directory + Then TOML command 
files are loaded for key workflows + + Scenario: Agents are loaded with unknown frontmatter ignored + Given the plugin is installed as a Gemini CLI extension + When Gemini CLI scans the agents/ directory + Then agent markdown files are discovered + And tools: and model: frontmatter fields are ignored without error +``` + +### Architecture Specification + +**Components affected**: +- New file: `plugins/agentic-dev-team/gemini-extension.json` +- New file: `plugins/agentic-dev-team/GEMINI.md` +- New directory: `plugins/agentic-dev-team/commands-gemini/` — TOML commands for key workflows +- New file: `plugins/agentic-dev-team/hooks/hooks-gemini.json` + +**Reused as-is**: All `skills/*/SKILL.md`, all `knowledge/*.md`, all `prompts/*.md` + +**Constraints**: +- Do NOT fork or duplicate skill files — manifest points to same `skills/` directory +- GEMINI.md is standalone (not symlink to CLAUDE.md) +- Keep TOML command set to top 5 most-used commands initially +- Clearly document capability limitations in GEMINI.md + +**Dependencies**: None — independent. Can be implemented in parallel with all other slices. + +### Acceptance Criteria + +- [ ] `gemini-extension.json` exists with name, version, description, contextFileName +- [ ] `GEMINI.md` exists without Claude Code-specific feature references +- [ ] Skills discoverable from existing `skills/` directory +- [ ] At least 5 TOML commands in `commands-gemini/` +- [ ] `hooks/hooks-gemini.json` defines compatible hooks +- [ ] GEMINI.md documents capability limitations vs Claude Code +- [ ] Existing Claude Code functionality unchanged + +--- + +## Slice H3: OpenAI Codex Platform Support + +### Intent Description + +Add OpenAI Codex CLI as a supported platform. Codex uses `AGENTS.md` (hierarchical markdown discovery), `SKILL.md` with same frontmatter (skills in `.agents/skills/`), `config.toml`, `hooks.json`, and supports subagent dispatch (explicit request required). 
The approach: create `AGENTS.md`, `.codex/` config directory, and an installation guide that explains skill discovery setup. + +### User-Facing Behavior + +```gherkin +Feature: OpenAI Codex CLI platform support + + Scenario: Plugin provides AGENTS.md for Codex + Given the plugin repository contains an AGENTS.md at the root + When Codex CLI starts in a project using this plugin + Then AGENTS.md is loaded as project-level instructions + + Scenario: Skills are discoverable by Codex + Given the plugin is installed + And .agents/skills/ points to the plugin's skills + When Codex scans for skills + Then it discovers all SKILL.md files + + Scenario: Codex configuration is provided + Given the plugin includes .codex/config.toml + When Codex reads project configuration + Then hooks are enabled and defaults are set + + Scenario: Hooks are available in Codex format + Given the plugin includes .codex/hooks.json + When Codex loads lifecycle hooks + Then compatible hooks execute on appropriate events + + Scenario: Subagent workflows require explicit request + Given the plugin is installed in Codex + When a multi-agent workflow is triggered + Then AGENTS.md instructs that subagent dispatch requires explicit user request + And provides Codex-specific guidance on subagent invocation + + Scenario: Knowledge files are accessible + Given the plugin is installed in Codex + When a skill references a knowledge file + Then the file is readable as standard markdown + + Scenario: AGENTS.md documents capability limitations + Given AGENTS.md is loaded by Codex + Then it states which features require Claude Code for full capability + + Scenario: Installation guide exists + Given a user wants to install for Codex + When they read CODEX-INSTALL.md + Then it explains skill symlinks, AGENTS.md placement, and config.toml setup + + Scenario: AGENTS.md fits within Codex size limit + Given AGENTS.md is loaded by Codex + Then its size is within the 32 KiB default project_doc_max_bytes limit +``` + +### 
Architecture Specification + +**Components affected**: +- New file: `plugins/agentic-dev-team/AGENTS.md` +- New directory: `plugins/agentic-dev-team/.codex/` +- New file: `plugins/agentic-dev-team/.codex/config.toml` +- New file: `plugins/agentic-dev-team/.codex/hooks.json` +- New file: `plugins/agentic-dev-team/CODEX-INSTALL.md` + +**Reused as-is**: All `skills/*/SKILL.md`, all `knowledge/*.md`, all `prompts/*.md` + +**Constraints**: +- Do NOT fork or duplicate skill files +- AGENTS.md is standalone, adapted from CLAUDE.md +- Keep .codex/config.toml minimal +- AGENTS.md within 32 KiB limit +- Clearly document capability limitations + +**Dependencies**: None — independent. Can be implemented in parallel. + +### Acceptance Criteria + +- [ ] `AGENTS.md` exists without Claude Code-specific feature references +- [ ] `AGENTS.md` within 32 KiB +- [ ] `.codex/config.toml` exists with hooks enabled +- [ ] `.codex/hooks.json` defines compatible hooks +- [ ] `CODEX-INSTALL.md` covers skill discovery, AGENTS.md, and config setup +- [ ] Skills discoverable by Codex from standard scan path +- [ ] AGENTS.md documents capability limitations vs Claude Code +- [ ] Existing Claude Code functionality unchanged + +--- + +## Consistency Gate + +### Slices A-G (core implementation) +- [x] Intent is unambiguous — two developers would interpret each slice the same way +- [x] Every behavior in each intent has at least one corresponding BDD scenario +- [x] Architecture specification constrains implementation to what the intent requires, without over-engineering +- [x] Terminology consistent across all artifacts ("anti-rationalization" is canonical) +- [x] No contradictions between artifacts within any slice +- [x] No contradictions between slices +- [x] Dependency chain is acyclic (A→F, E→G, E→F implementer ref) +- [x] Merge targets identified (agent-registry.md, CLAUDE.md) +- [x] Both status output formats documented (markdown block + JSON field) +- [x] Negative/edge/error cases covered 
(ambiguous feedback, malformed fixtures, unrecognized status, install failure, unknown project type) + +### Slices H1-H3 (platform support) +- [x] Each slice is independent — no cross-dependencies between H1, H2, H3 +- [x] All three reuse existing skills/knowledge without forking +- [x] Degradation strategy consistent: document limitations, suggest Claude Code for full capability +- [x] H1 Windows hooks: negative case (no bash), edge case (WSL fallback), no-regression on Unix +- [x] H2 Gemini: unknown frontmatter handled, hook format mismatch handled +- [x] H3 Codex: size limit addressed, install guide covers discovery setup +- [x] No contradictions with slices A-G diff --git a/plans/superpowers-gap-closure.md b/plans/superpowers-gap-closure.md new file mode 100644 index 0000000..48867f6 --- /dev/null +++ b/plans/superpowers-gap-closure.md @@ -0,0 +1,267 @@ +# Plan: Superpowers Gap Closure + +**Created**: 2026-04-16 +**Branch**: superpowers +**Status**: implemented + +## Goal + +Implement all 10 slices from the superpowers gap closure spec: 7 core improvements (A-G) that close gaps identified in the competitive analysis against obra/superpowers, plus 3 platform support slices (H1-H3) for Windows hooks, Gemini CLI, and OpenAI Codex. All changes are documentation/configuration — markdown skill files, knowledge files, prompt templates, manifest files, and shell scripts. No application code is being written; "tests" are acceptance criteria checks (file exists, token budget met, cross-references valid). + +## Acceptance Criteria + +- [ ] All files listed in the spec's acceptance criteria exist and meet their requirements +- [ ] No regression: all 9 existing hooks produce the same exit codes and stdout on macOS/Linux as before changes (verified by running each hook). All existing slash commands in CLAUDE.md remain present with valid frontmatter (verified by grep for `---` blocks in `commands/*.md`). 
+- [ ] Token budgets respected (estimated via word count × 1.35): anti-rationalization < 600, receiving-code-review < 500, agent-skill-authoring < 1200, testing-anti-patterns < 400, each debugging reference < 400. Files within 10% of limit require a second check. +- [ ] Terminology consistent: `grep -r "rationalization prevention\|rationalization bulletproofing"` returns zero matches across all new/modified files (knowledge/anti-rationalization.md, quality-gate-pipeline/SKILL.md, test-driven-development/SKILL.md, prompts/implementer.md, CLAUDE.md) +- [ ] Cross-references resolve: all relative markdown links in new/modified files resolve from the file's own directory (verified at Step 18; broken links fixed before PR) +- [ ] `knowledge/agent-registry.md` updated with all new files +- [ ] `CLAUDE.md` quick reference updated with new skill count (31→32: +receiving-code-review) and skills-by-phase table +- [ ] `gemini-extension.json` contains name (string), version (string), description (string), contextFileName (string) — verified by `python3 -m json.tool gemini-extension.json` +- [ ] `.codex/config.toml` parses without error — verified by `python3 -c "import tomllib; tomllib.load(open('.codex/config.toml','rb'))"` +- [ ] GEMINI.md and AGENTS.md skill registries match CLAUDE.md skill count (maintenance gate for context file drift) +- [ ] Windows hooks shim: (1) `run-hook.cmd` passes stdin bytes to bash script (echo fixture test), (2) propagates exit codes 0, 1, 2 from bash script, (3) when no bash found, exits code 1 with message containing "bash" and "Git for Windows" + +## User-Facing Behavior + +See `docs/specs/superpowers-gap-closure.md` for the full Gherkin scenarios (53 scenarios across 10 slices). The scenarios are the behavioral contracts. + +## Steps + +Steps are ordered by dependency: independent slices first, then dependent chains, then shared-file updates last. 
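One way to sanity-check an ordering against the slice dependencies is a topological sort. The sketch below uses only the dependencies the spec's consistency gate lists (A→F, E→G, E→F); treating every other slice as independent is an assumption, and the helper names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Slice -> slices it depends on, per the spec's consistency gate.
DEPENDS_ON: Dict[str, Set[str]] = {
    "F": {"A", "E"},
    "G": {"E"},
}


def dependency_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return one valid order in which prerequisites come first."""
    return list(TopologicalSorter(depends_on).static_order())


def respects_dependencies(order: List[str], depends_on: Dict[str, Set[str]]) -> bool:
    """Check that every slice appears after all of its prerequisites."""
    position = {name: index for index, name in enumerate(order)}
    return all(
        position[dep] < position[name]
        for name, deps in depends_on.items()
        for dep in deps
    )
```

`TopologicalSorter` raises `CycleError` if the graph contains a cycle, so the same call also confirms the chain is acyclic.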
+ +--- + +### Step 1: Create anti-rationalization knowledge file (Slice A — part 1) + +**Complexity**: standard +**RED**: Verify `knowledge/anti-rationalization.md` does not exist +**GREEN**: Create the knowledge file with 5+ categories, catch-all rule, cross-references to TDD and debugging tables +**REFACTOR**: Verify under 600 tokens, terminology is canonical +**Files**: `plugins/agentic-dev-team/knowledge/anti-rationalization.md` +**Commit**: `feat: add anti-rationalization knowledge file with cross-cutting patterns` + +### Step 2: Update Quality Gate Pipeline with anti-rationalization references (Slice A — part 2) + +**Complexity**: standard +**RED**: Verify `quality-gate-pipeline/SKILL.md` Phase 1 and Phase 2 do not reference anti-rationalization knowledge +**GREEN**: Add reference to existing Phase 2 "Red Flag Language" block and Phase 1 "Hallucination Detection Signals" +**REFACTOR**: Verify no new sections created — references added to existing blocks only +**Files**: `plugins/agentic-dev-team/skills/quality-gate-pipeline/SKILL.md` +**Commit**: `feat: add anti-rationalization references to quality gate pipeline` + +### Step 3: Create receiving-code-review skill (Slice B) + +**Complexity**: standard +**RED**: Verify `skills/receiving-code-review/SKILL.md` does not exist +**GREEN**: Create skill with frontmatter, banned-phrases list, verification-before-implementation gate, YAGNI gate, rationalization table, human vs agent authority distinction, ambiguous finding guidance, concise tone guidance +**REFACTOR**: Verify under 500 tokens, no conflict with Quality Gate Pipeline Phase 3 +**Files**: `plugins/agentic-dev-team/skills/receiving-code-review/SKILL.md` +**Commit**: `feat: add receiving-code-review skill for review reception discipline` + +### Step 4: Enhance skill authoring with pressure testing and CSO (Slice C — part 1) + +**Complexity**: standard +**RED**: Verify current "Apply TDD to skill-writing itself" section lacks pressure testing procedure; 
verify "Optimize skill descriptions for triggering" lacks a checklist +**GREEN**: Expand both sections in place: pressure testing procedure (4 steps + 3 example scenarios), CSO checklist with pass/fail criteria and 4 description examples (2 good, 2 bad) +**REFACTOR**: Verify total file stays under 1,200 tokens; no parallel sections created +**Files**: `plugins/agentic-dev-team/skills/agent-skill-authoring/SKILL.md` +**Commit**: `feat: add pressure testing and CSO checklist to skill authoring guide` + +### Step 5: Update agent-eval for pressure scenario fixtures (Slice C — part 2) + +**Complexity**: standard +**RED**: Verify `commands/agent-eval.md` does not mention pressure scenarios +**GREEN**: Add section documenting pressure scenario fixture format and `evals/pressure/` directory. Define fixture schema: skill name, adversarial condition, expected behavior, pass/fail criteria. Add malformed fixture error handling. +**REFACTOR**: Verify eval command description is updated consistently +**Files**: `plugins/agentic-dev-team/commands/agent-eval.md` +**Commit**: `feat: add pressure scenario fixture support to agent-eval` + +### Step 6: Create debugging supporting files (Slice D) + +**Complexity**: standard +**RED**: Verify `skills/systematic-debugging/` contains only SKILL.md +**GREEN**: Create three supporting files: `root-cause-tracing.md` (backward call-chain analysis), `condition-based-waiting.md` (polling pattern replacing arbitrary waits), `find-polluter.md` (language-agnostic bisection algorithm). Add "Supporting References" section to Phase 2 of main SKILL.md with when-to-load guidance. 
+**REFACTOR**: Verify each file under 400 tokens; existing 4-phase process unchanged; find-polluter is language-agnostic +**Files**: `plugins/agentic-dev-team/skills/systematic-debugging/root-cause-tracing.md`, `plugins/agentic-dev-team/skills/systematic-debugging/condition-based-waiting.md`, `plugins/agentic-dev-team/skills/systematic-debugging/find-polluter.md`, `plugins/agentic-dev-team/skills/systematic-debugging/SKILL.md` +**Commit**: `feat: add debugging supporting references (root-cause tracing, condition-based waiting, find-polluter)` + +### Step 7: Create implementer prompt template (Slice E — part 1) + +**Complexity**: complex +**RED**: Verify `prompts/implementer.md` does not exist +**GREEN**: Create full implementer behavioral content: pre-implementation Q&A, TDD enforcement (reference TDD skill), self-review, verification evidence, and markdown status block (DONE/DONE_WITH_CONCERNS/NEEDS_CONTEXT/BLOCKED) +**REFACTOR**: Verify consistent with build.md step 4 expectations and orchestrator references +**Files**: `plugins/agentic-dev-team/prompts/implementer.md` +**Commit**: `feat: create implementer prompt template with status protocol` + +### Step 8: Create spec-reviewer and quality-reviewer prompt templates (Slice E — part 2) + +**Complexity**: complex +**RED**: Verify `prompts/spec-reviewer.md` and `prompts/quality-reviewer.md` do not exist +**GREEN**: Create both templates with full behavioral content and markdown status blocks. Spec-reviewer: skeptical of implementer claims, reads actual code, binary spec compliance check. Quality-reviewer: uses code-reviewer agent patterns, checks quality after spec compliance passes. 
+**REFACTOR**: Verify both use the same status block format as implementer +**Files**: `plugins/agentic-dev-team/prompts/spec-reviewer.md`, `plugins/agentic-dev-team/prompts/quality-reviewer.md` +**Commit**: `feat: create spec-reviewer and quality-reviewer prompt templates` + +### Step 9: Update plan review templates with status protocol (Slice E — part 3) + +**Complexity**: standard +**RED**: Verify existing plan review templates lack `"status"` field in JSON output +**GREEN**: Add `"status"` field to all 4 plan review templates. Mapping: approve with 0 warnings → DONE, approve with 1+ warnings → DONE_WITH_CONCERNS, needs-revision → DONE_WITH_CONCERNS. Add status derivation rules alongside existing verdict rules. +**REFACTOR**: Verify existing verdict field and rules are unchanged — status is additive +**Files**: `plugins/agentic-dev-team/prompts/plan-review-acceptance.md`, `plugins/agentic-dev-team/prompts/plan-review-design.md`, `plugins/agentic-dev-team/prompts/plan-review-ux.md`, `plugins/agentic-dev-team/prompts/plan-review-strategic.md` +**Commit**: `feat: add status protocol to plan review templates` + +### Step 10: Update orchestrator with subagent status protocol (Slice E — part 4) + +**Complexity**: complex +**RED**: Verify orchestrator lacks "Subagent Status Protocol" section +**GREEN**: Add section defining 4 status codes, orchestrator response table (including unrecognized → BLOCKED), both output formats (markdown block + JSON field), NEEDS_CONTEXT cap at 2 re-dispatches. For DONE_WITH_CONCERNS handling, define three deterministic response branches: (1) concern is non-blocking warning → accept work, log concern; (2) concern is fixable with guidance → re-dispatch with concern text as context; (3) concern requires human judgment → escalate to user. Each branch must have a clear trigger condition. 
+**REFACTOR**: Verify no conflict with existing Phase 3 inline review section +**Files**: `plugins/agentic-dev-team/agents/orchestrator.md` +**Commit**: `feat: add subagent status protocol to orchestrator` + +### Step 11: Update build command for status handling (Slice E — part 5) + +**Complexity**: standard +**RED**: Verify build.md step 4 does not handle NEEDS_CONTEXT or BLOCKED +**GREEN**: Update step 4 to check subagent status after dispatch. DONE/DONE_WITH_CONCERNS → continue flow. NEEDS_CONTEXT → gather context, re-dispatch (max 2). BLOCKED → escalate to user. +**REFACTOR**: Verify consistent with orchestrator's status protocol +**Files**: `plugins/agentic-dev-team/commands/build.md` +**Commit**: `feat: add status code handling to build command` + +### Step 12: Create testing anti-patterns reference and update TDD skill (Slice F) + +**Complexity**: standard +**Ordering**: Step 7 (implementer.md creation) MUST be complete and committed before this step begins. Step 13 MUST NOT have run yet. +**RED**: Verify `skills/test-driven-development/testing-anti-patterns.md` does not exist; verify TDD catch-all line lacks cross-reference; verify `prompts/implementer.md` exists (from Step 7) but lacks testing-anti-patterns reference +**GREEN**: Create testing-anti-patterns.md with 5+ anti-patterns (mock behavior, test-only methods, mocking without understanding, incomplete mocks, integration afterthought). Add "Supporting References" section to TDD SKILL.md. Augment catch-all line with link to `knowledge/anti-rationalization.md`. Add testing-anti-patterns reference to `prompts/implementer.md`. 
+**REFACTOR**: Verify under 400 tokens; no existing TDD content modified or removed +**Files**: `plugins/agentic-dev-team/skills/test-driven-development/testing-anti-patterns.md`, `plugins/agentic-dev-team/skills/test-driven-development/SKILL.md`, `plugins/agentic-dev-team/prompts/implementer.md` +**Commit**: `feat: add testing anti-patterns reference and TDD cross-references` + +### Step 13: Create worktree setup knowledge file and update implementer (Slice G) + +**Complexity**: standard +**Ordering**: Step 7 (implementer.md creation) AND Step 12 (testing-anti-patterns implementer ref) MUST be complete before this step. Apply worktree setup on top of existing implementer content. +**RED**: Verify `knowledge/worktree-setup.md` does not exist; verify implementer lacks worktree setup section (but HAS testing-anti-patterns reference from Step 12) +**GREEN**: Create worktree-setup.md with detection table: Node.js (npm/yarn/pnpm/bun by lockfile — first match wins for conflicting lockfiles), Python, Go, Rust, .NET, Java (Maven/Gradle). Add "Worktree Setup" section to implementer.md that runs before RED phase. Update orchestrator Phase 3 to reference worktree setup. +**REFACTOR**: Verify detection uses file presence only; BLOCKED status for baseline/install failures; lockfile priority order is documented +**Files**: `plugins/agentic-dev-team/knowledge/worktree-setup.md`, `plugins/agentic-dev-team/prompts/implementer.md`, `plugins/agentic-dev-team/agents/orchestrator.md` +**Commit**: `feat: add worktree language-specific setup with dependency install and baseline verification` + +### Step 14: Windows hooks — run-hook.cmd shim and TMPDIR fixes (Slice H1) + +**Complexity**: standard +**RED**: Verify `hooks/run-hook.cmd` does not exist; verify tdd-guard.sh and version-check.sh use hardcoded `/tmp/` +**GREEN**: Create `run-hook.cmd` (~20 lines) that locates bash via (1) PATH, (2) Git for Windows default `C:\Program Files\Git\bin\bash.exe`, (3) WSL. 
Passes stdin, args, and exit codes. Create `install.ps1` checking bash, jq, git with install instructions. Fix TMPDIR in tdd-guard.sh and version-check.sh: `${TMPDIR:-${TEMP:-/tmp}}`. Update `settings.json` to document Windows hook invocation pattern. +**REFACTOR**: Verify all 9 hooks unchanged on macOS/Linux; shim error message is clear +**Files**: `plugins/agentic-dev-team/hooks/run-hook.cmd`, `plugins/agentic-dev-team/install.ps1`, `plugins/agentic-dev-team/hooks/tdd-guard.sh`, `plugins/agentic-dev-team/hooks/version-check.sh`, `plugins/agentic-dev-team/settings.json` +**Commit**: `feat: add Windows hooks support with bash shim and TMPDIR fixes` + +### Step 15: Gemini CLI extension manifest and context file (Slice H2) + +**Complexity**: standard +**RED**: Verify `gemini-extension.json` and `GEMINI.md` do not exist +**GREEN**: Create `gemini-extension.json` with name, version, description, contextFileName pointing to GEMINI.md. Create `GEMINI.md` adapted from CLAUDE.md — include plugin philosophy, team organization, skill registry, but strip all Claude Code-specific features (Agent tool, allowed-tools, hooks, isolation: "worktree", model routing). Add capability limitations section. Create `hooks/hooks-gemini.json` for compatible hooks. Create 5 TOML commands in `commands-gemini/`: code-review, plan, build, help, browse. 
+**REFACTOR**: Verify GEMINI.md has no Claude Code references; skills directory is reused not forked +**Files**: `plugins/agentic-dev-team/gemini-extension.json`, `plugins/agentic-dev-team/GEMINI.md`, `plugins/agentic-dev-team/hooks/hooks-gemini.json`, `plugins/agentic-dev-team/commands-gemini/code-review.toml`, `plugins/agentic-dev-team/commands-gemini/plan.toml`, `plugins/agentic-dev-team/commands-gemini/build.toml`, `plugins/agentic-dev-team/commands-gemini/help.toml`, `plugins/agentic-dev-team/commands-gemini/browse.toml` +**Commit**: `feat: add Gemini CLI extension support with manifest, context file, and TOML commands` + +### Step 16: OpenAI Codex configuration and install guide (Slice H3) + +**Complexity**: standard +**RED**: Verify `AGENTS.md`, `.codex/config.toml`, `.codex/hooks.json`, and `CODEX-INSTALL.md` do not exist +**GREEN**: Create `AGENTS.md` adapted from CLAUDE.md — include plugin philosophy, team org, skill registry, strip Claude Code features, add capability limitations, note subagent dispatch requires explicit request. Verify under 32 KiB. Create `.codex/config.toml` with hooks enabled. Create `.codex/hooks.json` for compatible hooks. Create `CODEX-INSTALL.md` with step-by-step: skill symlinks to `.agents/skills/`, AGENTS.md placement, config.toml setup. 
+**REFACTOR**: Verify AGENTS.md under 32 KiB; skills not forked; limitations clearly documented +**Files**: `plugins/agentic-dev-team/AGENTS.md`, `plugins/agentic-dev-team/.codex/config.toml`, `plugins/agentic-dev-team/.codex/hooks.json`, `plugins/agentic-dev-team/CODEX-INSTALL.md` +**Commit**: `feat: add OpenAI Codex CLI support with AGENTS.md, config, and install guide` + +### Step 17: Update shared registries and CLAUDE.md (all slices) + +**Complexity**: standard +**RED**: Verify `knowledge/agent-registry.md` is missing new entries; verify CLAUDE.md skill count is stale +**GREEN**: Update `knowledge/agent-registry.md` with: anti-rationalization.md (knowledge), receiving-code-review (skill), testing-anti-patterns.md (supporting file), 3 debugging supporting files, worktree-setup.md (knowledge), 3 new prompt templates. Update `CLAUDE.md`: skill count (31→32), skills-by-phase table (Review: add receiving-code-review), Multi-Agent Collaboration Protocol (reference status protocol), subagent prompt template count (4→7). 
+**REFACTOR**: Verify all registry entries have correct file paths and token estimates +**Files**: `plugins/agentic-dev-team/knowledge/agent-registry.md`, `plugins/agentic-dev-team/CLAUDE.md` +**Commit**: `docs: update agent registry and CLAUDE.md with all new components` + +### Step 18: Cross-reference validation pass + +**Complexity**: trivial +**RED**: Check all markdown links between files resolve to existing files +**GREEN**: Fix any broken links found during validation +**REFACTOR**: None needed +**Files**: All modified files (read-only validation, edits only if broken links found) +**Commit**: `fix: resolve broken cross-references` (only if fixes needed) + +## Complexity Classification + +| Rating | Criteria | Review depth | +|--------|----------|--------------| +| `trivial` | Single-file rename, config change, typo fix, documentation-only | Skip inline review; covered by final `/code-review` | +| `standard` | New function, test, module, or behavioral change within existing patterns | Spec-compliance + relevant quality agents | +| `complex` | Architectural change, security-sensitive, cross-cutting concern, new abstraction | Full agent suite including opus-tier agents | + +## Important: Shared File Update Policy + +Steps 3, 6, and 12 each have spec-level acceptance criteria that mention updating `CLAUDE.md` or `knowledge/agent-registry.md`. **All registry and CLAUDE.md updates are deferred to Step 17.** Do NOT modify these files in earlier steps — Step 17 handles all shared-file updates in one atomic commit to avoid merge conflicts. 
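The cross-reference validation in Step 18 (every relative markdown link must resolve from the linking file's own directory) could be approximated with a short script. The sketch below is an assumption about how such a check might look, not the plan's actual tooling: `broken_links` is a hypothetical helper, and it only handles inline `[text](target)` links, not reference-style links.

```python
import re
from pathlib import Path

# Matches inline markdown links of the form [text](target).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches targets that start with a URL scheme such as https: or mailto:.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_links(markdown_file: Path) -> list[str]:
    """Return relative link targets in markdown_file that do not exist on disk."""
    text = markdown_file.read_text(encoding="utf-8")
    missing = []
    for target in LINK_RE.findall(text):
        # Skip external URLs and same-page anchors; only file paths are checked.
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue
        path_part = target.split("#", 1)[0]  # drop any heading fragment
        # Resolve relative to the directory that contains the linking file.
        if not (markdown_file.parent / path_part).exists():
            missing.append(target)
    return missing
```

Running it over each new or modified file and fixing whatever it returns matches the RED/GREEN split of Step 18: an empty list everywhere means no fix commit is needed.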
+ +## Pre-PR Quality Gate + +- [ ] All acceptance criteria from spec met (per-slice) +- [ ] Token budgets verified for all constrained files +- [ ] Cross-references validated (all markdown links resolve) +- [ ] No existing hook behavior changed on macOS/Linux +- [ ] GEMINI.md and AGENTS.md skill registries match CLAUDE.md skill count +- [ ] `gemini-extension.json` passes `python3 -m json.tool`; `.codex/config.toml` passes TOML parser +- [ ] `/code-review` passes +- [ ] Documentation updated (agent-registry.md, CLAUDE.md) + +## Risks & Open Questions + +| Risk | Mitigation | +|------|-----------| +| Token budget overrun on constrained files | Check token count after each file creation using word count as proxy (~1.35 tokens/word) | +| Implementer.md becomes too large with worktree setup + testing references + status protocol | Keep each section focused; worktree setup references knowledge file rather than inlining the table | +| GEMINI.md and AGENTS.md may reference capabilities that don't translate | Explicit "Capability Limitations" section in each platform context file | +| Gemini CLI TOML command format may not match expectations | Use superpowers' TOML commands as reference; test with `gemini extensions link .` if available | +| Codex hooks.json format is not well-documented | Create minimal hooks.json; document that hooks are experimental on Codex | +| `run-hook.cmd` may have edge cases with Windows path handling | Keep shim minimal (~20 lines); rely on Git for Windows bash which handles path translation | +| Multiple slices modify orchestrator.md and implementer.md | Implement E (creates files) → F (adds references) → G (adds worktree) in strict order | + +## Parallelization Strategy + +Steps that can run concurrently (no file conflicts): + +**Batch 1** (independent): Steps 1-2 (A), Step 3 (B), Steps 4-5 (C), Step 6 (D), Step 14 (H1) +**Batch 2** (after Batch 1): Steps 7-11 (E — sequential internally) +**Batch 3** (after Steps 1-2 and 7, and Step 7 must 
commit implementer.md changes first): Step 12 (F — depends on A + E's implementer.md) +**Batch 4** (after Step 12 commits implementer.md changes): Step 13 (G — depends on E + F's implementer.md) +**Batch 5** (independent): Step 15 (H2), Step 16 (H3) +**Final**: Step 17 (registries), Step 18 (validation) + +## Plan Review Summary + +Four plan review personas evaluated this plan. All blocker issues have been addressed in this revision. + +### Acceptance Test Critic — needs-revision → resolved +**Blockers addressed**: +1. "No regression" criterion rewritten with binary-verifiable checks (hook exit codes, frontmatter parse) +2. "Structurally valid" criterion replaced with parser commands (`python3 -m json.tool`, TOML parser) +3. DONE_WITH_CONCERNS scenario: Step 10 now specifies three deterministic response branches with trigger conditions + +**Warnings noted**: Token counting method specified (word × 1.35). Terminology check made explicit with grep command. Missing scenarios for registry updates, token budgets, malformed status blocks, conflicting lockfiles, settings.json, TOML validity, and CSO examples noted — these are validation checks within implementation steps, not separate BDD scenarios. + +### Design & Architecture Critic — approve +**Key observations**: Dependency graph is acyclic and ordering is correct. Token budgets per file are disciplined documentation design. Platform context file drift (GEMINI.md, AGENTS.md vs CLAUDE.md) is the main structural concern — addressed with Pre-PR Quality Gate check. +**Actions taken**: Explicit ordering guards added to Steps 12 and 13. Commands-gemini/ convention documented in GEMINI.md (Step 15). + +### Strategic Critic — approve +**Key suggestion**: Consider splitting H1-H3 into a follow-on PR since Gemini TOML and Codex hooks.json formats are unverified. Core slices A-G have high confidence. +**Decision**: Proceed as single plan but H1-H3 are implemented last (Steps 14-16) and can be dropped if format issues arise. 
The plan's risk register already flags both format risks. +**Actions taken**: Added CLAUDE.md deferral notes to prevent shared-file conflicts. Serialization of Steps 7-13 made explicit. + +### UX Critic — approve (self-skipped) +No user-facing changes in this plan. diff --git a/plugins/agentic-dev-team/.codex/config.toml b/plugins/agentic-dev-team/.codex/config.toml new file mode 100644 index 0000000..63ec3a5 --- /dev/null +++ b/plugins/agentic-dev-team/.codex/config.toml @@ -0,0 +1,7 @@ +# Agentic Dev Team - Codex Configuration + +[features] +codex_hooks = true + +# Skills are loaded from .agents/skills/ — see CODEX-INSTALL.md for setup +# Hooks are experimental on Codex — see .codex/hooks.json diff --git a/plugins/agentic-dev-team/.codex/hooks.json b/plugins/agentic-dev-team/.codex/hooks.json new file mode 100644 index 0000000..eb65d2a --- /dev/null +++ b/plugins/agentic-dev-team/.codex/hooks.json @@ -0,0 +1,4 @@ +{ + "_note": "Codex CLI hooks are experimental. Most agentic-dev-team hooks require Claude Code's stdin JSON format. Compatible hooks will be added as the Codex hooks API stabilizes.", + "hooks": [] +} diff --git a/plugins/agentic-dev-team/AGENTS.md b/plugins/agentic-dev-team/AGENTS.md new file mode 100644 index 0000000..2fc2e57 --- /dev/null +++ b/plugins/agentic-dev-team/AGENTS.md @@ -0,0 +1,114 @@ +# Agentic Scrum Team - Orchestration Pipeline + +## System Overview + +This project implements a fully automated development team using persona-driven AI agents. An orchestrator agent acts as the central dispatcher, routing tasks to specialized agents based on task classification, complexity, and required expertise. The agents, skills, and knowledge files in this repository are designed to work with any agent harness that supports markdown-based agent definitions. + +## Core Principles + +1. **Selective Agent Loading**: Only load necessary agents into context, avoiding token bloat. Target under 10,000 tokens for simple tasks. +2. 
**40% Context Window Rule**: Maintain context below 40% capacity to prevent hallucination. Trigger summarization at threshold. +3. **Persona-Driven Behavior**: Each agent has detailed psychological and behavioral specifications that guide its output style, decision-making, and collaboration patterns. +4. **Human-in-the-Loop**: Agents work autonomously rather than as copilots, but they still require human oversight. +5. **Acceptance Test Driven Development**: All development follows ATDD. Behaviors are defined as scenarios in feature files (Gherkin) before implementation begins. Feature file scenarios are the single source of truth for expected behavior. + +## Team Organization + +### Quick Reference + +**Team agents** (11): Orchestrator, Software Engineer, Data Scientist, QA Engineer, UI/UX Designer, Architect, Product Manager, Technical Writer, Security Engineer, Ops Engineer, ADR Author (~3,900 tokens total) + +**Review agents** (19): spec-compliance-review, a11y-review, arch-review, claude-setup-review, complexity-review, concurrency-review, doc-review, domain-review, js-fp-review, naming-review, performance-review, security-review, structure-review, svelte-review, test-review, token-efficiency-review, refactoring-review, progress-guardian, data-flow-tracer + +**Skills** (33): Context Loading Protocol, Context Summarization, Feedback & Learning, Human Oversight Protocol, Performance Metrics, Quality Gate Pipeline, Governance & Compliance, Agent & Skill Authoring, Hexagonal Architecture, Domain-Driven Design, Domain Analysis, Specs, Threat Modeling, API Design, Legacy Code, Mutation Testing, Test-Driven Development, Systematic Debugging, Design Doc, Branch Workflow, CI Debugging, Test Design Reviewer, Browser Testing, Competitive Analysis, Design Interrogation, Design It Twice, Static Analysis Integration, Feature File Validation, Docker Image Create, Docker Image Audit, Performance Benchmark, Receiving Code Review, JS Project Init + +Skills are located in `.agents/skills/` per Codex 
convention. See CODEX-INSTALL.md for setup instructions. + +## Three-Phase Workflow + +For non-trivial tasks, the orchestrator follows a **Research, Plan, Implement** workflow with human review gates between each phase. + +### 1. Research + +Understand the system: find relevant files, trace data flows, identify the problem surface area. Agents explore the codebase and return concise findings. For non-trivial features, produce a design document with problem statement, approach, alternatives, and scope boundaries. Optionally run Design Interrogation to stress-test the design or Design It Twice to generate parallel alternative interfaces. + +**Human Review Gate** -- Human reviews research findings and design doc before planning begins. + +### 2. Plan + +Specify every change: files, snippets, test strategy, verification steps. Before the human sees the plan, four plan review personas evaluate it in parallel: Acceptance Test Critic, Design & Architecture Critic, UX Critic, and Strategic Critic. Any blocker findings are addressed before the human gate. The plan is the primary review artifact. + +**Human Review Gate** -- Human reviews the plan. This replaces traditional line-by-line code review as the primary quality gate. + +### 3. Implement + +Execute the plan. All code follows RED-GREEN-REFACTOR with vertical slices (TDD skill). After each unit, inline reviews check spec compliance and code quality. Actionable issues are fixed and re-reviewed in a loop. All agents must provide verification evidence (fresh test output) before claiming completion. + +**Human Review Gate** -- Human reviews the final output. Lightweight if the plan was correct. + +After implementation: create PR, choose merge strategy, clean up branch. Then update configs if needed, log metrics, refine routing. 
+ +## Skills by Phase + +| Phase | Skills Used | Purpose | +|-------|-----------|---------| +| **Research** | Design Doc, Domain Analysis, Domain-Driven Design, Threat Modeling, Design Interrogation, Design It Twice, Competitive Analysis | Understand the system, explore alternatives, stress-test designs | +| **Plan** | Specs, API Design, Hexagonal Architecture, Legacy Code | Define what to build, specify interfaces and test strategy | +| **Implement** | Test-Driven Development, Systematic Debugging, Mutation Testing, Browser Testing, Performance Benchmark, CI Debugging | Build with TDD, debug issues, validate quality, measure performance | +| **Review** | Quality Gate Pipeline, Test Design Reviewer | Validate output before delivery | +| **Cross-phase** | Context Loading Protocol, Context Summarization, Feedback & Learning, Human Oversight Protocol, Performance Metrics, Governance & Compliance, Branch Workflow, Agent & Skill Authoring | Orchestration, context management, learning | + +## Sub-Agents as Context Isolation + +The primary value of sub-agents is context isolation, not persona specialization. When a parent agent dispatches a sub-agent to explore, search, or analyze, the sub-agent absorbs the context burden of reading files and tracing code flows. Only a concise, structured finding returns to the parent, keeping the parent's context clean and focused. + +Design sub-agent calls for minimal context return: +- Send the sub-agent a specific question ("Where is user authentication handled? Return file paths and line numbers.") +- The sub-agent reads 20 files; the parent receives 10 lines of structured findings +- The parent can get right to work without the context burden of exploration + +## Output Guardrails + +1. **Write to files, not chat.** Artifacts (plans, design docs, reports, code) go to files. Chat is for decisions, status updates, and questions. +2. **Plan-only mode.** When asked for a plan, produce ONLY the plan. Do not start implementing. +3. 
**Incremental output.** Produce a first draft within 3-4 tool calls, then refine iteratively. + +## Quality and Accuracy + +All agents apply the Quality Gate Pipeline before delivering output: self-validation (Phase 1), verification evidence (Phase 2), and review-correction loops (Phase 3). + +## Context Management + +Context management is the orchestrator's responsibility, governed by two skills: + +1. **Context Loading Protocol** -- decides what to load and when, using task classification, phased loading, and measured token budgets +2. **Context Summarization** -- decides when to compress and how, using utilization triggers and structured summaries + +### Operating Rules +1. **Load on demand**: Only load agent/skill files when their phase begins +2. **40% utilization ceiling**: Trigger summarization when context approaches 40% utilization +3. **Phase transitions**: Summarize completed phases before loading next-phase agents +4. **Summaries replace history**: New conversations read from summaries, not from prior conversation replay + +## Feedback and Learning + +Users can modify system behavior at any time using trigger keywords (`amend`, `learn`, `remember`, `forget`). Changes are logged with full audit trail and rollback support. + +## Human Oversight + +Agents operate autonomously within defined boundaries. Human involvement is required for high-impact decisions (production deployments, architecture changes, scope modifications). + +Intervention commands: `amend`, `learn`, `remember`, `forget`, `override`, `pause`, `stop`. + +## Capability Limitations on Codex + +The agentic-dev-team plugin was designed for full multi-agent orchestration. When running on OpenAI Codex CLI, the following limitations apply: + +- **Multi-agent orchestration**: Codex does not automatically dispatch sub-agents. To use a team agent or review agent, you must explicitly request it in your prompt (e.g., "Act as the Software Engineer agent and implement this feature"). 
The orchestrator cannot spawn agents on your behalf. +- **Hook-based guards**: The plugin's PreToolUse and PostToolUse hooks (destructive command blocking, path guards, scope locks) have limited support on Codex. The `.codex/hooks.json` file is provided but currently empty while the Codex hooks API stabilizes. +- **Tool scoping**: The `allowed-tools:` frontmatter used by some agents to restrict which tools they can invoke is not available on Codex. All tools are available to all agents. +- **Model routing**: The plugin's model routing table (haiku/sonnet/opus assignments per agent) does not apply. Use Codex's own model and profile system to select models for different tasks. +- **Slash commands**: The `/command` invocation syntax is specific to Claude Code. On Codex, read the corresponding command file (in `commands/`) and follow its instructions manually, or adapt the workflow to Codex's prompt style. +- **Worktree isolation**: The `isolation: "worktree"` directive for parallel sub-agent execution is not supported. Run tasks sequentially or manage parallelism outside Codex. + +For full orchestration capability -- automatic sub-agent dispatch, hook-based guards, model routing, and slash commands -- use the Claude Code plugin. 
diff --git a/plugins/agentic-dev-team/CLAUDE.md b/plugins/agentic-dev-team/CLAUDE.md index 44225ca..589a294 100644 --- a/plugins/agentic-dev-team/CLAUDE.md +++ b/plugins/agentic-dev-team/CLAUDE.md @@ -39,15 +39,15 @@ Full registry tables with token counts, model tiers, and used-by mappings are in ### Quick Reference -**Team agents** (11): Orchestrator, Software Engineer, Data Scientist, QA Engineer, UI/UX Designer, Architect, Product Manager, Technical Writer, Security Engineer, DevOps/SRE Engineer, ADR Author (~3,900 tokens total) +**Team agents** (11): Orchestrator, Software Engineer, Data Scientist, QA Engineer, UI/UX Designer, Architect, Product Manager, Technical Writer, Security Engineer, Ops Engineer, ADR Author (~3,900 tokens total) **Review agents** (19): spec-compliance-review, a11y-review, arch-review, claude-setup-review, complexity-review, concurrency-review, doc-review, domain-review, js-fp-review, naming-review, performance-review, security-review, structure-review, svelte-review, test-review, token-efficiency-review, refactoring-review, progress-guardian, data-flow-tracer -**Skills** (31): Context Loading Protocol, Context Summarization, Feedback & Learning, Human Oversight Protocol, Performance Metrics, Quality Gate Pipeline, Governance & Compliance, Agent & Skill Authoring, Hexagonal Architecture, Domain-Driven Design, Domain Analysis, Specs, Threat Modeling, API Design, Legacy Code, Mutation Testing, Test-Driven Development, Systematic Debugging, Design Doc, Branch Workflow, CI Debugging, Test Design Reviewer, Browser Testing, Competitive Analysis, Design Interrogation, Design It Twice, Static Analysis Integration, Feature File Validation, Docker Image Create, Docker Image Audit, Performance Benchmark +**Skills** (33): Context Loading Protocol, Context Summarization, Feedback & Learning, Human Oversight Protocol, Performance Metrics, Quality Gate Pipeline, Governance & Compliance, Agent & Skill Authoring, Hexagonal Architecture, Domain-Driven 
Design, Domain Analysis, Specs, Threat Modeling, API Design, Legacy Code, Mutation Testing, Test-Driven Development, Systematic Debugging, Design Doc, Branch Workflow, CI Debugging, Test Design Reviewer, Browser Testing, Competitive Analysis, Design Interrogation, Design It Twice, Static Analysis Integration, Feature File Validation, Docker Image Create, Docker Image Audit, Performance Benchmark, Receiving Code Review, JS Project Init -**Subagent prompt templates** (8): `prompts/implementer.md`, `prompts/spec-reviewer.md`, `prompts/quality-reviewer.md`, `prompts/plan-reviewer.md`, `prompts/plan-review-acceptance.md`, `prompts/plan-review-design.md`, `prompts/plan-review-ux.md`, `prompts/plan-review-strategic.md` +**Subagent prompt templates** (7): `prompts/implementer.md`, `prompts/spec-reviewer.md`, `prompts/quality-reviewer.md`, `prompts/plan-review-acceptance.md`, `prompts/plan-review-design.md`, `prompts/plan-review-ux.md`, `prompts/plan-review-strategic.md` -**Knowledge files** (6): agent-registry, review-template, review-rubric, owasp-detection, domain-modeling, architecture-assessment +**Knowledge files** (9): agent-registry, review-template, review-rubric, owasp-detection, domain-modeling, architecture-assessment, anti-rationalization, worktree-setup, contract-testing **Agent templates** (9): ts-enforcer, esm-enforcer, react-testing, front-end-testing, twelve-factor-audit, python-quality, go-quality, csharp-quality, angular-testing (in `templates/agents/`, scaffolded by `/setup`) @@ -115,7 +115,7 @@ For trivial tasks (typo fix, simple query), the Orchestrator routes directly to | **Plan → Team** | `/issues-from-plan` | Break plan into GitHub issues for team distribution | | **Implement** | Test-Driven Development, Systematic Debugging, Mutation Testing, Browser Testing, Performance Benchmark, CI Debugging | Build with TDD, debug issues, validate quality, measure performance | | **Bug Triage** | `/triage` (Systematic Debugging + GitHub issue creation) | 
Investigate bugs and file actionable issues | -| **Review** | Quality Gate Pipeline, Test Design Reviewer | Validate output before delivery | +| **Review** | Quality Gate Pipeline, Test Design Reviewer, Receiving Code Review | Validate output before delivery, verify review findings before implementing | | **Cross-phase** | Context Loading Protocol, Context Summarization, Feedback & Learning, Human Oversight Protocol, Performance Metrics, Governance & Compliance, Branch Workflow, Agent & Skill Authoring | Orchestration, context management, learning | ### Phase Transitions @@ -134,6 +134,10 @@ The primary value of sub-agents is **context isolation**, not persona specializa Persona specialization (Software Engineer, Architect, etc.) provides behavioral guardrails and domain expertise, but context isolation is what makes multi-agent workflows scale. +### Subagent Status Protocol + +Every subagent returns a structured status: `DONE`, `DONE_WITH_CONCERNS`, `NEEDS_CONTEXT`, or `BLOCKED`. The orchestrator's response table maps each status to a concrete action. See `agents/orchestrator.md` § Subagent Status Protocol for the full protocol, output formats, and re-dispatch rules. + ### Multi-Agent Coordination When a task requires multiple agents: diff --git a/plugins/agentic-dev-team/CODEX-INSTALL.md b/plugins/agentic-dev-team/CODEX-INSTALL.md new file mode 100644 index 0000000..da73971 --- /dev/null +++ b/plugins/agentic-dev-team/CODEX-INSTALL.md @@ -0,0 +1,56 @@ +# Installing Agentic Dev Team for Codex CLI + +## Prerequisites + +- [Codex CLI](https://developers.openai.com/codex) installed +- Git + +## Setup + +### 1. Clone or download the plugin + +```bash +git clone https://github.com/bdfinst/agentic-dev-team.git +cd agentic-dev-team/plugins/agentic-dev-team +``` + +### 2. Set up skills discovery + +Codex scans `.agents/skills/` for SKILL.md files. 
Create a symlink from your project to the plugin's skills:
+
+```bash
+# From your project root:
+mkdir -p .agents
+ln -s /path/to/agentic-dev-team/plugins/agentic-dev-team/skills .agents/skills
+```
+
+Or copy the skills directory if symlinks aren't practical.
+
+### 3. Copy configuration files
+
+```bash
+# From your project root:
+cp /path/to/agentic-dev-team/plugins/agentic-dev-team/AGENTS.md ./AGENTS.md
+cp -r /path/to/agentic-dev-team/plugins/agentic-dev-team/.codex ./.codex
+```
+
+### 4. Verify
+
+Start Codex CLI in your project. The skills should be discoverable:
+- Use `$` mention syntax to invoke skills explicitly
+- Skills with matching descriptions activate implicitly
+
+## Capability Limitations
+
+See the "Capability Limitations on Codex" section in AGENTS.md for details on what features require Claude Code for full functionality.
+
+## Updating
+
+Pull the latest plugin version and re-copy AGENTS.md and the contents of .codex/ (the trailing `/.` avoids nesting a second `.codex` inside the existing directory):
+
+```bash
+cd /path/to/agentic-dev-team
+git pull
+cp plugins/agentic-dev-team/AGENTS.md /path/to/your/project/AGENTS.md
+cp -r plugins/agentic-dev-team/.codex/. /path/to/your/project/.codex/
+```
diff --git a/plugins/agentic-dev-team/GEMINI.md b/plugins/agentic-dev-team/GEMINI.md
new file mode 100644
index 0000000..eb00d77
--- /dev/null
+++ b/plugins/agentic-dev-team/GEMINI.md
@@ -0,0 +1,92 @@
+# Agentic Scrum Team - Gemini CLI Context
+
+## System Overview
+
+This plugin implements a fully automated development team using persona-driven AI agents. An intelligent coordination pipeline routes tasks to specialized agents based on task classification, complexity, and required expertise. On Gemini CLI, workflows run inline rather than through multi-agent orchestration; the agent harness dispatches work sequentially within the current context.
+
+## Architecture
+
+This plugin uses a layered loading strategy to minimize token usage:
+
+- **GEMINI.md**: Core philosophy + quick reference (always loaded)
+- **Skills**: Detailed patterns and procedures (loaded on-demand when a phase or task requires them)
+- **Knowledge**: Reference data — registries, rubrics, detection patterns (loaded on-demand by agents)
+- **Agents**: Behavioral specifications (loaded per-phase, never all at once)
+
+## Core Principles
+
+1. **Selective Loading**: Only load necessary agents and skills into context, avoiding token bloat. Target < 10,000 tokens for simple tasks.
+2. **40% Context Window Rule**: Keep context below 40% of capacity to reduce the risk of hallucination. Trigger summarization when that threshold is reached.
+3. **Persona-Driven Behavior**: Each agent has detailed psychological and behavioral specifications.
+4. **Human-in-the-Loop**: Agents act autonomously rather than as copilots, but they still require human oversight.
+5. **Acceptance Test Driven Development**: All development follows ATDD. Behaviors are defined as scenarios in feature files (Gherkin) before implementation begins.
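As a rough illustration of the first two principles above, the sketch below checks a token count against the 40% ceiling and the 10,000-token target for simple tasks. The function names, the sample numbers, and the idea of evaluating these thresholds in code are assumptions made for illustration; the document states the thresholds but this diff does not show how utilization is measured.

```python
# Illustrative sketch only: the names and the measurement approach are
# assumptions, not part of the plugin. The two thresholds are taken from the
# Core Principles list.
CONTEXT_CEILING = 0.40        # "40% Context Window Rule"
SIMPLE_TASK_BUDGET = 10_000   # "Target < 10,000 tokens for simple tasks"


def context_utilization(used_tokens: int, window_tokens: int) -> float:
    """Return the fraction of the context window currently in use."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return used_tokens / window_tokens


def should_summarize(used_tokens: int, window_tokens: int) -> bool:
    """Return True once utilization reaches the ceiling."""
    return context_utilization(used_tokens, window_tokens) >= CONTEXT_CEILING


def within_simple_task_budget(loaded_tokens: int) -> bool:
    """Return True if the loaded agents and skills stay under the target."""
    return loaded_tokens < SIMPLE_TASK_BUDGET


if __name__ == "__main__":
    # Hypothetical numbers: a 200,000-token window with 90,000 tokens used,
    # and a team-agent load of roughly 3,900 tokens.
    print(should_summarize(90_000, 200_000))
    print(within_simple_task_budget(3_900))
```

Keeping the two thresholds as named constants makes it easy to see that they are policy values from the document and not properties of any particular model.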
+
+## Team Organization
+
+### Quick Reference
+
+**Team agents** (11): Orchestrator, Software Engineer, Data Scientist, QA Engineer, UI/UX Designer, Architect, Product Manager, Technical Writer, Security Engineer, Ops Engineer, ADR Author
+
+**Review agents** (19): spec-compliance-review, a11y-review, arch-review, claude-setup-review, complexity-review, concurrency-review, doc-review, domain-review, js-fp-review, naming-review, performance-review, security-review, structure-review, svelte-review, test-review, token-efficiency-review, refactoring-review, progress-guardian, data-flow-tracer
+
+**Skills** (33): Context Loading Protocol, Context Summarization, Feedback & Learning, Human Oversight Protocol, Performance Metrics, Quality Gate Pipeline, Governance & Compliance, Agent & Skill Authoring, Hexagonal Architecture, Domain-Driven Design, Domain Analysis, Specs, Threat Modeling, API Design, Legacy Code, Mutation Testing, Test-Driven Development, Systematic Debugging, Design Doc, Branch Workflow, CI Debugging, Test Design Reviewer, Browser Testing, Competitive Analysis, Design Interrogation, Design It Twice, Static Analysis Integration, Feature File Validation, Docker Image Create, Docker Image Audit, Performance Benchmark, Receiving Code Review, JS Project Init
+
+**Knowledge files** (6): agent-registry, review-template, review-rubric, owasp-detection, domain-modeling, architecture-assessment
+
+### Skills by Phase
+
+| Phase | Skills Used | Purpose |
+|-------|-----------|---------|
+| **Research** | Design Doc, Domain Analysis, Domain-Driven Design, Threat Modeling, Design Interrogation, Design It Twice, Competitive Analysis | Understand the system, explore alternatives, stress-test designs |
+| **Plan** | Specs, API Design, Hexagonal Architecture, Legacy Code | Define what to build, specify interfaces and test strategy |
+| **Implement** | Test-Driven Development, Systematic Debugging, Mutation Testing, Browser Testing, Performance Benchmark, CI Debugging |
Build with TDD, debug issues, validate quality, measure performance |
+| **Review** | Quality Gate Pipeline, Test Design Reviewer | Validate output before delivery |
+| **Cross-phase** | Context Loading Protocol, Context Summarization, Feedback & Learning, Human Oversight Protocol, Performance Metrics, Governance & Compliance, Branch Workflow, Agent & Skill Authoring | Orchestration, context management, learning |
+
+## Gemini CLI Commands
+
+The `commands-gemini/` directory contains Gemini-format (TOML) equivalents of key commands. The `skills/` directory is shared with the Claude Code plugin and works identically on Gemini CLI — load skill files on demand when a phase or task requires them.
+
+Available commands:
+
+| Command | Description |
+|---------|-------------|
+| `agentic-dev-team:code-review` | Run a code review on changed files with categorized findings |
+| `agentic-dev-team:plan` | Create a structured implementation plan with TDD steps |
+| `agentic-dev-team:build` | Execute the most recently approved plan using RED-GREEN-REFACTOR |
+| `agentic-dev-team:help` | List all available commands and skills |
+| `agentic-dev-team:browse` | Browser-based QA with screenshot and accessibility review |
+
+## Request Processing Flow
+
+For trivial tasks (typo fix, simple query), route directly to the relevant skill. For non-trivial tasks, follow the **Research - Plan - Implement** workflow:
+
+1. **Research** — Understand the system: find relevant files, trace data flows, identify the problem surface area. Produce a design document at `docs/specs/` for non-trivial features.
+2. **Human Review Gate** — Human reviews research findings and design doc.
+3. **Plan** — Specify every change: files, snippets, test strategy, verification steps. The plan is the primary review artifact.
+4. **Human Review Gate** — Human reviews the plan.
+5. **Implement** — Execute the plan. All code follows RED-GREEN-REFACTOR with vertical slices (TDD skill).
Run code review before committing.
+6. **Human Review Gate** — Human reviews the final output.
+7. **Learning loop** — Update configs if needed, log metrics, refine routing.
+
+## Context Management
+
+- **Load on demand**: Only load agent/skill files when their phase begins.
+- **40% utilization ceiling**: Trigger summarization when context approaches 40% utilization.
+- **Phase transitions**: Summarize completed phases before loading next-phase agents.
+- **Summaries replace history**: New conversations read from `memory/`, not from prior conversation replay.
+
+## Quality and Accuracy
+
+All agents apply the Quality Gate Pipeline before delivering output: self-validation (Phase 1), verification evidence (Phase 2), and review-correction loops (Phase 3).
+
+## Capability Limitations on Gemini CLI
+
+The following features from the full plugin require Claude Code and are unavailable or only partially supported on Gemini CLI:
+
+- **Multi-agent orchestration** is not available. Claude Code's Agent tool with model override enables parallel sub-agent dispatch and context isolation. On Gemini CLI, all workflows run inline within a single context.
+- **Hook-based guards** have limited support. Claude Code hooks use a stdin JSON protocol for PreToolUse/PostToolUse interception. See `hooks/hooks-gemini.json` for Gemini-compatible hooks (currently an experimental placeholder with no hooks defined).
+- **Tool scoping** is not available. Claude Code commands can restrict which tools an agent may use via allowed-tools declarations. On Gemini CLI, commands have full tool access.
+- **Model routing** is not available. Claude Code's orchestrator routes agents to specific model tiers (haiku/sonnet/opus) based on task complexity. Gemini CLI uses its own model selection.
+
+For full orchestration capability including multi-agent workflows, hook-based safety guards, tool scoping, and model routing, use the Claude Code plugin.
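The `agents/orchestrator.md` changes in this diff add a Subagent Status Protocol: four statuses, a Markdown status block, a verdict-to-status mapping for plan review templates, and a response table. The sketch below is one possible reading of that table as code. The names `parse_status`, `status_from_verdict`, and `next_action`, and the action strings they return, are hypothetical; the plugin expresses the protocol as prompt instructions, not as Python.

```python
# Illustrative sketch only: function names and action strings are assumptions.
# Statuses, the verdict mapping, and the re-dispatch limit come from the
# Subagent Status Protocol text in agents/orchestrator.md.
import re
from enum import Enum
from typing import Optional


class Status(Enum):
    DONE = "DONE"
    DONE_WITH_CONCERNS = "DONE_WITH_CONCERNS"
    NEEDS_CONTEXT = "NEEDS_CONTEXT"
    BLOCKED = "BLOCKED"


MAX_REDISPATCHES = 2  # "max 2 re-dispatches, then escalate"

# Matches a line such as "**Result**: DONE" in the Markdown status block.
_RESULT_LINE = re.compile(r"^\*\*Result\*\*:\s*([A-Z_]+)\s*$", re.MULTILINE)


def parse_status(output: str) -> Optional[Status]:
    """Return the reported status, or None if it is missing or unrecognized."""
    match = _RESULT_LINE.search(output)
    if match is None:
        return None
    try:
        return Status(match.group(1))
    except ValueError:
        return None


def status_from_verdict(verdict: str, warnings: int) -> Optional[Status]:
    """Map a plan-review verdict to a status, following the JSON status rules."""
    if verdict == "approve":
        return Status.DONE if warnings == 0 else Status.DONE_WITH_CONCERNS
    if verdict == "needs-revision":
        return Status.DONE_WITH_CONCERNS
    return None


def next_action(status: Optional[Status], redispatches: int = 0) -> str:
    """Pick the orchestrator action for a status.

    `redispatches` is the number of re-dispatches already made for this step.
    """
    if status is Status.DONE:
        return "proceed"
    if status is Status.DONE_WITH_CONCERNS:
        return "evaluate-concerns"
    if status is Status.NEEDS_CONTEXT:
        return "redispatch" if redispatches < MAX_REDISPATCHES else "escalate"
    # BLOCKED, and anything missing or unrecognized, escalates to the user.
    return "escalate"
```

Returning `None` for a missing or unrecognized status keeps the "treat as BLOCKED" rule in one place: `next_action` escalates for anything it does not explicitly accept.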
diff --git a/plugins/agentic-dev-team/agents/devops-sre-engineer.md b/plugins/agentic-dev-team/agents/ops-engineer.md similarity index 98% rename from plugins/agentic-dev-team/agents/devops-sre-engineer.md rename to plugins/agentic-dev-team/agents/ops-engineer.md index 28fc8c7..dd7e1c9 100644 --- a/plugins/agentic-dev-team/agents/devops-sre-engineer.md +++ b/plugins/agentic-dev-team/agents/ops-engineer.md @@ -1,11 +1,11 @@ --- -name: devops-sre-engineer +name: ops-engineer description: Pipeline design, deployment strategy, observability, and reliability planning tools: Read, Grep, Glob, Bash model: sonnet --- -# DevOps/SRE Engineer Agent +# Ops Engineer Agent ## Technical Responsibilities - Pipeline design and maintenance for build, test, and deployment diff --git a/plugins/agentic-dev-team/agents/orchestrator.md b/plugins/agentic-dev-team/agents/orchestrator.md index 16538e9..d925360 100644 --- a/plugins/agentic-dev-team/agents/orchestrator.md +++ b/plugins/agentic-dev-team/agents/orchestrator.md @@ -53,6 +53,37 @@ All review commands are executed under orchestrator direction. When a user trigg | `/semgrep-analyze` | Static analysis | As pre-flight context for security-review | | `/harness-audit` | Harness effectiveness analysis | Periodically to review harness staleness | +## Subagent Status Protocol + +Every subagent must end its response with a structured status block. 
Two formats exist depending on template type: + +### Markdown Status Block (implementer, spec-reviewer, quality-reviewer) + +``` +## Status +**Result**: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED +**Concerns**: [list, if DONE_WITH_CONCERNS] +**Needs**: [specific info needed, if NEEDS_CONTEXT] +**Blocker**: [description, if BLOCKED] +``` + +### JSON Status Field (plan review templates) + +Added alongside the existing `"verdict"` field: +- `approve` with 0 warnings → `"status": "DONE"` +- `approve` with 1+ warnings → `"status": "DONE_WITH_CONCERNS"` +- `needs-revision` → `"status": "DONE_WITH_CONCERNS"` + +### Orchestrator Response Table + +| Status | Action | +|--------|--------| +| DONE | Accept work, proceed to next step | +| DONE_WITH_CONCERNS | Evaluate each concern: (1) non-blocking warning → accept, log concern; (2) fixable with guidance → re-dispatch with concern as context; (3) requires human judgment → escalate to user | +| NEEDS_CONTEXT | Gather requested info, re-dispatch with added context (max 2 re-dispatches, then escalate) | +| BLOCKED | Escalate to user immediately with blocker description | +| Unrecognized / Missing | Treat as BLOCKED, escalate with raw subagent output | + ## Skills - [Context Loading Protocol](../skills/context-loading-protocol/SKILL.md) - invoke at the start of every task to decide which agents and skills to load, and at phase transitions to unload/swap - [Context Summarization](../skills/context-summarization/SKILL.md) - invoke when context utilization signals are present (high turn count, degraded output quality) or at phase transitions @@ -110,7 +141,7 @@ Every non-trivial task follows three explicit phases. Each phase runs in minimal - **Goal**: Execute the plan. Write code, run tests, verify at each step. 
- **Agents**: Software Engineer (primary), QA Engineer (validation), others as needed - **Input**: Plan progress file from Phase 2 -- **Subagent dispatch**: Use the `prompts/implementer.md` template when dispatching implementation subagents. For parallel implementation of independent units, use `isolation: "worktree"` on the Agent tool to give each subagent its own git worktree — this prevents file conflicts when multiple units are implemented concurrently. +- **Subagent dispatch**: Use the `prompts/implementer.md` template when dispatching implementation subagents. For parallel implementation of independent units, use `isolation: "worktree"` on the Agent tool to give each subagent its own git worktree — this prevents file conflicts when multiple units are implemented concurrently. When dispatching to a worktree, the implementer runs dependency installation and baseline test verification before starting TDD (see `knowledge/worktree-setup.md`). - **TDD enforcement**: The Software Engineer must follow RED-GREEN-REFACTOR for every unit (see TDD skill). The orchestrator verifies that each unit's output includes failing test output → passing test output evidence. 
- **Output**: Working code that passes all tests, acceptance criteria, and code review - **Three-stage inline review**: After each discrete unit of work completes, run spec-compliance first, then quality, then browser verification for UI changes: diff --git a/plugins/agentic-dev-team/agents/security-engineer.md b/plugins/agentic-dev-team/agents/security-engineer.md index c7d9793..8febd73 100644 --- a/plugins/agentic-dev-team/agents/security-engineer.md +++ b/plugins/agentic-dev-team/agents/security-engineer.md @@ -26,7 +26,7 @@ model: opus - Architect: Security architecture review, trust boundary analysis, secure design patterns - QA/SQA Engineer: Security test coverage, penetration test coordination, vulnerability verification - Software Engineer: Secure implementation guidance, code-level security review -- DevOps/SRE Engineer: Infrastructure security, deployment pipeline hardening, secrets management +- Ops Engineer: Infrastructure security, deployment pipeline hardening, secrets management ### Communication Style - Risk-focused and evidence-based diff --git a/plugins/agentic-dev-team/commands-gemini/agentic-dev-team:browse.toml b/plugins/agentic-dev-team/commands-gemini/agentic-dev-team:browse.toml new file mode 100644 index 0000000..3ec89c3 --- /dev/null +++ b/plugins/agentic-dev-team/commands-gemini/agentic-dev-team:browse.toml @@ -0,0 +1 @@ +prompt = """Launch a browser to navigate to {{args}}. Take a screenshot, describe what you see, and report any visual issues or accessibility concerns.""" diff --git a/plugins/agentic-dev-team/commands-gemini/agentic-dev-team:build.toml b/plugins/agentic-dev-team/commands-gemini/agentic-dev-team:build.toml new file mode 100644 index 0000000..c839972 --- /dev/null +++ b/plugins/agentic-dev-team/commands-gemini/agentic-dev-team:build.toml @@ -0,0 +1 @@ +prompt = """Execute the most recently approved plan in plans/. For each step, follow RED-GREEN-REFACTOR: write a failing test, implement minimally to pass, refactor. 
Paste test output at each gate. Commit after each step.""" diff --git a/plugins/agentic-dev-team/commands-gemini/agentic-dev-team:code-review.toml b/plugins/agentic-dev-team/commands-gemini/agentic-dev-team:code-review.toml new file mode 100644 index 0000000..eeebbaa --- /dev/null +++ b/plugins/agentic-dev-team/commands-gemini/agentic-dev-team:code-review.toml @@ -0,0 +1 @@ +prompt = """Run a code review on the changed files in this project. Check for: naming quality, structural issues, complexity, test quality, security basics, and consistency with existing patterns. Categorize findings as Critical (must fix), Important (should fix), or Suggestion (optional). Include file paths and line numbers for every finding.""" diff --git a/plugins/agentic-dev-team/commands-gemini/agentic-dev-team:help.toml b/plugins/agentic-dev-team/commands-gemini/agentic-dev-team:help.toml new file mode 100644 index 0000000..9f20a20 --- /dev/null +++ b/plugins/agentic-dev-team/commands-gemini/agentic-dev-team:help.toml @@ -0,0 +1 @@ +prompt = """List all available agentic-dev-team commands and skills with brief descriptions. Include: code-review, plan, build, browse, and all skills in the skills/ directory.""" diff --git a/plugins/agentic-dev-team/commands-gemini/agentic-dev-team:plan.toml b/plugins/agentic-dev-team/commands-gemini/agentic-dev-team:plan.toml new file mode 100644 index 0000000..cb8ff18 --- /dev/null +++ b/plugins/agentic-dev-team/commands-gemini/agentic-dev-team:plan.toml @@ -0,0 +1 @@ +prompt = """Create a structured implementation plan for the following task: {{args}}. Include: goal, acceptance criteria, TDD steps (RED-GREEN-REFACTOR for each), file paths, and a pre-PR quality gate checklist. 
Write the plan to a file in plans/.""" diff --git a/plugins/agentic-dev-team/commands/agent-eval.md b/plugins/agentic-dev-team/commands/agent-eval.md index 964f768..e43fa57 100644 --- a/plugins/agentic-dev-team/commands/agent-eval.md +++ b/plugins/agentic-dev-team/commands/agent-eval.md @@ -71,6 +71,30 @@ If `--agent` is specified, filter to fixtures where that agent is in `applicableAgents`. If `--fixture` is specified, filter to that fixture only. +### 2b. Load pressure scenario fixtures + +Check if `.claude/evals/pressure/` exists. If present, load all YAML/JSON fixtures from that directory. Each pressure fixture tests whether a skill prevents a known agent failure mode under adversarial conditions. + +**Fixture schema** (all fields required): + +```yaml +skill: "test-driven-development" +scenario: "RED-phase rationalization" +adversarial_condition: "Agent receives a complex task and rationalizes skipping RED to save time" +expected_behavior: "TDD skill's Iron Law blocks proceeding without a failing test" +pass_criteria: "Agent writes a failing test before any implementation code" +fail_criteria: "Agent writes implementation code without a preceding failing test" +``` + +- `skill`: Name of the skill being pressure-tested +- `adversarial_condition`: What the agent is tempted to do (the failure mode trigger) +- `expected_behavior`: What the skill should enforce (the guardrail) +- `pass_criteria` / `fail_criteria`: Binary observable outcome + +**Filtering**: If `--agent` is specified, skip pressure fixtures (they test skills, not review agents). If `--fixture` names a pressure fixture, run only that one. + +**Malformed fixtures**: If a pressure fixture is missing required fields or fails to parse, report the error — include the filename and which field is missing or malformed — then skip that fixture and continue. + ### 3. 
Run agents against fixtures For each fixture/agent pair: diff --git a/plugins/agentic-dev-team/commands/benchmark.md b/plugins/agentic-dev-team/commands/benchmark.md index d07ceb9..f7aa86c 100644 --- a/plugins/agentic-dev-team/commands/benchmark.md +++ b/plugins/agentic-dev-team/commands/benchmark.md @@ -91,5 +91,5 @@ Display in chat: - Complements the `performance-review` agent (static code analysis) with runtime measurement - Uses Playwright infrastructure shared with `/browse` and the Browser Testing skill - QA Engineer invokes this for performance validation -- DevOps/SRE Engineer uses baselines for SLI/SLO definition +- Ops Engineer uses baselines for SLI/SLO definition - Can be run as part of `/build` for performance-critical plan steps diff --git a/plugins/agentic-dev-team/commands/build.md b/plugins/agentic-dev-team/commands/build.md index 1b9e4e2..98cf5ca 100644 --- a/plugins/agentic-dev-team/commands/build.md +++ b/plugins/agentic-dev-team/commands/build.md @@ -62,13 +62,19 @@ For each step in the plan, dispatch implementation following the implementer tem 1. **RED** — Write the failing test described in the step. Run the test suite. **Hard gate: the new test must fail.** Paste the failing output. If the test passes without new code, the behavior already exists — pick a different test. Do NOT proceed to GREEN without pasted failing output. 2. **GREEN** — Write the minimum implementation to make the failing test pass. Do not add behavior beyond what the test requires. Run the test suite. **Hard gate: all tests must pass.** Paste the passing output. Do NOT proceed without pasted passing output. 3. **REFACTOR** — Clean up structure, naming, duplication without changing behavior. Run tests again — they must still pass. If tests break, undo and try a smaller change. -4. **Inline review checkpoint** — Route review depth based on the step's **Complexity** classification: +4. 
**Status check** — After the implementer returns, check the status block: + - **DONE**: Proceed to inline review checkpoint. + - **DONE_WITH_CONCERNS**: Review concerns. If non-blocking, log and proceed. If actionable, address before review checkpoint. + - **NEEDS_CONTEXT**: Gather the requested information from the plan or codebase, re-dispatch the same step with added context. Max 2 re-dispatches — escalate to user after that. + - **BLOCKED**: Stop this step. Present the blocker to the user. Do not proceed until the user provides direction. + - **Missing/Unrecognized status**: Treat as BLOCKED. Escalate with raw output. +5. **Inline review checkpoint** — Route review depth based on the step's **Complexity** classification: - **trivial**: Skip inline review. The final `/code-review` (step 6) covers all modified files. - **standard**: Run `/review-agent spec-compliance-review` against changed files. If it passes, run quality review agents relevant to what changed. If review finds actionable issues (error/warning with high/medium confidence), auto-fix and re-run failed agents (up to 5 iterations per the review-fix loop in `agents/orchestrator.md`). Escalate to user if the loop doesn't converge. - **complex**: Run `/review-agent spec-compliance-review`, then the full quality agent suite including opus-tier agents (security-review, domain-review, arch-review). Same review-fix loop applies. - If no complexity is specified, default to **standard**. - **UI changes (any complexity)**: After quality review passes, run browser verification via `/browse` in automated smoke test mode. Skip with warning if the dev server is not running. See `agents/orchestrator.md` Stage 3. -5. **Mark step done** — Update the plan file: check off the step's acceptance criteria, set the step as completed. +6. **Mark step done** — Update the plan file: check off the step's acceptance criteria, set the step as completed. ### 5. 
Run full test suite diff --git a/plugins/agentic-dev-team/commands/devops-sre-engineer.md b/plugins/agentic-dev-team/commands/ops-engineer.md similarity index 50% rename from plugins/agentic-dev-team/commands/devops-sre-engineer.md rename to plugins/agentic-dev-team/commands/ops-engineer.md index c2d1ffc..ae19b87 100644 --- a/plugins/agentic-dev-team/commands/devops-sre-engineer.md +++ b/plugins/agentic-dev-team/commands/ops-engineer.md @@ -1,11 +1,11 @@ --- -name: devops-sre-engineer +name: ops-engineer description: >- Pipeline design, deployment strategy, observability, and reliability planning. Use when the user needs CI/CD, deployment, monitoring, or infrastructure work. argument-hint: "" user-invocable: true --- -Adopt the role defined in agents/devops-sre-engineer.md. Read the agent file, internalize the persona's responsibilities, collaboration protocols, behavioral guidelines, and psychological profile. Also load any skills referenced in the agent's Skills section. +Adopt the role defined in agents/ops-engineer.md. Read the agent file, internalize the persona's responsibilities, collaboration protocols, behavioral guidelines, and psychological profile. Also load any skills referenced in the agent's Skills section. Apply this persona to the user's request: $ARGUMENTS diff --git a/plugins/agentic-dev-team/gemini-extension.json b/plugins/agentic-dev-team/gemini-extension.json new file mode 100644 index 0000000..5d62ecf --- /dev/null +++ b/plugins/agentic-dev-team/gemini-extension.json @@ -0,0 +1,6 @@ +{ + "name": "agentic-dev-team", + "version": "3.3.0", + "description": "Automated development team with specialized agents for code review, planning, implementation, and quality gates. 
Skills include TDD, systematic debugging, domain-driven design, threat modeling, and more.",
+  "contextFileName": "GEMINI.md"
+}
diff --git a/plugins/agentic-dev-team/hooks/hooks-gemini.json b/plugins/agentic-dev-team/hooks/hooks-gemini.json
new file mode 100644
index 0000000..9233774
--- /dev/null
+++ b/plugins/agentic-dev-team/hooks/hooks-gemini.json
@@ -0,0 +1,4 @@
+{
+  "_note": "Gemini CLI hooks are experimental. Most agentic-dev-team hooks require Claude Code's stdin JSON format. Compatible hooks will be added as Gemini CLI's hook API stabilizes.",
+  "hooks": []
+}
diff --git a/plugins/agentic-dev-team/hooks/run-hook.cmd b/plugins/agentic-dev-team/hooks/run-hook.cmd
new file mode 100644
index 0000000..c19573f
--- /dev/null
+++ b/plugins/agentic-dev-team/hooks/run-hook.cmd
@@ -0,0 +1,30 @@
+@echo off
+REM run-hook.cmd — Windows shim for Claude Code hooks
+REM Locates bash and delegates to the .sh hook script, passing stdin and args.
+REM Exit codes from the bash script are propagated: a bare "exit /b" keeps bash's ERRORLEVEL, whereas %ERRORLEVEL% inside a parenthesized block is expanded before bash runs.
+
+setlocal
+
+REM Strategy 1: bash on PATH (Git for Windows adds it)
+where bash >nul 2>&1
+if %ERRORLEVEL% equ 0 (
+    bash %*
+    exit /b
+)
+
+REM Strategy 2: Git for Windows default location
+if exist "C:\Program Files\Git\bin\bash.exe" (
+    "C:\Program Files\Git\bin\bash.exe" %*
+    exit /b
+)
+
+REM Strategy 3: WSL fallback
+where wsl >nul 2>&1
+if %ERRORLEVEL% equ 0 (
+    wsl bash %*
+    exit /b
+)
+
+REM No bash found
+echo ERROR: bash not found. Install Git for Windows from https://gitforwindows.org 1>&2
+exit /b 1
diff --git a/plugins/agentic-dev-team/hooks/tdd-guard.sh b/plugins/agentic-dev-team/hooks/tdd-guard.sh
index e53dffe..405f8b0 100755
--- a/plugins/agentic-dev-team/hooks/tdd-guard.sh
+++ b/plugins/agentic-dev-team/hooks/tdd-guard.sh
@@ -47,7 +47,7 @@ is_test_file() {
 
 # --- State tracking ---
 # Use a project-scoped temp file. The working directory hash identifies the session scope.
-STATE_DIR="${TMPDIR:-/tmp}/tdd-guard"
+STATE_DIR="${TMPDIR:-${TEMP:-/tmp}}/tdd-guard"
 mkdir -p "$STATE_DIR"
 PROJECT_HASH=$(echo "$PWD" | md5sum 2>/dev/null | cut -c1-12 || echo "$PWD" | md5 -q 2>/dev/null | cut -c1-12 || echo "default")
 STATE_FILE="$STATE_DIR/session-$PROJECT_HASH"
diff --git a/plugins/agentic-dev-team/hooks/version-check.sh b/plugins/agentic-dev-team/hooks/version-check.sh
index cf5880c..2898054 100755
--- a/plugins/agentic-dev-team/hooks/version-check.sh
+++ b/plugins/agentic-dev-team/hooks/version-check.sh
@@ -6,7 +6,7 @@
 #
 # Input: JSON on stdin (ignored)
 # Output: Update notice on stdout, or nothing if up-to-date
-# Cache: /tmp/adt-version-check- prevents repeated checks
+# Cache: $TMPDIR/adt-version-check- prevents repeated checks
 
 set -uo pipefail
 
@@ -15,7 +15,7 @@ cat > /dev/null
 
 # ── Daily cache: skip if already checked today ───────────────────────────────
 TODAY=$(date +%Y-%m-%d)
-CACHE_FILE="/tmp/adt-version-check-${TODAY}"
+CACHE_FILE="${TMPDIR:-${TEMP:-/tmp}}/adt-version-check-${TODAY}"
 
 if [ -f "$CACHE_FILE" ]; then
   # Already checked today — replay cached message if any
diff --git a/plugins/agentic-dev-team/install.ps1 b/plugins/agentic-dev-team/install.ps1
new file mode 100644
index 0000000..ddb9d30
--- /dev/null
+++ b/plugins/agentic-dev-team/install.ps1
@@ -0,0 +1,63 @@
+# install.ps1 — Windows prerequisite checker for the agentic-dev-team plugin.
+#
+# Usage:
+#   powershell -ExecutionPolicy Bypass -File install.ps1
+
+$pass = 0
+$fail = 0
+$missing = @()
+
+function Check-Prerequisite {
+    param(
+        [string]$Name,
+        [string]$InstallHint
+    )
+
+    # Special handling for bash: check PATH then Git for Windows default
+    if ($Name -eq "bash") {
+        $found = $false
+        if (Get-Command "bash" -ErrorAction SilentlyContinue) {
+            $found = $true
+        } elseif (Test-Path "C:\Program Files\Git\bin\bash.exe") {
+            $found = $true
+        }
+        if ($found) {
+            Write-Host "[ok] $Name"
+            $script:pass++
+            return
+        }
+    } else {
+        if (Get-Command $Name -ErrorAction SilentlyContinue) {
+            Write-Host "[ok] $Name"
+            $script:pass++
+            return
+        }
+    }
+
+    Write-Host "[FAIL] $Name -- required. $InstallHint"
+    $script:fail++
+    $script:missing += $Name
+}
+
+Write-Host "Checking agentic-dev-team prerequisites..."
+Write-Host ""
+Write-Host "--- Required ---"
+
+Check-Prerequisite -Name "bash" `
+    -InstallHint "Install Git for Windows from https://gitforwindows.org"
+
+Check-Prerequisite -Name "jq" `
+    -InstallHint "Install jq via winget: winget install jqlang.jq"
+
+Check-Prerequisite -Name "git" `
+    -InstallHint "Install Git for Windows from https://gitforwindows.org"
+
+Write-Host ""
+
+if ($fail -gt 0) {
+    Write-Host "Result: missing required dependencies ($fail): $($missing -join ', '). Install and re-run."
+    exit 1
+} else {
+    Write-Host "Result: All required dependencies present."
+    exit 0
+}
diff --git a/plugins/agentic-dev-team/knowledge/agent-registry.md b/plugins/agentic-dev-team/knowledge/agent-registry.md
index c51cf3b..6531a82 100644
--- a/plugins/agentic-dev-team/knowledge/agent-registry.md
+++ b/plugins/agentic-dev-team/knowledge/agent-registry.md
@@ -15,7 +15,7 @@ This file contains the complete registry tables.
CLAUDE.md references this file | Product Manager | `agents/product-manager.md` | 300 | Requirements, prioritization | | Technical Writer | `agents/tech-writer.md` | 560 | Documentation, style consistency | | Security Engineer | `agents/security-engineer.md` | 320 | Security analysis, threat modeling | -| DevOps/SRE Engineer | `agents/devops-sre-engineer.md` | 320 | Pipeline, deployment, reliability | +| Ops Engineer | `agents/ops-engineer.md` | 320 | Pipeline, deployment, reliability | | ADR Author | `agents/adr.md` | 320 | Creates and manages Architecture Decision Records | | **All team agents** | | **~3,900** | | @@ -73,7 +73,7 @@ Skills are reusable knowledge modules in `.claude/skills/` that agents reference | Systematic Debugging | `skills/systematic-debugging/SKILL.md` | 600 | Software Engineer, QA Engineer | | Design Doc | `skills/design-doc/SKILL.md` | 500 | Architect, Product Manager, Orchestrator | | Branch Workflow | `skills/branch-workflow/SKILL.md` | 450 | Orchestrator, Software Engineer | -| CI Debugging | `skills/ci-debugging/SKILL.md` | 550 | DevOps/SRE Engineer, Software Engineer, QA Engineer | +| CI Debugging | `skills/ci-debugging/SKILL.md` | 550 | Ops Engineer, Software Engineer, QA Engineer | | Test Design Reviewer | `skills/test-design-reviewer/SKILL.md` | 600 | QA Engineer, test-review | | Browser Testing | `skills/browser-testing/SKILL.md` | 700 | QA Engineer | | Competitive Analysis | `skills/competitive-analysis/SKILL.md` | 600 | Orchestrator, Product Manager | @@ -81,9 +81,22 @@ Skills are reusable knowledge modules in `.claude/skills/` that agents reference | Design It Twice | `skills/design-it-twice/SKILL.md` | 550 | Architect, Software Engineer | | Static Analysis Integration | `skills/static-analysis-integration/SKILL.md` | 650 | Orchestrator, `/code-review` | | Feature File Validation | `skills/feature-file-validation/SKILL.md` | 700 | test-review, QA Engineer, spec-compliance-review | -| Docker Image Create | 
`skills/docker-image-create/SKILL.md` | 800 | DevOps/SRE Engineer, Software Engineer | -| Docker Image Audit | `skills/docker-image-audit/SKILL.md` | 750 | Orchestrator (inline review), DevOps/SRE Engineer, Security Engineer | -| Performance Benchmark | `skills/performance-benchmark/SKILL.md` | 800 | QA Engineer, DevOps/SRE Engineer, `/benchmark` command | +| Docker Image Create | `skills/docker-image-create/SKILL.md` | 800 | Ops Engineer, Software Engineer | +| Docker Image Audit | `skills/docker-image-audit/SKILL.md` | 750 | Orchestrator (inline review), Ops Engineer, Security Engineer | +| Performance Benchmark | `skills/performance-benchmark/SKILL.md` | 800 | QA Engineer, Ops Engineer, `/benchmark` command | +| Receiving Code Review | `skills/receiving-code-review/SKILL.md` | 440 | All agents (when receiving review feedback) | +| JS Project Init | `skills/js-project-init/README.md` | 500 | Software Engineer | + +### Supporting Reference Files + +These are loaded on demand by their parent skill — not standalone skills. 
+
+| Reference | File (under parent skill) | ~Tokens |
+|------|-------------|---------|
+| Testing Anti-Patterns | `skills/test-driven-development/testing-anti-patterns.md` | 320 |
+| Root-Cause Tracing | `skills/systematic-debugging/root-cause-tracing.md` | 400 |
+| Condition-Based Waiting | `skills/systematic-debugging/condition-based-waiting.md` | 390 |
+| Find Polluter | `skills/systematic-debugging/find-polluter.md` | 396 |

 ## Subagent Prompt Templates

@@ -94,7 +107,6 @@ Concrete prompt templates in `prompts/` that the orchestrator and `/code-review`
 | Implementer | `prompts/implementer.md` | Orchestrator (Phase 3 implementation dispatch) |
 | Spec Reviewer | `prompts/spec-reviewer.md` | Orchestrator (three-stage review gate 1) |
 | Quality Reviewer | `prompts/quality-reviewer.md` | Orchestrator (three-stage review gate 2) |
-| Plan Reviewer | `prompts/plan-reviewer.md` | Orchestrator (Phase 2 automated pre-check) |
 | Plan Review — Acceptance | `prompts/plan-review-acceptance.md` | Orchestrator (Phase 2 plan review persona) |
 | Plan Review — Design | `prompts/plan-review-design.md` | Orchestrator (Phase 2 plan review persona) |
 | Plan Review — UX | `prompts/plan-review-ux.md` | Orchestrator (Phase 2 plan review persona) |
@@ -112,6 +124,9 @@ Knowledge files in `knowledge/` provide progressive disclosure — agents read t
 | OWASP Detection | `knowledge/owasp-detection.md` | 600 | security-review |
 | Domain Modeling | `knowledge/domain-modeling.md` | 500 | domain-review |
 | Architecture Assessment | `knowledge/architecture-assessment.md` | 450 | arch-review |
+| Anti-Rationalization | `knowledge/anti-rationalization.md` | 466 | All agents (cross-cutting rationalization patterns) |
+| Worktree Setup | `knowledge/worktree-setup.md` | 400 | Implementer subagent (worktree dependency install) |
+| Contract Testing | `knowledge/contract-testing.md` | 450 | TDD skill, test-review, QA Engineer |

 ## Agent Templates

diff --git a/plugins/agentic-dev-team/knowledge/anti-rationalization.md
b/plugins/agentic-dev-team/knowledge/anti-rationalization.md new file mode 100644 index 0000000..e8c4cf1 --- /dev/null +++ b/plugins/agentic-dev-team/knowledge/anti-rationalization.md @@ -0,0 +1,22 @@ +# Anti-Rationalization Patterns + +LLMs generate plausible excuses for skipping process. If the excuse is not listed here, it is still an excuse. The absence of a pattern from this table does not grant permission to skip a gate. + +For domain-specific rationalization tables, see: +- TDD: [Rationalization Prevention](../skills/test-driven-development/SKILL.md#rationalization-prevention) +- Debugging: [Rationalization Prevention](../skills/systematic-debugging/SKILL.md#rationalization-prevention) + +## Cross-Cutting Patterns + +| Category | Excuse | Reality | Common In | +|----------|--------|---------|-----------| +| Skipping verification | "I already verified this earlier in the conversation" | Earlier evidence is stale. Re-run and show current output. | Quality Gate Pipeline, TDD | +| Skipping verification | "The change is too small to need verification" | Small changes cause regressions. Run the gate. | Quality Gate Pipeline | +| Skipping tests | "This is just a config/docs change, no tests needed" | Config changes can break builds. Verify the config loads. | TDD, Quality Gate Pipeline | +| Skipping tests | "I'll add tests after the implementation is working" | Tests written after implementation confirm assumptions, not behavior. | TDD | +| Scope expansion | "While I'm here, I should also fix..." | Scope creep introduces unplanned risk. Finish the current task first. | All skills | +| Scope expansion | "This refactor is necessary to make the fix work" | If it was not in the plan, flag it to the orchestrator before proceeding. | Hexagonal Architecture, Legacy Code | +| Premature completion | "Should work now" / "Should be fixed" | "Should" is not evidence. Run verification and paste output. 
| Quality Gate Pipeline | +| Premature completion | "The logic is correct so the tests will pass" | Correctness is proven by execution, not reasoning. Run the tests. | TDD, Quality Gate Pipeline | +| Process shortcuts | "This is a trivial change, we can skip the cycle" | Trivial changes still require Phase 2 verification at minimum. | Quality Gate Pipeline, TDD | +| Process shortcuts | "The deadline is tight, so let's skip review" | Skipping review costs more time in rework. Follow the pipeline. | Quality Gate Pipeline | diff --git a/plugins/agentic-dev-team/knowledge/contract-testing.md b/plugins/agentic-dev-team/knowledge/contract-testing.md new file mode 100644 index 0000000..fbee3b0 --- /dev/null +++ b/plugins/agentic-dev-team/knowledge/contract-testing.md @@ -0,0 +1,42 @@ +# Contract Testing Pattern + +Contract tests verify interface boundaries with external systems using test doubles. They validate that boundary layer code (HTTP clients, database query layers, message producers) correctly handles expected request/response shapes, field names, types, and status codes. + +Contract tests validate **interface structure, not business behavior**. They answer "does my code correctly interact with the interface I expect?" — not "is the logic correct?" + +Source: [Beyond Minimum CD — Contract Tests](https://beyond.minimumcd.org/docs/testing/contract/index.html.md) + +## Two-Layer Validation + +Test doubles and integration tests form a validation loop: + +1. **Contract tests (every commit)**: Use test doubles to run deterministically. Fast, no network, no database. Block the build on failure. +2. **Integration tests (post-deploy or periodic)**: Run against live dependencies to validate that test doubles still match real behavior. When a double drifts from reality, the integration test catches it. + +The doubles are first-class citizens, not a compromise. They are trustworthy *because* integration tests validate them. 
+ +## When to Use Doubles vs Real Dependencies + +| Dependency type | Main test flow | Validation | +|----------------|---------------|------------| +| External service (API, third-party) | Test double | Integration test against real service | +| Database | Test double (repository interface) | Integration test against real database | +| Internal module (same codebase) | Real code | N/A — no contract boundary | +| File system, clock, randomness | Injected/stubbed | N/A — determinism concern | + +**Key rule**: Use test doubles at *architectural boundaries* (ports in hexagonal architecture). Use real code for everything inside the boundary. + +## Consumer vs Provider + +**Consumer side**: Your code depends on an external API. Assert only on the subset you actually consume. Follow Postel's Law — be conservative in what you send, liberal in what you accept. Never assert on fields your code doesn't read. + +**Provider side**: Your API is consumed by others. Run consumer contract expectations against your real implementation to catch breaking changes before deployment. + +## Anti-Patterns + +- Asserting on business logic in contract tests (belongs in component/unit tests) +- Over-specifying by asserting on unused fields +- Testing specific data values instead of types and structures +- Hitting live systems in the deterministic pipeline (breaks build reliability) +- Skipping the integration tests that validate the doubles (doubles drift silently) +- Treating doubles as a compromise instead of the preferred approach diff --git a/plugins/agentic-dev-team/knowledge/worktree-setup.md b/plugins/agentic-dev-team/knowledge/worktree-setup.md new file mode 100644 index 0000000..c2fce02 --- /dev/null +++ b/plugins/agentic-dev-team/knowledge/worktree-setup.md @@ -0,0 +1,29 @@ +# Worktree Setup Reference + +Language/framework detection for dependency installation in git worktrees. Used by the implementer subagent before starting TDD. 
+
+## Detection Table
+
+Check for indicator files in the order listed. First match wins.
+
+| Indicator File | Language/Framework | Install Command | Test Command |
+|---|---|---|---|
+| package-lock.json | Node.js (npm) | `npm ci` | `npm test` |
+| yarn.lock | Node.js (yarn) | `yarn install` | `yarn test` |
+| pnpm-lock.yaml | Node.js (pnpm) | `pnpm install` | `pnpm test` |
+| bun.lockb | Node.js (bun) | `bun install` | `bun test` |
+| package.json (no lockfile) | Node.js (npm fallback) | `npm install` | `npm test` |
+| requirements.txt | Python | `pip install -r requirements.txt` | `pytest` |
+| pyproject.toml | Python | `pip install -e .` | `pytest` |
+| go.mod | Go | `go mod download` | `go test ./...` |
+| Cargo.toml | Rust | `cargo build` | `cargo test` |
+| pom.xml | Java (Maven) | `mvn install -DskipTests` | `mvn test` |
+| build.gradle / build.gradle.kts | Java (Gradle) | `gradle build -x test` | `gradle test` |
+| *.csproj / *.sln | .NET | `dotnet restore` | `dotnet test` |
+
+## Notes
+
+- **Detection uses file presence only** -- check in the order listed above. First match wins. This resolves conflicting lockfile scenarios (e.g., a project with both package-lock.json and yarn.lock uses npm because package-lock.json appears first).
+- **No recognized files found**: Skip setup with a warning. Proceed to implementation without dependency installation.
+- **Install fails**: Return BLOCKED with the full error output. Do not attempt to diagnose or fix dependency installation failures.
+- **Baseline tests fail**: Return BLOCKED with the test output. Do not attempt to fix pre-existing test failures -- they are outside the scope of the current task.
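The first-match-wins lookup in the detection table could be sketched roughly as follows, assuming plain file-presence checks in table order. `detect_install_cmd` is a hypothetical helper name, not something the plugin ships, and the Gradle and .NET rows are left out because they would need glob matching.

```shell
# Hypothetical helper (not part of the plugin): print the install command for
# the first indicator file found in the given directory.
detect_install_cmd() {
  _dir="${1:-.}"
  # Rows are checked top to bottom, so earlier indicators win on conflicts.
  while IFS='|' read -r _file _cmd; do
    if [ -e "$_dir/$_file" ]; then
      printf '%s\n' "$_cmd"
      return 0
    fi
  done <<'EOF'
package-lock.json|npm ci
yarn.lock|yarn install
pnpm-lock.yaml|pnpm install
bun.lockb|bun install
package.json|npm install
requirements.txt|pip install -r requirements.txt
pyproject.toml|pip install -e .
go.mod|go mod download
Cargo.toml|cargo build
pom.xml|mvn install -DskipTests
EOF
  # Nothing matched: the caller skips setup with a warning.
  return 1
}
```

Called as `detect_install_cmd "$PWD"`, a directory holding both `package-lock.json` and `yarn.lock` resolves to `npm ci`, and a non-zero exit corresponds to the "no recognized files" case in the notes.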
diff --git a/plugins/agentic-dev-team/prompts/implementer.md b/plugins/agentic-dev-team/prompts/implementer.md new file mode 100644 index 0000000..780845e --- /dev/null +++ b/plugins/agentic-dev-team/prompts/implementer.md @@ -0,0 +1,113 @@ +# Implementer Subagent + +You are implementing a specific task from an approved plan. You are a subagent dispatched by the orchestrator during Phase 3 (Implement). Your job is to complete the task using strict TDD discipline and return structured results to the orchestrator. You do not interact with the user directly. + +## What you receive + +- A task description from the plan (step number, RED/GREEN/REFACTOR expectations, file paths, acceptance criteria) +- Any additional context the orchestrator provides (prior step output, codebase state, re-dispatch context) + +## Pre-Implementation + +Before writing any code, read the task description completely. + +1. **Identify inputs**: What files, functions, or interfaces does this task depend on? Verify they exist. +2. **Identify outputs**: What files will you create or modify? What behavior will change? +3. **Identify ambiguity**: If anything in the task description is unclear, underspecified, or contradictory, return NEEDS_CONTEXT immediately. Do not guess. Do not assume. Specify exactly what you need to know. +4. **Identify blockers**: If the task depends on an external resource that is unavailable (service down, missing dependency, permission issue), return BLOCKED immediately with a description of the dependency. + +Do not begin implementation if you have unresolved questions. A wrong implementation costs more than a round-trip for clarification. + +## Worktree Setup + +If you are running in a git worktree (isolation mode), run setup before starting implementation: + +1. **Detect project type**: Check for indicator files in order per [worktree-setup reference](../knowledge/worktree-setup.md). First match wins. +2. 
**Install dependencies**: Run the install command for the detected project type. +3. **Run baseline tests**: Run the test command. All existing tests must pass. +4. **If install or tests fail**: Return BLOCKED with the error output. Do not attempt to fix pre-existing issues. +5. **If no project type detected**: Skip setup, proceed with a warning. + +## TDD Enforcement + +Follow the [Test-Driven Development](../skills/test-driven-development/SKILL.md) skill for the full protocol. The cycle below summarizes the hard gates. + +### RED -- Write a failing test + +1. Write the smallest test that describes the next behavior from the task spec. +2. Run the test suite. +3. **Hard gate**: The new test MUST fail. Paste the failing test output here before proceeding. +4. If the test passes without new code, the behavior already exists. Pick a different test or return DONE if all behaviors are covered. + +When writing tests during the RED phase, load [testing anti-patterns](../skills/test-driven-development/testing-anti-patterns.md) if you need guidance on test quality — especially when using mocks. + +Do not proceed to GREEN without pasted failing output. + +### GREEN -- Make it pass + +1. Write the minimum implementation to make the failing test pass. Do not add behavior beyond what the test requires. +2. Run the full test suite. +3. **Hard gate**: ALL tests must pass (not just the new one). Paste the passing output here before proceeding. +4. If existing tests break, fix the regression before moving on. Do not disable or skip tests. + +Do not proceed to REFACTOR without pasted passing output. + +### REFACTOR -- Clean up + +1. Improve structure, naming, and duplication without changing behavior. +2. Run the full test suite again. Tests must still pass. +3. If tests break during refactoring, undo the refactor and try a smaller change. + +### Multiple behaviors + +If the task requires multiple behaviors, repeat RED-GREEN-REFACTOR for each one. 
Each cycle should be small and focused -- one behavior per cycle. + +## Anti-Rationalization + +Watch for these internal excuses and reject them: + +- "This is too simple to need a test" -- Write the test anyway. Simple things break too. +- "I'll write the tests after" -- No. Delete the code and start from RED. +- "The existing tests cover this" -- Run them and prove it. If they don't fail without your new code, they don't cover it. +- "Mocking this is too hard" -- That's a design signal, not a testing excuse. Fix the design. +- "This is just a refactor" -- Refactors happen in the REFACTOR phase, after GREEN. If you're changing behavior, you need a failing test first. + +See [knowledge/anti-rationalization.md](../knowledge/anti-rationalization.md) for the full catalog of rationalization patterns. + +## Self-Review + +Before claiming the task is done, verify ALL of the following: + +1. **All tests pass**: Run the full test suite one final time. Paste the output. +2. **No regressions**: Compare the test count before and after. No tests were deleted, skipped, or weakened. +3. **Code matches task spec**: Re-read the task description. Does your implementation satisfy every stated requirement? Check each acceptance criterion individually. +4. **No scope creep**: You implemented what the task asked for -- nothing more, nothing less. If you noticed adjacent improvements, note them as concerns but do not implement them. +5. **Verification evidence**: Your response includes pasted test output from the final test run. This is not optional. + +## Output Format + +Structure your response with these sections: + +1. **Task**: Restate the task in one sentence (confirms you understood the assignment). +2. **Implementation**: Describe what you did, organized by RED-GREEN-REFACTOR cycles. +3. **Verification Evidence**: Final test suite output (pasted, not summarized). +4. **Status**: The status block below. + +## Status + +This block MUST be the last section of your response. 
The orchestrator parses it to determine next actions. + +``` +## Status +**Result**: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED +**Concerns**: [list specific concerns, if DONE_WITH_CONCERNS] +**Needs**: [exactly what information is needed, if NEEDS_CONTEXT] +**Blocker**: [description of external dependency, if BLOCKED] +``` + +### Status usage rules + +- **DONE**: Task complete, all tests pass, all acceptance criteria met, verification evidence provided. Use this when everything went as expected. +- **DONE_WITH_CONCERNS**: Task complete and all tests pass, but you have reservations. List each concern specifically -- vague concerns like "might have issues" are not actionable. Examples: "The API response shape assumes a field that isn't documented", "Test coverage is adequate but mutation testing would likely find gaps in the error path", "The task spec says X but the existing code assumes Y -- I implemented X as specified." +- **NEEDS_CONTEXT**: You lack information that is available in the parent context or the broader codebase. Specify exactly what you need: file paths, function signatures, configuration values, clarification on ambiguous requirements. Do not return NEEDS_CONTEXT for information you could find by reading files -- read them first. Only use this when the information is genuinely outside your reach. +- **BLOCKED**: An external dependency prevents you from completing the task and you cannot resolve it yourself. Examples: a required service is down, a dependency is not installed, a file the task references does not exist and cannot be created as part of this task, a permission issue. Describe the blocker concretely so the orchestrator can escalate it. 
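The prompt does not say how the orchestrator reads the status block. If it were done mechanically, a sketch like the one below could extract the `Result` value, assuming the block appears exactly in the format shown above. `parse_status_result` is a hypothetical name, not part of the plugin.

```shell
# Hypothetical helper (not part of the plugin): read a subagent response on
# stdin and print the Result value from its last "## Status" section.
parse_status_result() {
  awk '
    # Each Status heading resets the capture, so the final block wins.
    /^## Status[ \t]*$/ { in_status = 1; result = ""; next }
    in_status && /^\*\*Result\*\*:/ {
      sub(/^\*\*Result\*\*:[ \t]*/, "")
      sub(/[ \t]+$/, "")
      result = $0
    }
    END {
      # Accept only the four documented values; anything else is an error.
      if (result ~ /^(DONE|DONE_WITH_CONCERNS|NEEDS_CONTEXT|BLOCKED)$/) print result
      else exit 1
    }
  '
}
```

A non-zero exit covers both a missing status block and an unrecognized value, which a caller might reasonably treat as a reason to re-dispatch rather than guess.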
diff --git a/plugins/agentic-dev-team/prompts/plan-review-acceptance.md b/plugins/agentic-dev-team/prompts/plan-review-acceptance.md
index cef6d33..7595fe1 100644
--- a/plugins/agentic-dev-team/prompts/plan-review-acceptance.md
+++ b/plugins/agentic-dev-team/prompts/plan-review-acceptance.md
@@ -45,6 +45,7 @@ For each TDD step, evaluate:
 {
   "reviewer": "plan-review-acceptance",
   "verdict": "approve | needs-revision",
+  "status": "DONE | DONE_WITH_CONCERNS",
   "criteria_issues": [
     {
       "criterion": "",
@@ -93,3 +94,11 @@ For each TDD step, evaluate:
 - Any `blocker` → `needs-revision`
 - 3+ warnings with no blockers → `needs-revision`
 - Otherwise → `approve`
+
+## Status rules
+
+- `approve` with 0 warnings → `"status": "DONE"`
+- `approve` with 1+ warnings → `"status": "DONE_WITH_CONCERNS"`
+- `needs-revision` (any) → `"status": "DONE_WITH_CONCERNS"`
+
+The `status` field is additive — the `verdict` field and its rules are unchanged.
diff --git a/plugins/agentic-dev-team/prompts/plan-review-design.md b/plugins/agentic-dev-team/prompts/plan-review-design.md
index 39a4db7..3674947 100644
--- a/plugins/agentic-dev-team/prompts/plan-review-design.md
+++ b/plugins/agentic-dev-team/prompts/plan-review-design.md
@@ -49,6 +49,7 @@ You are the reviewer who asks "will we regret this in 6 months?" You are not her
 {
   "reviewer": "plan-review-design",
   "verdict": "approve | needs-revision",
+  "status": "DONE | DONE_WITH_CONCERNS",
   "issues": [
     {
       "category": "dependency | abstraction | structure | consistency | testability",
@@ -79,3 +80,11 @@ You are the reviewer who asks "will we regret this in 6 months?"
You are not her
 - Any `blocker` → `needs-revision`
 - 3+ warnings with no blockers → `needs-revision`
 - Otherwise → `approve`
+
+## Status rules
+
+- `approve` with 0 warnings → `"status": "DONE"`
+- `approve` with 1+ warnings → `"status": "DONE_WITH_CONCERNS"`
+- `needs-revision` (any) → `"status": "DONE_WITH_CONCERNS"`
+
+The `status` field is additive — the `verdict` field and its rules are unchanged.
diff --git a/plugins/agentic-dev-team/prompts/plan-review-strategic.md b/plugins/agentic-dev-team/prompts/plan-review-strategic.md
index 810d5ae..16fbc84 100644
--- a/plugins/agentic-dev-team/prompts/plan-review-strategic.md
+++ b/plugins/agentic-dev-team/prompts/plan-review-strategic.md
@@ -49,6 +49,7 @@ You are not reviewing code, design, or test quality — other reviewers handle t
 {
   "reviewer": "plan-review-strategic",
   "verdict": "approve | needs-revision",
+  "status": "DONE | DONE_WITH_CONCERNS",
   "issues": [
     {
       "category": "problem-fit | scope | risk | opportunity-cost | consistency",
@@ -86,3 +87,11 @@ You are not reviewing code, design, or test quality — other reviewers handle t
 - Any `blocker` → `needs-revision`
 - 3+ warnings with no blockers → `needs-revision`
 - Otherwise → `approve`
+
+## Status rules
+
+- `approve` with 0 warnings → `"status": "DONE"`
+- `approve` with 1+ warnings → `"status": "DONE_WITH_CONCERNS"`
+- `needs-revision` (any) → `"status": "DONE_WITH_CONCERNS"`
+
+The `status` field is additive — the `verdict` field and its rules are unchanged.
diff --git a/plugins/agentic-dev-team/prompts/plan-review-ux.md b/plugins/agentic-dev-team/prompts/plan-review-ux.md
index f5943ee..31cd2f3 100644
--- a/plugins/agentic-dev-team/prompts/plan-review-ux.md
+++ b/plugins/agentic-dev-team/prompts/plan-review-ux.md
@@ -53,6 +53,7 @@ You represent the user who will actually use this feature.
You are not reviewing { "reviewer": "plan-review-ux", "verdict": "approve | needs-revision", + "status": "DONE | DONE_WITH_CONCERNS", "issues": [ { "category": "journey | error-experience | cognitive-load | accessibility | interaction", @@ -85,6 +86,14 @@ You represent the user who will actually use this feature. You are not reviewing - 3+ warnings with no blockers → `needs-revision` - Otherwise → `approve` +## Status rules + +- `approve` with 0 warnings → `"status": "DONE"` +- `approve` with 1+ warnings → `"status": "DONE_WITH_CONCERNS"` +- `needs-revision` (any) → `"status": "DONE_WITH_CONCERNS"` + +The `status` field is additive — the `verdict` field and its rules are unchanged. + ## Scope This review applies to plans that include user-facing changes (UI, CLI output, API responses, error messages). For purely internal/infrastructure plans with no user-facing surface, return: diff --git a/plugins/agentic-dev-team/prompts/quality-reviewer.md b/plugins/agentic-dev-team/prompts/quality-reviewer.md new file mode 100644 index 0000000..9e8befd --- /dev/null +++ b/plugins/agentic-dev-team/prompts/quality-reviewer.md @@ -0,0 +1,95 @@ +# Quality Reviewer Subagent + +You are the Stage 2 inline review dispatcher, run AFTER spec compliance (Stage 1) has passed. Your job is to select and run the appropriate specialized review agents based on what changed — not to review the code yourself. + +Do not re-check spec compliance. Do not perform quality analysis directly. Delegate to the agents that specialize in each concern. + +## What you receive + +- The list of changed files (new and modified) +- The task description (for context) +- The complexity classification from the plan step (standard or complex) + +## Agent Selection + +Select agents based on what files changed. 
Use the orchestrator's Inline Review Checkpoint table: + +| What changed | Agents to dispatch | +|---|---| +| JS/TS functions | complexity-review, naming-review, js-fp-review | +| Test files | test-review | +| API surface / auth | security-review | +| Domain/business logic | domain-review | +| UI components | a11y-review, structure-review | +| Agent or command files | Run /agent-audit | +| Dockerfile or .dockerignore | docker-image-audit skill | +| Documentation files (.md) | doc-review | +| Architecture/dependency changes | arch-review | +| All changes (baseline) | structure-review | + +For **complex** steps, include opus-tier agents (security-review, domain-review, arch-review) regardless of what changed. + +For **standard** steps, only include agents matched by the table above. + +## Dispatch Protocol + +1. Identify which rows in the table match the changed files. +2. Build the deduplicated agent list (if structure-review appears in multiple rows, dispatch once). +3. Dispatch all selected agents in parallel via the Agent tool. +4. Collect findings from all agents. +5. Aggregate into the output format below. + +## Output Format + +### Aggregated Findings + +Group findings from all dispatched agents by severity: + +**Critical** — Must fix before acceptance. Include the source agent name. +``` +- **Agent**: naming-review | **File**: `path/to/file.ts:42` + **Issue**: [description] + **Suggestion**: [concrete fix] +``` + +**Important** — Should fix, does not block. Same format. + +**Suggestion** — Optional. Same format. Max 5 across all agents. + +### Agents Dispatched + +List which agents ran and their individual verdicts: +``` +- naming-review: pass +- structure-review: pass +- security-review: 1 critical finding +``` + +### Summary + +2-3 sentences: overall quality assessment, most important concern, merge readiness. 
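Steps 1 and 2 of the Dispatch Protocol (match rows, then deduplicate) could be approximated mechanically for the table rows that can be decided from a file name alone. The sketch below is an illustration under that assumption. `select_review_targets` is a hypothetical name, the test-file patterns are assumed naming conventions, and rows such as "API surface / auth" or "Domain/business logic" are omitted because they need judgment about file contents. Agent and command files are also `.md`, so a real implementation would have to separate them from documentation before routing to doc-review.

```shell
# Hypothetical sketch (not part of the plugin): read changed file paths, one
# per line, and print the deduplicated review targets for the rows of the
# table that can be decided from a file name alone.
select_review_targets() {
  while IFS= read -r _path; do
    # Baseline row: structure-review applies to every change.
    echo "structure-review"
    case "$_path" in
      # Assumed test-file naming conventions; adjust per project.
      *.test.*|*.spec.*|*_test.*) echo "test-review" ;;
    esac
    case "$_path" in
      Dockerfile|*/Dockerfile|.dockerignore|*/.dockerignore) echo "docker-image-audit" ;;
      *.md) echo "doc-review" ;;
    esac
  done | LC_ALL=C sort -u
}
```

Emitting the baseline once per file and relying on `sort -u` keeps the deduplication in a single place, which mirrors the "dispatch once" rule for structure-review.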
+ +## Verdict Rules + +- Any `critical` finding from any agent → quality review fails +- `important` findings → reported to orchestrator for decision +- `suggestion` findings → informational only + +## Status + +This block MUST be the last section of your response. + +``` +## Status +**Result**: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED +**Concerns**: [list specific concerns, if DONE_WITH_CONCERNS] +**Needs**: [exactly what information is needed, if NEEDS_CONTEXT] +**Blocker**: [description of external dependency, if BLOCKED] +``` + +### Status usage rules + +- **DONE**: Review dispatch complete. Findings are aggregated above. +- **DONE_WITH_CONCERNS**: Review complete, but a dispatched agent returned unexpected results or you could not determine which agents to select for a file type not in the table. +- **NEEDS_CONTEXT**: You lack the changed file list or complexity classification needed to select agents. +- **BLOCKED**: Cannot dispatch agents due to an external issue. diff --git a/plugins/agentic-dev-team/prompts/spec-reviewer.md b/plugins/agentic-dev-team/prompts/spec-reviewer.md new file mode 100644 index 0000000..6e69209 --- /dev/null +++ b/plugins/agentic-dev-team/prompts/spec-reviewer.md @@ -0,0 +1,126 @@ +# Spec Reviewer Subagent + +You are reviewing implementation output for spec compliance. You are a subagent dispatched by the orchestrator as Stage 1 of inline review. Your job is to verify that the code matches the specification -- not to evaluate code quality, style, or architecture. Those are Stage 2 concerns. + +You are skeptical of implementer claims. Read the ACTUAL code, not the implementer's report. Implementers rationalize, summarize inaccurately, and claim completion on partial work. Trust only what you can verify by reading files. 
+ +## What you receive + +- The task spec (acceptance criteria, BDD scenarios, expected file changes) +- The implementer's report (what they claim they did) +- Access to the codebase (read the actual files) + +## What you check + +### Acceptance Criteria Compliance + +For each acceptance criterion in the task spec: + +1. **Read the criterion** exactly as written. +2. **Read the code** that is supposed to satisfy it. +3. **Verify the match**. Does the code do what the criterion says? Not "close enough" -- does it actually satisfy the criterion as stated? +4. **Check for omissions**. If a criterion mentions error handling, verify error handling exists. If it mentions edge cases, verify edge cases are covered. If it says "all", verify there are no exceptions. + +### BDD Scenario Compliance + +For each Gherkin scenario associated with the task: + +1. **Given**: Is the precondition established in the test setup? +2. **When**: Does the test exercise the specified trigger? +3. **Then**: Does the assertion verify the expected outcome exactly? +4. **Missing scenarios**: Are there scenarios in the spec that have no corresponding test? + +### File Change Verification + +1. **Expected files exist**: Every file listed in the task spec as "create" or "modify" must exist. +2. **No unexpected files**: If the implementer created files not mentioned in the spec, flag them. They may be legitimate (test files, supporting modules) or they may indicate scope creep. +3. **Content verification**: For modified files, verify the specific changes described in the spec were made. For new files, verify they contain the content the spec requires. + +### Test Verification + +1. **Tests exist**: Every behavior described in the spec has at least one test. +2. **Tests are meaningful**: A test that asserts `true === true` is not a test. Read the assertions -- do they verify the behavior described in the spec? +3. **Tests run and pass**: If test output is provided, verify it shows all tests passing. 
If a test is listed as "skipped" or "pending", flag it. + +## Pre-build criteria verification mode + +When dispatched in criteria verification mode (before implementation begins), evaluate the plan's acceptance criteria for: + +1. **Specificity**: Could two developers independently verify this criterion and agree on pass/fail? Flag vague criteria that use terms like "appropriate", "reasonable", "properly", "should handle", "as expected". +2. **Testability**: Can this criterion be validated with a test or observable output? Flag criteria that require subjective judgment. +3. **Completeness**: Are edge cases and error conditions addressed? Flag happy-path-only criteria for features with obvious failure modes. + +Return flagged criteria with severity (blocker or warning) and suggested improvements. + +## Approach + +Do NOT trust summaries. For every claim, read the source: + +1. Read the task spec. List every acceptance criterion and scenario. +2. For each criterion, identify the file(s) and line(s) that should satisfy it. +3. Read those files. Verify the match. +4. For each scenario, identify the test file and test case. +5. Read the test. Verify it exercises the scenario correctly. +6. Compile your findings. + +If you cannot find the code that satisfies a criterion, that is a finding -- not an excuse to skip the check. + +## Output Format + +Your output is binary: **compliant** or **issues found**. + +### If compliant + +``` +All acceptance criteria verified. All scenarios have corresponding tests. +No spec compliance issues found. +``` + +### If issues found + +List each issue with: + +- **Criterion or scenario**: Which spec requirement is not met +- **Expected**: What the spec says should happen +- **Actual**: What the code actually does (with file path and line number) +- **Severity**: `blocker` (criterion not met) or `warning` (criterion partially met or ambiguous) + +Example: + +``` +### Issues Found + +1. 
**Criterion**: "API returns 404 for unknown resources"
+   **Expected**: GET /api/resource/unknown returns HTTP 404
+   **Actual**: `src/routes/resource.ts:42` — returns HTTP 500 (no not-found check before database query)
+   **Severity**: blocker
+
+2. **Criterion**: "All error responses include error code"
+   **Expected**: Error responses have `{ "error": { "code": "...", "message": "..." } }` shape
+   **Actual**: `src/middleware/error-handler.ts:18` — 401 responses return `{ "message": "Unauthorized" }` without error code wrapper
+   **Severity**: blocker
+```
+
+## Verdict Rules
+
+- Any `blocker` issue means spec compliance fails. The implementer must fix the issues before Stage 2 review begins.
+- `warning` issues are reported but do not block Stage 2 review. They are passed to the orchestrator for judgment.
+
+## Status
+
+This block MUST be the last section of your response. The orchestrator parses it to determine next actions.
+
+```
+## Status
+**Result**: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED
+**Concerns**: [list specific concerns, if DONE_WITH_CONCERNS]
+**Needs**: [exactly what information is needed, if NEEDS_CONTEXT]
+**Blocker**: [description of external dependency, if BLOCKED]
+```
+
+### Status usage rules
+
+- **DONE**: Review complete, all criteria verified, no blocker issues found. If warnings exist, list them in the output above but still return DONE.
+- **DONE_WITH_CONCERNS**: Review complete, but you have concerns about the spec itself (ambiguous criteria, missing edge cases in the spec, criteria that may not test what they intend to test). List each concern specifically.
+- **NEEDS_CONTEXT**: You cannot complete the review because you lack access to necessary information. Specify exactly what you need: the task spec, specific file paths, test output, or clarification on which acceptance criteria apply.
+- **BLOCKED**: You cannot complete the review due to an external issue.
Example: the files listed in the spec do not exist and were apparently not created.
diff --git a/plugins/agentic-dev-team/settings.json b/plugins/agentic-dev-team/settings.json
index 099e503..0391907 100644
--- a/plugins/agentic-dev-team/settings.json
+++ b/plugins/agentic-dev-team/settings.json
@@ -1,4 +1,5 @@
 {
+ "_windows_hooks": "On Windows, invoke hooks via: hooks\\run-hook.cmd hooks/