From 31c94f843cc905ee8b494c06d3f4e9dc50f1cc73 Mon Sep 17 00:00:00 2001
From: Bryan Finster
Date: Thu, 16 Apr 2026 09:31:56 -0700
Subject: [PATCH 01/15] docs: add competitive analysis and specs for superpowers gap closure

Competitive analysis comparing agentic-dev-team against obra/superpowers identified gaps in anti-rationalization techniques, code review reception, skill authoring depth, debugging references, subagent status codes, TDD depth, and worktree setup. Specs cover 7 core implementation slices (A-G) plus 3 platform support slices (H1-H3: Windows hooks, Gemini CLI, OpenAI Codex). Multi-platform research document recommends graceful degradation with platform-specific manifests reusing existing skills. Removes completed/obsolete spec files from prior work.
---
 .../multi-platform-support-suggestions.md | 126 ++
 docs/specs/plugin-restructure.md | 108 --
 docs/specs/pre-commit-review.md | 95 --
 docs/specs/release-please.md | 91 --
 docs/specs/specs-persist-output-to-file.md | 90 --
 docs/specs/superpowers-gap-closure.md | 1028 +++++++++++++++++
 reports/competitive-analysis-2026-04-15.md | 251 ++++
 7 files changed, 1405 insertions(+), 384 deletions(-)
 create mode 100644 docs/specs/multi-platform-support-suggestions.md
 delete mode 100644 docs/specs/plugin-restructure.md
 delete mode 100644 docs/specs/pre-commit-review.md
 delete mode 100644 docs/specs/release-please.md
 delete mode 100644 docs/specs/specs-persist-output-to-file.md
 create mode 100644 docs/specs/superpowers-gap-closure.md
 create mode 100644 reports/competitive-analysis-2026-04-15.md

diff --git a/docs/specs/multi-platform-support-suggestions.md b/docs/specs/multi-platform-support-suggestions.md
new file mode 100644
index 0000000..464c056
--- /dev/null
+++ b/docs/specs/multi-platform-support-suggestions.md
@@ -0,0 +1,126 @@
+# Multi-Platform Support: Suggestions (Research Only)
+
+**Date**: 2026-04-16
+**Source**: Competitive analysis against [obra/superpowers](https://github.com/obra/superpowers) which
supports 6 platforms.
+**Status**: Research document — no implementation planned without further decision.
+
+## Current State
+
+A platform dependency audit found that **97.4% of plugin files (151 of ~155) have Claude Code-specific dependencies**. Only 4 knowledge files are fully portable as pure markdown:
+
+- `knowledge/review-rubric.md`
+- `knowledge/owasp-detection.md`
+- `knowledge/domain-modeling.md`
+- `knowledge/architecture-assessment.md`
+
+### Claude Code Features We Depend On
+
+| Feature | Files affected | Portability barrier |
+|---------|---------------|-------------------|
+| **Agent tool** (subagent dispatch with model override, `isolation: "worktree"`) | Orchestrator, build, triage commands | No equivalent in Cursor, Codex, Gemini CLI. This is the hardest barrier. |
+| **Hooks** (PreToolUse, PostToolUse) | 8 bash scripts + settings.json | No hook system in other platforms. Would need middleware. |
+| **`allowed-tools:` frontmatter** (scoped tool permissions) | All 29 command files | Claude Code security feature with no cross-platform equivalent. |
+| **`tools:` frontmatter** in agents | All 30 agent files | Tool names (Read, Write, Edit, Glob, Grep, Bash, Agent) are Claude Code-specific. |
+| **`model:` frontmatter** in agents | All 30 agent files + 9 templates | Model names (haiku, sonnet, opus) map to Claude models. Other platforms may use different models. |
+| **Plugin manifest** (`.claude-plugin/plugin.json`) | 1 file | Different format per platform. |
+| **Skill frontmatter** (`user-invocable`, `role`) | All 32 skill files | Claude Code skill discovery mechanism. |
+
+## How Superpowers Does It
+
+superpowers supports 6 platforms with these strategies:
+
+1. **Platform-specific manifests**: `.claude-plugin/plugin.json`, `.cursor-plugin/`, `.codex/`, `.opencode/`, `gemini-extension.json`, `AGENTS.md` (Copilot). Each manifest points to the same skill files.
+
+2.
**SessionStart hook with platform detection**: A single hook fires on session start, detects which platform is running, and injects skill awareness. The hook has platform-specific variants (`hooks.json` for Claude Code, `hooks-cursor.json` for Cursor).
+
+3. **Skills as the portable unit**: Skills are plain markdown with YAML frontmatter. The frontmatter is minimal (name, description). No `tools:`, `model:`, or `allowed-tools:` — those concepts don't exist in superpowers' skills.
+
+4. **Graceful degradation**: The `executing-plans` skill is a fallback for platforms without subagent support. Instead of dispatching parallel subagents, the agent executes plan steps inline. superpowers explicitly states: "Superpowers performs significantly better with subagent access."
+
+5. **Single agent, not a fleet**: superpowers has 1 agent (code-reviewer). Our 30 agents and 19 review agents are a much larger portability surface.
+
+## Portability Assessment
+
+### What is portable today (no changes needed)
+
+| Layer | Files | Notes |
+|-------|-------|-------|
+| Knowledge files (4) | review-rubric, owasp-detection, domain-modeling, architecture-assessment | Pure markdown reference. Portable as-is. |
+| Skill content (32) | All SKILL.md files | The *content* of skills (patterns, guidelines, procedures) is platform-agnostic markdown. The *frontmatter* is Claude Code-specific. |
+| Prompt templates (4+) | plan-review-*.md, future implementer/spec/quality | Pure prompt text. Portable as-is. |
+| Knowledge file content (2) | agent-registry, review-template | Content is portable but references Claude Code tool names. |
+
+### What requires an adapter layer
+
+| Layer | Files | Adaptation needed |
+|-------|-------|-----------------|
+| Agent files (30) | All agents | Strip `tools:` and `model:` frontmatter for other platforms, or map to platform equivalents. |
+| Command files (29) | All commands | Strip `allowed-tools:` for platforms without scoped permissions. Degrade gracefully.
|
+| Skill frontmatter (32) | All skills | Simplify to name + description only for non-Claude platforms. |
+| Plugin manifest (1) | `.claude-plugin/plugin.json` | Create parallel manifests for each platform. |
+
+### What cannot be ported without redesign
+
+| Feature | Why | Impact |
+|---------|-----|--------|
+| **Multi-agent orchestration** (Agent tool + model routing) | Other platforms have no equivalent to dispatching subagents with model override and worktree isolation. | The entire orchestrator workflow (Research → Plan → Implement with parallel subagents, inline review checkpoints, model-routed review agents) would need a fallback. |
+| **Hook-based guards** (8 scripts) | Hooks are a Claude Code-specific runtime feature. Pre-tool guards, TDD enforcement, and real-time review checks have no equivalent elsewhere. | Safety rails and automated quality checks would be manual-only. |
+| **Scoped tool permissions** | `allowed-tools:` restricts what each command can do. Other platforms don't have this security model. | Commands would have full tool access on other platforms — less secure. |
+
+## Suggested Approach
+
+### Option 1: Minimal adapter (Low effort, limited reach)
+
+Add platform manifests that point to the existing skills and knowledge files. Accept that orchestration, hooks, and review agents don't work on other platforms. The plugin becomes a "knowledge library" on non-Claude platforms.
+
+**Effort**: Small (create manifest files + SessionStart hook per platform)
+**Value**: Low (most of the plugin's value IS the orchestration)
+
+### Option 2: Graceful degradation (Medium effort, moderate reach)
+
+Like superpowers' approach: add an `executing-plans` fallback mode where the agent works inline instead of dispatching subagents. Hooks degrade to "manual checklist" instructions. Review agents degrade to a single inline review pass.
+
+**What this looks like**:
+1. Add platform manifests (`.cursor-plugin/`, `.codex/`, etc.)
+2.
Create a SessionStart hook that detects platform and sets a `PLATFORM` variable
+3. Create a `skills/inline-execution/SKILL.md` fallback for platforms without Agent tool
+4. Modify the orchestrator to check platform and switch between multi-agent and inline modes
+5. Convert hook-based guards to skill-based checklists (manual enforcement)
+6. Strip `tools:`, `model:`, `allowed-tools:` from agent/command files on other platforms (or make them optional with sensible defaults)
+
+**Effort**: Large
+**Value**: Moderate (the inline fallback is significantly less capable than the full orchestration)
+**Risk**: Maintaining two execution paths (multi-agent + inline) doubles the testing and maintenance burden. Every new feature needs both paths.
+
+### Option 3: Platform abstraction layer (High effort, full reach)
+
+Create an abstraction layer that maps platform-specific capabilities to a common interface. Agent dispatch, tool access, and hooks are abstracted behind platform adapters.
+
+**Effort**: XL
+**Value**: High (full functionality on all platforms)
+**Risk**: Over-engineering. The abstraction layer becomes its own maintenance burden. Claude Code is our primary platform — optimizing for 5 other platforms that may never have equivalent features is speculative.
+
+## Recommendation
+
+**Do not pursue multi-platform support now.** The cost-benefit analysis doesn't justify it:
+
+- 97% of our files have Claude Code dependencies
+- Our core value proposition (multi-agent orchestration, review agent fleet, hook-based guards) depends on Claude Code features that other platforms don't have
+- superpowers can be multi-platform because it's a workflow discipline tool (14 skills, 1 agent). We're an orchestration platform (30 agents, 29 commands, 8 hooks). The portability surface is fundamentally different.
+
+**Revisit when**:
+1. Other platforms add subagent dispatch (Cursor is most likely to get this)
+2. A significant user base requests it
+3.
A platform-agnostic agent dispatch standard emerges
+
+**Quick win available now**: Extract the 4 portable knowledge files + skill content (without frontmatter) into a standalone "reference library" that other tools can consume. This is useful for teams that want our detection patterns and rubrics without the orchestration. Effort: Small.
+
+## Per-Platform Effort Estimates
+
+| Platform | Effort | Biggest risk |
+|----------|--------|-------------|
+| **Cursor** | L | No subagent dispatch; hooks require cursor-specific format; closest to Claude Code in capability |
+| **Codex (OpenAI)** | XL | Fundamentally different agent model; AGENTS.md format; no hooks; different model names |
+| **OpenCode** | L | Smaller platform; limited docs on plugin capabilities |
+| **Gemini CLI** | XL | Different model family entirely (Gemini not Claude); extension format; no subagent dispatch |
+| **GitHub Copilot CLI** | XL | Least mature agent platform; minimal plugin system |
diff --git a/docs/specs/plugin-restructure.md b/docs/specs/plugin-restructure.md
deleted file mode 100644
index 19ba331..0000000
--- a/docs/specs/plugin-restructure.md
+++ /dev/null
@@ -1,108 +0,0 @@
-# Specification: Plugin Directory Restructure
-
-**Created**: 2026-04-02
-**Status**: approved
-
-## Intent Description
-
-**What**: Restructure the repository to separate marketplace from plugin. Move all plugin components into `plugins/agentic-dev-team/`, move hook registrations into a plugin-level `settings.json`, split CLAUDE.md into a plugin-facing config (ships with plugin) and a dev-facing config (stays at root). Remove `dev-setup.sh` in favor of `claude plugin install --scope project` from the local path.
-
-**Why**: The repo conflates marketplace and plugin. Dev artifacts ship with the plugin, hooks are in a user config location, and the structure can't host multiple plugins. This follows the recommended marketplace structure and ensures hooks distribute with the plugin.
-
-**Scope**: Directory restructure + hook registration move + CLAUDE.md split. No behavioral changes to any agent, command, skill, or hook.
-
-## User-Facing Behavior
-
-```gherkin
-Feature: Plugin directory restructure
-
-  Scenario: Plugin installs from marketplace with correct structure
-    Given the marketplace.json source points to ./plugins/agentic-dev-team
-    When a user runs claude plugin install agentic-dev-team
-    Then the plugin is installed with agents, commands, skills, and hooks
-    And hook registrations from the plugin's settings.json are active
-
-  Scenario: Hooks are registered via plugin settings.json
-    Given the plugin has a settings.json at plugins/agentic-dev-team/settings.json
-    When the plugin is installed
-    Then PreToolUse and PostToolUse hooks are active
-    And no .claude/settings.json is required for hook registration
-
-  Scenario: Dev artifacts are not shipped with the plugin
-    Given docs, plans, evals, and reports directories exist at the marketplace root
-    When the plugin is installed from plugins/agentic-dev-team
-    Then only plugin components are installed
-    And dev artifacts at the marketplace root are not included
-
-  Scenario: Plugin CLAUDE.md ships with plugin
-    Given plugins/agentic-dev-team/CLAUDE.md contains the orchestration pipeline config
-    When the plugin is installed
-    Then users see the pipeline config as project instructions
-
-  Scenario: Dev CLAUDE.md guides plugin development
-    Given the root CLAUDE.md contains development instructions
-    When a developer works on the marketplace repo
-    Then they see instructions for adding agents, testing, and contributing
-
-  Scenario: Local plugin testing without symlinks
-    Given a developer wants to test plugin changes locally
-    When they run claude plugin install --scope project ./plugins/agentic-dev-team
-    Then the plugin is installed from the local path
-    And dev-setup.sh is no longer needed
-
-  Scenario: release-please updates the correct files
-    Given release-please config points to
plugins/agentic-dev-team/.claude-plugin/plugin.json
-    When a release PR is created
-    Then the version is updated in the plugin's plugin.json
-    And the version is updated in .claude-plugin/marketplace.json at the repo root
-```
-
-## Architecture Specification
-
-**Components moved** (root → `plugins/agentic-dev-team/`):
-- `agents/`, `commands/`, `skills/`, `hooks/`, `knowledge/`, `prompts/`, `templates/`
-- `CLAUDE.md` (pipeline config) → `plugins/agentic-dev-team/CLAUDE.md`
-- `install.sh` → `plugins/agentic-dev-team/install.sh`
-- `.claude-plugin/plugin.json` → `plugins/agentic-dev-team/.claude-plugin/plugin.json`
-
-**Components created**:
-- `plugins/agentic-dev-team/settings.json` — hook registrations from `.claude/settings.json`
-- Root `CLAUDE.md` — rewritten as dev instructions (based on current `.claude/CLAUDE.md`)
-
-**Components updated at marketplace root**:
-- `.claude-plugin/marketplace.json` — source: `"./"` → `"./plugins/agentic-dev-team"`
-- `.claude/settings.json` — stripped to `enabledPlugins` only
-- `release-please-config.json` — extra-files paths updated
-- `README.md` — updated paths and install instructions
-- `GETTING-STARTED.md` — updated paths
-
-**Components removed**:
-- `dev-setup.sh` — replaced by `claude plugin install --scope project ./plugins/agentic-dev-team`
-- `.claude/CLAUDE.md` — content merged into new root `CLAUDE.md`
-
-**Components that stay at marketplace root**:
-- `.claude-plugin/marketplace.json`
-- `docs/`, `plans/`, `evals/`, `reports/`, `memory/`
-- `README.md`, `GETTING-STARTED.md`, `LICENSE`
-- `release-please-config.json`, `.release-please-manifest.json`
-- `.gitignore`, `.github/`
-
-**release-please extra-files**:
-- `plugins/agentic-dev-team/.claude-plugin/plugin.json` (version field)
-- `.claude-plugin/marketplace.json` (stays at root, JSONPath `$.plugins[0].version`)
-
-## Acceptance Criteria
-
-- [ ] All plugin components live under `plugins/agentic-dev-team/`
-- [ ] `marketplace.json` stays at root,
source points to `./plugins/agentic-dev-team`
-- [ ] `plugins/agentic-dev-team/settings.json` contains all hook registrations
-- [ ] `.claude/settings.json` contains only `enabledPlugins` (no hooks)
-- [ ] `plugins/agentic-dev-team/.claude-plugin/plugin.json` exists with correct version
-- [ ] `plugins/agentic-dev-team/CLAUDE.md` contains the orchestration pipeline config
-- [ ] Root `CLAUDE.md` contains development instructions for the repo
-- [ ] `release-please-config.json` extra-files point to new plugin.json path
-- [ ] Dev artifacts remain at repo root, not shipped with plugin
-- [ ] `dev-setup.sh` is removed
-- [ ] `.claude/CLAUDE.md` is removed (content in root CLAUDE.md)
-- [ ] `git mv` used for moves to preserve history
-- [ ] All internal relative paths within the plugin still resolve
diff --git a/docs/specs/pre-commit-review.md b/docs/specs/pre-commit-review.md
deleted file mode 100644
index a2cfa6c..0000000
--- a/docs/specs/pre-commit-review.md
+++ /dev/null
@@ -1,95 +0,0 @@
-# Specification: Automated Pre-Commit Code Review
-
-**Created**: 2026-04-02
-**Status**: approved
-
-## Intent Description
-
-**What**: Run the `/code-review --changed` flow automatically before every commit, blocking the commit until review agents have passed. Uses a temp file gate — the review writes a `.review-passed` file with a hash of staged files, and the pre-commit hook only allows commits when the hash matches.
-
-**Why**: Currently, code review only happens when explicitly invoked. Making it automatic ensures every commit has been reviewed, catching issues before they enter git history. Warnings are surfaced to the user at review time for a case-by-case decision rather than pre-configured as pass or block.
-
-**Scope**: Pre-commit automation of the existing review flow only. Does not change `/code-review` itself, add new review agents, or modify the review output format.
-
-## User-Facing Behavior
-
-```gherkin
-Feature: Automated pre-commit code review
-
-  Scenario: First commit attempt is blocked with review instruction
-    Given changed files are staged for commit
-    And no .review-passed file exists for the current staged files
-    When the user or agent attempts to commit
-    Then the commit is blocked
-    And the hook instructs Claude to run /code-review --changed
-
-  Scenario: Commit proceeds after passing review
-    Given Claude has run /code-review --changed and it returned pass or warn
-    And a .review-passed file exists with a hash matching the staged files
-    When the user or agent attempts to commit
-    Then the commit proceeds normally
-    And the .review-passed file is deleted after the commit
-
-  Scenario: Commit remains blocked after failing review
-    Given Claude has run /code-review --changed and it returned fail
-    And no .review-passed file was written
-    When the user or agent attempts to commit again
-    Then the commit is blocked
-    And the failure details are displayed
-
-  Scenario: Staged files change after review passes
-    Given a .review-passed file exists from a prior review
-    And the user stages additional files after the review
-    When the user or agent attempts to commit
-    Then the commit is blocked because the staged file hash no longer matches
-    And the hook instructs Claude to re-run /code-review --changed
-
-  Scenario: Warnings are surfaced for human decision
-    Given the review returned warn status
-    When the review completes
-    Then the warning details are displayed to the user
-    And a .review-passed file is written (warnings do not block)
-    And the user decides per-warning whether to fix or proceed
-
-  Scenario: Review can be bypassed
-    Given the user needs to commit without review
-    When the user commits with --no-verify
-    Then the commit proceeds without review
-```
-
-## Architecture Specification
-
-**Components affected**:
-1.
`hooks/pre-commit-review.sh` — rewrite as a `PreToolUse` hook on `Bash` that detects `git commit` commands, checks for `.review-passed` with matching staged-file hash, blocks (exit 2) if missing, allows if matching
-2. `.claude/settings.json` — add the new hook to `PreToolUse` matcher for `Bash`
-3. `/code-review` command (`commands/code-review.md`) — add a step: when invoked with `--changed` and review passes (pass or warn), write `.review-passed` containing the hash of reviewed files
-4. `.gitignore` — add `.review-passed`
-
-**Gate mechanism**:
-- Hash = sorted list of staged file paths piped through `shasum`
-- `.review-passed` contains the hash string
-- Hook computes hash of current staged files and compares to file contents
-- Match → allow commit (exit 0). Mismatch or missing → block (exit 2)
-- Post-commit: hook or commit success deletes `.review-passed`
-
-**File scope**: The hook checks all staged files, not just JS/TS. The review agents themselves decide which files are relevant to their scope.
-
-**Bypass**: `git commit --no-verify` skips all hooks including this one. This is the standard git bypass mechanism.
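
The gate mechanism described above can be sketched as follows. This is a minimal illustration in Python, not the hook itself (the spec calls for a bash script). The function names `staged_hash`, `gate_exit_code`, and `read_marker` are hypothetical, and using SHA-1 over newline-terminated sorted paths is an assumption based on the default algorithm of `shasum`. The digest is not guaranteed to be byte-identical to what a real shell pipeline would produce.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Optional

ALLOW, BLOCK = 0, 2  # exit codes named in the spec: 0 allows, 2 blocks


def staged_hash(staged_paths: Iterable[str]) -> str:
    """Hash the sorted staged file paths (SHA-1 is an assumption, see lead-in)."""
    joined = "".join(f"{path}\n" for path in sorted(staged_paths))
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def gate_exit_code(staged_paths: Iterable[str], recorded_hash: Optional[str]) -> int:
    """Return ALLOW when the recorded hash matches the staged files, else BLOCK."""
    if recorded_hash is None:
        # No .review-passed marker: no review has passed for these files.
        return BLOCK
    return ALLOW if recorded_hash.strip() == staged_hash(staged_paths) else BLOCK


def read_marker(marker: Path) -> Optional[str]:
    """Read the marker file if present; a missing file yields None."""
    return marker.read_text(encoding="utf-8") if marker.exists() else None
```

Because only file paths are hashed, staging an additional file changes the hash and forces a re-review, while editing the content of an already staged file would not. That follows from the spec's definition of the hash rather than from anything added in this sketch.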
-
-**Dependencies**:
-- Claude Code `PreToolUse` hook system (exit 2 = block)
-- `shasum` (available on macOS and Linux)
-- Existing `/code-review --changed --json` command
-
-## Acceptance Criteria
-
-- [ ] A `PreToolUse` hook on `Bash` detects `git commit` commands and blocks them (exit 2) when no valid `.review-passed` file exists
-- [ ] The hook computes a hash of staged file paths and compares to `.review-passed` contents
-- [ ] `/code-review --changed` writes `.review-passed` with the staged-file hash when review returns pass or warn
-- [ ] The commit proceeds on the second attempt when `.review-passed` hash matches staged files
-- [ ] Staging new files after review invalidates the gate (hash mismatch → re-review required)
-- [ ] `.review-passed` is deleted after a successful commit
-- [ ] `.review-passed` is in `.gitignore`
-- [ ] `git commit --no-verify` bypasses the review gate
-- [ ] The old advisory-only `pre-commit-review.sh` is replaced
-- [ ] All changed files are checked, not just JS/TS
diff --git a/docs/specs/release-please.md b/docs/specs/release-please.md
deleted file mode 100644
index e9e6521..0000000
--- a/docs/specs/release-please.md
+++ /dev/null
@@ -1,91 +0,0 @@
-# Specification: Automated Release Management with release-please
-
-**Created**: 2026-04-02
-**Status**: approved
-
-## Intent Description
-
-**What**: Add a GitHub Actions workflow that uses release-please to automate version bumping, CHANGELOG generation, and GitHub Release creation for the agentic-dev-team plugin repository.
-
-**Why**: The repository currently has no release process — no git tags, no CHANGELOG, and no GitHub Releases. The husky pre-commit hook auto-increments a patch version on every commit, which produces meaningless version numbers. Adopting release-please will derive version bumps from conventional commit messages, produce a human-readable CHANGELOG, and create tagged GitHub Releases automatically when a release PR is merged.
-
-**Scope**: Release-please integration only. No other CI workflows. No publishing beyond GitHub Releases. Husky is fully removed since it exists only for the auto-bump hook.
-
-## User-Facing Behavior
-
-```gherkin
-Feature: Automated release management with release-please
-
-  Scenario: Release PR is created from conventional commits
-    Given the main branch has new commits since the last release
-    And at least one commit uses a releasable prefix (feat, fix, refactor, perf)
-    When a push to main triggers the release-please workflow
-    Then release-please opens or updates a "release PR" on GitHub
-    And the PR title contains the next version number
-    And the PR body contains a generated CHANGELOG
-
-  Scenario: GitHub Release is created when release PR is merged
-    Given a release-please PR exists and is approved
-    When the release PR is merged to main
-    Then a GitHub Release is created with the new version tag
-    And the release notes contain the CHANGELOG entries
-    And plugin.json version is updated to match the release version
-    And marketplace.json version is updated to match the release version
-
-  Scenario: Version bump follows conventional commit semantics
-    Given commits since the last release include a "feat:" commit
-    Then the minor version is incremented
-    Given commits since the last release include only "fix:" commits
-    Then the patch version is incremented
-    Given commits since the last release include a breaking change
-    Then the major version is incremented
-
-  Scenario: Non-releasable commits are excluded from CHANGELOG
-    Given the main branch has only non-releasable commits (docs, chore, ci)
-    When a push to main triggers the release-please workflow
-    Then no release PR is created or updated
-    And docs commits do not appear in the CHANGELOG
-
-  Scenario: Husky and auto-bump hook are fully removed
-    When release-please owns versioning
-    Then the .husky directory no longer exists
-    And husky is removed from devDependencies
-    And package.json is removed
from the repository
-    And plugin.json version is only updated by release-please
-```
-
-## Architecture Specification
-
-**Components affected**:
-1. `.github/workflows/release-please.yml` — new GitHub Actions workflow
-2. `release-please-config.json` — configuration (changelog sections, extra-files)
-3. `.release-please-manifest.json` — tracks current version, seeded from current plugin.json
-4. `.claude-plugin/plugin.json` — version managed by release-please
-5. `.claude-plugin/marketplace.json` — version managed by release-please via `extra-files` with JSONPath (`.plugins[0].version`)
-6. `.husky/` — entire directory removed
-7. `package.json` — removed (only existed for husky devDependency)
-
-**Interfaces**:
-- release-please reads conventional commits from git history on `main`
-- release-please writes version to `plugin.json` and `marketplace.json` via `extra-files`
-- Trigger: `push` to `main` branch
-
-**Constraints**:
-- Version source of truth: `plugin.json`
-- `docs:` commits excluded from CHANGELOG (configure `changelog-sections`)
-- Initial version seeded from current `plugin.json` value
-- Workflow permissions: `contents: write`, `pull-requests: write`
-
-**Dependencies**:
-- `google-github-actions/release-please-action` v4
-
-## Acceptance Criteria
-
-- [ ] A push to `main` with releasable commits creates/updates a release PR
-- [ ] Merging the release PR creates a GitHub Release with a semver tag (e.g., `v1.3.1`)
-- [ ] The release PR updates `plugin.json` and `marketplace.json` versions
-- [ ] CHANGELOG excludes `docs:`, `chore:`, and `ci:` commits
-- [ ] Non-releasable commits do not trigger a version bump or release PR
-- [ ] `.husky/` directory is fully removed
-- [ ] `package.json` is removed
-- [ ] Initial release version is seeded from current `plugin.json` version
diff --git a/docs/specs/specs-persist-output-to-file.md b/docs/specs/specs-persist-output-to-file.md
deleted file mode 100644
index a1e9f85..0000000
---
a/docs/specs/specs-persist-output-to-file.md
+++ /dev/null
@@ -1,90 +0,0 @@
-# Spec: Specs command persists output to file
-
-## Intent Description
-
-The `/specs` command produces four specification artifacts (Intent, BDD scenarios, Architecture notes, Acceptance Criteria) through a collaborative loop with the user. Currently, the results exist only in the conversation — they are lost when the session ends and cannot be referenced by downstream commands (`/plan`, `/build`, spec-compliance-review).
-
-This change makes `/specs` persist its output to `docs/specs/` as a structured markdown file, so that:
-- `/plan` can read the spec artifacts when generating implementation steps
-- `spec-compliance-review` can diff implementation against the written spec
-- Specs survive session boundaries
-- The output guardrail ("write to files, not chat") is honored
-
-## User-Facing Behavior
-
-```gherkin
-Feature: Specs command persists output to file
-
-  Scenario: Specs output is saved after consistency gate passes
-    Given the user runs /specs with a feature description
-    And all four artifacts pass the consistency gate
-    When the specification is finalized
-    Then a markdown file should be created at docs/specs/.md
-    And the file should contain all four artifacts (Intent, BDD scenarios, Architecture notes, Acceptance Criteria)
-    And the consistency gate verdict should be included
-
-  Scenario: File name is derived from the feature description
-    Given the user runs /specs with "user login with MFA"
-    When the specification is finalized
-    Then the output file should be named docs/specs/user-login-with-mfa.md
-
-  Scenario: Existing spec file is not overwritten without confirmation
-    Given a spec file already exists at docs/specs/user-login-with-mfa.md
-    When the user runs /specs for the same feature
-    Then the user should be asked whether to overwrite or create a versioned file
-```
-
-## Architecture Specification
-
-**Components affected:**
--
`plugins/agentic-dev-team/skills/specs.md` — add file output instructions at the end of the workflow
-
-**No new files created in the plugin.** The `docs/specs/` directory is created in the consuming project at runtime.
-
-**Output format:** A single markdown file with H2 sections for each artifact:
-
-```markdown
-# Spec:
-
-## Intent Description
-...
-
-## User-Facing Behavior
-...
-
-## Architecture Specification
-...
-
-## Acceptance Criteria
-...
-
-## Consistency Gate
-- [x] Intent is unambiguous
-- [x] Every behavior has a corresponding BDD scenario
-- [x] Architecture constrains without over-engineering
-- [x] Terminology consistent across artifacts
-- [x] No contradictions between artifacts
-```
-
-**Constraints:**
-- File path: `docs/specs/.md`
-- Slugify: lowercase, hyphens for spaces, strip special characters
-- The skill already says "no code during specification phase" — file output is an artifact, not implementation
-
-## Acceptance Criteria
-
-1. After the consistency gate passes, a file exists at `docs/specs/.md` containing all four artifacts
-2. The file includes the consistency gate checklist with pass/fail marks
-3. If `docs/specs/` does not exist, it is created
-4. If a file with the same slug already exists, the user is prompted before overwriting
-5.
The file path is printed to chat so the user can find it
-
-## Consistency Gate
-
-- [x] Intent is unambiguous
-- [x] Every behavior has a corresponding BDD scenario
-- [x] Architecture constrains without over-engineering
-- [x] Terminology consistent across artifacts
-- [x] No contradictions between artifacts
-
-**Gate: PASS**
diff --git a/docs/specs/superpowers-gap-closure.md b/docs/specs/superpowers-gap-closure.md
new file mode 100644
index 0000000..d49e564
--- /dev/null
+++ b/docs/specs/superpowers-gap-closure.md
@@ -0,0 +1,1028 @@
+# Spec: Superpowers Gap Closure
+
+Closes gaps identified in the [competitive analysis](../../reports/competitive-analysis-2026-04-15.md) against [obra/superpowers](https://github.com/obra/superpowers). Seven core implementation slices (A-G) plus three platform support slices (H1-H3). Multi-platform research document at [docs/specs/multi-platform-support-suggestions.md](multi-platform-support-suggestions.md).
+
+## Dependency Graph
+
+```
+A (anti-rationalization knowledge)
+├── F (TDD skill depth) — cross-references A's knowledge file
+E (subagent status codes + prompt templates)
+├── G (worktree setup) — uses BLOCKED status + implementer.md
+B (code review reception) — independent
+C (skill authoring enhancements) — independent (touches agent-eval)
+D (debugging supporting files) — independent
+H1 (Windows hooks) — independent
+H2 (Gemini CLI support) — independent
+H3 (OpenAI Codex support) — independent
+```
+
+## Cross-Slice Notes
+
+- **Canonical term**: "anti-rationalization" is the standard term across all slices. Do not use "rationalization prevention" or "rationalization bulletproofing" as synonyms — those are technique names within the concept.
+- **Merge targets**: `knowledge/agent-registry.md` and `CLAUDE.md` are modified by multiple slices (A, B, D, E, F, G). Implement in dependency order to avoid merge conflicts.
+- **Implementer prompt cross-deps**: Slice E creates `prompts/implementer.md`.
Slice F's testing-anti-patterns reference should be mentioned in the implementer prompt. Slice G adds worktree setup to the implementer prompt. Implement E → F's implementer reference → G.
+
+---
+
+## Slice A: Anti-Rationalization Knowledge
+
+### Intent Description
+
+Create a shared knowledge file that catalogs LLM anti-rationalization patterns — the plausible excuses agents generate to skip hard steps across all skills. Currently, the TDD skill and systematic-debugging skill each have their own rationalization tables, but other skills (Quality Gate Pipeline, verification evidence, code review) lack this defense. The knowledge file becomes a reusable reference that any skill can point to, and the existing TDD/debugging tables remain in place as domain-specific supplements that the knowledge file cross-references by link.
+
+This slice modifies only the Quality Gate Pipeline skill — adding an anti-rationalization reference to the existing Phase 2 "Red Flag Language" block. The TDD skill cross-reference is Slice F's responsibility.
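
To make the "Red Flag Language" idea concrete, here is a small sketch of phrase matching. It is only an illustration: the skill itself is prose instructions for an agent, not code. The two phrases are the ones quoted in this slice's quality-gate scenario; the name `find_red_flags` and the whole-word, case-insensitive matching rule are assumptions.

```python
import re
from typing import List

# Both phrases are quoted in this slice's scenarios. The full list lives in the
# Quality Gate Pipeline skill and is not reproduced here.
RED_FLAG_PHRASES = ["should work now", "I believe"]


def find_red_flags(message: str, phrases: List[str] = RED_FLAG_PHRASES) -> List[str]:
    """Return the red-flag phrases found in message (case-insensitive, whole words)."""
    hits = []
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, message, flags=re.IGNORECASE):
            hits.append(phrase)
    return hits
```

A completion claim that returns a non-empty list would be a cue to pause and verify before reporting success, which is the behavior the scenario describes.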
+ +### User-Facing Behavior + +```gherkin +Feature: Anti-rationalization knowledge file + + Scenario: Agent encounters rationalization during TDD + Given an agent is following the TDD skill + When the agent generates an excuse to skip writing a test first + Then the excuse matches a pattern in the anti-rationalization knowledge file + And the agent recognizes it as rationalization and restarts from RED + + Scenario: Agent encounters rationalization during quality gate + Given an agent is running the Quality Gate Pipeline Phase 2 + When the agent generates red-flag language like "should work now" or "I believe" + Then the agent detects the language as an anti-rationalization signal + And the agent pauses to verify before claiming completion + + Scenario: New skill references anti-rationalization knowledge + Given a developer is authoring a new skill + When the skill has steps that agents commonly skip + Then the developer can reference the anti-rationalization knowledge file + And add domain-specific rationalizations to a table within the new skill + + Scenario: Anti-rationalization knowledge covers cross-cutting patterns + Given the anti-rationalization knowledge file exists + Then it contains at minimum these categories: + | category | + | Skipping verification | + | Skipping tests | + | Scope expansion | + | Premature completion claims | + | Process shortcuts | + And each pattern includes the excuse text and a reality counter + + Scenario: Unlisted rationalization is still caught + Given the anti-rationalization knowledge file exists + When an agent generates an excuse not explicitly listed + Then the catch-all rule applies: "If the excuse isn't listed here, it's still an excuse" + And the agent treats it as rationalization and follows the skill's restart protocol + + Scenario: Knowledge file cross-references domain-specific tables + Given the anti-rationalization knowledge file exists + Then it links to the TDD skill's rationalization table for test-specific 
patterns + And it links to the systematic-debugging skill's rationalization table for debugging-specific patterns + And it does NOT duplicate those tables' contents +``` + +### Architecture Specification + +**Components affected**: +- New file: `plugins/agentic-dev-team/knowledge/anti-rationalization.md` +- Modified: `plugins/agentic-dev-team/skills/quality-gate-pipeline/SKILL.md` — add anti-rationalization reference to the existing Phase 2 "Red Flag Language" block (lines 99-101) and to the Phase 1 "Hallucination Detection Signals" section +- Modified: `plugins/agentic-dev-team/knowledge/agent-registry.md` — register the new knowledge file + +**Interfaces**: The knowledge file is a passive reference document. Skills reference it with a markdown link. Agents load it on demand when they need to cross-check excuses. + +**Constraints**: +- Do NOT duplicate the existing TDD or systematic-debugging rationalization tables into the knowledge file. Those tables are domain-specific and stay where they are. The knowledge file covers cross-cutting patterns only and links to those tables. +- Keep the file under 600 tokens — it's loaded on demand but should stay lean. +- Do NOT create a "new Red Flag Language section" — add the reference to the existing block. + +**Dependencies**: None — pure documentation addition. 
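As an illustration of the Phase 2 red-flag check, the sketch below scans an agent's draft output for the two phrases quoted in the scenario ("should work now", "I believe"). The function names and the idea of a literal phrase list are assumptions made for this example; the spec does not prescribe how detection is implemented.

```python
import re
from typing import List

# The two phrases come from the Phase 2 scenario in this spec.
# Holding them in a tuple is a hypothetical choice for illustration;
# the real list would live in the skill or knowledge file.
RED_FLAG_PHRASES = ("should work now", "i believe")


def find_red_flags(agent_output: str) -> List[str]:
    """Return the red-flag phrases present in the agent's draft output."""
    # Lowercase and collapse whitespace so line wraps do not hide a phrase.
    normalized = re.sub(r"\s+", " ", agent_output.lower())
    return [phrase for phrase in RED_FLAG_PHRASES if phrase in normalized]


def must_pause_to_verify(agent_output: str) -> bool:
    """Any red-flag phrase means: verify before claiming completion."""
    return bool(find_red_flags(agent_output))
```

A literal list can never be exhaustive, which is why the knowledge file also carries the catch-all rule for unlisted excuses.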
+ +### Acceptance Criteria + +- [ ] `knowledge/anti-rationalization.md` exists with at least 5 categories of rationalization patterns +- [ ] Each pattern has: excuse text, reality counter, which skills it commonly appears in +- [ ] Knowledge file includes a catch-all statement at the top: "If the excuse isn't listed here, it's still an excuse" +- [ ] Knowledge file cross-references TDD and debugging tables by link rather than duplicating their patterns +- [ ] `quality-gate-pipeline/SKILL.md` existing Phase 2 "Red Flag Language" block references the knowledge file +- [ ] `quality-gate-pipeline/SKILL.md` Phase 1 "Hallucination Detection Signals" references the knowledge file +- [ ] `knowledge/agent-registry.md` includes the new file in the Knowledge Files table +- [ ] The knowledge file is under 600 tokens +- [ ] Existing TDD and debugging rationalization tables are NOT modified (Slice F handles TDD cross-ref) + +--- + +## Slice B: Code Review Reception Skill + +### Intent Description + +Create a new skill that defines behavioral constraints for how agents respond to code review feedback — whether from `/code-review`, `/apply-fixes`, or human reviewers. Currently, agents blindly accept all review findings and implement every suggestion without critical evaluation. This is a known LLM failure mode: performative agreement ("You're absolutely right!") followed by uncritical implementation of suggestions that may be wrong, unnecessary, or scope-expanding. The skill enforces technical verification before implementing any suggestion, mandates reasoned pushback when a suggestion would make the code worse, and includes a YAGNI gate to prevent gold-plating in response to reviews. + +Human feedback has higher authority than agent-generated feedback — the agent can push back with reasoning but defers to the human's final decision after one round. Tone is concise for both audiences. 
+ +### User-Facing Behavior + +```gherkin +Feature: Code review reception discipline + + Scenario: Agent receives a valid review finding + Given an agent has received code review feedback + And the finding is technically correct and addresses a real issue + When the agent evaluates the finding + Then the agent verifies the finding against the actual code + And implements the fix with verification evidence + + Scenario: Agent receives an incorrect review finding + Given an agent has received code review feedback + And the finding is technically incorrect or based on a misunderstanding + When the agent evaluates the finding + Then the agent states why the finding is incorrect with specific code references + And does NOT implement the suggested change + And does NOT use performative agreement language + + Scenario: Agent receives a valid but YAGNI suggestion + Given an agent has received code review feedback + And the finding suggests adding capability beyond current requirements + When the agent evaluates the finding + Then the agent identifies it as scope expansion + And declines to implement with a YAGNI justification + And logs the suggestion for future consideration + + Scenario: Agent receives a subjective style preference + Given an agent has received code review feedback + And the finding is a style preference not backed by project conventions + When the agent evaluates the finding + Then the agent checks project conventions and linting rules + And only implements if a convention or rule supports the change + + Scenario: Performative agreement language is blocked + Given an agent is about to respond to code review feedback + When the agent drafts a response containing phrases like: + | phrase | + | You're absolutely right | + | Great catch | + | Of course, I should have | + | That's a good point | + Then the agent replaces the performative language + And responds with technical evaluation only + + Scenario: Agent verifies before implementing any suggestion + 
Given an agent has received a review suggestion to change code + When the agent decides to implement the suggestion + Then the agent first reads the relevant code to verify the issue exists + And confirms the suggested fix would not introduce regressions + And only then applies the change + + Scenario: Agent receives feedback from a human reviewer + Given an agent has received code review feedback from a human + And the agent believes the feedback is incorrect + When the agent evaluates the finding + Then the agent states its technical reasoning concisely + And defers to the human's final decision after one round of pushback + + Scenario: Agent receives ambiguous feedback + Given an agent has received code review feedback + And the finding is ambiguous — neither clearly correct nor clearly incorrect + When the agent evaluates the finding + Then the agent escalates the ambiguous finding to the human for clarification + And does NOT guess at the reviewer's intent +``` + +### Architecture Specification + +**Components affected**: +- New file: `plugins/agentic-dev-team/skills/receiving-code-review/SKILL.md` +- Modified: `plugins/agentic-dev-team/knowledge/agent-registry.md` — register the new skill +- Modified: `plugins/agentic-dev-team/CLAUDE.md` — add to skills quick reference count and skills-by-phase table (Review phase) + +**Interfaces**: This skill is behavioral — loaded by any agent that receives review feedback. Primary consumer contexts (agents running within these commands load the skill): +- The `/apply-fixes` command +- The implementer subagent (receives inline review checkpoint results during `/build`) +- Any agent in a Phase 3 review-correction loop + +No modifications to those commands are needed — they are listed as consumer contexts, not files to edit. 
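The scenarios above amount to a small decision table. The sketch below encodes it as a pure function, assuming the agent has already classified the finding. The enum names, the `decide` signature, and the choice to push back on an unsupported style preference are hypothetical, since the spec only says such a change is not implemented.

```python
from enum import Enum


class Assessment(Enum):
    VALID = "valid"
    INCORRECT = "incorrect"
    YAGNI = "yagni"
    STYLE = "style"
    AMBIGUOUS = "ambiguous"


class Action(Enum):
    IMPLEMENT_WITH_EVIDENCE = "implement_with_evidence"
    PUSH_BACK_WITH_REASONING = "push_back_with_reasoning"
    DECLINE_AND_LOG = "decline_and_log"
    ESCALATE_FOR_CLARIFICATION = "escalate_for_clarification"
    DEFER_TO_HUMAN = "defer_to_human"


def decide(assessment: Assessment, *, from_human: bool = False,
           pushback_rounds: int = 0,
           convention_supports: bool = False) -> Action:
    """Map a classified review finding to the agent's next action."""
    if assessment is Assessment.AMBIGUOUS:
        # Never guess at the reviewer's intent.
        return Action.ESCALATE_FOR_CLARIFICATION
    if assessment is Assessment.VALID:
        return Action.IMPLEMENT_WITH_EVIDENCE
    if assessment is Assessment.STYLE and convention_supports:
        return Action.IMPLEMENT_WITH_EVIDENCE
    # Everything below is a disagreement with the reviewer.
    if from_human and pushback_rounds >= 1:
        # Human feedback wins after one round of reasoned pushback.
        return Action.DEFER_TO_HUMAN
    if assessment is Assessment.YAGNI:
        return Action.DECLINE_AND_LOG
    # Assumption: unsupported style preferences are handled like incorrect
    # findings; the spec only states they are not implemented.
    return Action.PUSH_BACK_WITH_REASONING
```

For example, `decide(Assessment.INCORRECT, from_human=True, pushback_rounds=1)` yields `Action.DEFER_TO_HUMAN`, while the same finding from an agent reviewer stays at `Action.PUSH_BACK_WITH_REASONING`.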
+ +**Constraints**: +- The skill is behavioral only — it constrains how agents respond to feedback, it does not modify review output format or review agent behavior +- It must not conflict with the Quality Gate Pipeline Phase 3 review-correction loop. That loop says "fix critical/major defects." This skill adds: "but verify each finding is real before fixing it" +- Keep under 500 tokens — it's a discipline skill, not a technique manual + +**Dependencies**: Quality Gate Pipeline (Phase 3 review-correction loop) + +### Acceptance Criteria + +- [ ] `skills/receiving-code-review/SKILL.md` exists with frontmatter +- [ ] Skill includes a banned-phrases list for performative agreement +- [ ] Skill includes a verification-before-implementation gate +- [ ] Skill includes a YAGNI gate for scope-expanding suggestions +- [ ] Skill includes a rationalization prevention table (agents rationalize agreeing too, not just skipping) +- [ ] Skill distinguishes human feedback (higher authority, defer after one pushback) from agent feedback (full technical challenge) +- [ ] Skill includes guidance for ambiguous findings: escalate to human, don't guess +- [ ] Tone guidance: concise technical evaluation for both audiences, no emotional language +- [ ] `knowledge/agent-registry.md` includes the new skill +- [ ] `CLAUDE.md` quick reference updated with skill count and skills-by-phase table +- [ ] Skill is under 500 tokens +- [ ] Skill does not conflict with Quality Gate Pipeline Phase 3 + +--- + +## Slice C: Skill Authoring Enhancements (Pressure Testing + CSO) + +### Intent Description + +Enhance the existing `agent-skill-authoring` skill with two additions drawn from superpowers' `writing-skills` methodology. First, **pressure testing** — a structured process for testing whether a skill's instructions hold up under adversarial conditions (the agent is deep in implementation, eager to deliver, and generating rationalizations). 
Pressure scenarios are saved as eval fixtures and integrated into `/agent-eval`. Second, **Claude Search Optimization (CSO)** refinement — the existing skill already has guidance about description optimization, but this needs strengthening with concrete examples and a pass/fail checklist. + +### User-Facing Behavior + +```gherkin +Feature: Skill authoring pressure testing and description optimization + + Scenario: Author pressure-tests a new skill + Given a developer has written a new skill + When the developer follows the skill authoring guide + Then the guide instructs them to run the task WITHOUT the skill first + And observe natural failure modes + And write pressure scenarios that probe each failure mode + And verify the skill prevents each failure when loaded + + Scenario: Pressure scenario catches a skill weakness + Given a skill has been written with constraints + And a pressure scenario simulates an agent rationalizing around a constraint + When the pressure scenario is executed + Then the skill either prevents the rationalization or the weakness is identified + And the author strengthens the skill to close the gap + + Scenario: Pressure scenarios are saved as eval fixtures + Given a developer has written pressure scenarios for a skill + When the developer follows the authoring guide + Then the scenarios are saved in the evals directory alongside agent eval fixtures + And each scenario specifies the skill, the adversarial condition, and expected behavior + + Scenario: agent-eval runs pressure scenarios against skills + Given pressure scenario fixtures exist for a skill + When the user runs /agent-eval + Then the eval framework executes each pressure scenario + And reports whether the skill prevented the adversarial behavior + + Scenario: Malformed pressure fixture is reported + Given a pressure scenario fixture exists but has invalid format + When the user runs /agent-eval + Then the eval framework reports a parse error for the malformed fixture + And 
identifies which fixture file failed and what is wrong with it + + Scenario: Skill description follows CSO guidelines + Given a developer is writing a skill description + When the developer follows the authoring guide + Then the description contains ONLY triggering conditions + And the description does NOT summarize the skill's workflow or steps + And the description does NOT list the skill's internal structure + + Scenario: CSO checklist catches a bad description + Given a skill has a description that summarizes its workflow + When the author runs the CSO checklist + Then the checklist flags the description as problematic + And suggests rewriting to focus on when/why to trigger +``` + +### Architecture Specification + +**Components affected**: +- Modified: `plugins/agentic-dev-team/skills/agent-skill-authoring/SKILL.md` — expand "Apply TDD to skill-writing itself" into a pressure testing procedure and strengthen "Optimize skill descriptions for triggering" into a CSO checklist +- Modified: `plugins/agentic-dev-team/commands/agent-eval.md` — add pressure scenario fixture support alongside agent eval fixtures +- Fixture location: `evals/pressure/` directory, alongside existing `evals/` fixtures + +**Interfaces**: No new interfaces. The authoring skill is consumed by anyone authoring skills. The eval command gains a new fixture type. + +**Constraints**: +- The existing skill already has the "Apply TDD to skill-writing itself" and "Optimize skill descriptions for triggering" sections. Enhance these in place — do not create parallel sections. +- Keep total skill file under 1,200 tokens (currently ~990 tokens per registry) +- Do not add a separate knowledge file for this — the guidance belongs inline in the authoring skill + +**Dependencies**: None for the skill edit. The eval integration depends on the existing `/agent-eval` command structure. 
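To make the "malformed fixture is reported" behavior concrete, here is a minimal validation sketch. It assumes fixtures are JSON objects with four string fields mirroring what this slice says a scenario must specify. The JSON encoding, the key names, and `validate_fixture` itself are assumptions for illustration; the spec leaves the fixture format to be defined.

```python
import json
from typing import List

# Hypothetical key names for the four items a pressure scenario specifies.
REQUIRED_FIELDS = (
    "skill",
    "adversarial_condition",
    "expected_behavior",
    "pass_criteria",
)


def validate_fixture(path_label: str, raw_text: str) -> List[str]:
    """Return readable problems; an empty list means the fixture is usable."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        # Name the file and say what is wrong, as the scenario requires.
        return [f"{path_label}: invalid JSON at line {exc.lineno}: {exc.msg}"]
    if not isinstance(data, dict):
        return [f"{path_label}: top level must be a JSON object"]
    problems = []
    for field in REQUIRED_FIELDS:
        value = data.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{path_label}: missing or empty field '{field}'")
    return problems
```

Returning a list of problems rather than raising lets an eval run report every malformed fixture in one pass instead of stopping at the first.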
+ +### Acceptance Criteria + +- [ ] "Apply TDD to skill-writing itself" section expanded with a concrete pressure testing procedure: (1) run without skill, (2) catalog failure modes, (3) write pressure scenarios, (4) verify skill prevents each failure +- [ ] At least 3 example pressure scenarios included as templates (e.g., "agent is 80% through implementation and wants to skip the verification step") +- [ ] Pressure scenario fixture format defined (skill name, adversarial condition, expected agent behavior, pass/fail criteria) +- [ ] Fixtures saved to `evals/pressure/` directory +- [ ] `commands/agent-eval.md` updated to mention skill pressure scenarios alongside agent eval fixtures +- [ ] Eval framework reports parse errors for malformed pressure fixtures +- [ ] "Optimize skill descriptions for triggering" expanded into a CSO checklist with pass/fail criteria +- [ ] CSO checklist includes at least 2 "good" and 2 "bad" description examples +- [ ] Total skill file stays under 1,200 tokens +- [ ] `knowledge/agent-registry.md` updated if `/agent-eval` description changes + +--- + +## Slice D: Systematic Debugging Supporting Files + +### Intent Description + +Add three supporting reference files to the systematic-debugging skill directory that provide concrete, reusable techniques agents can load on demand during debugging. Currently, the skill defines a solid 4-phase process but is procedural — it tells agents *what* to do (investigate, trace, hypothesize) without providing detailed *how-to* techniques. We'll add the three highest-value ones: root-cause tracing (backward call-chain analysis), condition-based waiting (replacing arbitrary sleeps/timeouts in tests with polling), and a test polluter finder (language-agnostic bisection algorithm for identifying which test pollutes shared state). 
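Two of the three techniques are easy to show in a few lines. The sketch below is illustrative only: the reference files themselves are meant to stay language-agnostic. `wait_until` and `find_polluter` are hypothetical names, and `victim_fails_after` stands in for whatever the project's test runner provides to run a subset of tests followed by the failing test. The bisection assumes a single polluter.

```python
import time
from typing import Callable, List, Sequence


def wait_until(condition: Callable[[], bool], description: str,
               timeout: float = 5.0, interval: float = 0.05) -> None:
    """Poll a condition instead of sleeping for an arbitrary duration."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            # Timeout ceiling with a descriptive error.
            raise TimeoutError(
                f"Timed out after {timeout}s waiting for: {description}")
        time.sleep(interval)


def find_polluter(prior_tests: Sequence[str],
                  victim_fails_after: Callable[[List[str]], bool]) -> str:
    """Bisect the tests that run before the victim to find the polluter."""
    candidates = list(prior_tests)
    if not candidates or not victim_fails_after(candidates):
        raise ValueError("victim does not fail after the full prior set")
    while len(candidates) > 1:
        first_half = candidates[: len(candidates) // 2]
        if victim_fails_after(first_half):
            candidates = first_half
        else:
            # Single-polluter assumption: it must be in the second half.
            candidates = candidates[len(first_half):]
    return candidates[0]
```

Each bisection step costs one run of a test subset, so a suite with N prior tests needs roughly log2(N) runs rather than N.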
+ +### User-Facing Behavior + +```gherkin +Feature: Systematic debugging supporting reference files + + Scenario: Agent uses root-cause tracing during investigation + Given an agent is in Phase 2 (Investigate) of systematic debugging + And the failure involves a value that is wrong at the point of use + When the agent loads the root-cause-tracing reference + Then the agent traces backward through the call chain from symptom to origin + And identifies the layer where the value first diverges from expected + + Scenario: Agent replaces arbitrary timeout with condition-based waiting + Given an agent is debugging a flaky test + And the test uses sleep or setTimeout to wait for an async condition + When the agent loads the condition-based-waiting reference + Then the agent replaces the arbitrary wait with a polling pattern + And the polling pattern has a timeout ceiling and descriptive error on timeout + + Scenario: Agent identifies a test polluter + Given an agent is debugging a test that passes in isolation but fails in suite + And the failure is caused by shared state pollution from another test + When the agent loads the find-polluter reference + Then the agent uses bisection to identify which prior test pollutes the state + And the bisection narrows to the specific polluting test + + Scenario: Supporting files are loaded on demand only + Given the systematic-debugging skill is loaded + When the agent enters Phase 2 and needs a specific technique + Then only the relevant supporting file is loaded + And other supporting files remain unloaded to conserve context +``` + +### Architecture Specification + +**Components affected**: +- New file: `plugins/agentic-dev-team/skills/systematic-debugging/root-cause-tracing.md` +- New file: `plugins/agentic-dev-team/skills/systematic-debugging/condition-based-waiting.md` +- New file: `plugins/agentic-dev-team/skills/systematic-debugging/find-polluter.md` +- Modified: `plugins/agentic-dev-team/skills/systematic-debugging/SKILL.md` 
— add "Supporting References" section in Phase 2 that links to the three files with guidance on when to load each +- Modified: `plugins/agentic-dev-team/knowledge/agent-registry.md` — note the supporting files exist under the skill entry + +**Interfaces**: Supporting files are passive markdown references loaded by the agent on demand. The main SKILL.md links to them with triggering conditions (e.g., "Load root-cause-tracing.md when the failure involves a wrong value at the point of use"). + +**Constraints**: +- Each supporting file should be under 400 tokens — they're technique references, not full skills +- The `find-polluter.md` describes the bisection algorithm in language-agnostic terms, not an executable script (projects use different test runners) +- Do not restructure the existing 4-phase process — the supporting files augment Phase 2, they don't replace it + +**Dependencies**: Systematic Debugging skill (existing) + +### Acceptance Criteria + +- [ ] `skills/systematic-debugging/root-cause-tracing.md` exists with backward tracing technique +- [ ] `skills/systematic-debugging/condition-based-waiting.md` exists with polling pattern replacing arbitrary waits +- [ ] `skills/systematic-debugging/find-polluter.md` exists with language-agnostic bisection algorithm +- [ ] Each supporting file is under 400 tokens +- [ ] Main `SKILL.md` Phase 2 section includes a "Supporting References" block linking to each file with when-to-load guidance +- [ ] `knowledge/agent-registry.md` updated to note supporting files +- [ ] `find-polluter.md` is language-agnostic (describes algorithm, not a shell script) +- [ ] Existing 4-phase process is unchanged + +--- + +## Slice E: Subagent Status Codes + +### Intent Description + +Introduce a structured 4-status-code protocol for subagent reporting: `DONE`, `DONE_WITH_CONCERNS`, `NEEDS_CONTEXT`, and `BLOCKED`. Currently, subagents return free-form text and the orchestrator must parse success/failure from unstructured output. 
This slice adds clear, parseable signals for subagent outcomes and creates the three missing prompt templates (`implementer.md`, `spec-reviewer.md`, `quality-reviewer.md`) with full behavioral content. The four existing plan review templates are also updated to adopt the status protocol. + +The status protocol uses two output formats depending on template type: +- **Markdown status block**: Used by implementer, spec-reviewer, and quality-reviewer (new templates) +- **JSON `"status"` field**: Used by plan review templates (existing JSON output format, `"status"` added alongside existing `"verdict"`) + +### User-Facing Behavior + +```gherkin +Feature: Subagent structured status codes + + Scenario: Subagent completes work successfully + Given the orchestrator has dispatched a subagent for a task + When the subagent finishes the work and all verification passes + Then the subagent returns status DONE + And includes verification evidence in its response + + Scenario: Subagent completes with reservations + Given the orchestrator has dispatched a subagent for a task + When the subagent finishes the work but has concerns about the approach + Then the subagent returns status DONE_WITH_CONCERNS + And includes the completed work plus a list of specific concerns + And the orchestrator reviews the concerns before accepting the work + + Scenario: Orchestrator handles DONE_WITH_CONCERNS + Given a subagent returned DONE_WITH_CONCERNS with a list of concerns + When the orchestrator receives the status + Then the orchestrator evaluates each concern + And decides per-concern: accept the work as-is, re-dispatch with guidance, or escalate to user + And logs the decision for each concern + + Scenario: Subagent needs more context from parent + Given the orchestrator has dispatched a subagent for a task + When the subagent cannot complete because it lacks information + And the missing information is available in the parent context + Then the subagent returns status NEEDS_CONTEXT + And 
specifies exactly what information is needed + And the orchestrator re-dispatches with the additional context + + Scenario: Subagent is blocked by an external dependency + Given the orchestrator has dispatched a subagent for a task + When the subagent cannot proceed due to an unresolvable dependency + Then the subagent returns status BLOCKED + And describes the blocking dependency + And the orchestrator escalates to the user + + Scenario: Orchestrator handles NEEDS_CONTEXT with re-dispatch + Given a subagent returned NEEDS_CONTEXT requesting file contents + When the orchestrator receives the status + Then the orchestrator gathers the requested context + And re-dispatches the same subagent prompt with added context + And does NOT treat NEEDS_CONTEXT as a failure + + Scenario: Orchestrator handles BLOCKED with user escalation + Given a subagent returned BLOCKED citing an external service dependency + When the orchestrator receives the status + Then the orchestrator presents the blocker to the user + And pauses the task until the user provides direction + + Scenario: NEEDS_CONTEXT re-dispatch is capped + Given a subagent has returned NEEDS_CONTEXT twice for the same task + When the subagent returns NEEDS_CONTEXT a third time + Then the orchestrator escalates to the user instead of re-dispatching + And reports what context was requested across all three attempts + + Scenario: Orchestrator receives unrecognized status + Given a subagent returns a status code not in the protocol + When the orchestrator parses the response + Then the orchestrator treats the unrecognized status as BLOCKED + And escalates to the user with the raw subagent output + + Scenario: Plan reviewer returns status via JSON + Given the orchestrator dispatches a plan review subagent + When the reviewer completes its review + Then the JSON output includes both "verdict" and "status" fields + And the mapping is: + | verdict | warnings | status | + | approve | 0 | DONE | + | approve | 1+ | 
DONE_WITH_CONCERNS | + | needs-revision | any | DONE_WITH_CONCERNS | +``` + +### Architecture Specification + +**Components affected**: +- Modified: `plugins/agentic-dev-team/agents/orchestrator.md` — add "Subagent Status Protocol" section defining the 4 codes, orchestrator response table, and two output formats +- New file: `plugins/agentic-dev-team/prompts/implementer.md` — full implementer behavioral content + markdown status block +- New file: `plugins/agentic-dev-team/prompts/spec-reviewer.md` — full spec review behavioral content + markdown status block +- New file: `plugins/agentic-dev-team/prompts/quality-reviewer.md` — full quality review behavioral content + markdown status block +- Modified: `plugins/agentic-dev-team/prompts/plan-review-acceptance.md` — add `"status"` field to JSON output +- Modified: `plugins/agentic-dev-team/prompts/plan-review-design.md` — add `"status"` field to JSON output +- Modified: `plugins/agentic-dev-team/prompts/plan-review-ux.md` — add `"status"` field to JSON output +- Modified: `plugins/agentic-dev-team/prompts/plan-review-strategic.md` — add `"status"` field to JSON output +- Modified: `plugins/agentic-dev-team/commands/build.md` — update step 4 to handle NEEDS_CONTEXT and BLOCKED status from subagents +- Modified: `plugins/agentic-dev-team/CLAUDE.md` — update "Multi-Agent Collaboration Protocol" to reference the status protocol +- Modified: `plugins/agentic-dev-team/knowledge/agent-registry.md` — update prompt template entries + +**Interfaces**: + +Markdown status block (new templates): +``` +## Status +**Result**: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED +**Concerns**: [list, if DONE_WITH_CONCERNS] +**Needs**: [specific info needed, if NEEDS_CONTEXT] +**Blocker**: [description, if BLOCKED] +``` + +JSON status field (plan review templates — added to existing output): +```json +{ + "reviewer": "plan-review-*", + "verdict": "approve | needs-revision", + "status": "DONE | DONE_WITH_CONCERNS", + ... 
+} +``` + +Orchestrator response table: +| Status | Orchestrator action | +|--------|-------------------| +| DONE | Accept work, proceed | +| DONE_WITH_CONCERNS | Review concerns, decide: accept / re-dispatch with guidance / escalate | +| NEEDS_CONTEXT | Gather info, re-dispatch (max 2 re-dispatches before escalating) | +| BLOCKED | Escalate to user immediately | +| Unrecognized | Treat as BLOCKED, escalate with raw output | + +**Constraints**: +- NEEDS_CONTEXT re-dispatch has a max of 2 attempts — after that, escalate to user +- The two output formats (markdown block and JSON field) are both documented in the orchestrator's status protocol section +- This does not change model routing — status codes are orthogonal to model selection +- Plan review templates keep their existing `"verdict"` field — `"status"` is additive + +**Dependencies**: Orchestrator agent, build command, all subagent prompt templates + +### Acceptance Criteria + +- [ ] Orchestrator agent has a "Subagent Status Protocol" section defining all 4 codes +- [ ] Orchestrator response table maps each status to a concrete action, including unrecognized status +- [ ] `prompts/implementer.md` created with full implementer behavioral content + markdown status block +- [ ] `prompts/spec-reviewer.md` created with full spec review behavioral content + markdown status block +- [ ] `prompts/quality-reviewer.md` created with full quality review behavioral content + markdown status block +- [ ] `prompts/plan-review-acceptance.md` updated with `"status"` JSON field +- [ ] `prompts/plan-review-design.md` updated with `"status"` JSON field +- [ ] `prompts/plan-review-ux.md` updated with `"status"` JSON field +- [ ] `prompts/plan-review-strategic.md` updated with `"status"` JSON field +- [ ] All 7 prompt templates use the documented status format (markdown or JSON as appropriate) +- [ ] `commands/build.md` step 4 handles NEEDS_CONTEXT and BLOCKED +- [ ] CLAUDE.md "Multi-Agent Collaboration Protocol" references the 
status protocol
+- [ ] NEEDS_CONTEXT re-dispatch capped at 2 attempts
+- [ ] Both output formats (markdown block + JSON field) documented in orchestrator
+- [ ] `knowledge/agent-registry.md` prompt template entries updated
+
+---
+
+## Slice F: TDD Skill Depth
+
+### Intent Description
+
+Deepen the TDD skill with two additions: (1) a "Testing Anti-Patterns" supporting reference file covering common mock/test anti-patterns that agents fall into, and (2) a cross-reference from the existing rationalization table to the anti-rationalization knowledge file (from Slice A). The cross-reference augments the existing catch-all line (currently "If you catch yourself composing an excuse not on this list, it's still an excuse") by adding a link to the knowledge file for cross-cutting patterns.
+
+The testing-anti-patterns reference should also be mentioned in the implementer prompt template (created in Slice E) so subagents have access to it during implementation.
+
+### User-Facing Behavior
+
+```gherkin
+Feature: TDD skill depth enhancements
+
+  Scenario: Agent encounters a testing anti-pattern
+    Given an agent is writing tests during the RED phase
+    When the agent writes a test that mocks a dependency without understanding its contract
+    Then the testing-anti-patterns reference identifies this as "mocking without understanding"
+    And the agent rewrites the test to use the real dependency or a properly understood fake
+
+  Scenario: Agent encounters mock-tests-mock anti-pattern
+    Given an agent is writing a test
+    When the test primarily asserts that a mock was called with expected arguments
+    Then the testing-anti-patterns reference identifies this as "testing mock behavior"
+    And the agent rewrites to test observable outcomes instead of call patterns
+
+  Scenario: Agent falls into sunk cost trap
+    Given an agent wrote implementation code before writing a test
+    And the agent has invested significant context in the implementation
+    When the agent realizes it violated TDD
+    Then the sunk cost rationalization entry in the TDD table triggers
+    And the agent deletes the implementation and restarts from RED
+    And does NOT rationalize keeping the code "as a reference"
+
+  Scenario: TDD skill cross-references anti-rationalization knowledge
+    Given the TDD skill's rationalization prevention table exists
+    And the anti-rationalization knowledge file exists
+    When an agent encounters a rationalization not in the TDD table
+    Then the catch-all line directs the agent to the knowledge file for cross-cutting patterns
+    And the catch-all rule still applies: the unlisted excuse is still an excuse
+```
+
+### Architecture Specification
+
+**Components affected**:
+- New file: `plugins/agentic-dev-team/skills/test-driven-development/testing-anti-patterns.md` — supporting reference file
+- Modified: `plugins/agentic-dev-team/skills/test-driven-development/SKILL.md` — augment the existing catch-all line with a cross-reference to `knowledge/anti-rationalization.md`, add "Supporting References" section linking to testing-anti-patterns.md
+- Modified: `plugins/agentic-dev-team/prompts/implementer.md` (created in Slice E) — mention testing-anti-patterns as a loadable reference during RED phase
+
+**Interfaces**: The testing-anti-patterns file is a passive reference loaded on demand during the RED phase when the agent needs guidance on test quality.
+
+**Constraints**:
+- Do NOT duplicate content from the existing rationalization table — add the cross-reference only
+- The existing catch-all line at the end of the rationalization table is augmented to: "If you catch yourself composing an excuse not on this list, it's still an excuse. See also [anti-rationalization patterns](../../knowledge/anti-rationalization.md) for cross-cutting patterns beyond TDD."
+- The testing-anti-patterns file covers test-writing anti-patterns (mock abuse, test-only methods, testing implementation) — not TDD process violations (those are already in the main skill)
+- Keep supporting file under 400 tokens
+- The existing SKILL.md already has the Iron Law, 12-entry rationalization table, red flags, and anti-pattern section. Changes should be minimal additions, not rewrites.
+
+**Dependencies**: Slice A (anti-rationalization knowledge file for cross-reference), Slice E (implementer.md creation)
+
+### Acceptance Criteria
+
+- [ ] `skills/test-driven-development/testing-anti-patterns.md` exists with at least 5 anti-patterns
+- [ ] Anti-patterns cover: testing mock behavior, test-only production methods, mocking without understanding contract, incomplete mocks, integration tests as afterthought
+- [ ] Each anti-pattern has: name, description, why it's harmful, what to do instead
+- [ ] Main `SKILL.md` has a "Supporting References" section linking to testing-anti-patterns.md
+- [ ] Main `SKILL.md` catch-all line augmented with cross-reference to `knowledge/anti-rationalization.md`
+- [ ] `prompts/implementer.md` mentions testing-anti-patterns as a loadable reference
+- [ ] Supporting file is under 400 tokens
+- [ ] No existing content in the main SKILL.md is duplicated or removed
+
+---
+
+## Slice G: Git Worktree Language-Specific Setup
+
+### Intent Description
+
+Enhance the worktree creation workflow so that after a git worktree is created for a subagent, dependency installation and baseline test verification happen before implementation begins. Currently, `isolation: "worktree"` creates a clean worktree but the subagent starts implementing immediately — if dependencies aren't installed, the first test run fails for the wrong reason (missing deps, not missing feature). This slice adds a setup step to the implementer prompt template.
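The file-presence detection this slice describes could be modelled roughly as below. This is a minimal sketch, not the plugin's implementation: the spec places the logic in prompt instructions and a lookup table rather than in code, and the function name `detect_install_command` is hypothetical. The Node.js, Go, Rust, Maven, Gradle, and .NET commands come from the scenarios in this slice; the Python commands and the `npm install` fallback are assumptions, since the spec does not name them.

```python
from typing import Iterable, Optional

# Lock file -> install command, as listed in the Node.js scenario table.
NODE_LOCKFILES = [
    ("package-lock.json", "npm ci"),
    ("yarn.lock", "yarn install"),
    ("pnpm-lock.yaml", "pnpm install"),
    ("bun.lockb", "bun install"),
]

# Indicator file -> command for the other ecosystems named in the scenarios.
SIMPLE_INDICATORS = [
    ("go.mod", "go mod download"),
    ("Cargo.toml", "cargo build"),
    ("pom.xml", "mvn install -DskipTests"),
    ("build.gradle", "gradle build -x test"),
    ("build.gradle.kts", "gradle build -x test"),
    # ASSUMPTION: the scenarios only say "installs dependencies" for Python.
    ("requirements.txt", "pip install -r requirements.txt"),
]


def detect_install_command(root_files: Iterable[str]) -> Optional[str]:
    """Map file names found in the project root to an install command.

    Returns None for an unknown project type, in which case the spec says
    setup is skipped with a warning. Detection is by file presence only.
    """
    files = set(root_files)
    if "package.json" in files:
        for lock_file, command in NODE_LOCKFILES:
            if lock_file in files:
                return command
        # ASSUMPTION: the spec does not cover package.json without a lock file.
        return "npm install"
    for indicator, command in SIMPLE_INDICATORS:
        if indicator in files:
            return command
    if "pyproject.toml" in files:
        # ASSUMPTION: no command is given in the spec for pyproject.toml.
        return "pip install ."
    if any(name.endswith((".csproj", ".sln")) for name in files):
        return "dotnet restore"
    return None
```

A caller would pass the names returned by listing the worktree root, for example `detect_install_command(os.listdir(worktree_path))`, and treat a `None` result as the "unknown project type" case.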
+
+### User-Facing Behavior
+
+```gherkin
+Feature: Git worktree language-specific setup
+
+  Scenario: Worktree setup detects Node.js project
+    Given the orchestrator dispatches a subagent with worktree isolation
+    And the project root contains a package.json
+    When the worktree is created
+    Then the subagent runs the appropriate install command in the worktree
+    And the install command is determined by lock file presence:
+      | lock file | command |
+      | package-lock.json | npm ci |
+      | yarn.lock | yarn install |
+      | pnpm-lock.yaml | pnpm install |
+      | bun.lockb | bun install |
+
+  Scenario: Worktree setup detects Python project
+    Given the orchestrator dispatches a subagent with worktree isolation
+    And the project root contains a requirements.txt or pyproject.toml
+    When the worktree is created
+    Then the subagent installs dependencies in the worktree
+
+  Scenario: Worktree setup detects Go project
+    Given the orchestrator dispatches a subagent with worktree isolation
+    And the project root contains a go.mod
+    When the worktree is created
+    Then the subagent runs go mod download in the worktree
+
+  Scenario: Worktree setup detects Rust project
+    Given the orchestrator dispatches a subagent with worktree isolation
+    And the project root contains a Cargo.toml
+    When the worktree is created
+    Then the subagent runs cargo build in the worktree
+
+  Scenario: Worktree setup detects Java Maven project
+    Given the orchestrator dispatches a subagent with worktree isolation
+    And the project root contains a pom.xml
+    When the worktree is created
+    Then the subagent runs mvn install -DskipTests in the worktree
+
+  Scenario: Worktree setup detects Java Gradle project
+    Given the orchestrator dispatches a subagent with worktree isolation
+    And the project root contains a build.gradle or build.gradle.kts
+    When the worktree is created
+    Then the subagent runs gradle build -x test in the worktree
+
+  Scenario: Worktree setup detects dotnet project
+    Given the orchestrator dispatches a subagent with worktree isolation
+    And the project root contains a .csproj or .sln file
+    When the worktree is created
+    Then the subagent runs dotnet restore in the worktree
+
+  Scenario: Worktree setup runs baseline tests
+    Given the worktree has been created and dependencies installed
+    When the setup step completes
+    Then the subagent runs the project's test suite as a baseline
+    And verifies all existing tests pass before beginning implementation
+    And if baseline tests fail the subagent returns BLOCKED status
+
+  Scenario: Worktree setup for unknown project type
+    Given the orchestrator dispatches a subagent with worktree isolation
+    And no recognized project files are found
+    When the worktree is created
+    Then the subagent skips dependency installation
+    And proceeds directly to implementation with a warning
+
+  Scenario: Dependency installation fails
+    Given the orchestrator dispatches a subagent with worktree isolation
+    And the project root contains a package.json
+    When the worktree is created
+    And the dependency install command fails
+    Then the subagent returns BLOCKED status
+    And includes the install error output in the blocker description
+```
+
+### Architecture Specification
+
+**Components affected**:
+- Modified: `plugins/agentic-dev-team/agents/orchestrator.md` — update Phase 3 "Subagent dispatch" section to reference the worktree setup protocol
+- Modified: `plugins/agentic-dev-team/prompts/implementer.md` (created in Slice E) — add a "Worktree Setup" section at the top of the implementation flow that runs before RED phase
+- New file: `plugins/agentic-dev-team/knowledge/worktree-setup.md` — reference table mapping project indicators to setup commands, loaded by the implementer prompt
+
+**Interfaces**: The worktree setup is a pre-implementation step within the subagent. It is NOT a hook or separate script — it's instructions within the implementer prompt that the subagent follows after the worktree is created but before starting TDD.
+
+**Constraints**:
+- The setup step must be fast — install + baseline test should add minimal overhead
+- If baseline tests fail, the subagent returns `BLOCKED` (from Slice E's status protocol) rather than attempting to fix pre-existing failures
+- If dependency install fails, the subagent returns `BLOCKED` with the error output
+- Language detection uses file presence only (package.json, go.mod, etc.) — no heuristics or LLM judgment
+- The knowledge file is a simple lookup table, not a decision tree
+
+**Dependencies**: Slice E (subagent status codes — BLOCKED status for baseline/install failures, implementer.md creation)
+
+### Acceptance Criteria
+
+- [ ] `knowledge/worktree-setup.md` exists with a detection table mapping project indicators to install + test commands
+- [ ] Detection covers: Node.js (npm/yarn/pnpm/bun), Python, Go, Rust, .NET, Java (Maven/Gradle)
+- [ ] `prompts/implementer.md` includes a "Worktree Setup" section that runs before RED
+- [ ] Setup runs dependency install then baseline test suite
+- [ ] Baseline test failure returns BLOCKED status (not failure, not NEEDS_CONTEXT)
+- [ ] Dependency install failure returns BLOCKED status with error output
+- [ ] Unknown project type skips setup with a warning
+- [ ] `agents/orchestrator.md` Phase 3 references the worktree setup protocol
+- [ ] `knowledge/agent-registry.md` updated with the new knowledge file
+
+---
+
+## Slice H1: Windows Hooks Support
+
+### Intent Description
+
+Enable the plugin's 8 bash hooks to work on Windows by adding a cross-platform shim and fixing platform-specific path issues. On Windows, bash is available via Git for Windows (near-universal on dev machines). The approach: keep all hooks as bash scripts, add a `.cmd` wrapper that locates and delegates to bash (same pattern as superpowers), and fix hardcoded `/tmp/`/`$TMPDIR` references. A Windows prerequisite checker (`install.ps1`) replaces `install.sh` for Windows users.
+
+### User-Facing Behavior
+
+```gherkin
+Feature: Windows hooks support
+
+  Scenario: Hooks execute on Windows via Git for Windows bash
+    Given the plugin is installed on a Windows machine
+    And Git for Windows is installed (providing bash.exe on PATH)
+    When Claude Code triggers a PreToolUse or PostToolUse hook
+    Then the hook command invokes the run-hook.cmd shim
+    And the shim locates bash.exe and delegates to the .sh script
+    And the hook executes successfully with correct output
+
+  Scenario: Shim locates bash from Git for Windows default path
+    Given bash is NOT on the system PATH
+    And Git for Windows is installed at the default location
+    When run-hook.cmd is invoked
+    Then the shim checks "C:\Program Files\Git\bin\bash.exe"
+    And uses it to execute the hook script
+
+  Scenario: Shim locates bash from WSL
+    Given bash is NOT on the system PATH
+    And Git for Windows is NOT installed
+    And WSL is available
+    When run-hook.cmd is invoked
+    Then the shim uses wsl.exe to execute the hook script
+
+  Scenario: Shim fails gracefully when no bash is available
+    Given bash is NOT on the system PATH
+    And Git for Windows is NOT installed
+    And WSL is NOT available
+    When run-hook.cmd is invoked
+    Then the shim exits with an error message explaining bash is required
+    And suggests installing Git for Windows
+
+  Scenario: Hooks use platform-agnostic temp directory
+    Given a hook needs a temporary file
+    When the hook references a temp directory
+    Then it uses ${TMPDIR:-${TEMP:-/tmp}} instead of hardcoded /tmp/
+    And the path resolves correctly on both Unix and Windows
+
+  Scenario: Windows prerequisite checker validates environment
+    Given a user runs install.ps1 on Windows
+    Then the script checks for:
+      | prerequisite | check |
+      | bash | bash.exe on PATH or Git for Windows installed |
+      | jq | jq.exe on PATH |
+      | git | git.exe on PATH |
+    And reports which prerequisites are missing with install instructions
+
+  Scenario: Hooks work unchanged on macOS and Linux
+    Given the plugin is installed on macOS or Linux
+    When Claude Code triggers a hook
+    Then the hook command invokes bash directly as before
+    And the run-hook.cmd shim is not used
+    And no behavior changes from the current implementation
+```
+
+### Architecture Specification
+
+**Components affected**:
+- New file: `plugins/agentic-dev-team/hooks/run-hook.cmd` — Windows shim that locates bash.exe and delegates (~20 lines)
+- New file: `plugins/agentic-dev-team/install.ps1` — Windows prerequisite checker
+- Modified: `plugins/agentic-dev-team/hooks/tdd-guard.sh` — fix `TMPDIR` reference (use `${TMPDIR:-${TEMP:-/tmp}}`)
+- Modified: `plugins/agentic-dev-team/hooks/version-check.sh` — fix hardcoded `/tmp/` path (use `${TMPDIR:-${TEMP:-/tmp}}`)
+- Modified: `plugins/agentic-dev-team/settings.json` — document Windows hook invocation pattern
+
+**Interfaces**: The `run-hook.cmd` shim is invoked by Claude Code on Windows instead of `bash` directly. It takes the hook script path as an argument and passes stdin through. Exit codes are preserved.
+
+**Constraints**:
+- Do NOT rewrite hooks in PowerShell or Node.js — keep bash, add shim
+- Do NOT require WSL — Git for Windows is the primary target
+- All 9 existing hooks must continue working unchanged on macOS/Linux
+- Only 2 existing `.sh` files are modified (TMPDIR fixes)
+- `jq` is a hard dependency — Windows installer must check for it
+
+**Dependencies**: None — independent of all other slices.
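The lookup order the shim is meant to follow (PATH, then the Git for Windows default location, then WSL) and the temp-directory fallback can be modelled in a few lines. This is only an illustrative sketch in Python: the real shim is specified as a `.cmd` file and the hooks stay in bash. The names `locate_bash` and `resolve_temp_dir` are hypothetical, and the `[wsl, "bash"]` launcher form is an assumption, since the spec does not say how `wsl.exe` is invoked.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Mapping, Optional

# Default Git for Windows location named in the shim scenario.
GIT_BASH_DEFAULT = r"C:\Program Files\Git\bin\bash.exe"


def locate_bash(
    which: Callable[[str], Optional[str]] = shutil.which,
    exists: Callable[[str], bool] = lambda p: Path(p).exists(),
) -> Optional[List[str]]:
    """Return the command prefix used to run a hook script, or None.

    Order follows the acceptance criteria: (1) PATH, (2) Git for Windows
    default path, (3) WSL. `which` and `exists` are injectable so the
    order can be checked without touching the real machine.
    """
    on_path = which("bash")
    if on_path:
        return [on_path]
    if exists(GIT_BASH_DEFAULT):
        return [GIT_BASH_DEFAULT]
    wsl = which("wsl")
    if wsl:
        # ASSUMPTION: the hook is launched as `wsl bash <script>`.
        return [wsl, "bash"]
    return None  # caller reports that bash is required


def resolve_temp_dir(env: Mapping[str, str]) -> str:
    """Mirror ${TMPDIR:-${TEMP:-/tmp}}: unset or empty values fall through."""
    return env.get("TMPDIR") or env.get("TEMP") or "/tmp"
```

Using `or` rather than a plain dictionary default matters here: bash's `:-` expansion treats an empty variable the same as an unset one, so an empty `TMPDIR` should fall through to `TEMP`.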
+
+### Acceptance Criteria
+
+- [ ] `hooks/run-hook.cmd` exists and locates bash via: (1) PATH, (2) Git for Windows default, (3) WSL
+- [ ] Shim exits with clear error if no bash found
+- [ ] Shim passes stdin, arguments, and exit codes through correctly
+- [ ] `install.ps1` checks for bash, jq, and git on Windows
+- [ ] `install.ps1` provides install instructions for each missing prerequisite
+- [ ] `tdd-guard.sh` uses `${TMPDIR:-${TEMP:-/tmp}}` instead of hardcoded paths
+- [ ] `version-check.sh` uses `${TMPDIR:-${TEMP:-/tmp}}` instead of hardcoded `/tmp/`
+- [ ] All 9 hooks pass on macOS/Linux with no behavior change
+
+---
+
+## Slice H2: Gemini CLI Platform Support
+
+### Intent Description
+
+Add Gemini CLI as a supported platform. Gemini CLI has a native extension system with skills (`SKILL.md` with same frontmatter format), agents, hooks (`hooks/hooks.json`), commands (TOML format), and context files (`GEMINI.md`). Our skills and knowledge files are reusable as-is. The approach: create a `gemini-extension.json` manifest and `GEMINI.md` context file, add TOML commands for key workflows, and document capability limitations (no multi-agent orchestration, no model routing, no tool scoping).
+
+### User-Facing Behavior
+
+```gherkin
+Feature: Gemini CLI platform support
+
+  Scenario: Plugin is discoverable as a Gemini CLI extension
+    Given the plugin repository contains a gemini-extension.json manifest
+    When a user installs the extension in Gemini CLI
+    Then Gemini CLI loads the manifest and discovers the extension
+    And the GEMINI.md context file is loaded into the session
+
+  Scenario: Skills are loaded in Gemini CLI
+    Given the plugin is installed as a Gemini CLI extension
+    When Gemini CLI scans the skills/ directory
+    Then it discovers all SKILL.md files with name and description frontmatter
+    And skills are available for implicit and explicit invocation
+
+  Scenario: Knowledge files are accessible
+    Given the plugin is installed as a Gemini CLI extension
+    When an agent or skill references a knowledge file
+    Then the knowledge file is readable as a standard markdown reference
+
+  Scenario: Hooks are loaded from Gemini CLI hooks format
+    Given the plugin is installed as a Gemini CLI extension
+    And a hooks/hooks-gemini.json file exists
+    When Gemini CLI loads hook configuration
+    Then compatible hooks execute normally
+    And hooks that depend on Claude Code-specific stdin format are skipped with warnings
+
+  Scenario: GEMINI.md provides platform-specific context
+    Given the plugin is installed as a Gemini CLI extension
+    When a session starts
+    Then GEMINI.md is loaded with plugin philosophy, team organization, and skill registry
+    And it does NOT reference Claude Code-specific features
+
+  Scenario: Orchestration degrades to inline execution
+    Given the plugin is installed in Gemini CLI
+    When a multi-agent workflow is triggered
+    Then GEMINI.md instructs inline execution as the fallback
+    And warns that multi-agent orchestration requires Claude Code
+
+  Scenario: Commands are available as Gemini CLI TOML commands
+    Given the plugin is installed as a Gemini CLI extension
+    When Gemini CLI scans the commands-gemini/ directory
+    Then TOML command files are loaded for key workflows
+
+  Scenario: Agents are loaded with unknown frontmatter ignored
+    Given the plugin is installed as a Gemini CLI extension
+    When Gemini CLI scans the agents/ directory
+    Then agent markdown files are discovered
+    And tools: and model: frontmatter fields are ignored without error
+```
+
+### Architecture Specification
+
+**Components affected**:
+- New file: `plugins/agentic-dev-team/gemini-extension.json`
+- New file: `plugins/agentic-dev-team/GEMINI.md`
+- New directory: `plugins/agentic-dev-team/commands-gemini/` — TOML commands for key workflows
+- New file: `plugins/agentic-dev-team/hooks/hooks-gemini.json`
+
+**Reused as-is**: All `skills/*/SKILL.md`, all `knowledge/*.md`, all `prompts/*.md`
+
+**Constraints**:
+- Do NOT fork or duplicate skill files — manifest points to same `skills/` directory
+- GEMINI.md is standalone (not symlink to CLAUDE.md)
+- Keep TOML command set to top 5 most-used commands initially
+- Clearly document capability limitations in GEMINI.md
+
+**Dependencies**: None — independent. Can be implemented in parallel with all other slices.
+
+### Acceptance Criteria
+
+- [ ] `gemini-extension.json` exists with name, version, description, contextFileName
+- [ ] `GEMINI.md` exists without Claude Code-specific feature references
+- [ ] Skills discoverable from existing `skills/` directory
+- [ ] At least 5 TOML commands in `commands-gemini/`
+- [ ] `hooks/hooks-gemini.json` defines compatible hooks
+- [ ] GEMINI.md documents capability limitations vs Claude Code
+- [ ] Existing Claude Code functionality unchanged
+
+---
+
+## Slice H3: OpenAI Codex Platform Support
+
+### Intent Description
+
+Add OpenAI Codex CLI as a supported platform. Codex uses `AGENTS.md` (hierarchical markdown discovery), `SKILL.md` with same frontmatter (skills in `.agents/skills/`), `config.toml`, `hooks.json`, and supports subagent dispatch (explicit request required). The approach: create `AGENTS.md`, `.codex/` config directory, and an installation guide that explains skill discovery setup.
+
+### User-Facing Behavior
+
+```gherkin
+Feature: OpenAI Codex CLI platform support
+
+  Scenario: Plugin provides AGENTS.md for Codex
+    Given the plugin repository contains an AGENTS.md at the root
+    When Codex CLI starts in a project using this plugin
+    Then AGENTS.md is loaded as project-level instructions
+
+  Scenario: Skills are discoverable by Codex
+    Given the plugin is installed
+    And .agents/skills/ points to the plugin's skills
+    When Codex scans for skills
+    Then it discovers all SKILL.md files
+
+  Scenario: Codex configuration is provided
+    Given the plugin includes .codex/config.toml
+    When Codex reads project configuration
+    Then hooks are enabled and defaults are set
+
+  Scenario: Hooks are available in Codex format
+    Given the plugin includes .codex/hooks.json
+    When Codex loads lifecycle hooks
+    Then compatible hooks execute on appropriate events
+
+  Scenario: Subagent workflows require explicit request
+    Given the plugin is installed in Codex
+    When a multi-agent workflow is triggered
+    Then AGENTS.md instructs that subagent dispatch requires explicit user request
+    And provides Codex-specific guidance on subagent invocation
+
+  Scenario: Knowledge files are accessible
+    Given the plugin is installed in Codex
+    When a skill references a knowledge file
+    Then the file is readable as standard markdown
+
+  Scenario: AGENTS.md documents capability limitations
+    Given AGENTS.md is loaded by Codex
+    Then it states which features require Claude Code for full capability
+
+  Scenario: Installation guide exists
+    Given a user wants to install for Codex
+    When they read CODEX-INSTALL.md
+    Then it explains skill symlinks, AGENTS.md placement, and config.toml setup
+
+  Scenario: AGENTS.md fits within Codex size limit
+    Given AGENTS.md is loaded by Codex
+    Then its size is within the 32 KiB default project_doc_max_bytes limit
+```
+
+### Architecture Specification
+
+**Components affected**:
+- New file: `plugins/agentic-dev-team/AGENTS.md`
+- New directory: `plugins/agentic-dev-team/.codex/`
+- New file: `plugins/agentic-dev-team/.codex/config.toml`
+- New file: `plugins/agentic-dev-team/.codex/hooks.json`
+- New file: `plugins/agentic-dev-team/CODEX-INSTALL.md`
+
+**Reused as-is**: All `skills/*/SKILL.md`, all `knowledge/*.md`, all `prompts/*.md`
+
+**Constraints**:
+- Do NOT fork or duplicate skill files
+- AGENTS.md is standalone, adapted from CLAUDE.md
+- Keep .codex/config.toml minimal
+- AGENTS.md within 32 KiB limit
+- Clearly document capability limitations
+
+**Dependencies**: None — independent. Can be implemented in parallel.
+
+### Acceptance Criteria
+
+- [ ] `AGENTS.md` exists without Claude Code-specific feature references
+- [ ] `AGENTS.md` within 32 KiB
+- [ ] `.codex/config.toml` exists with hooks enabled
+- [ ] `.codex/hooks.json` defines compatible hooks
+- [ ] `CODEX-INSTALL.md` covers skill discovery, AGENTS.md, and config setup
+- [ ] Skills discoverable by Codex from standard scan path
+- [ ] AGENTS.md documents capability limitations vs Claude Code
+- [ ] Existing Claude Code functionality unchanged
+
+---
+
+## Consistency Gate
+
+### Slices A-G (core implementation)
+- [x] Intent is unambiguous — two developers would interpret each slice the same way
+- [x] Every behavior in each intent has at least one corresponding BDD scenario
+- [x] Architecture specification constrains implementation to what the intent requires, without over-engineering
+- [x] Terminology consistent across all artifacts ("anti-rationalization" is canonical)
+- [x] No contradictions between artifacts within any slice
+- [x] No contradictions between slices
+- [x] Dependency chain is acyclic (A→F, E→G, E→F implementer ref)
+- [x] Merge targets identified (agent-registry.md, CLAUDE.md)
+- [x] Both status output formats documented (markdown block + JSON field)
+- [x] Negative/edge/error cases covered (ambiguous feedback, malformed fixtures, unrecognized status, install failure, unknown project type)
+
+### Slices H1-H3 (platform support)
+- [x] Each slice is independent — no cross-dependencies between H1, H2, H3
+- [x] All three reuse existing skills/knowledge without forking
+- [x] Degradation strategy consistent: document limitations, suggest Claude Code for full capability
+- [x] H1 Windows hooks: negative case (no bash), edge case (WSL fallback), no-regression on Unix
+- [x] H2 Gemini: unknown frontmatter handled, hook format mismatch handled
+- [x] H3 Codex: size limit addressed, install guide covers discovery setup
+- [x] No contradictions with slices A-G
diff --git a/reports/competitive-analysis-2026-04-15.md b/reports/competitive-analysis-2026-04-15.md
new file mode 100644
index 0000000..a809ce3
--- /dev/null
+++ b/reports/competitive-analysis-2026-04-15.md
@@ -0,0 +1,251 @@
+# Competitive Analysis: agentic-dev-team vs superpowers
+
+**Date**: 2026-04-15
+**Target**: [obra/superpowers](https://github.com/obra/superpowers) (v5.0.7, 431 commits, MIT license)
+**Source type**: URL
+**Author of target**: Jesse Vincent
+
+## Executive Summary
+
+superpowers is a focused, workflow-discipline plugin targeting 6 agent platforms (Claude Code, Cursor, Codex, OpenCode, Gemini CLI, GitHub Copilot CLI). It has 14 skills, 1 agent, and 3 deprecated commands. agentic-dev-team is broader (60+ capabilities across all SDLC phases) but narrower in platform support (Claude Code only). The key finding: superpowers excels at **depth per skill** — anti-rationalization techniques, pressure-tested documentation, and a novel "TDD for skills" authoring methodology — while agentic-dev-team excels at **breadth of coverage** across the full development lifecycle. There are 5 actionable gaps where superpowers does something we don't or does it better.
+
+## Capability Comparison
+
+### Workflow / Orchestration
+
+| Capability | agentic-dev-team | superpowers | Classification |
+|-----------|-----------------|----------|----------------|
+| Design-before-code gate | `/specs` + `/design-doc` + `/design-interrogation` | `brainstorming` skill with visual companion + spec reviewer subagent | Different approach |
+| Implementation planning | `/plan` command | `writing-plans` skill with 2-5 min task decomposition | Different approach |
+| Subagent orchestration | Orchestrator agent with model routing table | `subagent-driven-development` skill with 4 status codes (DONE/DONE_WITH_CONCERNS/NEEDS_CONTEXT/BLOCKED) | Different approach |
+| Plan execution | `/build` command with TDD | `executing-plans` (inline) + `subagent-driven-development` (parallel) | Different approach |
+| Code review | `/code-review` with 19 specialized review agents + static analysis pre-pass | Single `code-reviewer` agent with 2-stage subagent review (spec then quality) | Stronger |
+| Branch completion | `/pr` command with quality gates | `finishing-a-development-branch` with 4 options (merge/push+PR/keep/discard) | Stronger |
+| Session continuity | `/continue` with memory-based phase progress files | None | Stronger |
+| Git worktree workflow | `isolation: "worktree"` on subagent calls | Dedicated `using-git-worktrees` skill with language-specific setup (npm/cargo/pip/go) | Weaker |
+| Visual design companion | None | `visual-companion.md` — browser-based mockup server with HTML hot-reload and JSON event recording | Missing |
+
+### Discipline / Behavioral Constraints
+
+| Capability | agentic-dev-team | superpowers | Classification |
+|-----------|-----------------|----------|----------------|
+| TDD enforcement | `test-driven-development` skill with RED-GREEN-REFACTOR | `test-driven-development` skill with "Iron Law" (delete code written before tests), 13 red flags, rationalization table, testing anti-patterns reference | Weaker |
+| Verification before completion | `quality-gate-pipeline` skill (3-phase) | `verification-before-completion` skill citing "24 failure memories" as motivation | Different approach |
+| Code review reception | No equivalent | `receiving-code-review` — forbids performative agreement, requires technical verification before implementing suggestions, mandates pushback when feedback is wrong | Missing |
+| Anti-rationalization techniques | None | Embedded across skills — Cialdini-cited rationalization tables, pressure scenarios, explicit "this is what rationalization sounds like" examples | Missing |
+| Destructive command protection | `/careful`, `/freeze`, `/guard` commands | None | Stronger |
+
+### Review / Quality
+
+| Capability | agentic-dev-team | superpowers | Classification |
+|-----------|-----------------|----------|----------------|
+| Specialized review agents | 19 agents (security, a11y, arch, domain, naming, complexity, concurrency, etc.) | 1 code-reviewer agent | Stronger |
+| Static analysis integration | Semgrep + ESLint pre-pass via `/semgrep-analyze` | None | Stronger |
+| Mutation testing | `/mutation-testing` with Stryker/pitest/mutmut | None | Stronger |
+| Test design quality scoring | Test Design Reviewer with Farley Score | None | Stronger |
+| Review agent eval framework | `/agent-eval` with fixtures and grading | None | Stronger |
+
+### Architecture / Design
+
+| Capability | agentic-dev-team | superpowers | Classification |
+|-----------|-----------------|----------|----------------|
+| Domain-driven design | DDD skill + domain analysis + domain-review agent | None | Stronger |
+| Hexagonal architecture | Dedicated skill + arch-review agent | None | Stronger |
+| Threat modeling / STRIDE | Dedicated skill + security-engineer agent | None | Stronger |
+| API design | Contract-first skill | None | Stronger |
+| Design alternatives | `/design-it-twice` | None | Stronger |
+
+### Infrastructure / DevOps
+
+| Capability | agentic-dev-team | superpowers | Classification |
+|-----------|-----------------|----------|----------------|
+| Docker image creation/audit | Two dedicated skills | None | Stronger |
+| CI debugging | Dedicated skill | None | Stronger |
+| Performance benchmarking | `/benchmark` with Core Web Vitals | None | Stronger |
+
+### Meta / Authoring
+
+| Capability | agentic-dev-team | superpowers | Classification |
+|-----------|-----------------|----------|----------------|
+| Skill/agent authoring guide | `agent-skill-authoring` skill + `/agent-add` scaffold | `writing-skills` skill — TDD for docs, Claude Search Optimization (CSO), pressure scenario methodology, rationalization bulletproofing | Weaker |
+| Plugin self-audit | `/agent-audit` + `/harness-audit` | None | Stronger |
+| Skill triggering methodology | Skill descriptions in command frontmatter | CSO guidance — key finding: "summarizing workflow in descriptions causes Claude to skip reading the actual skill content" | Weaker |
+
+### Platform Support
+
+| Capability | agentic-dev-team | superpowers | Classification |
+|-----------|-----------------|----------|----------------|
+| Claude Code | Full support | Full support | — |
+| Cursor | None | Full adapter + hooks-cursor.json | Missing |
+| Codex / OpenAI | None | Full adapter + AGENTS.md + INSTALL.md | Missing |
+| OpenCode | None | Full adapter + INSTALL.md | Missing |
+| Gemini CLI | None | Extension manifest + GEMINI.md | Missing |
+| GitHub Copilot CLI | None | Supported | Missing |
+| Windows | Assumed (no special handling) | Explicit Windows support with run-hook.cmd wrapper | Weaker |
+
+### Debugging
+
+| Capability | agentic-dev-team | superpowers | Classification |
+|-----------|-----------------|----------|----------------|
+| Systematic debugging | `systematic-debugging` skill (4-phase) | `systematic-debugging` skill with 6 supporting files: root-cause-tracing, defense-in-depth, condition-based-waiting (with example), find-polluter.sh | Weaker |
+
+## Gap Specs
+
+### Gap: Anti-Rationalization Techniques
+
+**Classification**: Missing
+**Layer**: Skill / Knowledge
+**Priority**: High
+
+**What superpowers does**:
+Embeds anti-rationalization tables throughout skills — explicit examples of what rationalization sounds like ("I'll just write this small helper first", "The test is basically the same as..."), Cialdini-cited persuasion patterns, and pressure scenarios that test whether skills hold up under common LLM failure modes. This is their core innovation — treating LLM behavioral drift as a first-class problem.
+
+**Proposed addition**:
+- **Type**: knowledge file + skill enhancement
+- **File**: `knowledge/anti-rationalization.md` + updates to `skills/test-driven-development/SKILL.md` and `skills/quality-gate-pipeline/SKILL.md`
+- **Description**: Create a shared knowledge file of rationalization patterns specific to LLM agents (skipping tests, claiming completion without verification, writing implementation before tests, expanding scope). Embed "this is what rationalization sounds like" examples in the TDD and Quality Gate skills. Add a "pressure scenarios" section to skill authoring guidance.
+- **Dependencies**: TDD skill, Quality Gate Pipeline skill, Agent & Skill Authoring skill
+- **Estimated complexity**: Small
+- **Model tier**: N/A (documentation only)
+
+### Gap: Code Review Reception Discipline
+
+**Classification**: Missing
+**Layer**: Skill
+**Priority**: High
+
+**What superpowers does**:
+The `receiving-code-review` skill explicitly forbids performative agreement ("You're absolutely right!"), requires the agent to technically verify suggestions before implementing them, mandates pushback with reasoning when feedback is wrong, and includes a YAGNI check. This addresses a known LLM failure mode: blindly accepting all review feedback without critical evaluation.
+
+**Proposed addition**:
+- **Type**: skill
+- **File**: `skills/receiving-code-review/SKILL.md`
+- **Description**: Define behavioral constraints for how agents respond to code review findings (from `/code-review` or human feedback). Require technical verification before implementing suggestions. Forbid performative agreement. Mandate reasoned pushback when a suggestion would make the code worse. Include a YAGNI gate to prevent gold-plating in response to reviews.
+- **Dependencies**: `/apply-fixes` command, Quality Gate Pipeline
+- **Estimated complexity**: Small
+- **Model tier**: N/A (behavioral constraint, loaded into any agent receiving review)
+
+### Gap: Skill Authoring — TDD for Docs & Claude Search Optimization
+
+**Classification**: Weaker
+**Layer**: Skill
+**Priority**: Medium
+
+**What superpowers does**:
+The `writing-skills` skill applies TDD to skill documentation: write pressure scenarios, test whether the skill holds up under adversarial conditions, iterate. It also documents a key finding called "Claude Search Optimization" (CSO) — when a skill description summarizes the workflow, Claude may follow the description instead of reading the full skill content. This means descriptions should state *when* to use the skill, not *how* it works.
+
+**What we have now**:
+`agent-skill-authoring` skill covers structure, frontmatter format, and registration. It lacks pressure testing methodology and has no guidance on description optimization for skill discovery.
+
+**Proposed addition**:
+- **Type**: skill enhancement
+- **File**: Update `skills/agent-skill-authoring/SKILL.md`
+- **Description**: Add two sections: (1) "Pressure Testing" — how to write adversarial scenarios that probe whether a skill's instructions hold up under common LLM drift patterns, (2) "Description Optimization" — guidance that skill descriptions should specify *when* to trigger, not *how* the skill works, to prevent Claude from using the description as a shortcut. Reference superpowers' CSO finding.
+- **Dependencies**: Agent & Skill Authoring skill
+- **Estimated complexity**: Small
+- **Model tier**: N/A (documentation)
+
+### Gap: Systematic Debugging — Supporting Reference Files
+
+**Classification**: Weaker
+**Layer**: Skill + Knowledge
+**Priority**: Medium
+
+**What superpowers does**:
+Their systematic-debugging skill includes 6 supporting files beyond the main SKILL.md: root-cause-tracing (backward tracing through call chains), defense-in-depth (4-layer validation pattern), condition-based-waiting (replace arbitrary timeouts with condition polling — claims "pass rate: 60% → 100%"), find-polluter.sh (shell script for bisecting test pollution), plus TypeScript examples.
+
+**What we have now**:
+Our `systematic-debugging` skill covers the 4-phase process (reproduce, investigate, root-cause, fix) but has no supporting reference files with concrete techniques.
+
+**Proposed addition**:
+- **Type**: knowledge files within the skill directory
+- **File**: `skills/systematic-debugging/root-cause-tracing.md`, `skills/systematic-debugging/condition-based-waiting.md`, `skills/systematic-debugging/find-polluter.sh`
+- **Description**: Add supporting reference files that agents can load on demand during debugging. Root-cause tracing: backward call-chain analysis technique. Condition-based waiting: replace arbitrary sleep/timeout in tests with polling. Test polluter finder: bisection script for identifying test pollution sources.
+- **Dependencies**: Systematic Debugging skill
+- **Estimated complexity**: Small
+- **Model tier**: N/A (reference files loaded by debugging agents)
+
+### Gap: Git Worktree — Language-Specific Setup
+
+**Classification**: Weaker
+**Layer**: Skill
+**Priority**: Low
+
+**What superpowers does**:
+The `using-git-worktrees` skill includes auto-detection of worktree directories, gitignore safety checks, and language-specific setup commands (npm install, cargo build, pip install, go mod download) plus baseline test verification after setup.
+
+**What we have now**:
+We use `isolation: "worktree"` on subagent calls, which creates the worktree. But there's no language-specific dependency installation or baseline verification step.
+
+**Proposed addition**:
+- **Type**: skill enhancement or hook
+- **File**: Update worktree handling in orchestrator or add `hooks/post-worktree-setup.sh`
+- **Description**: After creating a worktree, detect the language/framework and run dependency installation (npm ci, cargo build, pip install -r requirements.txt, go mod download). Run the baseline test suite to verify the worktree is healthy before dispatching implementation work.
+- **Dependencies**: Orchestrator worktree dispatch
+- **Estimated complexity**: Medium
+- **Model tier**: N/A (shell script / orchestrator logic)
+
+## Different Approaches Worth Examining
+
+### Design-Before-Code: `/specs` + `/design-doc` vs `brainstorming`
+
+Both plugins enforce design before code, but with different structures:
+- **superpowers**: Single `brainstorming` skill that flows from questions → approach proposals → spec document → review. Includes a visual companion (browser-based mockup server with hot-reload) for UI work. More opinionated: the conversation IS the design artifact.
+- **agentic-dev-team**: Separates concerns — `/specs` produces 4 formal artifacts (intent, BDD scenarios, architecture notes, acceptance criteria), `/design-doc` produces a written document, `/design-interrogation` stress-tests the design. More structured, more artifacts, more review gates. + +**Tradeoff**: superpowers' approach is lighter-weight and faster for small features. Our approach is more thorough for complex features but adds overhead. The visual companion is genuinely useful for UI work and has no equivalent in our plugin — worth considering as an independent addition. + +### Subagent Orchestration: Orchestrator Agent vs Skill-Based + +- **superpowers**: `subagent-driven-development` defines the dispatch protocol inline within the skill. Uses 4 status codes (DONE, DONE_WITH_CONCERNS, NEEDS_CONTEXT, BLOCKED) for subagent reporting. Model selection is "cheapest model that works." +- **agentic-dev-team**: Dedicated orchestrator agent with a model routing table, phase-based context management, and multiple prompt templates. + +**Tradeoff**: superpowers' approach is simpler and more portable (works across platforms). Our approach is more sophisticated — model routing per agent, context budgets, phase transitions. The 4-status-code pattern from superpowers is worth adopting regardless: NEEDS_CONTEXT and BLOCKED are clearer than our current subagent error handling. + +### Platform Strategy: Multi-Platform vs Claude Code Native + +- **superpowers**: Explicitly targets 6 platforms with dedicated adapters. Skills are written to be platform-agnostic. +- **agentic-dev-team**: Deep Claude Code integration using platform-specific features (Agent tool with model override, hooks, plugin manifest). + +**Tradeoff**: Multi-platform reach vs platform depth. Our hooks, review agent fleet, and model routing depend on Claude Code's Agent tool and plugin system. Porting these to Cursor or Codex would require significant adaptation. 
However, skills and knowledge files are largely platform-agnostic already. If multi-platform support becomes a priority, the path would be: (1) extract platform-dependent features into adapter layers, (2) add platform manifests, (3) degrade gracefully on platforms without subagent support (similar to superpowers' `executing-plans` fallback). + +## Our Strengths + +Areas where agentic-dev-team is clearly ahead: + +1. **Review depth**: 19 specialized review agents vs 1 generic code-reviewer. Our agents catch domain-specific issues (security, a11y, concurrency, architecture, naming) that a single reviewer cannot. +2. **Static analysis integration**: Semgrep pre-pass deduplicates findings before AI agents run, reducing cost and improving signal. +3. **Architecture & design skills**: DDD, hexagonal architecture, threat modeling, API design, design-it-twice — superpowers has none of these. +4. **Infrastructure tooling**: Docker creation/audit, CI debugging, performance benchmarking — entirely absent from superpowers. +5. **Session continuity**: `/continue` with memory-based phase progress files allows multi-session work. superpowers has no equivalent. +6. **Destructive command protection**: `/careful`, `/freeze`, `/guard` provide safety rails that superpowers lacks. +7. **Metrics & governance**: Cost tracking, hallucination logging, audit trails, compliance procedures. +8. **Agent eval framework**: `/agent-eval` with fixtures and grading ensures review agents maintain accuracy over time. +9. **Language-specific templates**: 9 agent templates scaffolded per-project by `/setup` (TypeScript, Python, Go, C#, React, Angular, etc.). +10. **Bug triage workflow**: `/triage` investigates bugs and files GitHub issues with TDD fix plans. + +## Top 5 Priorities + +| Rank | Gap | Layer | Complexity | Why | +|------|-----|-------|-----------|-----| +| 1 | Anti-rationalization techniques | Knowledge + Skill | Small | Addresses the #1 LLM failure mode (behavioral drift). 
superpowers' core innovation. Low effort, high impact across all agents. | +| 2 | Code review reception discipline | Skill | Small | Prevents agents from blindly accepting bad feedback — a known failure mode we currently don't guard against. Quick win. | +| 3 | Skill authoring — pressure testing & CSO | Skill enhancement | Small | Improves quality of every future skill we write. The CSO finding about descriptions is immediately actionable. | +| 4 | Systematic debugging supporting files | Knowledge | Small | Concrete techniques (root-cause tracing, condition-based waiting, polluter finder) make our debugging skill actionable rather than procedural. | +| 5 | Subagent status codes (NEEDS_CONTEXT / BLOCKED) | Orchestrator | Medium | Clearer subagent reporting improves orchestrator decision-making. Currently subagent errors are less structured. | + +## Next Steps + +**Quick wins (can implement now):** +1. Create `knowledge/anti-rationalization.md` and embed rationalization examples in TDD and Quality Gate skills +2. Create `skills/receiving-code-review/SKILL.md` with behavioral constraints for review reception +3. Add pressure testing and CSO guidance to `skills/agent-skill-authoring/SKILL.md` + +**Medium-term:** +4. Add supporting reference files to the systematic-debugging skill directory +5. Adopt 4-status-code pattern (DONE/DONE_WITH_CONCERNS/NEEDS_CONTEXT/BLOCKED) for subagent reporting in the orchestrator +6. Evaluate the visual companion concept for UI-heavy brainstorming sessions + +**Research needed:** +7. Multi-platform support: audit which of our capabilities are platform-dependent vs platform-agnostic. If demand exists, design an adapter layer. +8. Review superpowers' `find-polluter.sh` script for test pollution bisection — could be valuable as a hook or standalone tool. 
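
The test-pollution bisection mentioned in item 8 can be illustrated with a short sketch. This is not superpowers' `find-polluter.sh`; it is a generic Python illustration that assumes exactly one polluting test and deterministic runs. The names `find_polluter` and `victim_fails_after` are hypothetical.

```python
from typing import Callable, Optional, Sequence


def find_polluter(
    candidates: Sequence[str],
    victim_fails_after: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Bisect `candidates` for the single test that makes the victim fail.

    `victim_fails_after(tests)` is a hypothetical callback: it should run
    `tests` followed by the victim test and return True if the victim fails.
    Assumes exactly one polluter and deterministic test runs.
    """
    if not candidates or not victim_fails_after(candidates):
        return None  # nothing to bisect, or the failure does not reproduce
    remaining = list(candidates)
    while len(remaining) > 1:
        mid = len(remaining) // 2
        first_half = remaining[:mid]
        # Keep whichever half still reproduces the failure.
        remaining = first_half if victim_fails_after(first_half) else remaining[mid:]
    return remaining[0]
```

In practice the callback would shell out to the project's test runner. Each round halves the candidate list, so the number of runs grows logarithmically with the number of tests.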
From d8c9e224022007ba17583255a6b713b0b8382783 Mon Sep 17 00:00:00 2001 From: Bryan Finster Date: Thu, 16 Apr 2026 09:44:14 -0700 Subject: [PATCH 02/15] docs: approved implementation plan for superpowers gap closure 18-step plan covering 10 slices (A-G core improvements, H1-H3 platform support). All four plan review personas approved. Acceptance critic blockers resolved: binary-verifiable criteria, parser validation commands, deterministic DONE_WITH_CONCERNS branches. --- plans/superpowers-gap-closure.md | 267 +++++++++++++++++++++++++++++++ 1 file changed, 267 insertions(+) create mode 100644 plans/superpowers-gap-closure.md diff --git a/plans/superpowers-gap-closure.md b/plans/superpowers-gap-closure.md new file mode 100644 index 0000000..91963b8 --- /dev/null +++ b/plans/superpowers-gap-closure.md @@ -0,0 +1,267 @@ +# Plan: Superpowers Gap Closure + +**Created**: 2026-04-16 +**Branch**: superpowers +**Status**: approved + +## Goal + +Implement all 10 slices from the superpowers gap closure spec: 7 core improvements (A-G) that close gaps identified in the competitive analysis against obra/superpowers, plus 3 platform support slices (H1-H3) for Windows hooks, Gemini CLI, and OpenAI Codex. All changes are documentation/configuration — markdown skill files, knowledge files, prompt templates, manifest files, and shell scripts. No application code is being written; "tests" are acceptance criteria checks (file exists, token budget met, cross-references valid). + +## Acceptance Criteria + +- [ ] All files listed in the spec's acceptance criteria exist and meet their requirements +- [ ] No regression: all 9 existing hooks produce the same exit codes and stdout on macOS/Linux as before changes (verified by running each hook). All existing slash commands in CLAUDE.md remain present with valid frontmatter (verified by grep for `---` blocks in `commands/*.md`). 
+- [ ] Token budgets respected (estimated via word count × 1.35): anti-rationalization < 600, receiving-code-review < 500, agent-skill-authoring < 1200, testing-anti-patterns < 400, each debugging reference < 400. Files within 10% of limit require a second check. +- [ ] Terminology consistent: `grep -r "rationalization prevention\|rationalization bulletproofing"` returns zero matches across all new/modified files (knowledge/anti-rationalization.md, quality-gate-pipeline/SKILL.md, test-driven-development/SKILL.md, prompts/implementer.md, CLAUDE.md) +- [ ] Cross-references resolve: all relative markdown links in new/modified files resolve from the file's own directory (verified at Step 18; broken links fixed before PR) +- [ ] `knowledge/agent-registry.md` updated with all new files +- [ ] `CLAUDE.md` quick reference updated with new skill count (31→32: +receiving-code-review) and skills-by-phase table +- [ ] `gemini-extension.json` contains name (string), version (string), description (string), contextFileName (string) — verified by `python3 -m json.tool gemini-extension.json` +- [ ] `.codex/config.toml` parses without error — verified by `python3 -c "import tomllib; tomllib.load(open('.codex/config.toml','rb'))"` +- [ ] GEMINI.md and AGENTS.md skill registries match CLAUDE.md skill count (maintenance gate for context file drift) +- [ ] Windows hooks shim: (1) `run-hook.cmd` passes stdin bytes to bash script (echo fixture test), (2) propagates exit codes 0, 1, 2 from bash script, (3) when no bash found, exits code 1 with message containing "bash" and "Git for Windows" + +## User-Facing Behavior + +See `docs/specs/superpowers-gap-closure.md` for the full Gherkin scenarios (53 scenarios across 10 slices). The scenarios are the behavioral contracts. + +## Steps + +Steps are ordered by dependency: independent slices first, then dependent chains, then shared-file updates last. 
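
The token-budget criterion in the acceptance list can be checked mechanically. Below is a minimal sketch, assuming tokens are estimated as whitespace-separated word count × 1.35 (the factor the criterion states) and that "within 10% of limit" means at or above 90% of the budget. The helper names `estimate_tokens` and `check_budget` are hypothetical and not part of the plugin.

```python
import math

TOKENS_PER_WORD = 1.35  # estimation factor stated in the acceptance criteria


def estimate_tokens(text: str) -> int:
    """Estimate tokens as whitespace-separated word count x 1.35, rounded up."""
    return math.ceil(len(text.split()) * TOKENS_PER_WORD)


def check_budget(text: str, limit: int) -> str:
    """Return 'over', 'recheck' (within 10% of the limit), or 'ok'."""
    tokens = estimate_tokens(text)
    if tokens >= limit:
        return "over"  # budgets are strict upper bounds ("< 600")
    if tokens >= 0.9 * limit:
        return "recheck"  # close to the limit: the plan asks for a second check
    return "ok"
```

A caller would pass the file contents and its budget, for example the text of `knowledge/anti-rationalization.md` with a limit of 600.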
+ +--- + +### Step 1: Create anti-rationalization knowledge file (Slice A — part 1) + +**Complexity**: standard +**RED**: Verify `knowledge/anti-rationalization.md` does not exist +**GREEN**: Create the knowledge file with 5+ categories, catch-all rule, cross-references to TDD and debugging tables +**REFACTOR**: Verify under 600 tokens, terminology is canonical +**Files**: `plugins/agentic-dev-team/knowledge/anti-rationalization.md` +**Commit**: `feat: add anti-rationalization knowledge file with cross-cutting patterns` + +### Step 2: Update Quality Gate Pipeline with anti-rationalization references (Slice A — part 2) + +**Complexity**: standard +**RED**: Verify `quality-gate-pipeline/SKILL.md` Phase 1 and Phase 2 do not reference anti-rationalization knowledge +**GREEN**: Add reference to existing Phase 2 "Red Flag Language" block and Phase 1 "Hallucination Detection Signals" +**REFACTOR**: Verify no new sections created — references added to existing blocks only +**Files**: `plugins/agentic-dev-team/skills/quality-gate-pipeline/SKILL.md` +**Commit**: `feat: add anti-rationalization references to quality gate pipeline` + +### Step 3: Create receiving-code-review skill (Slice B) + +**Complexity**: standard +**RED**: Verify `skills/receiving-code-review/SKILL.md` does not exist +**GREEN**: Create skill with frontmatter, banned-phrases list, verification-before-implementation gate, YAGNI gate, rationalization table, human vs agent authority distinction, ambiguous finding guidance, concise tone guidance +**REFACTOR**: Verify under 500 tokens, no conflict with Quality Gate Pipeline Phase 3 +**Files**: `plugins/agentic-dev-team/skills/receiving-code-review/SKILL.md` +**Commit**: `feat: add receiving-code-review skill for review reception discipline` + +### Step 4: Enhance skill authoring with pressure testing and CSO (Slice C — part 1) + +**Complexity**: standard +**RED**: Verify current "Apply TDD to skill-writing itself" section lacks pressure testing procedure; 
verify "Optimize skill descriptions for triggering" lacks a checklist +**GREEN**: Expand both sections in place: pressure testing procedure (4 steps + 3 example scenarios), CSO checklist with pass/fail criteria and 4 description examples (2 good, 2 bad) +**REFACTOR**: Verify total file stays under 1,200 tokens; no parallel sections created +**Files**: `plugins/agentic-dev-team/skills/agent-skill-authoring/SKILL.md` +**Commit**: `feat: add pressure testing and CSO checklist to skill authoring guide` + +### Step 5: Update agent-eval for pressure scenario fixtures (Slice C — part 2) + +**Complexity**: standard +**RED**: Verify `commands/agent-eval.md` does not mention pressure scenarios +**GREEN**: Add section documenting pressure scenario fixture format and `evals/pressure/` directory. Define fixture schema: skill name, adversarial condition, expected behavior, pass/fail criteria. Add malformed fixture error handling. +**REFACTOR**: Verify eval command description is updated consistently +**Files**: `plugins/agentic-dev-team/commands/agent-eval.md` +**Commit**: `feat: add pressure scenario fixture support to agent-eval` + +### Step 6: Create debugging supporting files (Slice D) + +**Complexity**: standard +**RED**: Verify `skills/systematic-debugging/` contains only SKILL.md +**GREEN**: Create three supporting files: `root-cause-tracing.md` (backward call-chain analysis), `condition-based-waiting.md` (polling pattern replacing arbitrary waits), `find-polluter.md` (language-agnostic bisection algorithm). Add "Supporting References" section to Phase 2 of main SKILL.md with when-to-load guidance. 
+**REFACTOR**: Verify each file under 400 tokens; existing 4-phase process unchanged; find-polluter is language-agnostic +**Files**: `plugins/agentic-dev-team/skills/systematic-debugging/root-cause-tracing.md`, `plugins/agentic-dev-team/skills/systematic-debugging/condition-based-waiting.md`, `plugins/agentic-dev-team/skills/systematic-debugging/find-polluter.md`, `plugins/agentic-dev-team/skills/systematic-debugging/SKILL.md` +**Commit**: `feat: add debugging supporting references (root-cause tracing, condition-based waiting, find-polluter)` + +### Step 7: Create implementer prompt template (Slice E — part 1) + +**Complexity**: complex +**RED**: Verify `prompts/implementer.md` does not exist +**GREEN**: Create full implementer behavioral content: pre-implementation Q&A, TDD enforcement (reference TDD skill), self-review, verification evidence, and markdown status block (DONE/DONE_WITH_CONCERNS/NEEDS_CONTEXT/BLOCKED) +**REFACTOR**: Verify consistent with build.md step 4 expectations and orchestrator references +**Files**: `plugins/agentic-dev-team/prompts/implementer.md` +**Commit**: `feat: create implementer prompt template with status protocol` + +### Step 8: Create spec-reviewer and quality-reviewer prompt templates (Slice E — part 2) + +**Complexity**: complex +**RED**: Verify `prompts/spec-reviewer.md` and `prompts/quality-reviewer.md` do not exist +**GREEN**: Create both templates with full behavioral content and markdown status blocks. Spec-reviewer: skeptical of implementer claims, reads actual code, binary spec compliance check. Quality-reviewer: uses code-reviewer agent patterns, checks quality after spec compliance passes. 
+**REFACTOR**: Verify both use the same status block format as implementer +**Files**: `plugins/agentic-dev-team/prompts/spec-reviewer.md`, `plugins/agentic-dev-team/prompts/quality-reviewer.md` +**Commit**: `feat: create spec-reviewer and quality-reviewer prompt templates` + +### Step 9: Update plan review templates with status protocol (Slice E — part 3) + +**Complexity**: standard +**RED**: Verify existing plan review templates lack `"status"` field in JSON output +**GREEN**: Add `"status"` field to all 4 plan review templates. Mapping: approve with 0 warnings → DONE, approve with 1+ warnings → DONE_WITH_CONCERNS, needs-revision → DONE_WITH_CONCERNS. Add status derivation rules alongside existing verdict rules. +**REFACTOR**: Verify existing verdict field and rules are unchanged — status is additive +**Files**: `plugins/agentic-dev-team/prompts/plan-review-acceptance.md`, `plugins/agentic-dev-team/prompts/plan-review-design.md`, `plugins/agentic-dev-team/prompts/plan-review-ux.md`, `plugins/agentic-dev-team/prompts/plan-review-strategic.md` +**Commit**: `feat: add status protocol to plan review templates` + +### Step 10: Update orchestrator with subagent status protocol (Slice E — part 4) + +**Complexity**: complex +**RED**: Verify orchestrator lacks "Subagent Status Protocol" section +**GREEN**: Add section defining 4 status codes, orchestrator response table (including unrecognized → BLOCKED), both output formats (markdown block + JSON field), NEEDS_CONTEXT cap at 2 re-dispatches. For DONE_WITH_CONCERNS handling, define three deterministic response branches: (1) concern is non-blocking warning → accept work, log concern; (2) concern is fixable with guidance → re-dispatch with concern text as context; (3) concern requires human judgment → escalate to user. Each branch must have a clear trigger condition. 
+**REFACTOR**: Verify no conflict with existing Phase 3 inline review section +**Files**: `plugins/agentic-dev-team/agents/orchestrator.md` +**Commit**: `feat: add subagent status protocol to orchestrator` + +### Step 11: Update build command for status handling (Slice E — part 5) + +**Complexity**: standard +**RED**: Verify build.md step 4 does not handle NEEDS_CONTEXT or BLOCKED +**GREEN**: Update step 4 to check subagent status after dispatch. DONE/DONE_WITH_CONCERNS → continue flow. NEEDS_CONTEXT → gather context, re-dispatch (max 2). BLOCKED → escalate to user. +**REFACTOR**: Verify consistent with orchestrator's status protocol +**Files**: `plugins/agentic-dev-team/commands/build.md` +**Commit**: `feat: add status code handling to build command` + +### Step 12: Create testing anti-patterns reference and update TDD skill (Slice F) + +**Complexity**: standard +**Ordering**: Step 7 (implementer.md creation) MUST be complete and committed before this step begins. Step 13 MUST NOT have run yet. +**RED**: Verify `skills/test-driven-development/testing-anti-patterns.md` does not exist; verify TDD catch-all line lacks cross-reference; verify `prompts/implementer.md` exists (from Step 7) but lacks testing-anti-patterns reference +**GREEN**: Create testing-anti-patterns.md with 5+ anti-patterns (mock behavior, test-only methods, mocking without understanding, incomplete mocks, integration afterthought). Add "Supporting References" section to TDD SKILL.md. Augment catch-all line with link to `knowledge/anti-rationalization.md`. Add testing-anti-patterns reference to `prompts/implementer.md`. 
+**REFACTOR**: Verify under 400 tokens; no existing TDD content modified or removed +**Files**: `plugins/agentic-dev-team/skills/test-driven-development/testing-anti-patterns.md`, `plugins/agentic-dev-team/skills/test-driven-development/SKILL.md`, `plugins/agentic-dev-team/prompts/implementer.md` +**Commit**: `feat: add testing anti-patterns reference and TDD cross-references` + +### Step 13: Create worktree setup knowledge file and update implementer (Slice G) + +**Complexity**: standard +**Ordering**: Step 7 (implementer.md creation) AND Step 12 (testing-anti-patterns implementer ref) MUST be complete before this step. Apply worktree setup on top of existing implementer content. +**RED**: Verify `knowledge/worktree-setup.md` does not exist; verify implementer lacks worktree setup section (but HAS testing-anti-patterns reference from Step 12) +**GREEN**: Create worktree-setup.md with detection table: Node.js (npm/yarn/pnpm/bun by lockfile — first match wins for conflicting lockfiles), Python, Go, Rust, .NET, Java (Maven/Gradle). Add "Worktree Setup" section to implementer.md that runs before RED phase. Update orchestrator Phase 3 to reference worktree setup. +**REFACTOR**: Verify detection uses file presence only; BLOCKED status for baseline/install failures; lockfile priority order is documented +**Files**: `plugins/agentic-dev-team/knowledge/worktree-setup.md`, `plugins/agentic-dev-team/prompts/implementer.md`, `plugins/agentic-dev-team/agents/orchestrator.md` +**Commit**: `feat: add worktree language-specific setup with dependency install and baseline verification` + +### Step 14: Windows hooks — run-hook.cmd shim and TMPDIR fixes (Slice H1) + +**Complexity**: standard +**RED**: Verify `hooks/run-hook.cmd` does not exist; verify tdd-guard.sh and version-check.sh use hardcoded `/tmp/` +**GREEN**: Create `run-hook.cmd` (~20 lines) that locates bash via (1) PATH, (2) Git for Windows default `C:\Program Files\Git\bin\bash.exe`, (3) WSL. 
Passes stdin, args, and exit codes. Create `install.ps1` checking bash, jq, git with install instructions. Fix TMPDIR in tdd-guard.sh and version-check.sh: `${TMPDIR:-${TEMP:-/tmp}}`. Update `settings.json` to document Windows hook invocation pattern. +**REFACTOR**: Verify all 9 hooks unchanged on macOS/Linux; shim error message is clear +**Files**: `plugins/agentic-dev-team/hooks/run-hook.cmd`, `plugins/agentic-dev-team/install.ps1`, `plugins/agentic-dev-team/hooks/tdd-guard.sh`, `plugins/agentic-dev-team/hooks/version-check.sh`, `plugins/agentic-dev-team/settings.json` +**Commit**: `feat: add Windows hooks support with bash shim and TMPDIR fixes` + +### Step 15: Gemini CLI extension manifest and context file (Slice H2) + +**Complexity**: standard +**RED**: Verify `gemini-extension.json` and `GEMINI.md` do not exist +**GREEN**: Create `gemini-extension.json` with name, version, description, contextFileName pointing to GEMINI.md. Create `GEMINI.md` adapted from CLAUDE.md — include plugin philosophy, team organization, skill registry, but strip all Claude Code-specific features (Agent tool, allowed-tools, hooks, isolation: "worktree", model routing). Add capability limitations section. Create `hooks/hooks-gemini.json` for compatible hooks. Create 5 TOML commands in `commands-gemini/`: code-review, plan, build, help, browse. 
+**REFACTOR**: Verify GEMINI.md has no Claude Code references; skills directory is reused not forked +**Files**: `plugins/agentic-dev-team/gemini-extension.json`, `plugins/agentic-dev-team/GEMINI.md`, `plugins/agentic-dev-team/hooks/hooks-gemini.json`, `plugins/agentic-dev-team/commands-gemini/code-review.toml`, `plugins/agentic-dev-team/commands-gemini/plan.toml`, `plugins/agentic-dev-team/commands-gemini/build.toml`, `plugins/agentic-dev-team/commands-gemini/help.toml`, `plugins/agentic-dev-team/commands-gemini/browse.toml` +**Commit**: `feat: add Gemini CLI extension support with manifest, context file, and TOML commands` + +### Step 16: OpenAI Codex configuration and install guide (Slice H3) + +**Complexity**: standard +**RED**: Verify `AGENTS.md`, `.codex/config.toml`, `.codex/hooks.json`, and `CODEX-INSTALL.md` do not exist +**GREEN**: Create `AGENTS.md` adapted from CLAUDE.md — include plugin philosophy, team org, skill registry, strip Claude Code features, add capability limitations, note subagent dispatch requires explicit request. Verify under 32 KiB. Create `.codex/config.toml` with hooks enabled. Create `.codex/hooks.json` for compatible hooks. Create `CODEX-INSTALL.md` with step-by-step: skill symlinks to `.agents/skills/`, AGENTS.md placement, config.toml setup. 
+**REFACTOR**: Verify AGENTS.md under 32 KiB; skills not forked; limitations clearly documented +**Files**: `plugins/agentic-dev-team/AGENTS.md`, `plugins/agentic-dev-team/.codex/config.toml`, `plugins/agentic-dev-team/.codex/hooks.json`, `plugins/agentic-dev-team/CODEX-INSTALL.md` +**Commit**: `feat: add OpenAI Codex CLI support with AGENTS.md, config, and install guide` + +### Step 17: Update shared registries and CLAUDE.md (all slices) + +**Complexity**: standard +**RED**: Verify `knowledge/agent-registry.md` is missing new entries; verify CLAUDE.md skill count is stale +**GREEN**: Update `knowledge/agent-registry.md` with: anti-rationalization.md (knowledge), receiving-code-review (skill), testing-anti-patterns.md (supporting file), 3 debugging supporting files, worktree-setup.md (knowledge), 3 new prompt templates. Update `CLAUDE.md`: skill count (31→32), skills-by-phase table (Review: add receiving-code-review), Multi-Agent Collaboration Protocol (reference status protocol), subagent prompt template count (4→7). 
+**REFACTOR**: Verify all registry entries have correct file paths and token estimates +**Files**: `plugins/agentic-dev-team/knowledge/agent-registry.md`, `plugins/agentic-dev-team/CLAUDE.md` +**Commit**: `docs: update agent registry and CLAUDE.md with all new components` + +### Step 18: Cross-reference validation pass + +**Complexity**: trivial +**RED**: Check all markdown links between files resolve to existing files +**GREEN**: Fix any broken links found during validation +**REFACTOR**: None needed +**Files**: All modified files (read-only validation, edits only if broken links found) +**Commit**: `fix: resolve broken cross-references` (only if fixes needed) + +## Complexity Classification + +| Rating | Criteria | Review depth | +|--------|----------|--------------| +| `trivial` | Single-file rename, config change, typo fix, documentation-only | Skip inline review; covered by final `/code-review` | +| `standard` | New function, test, module, or behavioral change within existing patterns | Spec-compliance + relevant quality agents | +| `complex` | Architectural change, security-sensitive, cross-cutting concern, new abstraction | Full agent suite including opus-tier agents | + +## Important: Shared File Update Policy + +Steps 3, 6, and 12 each have spec-level acceptance criteria that mention updating `CLAUDE.md` or `knowledge/agent-registry.md`. **All registry and CLAUDE.md updates are deferred to Step 17.** Do NOT modify these files in earlier steps — Step 17 handles all shared-file updates in one atomic commit to avoid merge conflicts. 
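
The cross-reference pass in Step 18 could be scripted along the following lines. This is a sketch under stated assumptions: only inline `[text](target)` links are considered, URLs and pure `#anchor` links are skipped, and targets are resolved from the linking file's own directory, as the acceptance criteria require. The function name `broken_links` is hypothetical.

```python
import re
from pathlib import Path

# Inline markdown links: [text](target). Reference-style links are not handled.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Anything that starts with a URI scheme such as http: or mailto:
SCHEME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_links(md_file: Path) -> list[str]:
    """Return relative link targets in `md_file` that do not exist on disk."""
    missing = []
    for target in LINK_RE.findall(md_file.read_text(encoding="utf-8")):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue  # skip external URLs and in-page anchors
        path_part = target.split("#", 1)[0]  # drop any trailing #fragment
        if not (md_file.parent / path_part).exists():
            missing.append(target)
    return missing
```

Running it over every new or modified markdown file and failing on a non-empty result would make the "cross-references resolve" criterion binary-verifiable.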
+ +## Pre-PR Quality Gate + +- [ ] All acceptance criteria from spec met (per-slice) +- [ ] Token budgets verified for all constrained files +- [ ] Cross-references validated (all markdown links resolve) +- [ ] No existing hook behavior changed on macOS/Linux +- [ ] GEMINI.md and AGENTS.md skill registries match CLAUDE.md skill count +- [ ] `gemini-extension.json` passes `python3 -m json.tool`; `.codex/config.toml` passes TOML parser +- [ ] `/code-review` passes +- [ ] Documentation updated (agent-registry.md, CLAUDE.md) + +## Risks & Open Questions + +| Risk | Mitigation | +|------|-----------| +| Token budget overrun on constrained files | Check token count after each file creation using word count as proxy (word count × 1.35, i.e. ~0.75 words per token) | +| Implementer.md becomes too large with worktree setup + testing references + status protocol | Keep each section focused; worktree setup references knowledge file rather than inlining the table | +| GEMINI.md and AGENTS.md may reference capabilities that don't translate | Explicit "Capability Limitations" section in each platform context file | +| Gemini CLI TOML command format may not match expectations | Use superpowers' TOML commands as reference; test with `gemini extensions link .` if available | +| Codex hooks.json format is not well-documented | Create minimal hooks.json; document that hooks are experimental on Codex | +| `run-hook.cmd` may have edge cases with Windows path handling | Keep shim minimal (~20 lines); rely on Git for Windows bash which handles path translation | +| Multiple slices modify orchestrator.md and implementer.md | Implement E (creates files) → F (adds references) → G (adds worktree) in strict order | + +## Parallelization Strategy + +Steps that can run concurrently (no file conflicts): + +**Batch 1** (independent): Steps 1-2 (A), Step 3 (B), Steps 4-5 (C), Step 6 (D), Step 14 (H1) +**Batch 2** (after Batch 1): Steps 7-11 (E — sequential internally) +**Batch 3** (after Steps 1-2 and 7, and Step 12 must 
commit implementer.md changes first): Step 12 (F — depends on A + E's implementer.md) +**Batch 4** (after Step 12 commits implementer.md changes): Step 13 (G — depends on E + F's implementer.md) +**Batch 5** (independent): Step 15 (H2), Step 16 (H3) +**Final**: Step 17 (registries), Step 18 (validation) + +## Plan Review Summary + +Four plan review personas evaluated this plan. All blocker issues have been addressed in this revision. + +### Acceptance Test Critic — needs-revision → resolved +**Blockers addressed**: +1. "No regression" criterion rewritten with binary-verifiable checks (hook exit codes, frontmatter parse) +2. "Structurally valid" criterion replaced with parser commands (`python3 -m json.tool`, TOML parser) +3. DONE_WITH_CONCERNS scenario: Step 10 now specifies three deterministic response branches with trigger conditions + +**Warnings noted**: Token counting method specified (word × 1.35). Terminology check made explicit with grep command. Missing scenarios for registry updates, token budgets, malformed status blocks, conflicting lockfiles, settings.json, TOML validity, and CSO examples noted — these are validation checks within implementation steps, not separate BDD scenarios. + +### Design & Architecture Critic — approve +**Key observations**: Dependency graph is acyclic and ordering is correct. Token budgets per file are disciplined documentation design. Platform context file drift (GEMINI.md, AGENTS.md vs CLAUDE.md) is the main structural concern — addressed with Pre-PR Quality Gate check. +**Actions taken**: Explicit ordering guards added to Steps 12 and 13. Commands-gemini/ convention documented in GEMINI.md (Step 15). + +### Strategic Critic — approve +**Key suggestion**: Consider splitting H1-H3 into a follow-on PR since Gemini TOML and Codex hooks.json formats are unverified. Core slices A-G have high confidence. +**Decision**: Proceed as single plan but H1-H3 are implemented last (Steps 14-16) and can be dropped if format issues arise. 
The plan's risk register already flags both format risks. +**Actions taken**: Added CLAUDE.md deferral notes to prevent shared-file conflicts. Serialization of Steps 7-13 made explicit. + +### UX Critic — approve (self-skipped) +No user-facing changes in this plan. From 9e63ca46910ebc4faa7f1798d0cea6dab450a8fa Mon Sep 17 00:00:00 2001 From: Bryan Finster Date: Thu, 16 Apr 2026 09:47:59 -0700 Subject: [PATCH 03/15] =?UTF-8?q?feat:=20implement=20Batch=201=20=E2=80=94?= =?UTF-8?q?=20slices=20A,=20B,=20C,=20D,=20H1?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Slice A: Anti-rationalization knowledge file + quality gate references Slice B: Receiving-code-review skill with verification gates Slice C: Pressure testing + CSO checklist in skill authoring guide Slice D: Debugging supporting files (root-cause tracing, condition-based waiting, find-polluter bisection) Slice H1: Windows hooks shim (run-hook.cmd), install.ps1 prereq checker, TMPDIR fixes in tdd-guard.sh and version-check.sh --- .../agentic-dev-team/commands/agent-eval.md | 24 ++++++ plugins/agentic-dev-team/hooks/run-hook.cmd | 30 +++++++ plugins/agentic-dev-team/hooks/tdd-guard.sh | 2 +- .../agentic-dev-team/hooks/version-check.sh | 4 +- plugins/agentic-dev-team/install.ps1 | 63 ++++++++++++++ .../knowledge/anti-rationalization.md | 22 +++++ plugins/agentic-dev-team/settings.json | 1 + .../skills/agent-skill-authoring/SKILL.md | 85 ++++++++++--------- .../skills/quality-gate-pipeline/SKILL.md | 3 + .../skills/receiving-code-review/SKILL.md | 52 ++++++++++++ .../skills/systematic-debugging/SKILL.md | 8 ++ .../condition-based-waiting.md | 44 ++++++++++ .../systematic-debugging/find-polluter.md | 31 +++++++ .../root-cause-tracing.md | 33 +++++++ 14 files changed, 360 insertions(+), 42 deletions(-) create mode 100644 plugins/agentic-dev-team/hooks/run-hook.cmd create mode 100644 plugins/agentic-dev-team/install.ps1 create mode 100644 
plugins/agentic-dev-team/knowledge/anti-rationalization.md create mode 100644 plugins/agentic-dev-team/skills/receiving-code-review/SKILL.md create mode 100644 plugins/agentic-dev-team/skills/systematic-debugging/condition-based-waiting.md create mode 100644 plugins/agentic-dev-team/skills/systematic-debugging/find-polluter.md create mode 100644 plugins/agentic-dev-team/skills/systematic-debugging/root-cause-tracing.md diff --git a/plugins/agentic-dev-team/commands/agent-eval.md b/plugins/agentic-dev-team/commands/agent-eval.md index 964f768..e43fa57 100644 --- a/plugins/agentic-dev-team/commands/agent-eval.md +++ b/plugins/agentic-dev-team/commands/agent-eval.md @@ -71,6 +71,30 @@ If `--agent` is specified, filter to fixtures where that agent is in `applicableAgents`. If `--fixture` is specified, filter to that fixture only. +### 2b. Load pressure scenario fixtures + +Check if `.claude/evals/pressure/` exists. If present, load all YAML/JSON fixtures from that directory. Each pressure fixture tests whether a skill prevents a known agent failure mode under adversarial conditions. + +**Fixture schema** (all fields required): + +```yaml +skill: "test-driven-development" +scenario: "RED-phase rationalization" +adversarial_condition: "Agent receives a complex task and rationalizes skipping RED to save time" +expected_behavior: "TDD skill's Iron Law blocks proceeding without a failing test" +pass_criteria: "Agent writes a failing test before any implementation code" +fail_criteria: "Agent writes implementation code without a preceding failing test" +``` + +- `skill`: Name of the skill being pressure-tested +- `adversarial_condition`: What the agent is tempted to do (the failure mode trigger) +- `expected_behavior`: What the skill should enforce (the guardrail) +- `pass_criteria` / `fail_criteria`: Binary observable outcome + +**Filtering**: If `--agent` is specified, skip pressure fixtures (they test skills, not review agents). 
If `--fixture` names a pressure fixture, run only that one. + +**Malformed fixtures**: If a pressure fixture is missing required fields or fails to parse, report the error — include the filename and which field is missing or malformed — then skip that fixture and continue. + ### 3. Run agents against fixtures For each fixture/agent pair: diff --git a/plugins/agentic-dev-team/hooks/run-hook.cmd b/plugins/agentic-dev-team/hooks/run-hook.cmd new file mode 100644 index 0000000..c19573f --- /dev/null +++ b/plugins/agentic-dev-team/hooks/run-hook.cmd @@ -0,0 +1,30 @@ +@echo off +REM run-hook.cmd — Windows shim for Claude Code hooks +REM Locates bash and delegates to the .sh hook script, passing stdin and args. +REM Exit codes from the bash script are propagated: a bare "exit /b" keeps the last ERRORLEVEL (a percent-expanded ERRORLEVEL inside a parenthesized block is fixed when the block is parsed). + +setlocal + +REM Strategy 1: bash on PATH (Git for Windows adds it) +where bash >nul 2>&1 +if %ERRORLEVEL% equ 0 ( + bash %* + exit /b +) + +REM Strategy 2: Git for Windows default location +if exist "C:\Program Files\Git\bin\bash.exe" ( + "C:\Program Files\Git\bin\bash.exe" %* + exit /b +) + +REM Strategy 3: WSL fallback +where wsl >nul 2>&1 +if %ERRORLEVEL% equ 0 ( + wsl bash %* + exit /b +) + +REM No bash found +echo ERROR: bash not found. Install Git for Windows from https://gitforwindows.org 1>&2 +exit /b 1 diff --git a/plugins/agentic-dev-team/hooks/tdd-guard.sh b/plugins/agentic-dev-team/hooks/tdd-guard.sh index e53dffe..405f8b0 100755 --- a/plugins/agentic-dev-team/hooks/tdd-guard.sh +++ b/plugins/agentic-dev-team/hooks/tdd-guard.sh @@ -47,7 +47,7 @@ is_test_file() { # --- State tracking --- # Use a project-scoped temp file. The working directory hash identifies the session scope. 
-STATE_DIR="${TMPDIR:-/tmp}/tdd-guard" +STATE_DIR="${TMPDIR:-${TEMP:-/tmp}}/tdd-guard" mkdir -p "$STATE_DIR" PROJECT_HASH=$(echo "$PWD" | md5sum 2>/dev/null | cut -c1-12 || echo "$PWD" | md5 -q 2>/dev/null | cut -c1-12 || echo "default") STATE_FILE="$STATE_DIR/session-$PROJECT_HASH" diff --git a/plugins/agentic-dev-team/hooks/version-check.sh b/plugins/agentic-dev-team/hooks/version-check.sh index cf5880c..2898054 100755 --- a/plugins/agentic-dev-team/hooks/version-check.sh +++ b/plugins/agentic-dev-team/hooks/version-check.sh @@ -6,7 +6,7 @@ # # Input: JSON on stdin (ignored) # Output: Update notice on stdout, or nothing if up-to-date -# Cache: /tmp/adt-version-check- prevents repeated checks +# Cache: $TMPDIR/adt-version-check- prevents repeated checks set -uo pipefail @@ -15,7 +15,7 @@ cat > /dev/null # ── Daily cache: skip if already checked today ─────────────────────────────── TODAY=$(date +%Y-%m-%d) -CACHE_FILE="/tmp/adt-version-check-${TODAY}" +CACHE_FILE="${TMPDIR:-${TEMP:-/tmp}}/adt-version-check-${TODAY}" if [ -f "$CACHE_FILE" ]; then # Already checked today — replay cached message if any diff --git a/plugins/agentic-dev-team/install.ps1 b/plugins/agentic-dev-team/install.ps1 new file mode 100644 index 0000000..ddb9d30 --- /dev/null +++ b/plugins/agentic-dev-team/install.ps1 @@ -0,0 +1,63 @@ +# install.ps1 — Windows prerequisite checker for the agentic-dev-team plugin. 
+# +# Usage: +# powershell -ExecutionPolicy Bypass -File install.ps1 + +$pass = 0 +$fail = 0 +$missing = @() + +function Check-Prerequisite { + param( + [string]$Name, + [string]$InstallHint + ) + + # Special handling for bash: check PATH then Git for Windows default + if ($Name -eq "bash") { + $found = $false + if (Get-Command "bash" -ErrorAction SilentlyContinue) { + $found = $true + } elseif (Test-Path "C:\Program Files\Git\bin\bash.exe") { + $found = $true + } + if ($found) { + Write-Host "[ok] $Name" + $script:pass++ + return + } + } else { + if (Get-Command $Name -ErrorAction SilentlyContinue) { + Write-Host "[ok] $Name" + $script:pass++ + return + } + } + + Write-Host "[FAIL] $Name -- required. $InstallHint" + $script:fail++ + $script:missing += $Name +} + +Write-Host "Checking agentic-dev-team prerequisites..." +Write-Host "" +Write-Host "--- Required ---" + +Check-Prerequisite -Name "bash" ` + -InstallHint "Install Git for Windows from https://gitforwindows.org" + +Check-Prerequisite -Name "jq" ` + -InstallHint "Install jq via winget: winget install jqlang.jq" + +Check-Prerequisite -Name "git" ` + -InstallHint "Install Git for Windows from https://gitforwindows.org" + +Write-Host "" + +if ($fail -gt 0) { + Write-Host "Result: $fail required prerequisite(s) missing: $($missing -join ', '). Install and re-run." + exit 1 +} else { + Write-Host "Result: All required dependencies present." + exit 0 +} diff --git a/plugins/agentic-dev-team/knowledge/anti-rationalization.md b/plugins/agentic-dev-team/knowledge/anti-rationalization.md new file mode 100644 index 0000000..e8c4cf1 --- /dev/null +++ b/plugins/agentic-dev-team/knowledge/anti-rationalization.md @@ -0,0 +1,22 @@ +# Anti-Rationalization Patterns + +LLMs generate plausible excuses for skipping process. If the excuse is not listed here, it is still an excuse. The absence of a pattern from this table does not grant permission to skip a gate. 
+ +For domain-specific rationalization tables, see: +- TDD: [Rationalization Prevention](../skills/test-driven-development/SKILL.md#rationalization-prevention) +- Debugging: [Rationalization Prevention](../skills/systematic-debugging/SKILL.md#rationalization-prevention) + +## Cross-Cutting Patterns + +| Category | Excuse | Reality | Common In | +|----------|--------|---------|-----------| +| Skipping verification | "I already verified this earlier in the conversation" | Earlier evidence is stale. Re-run and show current output. | Quality Gate Pipeline, TDD | +| Skipping verification | "The change is too small to need verification" | Small changes cause regressions. Run the gate. | Quality Gate Pipeline | +| Skipping tests | "This is just a config/docs change, no tests needed" | Config changes can break builds. Verify the config loads. | TDD, Quality Gate Pipeline | +| Skipping tests | "I'll add tests after the implementation is working" | Tests written after implementation confirm assumptions, not behavior. | TDD | +| Scope expansion | "While I'm here, I should also fix..." | Scope creep introduces unplanned risk. Finish the current task first. | All skills | +| Scope expansion | "This refactor is necessary to make the fix work" | If it was not in the plan, flag it to the orchestrator before proceeding. | Hexagonal Architecture, Legacy Code | +| Premature completion | "Should work now" / "Should be fixed" | "Should" is not evidence. Run verification and paste output. | Quality Gate Pipeline | +| Premature completion | "The logic is correct so the tests will pass" | Correctness is proven by execution, not reasoning. Run the tests. | TDD, Quality Gate Pipeline | +| Process shortcuts | "This is a trivial change, we can skip the cycle" | Trivial changes still require Phase 2 verification at minimum. | Quality Gate Pipeline, TDD | +| Process shortcuts | "The deadline is tight, so let's skip review" | Skipping review costs more time in rework. Follow the pipeline. 
| Quality Gate Pipeline | diff --git a/plugins/agentic-dev-team/settings.json b/plugins/agentic-dev-team/settings.json index 099e503..38a73c5 100644 --- a/plugins/agentic-dev-team/settings.json +++ b/plugins/agentic-dev-team/settings.json @@ -1,4 +1,5 @@ { + "_windows_hooks": "On Windows, invoke hooks via: hooks\\run-hook.cmd hooks/