From 1a6128debcd65d2efcf7d3d42f6353919fcbde46 Mon Sep 17 00:00:00 2001 From: Marlin Forbes Date: Sat, 2 May 2026 08:06:28 +0200 Subject: [PATCH 1/2] feat(harness): name the PTC loop and add /critique command Closes #6 --- skills/harness/README.md | 6 +++--- skills/harness/SKILL.md | 10 +++++++--- skills/harness/assets/CLAUDE.md.tmpl | 7 +++++++ skills/harness/assets/commands/critique.md | 16 ++++++++++++++++ skills/harness/scripts/install.sh | 4 ++-- skills/harness/scripts/uninstall.sh | 4 ++-- 6 files changed, 37 insertions(+), 10 deletions(-) create mode 100644 skills/harness/assets/commands/critique.md diff --git a/skills/harness/README.md b/skills/harness/README.md index 8713c86..c571ada 100644 --- a/skills/harness/README.md +++ b/skills/harness/README.md @@ -59,7 +59,7 @@ If the skill is being invoked from a checkout (no `.claude/` ancestor), the scri | ------------------------ | --------------------------------------------------------------------------------------------------- | | `~/.claude/CLAUDE.md` | Operating-contract template (default stance, editing rules, expected tools) | | `~/.claude/hooks/` | `block-force-push.sh`, `format-on-edit.sh`, `post-compact-reinject.sh`, `verify-before-stop.sh` | -| `~/.claude/commands/` | `/verify` (run the project's pass/fail check), `/plan` (Goal/Constraints/Acceptance template) | +| `~/.claude/commands/` | `/verify` (project pass/fail), `/plan` (Goal/Constraints/Acceptance), `/critique` (mid-flow critique pass) | | `~/.claude/projects//memory/` | `MEMORY.md` index + `user_role`, `feedback_concise`, `feedback_plan_first`, `feedback_verification` | | `~/.claude/settings.json`| Adds `env.CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` + 4 hook entries (only if missing) | @@ -69,7 +69,7 @@ If the skill is being invoked from a checkout (no `.claude/` ancestor), the scri | --------------- | ------------------------------------------------------------------------------------------------ | | `/CLAUDE.md` | Operating contract — **skipped if a project CLAUDE.md already exists** (`--force` overrides) | | `/.claude/hooks/` | Same 4 hooks | -| `/.claude/commands/` | `/verify`, `/plan` | +| `/.claude/commands/` | `/verify`, `/plan`, `/critique` | | `/.claude/settings.json` | 4 hook entries with **project-relative** paths (`.claude/hooks/...`) | | Memory | NOT seeded at project scope — memory is per-user by design and lives under `$HOME` | | Env var | NOT set at project scope — `CLAUDE_CODE_AUTO_COMPACT_WINDOW` is session-wide | @@ -170,7 +170,7 @@ None. The skill detects intent from natural language. If unclear, it runs `statu | `README.md` | This — human-facing overview | | `assets/CLAUDE.md.tmpl` | Operating-contract template | | `assets/hooks/*.sh` | Four hook scripts | -| `assets/commands/*.md` | `/verify`, `/plan` | +| `assets/commands/*.md` | `/verify`, `/plan`, `/critique` | | `assets/memory/*.tmpl` | MEMORY.md index + 3 feedback memories + user_role template | | `scripts/install.sh` | Idempotent installer (`--dry-run` / `--force` / `--skip-*`) | | `scripts/uninstall.sh` | Symmetric uninstaller (content-match check; `--all` for full sweep) | diff --git a/skills/harness/SKILL.md b/skills/harness/SKILL.md index 5a8f337..540920c 100644 --- a/skills/harness/SKILL.md +++ b/skills/harness/SKILL.md @@ -65,7 +65,7 @@ What lands at user scope: | ------------------------ | --------------------------------------------------------------------------------------------------- | | Operating contract | `~/.claude/CLAUDE.md` | | Hooks | `~/.claude/hooks/{block-force-push,format-on-edit,post-compact-reinject,verify-before-stop}.sh` | -| Slash commands | `~/.claude/commands/{verify,plan}.md` | +| Slash commands | `~/.claude/commands/{verify,plan,critique}.md` | | Auto-memory | `~/.claude/projects//memory/{MEMORY.md, user_role, feedback_concise, feedback_plan_first, feedback_verification}` | | settings.json | Adds `env.CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` + 4 hook entries (uses `~/.claude/hooks/...` form) | @@ -75,7 +75,7 @@ What lands at project scope (`/.claude/`): | --------------- | ----------------------------------------------------------------------------------------------- | | Operating contract | `/CLAUDE.md` (skipped if it already exists — most projects have one. `--force` overrides) | | Hooks | `/.claude/hooks/*.sh` | -| Commands | `/.claude/commands/*.md` | +| Commands | `/.claude/commands/{verify,plan,critique}.md` | | settings.json | `/.claude/settings.json` — 4 hook entries with `.claude/hooks/...` (project-relative) form. **No env var, no memory** at project scope. | | Memory | (skipped — memory is per-user by design and lives under `$HOME` regardless of project scope) | @@ -278,6 +278,10 @@ Env knobs (mirror `snapshot.sh`): Scope: the snapshot repo doesn't mirror harness scripts or local search roots, so the remote agent does its own in-process structural pass (index sync, frontmatter) and adds the semantic checks. Stale-citation analysis stays local-only — the search roots aren't available remotely. The remote agent PRs `audits/memory/YYYY-MM-DD.md` with proposed edits and never modifies any memory entry. +## PostToolUse critique hook — recommendation + +We considered a hook that fires after N consecutive `Edit`/`Write` calls and runs an automatic critique pass (diff summary + `advisor()` check). **Recommendation: don't ship it.** The boundary critiques we already have (`verify-before-stop.sh` at Stop, `advisor()` calls bracketing non-trivial work, `/critique` available on demand mid-flow) cover the same gap with much better signal-to-noise. An always-on PostToolUse critique would (a) burn tokens and wall-clock on edits that don't need it, (b) train the agent to ignore the noise, and (c) duplicate what `/critique` already provides on user demand. Revisit if users report that mid-flow drift is escaping all three boundary surfaces — but the right next step there would be tuning when `/critique` gets *suggested*, not making it automatic. + ## Constraints - **Never auto-fill stack signals or user_role.** Templates have placeholders; ask the user to fill them. @@ -294,7 +298,7 @@ Scope: the snapshot repo doesn't mirror harness scripts or local search roots, s | `README.md` | Human-facing overview (with sources / inspiration) | | `assets/CLAUDE.md.tmpl` | Operating-contract template | | `assets/hooks/*.sh` | Four hook scripts | -| `assets/commands/*.md` | `/verify`, `/plan` | +| `assets/commands/*.md` | `/verify`, `/plan`, `/critique` | | `assets/memory/*.tmpl` | MEMORY.md index + 3 feedback memories + user_role template | | `scripts/install.sh` | Idempotent installer (`--dry-run` / `--force` / `--skip-*`) | | `scripts/uninstall.sh` | Symmetric uninstaller (content-match check; `--all` for full sweep) | diff --git a/skills/harness/assets/CLAUDE.md.tmpl b/skills/harness/assets/CLAUDE.md.tmpl index 5b85037..4d78bad 100644 --- a/skills/harness/assets/CLAUDE.md.tmpl +++ b/skills/harness/assets/CLAUDE.md.tmpl @@ -15,11 +15,18 @@ - Don't add comments unless the WHY is non-obvious. Don't add docs unless asked. - Don't add error handling for impossible cases. Don't introduce abstractions for hypothetical futures. +## The PTC loop +Work in this harness is a **Plan → Tool → Critique** loop, named so we share vocabulary. Reflexion (Shinn et al. 2023, https://arxiv.org/abs/2303.11366) is the canonical reference for the pattern. +- **Plan** — `/plan`, plan mode (shift-tab twice). Draft Goal/Constraints/Acceptance before editing. +- **Tool** — `Edit`, `Write`, `Bash`, sub-agents. Where the work happens. +- **Critique** — `verify-before-stop.sh` and `advisor()` at boundaries; `/critique` *between* tool steps when an interpretation is about to harden. Don't only critique at Stop. + ## Tools I expect you to use - `advisor()` before committing to an approach on tasks longer than a few steps, and before declaring done. - Sub-agents (`Explore`, `general-purpose`) for parallel research — don't duplicate their searches in the main thread. - `/verify` to run the project's pass/fail check. - `/plan` to draft Goal/Constraints/Acceptance for a non-trivial task. +- `/critique` to deliberately critique the most recent change mid-flow (don't wait for Stop). ## Stack signals diff --git a/skills/harness/assets/commands/critique.md b/skills/harness/assets/commands/critique.md new file mode 100644 index 0000000..d756abe --- /dev/null +++ b/skills/harness/assets/commands/critique.md @@ -0,0 +1,16 @@ +--- +description: Critique the most recent change before continuing — diff re-read + advisor() pass +--- + +A deliberate critique pass *between* steps, without waiting for Stop. Use mid-flow when an interpretation is about to harden into a chain of edits, or before declaring a milestone done. + +Procedure: + +1. **Re-read the change.** Run `git diff HEAD` (catches both staged and unstaged). If the working tree is clean (already committed, or no change yet), summarise the last few `Edit`/`Write` calls from the tool transcript instead. +2. **Critique it yourself first.** One paragraph: what does this change actually do, what could be wrong, what assumption is it leaning on? +3. **Call `advisor()`.** The advisor sees the full transcript — frame the question concretely. Examples: "Is the off-by-one in `foo()` actually a bug or am I misreading the loop?", "Does this migration handle the concurrent-write case I claimed it did?", "I rewrote X to use Y — is the boundary correct, or did I leak Y's concerns into a caller?" +4. **Report a punch list.** Under 200 words. What stands, what's shaky, what's the next concrete check. + +If `$ARGUMENTS` is provided, treat it as a scope hint — `/critique migration safety`, `/critique the test I just wrote` — and focus the diff re-read and the advisor question on that. + +Do not start new implementation work in this turn. The output of `/critique` is *findings*, not edits — the user (or the next turn) decides what to do about them. diff --git a/skills/harness/scripts/install.sh b/skills/harness/scripts/install.sh index baf2333..6042466 100755 --- a/skills/harness/scripts/install.sh +++ b/skills/harness/scripts/install.sh @@ -201,13 +201,13 @@ EOF if [ "$SCOPE" = "user" ]; then plan_line "$TARGET/CLAUDE.md" "$SKIP_CLAUDE_MD" "0" "operating contract — edit Stack signals after install" plan_line "$TARGET/hooks/*.sh" "$SKIP_HOOKS" "0" "4 guardrails: block-force-push / format-on-edit / post-compact-reinject / verify-before-stop" - plan_line "$TARGET/commands/{verify,plan}.md" "$SKIP_COMMANDS" "0" "slash commands" + plan_line "$TARGET/commands/{verify,plan,critique}.md" "$SKIP_COMMANDS" "0" "slash commands" plan_line "$MEMORY_DIR/*.md" "$SKIP_MEMORY" "0" "MEMORY.md index + 4 auto-memory templates" plan_line "$TARGET/settings.json" "$SKIP_SETTINGS" "0" "additive: env.CLAUDE_CODE_AUTO_COMPACT_WINDOW + 4 hook entries (NEVER touches permissions, marketplaces, statusLine, advisorModel, theme)" else plan_line "$(dirname "$TARGET")/CLAUDE.md" "$SKIP_CLAUDE_MD" "0" "operating contract — skipped if project already has a CLAUDE.md (use --force to override)" plan_line "$TARGET/hooks/*.sh" "$SKIP_HOOKS" "0" "4 guardrails" - plan_line "$TARGET/commands/{verify,plan}.md" "$SKIP_COMMANDS" "0" "slash commands" + plan_line "$TARGET/commands/{verify,plan,critique}.md" "$SKIP_COMMANDS" "0" "slash commands" plan_line "memory templates" "$SKIP_MEMORY" "$PROJECT_SCOPE_NO_MEMORY" "memory is per-user; not seeded at project scope" plan_line "$TARGET/settings.json" "$SKIP_SETTINGS" "0" "additive: 4 hook entries (no env var at project scope)" echo " (nothing in \$HOME is modified at project scope)" diff --git a/skills/harness/scripts/uninstall.sh b/skills/harness/scripts/uninstall.sh index d28dbd8..e558a2f 100755 --- a/skills/harness/scripts/uninstall.sh +++ b/skills/harness/scripts/uninstall.sh @@ -168,8 +168,8 @@ cat < Date: Sat, 2 May 2026 08:22:44 +0200 Subject: [PATCH 2/2] fix(harness): address Copilot review on PR #9 - /critique now handles untracked files (Read after git status --porcelain), fresh repos / non-git dirs, and clean trees with the change in HEAD (git show HEAD). - install.sh / uninstall.sh banners use commands/*.md instead of a hand-maintained brace list, matching the hooks convention so the dry-run output can't drift from the actual install glob. --- skills/harness/assets/commands/critique.md | 5 ++++- skills/harness/scripts/install.sh | 4 ++-- skills/harness/scripts/uninstall.sh | 4 ++-- 3 files changed, 8 insertions(+), 5 deletions(-) diff --git a/skills/harness/assets/commands/critique.md b/skills/harness/assets/commands/critique.md index d756abe..c29f6dd 100644 --- a/skills/harness/assets/commands/critique.md +++ b/skills/harness/assets/commands/critique.md @@ -6,7 +6,10 @@ A deliberate critique pass *between* steps, without waiting for Stop. Use mid-fl Procedure: -1. **Re-read the change.** Run `git diff HEAD` (catches both staged and unstaged). If the working tree is clean (already committed, or no change yet), summarise the last few `Edit`/`Write` calls from the tool transcript instead. +1. **Re-read the change.** Pick the branch that matches the repo state — don't run all three: + - **Dirty working tree (git):** `git diff HEAD` for tracked changes, then `git status --porcelain` and `Read` any `??` (untracked) entries directly — `git diff` skips brand-new files. + - **Clean tree, change is the last commit (git):** `git show HEAD` to re-read what was just committed. Catches formatter output, Bash-generated files, and edits from earlier turns that the transcript may have aged out. + - **Fresh repo with no commits, or not a git repo:** summarise the last few `Edit`/`Write`/`Bash` calls that touched files, working from the tool transcript. 2. **Critique it yourself first.** One paragraph: what does this change actually do, what could be wrong, what assumption is it leaning on? 3. **Call `advisor()`.** The advisor sees the full transcript — frame the question concretely. Examples: "Is the off-by-one in `foo()` actually a bug or am I misreading the loop?", "Does this migration handle the concurrent-write case I claimed it did?", "I rewrote X to use Y — is the boundary correct, or did I leak Y's concerns into a caller?" 4. **Report a punch list.** Under 200 words. What stands, what's shaky, what's the next concrete check. diff --git a/skills/harness/scripts/install.sh b/skills/harness/scripts/install.sh index 6042466..048d178 100755 --- a/skills/harness/scripts/install.sh +++ b/skills/harness/scripts/install.sh @@ -201,13 +201,13 @@ EOF if [ "$SCOPE" = "user" ]; then plan_line "$TARGET/CLAUDE.md" "$SKIP_CLAUDE_MD" "0" "operating contract — edit Stack signals after install" plan_line "$TARGET/hooks/*.sh" "$SKIP_HOOKS" "0" "4 guardrails: block-force-push / format-on-edit / post-compact-reinject / verify-before-stop" - plan_line "$TARGET/commands/{verify,plan,critique}.md" "$SKIP_COMMANDS" "0" "slash commands" + plan_line "$TARGET/commands/*.md" "$SKIP_COMMANDS" "0" "slash commands" plan_line "$MEMORY_DIR/*.md" "$SKIP_MEMORY" "0" "MEMORY.md index + 4 auto-memory templates" plan_line "$TARGET/settings.json" "$SKIP_SETTINGS" "0" "additive: env.CLAUDE_CODE_AUTO_COMPACT_WINDOW + 4 hook entries (NEVER touches permissions, marketplaces, statusLine, advisorModel, theme)" else plan_line "$(dirname "$TARGET")/CLAUDE.md" "$SKIP_CLAUDE_MD" "0" "operating contract — skipped if project already has a CLAUDE.md (use --force to override)" plan_line "$TARGET/hooks/*.sh" "$SKIP_HOOKS" "0" "4 guardrails" - plan_line "$TARGET/commands/{verify,plan,critique}.md" "$SKIP_COMMANDS" "0" "slash commands" + plan_line "$TARGET/commands/*.md" "$SKIP_COMMANDS" "0" "slash commands" plan_line "memory templates" "$SKIP_MEMORY" "$PROJECT_SCOPE_NO_MEMORY" "memory is per-user; not seeded at project scope" plan_line "$TARGET/settings.json" "$SKIP_SETTINGS" "0" "additive: 4 hook entries (no env var at project scope)" echo " (nothing in \$HOME is modified at project scope)" diff --git a/skills/harness/scripts/uninstall.sh b/skills/harness/scripts/uninstall.sh index e558a2f..d7614f5 100755 --- a/skills/harness/scripts/uninstall.sh +++ b/skills/harness/scripts/uninstall.sh @@ -168,8 +168,8 @@ cat <