diff --git a/.gitignore b/.gitignore index 4d3500f..e66ee9e 100755 --- a/.gitignore +++ b/.gitignore @@ -7,7 +7,8 @@ devBenches/flutterBench/ devBenches/frappeBench/ devBenches/javaBench/ devBenches/dotNetBench/ -devBenches/pythonBench/ +devBenches/pyBench/ +devBenches/phpBench/ # Installation tracking (local only) .installed-benches.json @@ -17,11 +18,23 @@ config/version-manifest.json # Common ignores .DS_Store +__pycache__/ +*.py[cod] *.log *.tmp node_modules/ + +# Secret and machine-local environment files. Keep template examples tracked. .env +.env.* +*.env +!.env.example +!.env.sample +!.env.template +*.backup *.bak +*.bak.* +secrets/ *.swp *.swo @@ -38,4 +51,6 @@ bioBenches/gentecBench/ logs/ .codex .codex/ +.claude/dashboard.md +.claude/speckit-history.md sysBenches/opsBench/ diff --git a/PROTOCOL.md b/PROTOCOL.md new file mode 100644 index 0000000..5447ac5 --- /dev/null +++ b/PROTOCOL.md @@ -0,0 +1,154 @@ +# Asynchronous Clarify Protocol + +A custom clarify step for [GitHub Spec Kit](https://github.com/github/spec-kit) + +Claude Code. It fans question generation out across reviewer angles, logs the questions +to a file, and finishes clarify **automatically** once they are answered — no manual +re-running. + +## Why + +Spec Kit's stock `/speckit.clarify` is synchronous: it asks, you answer, all in one +sitting. Real clarification has human latency — a domain expert may take a day to answer +a compliance question. This protocol decouples **question generation** (cheap, parallel, +done by AI) from **answering** (slow, human) from **application** (one edit to `spec.md`), +and uses a polling loop so the loop, not you, watches for completion. + +## The four files + +| File | Role | +|------|------| +| `.claude/commands/openClarify.md` | Orchestrator. Resolves the feature dir, initializes the log, triggers the generation fan-out, registers the poll loop. Never answers questions. | +| `.claude/commands/openClarify-resume.md` | Poll tick. 
Reads the log and branches cheaply; only the all-answered tick edits `spec.md`. Enforces the critical-class human gate. | +| `templates/clarify-log.template.md` | The log schema: two top-of-file sentinels + per-question blocks. | +| `PROTOCOL.md` | This document. | + +## Data flow + +``` +/openClarify [feature-dir] + ├─ verify spec.md exists + ├─ init clarify-log.md from template (GENERATION: PENDING, CLARIFY: IN_PROGRESS) + ├─ workflow: fan out reviewer angles ──┐ + │ data-model ┐ │ each appends OPEN question blocks + │ edge-cases ┤ │ merge + dedupe, cap 25 + │ security-compliance ┤ │ then flip GENERATION: COMPLETE + │ testability ┤ │ + │ integration ┘ │ + └─ register /loop 10m /openClarify-resume + + ... humans answer blocks in clarify-log.md over time ... + +/openClarify-resume (every 10 min) + ├─ log missing? → no-op + ├─ CLARIFY: COMPLETE? → no-op, stop loop + ├─ GENERATION != COMPLETE → no-op (sentinel guard: no early firing) + ├─ any status: OPEN? → no-op (still waiting) + ├─ critical answered by architect-ai? → no-op (human escalation) + └─ all answered, criticals human-signed → edit spec.md, CLARIFY: COMPLETE, stop loop +``` + +## The log schema + +Two **sentinels** at the top of `clarify-log.md` are the entire coordination contract: + +- `GENERATION: PENDING | COMPLETE` — set `COMPLETE` only when the fan-out has written + every question. Until then the poller refuses to act. +- `CLARIFY: IN_PROGRESS | COMPLETE` — set `COMPLETE` only after answers are applied to + `spec.md`. + +Each question is a block: + +``` +## Q3 +- id: q3 +- status: OPEN # OPEN | ANSWERED +- class: normal # normal | critical +- agent: security-compliance # which reviewer angle raised it +- question: How long is PHI retained after account deletion? +- answer: +- answered_by: # human | architect-ai +- ts: +``` + +## Three design guarantees + +1. **Sentinel guard against early firing.** The poller treats `GENERATION: COMPLETE` as a + precondition. 
A log that is mid-generation can momentarily show "zero OPEN questions" + simply because no questions have been written yet — the guard stops the poller from + misreading that as "all answered" and prematurely editing `spec.md`. + +2. **Critical-class human escalation.** A `class: critical` question (clinical / regulated / + safety-impacting) answered by `architect-ai` does **not** clear. Completion blocks until + a human re-answers or confirms it. The AI can draft; only a human signature releases the + gate. + +3. **Cheap read-and-branch ticks.** Nearly every poll tick just greps the sentinels and a + handful of `status:` lines, then exits. Exactly one tick — the one that sees everything + answered and all criticals human-signed — does the expensive `spec.md` edit. Polling + every 10 minutes is therefore nearly free. + +## Relationship to stock `/speckit.clarify` — audit pass + +This protocol **does the clarification work**, then stock `/speckit.clarify` runs **after** +as an **audit**, not as the primary clarifier. + +Because `/openClarify-resume` writes answers in stock's canonical shape — a +`## Clarifications` section with `### Session YYYY-MM-DD` and `- Q: … → A: …` bullets, plus +the answer folded into the relevant spec section — a subsequent stock run sees those points +as already resolved. Stock decides what to ask by scanning **spec sections** against its +coverage taxonomy (Clear / Partial / Missing), so the folding in §3b is what actually makes +the audit quiet, not the log bullets. + +**How to use the audit:** after our protocol marks `CLARIFY: COMPLETE`, run stock +`/speckit.clarify` (the `speckit-clarify` skill) a few times. + +- **No new questions** → our five reviewer angles covered the spec to stock's standard. Proceed to `/speckit.plan`. +- **New questions** → a real coverage gap. 
Most will land in taxonomy categories our angles + don't target: **functional scope & behavior, interaction/UX flow, non-functional + (performance, scalability, reliability, observability), constraints & tradeoffs, and + terminology consistency**. Our angles cover data-model, edge-cases, security/compliance, + testability, and integration — so the five taxonomy categories listed above are the expected blind spots. + +Treat stock's output as a **regression check on our generation coverage**. If a category +keeps surfacing, add a reviewer angle for it to the fan-out in `/openClarify`. + +> Note: in this environment stock clarify is overridden (see `~/.claude` global config) to +> ask up to **25** questions in **block form**, written to `/clarify-questions.md` +> — so the audit produces a diffable file rather than a one-at-a-time interactive loop. + +``` +/openClarify → (async answers) → resume applies + canonical format → CLARIFY: COMPLETE + │ + /speckit.clarify ×N (audit) + │ + new questions? → new clarify cycle ; else → /speckit.plan +``` + +## Known gaps / operational notes + +- **Generation script is authored on first run.** The `workflow` keyword lets the Claude + Code runtime author the fan-out's internal script the first time `/openClarify` + runs. **Save that run as `/openClarify-generate`** so subsequent features reuse it + instead of re-authoring the fan-out each time. + +- **`/loop` is session-scoped with a 3-day cap.** It only survives while the session is + alive and stops after ~3 days. For human turnaround longer than that, swap the in-session + loop for an external scheduler: + + ``` + cron + claude -p /openClarify-resume + ``` + + e.g. a crontab entry running `claude -p "/openClarify-resume specs/my-feature/"` every + 15 minutes, which survives restarts and arbitrary human latency. + +## Usage + +``` +# 1. start (defaults to specs//) +/openClarify + +# 2. humans edit clarify-log.md, filling answer / answered_by / status: ANSWERED + +# 3. 
nothing else to do — the loop applies answers to spec.md and stops itself +``` diff --git a/README.md b/README.md index 3340980..6c318f7 100755 --- a/README.md +++ b/README.md @@ -24,14 +24,15 @@ Safe to run repeatedly. Installed benches show `✓ up to date` and are skipped. ## Docker Image Layers ``` -Layer 0: workbench-base:latest — Ubuntu 24.04 + git, zsh, curl, AI CLIs, bun - ├─ Layer 1a: dev-bench-base:latest — Python, Node.js LTS, npm, dev tools, testing tools, Playwright Chromium +Layer 0: workbench-base:latest — Ubuntu 24.04 + git, zsh, curl, shared AI CLIs, bun + ├─ Layer 1a: dev-bench-base:latest — Python, Node.js LTS, npm, dev tools, OpenSpec, spec-kit, testing tools, Playwright Chromium │ ├─ Layer 2: cpp-bench:latest — GCC, CMake, vcpkg │ ├─ Layer 2: dotnet-bench:latest — .NET SDK 8/9 │ ├─ Layer 2: flutter-bench:latest — Flutter SDK, Dart, Android tools │ ├─ Layer 2: frappe-bench:latest — MariaDB client, Redis, Nginx, bench CLI (Node.js 20) │ ├─ Layer 2: java-bench:latest — OpenJDK 21, Maven, Gradle, Spring CLI - │ ├─ Layer 2: python-bench:latest — Python dev tools (thin layer on 1a) + │ ├─ Layer 2: php-bench:latest — PHP 8.3, Composer, PHPUnit, Xdebug + │ ├─ Layer 2: py-bench:latest — Python dev tools (thin layer on 1a) │ └─ Layer 2: go-bench:latest — Go toolchain ├─ Layer 1b: sys-bench-base:latest — Kubernetes, Terraform, cloud CLIs │ └─ Layer 2: cloud-bench:latest — Cloud admin tools @@ -109,7 +110,8 @@ workBenches/ │ ├── frappeBench/ ← Frappe/ERPNext bench (opensoft/frappeBench) │ ├── goBench/ ← Go bench (opensoft/goBench) │ ├── javaBench/ ← Java bench (opensoft/javaBench) -│ └── pythonBench/ ← Python bench (opensoft/pythonBench) +│ ├── phpBench/ ← PHP bench (opensoft/phpBench) +│ └── pyBench/ ← Python bench (opensoft/pyBench) ├── sysBenches/ │ ├── base-image/ ← Layer 1b: sys-bench-base Dockerfile │ ├── cloudBench/ ← Cloud admin bench (opensoft/cloudBench) @@ -187,6 +189,7 @@ npm global packages install to `~/.npm-global` (no sudo required). 
| frappeBench | [opensoft/frappeBench](https://github.com/opensoft/frappeBench) | | goBench | [opensoft/goBench](https://github.com/opensoft/goBench) | | javaBench | [opensoft/javaBench](https://github.com/opensoft/javaBench) | -| pythonBench | [opensoft/pythonBench](https://github.com/opensoft/pythonBench) | +| phpBench | [opensoft/phpBench](https://github.com/opensoft/phpBench) | +| pyBench | [opensoft/pyBench](https://github.com/opensoft/pyBench) | | gentecBench | [opensoft/gentecBench](https://github.com/opensoft/gentecBench) | | simBench | [opensoft/simBench](https://github.com/opensoft/simBench) | diff --git a/base-image/Dockerfile b/base-image/Dockerfile index e8356c9..bcbdf4f 100644 --- a/base-image/Dockerfile +++ b/base-image/Dockerfile @@ -9,7 +9,7 @@ FROM ubuntu:24.04 # Container version labels LABEL layer="0" LABEL layer.name="workbench-base" -LABEL layer.version="2.0.0" +LABEL layer.version="2.0.1" LABEL layer.description="System base with AI CLIs for all workBenches (user-agnostic)" # Everything runs as root — no user in this layer @@ -119,9 +119,14 @@ RUN npm install -g tldr \ # Install uv to /usr/local/bin (system-wide) RUN curl -LsSf https://astral.sh/uv/install.sh | UV_INSTALL_DIR=/usr/local/bin sh -# Install spec-kit system-wide -RUN uv tool install specify-cli --from git+https://github.com/github/spec-kit.git --python-preference system \ - || echo "spec-kit installation skipped (non-fatal)" +# Ensure spec-driven CLIs are not inherited from cached Layer 0 state. 
+RUN npm uninstall -g @fission-ai/openspec || true \ + && rm -f /usr/bin/openspec /usr/local/bin/openspec \ + && rm -rf /usr/lib/node_modules/@fission-ai/openspec /usr/local/lib/node_modules/@fission-ai/openspec \ + && uv tool uninstall specify-cli || true \ + && env UV_TOOL_BIN_DIR=/usr/local/bin UV_TOOL_DIR=/opt/uv/tools uv tool uninstall specify-cli || true \ + && rm -f /root/.local/bin/specify /usr/local/bin/specify \ + && rm -rf /root/.local/share/uv/tools/specify-cli /opt/uv/tools/specify-cli # ======================================== # BUN RUNTIME (system-wide at /opt/bun) @@ -172,14 +177,139 @@ RUN echo 'export ZSH="$HOME/.oh-my-zsh"' > /etc/skel/.zshrc && \ RUN echo 'eval "$(zoxide init bash)"' >> /etc/skel/.bashrc && \ echo 'export PATH="$HOME/.local/bin:/opt/bun/bin:$PATH"' >> /etc/skel/.bashrc -# Provide AI helper aliases system-wide so all benches inherit them even when +# Provide AI helpers system-wide so all benches inherit them even when # a workspace does not mount a host shell profile. -RUN echo '' >> /etc/zsh/zshrc && \ - echo '# WorkBench AI helpers' >> /etc/zsh/zshrc && \ - echo 'alias yolo="claude --dangerously-skip-permissions --teammate-mode tmux"' >> /etc/zsh/zshrc && \ - echo '' >> /etc/bash.bashrc && \ - echo '# WorkBench AI helpers' >> /etc/bash.bashrc && \ - echo 'alias yolo="claude --dangerously-skip-permissions --teammate-mode tmux"' >> /etc/bash.bashrc +RUN cat <<'EOF' >> /etc/zsh/zshrc + +# WorkBench AI helpers +_yolo_shell_quote() { + local quoted="" arg + + for arg in "$@"; do + quoted="${quoted} $(printf '%q' "$arg")" + done + + printf '%s\n' "${quoted# }" +} + +unalias yolo 2>/dev/null || true +yolo() { + local session_name command_string prompt_file + local -a prompt_args + + if ! command -v claude >/dev/null 2>&1; then + echo "yolo: Claude CLI not found on PATH" >&2 + return 1 + fi + + if ! 
command -v tmux >/dev/null 2>&1; then + echo "yolo: tmux not found on PATH" >&2 + return 1 + fi + + prompt_file="" + for candidate_prompt_file in \ + "$HOME/.claude/prompts/speckit-dashboard-full.md" \ + "/usr/local/share/ct/claude/prompts/speckit-dashboard-full.md" \ + "$HOME/.claude/prompts/speckit-dashboard-bootstrap.md"; do + if [ -r "$candidate_prompt_file" ]; then + prompt_file="$candidate_prompt_file" + break + fi + done + prompt_args=() + if [ -n "$prompt_file" ]; then + prompt_args=(--append-system-prompt-file "$prompt_file") + fi + + if [ -n "${TMUX:-}" ]; then + claude --dangerously-skip-permissions --teammate-mode tmux "${prompt_args[@]}" "$@" + return $? + fi + + session_name="yolo-$(date +%Y%m%d%H%M%S)-$$" + command_string=$(_yolo_shell_quote \ + claude \ + --dangerously-skip-permissions \ + --teammate-mode tmux \ + "${prompt_args[@]}" \ + "$@") || return 1 + + tmux new-session -d -s "$session_name" -c "$PWD" "exec $command_string" || { + echo "yolo: failed to start tmux session" >&2 + return 1 + } + + tmux set-option -t "$session_name" mouse on >/dev/null 2>&1 || true + tmux attach-session -t "$session_name" +} +EOF + +RUN cat <<'EOF' >> /etc/bash.bashrc + +# WorkBench AI helpers +_yolo_shell_quote() { + local quoted="" arg + + for arg in "$@"; do + quoted="${quoted} $(printf '%q' "$arg")" + done + + printf '%s\n' "${quoted# }" +} + +unalias yolo 2>/dev/null || true +yolo() { + local session_name command_string prompt_file + local -a prompt_args + + if ! command -v claude >/dev/null 2>&1; then + echo "yolo: Claude CLI not found on PATH" >&2 + return 1 + fi + + if ! 
command -v tmux >/dev/null 2>&1; then + echo "yolo: tmux not found on PATH" >&2 + return 1 + fi + + prompt_file="" + for candidate_prompt_file in \ + "$HOME/.claude/prompts/speckit-dashboard-full.md" \ + "/usr/local/share/ct/claude/prompts/speckit-dashboard-full.md" \ + "$HOME/.claude/prompts/speckit-dashboard-bootstrap.md"; do + if [ -r "$candidate_prompt_file" ]; then + prompt_file="$candidate_prompt_file" + break + fi + done + prompt_args=() + if [ -n "$prompt_file" ]; then + prompt_args=(--append-system-prompt-file "$prompt_file") + fi + + if [ -n "${TMUX:-}" ]; then + claude --dangerously-skip-permissions --teammate-mode tmux "${prompt_args[@]}" "$@" + return $? + fi + + session_name="yolo-$(date +%Y%m%d%H%M%S)-$$" + command_string=$(_yolo_shell_quote \ + claude \ + --dangerously-skip-permissions \ + --teammate-mode tmux \ + "${prompt_args[@]}" \ + "$@") || return 1 + + tmux new-session -d -s "$session_name" -c "$PWD" "exec $command_string" || { + echo "yolo: failed to start tmux session" >&2 + return 1 + } + + tmux set-option -t "$session_name" mouse on >/dev/null 2>&1 || true + tmux attach-session -t "$session_name" +} +EOF # ======================================== # OPENCODE CONFIGURATION (into /etc/skel) @@ -192,22 +322,11 @@ COPY files/opencode/oh-my-opencode.json /etc/skel/.config/opencode/ COPY files/opencode/agent/ /etc/skel/.config/opencode/agent/ COPY files/opencode/context/ /etc/skel/.config/opencode/context/ -# ======================================== -# OPSX COMMANDS & SKILLS (Claude Code, into /etc/skel) -# ======================================== -# Upgraded OpenSpec workflows with agent team orchestration. -# Uses opsx-* / opsx: prefix — openspec init/update won't overwrite these. 
- -RUN mkdir -p /etc/skel/.claude/commands/opsx \ - && mkdir -p /etc/skel/.claude/skills/opsx-clarify \ - && mkdir -p /etc/skel/.claude/skills/opsx-analyze -COPY files/claude/commands/opsx/ /etc/skel/.claude/commands/opsx/ -COPY files/claude/skills/opsx-clarify/ /etc/skel/.claude/skills/opsx-clarify/ -COPY files/claude/skills/opsx-analyze/ /etc/skel/.claude/skills/opsx-analyze/ - # ======================================== # SPECKIT SKILLS (Claude Code, into /etc/skel) # ======================================== +# OpenSpec/opsx commands are installed with the devBench OpenSpec layer, where +# the openspec CLI is present. Layer 0 keeps only non-OpenSpec Claude defaults. # Agent-team-enhanced Speckit workflows. Each skill upgrades the corresponding # project-level /speckit.* command with parallel specialist agents. # Uses speckit-* prefix — speckit init/update won't overwrite these. diff --git a/base-image/build.sh b/base-image/build.sh index a5e96da..cb23a74 100755 --- a/base-image/build.sh +++ b/base-image/build.sh @@ -13,20 +13,24 @@ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" cd "$SCRIPT_DIR" # Parse arguments (--user is accepted but ignored for backward compat) +NO_CACHE="${NO_CACHE:-false}" while [[ $# -gt 0 ]]; do case $1 in --user) shift 2 ;; + --no-cache) NO_CACHE=true; shift ;; *) shift ;; esac done echo "Configuration:" echo " Tag: workbench-base:latest (user-agnostic)" +echo " No cache: $NO_CACHE" echo "" # Build the image echo "Building workbench-base:latest..." docker build \ + $([ "$NO_CACHE" = true ] && printf '%s\n' "--no-cache") \ -t "workbench-base:latest" \ . diff --git a/base-image/install-ai-clis.sh b/base-image/install-ai-clis.sh index be8727d..9ed7cb9 100755 --- a/base-image/install-ai-clis.sh +++ b/base-image/install-ai-clis.sh @@ -1,6 +1,6 @@ #!/bin/bash # Shared AI CLI Installation Script -# Version: 2.0.2 +# Version: 2.0.3 # # USER-AGNOSTIC: Runs as root, installs to system-wide paths. 
# All npm globals go to /usr/local (default root prefix). @@ -9,11 +9,12 @@ # Claude Code goes to /usr/local/bin. # # Installs: -# - OpenCode (from Opensoft/opencode fork) +# - OpenCode (built from the upstream anomalyco/opencode repository) # - oh-my-opencode plugin (from git: darrenhinde/oh-my-opencode) # Includes built-in agents: Sisyphus, oracle, librarian, explore, frontend, etc. # - Auth plugins (opencode-gemini-auth, opencode-openai-codex-auth) # - Other AI CLIs (Codex, Gemini, Copilot, etc.) +# - Google Antigravity CLI (agy), checksum-gated opt-in # - Claude Code (via native installer, not npm) # # Note: OpenAgents agent files (openagent.md, opencoder.md) are copied via @@ -31,6 +32,9 @@ set -e DEBUG="${DEBUG:-1}" COMMAND_TIMEOUT="${COMMAND_TIMEOUT:-120}" # 2 minutes per command BUN_OPERATIONS_TIMEOUT="${BUN_OPERATIONS_TIMEOUT:-180}" # 3 minutes for bun ops +INSTALL_ANTIGRAVITY_CLI="${INSTALL_ANTIGRAVITY_CLI:-0}" +ANTIGRAVITY_INSTALL_URL="${ANTIGRAVITY_INSTALL_URL:-https://antigravity.google/cli/install.sh}" +ANTIGRAVITY_INSTALL_SHA256="${ANTIGRAVITY_INSTALL_SHA256:-}" log_debug() { if [ "$DEBUG" = "1" ]; then @@ -69,6 +73,22 @@ run_with_timeout() { fi } +ensure_system_uv_tool_paths() { + mkdir -p "$SYSTEM_UV_TOOL_DIR" "$SYSTEM_UV_TOOL_BIN_DIR" /root/.local/share/uv + ln -sfn "$SYSTEM_UV_TOOL_DIR" /root/.local/share/uv/tools +} + +run_system_uv_tool_install() { + local description="$1" + shift + + ensure_system_uv_tool_paths + run_with_timeout "$COMMAND_TIMEOUT" "$description" env \ + UV_TOOL_BIN_DIR="$SYSTEM_UV_TOOL_BIN_DIR" \ + UV_TOOL_DIR="$SYSTEM_UV_TOOL_DIR" \ + uv tool install "$@" --python-preference system +} + check_system_resources() { log_debug "Checking system resources..." 
log_debug "Memory: $(free -h | head -2)" @@ -92,6 +112,8 @@ check_system_resources export BUN_INSTALL="${BUN_INSTALL:-/opt/bun}" export PATH="/opt/bun/bin:$PATH" +SYSTEM_UV_TOOL_DIR="${SYSTEM_UV_TOOL_DIR:-/opt/uv/tools}" +SYSTEM_UV_TOOL_BIN_DIR="${SYSTEM_UV_TOOL_BIN_DIR:-/usr/local/bin}" log_debug "Verifying Bun installation" if which bun >/dev/null 2>&1; then @@ -101,11 +123,6 @@ else log_error "Bun not found in PATH (expected at /opt/bun/bin)" fi -log_info "Installing OpenSpec..." -if ! run_with_timeout "$COMMAND_TIMEOUT" "OpenSpec npm install" npm install -g @fission-ai/openspec@latest; then - log_error "OpenSpec installation failed (continuing)" -fi - log_info "Installing Claude Code CLI (native installer)..." # Native installer downloads to $HOME/.claude/downloads/ then runs 'claude install' # which places a launcher in ~/.local/bin/. Since we run as root, we need to @@ -142,21 +159,40 @@ if ! run_with_timeout "$COMMAND_TIMEOUT" "Gemini npm install" npm install -g @go log_error "Gemini CLI installation failed (continuing)" fi +if [ "$INSTALL_ANTIGRAVITY_CLI" = "1" ] || [ "$INSTALL_ANTIGRAVITY_CLI" = "true" ]; then + log_info "Installing Google Antigravity CLI..." + if [ -z "$ANTIGRAVITY_INSTALL_SHA256" ]; then + log_error "Antigravity install requested but ANTIGRAVITY_INSTALL_SHA256 is not set (skipping)" + else + antigravity_installer="$(mktemp)" + if run_with_timeout "120" "Antigravity installer download" \ + curl -fsSL "$ANTIGRAVITY_INSTALL_URL" -o "$antigravity_installer" && + printf '%s %s\n' "$ANTIGRAVITY_INSTALL_SHA256" "$antigravity_installer" | sha256sum -c - >/dev/null 2>&1 && + run_with_timeout "300" "Antigravity CLI install" bash "$antigravity_installer" --skip-aliases --skip-path; then + if [ -x "$HOME/.local/bin/agy" ] && [ ! 
-x /usr/local/bin/agy ]; then + cp "$HOME/.local/bin/agy" /usr/local/bin/agy + chmod +x /usr/local/bin/agy + fi + log_info "Antigravity CLI installed to $(command -v agy || printf '/usr/local/bin/agy')" + else + log_error "Antigravity CLI installation failed or checksum verification failed (continuing)" + fi + rm -f "$antigravity_installer" + fi +else + log_info "Skipping Google Antigravity CLI install; set INSTALL_ANTIGRAVITY_CLI=1 and ANTIGRAVITY_INSTALL_SHA256 to enable" +fi + log_info "Installing GitHub Copilot CLI..." if ! run_with_timeout "$COMMAND_TIMEOUT" "GitHub Copilot npm install" npm install -g @githubnext/github-copilot-cli; then log_error "GitHub Copilot installation failed (continuing)" fi -log_info "Installing Grok CLI (xAI)..." -if ! run_with_timeout "$COMMAND_TIMEOUT" "Grok npm install" npm install -g @xai-org/grok-cli; then - log_error "Grok CLI not available via npm (skipping)" -fi - -log_info "Installing OpenCode AI (from Opensoft fork)..." -# OpenCode: open source AI coding agent (https://github.com/Opensoft/opencode) -# Install from Opensoft fork instead of npm (sst version) -log_debug "Cloning OpenCode repository from Opensoft..." -if ! run_with_timeout "$COMMAND_TIMEOUT" "OpenCode git clone" git clone --depth 1 https://github.com/Opensoft/opencode.git /tmp/opencode; then +log_info "Installing OpenCode AI (from upstream source)..." +# OpenCode: open source AI coding agent (https://github.com/anomalyco/opencode) +# Build directly from upstream source. +log_debug "Cloning OpenCode repository from upstream..." +if ! run_with_timeout "$COMMAND_TIMEOUT" "OpenCode git clone" git clone --depth 1 https://github.com/anomalyco/opencode.git /tmp/opencode; then log_error "Failed to clone OpenCode repository (skipping OpenCode installation)" else cd /tmp/opencode @@ -293,8 +329,7 @@ fi log_info "Installing NotebookLM tools..." 
# notebooklm-py: Python CLI + API for NotebookLM (notebooklm command) # Base install only — no browser deps needed in container; auth mounted from host -if run_with_timeout "$COMMAND_TIMEOUT" "notebooklm-py install" uv tool install notebooklm-py --python-preference system; then - [ -f "$HOME/.local/bin/notebooklm" ] && ln -sf "$HOME/.local/bin/notebooklm" /usr/local/bin/notebooklm +if run_system_uv_tool_install "notebooklm-py install" notebooklm-py; then log_info "notebooklm-py CLI installed (notebooklm)" else log_error "notebooklm-py installation failed (continuing)" @@ -302,11 +337,9 @@ fi # notebooklm-mcp-cli: MCP server + nlm CLI for AI agent integration # Auth is done on the host (requires browser); tokens mounted into container -# uv tool install puts binaries in ~/.local/bin (root), so symlink to /usr/local/bin -if run_with_timeout "$COMMAND_TIMEOUT" "NotebookLM MCP CLI install" uv tool install notebooklm-mcp-cli --python-preference system; then - for bin in nlm notebooklm-mcp; do - [ -f "$HOME/.local/bin/$bin" ] && ln -sf "$HOME/.local/bin/$bin" "/usr/local/bin/$bin" - done +# Install into a shared uv tools directory instead of root's home so bench +# users can execute the launchers from /usr/local/bin. 
+if run_system_uv_tool_install "NotebookLM MCP CLI install" notebooklm-mcp-cli; then log_info "NotebookLM MCP CLI installed (nlm, notebooklm-mcp)" else log_error "NotebookLM MCP CLI installation failed (continuing)" @@ -331,12 +364,15 @@ if [ "${#missing_clis[@]}" -gt 0 ]; then fi log_info "Installed tools:" -log_info " - OpenSpec" log_info " - Claude Code (claude) [native installer]" log_info " - OpenAI Codex (codex)" log_info " - Google Gemini (gemini)" -log_info " - GitHub Copilot (copilot)" -log_info " - Grok (grok)" +if command -v agy >/dev/null 2>&1; then + log_info " - Google Antigravity CLI (agy)" +else + log_info " - Google Antigravity CLI (agy) [checksum-gated opt-in, skipped]" +fi +log_info " - GitHub Copilot CLI (github-copilot-cli)" log_info " - OpenCode (opencode)" log_info " - oh-my-opencode (darrenhinde fork with built-in agents)" log_info " - Letta Code (letta)" diff --git a/bench-config.json.backup b/bench-config.json.backup deleted file mode 100755 index 711c351..0000000 --- a/bench-config.json.backup +++ /dev/null @@ -1,55 +0,0 @@ -{ - "infrastructure": { - "specKit": { - "url": "git@github.com:opensoft/specKit.git", - "path": "specKit", - "description": "Infrastructure and specification kit - always installed" - } - }, - "benches": { - "cloudBench": { - "url": "git@github.com:opensoft/cloudBench.git", - "path": "sysBenches/cloudBench", - "description": "Cloud infrastructure and operations tools" - }, - "pythonBench": { - "url": "git@github.com:opensoft/pythonBench.git", - "path": "devBench/pythonBench", - "description": "Python development environment and tools" - }, - "javaBench": { - "url": "git@github.com:opensoft/javaBench.git", - "path": "devBench/javaBench", - "description": "Java development environment and tools" - }, - "dotNetBench": { - "url": "git@github.com:opensoft/dotNetBench.git", - "path": "devBench/dotNetBench", - "description": ".NET development environment and tools" - }, - "flutterBench": { - "url": 
"git@github.com:opensoft/flutterBench.git", - "path": "devBench/flutterBench", - "description": "Flutter/Dart development environment and tools", - "project_scripts": [ - { - "name": "flutter", - "script": "scripts/new-flutter-project.sh", - "description": "Create a new Flutter project with DevContainer setup", - "includes_speckit": true - }, - { - "name": "dartwing", - "script": "scripts/new-dartwing-project.sh", - "description": "Create a new DartWing project with specialized configuration", - "includes_speckit": true - } - ] - }, - "cppBench": { - "url": "git@github.com:opensoft/cppBench.git", - "path": "devBench/cppBench", - "description": "C++ development environment and tools" - } - } -} diff --git a/bioBenches/base-image/build.sh b/bioBenches/base-image/build.sh index 704ec0c..2edf744 100755 --- a/bioBenches/base-image/build.sh +++ b/bioBenches/base-image/build.sh @@ -18,9 +18,11 @@ LEGACY_IMAGE="$(legacy_family_base_image bio)" cd "$SCRIPT_DIR" # Parse arguments (--user is accepted but ignored for backward compat) +NO_CACHE="${NO_CACHE:-false}" while [[ $# -gt 0 ]]; do case $1 in --user) shift 2 ;; + --no-cache) NO_CACHE=true; shift ;; *) shift ;; esac done @@ -28,6 +30,7 @@ done echo "Configuration:" echo " Tag: $CANONICAL_IMAGE (user-agnostic)" echo " Legacy alias: $LEGACY_IMAGE" +echo " No cache: $NO_CACHE" echo "" # Check if Layer 0 exists @@ -43,6 +46,7 @@ fi # Build the image echo "Building $CANONICAL_IMAGE..." docker build \ + $([ "$NO_CACHE" = true ] && printf '%s\n' "--no-cache") \ -t "$CANONICAL_IMAGE" \ . 
tag_family_base_legacy_alias bio diff --git a/config/bench-config.json b/config/bench-config.json index 7d31b2b..b90c88b 100755 --- a/config/bench-config.json +++ b/config/bench-config.json @@ -13,9 +13,9 @@ "description": "Cloud infrastructure and operations tools", "ai_keywords": ["cloud", "infrastructure", "devops", "kubernetes", "docker", "deployment", "admin", "monitoring", "logging"] }, - "pythonBench": { - "url": "git@github.com:opensoft/pythonBench.git", - "path": "devBenches/pythonBench", + "pyBench": { + "url": "git@github.com:opensoft/pyBench.git", + "path": "devBenches/pyBench", "description": "Python development environment and tools", "ai_keywords": ["python", "django", "flask", "fastapi", "pandas", "numpy", "machine learning", "ml", "data science", "AI", "artificial intelligence", "web scraping"], "project_scripts": [ @@ -27,6 +27,20 @@ } ] }, + "phpBench": { + "url": "git@github.com:opensoft/phpBench.git", + "path": "devBenches/phpBench", + "description": "PHP development environment and tools", + "ai_keywords": ["php", "composer", "phpunit", "laravel", "symfony", "wordpress", "drupal", "xdebug", "web application"], + "project_scripts": [ + { + "name": "php", + "script": "scripts/new-php-project.sh", + "description": "Create a new PHP project with Composer, PHPUnit, and SonarCloud coverage setup", + "includes_speckit": false + } + ] + }, "javaBench": { "url": "git@github.com:opensoft/javaBench.git", "path": "devBenches/javaBench", diff --git a/config/claude/workflows/deep-swarm-code-review.js b/config/claude/workflows/deep-swarm-code-review.js new file mode 100644 index 0000000..b91cbb2 --- /dev/null +++ b/config/claude/workflows/deep-swarm-code-review.js @@ -0,0 +1,438 @@ +export const meta = { + name: 'deep-swarm-code-review', + description: 'Swarm of expert subagents deep-reviews a PR / branch / uncommitted diff, adversarially verifies each finding, and (in PR mode) automatically posts all confirmed findings as inline comments on the PR', + 
whenToUse: 'Deep multi-agent code review. Auto-targets: open PR for the current branch, else committed branch-vs-main, else uncommitted working-tree changes. In PR mode it ALWAYS posts all confirmed findings to the PR automatically (set args.post=false to suppress, args.dedupeAgainstExisting=true to skip findings already commented on the PR). Override with args {mode:"pr"|"branch"|"uncommitted", prNumber, base}.', + phases: [ + { title: 'Scope', detail: 'detect review target (PR / branch / uncommitted) + partition the diff' }, + { title: 'Review', detail: 'multi-pass swarm — each pass adds expert lenses, finer file units, deeper digging' }, + { title: 'Verify', detail: 'independent skeptic verifies each finding + validates the diff line' }, + { title: 'Post', detail: 'auto-publish one consolidated GitHub review of all confirmed findings (PR mode)' }, + ], +} + +// ============================================================================ +// args (all optional — workflow auto-detects when omitted): +// mode: 'pr' | 'branch' | 'uncommitted' +// prNumber: number (pr mode) +// base: base ref for committed diffs (default auto: origin/main || main) +// post: boolean — post results to GitHub (default true in pr mode) +// repoRoot: absolute path (default: current working dir of agents) +// ============================================================================ +const cfg = args || {} +const REPO_ROOT = cfg.repoRoot || '.' 
+const POST = cfg.post !== false // PR mode auto-posts unless explicitly disabled +const DEDUPE = cfg.dedupeAgainstExisting === true // skip findings already commented on the PR + +// ---- Phase 0: detect the review target ------------------------------------- +phase('Scope') + +const SCOPE_SCHEMA = { + type: 'object', + required: ['mode', 'base', 'summary'], + properties: { + mode: { type: 'string', enum: ['pr', 'branch', 'uncommitted'] }, + prNumber: { type: 'integer' }, + base: { type: 'string' }, + branch: { type: 'string' }, + summary: { type: 'string' }, + }, +} + +let scope +if (cfg.mode) { + scope = { mode: cfg.mode, prNumber: cfg.prNumber, base: cfg.base || 'origin/main', summary: 'from args' } +} else { + scope = await agent( + `Determine what this code review should target. Repo root: ${REPO_ROOT}. Use Bash (git, gh). + +Decide ONE mode, in this priority order: +1. 'pr' — if 'gh pr view --json number,baseRefName,headRefName' shows an OPEN PR for the CURRENT branch. Capture prNumber and base (the PR's baseRefName, e.g. origin/main or main). +2. 'uncommitted' — else if 'git status --porcelain' shows tracked changes (the working tree is dirty). base = HEAD. +3. 'branch' — else review committed work on this branch vs its base. base = whichever of 'origin/main' or 'main' exists (prefer origin/main). If the current branch IS main/master with no PR and a clean tree, still pick 'branch' with base = the previous commit (HEAD~1) and note it in summary. + +Return the chosen mode, base ref string (usable in 'git diff ...HEAD' for pr/branch, or literally 'HEAD' for uncommitted), prNumber if pr, branch name, and a one-line summary of what will be reviewed.`, + { label: 'scope:detect', phase: 'Scope', schema: SCOPE_SCHEMA }, + ) +} + +const MODE = scope.mode +const BASE = scope.base || 'origin/main' +const PRNUM = scope.prNumber || cfg.prNumber +log(`Target: ${MODE}${PRNUM ? 
' #' + PRNUM : ''} (base=${BASE}) — ${scope.summary}`) + +// How each reviewer obtains its slice of the diff, by mode. +function diffSpec(files) { + const fileArgs = files.map(f => `'${f}'`).join(' ') + if (MODE === 'uncommitted') { + return `Review UNCOMMITTED changes only:\n git -C ${REPO_ROOT} diff HEAD -- ${fileArgs}\n (also 'git -C ${REPO_ROOT} status --porcelain -- ${fileArgs}' for new untracked files).` + } + return `BASE detection (run first):\n BASE=$(git -C ${REPO_ROOT} merge-base HEAD ${BASE} 2>/dev/null || git -C ${REPO_ROOT} merge-base HEAD main || echo ${BASE})\nThen review the committed diff:\n git -C ${REPO_ROOT} diff "$BASE"...HEAD -- ${fileArgs}` +} + +// ---- Phase 1 setup: discover changed files and group them ------------------ +// A reviewer agent reads the changed-file list and partitions it into coherent +// subsystem groups, so the workflow adapts to whatever diff it is pointed at. +const GROUPS_SCHEMA = { + type: 'object', + required: ['groups'], + properties: { + groups: { + type: 'array', + items: { + type: 'object', + required: ['name', 'persona', 'files'], + properties: { + name: { type: 'string' }, + persona: { type: 'string' }, + files: { type: 'array', items: { type: 'string' } }, + }, + }, + }, + }, +} + +const listCmd = MODE === 'uncommitted' + ? `git -C ${REPO_ROOT} diff --name-only HEAD; git -C ${REPO_ROOT} ls-files --others --exclude-standard` + : `BASE=$(git -C ${REPO_ROOT} merge-base HEAD ${BASE} 2>/dev/null || git -C ${REPO_ROOT} merge-base HEAD main || echo ${BASE}); git -C ${REPO_ROOT} diff --name-only "$BASE"...HEAD` + +const partition = await agent( + `List the changed files for this review and partition them into coherent review groups. + +Run: + ${listCmd} + +Then group the changed files into 8–24 subsystem groups so each group is a coherent unit one expert can review well (group by directory / language / feature; keep related scripts together; isolate large/high-risk files into their own group). 
For each group give: a short kebab 'name', a 'persona' (the kind of expert best suited — e.g. "a defensive Bash engineer", "a Docker layered-build expert", "a senior Python engineer", "a PowerShell automation expert", "a config/JSON correctness reviewer", "a refactor-safety auditor for renamed/removed paths"), and the exact repo-relative 'files' (every changed file must appear in exactly one group). Aim to cover EVERY changed file.`, + { label: 'scope:partition', phase: 'Scope', schema: GROUPS_SCHEMA }, +) + +const GROUPS = (partition.groups || []).filter(g => g.files && g.files.length) +if (!GROUPS.length) { + log('No changed files found — nothing to review.') + return { mode: MODE, base: BASE, confirmedCount: 0, confirmed: [] } +} +log(`Swarm: ${GROUPS.length} expert reviewers over ${GROUPS.reduce((n, g) => n + g.files.length, 0)} changed files`) + +// ---- shared reviewer guidance ---------------------------------------------- +const SHARED_RULES = ` +You are reviewing a real change. Work from the actual diff and full file context — do NOT speculate. + +SCOPE: Report only problems introduced or touched by THIS diff. Ignore pre-existing issues in unchanged lines. + +LOOK FOR (weight by real impact): +- Correctness / logic bugs, wrong conditionals, off-by-one, bad expansion, unset-var use. +- Shell robustness: unquoted expansions, word-splitting, missing 'set -euo pipefail' where it matters, ignored exit codes, fragile parsing, non-portable bashisms in /bin/sh, eval misuse, unguarded cd, unsafe rm globs. +- Security: command injection, curl|bash of untrusted input, secret/token leakage, unsafe temp files, world-readable creds, permissions. +- Cross-platform / cross-shell parity (bash vs zsh vs PowerShell; macOS vs Linux: sed -i, mktemp, readlink). +- Dockerfile: cache busting, missing cleanup/--no-install-recommends, root vs user, version pinning where it matters, COPY/chmod correctness. 
+- Config/JSON/YAML: invalid syntax, wrong keys, broken references to renamed/removed paths. +- Dead code, broken cross-file refs, renames the diff didn't propagate everywhere. + +PRECISION (critical for the next stage): +- "file" MUST be the repo-relative path exactly as in the diff. +- "line" MUST be a line number in the NEW (post-change) file — a line on the RIGHT side of the diff (an added '+' line, or a context line inside a changed hunk). Read the file to get the exact number. Prefer an added '+' line. +- "body" is GitHub-Markdown: state the concrete problem, why it matters, and a specific fix (a short \`\`\`suggestion\`\`\` block is ideal). +- Quality over quantity. Skip pure style nits with no functional impact. A finding you are not fairly confident is real does more harm than good downstream. + +Return findings via the structured tool. An empty list is a valid answer.` + +function reviewerPrompt(u) { + const known = (u.known && u.known.length) + ? `\nALREADY-REPORTED in this area by earlier reviewers — do NOT repeat these. Find DIFFERENT, deeper, or adjacent problems the others missed:\n${u.known.map(k => ` - ${k.file}:${k.line} — ${k.title}`).join('\n')}\n` + : '' + const deep = u.depth + ? 'This is a DEEP pass: read each touched file in full, trace control/data flow into the sibling files it sources or calls (and that call it), and reason about non-obvious failure modes, edge cases, and cross-file interactions — not just surface-level bugs.\n' + : '' + return `You are ${u.persona}. Repo root: ${REPO_ROOT}. +Review lens for this pass: ${u.lensDesc} + +Review these changed files: +${u.files.map(f => ' - ' + f).join('\n')} + +${diffSpec(u.files)} + +Use Bash (git diff / grep) and Read freely to get the diff and full surrounding context before judging. 
+${deep}${known}${SHARED_RULES}` +} + +const FINDINGS_SCHEMA = { + type: 'object', + required: ['findings'], + properties: { + findings: { + type: 'array', + items: { + type: 'object', + required: ['file', 'line', 'severity', 'category', 'title', 'body', 'confidence'], + properties: { + file: { type: 'string' }, + line: { type: 'integer' }, + severity: { type: 'string', enum: ['critical', 'high', 'medium', 'low'] }, + category: { type: 'string' }, + title: { type: 'string' }, + body: { type: 'string' }, + confidence: { type: 'string', enum: ['high', 'medium', 'low'] }, + }, + }, + }, + }, +} + +const VERDICT_SCHEMA = { + type: 'object', + required: ['keep', 'reason'], + properties: { + keep: { type: 'boolean' }, + reason: { type: 'string' }, + adjustedLine: { type: 'integer' }, + adjustedSeverity: { type: 'string', enum: ['critical', 'high', 'medium', 'low'] }, + refinedBody: { type: 'string' }, + inDiff: { type: 'boolean' }, + }, +} + +function verifyPrompt(f) { + return `You are an independent, skeptical senior reviewer. A prior reviewer raised the finding below. REFUTE it unless it clearly holds up. Default keep=false when uncertain, when it is style-only, or when it concerns unchanged/pre-existing code. + +Repo root: ${REPO_ROOT} +Finding file: ${f.file} +Claimed NEW-file line: ${f.line} +Severity: ${f.severity} | Category: ${f.category} +Title: ${f.title} +Body: +${f.body} + +Steps: +1. ${diffSpec([f.file]).split('\n').join('\n ')} + Confirm the cited line is part of this diff (an added '+' line or context line inside a changed hunk). Set inDiff. If the issue is real but the line is slightly off, put the correct NEW-file line (one that IS in the diff) in adjustedLine. +2. Read surrounding code to confirm the problem is REAL with practical impact, not a misreading. +3. Decide keep (true only if real, impactful, tied to changed lines). One-sentence reason. Optionally set adjustedSeverity and improve refinedBody (GitHub-Markdown). 
+ +Return via the structured tool.` +} + +// ---- Multi-pass swarm — wider lenses + finer granularity + deeper digging each pass +// Each pass adds more expert lenses, splits the diff into finer units, and tells +// every reviewer what earlier passes already found so it hunts for NEW, deeper issues. +const PASSES = Math.max(1, cfg.passes || 3) +const MAX_UNITS = cfg.maxReviewersPerPass || 120 // per-pass reviewer cap (cost guard) + +const LENS_SETS = [ + // Pass 1 — broad sweep, one generalist per subsystem + [{ key: 'core', desc: 'overall correctness, logic bugs, and the highest-impact robustness problems' }], + // Pass 2 — specialist quartet, applied per subsystem + [ + { key: 'security', desc: 'security: command/regex injection, secret & token handling, file permissions, unsafe temp files, curl|bash of untrusted input' }, + { key: 'robustness', desc: 'shell robustness & portability: quoting/word-splitting, set -euo pipefail interactions, ignored exit codes, GNU-vs-BSD/macOS, bash-vs-zsh-vs-POSIX' }, + { key: 'consistency', desc: 'cross-file consistency: renamed/removed paths, parser parity between sibling scripts, docs/READMEs that contradict behavior, broken references' }, + { key: 'control-flow', desc: 'control-flow & tool/API semantics: wrong conditionals, early/no-op exits, broken orchestration, misused CLIs/builtins, idempotency on re-run' }, + ], + // Pass 3+ — full battery, per-file granularity, deep flow tracing + [ + { key: 'concurrency', desc: 'concurrency, races, locking, and idempotency under parallel or repeated invocation' }, + { key: 'error-handling', desc: 'error handling & failure modes: partial failures, missing guards, silent skips, cleanup/trap correctness' }, + { key: 'edge-cases', desc: 'edge cases & input validation: empty/whitespace/unicode/missing inputs, unusual paths, boundary conditions' }, + { key: 'perf-resource', desc: 'performance & resource use: redundant work, repeated network/subprocess calls, unbounded loops, leaks' }, + { 
key: 'docs-ux', desc: 'documentation/UX accuracy: help text, READMEs, comments, and error messages vs actual behavior' }, + ], +] + +function chunk(arr, size) { const out = []; for (let i = 0; i < arr.length; i += size) out.push(arr.slice(i, i + size)); return out } +const sevRank = { critical: 0, high: 1, medium: 2, low: 3 } + +const confirmedAll = [] +const seenByFile = {} +function titleKey(t) { return (t || '').toLowerCase().replace(/[^a-z0-9 ]/g, '').split(/\s+/).filter(Boolean).slice(0, 6).join(' ') } +// Dup only if same file AND within ±3 lines AND (identical line OR similar title). +// Different-angle findings (e.g. a security vs a perf issue) on nearby lines survive. +function isNew(f) { + const tk = titleKey(f.title) + for (const e of (seenByFile[f.file] || [])) { + if (Math.abs(e.line - f.line) <= 3 && (e.line === f.line || e.tkey === tk)) return false + } + return true +} +function remember(f) { (seenByFile[f.file] = seenByFile[f.file] || []).push({ line: f.line, tkey: titleKey(f.title) }) } +function knownFor(files) { const s = new Set(files); return confirmedAll.filter(f => s.has(f.file)).map(f => ({ file: f.file, line: f.line, title: f.title })) } +function normalize(x) { + return { + file: x.file, + line: (Number.isInteger(x.verdict.adjustedLine) ? x.verdict.adjustedLine : x.line), + severity: x.verdict.adjustedSeverity || x.severity, + category: x.category, title: x.title, + body: x.verdict.refinedBody || x.body, + confidence: x.confidence, group: x.group, + inDiff: x.verdict.inDiff !== false, verifyReason: x.verdict.reason, + } +} + +let rawTotal = 0 +for (let p = 1; p <= PASSES; p++) { + // widen across passes: each pass applies a DISTINCT tier of lenses (it does NOT + // re-run earlier tiers — re-running 'core' every pass just rediscovers pass-1 + // findings and wastes the budget). Reviewers still get earlier findings as context. 
+ const tier = Math.min(p - 1, LENS_SETS.length - 1) + const lenses = LENS_SETS[tier] + // deepen: finer file chunks + full-file deep reads on later passes + const chunkSize = p <= 1 ? 999 : (p === 2 ? 3 : 2) + const deep = p >= 3 + + // Build per-lens unit lists, then interleave round-robin so that if the per-pass + // cap trims, it trims EVENLY across lenses and subsystems (never starves a lens). + const perLens = lenses.map(lens => { + const arr = [] + for (const g of GROUPS) for (const fc of chunk(g.files, chunkSize)) { + arr.push({ + name: `${g.name}/${lens.key}`, + persona: lens.key === 'core' ? g.persona : `${g.persona}, reviewing specifically through a ${lens.key} lens`, + lensDesc: lens.desc, files: fc, depth: deep, known: knownFor(fc), + }) + } + return arr + }) + const totalUnits = perLens.reduce((n, a) => n + a.length, 0) + let units = [] + for (let i = 0; units.length < totalUnits; i++) { + for (const arr of perLens) if (i < arr.length) units.push(arr[i]) + } + if (units.length > MAX_UNITS) { log(`Pass ${p}: ${totalUnits} reviewer units → capped to ${MAX_UNITS} (interleaved across lenses; raise args.maxReviewersPerPass for fuller coverage)`); units = units.slice(0, MAX_UNITS) } + + phase(`Pass ${p} · Review`) + log(`Pass ${p}/${PASSES}: ${units.length} expert reviewers — lenses [${lenses.map(l => l.key).join(', ')}], chunk=${chunkSize}${deep ? 
', deep' : ''}`) + + const reviewed = await pipeline( + units, + u => agent(reviewerPrompt(u), { label: `p${p}:${u.name}`, phase: `Pass ${p} · Review`, schema: FINDINGS_SCHEMA }) + .then(r => ({ u, findings: (r && r.findings) || [] })) + .catch(() => ({ u, findings: [] })), + (res) => parallel((res.findings).map(f => () => + agent(verifyPrompt(f), { label: `p${p}:verify:${(f.file || '').split('/').pop()}:${f.line}`, phase: `Pass ${p} · Verify`, schema: VERDICT_SCHEMA }) + .then(v => ({ ...f, group: res.u.name, verdict: v })) + .catch(() => null) + )), + ) + + const passRaw = reviewed.flat().filter(Boolean) + rawTotal += passRaw.length + const passConfirmed = passRaw.filter(x => x.verdict && x.verdict.keep).map(normalize) + let added = 0 + for (const f of passConfirmed) { if (isNew(f)) { confirmedAll.push(f); remember(f); added++ } } + log(`Pass ${p}: +${added} net-new confirmed (running total ${confirmedAll.length})`) + + if (budget.total && budget.remaining() < 80000) { log(`Budget low (${Math.round(budget.remaining() / 1000)}k left) — stopping after pass ${p}.`); break } +} + +confirmedAll.sort((a, b) => (sevRank[a.severity] - sevRank[b.severity]) || a.file.localeCompare(b.file) || a.line - b.line) +const counts = confirmedAll.reduce((m, f) => (m[f.severity] = (m[f.severity] || 0) + 1, m), {}) +log(`Total confirmed (deduped) across passes: ${confirmedAll.length} from ${rawTotal} raw — ${JSON.stringify(counts)}`) + +// ---- Phase 3: auto-post ONE consolidated GitHub review (PR mode) ----------- +// In PR mode the workflow ALWAYS posts every confirmed finding (unless post=false). +// The posting agent follows an exact, deterministic procedure so it is reliable +// unattended: it parses the diff with the embedded python script (no guessing +// about which lines are commentable) and submits one COMMENT review. 
+let posted = { attempted: false } +if (POST && MODE === 'pr' && PRNUM && confirmedAll.length) { + phase('Post') + const payload = JSON.stringify({ prNumber: PRNUM, counts, dedupe: DEDUPE, findings: confirmedAll }) + const postResult = await agent( + `Publish the verified findings below as ONE consolidated GitHub pull-request review on PR #${PRNUM}, with each finding as an inline comment. Repo root: ${REPO_ROOT}. This DOES publish to GitHub — that is the intended behavior, post everything that maps. Use Bash (gh, git, python3). + +FINDINGS JSON (write it to a temp file, e.g. /tmp/swarm_findings.json): +${payload} + +Run EXACTLY this procedure (do not improvise the diff parsing): + +STEP 1 — fetch the diff and owner/repo: + gh pr diff ${PRNUM} > /tmp/pr_${PRNUM}.diff + OWNER_REPO=$(gh repo view --json owner,name --jq '.owner.login + "/" + .name') + +STEP 2 — run this python3 script verbatim (it parses commentable RIGHT-side lines, snaps each finding to a valid line within ±3, optionally dedupes against existing PR comments, and writes the review payload): + + cat > /tmp/build_review.py <<'PY' + import json, re, subprocess, sys + PR = "${PRNUM}" + data = json.load(open('/tmp/swarm_findings.json')) + findings = data['findings']; counts = data['counts']; dedupe = data.get('dedupe', False) + # 1. 
valid RIGHT-side (new-file) line numbers per path + valid = {}; cur=None; new_ln=None + for line in open('/tmp/pr_%s.diff' % PR): + if line.startswith('diff --git '): cur=None; new_ln=None; continue + if line.startswith('+++ '): + p=line[4:].strip(); cur=None if p=='/dev/null' else (p[2:] if p.startswith('b/') else p) + if cur: valid.setdefault(cur,set()) + continue + if line.startswith('@@'): + m=re.search(r'\\+(\\d+)(?:,(\\d+))?',line); new_ln=int(m.group(1)) if m else None; continue + if cur is None or new_ln is None: continue + if line.startswith('+') and not line.startswith('+++'): valid[cur].add(new_ln); new_ln+=1 + elif line.startswith('-') and not line.startswith('---'): pass + elif line.startswith('\\\\'): pass + else: valid[cur].add(new_ln); new_ln+=1 + # 2. optional dedupe vs existing PR comments (existing comments file passed via EXISTING_JSON env) + import os + posted={} + if dedupe and os.environ.get('EXISTING_JSON'): + for c in json.load(open(os.environ['EXISTING_JSON'])): + ln=c.get('line') or c.get('original_line') + if c.get('path') and ln: posted.setdefault(c['path'],[]).append(ln) + # 3. map findings + comments=[]; unmapped=[] + for f in findings: + if dedupe and any(abs(l-f['line'])<=6 for l in posted.get(f['file'],[])): + continue + vs=valid.get(f['file']); ln=f['line']; chosen=None + if vs: + if ln in vs: chosen=ln + else: + cands=[l for l in vs if abs(l-ln)<=3] + if cands: chosen=min(cands,key=lambda l:(abs(l-ln),l)) + if chosen is not None: + comments.append({"path":f['file'],"line":chosen,"side":"RIGHT","body":"**[%s]** %s"%(f['severity'],f['body'])}) + else: + unmapped.append(f) + # 4. 
summary body + hi=[f for f in findings if f['severity']=='high' or f['severity']=='critical'] + lines=["## 🤖 AI Swarm Code Review",""] + lines.append("Deep multi-agent review: expert subagents partitioned the diff by subsystem; every finding was adversarially verified by an independent skeptic before posting.") + lines.append("") + lines.append("**Confirmed findings: %d** — %s. %d posted as inline comments below." % (len(findings), json.dumps(counts), len(comments))) + if hi: + lines.append(""); lines.append("Highlights (critical and high severity):") + for f in hi[:6]: lines.append("- **%s** — %s" % (f['file'].split('/')[-1], f['title'])) + if unmapped: + lines.append(""); lines.append("Findings that could not be mapped to a diff line (shown here instead):") + for f in unmapped: lines.append("- **[%s] %s:%s** — %s" % (f['severity'], f['file'], f['line'], f['title'])) + lines.append(""); lines.append("_Advisory; severities are the swarm's estimate. Generated with Claude Code._") + payload={"event":"COMMENT","body":"\\n".join(lines),"comments":comments} + json.dump(payload, open('/tmp/review_payload.json','w')) + print("MAPPED",len(comments),"UNMAPPED",len(unmapped)) + PY + ${DEDUPE ? `gh api --paginate repos/$OWNER_REPO/pulls/${PRNUM}/comments > /tmp/existing_comments.json; EXISTING_JSON=/tmp/existing_comments.json python3 /tmp/build_review.py` : `python3 /tmp/build_review.py`} + +STEP 3 — submit ONE review: + gh api --method POST repos/$OWNER_REPO/pulls/${PRNUM}/reviews --input /tmp/review_payload.json --jq '{id, state, html_url}' + +STEP 4 — if the API returns 422 mentioning a specific line/path, remove that one comment from /tmp/review_payload.json (python or jq) and resubmit so the rest still post. Repeat at most 3 times. 
+ +STEP 5 — verify and report: count how many comments belong to the new review id and return a concise report: number of inline comments posted, number unmapped, and the review html_url.`, + { label: 'post:github-review', phase: 'Post' }, + ) + posted = { attempted: true, report: postResult } + log('Auto-posted GitHub review.') +} else if (POST && MODE === 'pr' && !confirmedAll.length) { + log('No confirmed findings — nothing to post.') +} else if (!POST && MODE === 'pr') { + log('post=false — skipping GitHub posting (findings returned in result).') +} + +return { + mode: MODE, + base: BASE, + prNumber: PRNUM, + passes: PASSES, + rawFindings: rawTotal, + confirmedCount: confirmedAll.length, + counts, + confirmed: confirmedAll, + posted, +} diff --git a/config/shell/zshrc b/config/shell/zshrc index c3b0621..69d627e 100755 --- a/config/shell/zshrc +++ b/config/shell/zshrc @@ -609,8 +609,69 @@ alias devbench-status='docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Por alias devbench-stop='docker stop java_bench dot_net_bench flutter_bench 2>/dev/null || true' # End DevBench Aliases -# Claude alias -alias yolo="claude --dangerously-skip-permissions --teammate-mode tmux" +# Claude helper. Runs the main Claude session inside tmux so another operator +# can capture and drive it, while still letting Claude use tmux for teammates. +_yolo_shell_quote() { + local quoted="" arg + + for arg in "$@"; do + quoted="${quoted} $(printf '%q' "$arg")" + done + + printf '%s\n' "${quoted# }" +} + +unalias yolo 2>/dev/null || true +yolo() { + local session_name command_string prompt_file candidate_prompt_file + local -a prompt_args + + if ! command -v claude >/dev/null 2>&1; then + echo "yolo: Claude CLI not found on PATH" >&2 + return 1 + fi + + if ! 
command -v tmux >/dev/null 2>&1; then + echo "yolo: tmux not found on PATH" >&2 + return 1 + fi + + prompt_file="" + for candidate_prompt_file in \ + "$HOME/.claude/prompts/speckit-dashboard-full.md" \ + "/usr/local/share/ct/claude/prompts/speckit-dashboard-full.md" \ + "$HOME/.claude/prompts/speckit-dashboard-bootstrap.md"; do + if [ -r "$candidate_prompt_file" ]; then + prompt_file="$candidate_prompt_file" + break + fi + done + prompt_args=() + if [ -n "$prompt_file" ]; then + prompt_args=(--append-system-prompt-file "$prompt_file") + fi + + if [ -n "${TMUX:-}" ]; then + claude --dangerously-skip-permissions --teammate-mode tmux "${prompt_args[@]}" "$@" + return $? + fi + + session_name="yolo-$(date +%Y%m%d%H%M%S)-$$" + command_string=$(_yolo_shell_quote \ + claude \ + --dangerously-skip-permissions \ + --teammate-mode tmux \ + "${prompt_args[@]}" \ + "$@") || return 1 + + tmux new-session -d -s "$session_name" -c "$PWD" "exec $command_string" || { + echo "yolo: failed to start tmux session" >&2 + return 1 + } + + tmux set-option -t "$session_name" mouse on >/dev/null 2>&1 || true + tmux attach-session -t "$session_name" +} export PATH="$HOME/.npm-global/bin:$PATH" export PATH="/workspace/.venv/bin:$PATH" @@ -641,4 +702,3 @@ else fi unset __conda_setup # <<< conda initialize <<< - diff --git a/devBenches/.devcontainer/devcontainer.json b/devBenches/.devcontainer/devcontainer.json index 4bfe9d3..4581d8a 100644 --- a/devBenches/.devcontainer/devcontainer.json +++ b/devBenches/.devcontainer/devcontainer.json @@ -3,6 +3,7 @@ "dockerComposeFile": ["docker-compose.yml", "docker-compose.override.yml"], "service": "devbench", "workspaceFolder": "/workspace", + "initializeCommand": "bash ${localWorkspaceFolder}/scripts/ensure-sonarqube-mcp.sh", "shutdownAction": "stopCompose", "forwardPorts": [1455], diff --git a/devBenches/README.md b/devBenches/README.md index b15001d..8c8a71a 100755 --- a/devBenches/README.md +++ b/devBenches/README.md @@ -10,7 +10,8 @@ Each subfolder 
is a separate git repository containing a complete development en - **`dotNetBench/`** - .NET development environment with DevContainer - **`flutterBench/`** - Flutter/Dart development environment with DevContainer - **`javaBench/`** - Java development environment with DevContainer -- **`pythonBench/`** - Python development environment with DevContainer +- **`phpBench/`** - PHP development environment with DevContainer +- **`pyBench/`** - Python development environment with DevContainer ## Layered Containers (Current Standard) @@ -20,6 +21,17 @@ All benches are moving to the layered image model described in `workBenches/docs - **Layer 2**: `-bench:latest` - **Layer 3**: `-bench:{user}` (user personalization) +Layer 1a carries the shared developer tooling used by all devBenches, including: +- `sonar-scanner` - SonarScanner CLI for project analysis uploads to SonarQube Server or SonarQube Cloud +- `sonar` - SonarQube CLI for issue/project workflows, secrets scanning, and agent integrations +- `sonar-env` - container-safe Sonar environment loader that reads `~/.config/sonarqube/sonar.env` +- `gt` - Graphite CLI for stacked pull request workflows + +Devcontainers should mount `~/.config/sonarqube` read-only or mount the full host +home directory. The shared `sonar-env` helper then loads tokens at runtime and +sets `SONARQUBE_CLI_KEYCHAIN_FILE` to a writable file-backed keychain so `sonar` +does not depend on a desktop keychain service inside containers. + ## Legacy Monolithic DevContainers (Deprecated) Some benches still include a `.devcontainer/` directory with a monolithic Dockerfile. These are **legacy** and should not be used as the source of truth. Use the layered images and bench-level build scripts instead; treat monolithic Dockerfiles as deprecated artifacts until removed. 
diff --git a/devBenches/base-image/Dockerfile b/devBenches/base-image/Dockerfile index 3403649..acf9793 100644 --- a/devBenches/base-image/Dockerfile +++ b/devBenches/base-image/Dockerfile @@ -1,6 +1,6 @@ # Layer 1a: Developer Base Image -# Extends Layer 0 with Python, Node.js LTS, and dev tools -# AI CLIs are inherited from Layer 0 (workbench-base) +# Extends Layer 0 with Python, Node.js LTS, dev tools, and spec CLIs +# Most AI CLIs are inherited from Layer 0 (workbench-base) # Used by ALL developer benches (Frappe, Flutter, .NET, etc.) # USER-AGNOSTIC: No user creation — Layer 3 handles user setup @@ -9,7 +9,7 @@ FROM workbench-base:latest # Container version labels LABEL layer="1" LABEL layer.name="dev-bench-base" -LABEL layer.version="2.2.0" +LABEL layer.version="2.2.3" LABEL layer.description="Developer base with Python, Node.js LTS, dev tools, Playwright browsers, and generic testing tools (user-agnostic)" # Everything runs as root @@ -50,11 +50,12 @@ RUN pip install --break-system-packages \ pytest \ ipython -# Install Node.js development tools (husky, commitlint) +# Install Node.js development tools and cross-repo workflow CLIs. RUN npm install -g \ husky \ @commitlint/cli \ - @commitlint/config-conventional + @commitlint/config-conventional \ + @withgraphite/graphite-cli@stable # ======================================== # DEVELOPER TOOLS SETUP @@ -63,7 +64,71 @@ RUN npm install -g \ # Verify Python and pip RUN python3 --version && pip --version -# uv, AI CLIs are inherited from Layer 0 (workbench-base) +# uv and most AI CLIs are inherited from Layer 0 (workbench-base) + +# ======================================== +# SPEC-DRIVEN DEVELOPMENT TOOLS +# ======================================== + +# Remove any inherited copies so Layer 1a is the clear owner of these tools. 
+RUN npm uninstall -g @fission-ai/openspec || true \ + && rm -f /usr/bin/openspec /usr/local/bin/openspec \ + && rm -rf /usr/lib/node_modules/@fission-ai/openspec /usr/local/lib/node_modules/@fission-ai/openspec \ + && uv tool uninstall specify-cli || true \ + && env UV_TOOL_BIN_DIR=/usr/local/bin UV_TOOL_DIR=/opt/uv/tools uv tool uninstall specify-cli || true \ + && rm -f /root/.local/bin/specify /usr/local/bin/specify \ + && rm -rf /root/.local/share/uv/tools/specify-cli /opt/uv/tools/specify-cli + +# Install spec-driven CLIs only in developer benches, not every bench family. +RUN mkdir -p /opt/uv/tools /root/.local/share/uv \ + && UV_TOOL_BIN_DIR=/usr/local/bin UV_TOOL_DIR=/opt/uv/tools \ + uv tool install specify-cli --from git+https://github.com/github/spec-kit.git --python-preference system \ + && ln -sfn /opt/uv/tools /root/.local/share/uv/tools \ + || echo "spec-kit installation skipped (non-fatal)" + +RUN npm install -g @fission-ai/openspec@latest \ + || echo "OpenSpec installation skipped (non-fatal)" + +# Shared OpenSpec/Speckit project bootstrapper. Keep it with the spec-driven +# CLIs so every developer bench can initialize the same agent context files. +COPY files/openspeckit/setup-openspeckit /usr/local/bin/setup-openspeckit +RUN chmod 0755 /usr/local/bin/setup-openspeckit \ + && ln -sfn setup-openspeckit /usr/local/bin/setup-openspec-speckit-project + +# OpenSpec Claude commands and skills live with the devBench OpenSpec CLI. +RUN mkdir -p /etc/skel/.claude/commands/opsx \ + && mkdir -p /etc/skel/.claude/skills/opsx-clarify \ + && mkdir -p /etc/skel/.claude/skills/opsx-analyze +COPY files/claude/commands/opsx/ /etc/skel/.claude/commands/opsx/ +COPY files/claude/skills/opsx-clarify/ /etc/skel/.claude/skills/opsx-clarify/ +COPY files/claude/skills/opsx-analyze/ /etc/skel/.claude/skills/opsx-analyze/ + +# Shared, project-agnostic Speckit worktree helpers for all developer benches. 
+COPY files/ct/ct-functions.zsh /usr/local/share/ct/ct-functions.zsh +COPY files/ct/claude/ /usr/local/share/ct/claude/ +RUN chmod 0644 /usr/local/share/ct/ct-functions.zsh \ + && chmod 0755 /usr/local/share/ct/claude/speckit-dashboard.sh \ + /usr/local/share/ct/claude/speckit-dashboard-sync.sh \ + /usr/local/share/ct/claude/speckit-dash-toggle.sh \ + && chmod 0644 /usr/local/share/ct/claude/prompts/speckit-dashboard-full.md \ + && mkdir -p /etc/skel/.claude/prompts \ + && cp /usr/local/share/ct/claude/speckit-dashboard.sh /etc/skel/.claude/speckit-dashboard.sh \ + && cp /usr/local/share/ct/claude/speckit-dashboard-sync.sh /etc/skel/.claude/speckit-dashboard-sync.sh \ + && cp /usr/local/share/ct/claude/speckit-dash-toggle.sh /etc/skel/.claude/speckit-dash-toggle.sh \ + && cp /usr/local/share/ct/claude/prompts/speckit-dashboard-full.md /etc/skel/.claude/prompts/speckit-dashboard-full.md \ + && chmod 0755 /etc/skel/.claude/speckit-dashboard.sh \ + /etc/skel/.claude/speckit-dashboard-sync.sh \ + /etc/skel/.claude/speckit-dash-toggle.sh \ + && chmod 0644 /etc/skel/.claude/prompts/speckit-dashboard-full.md + +# Speckit worktree bootstrap. This installs a stable command outside Speckit +# itself so generated Speckit files can be refreshed and the worktree workflow +# can be reapplied afterwards. +COPY files/speckit-worktree/templates /usr/local/share/speckit-worktree/templates +COPY files/speckit-worktree/speckit-worktree-enable /usr/local/bin/speckit-worktree-enable +RUN chmod 0755 /usr/local/bin/speckit-worktree-enable && \ + find /usr/local/share/speckit-worktree/templates -type f \( -name '*.sh' -o -name '*.zsh' -o -name '*.ps1' \) -exec chmod 0755 {} + && \ + find /usr/local/share/speckit-worktree/templates -type f ! 
\( -name '*.sh' -o -name '*.zsh' -o -name '*.ps1' \) -exec chmod 0644 {} + # System-wide Corepack cache ENV COREPACK_HOME=/opt/corepack @@ -80,6 +145,14 @@ RUN mkdir -p /opt/corepack && \ COPY install-testing-tools.sh /tmp/ RUN bash /tmp/install-testing-tools.sh && rm -f /tmp/install-testing-tools.sh +# Container-safe SonarQube/SonarCloud environment. These helpers make auth +# usable without libsecret. +COPY files/sonarqube/sonarqube-cli-env.sh /usr/local/share/sonarqube/sonarqube-cli-env.sh +COPY files/sonarqube/sonar-env /usr/local/bin/sonar-env +RUN chmod 0644 /usr/local/share/sonarqube/sonarqube-cli-env.sh \ + && chmod 0755 /usr/local/bin/sonar-env \ + && ln -sfn /usr/local/share/sonarqube/sonarqube-cli-env.sh /etc/profile.d/sonarqube-cli.sh + # ======================================== # ZSH PLUGINS (into /etc/skel) # ======================================== @@ -99,6 +172,15 @@ RUN if [ ! -f /etc/skel/.zshrc ]; then \ # Update /etc/skel/.zshrc to include plugins RUN sed -i 's/plugins=(git)/plugins=(git zsh-autosuggestions zsh-syntax-highlighting)/' /etc/skel/.zshrc +# Source Speckit worktree helpers from global shell startup so they remain +# available even when benches mount a host ~/.zshrc over the generated one. +RUN printf '\n# DevBench Speckit worktree helpers\n[[ -f /usr/local/share/ct/ct-functions.zsh ]] && source /usr/local/share/ct/ct-functions.zsh\n' >> /etc/zsh/zshrc && \ + printf '\n# DevBench Speckit worktree helpers\n[ -f /usr/local/share/ct/ct-functions.zsh ] && . /usr/local/share/ct/ct-functions.zsh\n' >> /etc/bash.bashrc + +# Source SonarQube CLI env defaults from global interactive shell startup. +RUN printf '\n# DevBench SonarQube CLI environment\n[ -f /usr/local/share/sonarqube/sonarqube-cli-env.sh ] && . /usr/local/share/sonarqube/sonarqube-cli-env.sh\n' >> /etc/zsh/zshrc && \ + printf '\n# DevBench SonarQube CLI environment\n[ -f /usr/local/share/sonarqube/sonarqube-cli-env.sh ] && . 
/usr/local/share/sonarqube/sonarqube-cli-env.sh\n' >> /etc/bash.bashrc
+
 # Add bash_profile to /etc/skel (force zsh when bash is requested)
 RUN echo '# Force zsh when bash is requested' > /etc/skel/.bash_profile && \
     echo 'if [ -n "$PS1" ] && [ -z "$ZSH_VERSION" ]; then' >> /etc/skel/.bash_profile && \
diff --git a/devBenches/base-image/build.sh b/devBenches/base-image/build.sh
index 6e7b6a3..ab2e564 100755
--- a/devBenches/base-image/build.sh
+++ b/devBenches/base-image/build.sh
@@ -18,9 +18,11 @@ LEGACY_IMAGE="$(legacy_family_base_image dev)"
 cd "$SCRIPT_DIR"
 
 # Parse arguments (--user is accepted but ignored for backward compat)
+NO_CACHE="${NO_CACHE:-false}"
 while [[ $# -gt 0 ]]; do
     case $1 in
         --user) shift 2 ;;
+        --no-cache) NO_CACHE=true; shift ;;
         *) shift ;;
     esac
 done
@@ -28,6 +30,7 @@ done
 echo "Configuration:"
 echo " Tag: $CANONICAL_IMAGE (user-agnostic)"
 echo " Legacy alias: $LEGACY_IMAGE"
+echo " No cache: $NO_CACHE"
 echo ""
 
 # Check if Layer 0 exists
@@ -43,6 +46,7 @@ fi
 # Build the image
 echo "Building $CANONICAL_IMAGE..."
 docker build \
+    $([ "$NO_CACHE" = true ] && printf '%s\n' "--no-cache") \
     -t "$CANONICAL_IMAGE" \
     .
 tag_family_base_legacy_alias dev
diff --git a/devBenches/base-image/files/claude/commands/opsx/apply.md b/devBenches/base-image/files/claude/commands/opsx/apply.md
new file mode 100644
index 0000000..3b9d1f9
--- /dev/null
+++ b/devBenches/base-image/files/claude/commands/opsx/apply.md
@@ -0,0 +1,226 @@
+---
+name: "OPSX: Apply"
+description: Implement tasks from an OpenSpec change using agent teams for parallel execution
+category: Workflow
+tags: [workflow, artifacts, experimental, teams]
+---
+
+Implement tasks from an OpenSpec change. Uses agent teams to parallelize independent tasks across non-overlapping file groups.
+
+**Input**: Optionally specify a change name (e.g., `/opsx:apply add-auth`). If omitted, check if it can be inferred from conversation context. If vague or ambiguous you MUST prompt for available changes.
+
+---
+
+## Phase 1: Select & Load Context
+
+1. **Select the change**
+
+   If a name is provided, use it. Otherwise:
+   - Infer from conversation context if the user mentioned a change
+   - Auto-select if only one active change exists
+   - If ambiguous, run `openspec list --json` and use **AskUserQuestion** to let the user select
+
+   Always announce: "Using change: "
+
+2. **Check status**
+   ```bash
+   openspec status --change "" --json
+   ```
+   Parse `schemaName`, artifact status, and which artifact contains tasks.
+
+3. **Get apply instructions**
+   ```bash
+   openspec instructions apply --change "" --json
+   ```
+   - If `state: "blocked"` (missing artifacts): show message, suggest `/opsx:propose`
+   - If `state: "all_done"`: congratulate, suggest `/opsx:archive`
+   - Otherwise: proceed
+
+4. **Read context files**
+
+   Read ALL files from `contextFiles` in the apply instructions output (proposal, design, specs, tasks, clarifications if present).
+
+5. **Show progress**
+   - Schema being used
+   - "N/M tasks complete"
+   - Remaining tasks overview
+
+---
+
+## Phase 2: Task Analysis & Grouping
+
+Before implementing, analyze the pending tasks to determine the execution strategy.
+
+### Step 1: Build the task dependency graph
+
+For each pending task, determine:
+- **Which files it will create or modify** (infer from the task description and design.md)
+- **Which tasks it depends on** (does it reference output from another task?)
+- **Which tasks are independent** (no shared files, no dependency)
+
+### Step 2: Choose execution strategy
+
+**Sequential (no team)** — Use when:
+- 5 or fewer pending tasks
+- Most tasks depend on each other linearly
+- Tasks touch overlapping files
+- The user asks to go one-by-one
+
+**Parallel (agent team)** — Use when:
+- 6+ pending tasks remaining
+- Tasks can be grouped into 2+ independent clusters
+- Clusters touch non-overlapping files
+
+If parallel, proceed to Phase 3. If sequential, skip to Phase 4.
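Reviewer sketch (not part of the diff): the Step 1 and Step 2 rules above can be read as a small graph problem. The Python below is a minimal illustration under stated assumptions, not the command's actual implementation. `Task`, `independent_clusters`, and `choose_strategy` are hypothetical names. The sketch assumes that two tasks belong to the same cluster whenever they share a file or one depends on the other, and it approximates the "most tasks depend on each other linearly" criterion by checking whether everything collapses into a single cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical record for one pending task from tasks.md."""
    task_id: str
    files: frozenset                      # files the task creates or modifies
    depends_on: frozenset = frozenset()   # ids of tasks whose output it needs


def independent_clusters(tasks):
    """Group tasks that share a file or a dependency edge (union-find)."""
    parent = {t.task_id: t.task_id for t in tasks}

    def find(x):
        # Walk to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_owner = {}  # file path -> first task seen touching it
    for t in tasks:
        for path in t.files:
            if path in first_owner:
                union(t.task_id, first_owner[path])
            else:
                first_owner[path] = t.task_id
        for dep in t.depends_on:
            if dep in parent:  # skip dependencies on tasks that are not pending
                union(t.task_id, dep)

    clusters = {}
    for t in tasks:
        clusters.setdefault(find(t.task_id), []).append(t.task_id)
    return sorted(clusters.values())


def choose_strategy(tasks, user_wants_sequential=False):
    """Apply the thresholds from Step 2: 5 or fewer tasks means sequential."""
    if user_wants_sequential or len(tasks) <= 5:
        return "sequential"
    # Clusters are file-disjoint by construction, so only the count matters.
    if len(independent_clusters(tasks)) >= 2:
        return "parallel"
    return "sequential"
```

Because clusters are built by merging on shared files, the "zero file overlap" requirement for work packages holds by construction in this sketch; a real run would still need the ordering of packages into waves, which is not modeled here.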
+
+### Step 3: Group tasks into work packages
+
+Group independent tasks into **work packages**, where each package:
+- Contains tasks that share related files (same service, same model, same test file)
+- Has **zero file overlap** with other packages
+- Has its internal tasks ordered by dependency
+
+Example grouping:
+```
+Package A (buffer-service): Tasks 1.1, 1.2, 1.3, 2.1, 2.2 → touches buffer_service.dart, buffer_manifest.dart
+Package B (packaging-service): Tasks 6.1, 6.2, 7.1, 7.2 → touches packaging_service.dart, package_metadata.dart
+Package C (ui-widgets): Tasks 5.2, 5.3, 12.3 → touches orphan_dialog.dart, incomplete_dialog.dart
+Package D (integration-tests): Tasks 13.1, 13.2 → touches test/ files
+```
+
+**Dependencies between packages**: If Package B depends on Package A completing first, mark it. Only packages with no blockers get spawned in the first wave.
+
+---
+
+## Phase 3: Team Execution (Parallel)
+
+### Create the team
+Use **TeamCreate** to create a team (e.g., `apply-`).
+
+### Create task items
+Use **TaskCreate** for each work package. Include:
+- All tasks in the package with their descriptions
+- The files to create/modify
+- Context: which design.md sections and spec scenarios are relevant
+- Dependencies on other packages (use `addBlockedBy` if needed)
+
+### Spawn teammates
+Use the team subagent mechanism to spawn one `general-purpose` teammate per
+work package, in parallel. Each teammate must be launched into the team created
+above, for example:
+
+```text
+Task({
+  team_name: "apply-",
+  name: "",
+  subagent_type: "general-purpose",
+  run_in_background: true
+})
+```
+
+Do not use plain one-shot Agent subagents for this phase. They cannot claim
+team tasks, receive inbox messages, or participate in shutdown. Each teammate
+prompt must include:
+
+1. The team name
+2. The task ID to claim
+3. Full file paths for all context files (proposal, design, specs, clarifications)
+4. The specific tasks to implement, in order
+5.
The files they own (create/modify only these)
+6. Instruction to mark tasks complete with `- [ ]` → `- [x]` in tasks.md — **but only their assigned tasks**
+7. Instruction to report back when done or blocked
+
+**CRITICAL file ownership rules:**
+- Each agent ONLY modifies files in its assigned package
+- `tasks.md` checkbox updates: each agent updates ONLY its own task checkboxes
+- If an agent discovers it needs to modify a file owned by another agent, it reports the dependency instead of making the change
+
+### Monitor & coordinate
+- Wait for agents to complete or report blockers
+- If an agent is blocked on another package, check if the blocking package is done
+- When a wave completes, check for newly-unblocked packages and spawn the next wave
+- Handle conflicts: if two agents report needing the same file, reassign one
+
+### Shutdown
+After all packages complete:
+- Send **shutdown_request** to all teammates
+- **TeamDelete** to clean up
+
+---
+
+## Phase 4: Sequential Execution (Fallback)
+
+For each pending task:
+- Show which task is being worked on
+- Make the code changes required
+- Keep changes minimal and focused
+- Mark task complete: `- [ ]` → `- [x]`
+- Continue to next task
+
+**Pause if:**
+- Task is unclear → ask for clarification
+- Implementation reveals a design issue → suggest updating artifacts
+- Error or blocker encountered → report and wait for guidance
+- User interrupts
+
+---
+
+## Phase 5: Completion
+
+### Show final status
+
+```
+## Implementation Complete
+
+**Change:**
+**Schema:**
+**Strategy:** [Sequential | Parallel — N agents, M waves]
+**Progress:** N/N tasks complete
+
+### Completed This Session
+- [x] Task 1.1 — description
+- [x] Task 1.2 — description
+...
+
+All tasks complete! Run `/opsx:archive` to archive this change.
+```
+
+### On pause (issue encountered)
+
+```
+## Implementation Paused
+
+**Change:**
+**Schema:**
+**Progress:** N/M tasks complete
+
+### Issue Encountered
+
+
+**Options:**
+1.