diff --git a/.gitignore b/.gitignore
index 4d3500f..e66ee9e 100755
--- a/.gitignore
+++ b/.gitignore
@@ -7,7 +7,8 @@ devBenches/flutterBench/
devBenches/frappeBench/
devBenches/javaBench/
devBenches/dotNetBench/
-devBenches/pythonBench/
+devBenches/pyBench/
+devBenches/phpBench/
# Installation tracking (local only)
.installed-benches.json
@@ -17,11 +18,23 @@ config/version-manifest.json
# Common ignores
.DS_Store
+__pycache__/
+*.py[cod]
*.log
*.tmp
node_modules/
+
+# Secret and machine-local environment files. Keep template examples tracked.
.env
+.env.*
+*.env
+!.env.example
+!.env.sample
+!.env.template
+*.backup
*.bak
+*.bak.*
+secrets/
*.swp
*.swo
@@ -38,4 +51,6 @@ bioBenches/gentecBench/
logs/
.codex
.codex/
+.claude/dashboard.md
+.claude/speckit-history.md
sysBenches/opsBench/
diff --git a/PROTOCOL.md b/PROTOCOL.md
new file mode 100644
index 0000000..5447ac5
--- /dev/null
+++ b/PROTOCOL.md
@@ -0,0 +1,154 @@
+# Asynchronous Clarify Protocol
+
+A custom clarify step for [GitHub Spec Kit](https://github.com/github/spec-kit) +
+Claude Code. It fans question generation out across reviewer angles, logs the questions
+to a file, and finishes clarify **automatically** once they are answered — no manual
+re-running.
+
+## Why
+
+Spec Kit's stock `/speckit.clarify` is synchronous: it asks, you answer, all in one
+sitting. Real clarification has human latency — a domain expert may take a day to answer
+a compliance question. This protocol decouples **question generation** (cheap, parallel,
+done by AI) from **answering** (slow, human) from **application** (one edit to `spec.md`),
+and uses a polling loop so the loop, not you, watches for completion.
+
+## The four files
+
+| File | Role |
+|------|------|
+| `.claude/commands/openClarify.md` | Orchestrator. Resolves the feature dir, initializes the log, triggers the generation fan-out, registers the poll loop. Never answers questions. |
+| `.claude/commands/openClarify-resume.md` | Poll tick. Reads the log and branches cheaply; only the all-answered tick edits `spec.md`. Enforces the critical-class human gate. |
+| `templates/clarify-log.template.md` | The log schema: two top-of-file sentinels + per-question blocks. |
+| `PROTOCOL.md` | This document. |
+
+## Data flow
+
+```
+/openClarify [feature-dir]
+ ├─ verify spec.md exists
+ ├─ init clarify-log.md from template (GENERATION: PENDING, CLARIFY: IN_PROGRESS)
+ ├─ workflow: fan out reviewer angles ──┐
+ │ data-model ┐ │ each appends OPEN question blocks
+ │ edge-cases ┤ │ merge + dedupe, cap 25
+ │ security-compliance ┤ │ then flip GENERATION: COMPLETE
+ │ testability ┤ │
+ │ integration ┘ │
+ └─ register /loop 10m /openClarify-resume
+
+ ... humans answer blocks in clarify-log.md over time ...
+
+/openClarify-resume (every 10 min)
+ ├─ log missing? → no-op
+ ├─ CLARIFY: COMPLETE? → no-op, stop loop
+ ├─ GENERATION != COMPLETE → no-op (sentinel guard: no early firing)
+ ├─ any status: OPEN? → no-op (still waiting)
+ ├─ critical answered by architect-ai? → no-op (human escalation)
+ └─ all answered, criticals human-signed → edit spec.md, CLARIFY: COMPLETE, stop loop
+```
+
+## The log schema
+
+Two **sentinels** at the top of `clarify-log.md` are the entire coordination contract:
+
+- `GENERATION: PENDING | COMPLETE` — set `COMPLETE` only when the fan-out has written
+ every question. Until then the poller refuses to act.
+- `CLARIFY: IN_PROGRESS | COMPLETE` — set `COMPLETE` only after answers are applied to
+ `spec.md`.
+
+Each question is a block:
+
+```
+## Q3
+- id: q3
+- status: OPEN # OPEN | ANSWERED
+- class: normal # normal | critical
+- agent: security-compliance # which reviewer angle raised it
+- question: How long is PHI retained after account deletion?
+- answer:
+- answered_by: # human | architect-ai
+- ts:
+```
+
+## Three design guarantees
+
+1. **Sentinel guard against early firing.** The poller treats `GENERATION: COMPLETE` as a
+ precondition. A log that is mid-generation can momentarily show "zero OPEN questions"
+ simply because no questions have been written yet — the guard stops the poller from
+ misreading that as "all answered" and prematurely editing `spec.md`.
+
+2. **Critical-class human escalation.** A `class: critical` question (clinical / regulated /
+ safety-impacting) answered by `architect-ai` does **not** clear. Completion blocks until
+ a human re-answers or confirms it. The AI can draft; only a human signature releases the
+ gate.
+
+3. **Cheap read-and-branch ticks.** Nearly every poll tick just greps the sentinels and a
+ handful of `status:` lines, then exits. Exactly one tick — the one that sees everything
+ answered and all criticals human-signed — does the expensive `spec.md` edit. Polling
+ every 10 minutes is therefore nearly free.
+
+## Relationship to stock `/speckit.clarify` — audit pass
+
+This protocol **does the clarification work**; stock `/speckit.clarify` then runs **afterwards**
+as an **audit**, not as the primary clarifier.
+
+Because `/openClarify-resume` writes answers in stock's canonical shape — a
+`## Clarifications` section with `### Session YYYY-MM-DD` and `- Q: … → A: …` bullets, plus
+the answer folded into the relevant spec section — a subsequent stock run sees those points
+as already resolved. Stock decides what to ask by scanning **spec sections** against its
+coverage taxonomy (Clear / Partial / Missing), so the folding in §3b is what actually makes
+the audit quiet, not the log bullets.
+
+**How to use the audit:** after our protocol marks `CLARIFY: COMPLETE`, run stock
+`/speckit.clarify` (the `speckit-clarify` skill) a few times.
+
+- **No new questions** → our five reviewer angles covered the spec to stock's standard. Proceed to `/speckit.plan`.
+- **New questions** → a real coverage gap. Most will land in taxonomy categories our angles
+ don't target: **functional scope & behavior, interaction/UX flow, non-functional
+ (performance, scalability, reliability, observability), constraints & tradeoffs, and
+ terminology consistency**. Our angles cover data-model, edge-cases, security/compliance,
+ testability, and integration — so those five categories are the expected blind spots.
+
+Treat stock's output as a **regression check on our generation coverage**. If a category
+keeps surfacing, add a reviewer angle for it to the fan-out in `/openClarify`.
+
+> Note: in this environment stock clarify is overridden (see `~/.claude` global config) to
+> ask up to **25** questions in **block form**, written to `/clarify-questions.md`
+> — so the audit produces a diffable file rather than a one-at-a-time interactive loop.
+
+```
+/openClarify → (async answers) → resume applies + canonical format → CLARIFY: COMPLETE
+ │
+ /speckit.clarify ×N (audit)
+ │
+ new questions? → new clarify cycle ; else → /speckit.plan
+```
+
+## Known gaps / operational notes
+
+- **Generation script is authored on first run.** The `workflow` keyword lets the Claude
+ Code runtime author the fan-out's internal script the first time `/openClarify`
+ runs. **Save that run as `/openClarify-generate`** so subsequent features reuse it
+ instead of re-authoring the fan-out each time.
+
+- **`/loop` is session-scoped with a 3-day cap.** It only survives while the session is
+ alive and stops after ~3 days. For human turnaround longer than that, swap the in-session
+ loop for an external scheduler:
+
+ ```
+ */15 * * * * claude -p "/openClarify-resume specs/my-feature/"
+ ```
+
+ e.g. a crontab entry running `claude -p "/openClarify-resume specs/my-feature/"` every
+ 15 minutes, which survives restarts and arbitrary human latency.
+
+## Usage
+
+```
+# 1. start (defaults to specs//)
+/openClarify
+
+# 2. humans edit clarify-log.md, filling answer / answered_by / status: ANSWERED
+
+# 3. nothing else to do — the loop applies answers to spec.md and stops itself
+```
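The read-and-branch tick that PROTOCOL.md describes can be sketched in shell. This is a minimal sketch, not the contents of `openClarify-resume.md`: the function name `clarify_tick` and its output labels are hypothetical, and it assumes each sentinel sits on its own line starting with `GENERATION:` or `CLARIFY:` and that question fields are written as in the schema block above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this change: classify one poll tick.
# Prints one of: no-log | done | generating | waiting | escalate | apply
clarify_tick() {
  local log="$1"

  if [ ! -f "$log" ]; then echo "no-log"; return 0; fi
  if grep -q '^CLARIFY:[[:space:]]*COMPLETE' "$log"; then echo "done"; return 0; fi
  # Sentinel guard: an empty, half-generated log must not look "all answered".
  if ! grep -q '^GENERATION:[[:space:]]*COMPLETE' "$log"; then echo "generating"; return 0; fi
  if grep -q '^- status:[[:space:]]*OPEN' "$log"; then echo "waiting"; return 0; fi

  # A critical block answered by architect-ai still needs a human signature.
  if awk '
      /^## /                              { critical = 0 }
      /^- class:[ \t]*critical/           { critical = 1 }
      /^- answered_by:[ \t]*architect-ai/ { if (critical) blocked = 1 }
      END                                 { exit blocked ? 0 : 1 }
    ' "$log"; then
    echo "escalate"; return 0
  fi

  echo "apply"   # the only outcome that should lead to a spec.md edit
}
```

Only the `apply` outcome would go on to edit `spec.md`; every other outcome is a cheap exit, which is what keeps a 10-minute poll interval inexpensive.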
diff --git a/README.md b/README.md
index 3340980..6c318f7 100755
--- a/README.md
+++ b/README.md
@@ -24,14 +24,15 @@ Safe to run repeatedly. Installed benches show `✓ up to date` and are skipped.
## Docker Image Layers
```
-Layer 0: workbench-base:latest — Ubuntu 24.04 + git, zsh, curl, AI CLIs, bun
- ├─ Layer 1a: dev-bench-base:latest — Python, Node.js LTS, npm, dev tools, testing tools, Playwright Chromium
+Layer 0: workbench-base:latest — Ubuntu 24.04 + git, zsh, curl, shared AI CLIs, bun
+ ├─ Layer 1a: dev-bench-base:latest — Python, Node.js LTS, npm, dev tools, OpenSpec, spec-kit, testing tools, Playwright Chromium
│ ├─ Layer 2: cpp-bench:latest — GCC, CMake, vcpkg
│ ├─ Layer 2: dotnet-bench:latest — .NET SDK 8/9
│ ├─ Layer 2: flutter-bench:latest — Flutter SDK, Dart, Android tools
│ ├─ Layer 2: frappe-bench:latest — MariaDB client, Redis, Nginx, bench CLI (Node.js 20)
│ ├─ Layer 2: java-bench:latest — OpenJDK 21, Maven, Gradle, Spring CLI
- │ ├─ Layer 2: python-bench:latest — Python dev tools (thin layer on 1a)
+ │ ├─ Layer 2: php-bench:latest — PHP 8.3, Composer, PHPUnit, Xdebug
+ │ ├─ Layer 2: py-bench:latest — Python dev tools (thin layer on 1a)
│ └─ Layer 2: go-bench:latest — Go toolchain
├─ Layer 1b: sys-bench-base:latest — Kubernetes, Terraform, cloud CLIs
│ └─ Layer 2: cloud-bench:latest — Cloud admin tools
@@ -109,7 +110,8 @@ workBenches/
│ ├── frappeBench/ ← Frappe/ERPNext bench (opensoft/frappeBench)
│ ├── goBench/ ← Go bench (opensoft/goBench)
│ ├── javaBench/ ← Java bench (opensoft/javaBench)
-│ └── pythonBench/ ← Python bench (opensoft/pythonBench)
+│ ├── phpBench/ ← PHP bench (opensoft/phpBench)
+│ └── pyBench/ ← Python bench (opensoft/pyBench)
├── sysBenches/
│ ├── base-image/ ← Layer 1b: sys-bench-base Dockerfile
│ ├── cloudBench/ ← Cloud admin bench (opensoft/cloudBench)
@@ -187,6 +189,7 @@ npm global packages install to `~/.npm-global` (no sudo required).
| frappeBench | [opensoft/frappeBench](https://github.com/opensoft/frappeBench) |
| goBench | [opensoft/goBench](https://github.com/opensoft/goBench) |
| javaBench | [opensoft/javaBench](https://github.com/opensoft/javaBench) |
-| pythonBench | [opensoft/pythonBench](https://github.com/opensoft/pythonBench) |
+| phpBench | [opensoft/phpBench](https://github.com/opensoft/phpBench) |
+| pyBench | [opensoft/pyBench](https://github.com/opensoft/pyBench) |
| gentecBench | [opensoft/gentecBench](https://github.com/opensoft/gentecBench) |
| simBench | [opensoft/simBench](https://github.com/opensoft/simBench) |
diff --git a/base-image/Dockerfile b/base-image/Dockerfile
index e8356c9..bcbdf4f 100644
--- a/base-image/Dockerfile
+++ b/base-image/Dockerfile
@@ -9,7 +9,7 @@ FROM ubuntu:24.04
# Container version labels
LABEL layer="0"
LABEL layer.name="workbench-base"
-LABEL layer.version="2.0.0"
+LABEL layer.version="2.0.1"
LABEL layer.description="System base with AI CLIs for all workBenches (user-agnostic)"
# Everything runs as root — no user in this layer
@@ -119,9 +119,14 @@ RUN npm install -g tldr \
# Install uv to /usr/local/bin (system-wide)
RUN curl -LsSf https://astral.sh/uv/install.sh | UV_INSTALL_DIR=/usr/local/bin sh
-# Install spec-kit system-wide
-RUN uv tool install specify-cli --from git+https://github.com/github/spec-kit.git --python-preference system \
- || echo "spec-kit installation skipped (non-fatal)"
+# Ensure spec-driven CLIs are not inherited from cached Layer 0 state.
+RUN npm uninstall -g @fission-ai/openspec || true \
+ && rm -f /usr/bin/openspec /usr/local/bin/openspec \
+ && rm -rf /usr/lib/node_modules/@fission-ai/openspec /usr/local/lib/node_modules/@fission-ai/openspec \
+ && uv tool uninstall specify-cli || true \
+ && env UV_TOOL_BIN_DIR=/usr/local/bin UV_TOOL_DIR=/opt/uv/tools uv tool uninstall specify-cli || true \
+ && rm -f /root/.local/bin/specify /usr/local/bin/specify \
+ && rm -rf /root/.local/share/uv/tools/specify-cli /opt/uv/tools/specify-cli
# ========================================
# BUN RUNTIME (system-wide at /opt/bun)
@@ -172,14 +177,139 @@ RUN echo 'export ZSH="$HOME/.oh-my-zsh"' > /etc/skel/.zshrc && \
RUN echo 'eval "$(zoxide init bash)"' >> /etc/skel/.bashrc && \
echo 'export PATH="$HOME/.local/bin:/opt/bun/bin:$PATH"' >> /etc/skel/.bashrc
-# Provide AI helper aliases system-wide so all benches inherit them even when
+# Provide AI helpers system-wide so all benches inherit them even when
# a workspace does not mount a host shell profile.
-RUN echo '' >> /etc/zsh/zshrc && \
- echo '# WorkBench AI helpers' >> /etc/zsh/zshrc && \
- echo 'alias yolo="claude --dangerously-skip-permissions --teammate-mode tmux"' >> /etc/zsh/zshrc && \
- echo '' >> /etc/bash.bashrc && \
- echo '# WorkBench AI helpers' >> /etc/bash.bashrc && \
- echo 'alias yolo="claude --dangerously-skip-permissions --teammate-mode tmux"' >> /etc/bash.bashrc
+RUN cat <<'EOF' >> /etc/zsh/zshrc
+
+# WorkBench AI helpers
+_yolo_shell_quote() {
+ local quoted="" arg
+
+ for arg in "$@"; do
+ quoted="${quoted} $(printf '%q' "$arg")"
+ done
+
+ printf '%s\n' "${quoted# }"
+}
+
+unalias yolo 2>/dev/null || true
+yolo() {
+ local session_name command_string prompt_file candidate_prompt_file
+ local -a prompt_args
+
+ if ! command -v claude >/dev/null 2>&1; then
+ echo "yolo: Claude CLI not found on PATH" >&2
+ return 1
+ fi
+
+ if ! command -v tmux >/dev/null 2>&1; then
+ echo "yolo: tmux not found on PATH" >&2
+ return 1
+ fi
+
+ prompt_file=""
+ for candidate_prompt_file in \
+ "$HOME/.claude/prompts/speckit-dashboard-full.md" \
+ "/usr/local/share/ct/claude/prompts/speckit-dashboard-full.md" \
+ "$HOME/.claude/prompts/speckit-dashboard-bootstrap.md"; do
+ if [ -r "$candidate_prompt_file" ]; then
+ prompt_file="$candidate_prompt_file"
+ break
+ fi
+ done
+ prompt_args=()
+ if [ -n "$prompt_file" ]; then
+ prompt_args=(--append-system-prompt-file "$prompt_file")
+ fi
+
+ if [ -n "${TMUX:-}" ]; then
+ claude --dangerously-skip-permissions --teammate-mode tmux "${prompt_args[@]}" "$@"
+ return $?
+ fi
+
+ session_name="yolo-$(date +%Y%m%d%H%M%S)-$$"
+ command_string=$(_yolo_shell_quote \
+ claude \
+ --dangerously-skip-permissions \
+ --teammate-mode tmux \
+ "${prompt_args[@]}" \
+ "$@") || return 1
+
+ tmux new-session -d -s "$session_name" -c "$PWD" "exec $command_string" || {
+ echo "yolo: failed to start tmux session" >&2
+ return 1
+ }
+
+ tmux set-option -t "$session_name" mouse on >/dev/null 2>&1 || true
+ tmux attach-session -t "$session_name"
+}
+EOF
+
+RUN cat <<'EOF' >> /etc/bash.bashrc
+
+# WorkBench AI helpers
+_yolo_shell_quote() {
+ local quoted="" arg
+
+ for arg in "$@"; do
+ quoted="${quoted} $(printf '%q' "$arg")"
+ done
+
+ printf '%s\n' "${quoted# }"
+}
+
+unalias yolo 2>/dev/null || true
+yolo() {
+ local session_name command_string prompt_file candidate_prompt_file
+ local -a prompt_args
+
+ if ! command -v claude >/dev/null 2>&1; then
+ echo "yolo: Claude CLI not found on PATH" >&2
+ return 1
+ fi
+
+ if ! command -v tmux >/dev/null 2>&1; then
+ echo "yolo: tmux not found on PATH" >&2
+ return 1
+ fi
+
+ prompt_file=""
+ for candidate_prompt_file in \
+ "$HOME/.claude/prompts/speckit-dashboard-full.md" \
+ "/usr/local/share/ct/claude/prompts/speckit-dashboard-full.md" \
+ "$HOME/.claude/prompts/speckit-dashboard-bootstrap.md"; do
+ if [ -r "$candidate_prompt_file" ]; then
+ prompt_file="$candidate_prompt_file"
+ break
+ fi
+ done
+ prompt_args=()
+ if [ -n "$prompt_file" ]; then
+ prompt_args=(--append-system-prompt-file "$prompt_file")
+ fi
+
+ if [ -n "${TMUX:-}" ]; then
+ claude --dangerously-skip-permissions --teammate-mode tmux "${prompt_args[@]}" "$@"
+ return $?
+ fi
+
+ session_name="yolo-$(date +%Y%m%d%H%M%S)-$$"
+ command_string=$(_yolo_shell_quote \
+ claude \
+ --dangerously-skip-permissions \
+ --teammate-mode tmux \
+ "${prompt_args[@]}" \
+ "$@") || return 1
+
+ tmux new-session -d -s "$session_name" -c "$PWD" "exec $command_string" || {
+ echo "yolo: failed to start tmux session" >&2
+ return 1
+ }
+
+ tmux set-option -t "$session_name" mouse on >/dev/null 2>&1 || true
+ tmux attach-session -t "$session_name"
+}
+EOF
# ========================================
# OPENCODE CONFIGURATION (into /etc/skel)
@@ -192,22 +322,11 @@ COPY files/opencode/oh-my-opencode.json /etc/skel/.config/opencode/
COPY files/opencode/agent/ /etc/skel/.config/opencode/agent/
COPY files/opencode/context/ /etc/skel/.config/opencode/context/
-# ========================================
-# OPSX COMMANDS & SKILLS (Claude Code, into /etc/skel)
-# ========================================
-# Upgraded OpenSpec workflows with agent team orchestration.
-# Uses opsx-* / opsx: prefix — openspec init/update won't overwrite these.
-
-RUN mkdir -p /etc/skel/.claude/commands/opsx \
- && mkdir -p /etc/skel/.claude/skills/opsx-clarify \
- && mkdir -p /etc/skel/.claude/skills/opsx-analyze
-COPY files/claude/commands/opsx/ /etc/skel/.claude/commands/opsx/
-COPY files/claude/skills/opsx-clarify/ /etc/skel/.claude/skills/opsx-clarify/
-COPY files/claude/skills/opsx-analyze/ /etc/skel/.claude/skills/opsx-analyze/
-
# ========================================
# SPECKIT SKILLS (Claude Code, into /etc/skel)
# ========================================
+# OpenSpec/opsx commands are installed with the devBench OpenSpec layer, where
+# the openspec CLI is present. Layer 0 keeps only non-OpenSpec Claude defaults.
# Agent-team-enhanced Speckit workflows. Each skill upgrades the corresponding
# project-level /speckit.* command with parallel specialist agents.
# Uses speckit-* prefix — speckit init/update won't overwrite these.
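The `_yolo_shell_quote` helper added to the shell profiles above exists so that arguments survive being flattened into a single command string for `tmux new-session`. The sketch below reproduces the helper and round-trips two illustrative arguments through `eval` under bash; the sample arguments are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Same helper as in the Dockerfile heredoc, reproduced so the example is self-contained.
_yolo_shell_quote() {
  local quoted="" arg
  for arg in "$@"; do
    quoted="${quoted} $(printf '%q' "$arg")"
  done
  printf '%s\n' "${quoted# }"
}

# Illustrative arguments only: one contains spaces, one a single quote.
command_string="$(_yolo_shell_quote printf '%s|' "fix the bug" "it's done")"

# Re-parsing the string with bash yields the original words again.
eval "$command_string"   # prints: fix the bug|it's done|
```

The round trip holds when the shell that re-parses the string understands the quoting that `printf '%q'` produced. For arguments containing control characters bash emits `$'...'` forms, which a stricter POSIX `sh` may not accept, so this is worth checking if the tmux command ends up being interpreted by a different shell.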
diff --git a/base-image/build.sh b/base-image/build.sh
index a5e96da..cb23a74 100755
--- a/base-image/build.sh
+++ b/base-image/build.sh
@@ -13,20 +13,24 @@ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"
# Parse arguments (--user is accepted but ignored for backward compat)
+NO_CACHE="${NO_CACHE:-false}"
while [[ $# -gt 0 ]]; do
case $1 in
--user) shift 2 ;;
+ --no-cache) NO_CACHE=true; shift ;;
*) shift ;;
esac
done
echo "Configuration:"
echo " Tag: workbench-base:latest (user-agnostic)"
+echo " No cache: $NO_CACHE"
echo ""
# Build the image
echo "Building workbench-base:latest..."
docker build \
+ $([ "$NO_CACHE" = true ] && printf '%s\n' "--no-cache") \
-t "workbench-base:latest" \
.
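The `--no-cache` handling above injects the flag through an unquoted command substitution, which relies on word splitting. An array is a common alternative for optional flags. The sketch below is one possible shape, with a hypothetical `build_args` function that prints the arguments instead of invoking Docker so it can be run anywhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the docker build arguments, one per line.
build_args() {
  local no_cache="${1:-false}"
  local -a args=()

  if [ "$no_cache" = true ]; then
    args+=(--no-cache)          # optional flag is added only when requested
  fi
  args+=(-t "workbench-base:latest" .)

  printf '%s\n' "${args[@]}"
}

build_args true
```

In the build script itself the array could be passed straight through as `docker build "${args[@]}"`, which keeps each argument intact even if a tag or path ever contains whitespace.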
diff --git a/base-image/install-ai-clis.sh b/base-image/install-ai-clis.sh
index be8727d..9ed7cb9 100755
--- a/base-image/install-ai-clis.sh
+++ b/base-image/install-ai-clis.sh
@@ -1,6 +1,6 @@
#!/bin/bash
# Shared AI CLI Installation Script
-# Version: 2.0.2
+# Version: 2.0.3
#
# USER-AGNOSTIC: Runs as root, installs to system-wide paths.
# All npm globals go to /usr/local (default root prefix).
@@ -9,11 +9,12 @@
# Claude Code goes to /usr/local/bin.
#
# Installs:
-# - OpenCode (from Opensoft/opencode fork)
+# - OpenCode (built from the upstream anomalyco/opencode repository)
# - oh-my-opencode plugin (from git: darrenhinde/oh-my-opencode)
# Includes built-in agents: Sisyphus, oracle, librarian, explore, frontend, etc.
# - Auth plugins (opencode-gemini-auth, opencode-openai-codex-auth)
# - Other AI CLIs (Codex, Gemini, Copilot, etc.)
+# - Google Antigravity CLI (agy), checksum-gated opt-in
# - Claude Code (via native installer, not npm)
#
# Note: OpenAgents agent files (openagent.md, opencoder.md) are copied via
@@ -31,6 +32,9 @@ set -e
DEBUG="${DEBUG:-1}"
COMMAND_TIMEOUT="${COMMAND_TIMEOUT:-120}" # 2 minutes per command
BUN_OPERATIONS_TIMEOUT="${BUN_OPERATIONS_TIMEOUT:-180}" # 3 minutes for bun ops
+INSTALL_ANTIGRAVITY_CLI="${INSTALL_ANTIGRAVITY_CLI:-0}"
+ANTIGRAVITY_INSTALL_URL="${ANTIGRAVITY_INSTALL_URL:-https://antigravity.google/cli/install.sh}"
+ANTIGRAVITY_INSTALL_SHA256="${ANTIGRAVITY_INSTALL_SHA256:-}"
log_debug() {
if [ "$DEBUG" = "1" ]; then
@@ -69,6 +73,22 @@ run_with_timeout() {
fi
}
+ensure_system_uv_tool_paths() {
+ mkdir -p "$SYSTEM_UV_TOOL_DIR" "$SYSTEM_UV_TOOL_BIN_DIR" /root/.local/share/uv
+ ln -sfn "$SYSTEM_UV_TOOL_DIR" /root/.local/share/uv/tools
+}
+
+run_system_uv_tool_install() {
+ local description="$1"
+ shift
+
+ ensure_system_uv_tool_paths
+ run_with_timeout "$COMMAND_TIMEOUT" "$description" env \
+ UV_TOOL_BIN_DIR="$SYSTEM_UV_TOOL_BIN_DIR" \
+ UV_TOOL_DIR="$SYSTEM_UV_TOOL_DIR" \
+ uv tool install "$@" --python-preference system
+}
+
check_system_resources() {
log_debug "Checking system resources..."
log_debug "Memory: $(free -h | head -2)"
@@ -92,6 +112,8 @@ check_system_resources
export BUN_INSTALL="${BUN_INSTALL:-/opt/bun}"
export PATH="/opt/bun/bin:$PATH"
+SYSTEM_UV_TOOL_DIR="${SYSTEM_UV_TOOL_DIR:-/opt/uv/tools}"
+SYSTEM_UV_TOOL_BIN_DIR="${SYSTEM_UV_TOOL_BIN_DIR:-/usr/local/bin}"
log_debug "Verifying Bun installation"
if which bun >/dev/null 2>&1; then
@@ -101,11 +123,6 @@ else
log_error "Bun not found in PATH (expected at /opt/bun/bin)"
fi
-log_info "Installing OpenSpec..."
-if ! run_with_timeout "$COMMAND_TIMEOUT" "OpenSpec npm install" npm install -g @fission-ai/openspec@latest; then
- log_error "OpenSpec installation failed (continuing)"
-fi
-
log_info "Installing Claude Code CLI (native installer)..."
# Native installer downloads to $HOME/.claude/downloads/ then runs 'claude install'
# which places a launcher in ~/.local/bin/. Since we run as root, we need to
@@ -142,21 +159,40 @@ if ! run_with_timeout "$COMMAND_TIMEOUT" "Gemini npm install" npm install -g @go
log_error "Gemini CLI installation failed (continuing)"
fi
+if [ "$INSTALL_ANTIGRAVITY_CLI" = "1" ] || [ "$INSTALL_ANTIGRAVITY_CLI" = "true" ]; then
+ log_info "Installing Google Antigravity CLI..."
+ if [ -z "$ANTIGRAVITY_INSTALL_SHA256" ]; then
+ log_error "Antigravity install requested but ANTIGRAVITY_INSTALL_SHA256 is not set (skipping)"
+ else
+ antigravity_installer="$(mktemp)"
+ if run_with_timeout "120" "Antigravity installer download" \
+ curl -fsSL "$ANTIGRAVITY_INSTALL_URL" -o "$antigravity_installer" &&
+ printf '%s %s\n' "$ANTIGRAVITY_INSTALL_SHA256" "$antigravity_installer" | sha256sum -c - >/dev/null 2>&1 &&
+ run_with_timeout "300" "Antigravity CLI install" bash "$antigravity_installer" --skip-aliases --skip-path; then
+ if [ -x "$HOME/.local/bin/agy" ] && [ ! -x /usr/local/bin/agy ]; then
+ cp "$HOME/.local/bin/agy" /usr/local/bin/agy
+ chmod +x /usr/local/bin/agy
+ fi
+ log_info "Antigravity CLI installed to $(command -v agy || printf '/usr/local/bin/agy')"
+ else
+ log_error "Antigravity CLI installation failed or checksum verification failed (continuing)"
+ fi
+ rm -f "$antigravity_installer"
+ fi
+else
+ log_info "Skipping Google Antigravity CLI install; set INSTALL_ANTIGRAVITY_CLI=1 and ANTIGRAVITY_INSTALL_SHA256 to enable"
+fi
+
log_info "Installing GitHub Copilot CLI..."
if ! run_with_timeout "$COMMAND_TIMEOUT" "GitHub Copilot npm install" npm install -g @githubnext/github-copilot-cli; then
log_error "GitHub Copilot installation failed (continuing)"
fi
-log_info "Installing Grok CLI (xAI)..."
-if ! run_with_timeout "$COMMAND_TIMEOUT" "Grok npm install" npm install -g @xai-org/grok-cli; then
- log_error "Grok CLI not available via npm (skipping)"
-fi
-
-log_info "Installing OpenCode AI (from Opensoft fork)..."
-# OpenCode: open source AI coding agent (https://github.com/Opensoft/opencode)
-# Install from Opensoft fork instead of npm (sst version)
-log_debug "Cloning OpenCode repository from Opensoft..."
-if ! run_with_timeout "$COMMAND_TIMEOUT" "OpenCode git clone" git clone --depth 1 https://github.com/Opensoft/opencode.git /tmp/opencode; then
+log_info "Installing OpenCode AI (from upstream source)..."
+# OpenCode: open source AI coding agent (https://github.com/anomalyco/opencode)
+# Build directly from upstream source.
+log_debug "Cloning OpenCode repository from upstream..."
+if ! run_with_timeout "$COMMAND_TIMEOUT" "OpenCode git clone" git clone --depth 1 https://github.com/anomalyco/opencode.git /tmp/opencode; then
log_error "Failed to clone OpenCode repository (skipping OpenCode installation)"
else
cd /tmp/opencode
@@ -293,8 +329,7 @@ fi
log_info "Installing NotebookLM tools..."
# notebooklm-py: Python CLI + API for NotebookLM (notebooklm command)
# Base install only — no browser deps needed in container; auth mounted from host
-if run_with_timeout "$COMMAND_TIMEOUT" "notebooklm-py install" uv tool install notebooklm-py --python-preference system; then
- [ -f "$HOME/.local/bin/notebooklm" ] && ln -sf "$HOME/.local/bin/notebooklm" /usr/local/bin/notebooklm
+if run_system_uv_tool_install "notebooklm-py install" notebooklm-py; then
log_info "notebooklm-py CLI installed (notebooklm)"
else
log_error "notebooklm-py installation failed (continuing)"
@@ -302,11 +337,9 @@ fi
# notebooklm-mcp-cli: MCP server + nlm CLI for AI agent integration
# Auth is done on the host (requires browser); tokens mounted into container
-# uv tool install puts binaries in ~/.local/bin (root), so symlink to /usr/local/bin
-if run_with_timeout "$COMMAND_TIMEOUT" "NotebookLM MCP CLI install" uv tool install notebooklm-mcp-cli --python-preference system; then
- for bin in nlm notebooklm-mcp; do
- [ -f "$HOME/.local/bin/$bin" ] && ln -sf "$HOME/.local/bin/$bin" "/usr/local/bin/$bin"
- done
+# Install into a shared uv tools directory instead of root's home so bench
+# users can execute the launchers from /usr/local/bin.
+if run_system_uv_tool_install "NotebookLM MCP CLI install" notebooklm-mcp-cli; then
log_info "NotebookLM MCP CLI installed (nlm, notebooklm-mcp)"
else
log_error "NotebookLM MCP CLI installation failed (continuing)"
@@ -331,12 +364,15 @@ if [ "${#missing_clis[@]}" -gt 0 ]; then
fi
log_info "Installed tools:"
-log_info " - OpenSpec"
log_info " - Claude Code (claude) [native installer]"
log_info " - OpenAI Codex (codex)"
log_info " - Google Gemini (gemini)"
-log_info " - GitHub Copilot (copilot)"
-log_info " - Grok (grok)"
+if command -v agy >/dev/null 2>&1; then
+ log_info " - Google Antigravity CLI (agy)"
+else
+ log_info " - Google Antigravity CLI (agy) [checksum-gated opt-in, skipped]"
+fi
+log_info " - GitHub Copilot CLI (github-copilot-cli)"
log_info " - OpenCode (opencode)"
log_info " - oh-my-opencode (darrenhinde fork with built-in agents)"
log_info " - Letta Code (letta)"
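The Antigravity step above only runs a downloaded installer after its SHA-256 matches a pinned value. The gate can be isolated as in the sketch below, which assumes GNU coreutils `sha256sum`. The `verify_sha256` function name is hypothetical, and a locally generated file stands in for the downloaded installer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FILE hashes to EXPECTED (hex SHA-256).
verify_sha256() {
  local expected="$1" file="$2"
  [ -n "$expected" ] || return 1     # refuse to proceed without a pinned hash
  printf '%s  %s\n' "$expected" "$file" | sha256sum -c - >/dev/null 2>&1
}

# Stand-in for a downloaded installer; a real run would fetch it with curl first.
installer="$(mktemp)"
printf 'echo installed\n' > "$installer"
pinned="$(sha256sum "$installer" | cut -d' ' -f1)"

if verify_sha256 "$pinned" "$installer"; then
  bash "$installer"
else
  echo "checksum mismatch, skipping" >&2
fi
rm -f "$installer"
```

Treating an empty hash as a failure mirrors the script's behavior of skipping the install when `ANTIGRAVITY_INSTALL_SHA256` is unset, so the opt-in flag alone never executes unverified code.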
diff --git a/bench-config.json.backup b/bench-config.json.backup
deleted file mode 100755
index 711c351..0000000
--- a/bench-config.json.backup
+++ /dev/null
@@ -1,55 +0,0 @@
-{
- "infrastructure": {
- "specKit": {
- "url": "git@github.com:opensoft/specKit.git",
- "path": "specKit",
- "description": "Infrastructure and specification kit - always installed"
- }
- },
- "benches": {
- "cloudBench": {
- "url": "git@github.com:opensoft/cloudBench.git",
- "path": "sysBenches/cloudBench",
- "description": "Cloud infrastructure and operations tools"
- },
- "pythonBench": {
- "url": "git@github.com:opensoft/pythonBench.git",
- "path": "devBench/pythonBench",
- "description": "Python development environment and tools"
- },
- "javaBench": {
- "url": "git@github.com:opensoft/javaBench.git",
- "path": "devBench/javaBench",
- "description": "Java development environment and tools"
- },
- "dotNetBench": {
- "url": "git@github.com:opensoft/dotNetBench.git",
- "path": "devBench/dotNetBench",
- "description": ".NET development environment and tools"
- },
- "flutterBench": {
- "url": "git@github.com:opensoft/flutterBench.git",
- "path": "devBench/flutterBench",
- "description": "Flutter/Dart development environment and tools",
- "project_scripts": [
- {
- "name": "flutter",
- "script": "scripts/new-flutter-project.sh",
- "description": "Create a new Flutter project with DevContainer setup",
- "includes_speckit": true
- },
- {
- "name": "dartwing",
- "script": "scripts/new-dartwing-project.sh",
- "description": "Create a new DartWing project with specialized configuration",
- "includes_speckit": true
- }
- ]
- },
- "cppBench": {
- "url": "git@github.com:opensoft/cppBench.git",
- "path": "devBench/cppBench",
- "description": "C++ development environment and tools"
- }
- }
-}
diff --git a/bioBenches/base-image/build.sh b/bioBenches/base-image/build.sh
index 704ec0c..2edf744 100755
--- a/bioBenches/base-image/build.sh
+++ b/bioBenches/base-image/build.sh
@@ -18,9 +18,11 @@ LEGACY_IMAGE="$(legacy_family_base_image bio)"
cd "$SCRIPT_DIR"
# Parse arguments (--user is accepted but ignored for backward compat)
+NO_CACHE="${NO_CACHE:-false}"
while [[ $# -gt 0 ]]; do
case $1 in
--user) shift 2 ;;
+ --no-cache) NO_CACHE=true; shift ;;
*) shift ;;
esac
done
@@ -28,6 +30,7 @@ done
echo "Configuration:"
echo " Tag: $CANONICAL_IMAGE (user-agnostic)"
echo " Legacy alias: $LEGACY_IMAGE"
+echo " No cache: $NO_CACHE"
echo ""
# Check if Layer 0 exists
@@ -43,6 +46,7 @@ fi
# Build the image
echo "Building $CANONICAL_IMAGE..."
docker build \
+ $([ "$NO_CACHE" = true ] && printf '%s\n' "--no-cache") \
-t "$CANONICAL_IMAGE" \
.
tag_family_base_legacy_alias bio
diff --git a/config/bench-config.json b/config/bench-config.json
index 7d31b2b..b90c88b 100755
--- a/config/bench-config.json
+++ b/config/bench-config.json
@@ -13,9 +13,9 @@
"description": "Cloud infrastructure and operations tools",
"ai_keywords": ["cloud", "infrastructure", "devops", "kubernetes", "docker", "deployment", "admin", "monitoring", "logging"]
},
- "pythonBench": {
- "url": "git@github.com:opensoft/pythonBench.git",
- "path": "devBenches/pythonBench",
+ "pyBench": {
+ "url": "git@github.com:opensoft/pyBench.git",
+ "path": "devBenches/pyBench",
"description": "Python development environment and tools",
"ai_keywords": ["python", "django", "flask", "fastapi", "pandas", "numpy", "machine learning", "ml", "data science", "AI", "artificial intelligence", "web scraping"],
"project_scripts": [
@@ -27,6 +27,20 @@
}
]
},
+ "phpBench": {
+ "url": "git@github.com:opensoft/phpBench.git",
+ "path": "devBenches/phpBench",
+ "description": "PHP development environment and tools",
+ "ai_keywords": ["php", "composer", "phpunit", "laravel", "symfony", "wordpress", "drupal", "xdebug", "web application"],
+ "project_scripts": [
+ {
+ "name": "php",
+ "script": "scripts/new-php-project.sh",
+ "description": "Create a new PHP project with Composer, PHPUnit, and SonarCloud coverage setup",
+ "includes_speckit": false
+ }
+ ]
+ },
"javaBench": {
"url": "git@github.com:opensoft/javaBench.git",
"path": "devBenches/javaBench",
diff --git a/config/claude/workflows/deep-swarm-code-review.js b/config/claude/workflows/deep-swarm-code-review.js
new file mode 100644
index 0000000..b91cbb2
--- /dev/null
+++ b/config/claude/workflows/deep-swarm-code-review.js
@@ -0,0 +1,438 @@
+export const meta = {
+ name: 'deep-swarm-code-review',
+ description: 'Swarm of expert subagents deep-reviews a PR / branch / uncommitted diff, adversarially verifies each finding, and (in PR mode) automatically posts all confirmed findings as inline comments on the PR',
+ whenToUse: 'Deep multi-agent code review. Auto-targets: open PR for the current branch, else uncommitted working-tree changes, else committed branch-vs-main. In PR mode it posts all confirmed findings to the PR automatically by default (set args.post=false to suppress, args.dedupeAgainstExisting=true to skip findings already commented on the PR). Override with args {mode:"pr"|"branch"|"uncommitted", prNumber, base}.',
+ phases: [
+ { title: 'Scope', detail: 'detect review target (PR / branch / uncommitted) + partition the diff' },
+ { title: 'Review', detail: 'multi-pass swarm — each pass adds expert lenses, finer file units, deeper digging' },
+ { title: 'Verify', detail: 'independent skeptic verifies each finding + validates the diff line' },
+ { title: 'Post', detail: 'auto-publish one consolidated GitHub review of all confirmed findings (PR mode)' },
+ ],
+}
+
+// ============================================================================
+// args (all optional — workflow auto-detects when omitted):
+// mode: 'pr' | 'branch' | 'uncommitted'
+// prNumber: number (pr mode)
+// base: base ref for committed diffs (default auto: origin/main || main)
+// post: boolean — post results to GitHub (default true in pr mode)
+// repoRoot: absolute path (default: current working dir of agents)
+// ============================================================================
+const cfg = args || {}
+const REPO_ROOT = cfg.repoRoot || '.'
+const POST = cfg.post !== false // PR mode auto-posts unless explicitly disabled
+const DEDUPE = cfg.dedupeAgainstExisting === true // skip findings already commented on the PR
+
+// ---- Phase 0: detect the review target -------------------------------------
+phase('Scope')
+
+const SCOPE_SCHEMA = {
+ type: 'object',
+ required: ['mode', 'base', 'summary'],
+ properties: {
+ mode: { type: 'string', enum: ['pr', 'branch', 'uncommitted'] },
+ prNumber: { type: 'integer' },
+ base: { type: 'string' },
+ branch: { type: 'string' },
+ summary: { type: 'string' },
+ },
+}
+
+let scope
+if (cfg.mode) {
+ scope = { mode: cfg.mode, prNumber: cfg.prNumber, base: cfg.base || (cfg.mode === 'uncommitted' ? 'HEAD' : 'origin/main'), summary: 'from args' }
+} else {
+ scope = await agent(
+ `Determine what this code review should target. Repo root: ${REPO_ROOT}. Use Bash (git, gh).
+
+Decide ONE mode, in this priority order:
+1. 'pr': if 'gh pr view --json number,state,baseRefName,headRefName' shows a PR for the CURRENT branch whose state is OPEN. Capture prNumber and base (the PR's baseRefName, e.g. main).
+2. 'uncommitted': else if 'git status --porcelain' shows tracked changes (the working tree is dirty). base = HEAD.
+3. 'branch': else review committed work on this branch vs its base. base = whichever of 'origin/main' or 'main' exists (prefer origin/main). If the current branch IS main/master with no PR and a clean tree, still pick 'branch' with base = the previous commit (HEAD~1) and note it in summary.
+
+Return the chosen mode, base ref string (usable in 'git diff ...HEAD' for pr/branch, or literally 'HEAD' for uncommitted), prNumber if pr, branch name, and a one-line summary of what will be reviewed.`,
+ { label: 'scope:detect', phase: 'Scope', schema: SCOPE_SCHEMA },
+ )
+}
+
+const MODE = scope.mode
+const BASE = scope.base || 'origin/main'
+const PRNUM = scope.prNumber || cfg.prNumber
+log(`Target: ${MODE}${PRNUM ? ' #' + PRNUM : ''} (base=${BASE}) — ${scope.summary}`)
+
+// How each reviewer obtains its slice of the diff, by mode.
+function diffSpec(files) {
+ const fileArgs = files.map(f => "'" + String(f).replace(/'/g, "'\\''") + "'").join(' ') // escape embedded single quotes for the shell
+ if (MODE === 'uncommitted') {
+ return `Review UNCOMMITTED changes only:\n git -C ${REPO_ROOT} diff HEAD -- ${fileArgs}\n (also 'git -C ${REPO_ROOT} status --porcelain -- ${fileArgs}' for new untracked files).`
+ }
+ return `BASE detection (run first):\n BASE=$(git -C ${REPO_ROOT} merge-base HEAD ${BASE} 2>/dev/null || git -C ${REPO_ROOT} merge-base HEAD main || echo ${BASE})\nThen review the committed diff:\n git -C ${REPO_ROOT} diff "$BASE"...HEAD -- ${fileArgs}`
+}
+
+// ---- Phase 1 setup: discover changed files and group them ------------------
+// A reviewer agent reads the changed-file list and partitions it into coherent
+// subsystem groups, so the workflow adapts to whatever diff it is pointed at.
+const GROUPS_SCHEMA = {
+ type: 'object',
+ required: ['groups'],
+ properties: {
+ groups: {
+ type: 'array',
+ items: {
+ type: 'object',
+ required: ['name', 'persona', 'files'],
+ properties: {
+ name: { type: 'string' },
+ persona: { type: 'string' },
+ files: { type: 'array', items: { type: 'string' } },
+ },
+ },
+ },
+ },
+}
+
+const listCmd = MODE === 'uncommitted'
+ ? `git -C ${REPO_ROOT} diff --name-only HEAD; git -C ${REPO_ROOT} ls-files --others --exclude-standard`
+ : `BASE=$(git -C ${REPO_ROOT} merge-base HEAD ${BASE} 2>/dev/null || git -C ${REPO_ROOT} merge-base HEAD main || echo ${BASE}); git -C ${REPO_ROOT} diff --name-only "$BASE"...HEAD`
+
+const partition = await agent(
+ `List the changed files for this review and partition them into coherent review groups.
+
+Run:
+ ${listCmd}
+
+Then group the changed files into 8–24 subsystem groups so each group is a coherent unit one expert can review well (group by directory / language / feature; keep related scripts together; isolate large/high-risk files into their own group). For each group give: a short kebab 'name', a 'persona' (the kind of expert best suited — e.g. "a defensive Bash engineer", "a Docker layered-build expert", "a senior Python engineer", "a PowerShell automation expert", "a config/JSON correctness reviewer", "a refactor-safety auditor for renamed/removed paths"), and the exact repo-relative 'files' (every changed file must appear in exactly one group). Aim to cover EVERY changed file.`,
+ { label: 'scope:partition', phase: 'Scope', schema: GROUPS_SCHEMA },
+)
+
+const GROUPS = (partition.groups || []).filter(g => g.files && g.files.length)
+if (!GROUPS.length) {
+ log('No changed files found — nothing to review.')
+ return { mode: MODE, base: BASE, confirmedCount: 0, confirmed: [] }
+}
+log(`Swarm: ${GROUPS.length} expert reviewers over ${GROUPS.reduce((n, g) => n + g.files.length, 0)} changed files`)
+
+// ---- shared reviewer guidance ----------------------------------------------
+const SHARED_RULES = `
+You are reviewing a real change. Work from the actual diff and full file context — do NOT speculate.
+
+SCOPE: Report only problems introduced or touched by THIS diff. Ignore pre-existing issues in unchanged lines.
+
+LOOK FOR (weight by real impact):
+- Correctness / logic bugs, wrong conditionals, off-by-one, bad expansion, unset-var use.
+- Shell robustness: unquoted expansions, word-splitting, missing 'set -euo pipefail' where it matters, ignored exit codes, fragile parsing, non-portable bashisms in /bin/sh, eval misuse, unguarded cd, unsafe rm globs.
+- Security: command injection, curl|bash of untrusted input, secret/token leakage, unsafe temp files, world-readable creds, permissions.
+- Cross-platform / cross-shell parity (bash vs zsh vs PowerShell; macOS vs Linux: sed -i, mktemp, readlink).
+- Dockerfile: cache busting, missing cleanup/--no-install-recommends, root vs user, version pinning where it matters, COPY/chmod correctness.
+- Config/JSON/YAML: invalid syntax, wrong keys, broken references to renamed/removed paths.
+- Dead code, broken cross-file refs, renames the diff didn't propagate everywhere.
+
+PRECISION (critical for the next stage):
+- "file" MUST be the repo-relative path exactly as in the diff.
+- "line" MUST be a line number in the NEW (post-change) file — a line on the RIGHT side of the diff (an added '+' line, or a context line inside a changed hunk). Read the file to get the exact number. Prefer an added '+' line.
+- "body" is GitHub-Markdown: state the concrete problem, why it matters, and a specific fix (a short \`\`\`suggestion\`\`\` block is ideal).
+- Quality over quantity. Skip pure style nits with no functional impact. A finding you are not fairly confident is real does more harm than good downstream.
+
+Return findings via the structured tool. An empty list is a valid answer.`
+
+function reviewerPrompt(u) {
+ const known = (u.known && u.known.length)
+ ? `\nALREADY-REPORTED in this area by earlier reviewers — do NOT repeat these. Find DIFFERENT, deeper, or adjacent problems the others missed:\n${u.known.map(k => ` - ${k.file}:${k.line} — ${k.title}`).join('\n')}\n`
+ : ''
+ const deep = u.depth
+ ? 'This is a DEEP pass: read each touched file in full, trace control/data flow into the sibling files it sources or calls (and that call it), and reason about non-obvious failure modes, edge cases, and cross-file interactions — not just surface-level bugs.\n'
+ : ''
+ return `You are ${u.persona}. Repo root: ${REPO_ROOT}.
+Review lens for this pass: ${u.lensDesc}
+
+Review these changed files:
+${u.files.map(f => ' - ' + f).join('\n')}
+
+${diffSpec(u.files)}
+
+Use Bash (git diff / grep) and Read freely to get the diff and full surrounding context before judging.
+${deep}${known}${SHARED_RULES}`
+}
+
+const FINDINGS_SCHEMA = {
+ type: 'object',
+ required: ['findings'],
+ properties: {
+ findings: {
+ type: 'array',
+ items: {
+ type: 'object',
+ required: ['file', 'line', 'severity', 'category', 'title', 'body', 'confidence'],
+ properties: {
+ file: { type: 'string' },
+ line: { type: 'integer' },
+ severity: { type: 'string', enum: ['critical', 'high', 'medium', 'low'] },
+ category: { type: 'string' },
+ title: { type: 'string' },
+ body: { type: 'string' },
+ confidence: { type: 'string', enum: ['high', 'medium', 'low'] },
+ },
+ },
+ },
+ },
+}
+
+const VERDICT_SCHEMA = {
+ type: 'object',
+ required: ['keep', 'reason'],
+ properties: {
+ keep: { type: 'boolean' },
+ reason: { type: 'string' },
+ adjustedLine: { type: 'integer' },
+ adjustedSeverity: { type: 'string', enum: ['critical', 'high', 'medium', 'low'] },
+ refinedBody: { type: 'string' },
+ inDiff: { type: 'boolean' },
+ },
+}
+
+function verifyPrompt(f) {
+ return `You are an independent, skeptical senior reviewer. A prior reviewer raised the finding below. REFUTE it unless it clearly holds up. Default keep=false when uncertain, when it is style-only, or when it concerns unchanged/pre-existing code.
+
+Repo root: ${REPO_ROOT}
+Finding file: ${f.file}
+Claimed NEW-file line: ${f.line}
+Severity: ${f.severity} | Category: ${f.category}
+Title: ${f.title}
+Body:
+${f.body}
+
+Steps:
+1. ${diffSpec([f.file]).split('\n').join('\n ')}
+ Confirm the cited line is part of this diff (an added '+' line or context line inside a changed hunk). Set inDiff. If the issue is real but the line is slightly off, put the correct NEW-file line (one that IS in the diff) in adjustedLine.
+2. Read surrounding code to confirm the problem is REAL with practical impact, not a misreading.
+3. Decide keep (true only if real, impactful, tied to changed lines). One-sentence reason. Optionally set adjustedSeverity and improve refinedBody (GitHub-Markdown).
+
+Return via the structured tool.`
+}
+
+// ---- Multi-pass swarm: a distinct lens tier, finer granularity, deeper digging each pass
+// Each pass applies its own tier of expert lenses, splits the diff into finer units, and tells
+// every reviewer what earlier passes already confirmed so it hunts for NEW, deeper issues.
+const PASSES = Math.max(1, cfg.passes || 3)
+const MAX_UNITS = cfg.maxReviewersPerPass || 120 // per-pass reviewer cap (cost guard)
+
+const LENS_SETS = [
+ // Pass 1 — broad sweep, one generalist per subsystem
+ [{ key: 'core', desc: 'overall correctness, logic bugs, and the highest-impact robustness problems' }],
+ // Pass 2 — specialist quartet, applied per subsystem
+ [
+ { key: 'security', desc: 'security: command/regex injection, secret & token handling, file permissions, unsafe temp files, curl|bash of untrusted input' },
+ { key: 'robustness', desc: 'shell robustness & portability: quoting/word-splitting, set -euo pipefail interactions, ignored exit codes, GNU-vs-BSD/macOS, bash-vs-zsh-vs-POSIX' },
+ { key: 'consistency', desc: 'cross-file consistency: renamed/removed paths, parser parity between sibling scripts, docs/READMEs that contradict behavior, broken references' },
+ { key: 'control-flow', desc: 'control-flow & tool/API semantics: wrong conditionals, early/no-op exits, broken orchestration, misused CLIs/builtins, idempotency on re-run' },
+ ],
+ // Pass 3+: deep-dive lenses, two-file chunks, deep flow tracing
+ [
+ { key: 'concurrency', desc: 'concurrency, races, locking, and idempotency under parallel or repeated invocation' },
+ { key: 'error-handling', desc: 'error handling & failure modes: partial failures, missing guards, silent skips, cleanup/trap correctness' },
+ { key: 'edge-cases', desc: 'edge cases & input validation: empty/whitespace/unicode/missing inputs, unusual paths, boundary conditions' },
+ { key: 'perf-resource', desc: 'performance & resource use: redundant work, repeated network/subprocess calls, unbounded loops, leaks' },
+ { key: 'docs-ux', desc: 'documentation/UX accuracy: help text, READMEs, comments, and error messages vs actual behavior' },
+ ],
+]
+
+function chunk(arr, size) { const out = []; for (let i = 0; i < arr.length; i += size) out.push(arr.slice(i, i + size)); return out }
+const sevRank = { critical: 0, high: 1, medium: 2, low: 3 }
+
+const confirmedAll = []
+const seenByFile = {}
+function titleKey(t) { return (t || '').toLowerCase().replace(/[^a-z0-9 ]/g, '').split(/\s+/).filter(Boolean).slice(0, 6).join(' ') }
+// Dup only if same file AND within ±3 lines AND (identical line OR similar title).
+// Different-angle findings (e.g. a security vs a perf issue) on nearby lines survive.
+function isNew(f) {
+ const tk = titleKey(f.title)
+ for (const e of (seenByFile[f.file] || [])) {
+ if (Math.abs(e.line - f.line) <= 3 && (e.line === f.line || e.tkey === tk)) return false
+ }
+ return true
+}
+function remember(f) { (seenByFile[f.file] = seenByFile[f.file] || []).push({ line: f.line, tkey: titleKey(f.title) }) }
+function knownFor(files) { const s = new Set(files); return confirmedAll.filter(f => s.has(f.file)).map(f => ({ file: f.file, line: f.line, title: f.title })) }
+function normalize(x) {
+ return {
+ file: x.file,
+ line: (Number.isInteger(x.verdict.adjustedLine) ? x.verdict.adjustedLine : x.line),
+ severity: x.verdict.adjustedSeverity || x.severity,
+ category: x.category, title: x.title,
+ body: x.verdict.refinedBody || x.body,
+ confidence: x.confidence, group: x.group,
+ inDiff: x.verdict.inDiff !== false, verifyReason: x.verdict.reason,
+ }
+}
+
+let rawTotal = 0
+for (let p = 1; p <= PASSES; p++) {
+ // widen across passes: each pass applies a DISTINCT tier of lenses (it does NOT
+ // re-run earlier tiers — re-running 'core' every pass just rediscovers pass-1
+ // findings and wastes the budget). Reviewers still get earlier findings as context.
+ const tier = Math.min(p - 1, LENS_SETS.length - 1)
+ const lenses = LENS_SETS[tier]
+ // deepen: finer file chunks + full-file deep reads on later passes
+ const chunkSize = p <= 1 ? 999 : (p === 2 ? 3 : 2)
+ const deep = p >= 3
+
+ // Build per-lens unit lists, then interleave round-robin so that if the per-pass
+ // cap trims, it trims EVENLY across lenses and subsystems (never starves a lens).
+ const perLens = lenses.map(lens => {
+ const arr = []
+ for (const g of GROUPS) for (const fc of chunk(g.files, chunkSize)) {
+ arr.push({
+ name: `${g.name}/${lens.key}`,
+ persona: lens.key === 'core' ? g.persona : `${g.persona}, reviewing specifically through a ${lens.key} lens`,
+ lensDesc: lens.desc, files: fc, depth: deep, known: knownFor(fc),
+ })
+ }
+ return arr
+ })
+ const totalUnits = perLens.reduce((n, a) => n + a.length, 0)
+ let units = []
+ for (let i = 0; units.length < totalUnits; i++) {
+ for (const arr of perLens) if (i < arr.length) units.push(arr[i])
+ }
+ if (units.length > MAX_UNITS) { log(`Pass ${p}: ${totalUnits} reviewer units → capped to ${MAX_UNITS} (interleaved across lenses; raise args.maxReviewersPerPass for fuller coverage)`); units = units.slice(0, MAX_UNITS) }
+
+ phase(`Pass ${p} · Review`)
+ log(`Pass ${p}/${PASSES}: ${units.length} expert reviewers — lenses [${lenses.map(l => l.key).join(', ')}], chunk=${chunkSize}${deep ? ', deep' : ''}`)
+
+ const reviewed = await pipeline(
+ units,
+ u => agent(reviewerPrompt(u), { label: `p${p}:${u.name}`, phase: `Pass ${p} · Review`, schema: FINDINGS_SCHEMA })
+ .then(r => ({ u, findings: (r && r.findings) || [] }))
+ .catch(() => ({ u, findings: [] })),
+ (res) => parallel((res.findings).map(f => () =>
+ agent(verifyPrompt(f), { label: `p${p}:verify:${(f.file || '').split('/').pop()}:${f.line}`, phase: `Pass ${p} · Verify`, schema: VERDICT_SCHEMA })
+ .then(v => ({ ...f, group: res.u.name, verdict: v }))
+ .catch(() => null)
+ )),
+ )
+
+ const passRaw = reviewed.flat().filter(Boolean)
+ rawTotal += passRaw.length
+ const passConfirmed = passRaw.filter(x => x.verdict && x.verdict.keep).map(normalize)
+ let added = 0
+ for (const f of passConfirmed) { if (isNew(f)) { confirmedAll.push(f); remember(f); added++ } }
+ log(`Pass ${p}: +${added} net-new confirmed (running total ${confirmedAll.length})`)
+
+ if (budget.total && budget.remaining() < 80000) { log(`Budget low (${Math.round(budget.remaining() / 1000)}k left) — stopping after pass ${p}.`); break }
+}
+
+confirmedAll.sort((a, b) => (sevRank[a.severity] - sevRank[b.severity]) || a.file.localeCompare(b.file) || a.line - b.line)
+const counts = confirmedAll.reduce((m, f) => (m[f.severity] = (m[f.severity] || 0) + 1, m), {})
+log(`Total confirmed (deduped) across passes: ${confirmedAll.length} from ${rawTotal} raw — ${JSON.stringify(counts)}`)
+
+// ---- Phase 3: auto-post ONE consolidated GitHub review (PR mode) -----------
+// In PR mode the workflow ALWAYS posts every confirmed finding (unless post=false).
+// The posting agent follows an exact, deterministic procedure so it is reliable
+// unattended: it parses the diff with the embedded python script (no guessing
+// about which lines are commentable) and submits one COMMENT review.
+let posted = { attempted: false }
+if (POST && MODE === 'pr' && PRNUM && confirmedAll.length) {
+ phase('Post')
+ const payload = JSON.stringify({ prNumber: PRNUM, counts, dedupe: DEDUPE, findings: confirmedAll })
+ const postResult = await agent(
+ `Publish the verified findings below as ONE consolidated GitHub pull-request review on PR #${PRNUM}, with each finding as an inline comment. Repo root: ${REPO_ROOT}. This DOES publish to GitHub — that is the intended behavior, post everything that maps. Use Bash (gh, git, python3).
+
+FINDINGS JSON (write it to a temp file, e.g. /tmp/swarm_findings.json):
+${payload}
+
+Run EXACTLY this procedure (do not improvise the diff parsing):
+
+STEP 1 — fetch the diff and owner/repo:
+ gh pr diff ${PRNUM} > /tmp/pr_${PRNUM}.diff
+ OWNER_REPO=$(gh repo view --json owner,name --jq '.owner.login + "/" + .name')
+
+STEP 2 — run this python3 script verbatim (it parses commentable RIGHT-side lines, snaps each finding to a valid line within ±3, optionally dedupes against existing PR comments, and writes the review payload):
+
+ cat > /tmp/build_review.py <<'PY'
+ import json, re, subprocess, sys
+ PR = "${PRNUM}"
+ data = json.load(open('/tmp/swarm_findings.json'))
+ findings = data['findings']; counts = data['counts']; dedupe = data.get('dedupe', False)
+ # 1. valid RIGHT-side (new-file) line numbers per path
+ valid = {}; cur=None; new_ln=None
+ for line in open('/tmp/pr_%s.diff' % PR):
+ if line.startswith('diff --git '): cur=None; new_ln=None; continue
+ if new_ln is None and line.startswith('+++ '):
+ p=line[4:].strip(); cur=None if p=='/dev/null' else (p[2:] if p.startswith('b/') else p)
+ if cur: valid.setdefault(cur,set())
+ continue
+ if line.startswith('@@'):
+ m=re.search(r'\\+(\\d+)(?:,(\\d+))?',line); new_ln=int(m.group(1)) if m else None; continue
+ if cur is None or new_ln is None: continue
+ if line.startswith('+'): valid[cur].add(new_ln); new_ln+=1
+ elif line.startswith('-'): pass
+ elif line.startswith('\\\\'): pass
+ else: valid[cur].add(new_ln); new_ln+=1
+ # 2. optional dedupe vs existing PR comments (existing comments file passed via EXISTING_JSON env)
+ import os
+ posted={}
+ if dedupe and os.environ.get('EXISTING_JSON'):
+ for c in json.load(open(os.environ['EXISTING_JSON'])):
+ ln=c.get('line') or c.get('original_line')
+ if c.get('path') and ln: posted.setdefault(c['path'],[]).append(ln)
+ # 3. map findings
+ comments=[]; unmapped=[]
+ for f in findings:
+ if dedupe and any(abs(l-f['line'])<=6 for l in posted.get(f['file'],[])):
+ continue
+ vs=valid.get(f['file']); ln=f['line']; chosen=None
+ if vs:
+ if ln in vs: chosen=ln
+ else:
+ cands=[l for l in vs if abs(l-ln)<=3]
+ if cands: chosen=min(cands,key=lambda l:(abs(l-ln),l))
+ if chosen is not None:
+ comments.append({"path":f['file'],"line":chosen,"side":"RIGHT","body":"**[%s]** %s"%(f['severity'],f['body'])})
+ else:
+ unmapped.append(f)
+ # 4. summary body
+ hi=[f for f in findings if f['severity']=='high' or f['severity']=='critical']
+ lines=["## 🤖 AI Swarm Code Review",""]
+ lines.append("Deep multi-agent review: expert subagents partitioned the diff by subsystem; every finding was adversarially verified by an independent skeptic before posting.")
+ lines.append("")
+ lines.append("**Confirmed findings: %d** — %s. %d posted as inline comments below." % (len(findings), json.dumps(counts), len(comments)))
+ if hi:
+ lines.append(""); lines.append("Highlights (critical/high severity):")
+ for f in hi[:6]: lines.append("- **%s** — %s" % (f['file'].split('/')[-1], f['title']))
+ if unmapped:
+ lines.append(""); lines.append("Findings that could not be mapped to a diff line (shown here instead):")
+ for f in unmapped: lines.append("- **[%s] %s:%s** — %s" % (f['severity'], f['file'], f['line'], f['title']))
+ lines.append(""); lines.append("_Advisory; severities are the swarm's estimate. Generated with Claude Code._")
+ payload={"event":"COMMENT","body":"\\n".join(lines),"comments":comments}
+ json.dump(payload, open('/tmp/review_payload.json','w'))
+ print("MAPPED",len(comments),"UNMAPPED",len(unmapped))
+ PY
+ ${DEDUPE ? `gh api --paginate repos/$OWNER_REPO/pulls/${PRNUM}/comments > /tmp/existing_comments.json; EXISTING_JSON=/tmp/existing_comments.json python3 /tmp/build_review.py` : `python3 /tmp/build_review.py`}
+
+STEP 3 — submit ONE review:
+ gh api --method POST repos/$OWNER_REPO/pulls/${PRNUM}/reviews --input /tmp/review_payload.json --jq '{id, state, html_url}'
+
+STEP 4 — if the API returns 422 mentioning a specific line/path, remove that one comment from /tmp/review_payload.json (python or jq) and resubmit so the rest still post. Repeat at most 3 times.
+
+STEP 5 — verify and report: count how many comments belong to the new review id and return a concise report: number of inline comments posted, number unmapped, and the review html_url.`,
+ { label: 'post:github-review', phase: 'Post' },
+ )
+ posted = { attempted: true, report: postResult }
+ log('GitHub review post step finished; see posted.report for the outcome.')
+} else if (POST && MODE === 'pr' && !confirmedAll.length) {
+ log('No confirmed findings — nothing to post.')
+} else if (!POST && MODE === 'pr') {
+ log('post=false — skipping GitHub posting (findings returned in result).')
+}
+
+return {
+ mode: MODE,
+ base: BASE,
+ prNumber: PRNUM,
+ passes: PASSES,
+ rawFindings: rawTotal,
+ confirmedCount: confirmedAll.length,
+ counts,
+ confirmed: confirmedAll,
+ posted,
+}
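Review note on the Post phase of deep-swarm-code-review.js: the embedded Python script collects the RIGHT-side (new-file) line numbers that a GitHub inline comment may target, then snaps each finding to the nearest such line within ±3. The following is a minimal JavaScript sketch of that mapping, meant only for reasoning about the behavior outside the workflow. `rightSideLines`, `snap`, and the sample `setup.sh` diff are hypothetical names and data, not part of this PR, and the sketch assumes a `gh pr diff`-style input where every file section starts with a `diff --git` line.

```javascript
// Illustrative sketch only; function names and sample data are hypothetical.

// Collect commentable RIGHT-side (new-file) line numbers per path.
function rightSideLines(diffText) {
  const valid = new Map()
  let cur = null   // path of the file section being parsed
  let newLn = null // next new-file line number; null outside a hunk
  const lines = diffText.split('\n')
  if (lines[lines.length - 1] === '') lines.pop() // drop trailing-newline artifact
  for (const line of lines) {
    if (line.startsWith('diff --git ')) { cur = null; newLn = null; continue }
    // Treat '+++ ' as a file header only before the first hunk of a file.
    if (newLn === null && line.startsWith('+++ ')) {
      const p = line.slice(4).trim()
      cur = p === '/dev/null' ? null : (p.startsWith('b/') ? p.slice(2) : p)
      if (cur !== null && !valid.has(cur)) valid.set(cur, new Set())
      continue
    }
    if (line.startsWith('@@')) {
      const m = /\+(\d+)/.exec(line) // new-file start from "@@ -a,b +c,d @@"
      newLn = m ? Number(m[1]) : null
      continue
    }
    if (cur === null || newLn === null) continue
    // Removed lines and "\ No newline at end of file" do not exist on the right side.
    if (line.startsWith('-') || line.startsWith('\\')) continue
    valid.get(cur).add(newLn) // added or context line
    newLn += 1
  }
  return valid
}

// Snap a claimed line to the nearest valid line within the tolerance.
// Ties go to the lower line number; returns null when nothing is close enough.
function snap(validSet, line, tolerance = 3) {
  if (!validSet) return null
  if (validSet.has(line)) return line
  let best = null
  for (const l of validSet) {
    const d = Math.abs(l - line)
    if (d > tolerance) continue
    const bestD = best === null ? Infinity : Math.abs(best - line)
    if (d < bestD || (d === bestD && l < best)) best = l
  }
  return best
}

// Hypothetical one-hunk diff: new-file lines 10..13 are commentable.
const sampleDiff = [
  'diff --git a/setup.sh b/setup.sh',
  '--- a/setup.sh',
  '+++ b/setup.sh',
  '@@ -10,3 +10,4 @@',
  ' echo start',
  '-rm -rf $TMP',
  '+rm -rf "$TMP"',
  '+echo cleaned',
  ' echo done',
].join('\n') + '\n'

const commentable = rightSideLines(sampleDiff).get('setup.sh')
console.log(snap(commentable, 15)) // 13
console.log(snap(commentable, 20)) // null
```

A finding that returns `null` here corresponds to the "unmapped" bucket in the workflow, which is listed in the review summary body instead of being posted inline.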
diff --git a/config/shell/zshrc b/config/shell/zshrc
index c3b0621..69d627e 100755
--- a/config/shell/zshrc
+++ b/config/shell/zshrc
@@ -609,8 +609,69 @@ alias devbench-status='docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Por
alias devbench-stop='docker stop java_bench dot_net_bench flutter_bench 2>/dev/null || true'
# End DevBench Aliases
-# Claude alias
-alias yolo="claude --dangerously-skip-permissions --teammate-mode tmux"
+# Claude helper. Runs the main Claude session inside tmux so another operator
+# can capture and drive it, while still letting Claude use tmux for teammates.
+_yolo_shell_quote() {
+ local quoted="" arg
+
+ for arg in "$@"; do
+ quoted="${quoted} $(printf '%q' "$arg")"
+ done
+
+ printf '%s\n' "${quoted# }"
+}
+
+unalias yolo 2>/dev/null || true
+yolo() {
+ local session_name command_string prompt_file candidate_prompt_file
+ local -a prompt_args
+
+ if ! command -v claude >/dev/null 2>&1; then
+ echo "yolo: Claude CLI not found on PATH" >&2
+ return 1
+ fi
+
+ if ! command -v tmux >/dev/null 2>&1; then
+ echo "yolo: tmux not found on PATH" >&2
+ return 1
+ fi
+
+ prompt_file=""
+ for candidate_prompt_file in \
+ "$HOME/.claude/prompts/speckit-dashboard-full.md" \
+ "/usr/local/share/ct/claude/prompts/speckit-dashboard-full.md" \
+ "$HOME/.claude/prompts/speckit-dashboard-bootstrap.md"; do
+ if [ -r "$candidate_prompt_file" ]; then
+ prompt_file="$candidate_prompt_file"
+ break
+ fi
+ done
+ prompt_args=()
+ if [ -n "$prompt_file" ]; then
+ prompt_args=(--append-system-prompt-file "$prompt_file")
+ fi
+
+ if [ -n "${TMUX:-}" ]; then
+ claude --dangerously-skip-permissions --teammate-mode tmux "${prompt_args[@]}" "$@"
+ return $?
+ fi
+
+ session_name="yolo-$(date +%Y%m%d%H%M%S)-$$"
+ command_string=$(_yolo_shell_quote \
+ claude \
+ --dangerously-skip-permissions \
+ --teammate-mode tmux \
+ "${prompt_args[@]}" \
+ "$@") || return 1
+
+ tmux new-session -d -s "$session_name" -c "$PWD" "exec $command_string" || {
+ echo "yolo: failed to start tmux session" >&2
+ return 1
+ }
+
+ tmux set-option -t "$session_name" mouse on >/dev/null 2>&1 || true
+ tmux attach-session -t "$session_name"
+}
export PATH="$HOME/.npm-global/bin:$PATH"
export PATH="/workspace/.venv/bin:$PATH"
@@ -641,4 +702,3 @@ else
fi
unset __conda_setup
# <<< conda initialize <<<
-
diff --git a/devBenches/.devcontainer/devcontainer.json b/devBenches/.devcontainer/devcontainer.json
index 4bfe9d3..4581d8a 100644
--- a/devBenches/.devcontainer/devcontainer.json
+++ b/devBenches/.devcontainer/devcontainer.json
@@ -3,6 +3,7 @@
"dockerComposeFile": ["docker-compose.yml", "docker-compose.override.yml"],
"service": "devbench",
"workspaceFolder": "/workspace",
+ "initializeCommand": "bash ${localWorkspaceFolder}/scripts/ensure-sonarqube-mcp.sh",
"shutdownAction": "stopCompose",
"forwardPorts": [1455],
diff --git a/devBenches/README.md b/devBenches/README.md
index b15001d..8c8a71a 100755
--- a/devBenches/README.md
+++ b/devBenches/README.md
@@ -10,7 +10,8 @@ Each subfolder is a separate git repository containing a complete development en
- **`dotNetBench/`** - .NET development environment with DevContainer
- **`flutterBench/`** - Flutter/Dart development environment with DevContainer
- **`javaBench/`** - Java development environment with DevContainer
-- **`pythonBench/`** - Python development environment with DevContainer
+- **`phpBench/`** - PHP development environment with DevContainer
+- **`pyBench/`** - Python development environment with DevContainer
## Layered Containers (Current Standard)
@@ -20,6 +21,17 @@ All benches are moving to the layered image model described in `workBenches/docs
- **Layer 2**: `-bench:latest`
- **Layer 3**: `-bench:{user}` (user personalization)
+Layer 1a carries the shared developer tooling used by all devBenches, including:
+- `sonar-scanner` - SonarScanner CLI for project analysis uploads to SonarQube Server or SonarQube Cloud
+- `sonar` - SonarQube CLI for issue/project workflows, secrets scanning, and agent integrations
+- `sonar-env` - container-safe Sonar environment loader that reads `~/.config/sonarqube/sonar.env`
+- `gt` - Graphite CLI for stacked pull request workflows
+
+Devcontainers should mount `~/.config/sonarqube` read-only or mount the full host
+home directory. The shared `sonar-env` helper then loads tokens at runtime and
+sets `SONARQUBE_CLI_KEYCHAIN_FILE` to a writable file-backed keychain so `sonar`
+does not depend on a desktop keychain service inside containers.
+
## Legacy Monolithic DevContainers (Deprecated)
Some benches still include a `.devcontainer/` directory with a monolithic Dockerfile. These are **legacy** and should not be used as the source of truth. Use the layered images and bench-level build scripts instead; treat monolithic Dockerfiles as deprecated artifacts until removed.
diff --git a/devBenches/base-image/Dockerfile b/devBenches/base-image/Dockerfile
index 3403649..acf9793 100644
--- a/devBenches/base-image/Dockerfile
+++ b/devBenches/base-image/Dockerfile
@@ -1,6 +1,6 @@
# Layer 1a: Developer Base Image
-# Extends Layer 0 with Python, Node.js LTS, and dev tools
-# AI CLIs are inherited from Layer 0 (workbench-base)
+# Extends Layer 0 with Python, Node.js LTS, dev tools, and spec CLIs
+# Most AI CLIs are inherited from Layer 0 (workbench-base)
# Used by ALL developer benches (Frappe, Flutter, .NET, etc.)
# USER-AGNOSTIC: No user creation — Layer 3 handles user setup
@@ -9,7 +9,7 @@ FROM workbench-base:latest
# Container version labels
LABEL layer="1"
LABEL layer.name="dev-bench-base"
-LABEL layer.version="2.2.0"
+LABEL layer.version="2.2.3"
LABEL layer.description="Developer base with Python, Node.js LTS, dev tools, Playwright browsers, and generic testing tools (user-agnostic)"
# Everything runs as root
@@ -50,11 +50,12 @@ RUN pip install --break-system-packages \
pytest \
ipython
-# Install Node.js development tools (husky, commitlint)
+# Install Node.js development tools and cross-repo workflow CLIs.
RUN npm install -g \
husky \
@commitlint/cli \
- @commitlint/config-conventional
+ @commitlint/config-conventional \
+ @withgraphite/graphite-cli@stable
# ========================================
# DEVELOPER TOOLS SETUP
@@ -63,7 +64,71 @@ RUN npm install -g \
# Verify Python and pip
RUN python3 --version && pip --version
-# uv, AI CLIs are inherited from Layer 0 (workbench-base)
+# uv and most AI CLIs are inherited from Layer 0 (workbench-base)
+
+# ========================================
+# SPEC-DRIVEN DEVELOPMENT TOOLS
+# ========================================
+
+# Remove any inherited copies so Layer 1a is the clear owner of these tools.
+RUN npm uninstall -g @fission-ai/openspec || true \
+ && rm -f /usr/bin/openspec /usr/local/bin/openspec \
+ && rm -rf /usr/lib/node_modules/@fission-ai/openspec /usr/local/lib/node_modules/@fission-ai/openspec \
+ && uv tool uninstall specify-cli || true \
+ && env UV_TOOL_BIN_DIR=/usr/local/bin UV_TOOL_DIR=/opt/uv/tools uv tool uninstall specify-cli || true \
+ && rm -f /root/.local/bin/specify /usr/local/bin/specify \
+ && rm -rf /root/.local/share/uv/tools/specify-cli /opt/uv/tools/specify-cli
+
+# Install spec-driven CLIs only in developer benches, not every bench family.
+RUN mkdir -p /opt/uv/tools /root/.local/share/uv \
+ && UV_TOOL_BIN_DIR=/usr/local/bin UV_TOOL_DIR=/opt/uv/tools \
+ uv tool install specify-cli --from git+https://github.com/github/spec-kit.git --python-preference system \
+ && ln -sfn /opt/uv/tools /root/.local/share/uv/tools \
+ || echo "spec-kit installation skipped (non-fatal)"
+
+RUN npm install -g @fission-ai/openspec@latest \
+ || echo "OpenSpec installation skipped (non-fatal)"
+
+# Shared OpenSpec/Speckit project bootstrapper. Keep it with the spec-driven
+# CLIs so every developer bench can initialize the same agent context files.
+COPY files/openspeckit/setup-openspeckit /usr/local/bin/setup-openspeckit
+RUN chmod 0755 /usr/local/bin/setup-openspeckit \
+ && ln -sfn setup-openspeckit /usr/local/bin/setup-openspec-speckit-project
+
+# OpenSpec Claude commands and skills live with the devBench OpenSpec CLI.
+RUN mkdir -p /etc/skel/.claude/commands/opsx \
+ && mkdir -p /etc/skel/.claude/skills/opsx-clarify \
+ && mkdir -p /etc/skel/.claude/skills/opsx-analyze
+COPY files/claude/commands/opsx/ /etc/skel/.claude/commands/opsx/
+COPY files/claude/skills/opsx-clarify/ /etc/skel/.claude/skills/opsx-clarify/
+COPY files/claude/skills/opsx-analyze/ /etc/skel/.claude/skills/opsx-analyze/
+
+# Shared, project-agnostic Speckit worktree helpers for all developer benches.
+COPY files/ct/ct-functions.zsh /usr/local/share/ct/ct-functions.zsh
+COPY files/ct/claude/ /usr/local/share/ct/claude/
+RUN chmod 0644 /usr/local/share/ct/ct-functions.zsh \
+ && chmod 0755 /usr/local/share/ct/claude/speckit-dashboard.sh \
+ /usr/local/share/ct/claude/speckit-dashboard-sync.sh \
+ /usr/local/share/ct/claude/speckit-dash-toggle.sh \
+ && chmod 0644 /usr/local/share/ct/claude/prompts/speckit-dashboard-full.md \
+ && mkdir -p /etc/skel/.claude/prompts \
+ && cp /usr/local/share/ct/claude/speckit-dashboard.sh /etc/skel/.claude/speckit-dashboard.sh \
+ && cp /usr/local/share/ct/claude/speckit-dashboard-sync.sh /etc/skel/.claude/speckit-dashboard-sync.sh \
+ && cp /usr/local/share/ct/claude/speckit-dash-toggle.sh /etc/skel/.claude/speckit-dash-toggle.sh \
+ && cp /usr/local/share/ct/claude/prompts/speckit-dashboard-full.md /etc/skel/.claude/prompts/speckit-dashboard-full.md \
+ && chmod 0755 /etc/skel/.claude/speckit-dashboard.sh \
+ /etc/skel/.claude/speckit-dashboard-sync.sh \
+ /etc/skel/.claude/speckit-dash-toggle.sh \
+ && chmod 0644 /etc/skel/.claude/prompts/speckit-dashboard-full.md
+
+# Speckit worktree bootstrap. This installs a stable command outside Speckit
+# itself so generated Speckit files can be refreshed and the worktree workflow
+# can be reapplied afterwards.
+COPY files/speckit-worktree/templates /usr/local/share/speckit-worktree/templates
+COPY files/speckit-worktree/speckit-worktree-enable /usr/local/bin/speckit-worktree-enable
+RUN chmod 0755 /usr/local/bin/speckit-worktree-enable && \
+ find /usr/local/share/speckit-worktree/templates -type f \( -name '*.sh' -o -name '*.zsh' -o -name '*.ps1' \) -exec chmod 0755 {} + && \
+ find /usr/local/share/speckit-worktree/templates -type f ! \( -name '*.sh' -o -name '*.zsh' -o -name '*.ps1' \) -exec chmod 0644 {} +
# System-wide Corepack cache
ENV COREPACK_HOME=/opt/corepack
@@ -80,6 +145,14 @@ RUN mkdir -p /opt/corepack && \
COPY install-testing-tools.sh /tmp/
RUN bash /tmp/install-testing-tools.sh && rm -f /tmp/install-testing-tools.sh
+# Container-safe SonarQube/SonarCloud environment. These helpers make auth
+# usable without libsecret.
+COPY files/sonarqube/sonarqube-cli-env.sh /usr/local/share/sonarqube/sonarqube-cli-env.sh
+COPY files/sonarqube/sonar-env /usr/local/bin/sonar-env
+RUN chmod 0644 /usr/local/share/sonarqube/sonarqube-cli-env.sh \
+ && chmod 0755 /usr/local/bin/sonar-env \
+ && ln -sfn /usr/local/share/sonarqube/sonarqube-cli-env.sh /etc/profile.d/sonarqube-cli.sh
+
# ========================================
# ZSH PLUGINS (into /etc/skel)
# ========================================
@@ -99,6 +172,15 @@ RUN if [ ! -f /etc/skel/.zshrc ]; then \
# Update /etc/skel/.zshrc to include plugins
RUN sed -i 's/plugins=(git)/plugins=(git zsh-autosuggestions zsh-syntax-highlighting)/' /etc/skel/.zshrc
+# Source Speckit worktree helpers from global shell startup so they remain
+# available even when benches mount a host ~/.zshrc over the generated one.
+RUN printf '\n# DevBench Speckit worktree helpers\n[[ -f /usr/local/share/ct/ct-functions.zsh ]] && source /usr/local/share/ct/ct-functions.zsh\n' >> /etc/zsh/zshrc && \
+ printf '\n# DevBench Speckit worktree helpers\n[ -f /usr/local/share/ct/ct-functions.zsh ] && . /usr/local/share/ct/ct-functions.zsh\n' >> /etc/bash.bashrc
+
+# Source SonarQube CLI env defaults from global interactive shell startup.
+RUN printf '\n# DevBench SonarQube CLI environment\n[ -f /usr/local/share/sonarqube/sonarqube-cli-env.sh ] && . /usr/local/share/sonarqube/sonarqube-cli-env.sh\n' >> /etc/zsh/zshrc && \
+ printf '\n# DevBench SonarQube CLI environment\n[ -f /usr/local/share/sonarqube/sonarqube-cli-env.sh ] && . /usr/local/share/sonarqube/sonarqube-cli-env.sh\n' >> /etc/bash.bashrc
+
# Add bash_profile to /etc/skel (force zsh when bash is requested)
RUN echo '# Force zsh when bash is requested' > /etc/skel/.bash_profile && \
echo 'if [ -n "$PS1" ] && [ -z "$ZSH_VERSION" ]; then' >> /etc/skel/.bash_profile && \
diff --git a/devBenches/base-image/build.sh b/devBenches/base-image/build.sh
index 6e7b6a3..ab2e564 100755
--- a/devBenches/base-image/build.sh
+++ b/devBenches/base-image/build.sh
@@ -18,9 +18,11 @@ LEGACY_IMAGE="$(legacy_family_base_image dev)"
cd "$SCRIPT_DIR"
# Parse arguments (--user is accepted but ignored for backward compat)
+NO_CACHE="${NO_CACHE:-false}"
while [[ $# -gt 0 ]]; do
case $1 in
--user) shift 2 ;;
+ --no-cache) NO_CACHE=true; shift ;;
*) shift ;;
esac
done
@@ -28,6 +30,7 @@ done
echo "Configuration:"
echo " Tag: $CANONICAL_IMAGE (user-agnostic)"
echo " Legacy alias: $LEGACY_IMAGE"
+echo " No cache: $NO_CACHE"
echo ""
# Check if Layer 0 exists
@@ -43,6 +46,7 @@ fi
# Build the image
echo "Building $CANONICAL_IMAGE..."
docker build \
+ $([ "$NO_CACHE" = true ] && printf '%s\n' "--no-cache") \
-t "$CANONICAL_IMAGE" \
.
tag_family_base_legacy_alias dev
diff --git a/devBenches/base-image/files/claude/commands/opsx/apply.md b/devBenches/base-image/files/claude/commands/opsx/apply.md
new file mode 100644
index 0000000..3b9d1f9
--- /dev/null
+++ b/devBenches/base-image/files/claude/commands/opsx/apply.md
@@ -0,0 +1,226 @@
+---
+name: "OPSX: Apply"
+description: Implement tasks from an OpenSpec change using agent teams for parallel execution
+category: Workflow
+tags: [workflow, artifacts, experimental, teams]
+---
+
+Implement tasks from an OpenSpec change. Uses agent teams to parallelize independent tasks across non-overlapping file groups.
+
+**Input**: Optionally specify a change name (e.g., `/opsx:apply add-auth`). If omitted, check whether it can be inferred from conversation context. If the choice is vague or ambiguous, you MUST prompt the user with the available changes.
+
+---
+
+## Phase 1: Select & Load Context
+
+1. **Select the change**
+
+ If a name is provided, use it. Otherwise:
+ - Infer from conversation context if the user mentioned a change
+ - Auto-select if only one active change exists
+ - If ambiguous, run `openspec list --json` and use **AskUserQuestion** to let the user select
+
+ Always announce: "Using change: "
+
+2. **Check status**
+ ```bash
+ openspec status --change "" --json
+ ```
+ Parse `schemaName`, artifact status, and which artifact contains tasks.
+
+3. **Get apply instructions**
+ ```bash
+ openspec instructions apply --change "" --json
+ ```
+ - If `state: "blocked"` (missing artifacts): show message, suggest `/opsx:propose`
+ - If `state: "all_done"`: congratulate, suggest `/opsx:archive`
+ - Otherwise: proceed
+
+4. **Read context files**
+
+ Read ALL files from `contextFiles` in the apply instructions output (proposal, design, specs, tasks, clarifications if present).
+
+5. **Show progress**
+ - Schema being used
+ - "N/M tasks complete"
+ - Remaining tasks overview
+
+---
+
+## Phase 2: Task Analysis & Grouping
+
+Before implementing, analyze the pending tasks to determine the execution strategy.
+
+### Step 1: Build the task dependency graph
+
+For each pending task, determine:
+- **Which files it will create or modify** (infer from the task description and design.md)
+- **Which tasks it depends on** (does it reference output from another task?)
+- **Which tasks are independent** (no shared files, no dependency)
+
+### Step 2: Choose execution strategy
+
+**Sequential (no team)** — Use when:
+- 5 or fewer pending tasks
+- Most tasks depend on each other linearly
+- Tasks touch overlapping files
+- The user asks to go one-by-one
+
+**Parallel (agent team)** — Use when:
+- 6 or more pending tasks
+- Tasks can be grouped into 2+ independent clusters
+- Clusters touch non-overlapping files
+
+If parallel, proceed to Phase 3. If sequential, skip to Phase 4.
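Reviewer note: the strategy rules above can be read as a small decision function. The sketch below is only an illustration, not part of the command file. The helper name `choose_strategy` and its four arguments (pending task count, number of independent clusters, whether clusters overlap on files, whether the user asked to go one-by-one) are hypothetical, and "most tasks depend on each other linearly" is approximated here as "fewer than two independent clusters".

```bash
# Hypothetical helper, not part of OpenSpec or the opsx command.
# Usage: choose_strategy PENDING CLUSTERS OVERLAP [ONE_BY_ONE]
#   PENDING     number of pending tasks
#   CLUSTERS    number of independent task clusters found in Step 1
#   OVERLAP     "yes" if clusters touch overlapping files, else "no"
#   ONE_BY_ONE  "yes" if the user asked to go one-by-one (default "no")
choose_strategy() {
  pending="$1"
  clusters="$2"
  overlap="$3"
  one_by_one="${4:-no}"

  # Any sequential trigger wins over the parallel criteria.
  if [ "$one_by_one" = yes ] || [ "$overlap" = yes ]; then
    echo sequential
  elif [ "$pending" -ge 6 ] && [ "$clusters" -ge 2 ]; then
    echo parallel
  else
    echo sequential
  fi
}

choose_strategy 8 3 no      # prints "parallel"
choose_strategy 5 3 no      # prints "sequential" (5 or fewer pending tasks)
choose_strategy 8 3 yes     # prints "sequential" (overlapping files)
```

Treating every sequential trigger as a veto keeps the fallback conservative: parallel execution is chosen only when all three parallel criteria hold.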
+
+### Step 3: Group tasks into work packages
+
+Group independent tasks into **work packages**, where each package:
+- Contains tasks that share related files (same service, same model, same test file)
+- Has **zero file overlap** with other packages
+- Has its internal tasks ordered by dependency
+
+Example grouping:
+```
+Package A (buffer-service): Tasks 1.1, 1.2, 1.3, 2.1, 2.2 → touches buffer_service.dart, buffer_manifest.dart
+Package B (packaging-service): Tasks 6.1, 6.2, 7.1, 7.2 → touches packaging_service.dart, package_metadata.dart
+Package C (ui-widgets): Tasks 5.2, 5.3, 12.3 → touches orphan_dialog.dart, incomplete_dialog.dart
+Package D (integration-tests): Tasks 13.1, 13.2 → touches test/ files
+```
+
+**Dependencies between packages**: If Package B depends on Package A completing first, mark it. Only packages with no blockers get spawned in the first wave.
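Reviewer note: the "zero file overlap" requirement is easy to check mechanically before any teammate is spawned. The sketch below assumes a hypothetical plain-text listing with one `package file` pair per line and file names without spaces. Neither the format nor the `find_overlaps` helper comes from OpenSpec.

```bash
# Hypothetical helper: reads "package file" pairs on stdin and prints every
# file that is claimed by more than one package. No output means the
# packages are disjoint.
find_overlaps() {
  # sort -u drops repeated pairs, so a package listing the same file twice
  # is not reported as a conflict; uniq -d then keeps only files that
  # still appear under two or more packages.
  sort -u | awk '{ print $2 }' | sort | uniq -d
}

printf '%s\n' \
  'A buffer_service.dart' \
  'A buffer_manifest.dart' \
  'B packaging_service.dart' \
  'C buffer_service.dart' | find_overlaps
# prints "buffer_service.dart"
```

A non-empty result would mean the grouping has to be revised, for example by merging the conflicting packages or by marking one as blocked by the other.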
+
+---
+
+## Phase 3: Team Execution (Parallel)
+
+### Create the team
+Use **TeamCreate** to create a team (e.g., `apply-`).
+
+### Create task items
+Use **TaskCreate** for each work package. Include:
+- All tasks in the package with their descriptions
+- The files to create/modify
+- Context: which design.md sections and spec scenarios are relevant
+- Dependencies on other packages (use `addBlockedBy` if needed)
+
+### Spawn teammates
+Use the team subagent mechanism to spawn one `general-purpose` teammate per
+work package, in parallel. Each teammate must be launched into the team created
+above, for example:
+
+```text
+Task({
+ team_name: "apply-",
+ name: "",
+ subagent_type: "general-purpose",
+ run_in_background: true
+})
+```
+
+Do not use plain one-shot Agent subagents for this phase. They cannot claim
+team tasks, receive inbox messages, or participate in shutdown. Each teammate
+prompt must include:
+
+1. The team name
+2. The task ID to claim
+3. Full file paths for all context files (proposal, design, specs, clarifications)
+4. The specific tasks to implement, in order
+5. The files they own (create/modify only these)
+6. Instruction to mark tasks complete with `- [ ]` → `- [x]` in tasks.md — **but only their assigned tasks**
+7. Instruction to report back when done or blocked
+
+**CRITICAL file ownership rules:**
+- Each agent ONLY modifies files in its assigned package
+- `tasks.md` checkbox updates: each agent updates ONLY its own task checkboxes
+- If an agent discovers it needs to modify a file owned by another agent, it reports the dependency instead of making the change
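Reviewer note: the "only their assigned tasks" rule for `tasks.md` can be made concrete with a helper that flips exactly one checkbox by task ID. This is a sketch under assumptions: it presumes task lines look like `- [ ] 1.1 description` (the ID directly after the checkbox, followed by a space), and `mark_done` is a hypothetical name. It does not address two agents writing `tasks.md` at the same moment.

```bash
# Hypothetical helper: mark_done FILE TASK_ID
# Flips "- [ ]" to "- [x]" only on the line whose ID matches exactly.
mark_done() {
  file="$1"
  # Escape dots so "1.1" is matched literally by sed.
  id_re=$(printf '%s' "$2" | sed 's/[.]/\\./g')
  tmp=$(mktemp)
  # The space after the ID keeps "1.1" from also matching "1.10".
  sed "s/^\([[:space:]]*\)- \[ \] \(${id_re} \)/\1- [x] \2/" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Calling `mark_done tasks.md 1.1` would leave every other checkbox untouched, including indented ones and IDs that merely share a prefix.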
+
+### Monitor & coordinate
+- Wait for agents to complete or report blockers
+- If an agent is blocked on another package, check if the blocking package is done
+- When a wave completes, check for newly-unblocked packages and spawn the next wave
+- Handle conflicts: if two agents report needing the same file, assign that file to one of them and have the other report the dependency
+
+### Shutdown
+After all packages complete:
+- Send **shutdown_request** to all teammates
+- Use **TeamDelete** to clean up the team
+
+---
+
+## Phase 4: Sequential Execution (Fallback)
+
+For each pending task:
+- Show which task is being worked on
+- Make the code changes required
+- Keep changes minimal and focused
+- Mark task complete: `- [ ]` → `- [x]`
+- Continue to next task
+
+**Pause if:**
+- Task is unclear → ask for clarification
+- Implementation reveals a design issue → suggest updating artifacts
+- Error or blocker encountered → report and wait for guidance
+- User interrupts
+
+---
+
+## Phase 5: Completion
+
+### Show final status
+
+```
+## Implementation Complete
+
+**Change:**
+**Schema:**
+**Strategy:** [Sequential | Parallel — N agents, M waves]
+**Progress:** N/N tasks complete
+
+### Completed This Session
+- [x] Task 1.1 — description
+- [x] Task 1.2 — description
+...
+
+All tasks complete! Run `/opsx:archive` to archive this change.
+```
+
+### On pause (issue encountered)
+
+```
+## Implementation Paused
+
+**Change:**
+**Schema:**
+**Progress:** N/M tasks complete
+
+### Issue Encountered
+
+
+**Options:**
+1.