Support: simplify benchmark workflow and fix case filtering

zhusy54 · zhusy54 · commit bce1e89d2a6d · 2026-04-16T11:26:08.000+08:00
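- Restructure benchmark_rounds.sh to prefer test_*.py over run_example.py
- Remove the PTO-ISA pinning step from the benchmark skill, drop
  -c $PTO_ISA_COMMIT from the documented benchmark_rounds.sh commands,
  and renumber the remaining steps
- Fix: pass --manual include only to test_*.py; run_example.py does not
  support this flag and would crash when case_name is specified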
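
The case-filter fix can be sketched as a standalone function. This is a minimal illustration, not the script's actual code: `build_case_args` is a hypothetical helper, the path `tests/test_foo.py` in the usage note is made up, and the sketch assumes that an empty `test_file` means the example is run through run_example.py.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the conditional added to run_bench().
# $1: path to a test_*.py file, or empty when run_example.py is used (assumption)
# $2: case name to filter on, or empty for no filtering
build_case_args() {
    local test_file="$1" case_name="$2"
    local -a args=()
    if [[ -n "$case_name" ]]; then
        args+=(--case "$case_name")
        # Only test_*.py accepts --manual include; run_example.py would reject it.
        [[ -n "$test_file" ]] && args+=(--manual include)
    fi
    # Print the arguments space-separated (an empty line when there are none).
    printf '%s\n' "${args[*]}"
}
```

Under these assumptions, `build_case_args tests/test_foo.py Case1` prints `--case Case1 --manual include`, while `build_case_args "" Case1` prints only `--case Case1`.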
diff --git a/.claude/skills/benchmark/SKILL.md b/.claude/skills/benchmark/SKILL.md
@@ -87,17 +87,7 @@ npu-smi info
 
 Pick devices with **HBM-Usage = 0**. Find the longest consecutive sub-range (at most 4). If no idle device is found, prompt user to specify a device ID.
 
-## Step 3: Pin PTO-ISA
-
-Extract pinned commit from `.github/workflows/ci.yml`:
-
-```bash
-PTO_ISA_COMMIT=$(grep -oP '(?<=-c )\w+' .github/workflows/ci.yml | head -1)
-```
-
-Append `-c $PTO_ISA_COMMIT` to benchmark args so `run_example.py` picks it up.
-
-## Step 4: Prepare — Compute Absolute Paths
+## Step 3: Compute Absolute Paths
 
 The Bash tool resets its working directory to the project root on every call. Relative paths like `cd worktree && ...` are fragile and easy to forget. **Compute absolute paths once, then use them everywhere.**
 
@@ -118,7 +108,7 @@ WORKTREE_ABS="/home/user/simpler/tmp/worktree_baseline_20260331_102302"
 
 **Do NOT use `cd` + relative `./tools/...`** — this is the #1 source of silent errors (running the wrong workspace).
 
-## Step 5: Run Benchmarks
+## Step 4: Run Benchmarks
 
 ### Single Mode
 
@@ -141,7 +131,7 @@ Pure Python files (`bindings.py`, `code_runner.py`) are resolved via `sys.path`
 
 **Solution: always create a venv in the worktree** (~26s overhead). This builds both the nanobind extension AND runtime binaries, fully isolating the baseline.
 
-#### 5a. Create worktree, venv, and build
+#### 4a. Create worktree, venv, and build
 
 Inline the **absolute** worktree path (copy-paste the value, do not rely on shell variables persisting):
 
@@ -158,27 +148,27 @@ python3 -m venv "${WORKTREE_ABS}/.venv" --system-site-packages
 
 This gives the worktree its own `_task_interface.*.so` in `.venv/lib/python3.*/site-packages/`, completely independent from the main workspace.
 
-#### 5b. Run baseline
+#### 4b. Run baseline
 
 Activate the venv so `benchmark_rounds.sh` (which calls `python3`) picks up the worktree's nanobind extension and Python bindings:
 
 ```bash
 # WORKTREE_ABS must be the literal absolute path (e.g. /home/user/simpler/tmp/worktree_baseline_20260331)
-cd "$WORKTREE_ABS" && source .venv/bin/activate && pwd && ./tools/benchmark_rounds.sh -d $BASELINE_DEVICE -c $PTO_ISA_COMMIT -r "$RUNTIME" \
+cd "$WORKTREE_ABS" && source .venv/bin/activate && pwd && ./tools/benchmark_rounds.sh -d $BASELINE_DEVICE -r "$RUNTIME" \
   2>&1 | tee "${PROJECT_ROOT}/tmp/benchmark_baseline_${TIMESTAMP}_${RUNTIME}.txt"
 ```
 
 **Always include `pwd &&` after `cd` to verify you are in the correct directory.** If `pwd` does not print the worktree path, something went wrong — do not proceed.
 
-#### 5c. Run current
+#### 4c. Run current
 
 ```bash
 # Runs from the main workspace (Bash tool default cwd)
-./tools/benchmark_rounds.sh -d $CURRENT_DEVICE -c $PTO_ISA_COMMIT -r "$RUNTIME" \
+./tools/benchmark_rounds.sh -d $CURRENT_DEVICE -r "$RUNTIME" \
   2>&1 | tee "tmp/benchmark_current_${TIMESTAMP}_${RUNTIME}.txt"
 ```
 
-#### 5d. Cleanup
+#### 4d. Cleanup
 
 ```bash
 git worktree remove "$WORKTREE_ABS" --force
@@ -209,22 +199,22 @@ done
 #### Sequential execution (one device)
 
 ```bash
-# 1. Worktree + venv already created in step 5a
+# 1. Worktree + venv already created in step 4a
 
 # 2. For each runtime (serially — one device, one process at a time):
 #    Baseline first (from worktree with venv activated)
-cd "$WORKTREE_ABS" && source .venv/bin/activate && pwd && ./tools/benchmark_rounds.sh -d $DEVICE -c $PTO_ISA_COMMIT -r "$RUNTIME" \
+cd "$WORKTREE_ABS" && source .venv/bin/activate && pwd && ./tools/benchmark_rounds.sh -d $DEVICE -r "$RUNTIME" \
   2>&1 | tee "${PROJECT_ROOT}/tmp/benchmark_baseline_${TIMESTAMP}_${RUNTIME}.txt"
 
 #    Then current (from main workspace — default cwd, no venv)
-./tools/benchmark_rounds.sh -d $DEVICE -c $PTO_ISA_COMMIT -r "$RUNTIME" \
+./tools/benchmark_rounds.sh -d $DEVICE -r "$RUNTIME" \
   2>&1 | tee "tmp/benchmark_current_${TIMESTAMP}_${RUNTIME}.txt"
 
 # 3. Cleanup
 git -C "$PROJECT_ROOT" worktree remove "$WORKTREE_ABS" --force
 ```
 
-## Step 6: Report Results
+## Step 5: Report Results
 
 Parse `Trimmed Avg:` for elapsed and `Orch Trimmed Avg:` for orchestration time from benchmark output.
 
@@ -293,7 +283,6 @@ If any example shows > 5% regression, highlight it explicitly.
 
 - [ ] Mode detected (single vs compare)
 - [ ] Idle device found or user-specified
-- [ ] PTO-ISA pinned to CI commit
 - [ ] `PROJECT_ROOT` and `WORKTREE_ABS` absolute paths computed
 - [ ] (Compare mode) Worktree created, venv built with `pip install -e .`
 - [ ] (Compare mode) Baseline completed — venv activated, `pwd` confirmed worktree path before running
diff --git a/tools/benchmark_rounds.sh b/tools/benchmark_rounds.sh
@@ -411,6 +411,7 @@ run_bench() {
     fi
     if [[ -n "$case_name" ]]; then
         run_cmd+=(--case "$case_name")
+        [[ -n "$test_file" ]] && run_cmd+=(--manual include)
     fi
     run_cmd+=("${EXTRA_ARGS[@]}")