feat: add test stability ledger and document Slack notifications
- Create web/cypress/reports/test-stability.md — persistent ledger
tracking per-test pass rates and run history across iterations
- Add Step 14 to /iterate-incident-tests to update the ledger
- Document Slack notification design in ideas doc: webhook-based
notifications at key loop events (fix applied, CI complete,
review needed, blocked) so users can monitor and intervene
during long-running CI iteration loops
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- **Cypress crashes** (not just test failures): Check if it's an OOM issue (`--max-old-space-size`), a missing dependency, or a config problem. Report to user.
docs/agentic-test-iteration-ideas.md (89 additions, 6 deletions)
@@ -104,12 +104,95 @@ The orchestrator could automatically transition from Phase A to Phase B when loc
 ---

-## Test Stability Dashboard
+## Test Stability Ledger

-**Problem**: Flakiness data is ephemeral: it exists in the agent's report from one run and is lost.
+**Status**: Partially implemented. Ledger file created at `web/cypress/reports/test-stability.md`. Update step added to `/iterate-incident-tests` (Step 14). Still needs to be wired into `/iterate-ci-flaky`.

-**Idea**: Persist test stability data across runs in a simple format (CSV, JSON, or markdown table). Track:
-- Test name, last N run results, flakiness rate, last failure date, last fix commit
-- Trend over time: is the test getting more or less stable?
+**Problem**: Flakiness data is ephemeral: it exists in the agent's report from one run and is lost. Next time the agent runs, it has no memory of previous results.

-Could be a file in the repo (`docs/test-stability.md`) updated by the agent after each iteration.
+**Design**: A markdown file with embedded machine-readable JSON, updated by both skills after each run.
+
+**Location**: `web/cypress/reports/test-stability.md`, committed to the working branch so that it travels with the fixes.
+- Run history log: date, type (local/CI), branch, pass/fail counts
+- Machine-readable JSON block for programmatic parsing by the agent
+
+**Agent behavior**:
+- Reads the ledger at the start of each run to prioritize: "this test was flaky in last 3 runs, focus here"
+- Updates the ledger after each run with new results
+- Commits the ledger update alongside fixes
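The read, update, and write cycle described above could look roughly like the following Python sketch. It is illustrative only: the `<!-- LEDGER_JSON_START -->` / `<!-- LEDGER_JSON_END -->` delimiters, the `tests`/`history` schema, and all function names are hypothetical, since the commit does not show how the JSON is embedded in `test-stability.md`.

```python
import json
import re

# Hypothetical delimiters; the real ledger may embed its JSON differently.
START = "<!-- LEDGER_JSON_START -->"
END = "<!-- LEDGER_JSON_END -->"
_BLOCK = re.compile(re.escape(START) + r"(.*?)" + re.escape(END), re.DOTALL)


def read_ledger(markdown: str) -> dict:
    """Return the embedded JSON data, or an empty ledger if none is found."""
    match = _BLOCK.search(markdown)
    if match is None:
        return {"tests": {}}
    return json.loads(match.group(1))


def record_run(ledger: dict, results: dict[str, bool]) -> dict:
    """Append one run's pass/fail result per test (assumed schema)."""
    for name, passed in results.items():
        entry = ledger["tests"].setdefault(name, {"history": []})
        entry["history"].append("pass" if passed else "fail")
    return ledger


def flaky_candidates(ledger: dict, window: int = 3) -> list[str]:
    """Tests with at least one failure in their last `window` runs."""
    return sorted(
        name
        for name, entry in ledger["tests"].items()
        if "fail" in entry["history"][-window:]
    )


def write_ledger(markdown: str, ledger: dict) -> str:
    """Replace the embedded JSON block, or append one if it is missing."""
    block = f"{START}\n{json.dumps(ledger, indent=2, sort_keys=True)}\n{END}"
    if _BLOCK.search(markdown):
        # A function replacement avoids backslash escapes being interpreted.
        return _BLOCK.sub(lambda _m: block, markdown, count=1)
    return markdown.rstrip("\n") + "\n\n" + block + "\n"
```

A skill could call `flaky_candidates(read_ledger(text))` at the start of a run to choose where to focus, then `write_ledger(text, record_run(...))` before committing. The human-readable tables in the file would still need to be regenerated separately.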
+
+---
+
+## Slack Notifications for Long-Running Loops
+
+**Problem**: The CI iteration loop (`/iterate-ci-flaky`) runs for hours (each CI run takes ~2h). The user has no visibility into what the agent is doing until the session ends. By then, multiple fix-push-wait cycles may have happened with no chance for the user to intervene.
+
+**Idea**: Optional Slack notifications at key moments, giving the user a chance to review and influence the next cycle.
+
+### Notification Events
+
+| Event | When | Why the user cares |
+|-------|------|-------------------|
+| `fix_applied` | After committing and pushing a fix | User can review the diff before CI runs. Can reply "redo" or "don't change X" to influence next cycle |
+| `ci_started` | After triggering `/test` or push | Confirmation that the loop is progressing |
+| `ci_complete` | CI run finished (pass or fail) | User knows whether to check in or let it continue |
+| `review_needed` | 5-commit threshold reached or blocking issue | User needs to act |
+| `flaky_found` | Intermittent failure detected | User may have context about why |
+| `blocked` | Agent stopped: REAL_REGRESSION, infra issue, or auth problem | Needs human input to continue |
+| `iteration_done` | Full loop complete with summary | Final status |
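One way to keep the event names consistent between the two skills is a small enum, sketched below in Python. The event strings come from the table above; `LoopEvent`, `NEEDS_HUMAN`, and `needs_human` are hypothetical names, and treating only `review_needed` and `blocked` as requiring action is an assumption read from the table's last column.

```python
from enum import Enum


class LoopEvent(str, Enum):
    """Event names taken from the notification table."""

    FIX_APPLIED = "fix_applied"
    CI_STARTED = "ci_started"
    CI_COMPLETE = "ci_complete"
    REVIEW_NEEDED = "review_needed"
    FLAKY_FOUND = "flaky_found"
    BLOCKED = "blocked"
    ITERATION_DONE = "iteration_done"


# Assumption: only these two events require the user to act before the loop
# can continue; the others are informational.
NEEDS_HUMAN = frozenset({LoopEvent.REVIEW_NEEDED, LoopEvent.BLOCKED})


def needs_human(event: str) -> bool:
    """Validate an event name and report whether the user must act."""
    return LoopEvent(event) in NEEDS_HUMAN
```

An unknown event name raises `ValueError`, so a typo in a skill file would surface immediately instead of producing a silent, mislabeled notification.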
+
+### Implementation Options
+
+**Option A: Slack Incoming Webhook** (simplest)
+- User creates a webhook for their channel: Slack → Apps → Incoming Webhooks
+- Set `SLACK_WEBHOOK_URL` in `export-env.sh` or shell environment
+- Con: One-way; the user can't reply to the agent via Slack
+
+**Option B: Slack Bot with interactive messages**
+- A proper Slack app with bot token
+- Sends messages with action buttons: "Approve", "Redo", "Stop"
+- User clicks a button, webhook fires back to the agent
+- Pro: Two-way interaction without leaving Slack
+- Con: Needs a server to receive button callbacks. Possible with a lightweight service or ngrok tunnel
+
+**Option C: Claude Code hooks**
+- Use Claude Code's hook system to trigger notifications on specific events (tool calls, commits)
+- Pro: Native to Claude Code, no external service
+- Con: Hooks are local and would need forwarding to Slack
+
+### Recommended Approach
+
+Start with **Option A** (webhook). It's 5 minutes to set up and covers the primary need: visibility into what the agent is doing. The agent posts, the user reads. If the user wants to intervene, they message the agent directly in the Claude Code session.
+
+The `notify-slack.py` script would:
+- Check if `SLACK_WEBHOOK_URL` is set; if not, skip silently (notifications are optional)
+- Format messages with Slack Block Kit (sections, context with PR link, branch, CI URL)
+- Be called by both skills at key points in the loop
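The three behaviors listed for `notify-slack.py` could be sketched as follows, using only the Python standard library. This is not the script from the commit (which is not shown here); `build_payload`, `notify`, and their parameters are hypothetical names, and the message layout is one plausible reading of "sections, context with PR link, branch, CI URL".

```python
import json
import os
import urllib.request


def build_payload(event: str, summary: str, pr_url: str, branch: str, ci_url: str) -> dict:
    """Build a Block Kit message: one section plus a context line."""
    return {
        "text": f"[{event}] {summary}",  # plain-text fallback for notifications
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": f"*{event}*\n{summary}"}},
            {
                "type": "context",
                "elements": [
                    {"type": "mrkdwn", "text": f"<{pr_url}|PR> | branch `{branch}` | <{ci_url}|CI run>"}
                ],
            },
        ],
    }


def notify(payload: dict, timeout: float = 10.0) -> bool:
    """POST the payload to the webhook; return False if notifications are off."""
    url = os.environ.get("SLACK_WEBHOOK_URL")
    if not url:
        return False  # optional feature: skip silently when unset
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 200
```

In this sketch a network or HTTP error propagates as an exception. A real script would probably catch it and log a warning, so that a Slack outage does not interrupt the iteration loop.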
+Tracks incident detection test stability across local and CI iteration runs. Updated automatically by `/iterate-incident-tests` and `/iterate-ci-flaky`.
+
+## How to Read
+
+- **Pass rate**: percentage across all recorded runs (local + CI combined)
+- **Trend**: direction over last 3 runs
+- **Last failure**: most recent failure reason and which run it occurred in
+- **Fixed by**: commit that resolved the issue (if applicable)
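As an illustration of the first two columns, the Python sketch below computes a pass rate and a trend from a list of run outcomes. The ledger does not define "direction over last 3 runs" precisely, so comparing the last three runs against all earlier runs is an assumed rule, and the function names and labels (`improving`, `declining`, `stable`, `n/a`) are hypothetical.

```python
def pass_rate(history: list[bool]) -> float:
    """Percentage of passing runs across all recorded runs."""
    if not history:
        return 0.0
    return 100.0 * sum(history) / len(history)


def trend(history: list[bool], window: int = 3) -> str:
    """Compare the last `window` runs with all runs before them (assumed rule)."""
    recent, earlier = history[-window:], history[:-window]
    if not recent or not earlier:
        return "n/a"  # not enough data to compare
    delta = pass_rate(recent) - pass_rate(earlier)
    if delta > 0:
        return "improving"
    if delta < 0:
        return "declining"
    return "stable"
```

For example, a test with outcomes fail, pass, fail, pass, pass, pass has a pass rate of about 66.7% and an `improving` trend under this rule, because its last three runs all passed.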
+
+## Current Status
+
+| Test | Pass Rate | Trend | Runs | Last Failure | Fixed By |