Commit 752af7f

DavidRajnoha and claude committed
feat: add test stability ledger and document Slack notifications
- Create web/cypress/reports/test-stability.md — persistent ledger tracking per-test pass rates and run history across iterations
- Add Step 14 to /iterate-incident-tests to update the ledger
- Document Slack notification design in ideas doc: webhook-based notifications at key loop events (fix applied, CI complete, review needed, blocked) so users can monitor and intervene during long-running CI iteration loops

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 7023d17 commit 752af7f

3 files changed

Lines changed: 174 additions & 6 deletions

.claude/commands/iterate-incident-tests.md

Lines changed: 51 additions & 0 deletions
@@ -395,6 +395,57 @@ Output a summary:
395395
- [Whether to merge current fixes or wait]
396396
```
397397

398+
### Step 14: Update Stability Ledger
399+
400+
After the final report, update `web/cypress/reports/test-stability.md`.
401+
402+
Read the file and update all three sections:
403+
404+
**1. Current Status table** — for each test in this run:
- If the test is already in the table: update its pass rate and trend
- If the test is new: add a row
- Pass rate = total passes / total runs across all recorded iterations
- Trend: compare the last 3 runs — improving / stable / degrading
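
The two rules above could be computed from a test's `results` list roughly as follows. This is a minimal sketch: the function names are hypothetical, and the trend rule (pass rate of the last 3 runs compared with the pass rate of all earlier runs) is only one possible reading of "compare last 3 runs", not something the step defines.

```python
from typing import List


def pass_rate(results: List[str]) -> float:
    """Total passes divided by total recorded runs (0.0 when there are none)."""
    if not results:
        return 0.0
    return results.count("pass") / len(results)


def trend(results: List[str], window: int = 3) -> str:
    """Assumed rule: compare the last `window` runs with everything before them."""
    recent, earlier = results[-window:], results[:-window]
    if not earlier:
        return "stable"  # not enough history to compare against
    delta = pass_rate(recent) - pass_rate(earlier)
    if delta > 0:
        return "improving"
    if delta < 0:
        return "degrading"
    return "stable"
```

With fewer than four recorded runs there is nothing to compare against, so this sketch reports `stable` until more history exists.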
409+
410+
**2. Run History log** — append a new row:
411+
```
| {next_number} | {YYYY-MM-DD} | local | {branch} | {total_tests} | {passed} | {failed} | {flaky} | {commit_sha} |
```
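
As an illustration, the row template could be filled from one entry of the `runs` array in the JSON block. The helper name `format_run_row` is hypothetical; the dictionary keys mirror the `runs` entries shown in this step.

```python
def format_run_row(number: int, run: dict) -> str:
    """Render one Run History table row from a `runs` entry."""
    return (
        "| {n} | {date} | {type} | {branch} | {total} | "
        "{passed} | {failed} | {flaky} | {commit} |"
    ).format(n=number, **run)


example_run = {
    "date": "2026-03-23",
    "type": "local",
    "branch": "test/incident-robustness-2026-03-23",
    "total": 15,
    "passed": 15,
    "failed": 0,
    "flaky": 0,
    "commit": "abc1234",
}
row = format_run_row(1, example_run)
```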
414+
415+
**3. Machine-readable data** — update the JSON block between `STABILITY_DATA_START` and `STABILITY_DATA_END`:
416+
```json
{
  "tests": {
    "test full title": {
      "results": ["pass", "pass", "fail", "pass"],
      "last_failure_reason": "Timed out...",
      "last_failure_date": "2026-03-23",
      "fixed_by": "abc1234"
    }
  },
  "runs": [
    {
      "date": "2026-03-23",
      "type": "local",
      "branch": "test/incident-robustness-2026-03-23",
      "total": 15,
      "passed": 15,
      "failed": 0,
      "flaky": 0,
      "commit": "abc1234"
    }
  ]
}
```
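
One way to read and rewrite that block programmatically is sketched below. It assumes the JSON object is the only brace-delimited text between the two markers, which holds for the ledger file added in this commit. The helper names (`load_data`, `record_run`, `replace_data`) are hypothetical and not part of the skill.

```python
import json

START = "STABILITY_DATA_START"
END = "STABILITY_DATA_END"


def _json_span(ledger_text: str) -> tuple:
    """Return (start, end) offsets of the JSON object between the markers."""
    block_start = ledger_text.index(START) + len(START)
    block_end = ledger_text.index(END, block_start)
    block = ledger_text[block_start:block_end]
    return block_start + block.index("{"), block_start + block.rindex("}") + 1


def load_data(ledger_text: str) -> dict:
    """Parse the machine-readable block out of the ledger markdown."""
    start, end = _json_span(ledger_text)
    return json.loads(ledger_text[start:end])


def record_run(data: dict, run: dict, outcomes: dict) -> dict:
    """Append a run and each test's outcome ("pass" or "fail") in place."""
    data["runs"].append(run)
    for title, outcome in outcomes.items():
        entry = data["tests"].setdefault(title, {"results": []})
        entry["results"].append(outcome)
    return data


def replace_data(ledger_text: str, data: dict) -> str:
    """Return the ledger text with the JSON block swapped for `data`."""
    start, end = _json_span(ledger_text)
    return ledger_text[:start] + json.dumps(data, indent=2) + ledger_text[end:]
```

Everything outside the JSON object, including the "Do not edit manually" note and the surrounding HTML comment, is left untouched by `replace_data`.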
440+
441+
Commit the ledger update together with the final batch of fixes if any, or as a standalone commit:
442+
```bash
git add web/cypress/reports/test-stability.md
```
```bash
git commit --no-gpg-sign -m "docs: update test stability ledger — {passed}/{total} passed, {flaky} flaky"
```
448+
398449
### Error Handling
399450

400451
- **Cypress crashes** (not just test failures): Check if it's an OOM issue (`--max-old-space-size`), a missing dependency, or a config problem. Report to user.

docs/agentic-test-iteration-ideas.md

Lines changed: 89 additions & 6 deletions
@@ -104,12 +104,95 @@ The orchestrator could automatically transition from Phase A to Phase B when loc
104104

105105
---
106106

107-
## Test Stability Dashboard
107+
## Test Stability Ledger
108108

109-
**Problem**: Flakiness data is ephemeral — it exists in the agent's report from one run and is lost.
109+
**Status**: Partially implemented. Ledger file created at `web/cypress/reports/test-stability.md`. Update step added to `/iterate-incident-tests` (Step 14). Still needs to be wired into `/iterate-ci-flaky`.
110110

111-
**Idea**: Persist test stability data across runs in a simple format (CSV, JSON, or markdown table). Track:
112-
- Test name, last N run results, flakiness rate, last failure date, last fix commit
113-
- Trend over time: is the test getting more or less stable?
111+
**Problem**: Flakiness data is ephemeral — it exists in the agent's report from one run and is lost. Next time the agent runs, it has no memory of previous results.
114112

115-
Could be a file in the repo (`docs/test-stability.md`) updated by the agent after each iteration.
113+
**Design**: A markdown file with embedded machine-readable JSON, updated by both skills after each run.
114+
115+
**Location**: `web/cypress/reports/test-stability.md` — committed to the working branch, travels with the fixes.
116+
117+
**Contents**:
118+
- Human-readable table: per-test pass rate, trend, last failure reason, fix commit
- Run history log: date, type (local/CI), branch, pass/fail counts
- Machine-readable JSON block for programmatic parsing by the agent
121+
122+
**Agent behavior**:
123+
- Reads the ledger at the start of each run to prioritize — "this test was flaky in last 3 runs, focus here"
- Updates the ledger after each run with new results
- Commits the ledger update alongside fixes
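
The "focus here" prioritization could look something like the sketch below. It assumes the `tests` object from the ledger's JSON block and ranks tests by how many of their last 3 results were failures. The function name and the ranking rule are illustrative assumptions, not part of the design.

```python
def prioritize(tests: dict, window: int = 3) -> list:
    """Return titles of tests that failed recently, most failures first."""
    ranked = []
    for title, entry in tests.items():
        recent_failures = entry.get("results", [])[-window:].count("fail")
        if recent_failures:
            ranked.append((recent_failures, title))
    # Sort by failure count descending, then by title for a stable order.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [title for _, title in ranked]


focus = prioritize({
    "a": {"results": ["pass", "fail", "fail"]},
    "b": {"results": ["pass", "pass", "pass"]},
    "c": {"results": ["fail", "pass", "pass", "fail"]},
})
```

Tests with no failures inside the window are omitted, so an empty list means there is nothing to prioritize.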
126+
127+
---
128+
129+
## Slack Notifications for Long-Running Loops
130+
131+
**Problem**: The CI iteration loop (`/iterate-ci-flaky`) runs for hours (each CI run takes ~2h). The user has no visibility into what the agent is doing until the session ends. By then, multiple fix-push-wait cycles may have happened with no chance for the user to intervene.
132+
133+
**Idea**: Optional Slack notifications at key moments, giving the user a chance to review and influence the next cycle.
134+
135+
### Notification Events
136+
137+
| Event | When | Why the user cares |
|-------|------|-------------------|
| `fix_applied` | After committing and pushing a fix | User can review the diff before CI runs. Can reply "redo" or "don't change X" to influence next cycle |
| `ci_started` | After triggering `/test` or push | Confirmation that the loop is progressing |
| `ci_complete` | CI run finished (pass or fail) | User knows whether to check in or let it continue |
| `review_needed` | 5-commit threshold reached or blocking issue | User needs to act |
| `flaky_found` | Intermittent failure detected | User may have context about why |
| `blocked` | Agent stopped — REAL_REGRESSION, infra issue, or auth problem | Needs human input to continue |
| `iteration_done` | Full loop complete with summary | Final status |
146+
147+
### Implementation Options
148+
149+
**Option A: Slack Incoming Webhook** (simplest)
150+
- User creates a webhook for their channel: Slack → Apps → Incoming Webhooks
- Set `SLACK_WEBHOOK_URL` in `export-env.sh` or shell environment
- Agent calls `curl -X POST -H 'Content-type: application/json' -d '{"text":"..."}' "$SLACK_WEBHOOK_URL"`
- Pro: No Slack app needed, 5-minute setup
- Con: One-way — user can't reply to the agent via Slack
155+
156+
**Option B: Slack Bot with interactive messages**
157+
- A proper Slack app with bot token
158+
- Sends messages with action buttons: "Approve", "Redo", "Stop"
159+
- User clicks a button, webhook fires back to the agent
160+
- Pro: Two-way interaction without leaving Slack
161+
- Con: Needs a server to receive button callbacks. Possible with a lightweight service or ngrok tunnel
162+
163+
**Option C: Claude Code hooks**
164+
- Use Claude Code's hook system to trigger notifications on specific events (tool calls, commits)
165+
- Pro: Native to Claude Code, no external service
166+
- Con: Hooks are local — would need forwarding to Slack
167+
168+
### Recommended Approach
169+
170+
Start with **Option A** (webhook). It's 5 minutes to set up and covers the primary need: visibility into what the agent is doing. The agent posts, the user reads. If the user wants to intervene, they message the agent directly in the Claude Code session.
171+
172+
The `notify-slack.py` script would:
173+
- Check if `SLACK_WEBHOOK_URL` is set — if not, skip silently (notifications are optional)
174+
- Format messages with Slack Block Kit (sections, context with PR link, branch, CI URL)
175+
- Be called by both skills at key points in the loop
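
A minimal sketch of what such a script might contain, using only the standard library. The function names, the argument shape, and the exact Block Kit layout are assumptions for illustration. Only the `SLACK_WEBHOOK_URL` variable and the skip-silently behavior come from the description above.

```python
import json
import os
import urllib.request


def build_payload(title: str, body: str, context_items: list) -> dict:
    """Build a Block Kit message: one section plus an optional context line."""
    blocks = [
        {"type": "section",
         "text": {"type": "mrkdwn", "text": f"*{title}*\n{body}"}},
    ]
    if context_items:  # e.g. PR link, branch, CI URL
        blocks.append({
            "type": "context",
            "elements": [{"type": "mrkdwn", "text": " | ".join(context_items)}],
        })
    # "text" serves as the plain-text fallback for notifications.
    return {"text": f"{title}: {body}", "blocks": blocks}


def notify(title: str, body: str, context_items: list, env=os.environ) -> bool:
    """Post to Slack; return False without error when no webhook is configured."""
    url = env.get("SLACK_WEBHOOK_URL")
    if not url:
        return False  # notifications are optional, so skip silently
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(title, body, context_items)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status == 200
```

Passing `env` explicitly makes the skip path easy to exercise without touching the real environment, for example `notify("Fix Applied", "...", [], env={})`.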
176+
177+
### Configuration
178+
179+
Add to `cypress/export-env.sh`:
180+
```bash
export SLACK_WEBHOOK_URL="https://hooks.slack.com/services/T.../B.../..."
```
183+
184+
Or set globally in `~/.zshrc` if preferred.
185+
186+
### Message Format Example
187+
188+
```
:wrench: Agent: Fix Applied

Fixed selector timeout in filtering test — `.severity-filter` →
`[data-test="severity-filter"]`. Pushed to `test/incident-robustness-2026-03-24`.

CI will run automatically. Reply in the agent session if you want to
change approach before next cycle.

PR #860 | Branch: agentic-test-iteration | CI Run
```
web/cypress/reports/test-stability.md

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
1+
# Test Stability Ledger
2+
3+
Tracks incident detection test stability across local and CI iteration runs. Updated automatically by `/iterate-incident-tests` and `/iterate-ci-flaky`.
4+
5+
## How to Read
6+
7+
- **Pass rate**: percentage across all recorded runs (local + CI combined)
8+
- **Trend**: direction over last 3 runs
9+
- **Last failure**: most recent failure reason and which run it occurred in
10+
- **Fixed by**: commit that resolved the issue (if applicable)
11+
12+
## Current Status
13+
14+
| Test | Pass Rate | Trend | Runs | Last Failure | Fixed By |
|------|-----------|-------|------|-------------|----------|
| _No data yet — run `/iterate-incident-tests` or `/iterate-ci-flaky` to populate_ | | | | | |
17+
18+
## Run History
19+
20+
### Run Log
21+
22+
| # | Date | Type | Branch | Tests | Passed | Failed | Flaky | Commit |
|---|------|------|--------|-------|--------|--------|-------|--------|
| _No runs recorded yet_ | | | | | | | | |
25+
26+
<!-- STABILITY_DATA_START
This section is machine-readable. Do not edit manually.

{
  "tests": {},
  "runs": []
}

STABILITY_DATA_END -->
