
Commit 85fed91

DavidRajnoha and claude committed
docs: expand Slack notification design with interaction models
Detailed design for agent-user communication during long CI loops:
- Natural pause points (after fix, after CI, when blocked)
- Review window concept (agent waits N minutes for feedback before pushing)
- Actionable notification content (what changed, why, confidence level)
- Four implementation options with tradeoffs:
  A) Incoming webhook (one-way, 5-min setup)
  B) Slack bot with thread-based replies (two-way, no callback server)
  C) Claude Code hooks bridge
  D) GitHub PR comments as notification channel
- Recommended progression path A → B
- Skill integration points for both local and CI loops

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 752af7f commit 85fed91

1 file changed

Lines changed: 180 additions & 46 deletions


docs/agentic-test-iteration-ideas.md

@@ -128,71 +128,205 @@ The orchestrator could automatically transition from Phase A to Phase B when loc
128128

129129
## Slack Notifications for Long-Running Loops
130130

131-
**Problem**: The CI iteration loop (`/iterate-ci-flaky`) runs for hours (each CI run takes ~2h). The user has no visibility into what the agent is doing until the session ends. By then, multiple fix-push-wait cycles may have happened with no chance for the user to intervene.
131+
### The Problem
132132

133-
**Idea**: Optional Slack notifications at key moments, giving the user a chance to review and influence the next cycle.
133+
The CI iteration loop (`/iterate-ci-flaky`) runs for hours — each CI run takes ~2h, and the loop may do 3-5 fix-push-wait cycles. During that time:
134134

135-
### Notification Events
135+
- The user has no visibility into what the agent decided to fix or how
136+
- By the time the loop finishes, multiple commits may have been pushed with no chance to course-correct
137+
- A wrong fix in cycle 1 wastes 2+ hours of CI time before the agent discovers it didn't work
138+
- The user may have domain context ("that test is flaky because of animation timing, not the selector") that would save cycles
136139

137-
| Event | When | Why the user cares |
138-
|-------|------|-------------------|
139-
| `fix_applied` | After committing and pushing a fix | User can review the diff before CI runs. Can reply "redo" or "don't change X" to influence next cycle |
140-
| `ci_started` | After triggering `/test` or push | Confirmation that the loop is progressing |
141-
| `ci_complete` | CI run finished (pass or fail) | User knows whether to check in or let it continue |
142-
| `review_needed` | 5-commit threshold reached or blocking issue | User needs to act |
143-
| `flaky_found` | Intermittent failure detected | User may have context about why |
144-
| `blocked` | Agent stopped — REAL_REGRESSION, infra issue, or auth problem | Needs human input to continue |
145-
| `iteration_done` | Full loop complete with summary | Final status |
140+
The core tension: **autonomy vs oversight**. The agent should run independently, but the user needs the ability to intervene at natural pause points.
146141

147-
### Implementation Options
142+
### Natural Pause Points
148143

149-
**Option A: Slack Incoming Webhook** (simplest)
150-
- User creates a webhook for their channel: Slack → Apps → Incoming Webhooks
151-
- Set `SLACK_WEBHOOK_URL` in `export-env.sh` or shell environment
152-
- Agent calls `curl -X POST -H 'Content-type: application/json' -d '{"text":"..."}' $SLACK_WEBHOOK_URL`
153-
- Pro: No Slack app needed, 5-minute setup
154-
- Con: One-way — user can't reply to the agent via Slack
144+
The CI loop has built-in pauses where user input is most valuable:
155145

156-
**Option B: Slack Bot with interactive messages**
157-
- A proper Slack app with bot token
158-
- Sends messages with action buttons: "Approve", "Redo", "Stop"
159-
- User clicks a button, webhook fires back to the agent
160-
- Pro: Two-way interaction without leaving Slack
161-
- Con: Needs a server to receive button callbacks. Possible with a lightweight service or ngrok tunnel
146+
```
147+
Push fix ──→ [PAUSE: fix_applied] ──→ CI runs (~2h) ──→ [PAUSE: ci_complete] ──→ Analyze ──→ ...
148+
```
162149

163-
**Option C: Claude Code hooks**
164-
- Use Claude Code's hook system to trigger notifications on specific events (tool calls, commits)
165-
- Pro: Native to Claude Code, no external service
166-
- Con: Hooks are local — would need forwarding to Slack
150+
1. **After fix, before CI runs** (`fix_applied`): The agent committed a fix and is about to push (or just pushed). This is the highest-value notification — the user can review the approach and say "redo" before a 2-hour CI cycle starts.
167151

168-
### Recommended Approach
152+
2. **After CI completes** (`ci_complete`): Results are in. The agent is about to diagnose. User might have context about known issues.
169153

170-
Start with **Option A** (webhook). It's 5 minutes to set up and covers the primary need: visibility into what the agent is doing. The agent posts, the user reads. If the user wants to intervene, they message the agent directly in the Claude Code session.
154+
3. **When blocked** (`blocked`): Agent can't continue — needs human decision.
171155

172-
The `notify-slack.py` script would:
173-
- Check if `SLACK_WEBHOOK_URL` is set — if not, skip silently (notifications are optional)
174-
- Format messages with Slack Block Kit (sections, context with PR link, branch, CI URL)
175-
- Be called by both skills at key points in the loop
156+
### Review Window
176157

177-
### Configuration
158+
For the `fix_applied` event, the agent could optionally **wait before pushing**, giving the user a time window to respond:
178159

179-
Add to `cypress/export-env.sh`:
180-
```bash
181-
export SLACK_WEBHOOK_URL="https://hooks.slack.com/services/T.../B.../..."
182160
```
161+
Agent: "I'm about to push this fix. Waiting 10 minutes for feedback before proceeding."
162+
[Shows diff summary in Slack]
183163
184-
Or set globally in `~/.zshrc` if preferred.
164+
User (within 10 min): "Don't change the selector, the issue is timing. Add a cy.wait(500) instead."
165+
166+
Agent: Reverts fix, applies user's suggestion, pushes that instead.
167+
```
185168

186-
### Message Format Example
169+
Or if no response within the window, the agent proceeds autonomously.
187170

171+
Configuration: `review-window=10m` parameter on `/iterate-ci-flaky`. Set to `0` for fully autonomous (no waiting).
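The review window itself does not depend on the transport. Below is a minimal sketch, assuming the `review-window` value is either bare seconds or a number with an `s`, `m` or `h` suffix, and that feedback is fetched through some `poll` callable (a Slack thread, PR comments, or anything else). `parse_review_window` and `wait_for_feedback` are hypothetical names, not existing skill code; the clock and sleep functions are injectable so the loop can be tested without actually waiting.

```python
import re
import time
from typing import Callable, Optional

# Hypothetical duration format: "600", "90s", "10m", "1h"
_UNITS = {"": 1, "s": 1, "m": 60, "h": 3600}


def parse_review_window(value: str) -> int:
    """Convert a review-window value such as '10m' or '0' into seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([smh]?)\s*", value)
    if not match:
        raise ValueError(f"invalid review window: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNITS[unit]


def wait_for_feedback(
    poll: Callable[[], Optional[str]],
    window_seconds: int,
    interval_seconds: float = 30,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll for user feedback until the window closes.

    Returns the feedback text, or None if nothing arrived (proceed autonomously).
    A window of 0 returns None immediately without polling.
    """
    deadline = clock() + window_seconds
    while clock() < deadline:
        feedback = poll()
        if feedback:
            return feedback
        # Never sleep past the deadline
        sleep(min(interval_seconds, max(0.0, deadline - clock())))
    return None
```

With `review-window=0` the loop returns `None` at once, which matches the fully autonomous mode.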
172+
173+
### Notification Content — What Makes Each Message Actionable
174+
175+
**`fix_applied`** — the most important notification:
188176
```
189177
:wrench: Agent: Fix Applied
190178
191-
Fixed selector timeout in filtering test — `.severity-filter` →
192-
`[data-test="severity-filter"]`. Pushed to `test/incident-robustness-2026-03-24`.
179+
*What changed:*
180+
• `cypress/views/incidents-page.ts:45` — selector `.severity-filter` → `[data-test="severity-filter"]`
181+
• `cypress/e2e/incidents/regression/01.reg_filtering.cy.ts:78` — added `.should('exist')` guard before click
182+
183+
*Why:* Screenshot showed the filter dropdown existed but had a different class. The `data-test` attribute is stable across builds.
184+
185+
*Classification:* PAGE_OBJECT_GAP (confidence: HIGH)
193186
194-
CI will run automatically. Reply in the agent session if you want to
195-
change approach before next cycle.
187+
*Diff:* `git diff HEAD~1` on branch `test/incident-robustness-2026-03-24`
188+
189+
*Next:* CI will trigger automatically on push. Reply in the agent session to change approach.
190+
191+
PR #860 | Branch: test/incident-robustness-2026-03-24
192+
```
193+
194+
The key: show **what** changed, **why** the agent chose that fix, and **how confident** it is. This lets the user quickly decide "looks good, let it run" vs "wrong approach, let me intervene."
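One way to keep every `fix_applied` message in this what / why / how-confident shape is to build it from structured fields instead of free text. A sketch, assuming a hypothetical `FixSummary` record that the agent fills in after committing; the `header`, `section` and `context` blocks and `mrkdwn` text objects are standard Slack Block Kit, while the field layout simply mirrors the example message above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FixSummary:
    """Hypothetical record the agent fills in after committing a fix."""
    changes: List[str]        # e.g. "`cypress/views/incidents-page.ts:45`: selector change"
    reason: str
    classification: str       # e.g. "PAGE_OBJECT_GAP"
    confidence: str           # e.g. "HIGH"
    branch: str
    pr_number: int
    next_step: str = "CI will trigger automatically on push."


def build_fix_applied_blocks(fix: FixSummary) -> List[dict]:
    """Return Slack Block Kit blocks for a fix_applied notification."""

    def mrkdwn_section(text: str) -> dict:
        return {"type": "section", "text": {"type": "mrkdwn", "text": text}}

    changed = "\n".join(f"• {change}" for change in fix.changes)
    return [
        {"type": "header",
         "text": {"type": "plain_text", "text": ":wrench: Agent: Fix Applied", "emoji": True}},
        mrkdwn_section(f"*What changed:*\n{changed}"),
        mrkdwn_section(f"*Why:* {fix.reason}"),
        mrkdwn_section(f"*Classification:* {fix.classification} (confidence: {fix.confidence})"),
        mrkdwn_section(f"*Next:* {fix.next_step}"),
        {"type": "context",
         "elements": [{"type": "mrkdwn", "text": f"PR #{fix.pr_number} | Branch: {fix.branch}"}]},
    ]
```

Keeping the classification and confidence as separate fields also makes it easy to filter or escalate later, for example only waiting for review when confidence is not HIGH.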
195+
196+
**`ci_complete`** — actionable status:
197+
```
198+
:white_check_mark: Agent: CI Complete — PASSED (run 2/5)
199+
200+
*Results:* 15/15 tests passed in 1h 47m
201+
*Flakiness probe:* 2 of 5 confirmation runs complete, all green so far
202+
203+
*Next:* Triggering confirmation run 3. No action needed.
204+
205+
PR #860 | Branch: test/incident-robustness-2026-03-24 | CI Run
206+
```
207+
208+
Or on failure:
209+
```
210+
:x: Agent: CI Complete — FAILED (iteration 2/3)
211+
212+
*Results:* 13/15 passed, 2 failed
213+
*Failures:*
214+
• "should filter by severity" — Timed out on `[data-test="severity-chip"]` (same as last run)
215+
• "should display chart bars" — new failure, `Expected 5 bars, found 0`
216+
217+
*Assessment:*
218+
• severity filter: same fix didn't work, will try different approach
219+
• chart bars: new failure — possibly caused by previous fix (will investigate)
220+
221+
*Next:* Diagnosing and fixing. Will notify before pushing.
222+
223+
PR #860 | Branch: test/incident-robustness-2026-03-24 | CI Run
224+
```
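The *Assessment* lines depend on comparing this run's failures with the previous iteration's: a repeated failure means the last fix did not work, while a new one may have been introduced by it. A minimal sketch of that comparison, assuming failures are identified by test title (`compare_failures` is a hypothetical helper):

```python
from typing import Dict, List


def compare_failures(current: List[str], previous: List[str]) -> Dict[str, List[str]]:
    """Split failing test titles into repeated, new, and fixed since the last run."""
    prev_set, cur_set = set(previous), set(current)
    return {
        # Same test failed again: the previous fix did not work
        "repeated": [t for t in current if t in prev_set],
        # Newly failing: possibly caused by the previous fix
        "new": [t for t in current if t not in prev_set],
        # Failed last time, passes now
        "fixed": [t for t in previous if t not in cur_set],
    }
```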
196225

197-
PR #860 | Branch: agentic-test-iteration | CI Run
226+
**`blocked`** — requires user action:
198227
```
228+
:octagonal_sign: Agent: Blocked — REAL_REGRESSION
229+
230+
*Test:* "should display incident bars in chart"
231+
*Issue:* Chart component renders empty. Screenshot shows the chart area with no bars, no error, no loading state.
232+
*Commit correlation:* `src/components/incidents/IncidentChart.tsx` was modified in this PR (+45, -12)
233+
234+
*This is not a test issue* — the chart rendering logic appears broken. Agent cannot fix source code in Phase 1.
235+
236+
*Action needed:* Investigate the chart component refactor. Agent will stop iterating on this test.
237+
238+
PR #860 | Branch: test/incident-robustness-2026-03-24
239+
```
240+
241+
### Implementation Options
242+
243+
**Option A: Slack Incoming Webhook** (recommended starting point)
244+
- Setup: Slack → Apps → Incoming Webhooks → create webhook for your channel. 5 minutes.
245+
- Set `SLACK_WEBHOOK_URL` in `export-env.sh` or `~/.zshrc`
246+
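- Agent posts via a standalone `notify-slack.py` script (a single JSON `POST` to the webhook URL, the same request as `curl -X POST -H 'Content-type: application/json'`)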
247+
- Messages formatted with Slack Block Kit (sections, context, code blocks)
248+
- Pro: No Slack app, no server, no OAuth. Just a URL.
249+
- Con: One-way — user sees notifications but must respond in the Claude Code session, not in Slack
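A minimal core for `notify-slack.py`, assuming the standard library's `urllib.request` is used in place of shelling out to `curl` (both send the same JSON `POST`). The skip-silently behavior when `SLACK_WEBHOOK_URL` is unset comes from the earlier version of this design; `notify_slack` is a hypothetical name. The top-level `text` field serves as the fallback Slack shows in push notifications when `blocks` are present.

```python
import json
import os
import urllib.request
from typing import Optional


def notify_slack(blocks: list, text: str, webhook_url: Optional[str] = None,
                 timeout: float = 10.0) -> bool:
    """Post a Block Kit message to a Slack incoming webhook.

    Returns False without doing anything when no webhook is configured,
    so notifications stay optional.
    """
    url = webhook_url or os.environ.get("SLACK_WEBHOOK_URL")
    if not url:
        return False  # Notifications disabled: skip silently
    payload = json.dumps({"text": text, "blocks": blocks}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}, method="POST"
    )
    # urlopen raises HTTPError for non-2xx responses
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 200
```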
250+
251+
**Option B: Slack Bot with thread-based interaction** (no callback server needed)
252+
- Create a Slack App with a bot token (scopes: `chat:write`, `channels:history`; private channels also need `groups:history`)
253+
- Agent posts messages to a channel, capturing the message `ts` (timestamp/ID)
254+
- Before proceeding at pause points, agent **reads thread replies** via `conversations.replies` API
255+
- If user replied in the Slack thread → agent reads the reply and adjusts
256+
- If no reply within the review window → agent proceeds
257+
258+
```
259+
Agent posts: "Fix applied. Reply in this thread to change approach. Proceeding in 10 min."
260+
User replies: "Use data-test attributes instead of class selectors"
261+
Agent reads: conversations.replies → sees user feedback → adjusts fix
262+
```
263+
264+
- Pro: Two-way interaction without a callback server. User stays in Slack.
265+
- Con: Needs a Slack App (not just a webhook). Polling for replies adds complexity. Bot token needs to be stored securely.
266+
267+
**Implementation sketch for Option B:**
268+
```python
import time

from slack_sdk import WebClient


def post_and_wait_for_reply(slack_client: WebClient, channel: str, blocks: list,
                            bot_user_id: str, review_window_seconds: int):
    # Post notification and get message timestamp (it also identifies the thread)
    response = slack_client.chat_postMessage(channel=channel, blocks=blocks,
                                             text="Agent notification")  # fallback text
    message_ts = response["ts"]

    # Wait for review window, polling for replies
    deadline = time.time() + review_window_seconds
    while time.time() < deadline:
        replies = slack_client.conversations_replies(channel=channel, ts=message_ts)
        # The parent message is returned too; skip it and anything the bot posted
        user_replies = [r for r in replies["messages"]
                        if r["ts"] != message_ts and r.get("user") != bot_user_id]
        if user_replies:
            return user_replies[-1]["text"]  # Return latest user feedback
        time.sleep(30)

    return None  # No feedback, proceed autonomously
```
284+
285+
**Option C: Claude Code hooks → Slack bridge**
286+
- Configure a Claude Code hook (for example `PostToolUse` with a `Bash` matcher) that fires after specific tool calls, such as a `git commit`
287+
- The hook runs a shell script that posts to Slack
288+
- Pro: Zero changes to the skills — hooks are external
289+
- Con: Less control over notification content and timing. Can't implement review windows. Hooks are local config, not portable.
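A sketch of the bridge script, assuming it is registered as a Claude Code `PostToolUse` hook with a `Bash` matcher and receives the tool call as JSON on stdin with `tool_name` and `tool_input.command` fields (check the Claude Code hooks documentation for the exact payload). The commit detection is a regex heuristic, and `commit_message_for` and `main` are hypothetical names.

```python
import json
import re
import sys
from typing import Optional

# Matches `git commit` at the start of the command or after ;, & or | (heuristic)
_GIT_COMMIT = re.compile(r"(^|[;&|]\s*)git\s+commit\b")


def commit_message_for(event: dict) -> Optional[str]:
    """Return notification text if this hook event is a git commit, else None."""
    if event.get("tool_name") != "Bash":
        return None
    command = event.get("tool_input", {}).get("command", "").strip()
    if not _GIT_COMMIT.search(command):
        return None
    return f":wrench: Agent committed: `{command}`"


def main() -> None:
    """Hook entry point: read the event from stdin and report commits."""
    text = commit_message_for(json.load(sys.stdin))
    if text:
        # Hand off to a poster such as the Option A webhook script here
        print(text)
```

A real script would call `main()` under `if __name__ == "__main__":`. The limitation above shows up directly: the hook only sees the command string, not the agent's reasoning, so it cannot report *why* a fix was made or hold the push for a review window.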
290+
291+
**Option D: GitHub PR comments as notification channel**
292+
- Instead of Slack, the agent posts status updates as PR comments
293+
- User replies directly on the PR
294+
- Agent reads PR comments via `gh api` before proceeding
295+
- Pro: No Slack setup at all. Everything stays in GitHub. Natural for code review context.
296+
- Con: Noisier PR history. Not real-time (no push notifications unless GitHub notifications are configured).
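A sketch of the PR-comment channel using the GitHub CLI, assuming it runs inside the repository checkout so that `gh api` can fill in the `{owner}` and `{repo}` placeholders. It reads only the first 100 comments, and `agent_login` (the account the agent posts as) is a hypothetical parameter. The timestamp filter is what lets the agent ignore feedback given before its latest notification.

```python
import json
import subprocess
from datetime import datetime
from typing import List


def post_pr_comment(pr_number: int, body: str) -> None:
    """Post a status update as a PR comment."""
    subprocess.run(["gh", "pr", "comment", str(pr_number), "--body", body], check=True)


def fetch_pr_comments(pr_number: int) -> List[dict]:
    """Fetch conversation comments on the PR (first 100 only)."""
    endpoint = f"repos/{{owner}}/{{repo}}/issues/{pr_number}/comments?per_page=100"
    out = subprocess.run(["gh", "api", endpoint],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def _parse(ts: str) -> datetime:
    # GitHub uses "2026-03-24T10:00:00Z"; older fromisoformat() needs "+00:00"
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def user_replies_since(comments: List[dict], since: str, agent_login: str) -> List[str]:
    """Bodies of comments posted after `since` by anyone other than the agent."""
    cutoff = _parse(since)
    return [c["body"] for c in comments
            if c["user"]["login"] != agent_login and _parse(c["created_at"]) > cutoff]
```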
297+
298+
### Recommended Progression
299+
300+
1. **Start with Option A** — get visibility. User monitors passively, intervenes in Claude Code session when needed.
301+
2. **Upgrade to Option B** when the review window pattern proves valuable — adds two-way interaction within Slack.
302+
3. **Option D** is a good alternative if you prefer keeping everything in GitHub — especially for team use where the PR is the natural communication hub.
303+
304+
### Configuration
305+
306+
```bash
307+
# Option A: Webhook only (one-way)
308+
export SLACK_WEBHOOK_URL="https://hooks.slack.com/services/T.../B.../..."
309+
310+
# Option B: Bot with thread interaction (two-way)
311+
export SLACK_BOT_TOKEN="xoxb-..."
312+
export SLACK_CHANNEL_ID="C0123456789"
313+
export SLACK_REVIEW_WINDOW="600" # seconds to wait for feedback (0 = no wait)
314+
```
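A sketch of how a notification script might pick a mode from these variables. The precedence (bot over webhook when both are set) and the default of `0` when `SLACK_REVIEW_WINDOW` is unset are assumptions, and `NotifyConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class NotifyConfig:
    mode: str                  # "off", "webhook" (Option A) or "bot" (Option B)
    webhook_url: Optional[str]
    bot_token: Optional[str]
    channel_id: Optional[str]
    review_window: int         # seconds to wait for feedback; 0 = no wait


def load_config(env: Mapping[str, str] = os.environ) -> NotifyConfig:
    token, channel = env.get("SLACK_BOT_TOKEN"), env.get("SLACK_CHANNEL_ID")
    webhook = env.get("SLACK_WEBHOOK_URL")
    if token and channel:
        mode = "bot"       # Two-way wins when both are configured (assumption)
    elif webhook:
        mode = "webhook"
    else:
        mode = "off"       # Notifications are optional
    window = int(env.get("SLACK_REVIEW_WINDOW", "0"))  # Default 0 is an assumption
    return NotifyConfig(mode, webhook, token, channel, window)
```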
315+
316+
### Skill Integration Points
317+
318+
Where notifications fire in each skill:
319+
320+
**`/iterate-ci-flaky`:**
321+
- Step 3: `ci_started` — after `/test` comment or push
322+
- Step 5: `ci_complete` — after CI analysis
323+
- Step 6: `fix_applied` — after committing fix, before push (with optional review window)
324+
- Step 7: `flaky_found` — when flakiness detected in confirmation runs
325+
- Step 8: `iteration_done` — final summary
326+
- Any step: `blocked` — on REAL_REGRESSION, INFRA_ISSUE, auth failure
327+
328+
**`/iterate-incident-tests`:**
329+
- Step 10: `fix_applied` — after committing batch (less critical since local runs are fast)
330+
- Step 12: `flaky_found` — during flakiness probe
331+
- Step 13: `iteration_done` — final summary
332+
- Any step: `blocked` — on REAL_REGRESSION
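The event lists above can be captured as data, so that an unexpected event is an error instead of a silent no-op. In this sketch only `fix_applied` in `/iterate-ci-flaky` waits for review, which is one reading of the lists (the local loop's `fix_applied` is described as less critical); `Event`, `SKILL_EVENTS` and `should_wait_for_review` are hypothetical names.

```python
from enum import Enum


class Event(str, Enum):
    CI_STARTED = "ci_started"
    CI_COMPLETE = "ci_complete"
    FIX_APPLIED = "fix_applied"
    FLAKY_FOUND = "flaky_found"
    ITERATION_DONE = "iteration_done"
    BLOCKED = "blocked"


# Which events each skill emits, per the integration points listed above
SKILL_EVENTS = {
    "/iterate-ci-flaky": {Event.CI_STARTED, Event.CI_COMPLETE, Event.FIX_APPLIED,
                          Event.FLAKY_FOUND, Event.ITERATION_DONE, Event.BLOCKED},
    "/iterate-incident-tests": {Event.FIX_APPLIED, Event.FLAKY_FOUND,
                                Event.ITERATION_DONE, Event.BLOCKED},
}


def should_wait_for_review(skill: str, event: Event, review_window: int) -> bool:
    """True only for fix_applied in the CI loop with a non-zero review window."""
    if event not in SKILL_EVENTS.get(skill, set()):
        raise ValueError(f"{skill} does not emit {event.value}")
    return skill == "/iterate-ci-flaky" and event is Event.FIX_APPLIED and review_window > 0
```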
