fix(discord): dedup WarnAndStop warning across multiple bot processes#886
Conversation
When N bots share a thread and all hit the soft turn limit simultaneously, each bot's per-process BotTurnTracker independently fires WarnAndStop, resulting in N duplicate warnings posted to the thread. Fix: before posting the warning, fetch the last 10 messages in the thread and skip if any bot message already contains 'Bot turn limit reached'. Fail-open: if the API call fails, the warning is still posted. Closes openabdev#530
OpenAB PR ScreeningThis is auto-generated by the OpenAB project-screening flow for context collection and reviewer handoff.
Screening reportblocked on external writes: `gh auth status` reports `GH_TOKEN` is invalid, so i could not verify/update the existing marker comment or move the project item. local recovery checks also hit the sandbox namespace issue, but the GitHub blocker is decisive.comment link: not created/updated Intent PR #886 tries to prevent duplicate Discord Feat Fix. Before posting the Who It Serves Discord users and agent runtime operators. It reduces repeated warning noise in shared Discord threads without requiring deployers to add shared infrastructure. Rewritten Prompt Fix duplicate Discord soft-limit warnings when multiple OpenAB bot processes observe the same thread and hit Merge Pitch This should move forward because it addresses a real multi-process production annoyance with a small, localized change. The risk profile is moderate-low: the behavior only affects the warning-post path, and fail-open preserves the current behavior if history lookup fails. Likely reviewer concern: scanning the last 10 messages is a heuristic and not an atomic cross-process lock. Two bots can still race if both fetch before either posts. The patch reduces common duplicates but does not fully guarantee deduplication under simultaneous dispatch. Best-Practice Comparison OpenClaw’s stronger pattern would put scheduling and delivery ownership in the gateway, backed by durable job state, explicit routing, retries, and run logs. That would make warning emission a coordinated side effect instead of a per-process heuristic. Hermes Agent’s scheduled-run patterns around daemon ticks, file locking, atomic state, and fresh sessions are only partially relevant. The useful comparison is explicit shared state: Hermes-style atomic persisted state would be stronger than Discord-channel scanning, but probably heavier than this warning path deserves. For this PR, the channel-scan approach is a pragmatic lightweight mitigation, not a full distributed coordination mechanism. Implementation Options Conservative: accept the current channel-scan dedup with minor hardening. Keep the last-message lookup, check bot-authored messages, and document that this is best-effort. Balanced: extract warning-detection into a testable helper and add focused tests for matching, non-matching, bot/user authorship, and history-fetch failure. Keep no external dependency. Ambitious: introduce shared dedup state keyed by Discord thread and warning type, using persistent storage or gateway-owned emission. This gives stronger guarantees but increases operational complexity. Comparison Table
Recommendation Take the balanced path. Keep the Discord-channel scan as the immediate fix, but make the detection logic explicit and testable before merge. Do not introduce Redis, database state, or distributed locking for this warning path unless duplicate warnings remain a recurring production issue after this mitigation. Agent-ran OpenAB PR screening. Feedback welcome; react thumbs-up if useful. |
…arify best-effort semantics - Add BOT_TURN_LIMIT_WARNING_PREFIX const to bot_turns.rs so the dedup check and the warning text share a single source of truth - Update turn_limit_warning_present() to use the constant - Add doc comment clarifying best-effort / race-window limitation - Update tests to use the constant Addresses review feedback from internal review.
…ge and test Ensures the constant is the true single source of truth for the warning prefix — both soft and hard limit messages now use it. Addresses 臥龍 re-review feedback.
…o avoid serenity Message construction in tests Follows existing codebase convention (see format_thread_export boundary comment) of not constructing serenity::model::channel::Message in unit tests. Call site maps Message → (is_bot, content) pairs before passing to the helper. Tests now use plain tuple slices — no serde_json needed.
masami-agent
left a comment
There was a problem hiding this comment.
PR Review: #886
Based on commit: 7c9e1566bc75553f4d5920ff3cd71a17a0800a6f
Risk: Medium
Core Assessment
- Problem clearly stated: ✅ (Closes #530, clear before/after, data flow diagram)
- Approach appropriate: ✅ (channel scan before post — lightweight, no external deps)
- Alternatives considered: ✅ (Redis/DB dedup, distributed locking — documented and rejected)
- Best approach for now: ✅ (best-effort dedup is acceptable for this use case)
Review Summary (Traffic Light)
🔴 CRITICAL: Duplicate user_message: format!( in hard limit test — compilation error
In src/bot_turns.rs around line 313–317, there is a duplicate user_message: format!( line:
user_message: format!(
user_message: format!(
"🛑 {} ({HARD_BOT_TURN_LIMIT}/{HARD_BOT_TURN_LIMIT} hard limit). \
A human must reply to continue.",
BOT_TURN_LIMIT_WARNING_PREFIX,
),The first user_message: format!( is extraneous and causes a compilation error. This is almost certainly why all CI jobs are failing.
Fix: Remove the first user_message: format!( line.
🟡 MINOR: Hard limit message mixes 🛑 and
BOT_TURN_LIMIT_WARNING_PREFIX = "⚠️ Bot turn limit reached". Using it in the hard limit path produces:
🛑 ⚠️ Bot turn limit reached (1000/1000 hard limit). A human must reply to continue.
Two emojis stacked together is visually confusing. Suggestions:
- Option A: Change the constant to
"Bot turn limit reached"(no emoji), let each call site prepend its own emoji. - Option B: Keep the hard limit message separate (
"🛑 Hard bot turn limit reached") — the dedup only needs to match the soft limit in practice since hard limit (1000 turns) is extremely rare.
Option B is simpler — the dedup check would still work for the soft limit case (which is the actual problem), and the hard limit message stays semantically clear.
🟢 INFO: Dedup logic is well-designed
turn_limit_warning_present()is clean, testable, and follows existing codebase conventions.- The
(bool, &str)tuple approach avoids constructingserenity::Messagein tests — consistent with project patterns. - Race condition is honestly documented — best-effort is appropriate here.
- 4 unit tests cover the key scenarios.
🟢 INFO: Excellent PR description
Clear problem statement, before/after comparison, data flow diagram, ecosystem investigation, and known limitations. Well-structured contribution.
Verdict: REQUEST_CHANGES
Must fix:
- Remove the duplicate
user_message: format!(line in the hard limit test (CI blocker)
Suggested:
2. Resolve the emoji mixing in the hard limit message
…t message
- Remove extraneous duplicate 'user_message: format!(' in hard limit test
(CI blocker — caused compilation error)
- Revert hard limit message to standalone '🛑 Hard bot turn limit reached'
instead of using BOT_TURN_LIMIT_WARNING_PREFIX, which produced confusing
'🛑 ⚠️ Bot turn limit reached' double-emoji output (Option B per review)
- BOT_TURN_LIMIT_WARNING_PREFIX remains the single source of truth for the
soft limit warning and the dedup check — hard limit is rare enough that
dedup is not needed there
Addresses masami-agent review findings.
chaodu-agent
left a comment
There was a problem hiding this comment.
LGTM ✅ — Clean best-effort dedup for WarnAndStop across multiple bot processes.
What This PR Does
When N bots share a thread and all hit the soft turn limit simultaneously, each independently fires WarnAndStop, producing N duplicate warnings. This fix adds a channel scan (last 10 messages) before posting, skipping if a bot warning already exists.
How It Works
- Adds
BOT_TURN_LIMIT_WARNING_PREFIXconstant as single source of truth - Before posting WarnAndStop, fetches recent messages and checks for existing bot warning via
turn_limit_warning_present() - Hard limit message stays separate (no emoji mixing)
- Race window honestly documented — best-effort is appropriate for this use case
Findings
| # | Severity | Finding | Location |
|---|---|---|---|
| 1 | 🟢 | Clean testable helper using (bool, &str) tuples — follows existing codebase conventions |
src/discord.rs:2175 |
| 2 | 🟢 | 4 unit tests cover key scenarios (bot warning, human text, no warning, empty) | src/discord.rs:2976 |
| 3 | 🟢 | Race condition documented in code comment — honest about limitations | src/discord.rs:2169 |
Baseline Check
- PR opened: May 21
- Closes #530
- Main has no dedup logic for WarnAndStop — this is net-new value
- Previous review finding (duplicate line) fixed in latest commit
8ed7546 - CI: all checks pass
Reviewed by: 超渡法師 🟢 · 擺渡法師 🟢
masami-agent
left a comment
There was a problem hiding this comment.
Re-Review (commit 8ed75463)
All previously raised issues have been addressed:
- ✅ Duplicate
user_message: format!(removed — CI is now fully green. - ✅ Hard limit emoji mixing resolved —
BOT_TURN_LIMIT_WARNING_PREFIXis now only used in the soft limit path. Hard limit retains its own distinct message. Semantically clear.
Full Diff Scan
src/bot_turns.rs:BOT_TURN_LIMIT_WARNING_PREFIXconstant added, used only in soft limit. Soft limit test updated to use the constant. Clean.src/discord.rs: Dedup logic correct — fetch last 10 messages, check if any bot already posted the warning prefix, skip if present.turn_limit_warning_present()helper with 4 unit tests. Clean.- No unrelated changes in scope.
Verdict: APPROVE
Well-structured fix with honest documentation of the race-window limitation. Good contribution.
PR #886 — Stage 2 ReviewSummary
Masami Findings Verification
All prior findings resolved. Masami's review was accurate. Additional Observations🟢 INFO: Fail-open design is correct
🟢 INFO: Cross-module coupling is documented
🟢 INFO: Scope discipline
🟡 NIT:
Verdict: APPROVEClean, focused, well-tested fix with honest documentation of the race-window limitation. No blockers found. |
Summary
Closes #530.
When N bots share a thread and all hit the soft turn limit simultaneously, each bot's per-process `BotTurnTracker` independently fires `WarnAndStop`, resulting in N duplicate warnings posted to the thread.
Before / After
```
⚠️ Bot turn limit reached (20/20). A human must reply...
⚠️ Bot turn limit reached (20/20). A human must reply... ← duplicate
BEFORE (2 bots in thread, both hit soft limit):
AFTER (sequential case — most common):
⚠️ Bot turn limit reached (20/20). A human must reply... ← only once
```
Data Flow (ASCII)
```
Bot A hits soft limit → WarnAndStop
→ fetch last 10 msgs → no warning found → post ✅
Bot B hits soft limit → WarnAndStop (~shortly after)
→ fetch last 10 msgs → warning already present → skip ✅
Race window (both fetch simultaneously):⚠️ (best-effort, not eliminated)
→ both see no warning → both post
```
Semantics: Best-Effort Dedup
This fix reduces duplicate warnings in the common sequential case. A narrow race window exists where two bots fetch simultaneously and both see no warning, resulting in a duplicate. Strict once-only semantics would require shared state (gateway-owned emission or distributed lock). For most deployments, best-effort is acceptable.
Investigation: How the Ecosystem Handles This
acpx (github.com/openclaw/acpx) — Headless ACP Client
Not applicable — acpx handles single-process session lifecycle and crash reconnect; does not address cross-process notification dedup.
Hermes Agent (github.com/NousResearch/hermes-agent) — Agent Side
Not applicable — Hermes focuses on ACP session lifecycle and stderr logging; multi-instance coordination is out of scope.
Industry Pattern: Channel Scan Before Post
The standard lightweight approach for Discord bots without external state: before posting a system warning, fetch recent channel messages and skip if an identical warning already exists. Chosen over Redis/DB dedup (requires external dependency) and distributed locking (overkill for a low-frequency warning path).
Changes
Testing
Unit tests — 4 tests for `turn_limit_warning_present()`: bot warning detected, human text ignored, no warning present, empty list.
Why the helper accepts `&[(bool, &str)]` instead of `&[Message]`:
The existing codebase convention (see comment at `format_thread_export`) is to avoid constructing `serenity::model::channel::Message` in unit tests — the struct has many required fields and no test builder. The helper takes `(is_bot, content)` pairs instead; the call site maps `Message → (m.author.bot, m.content.as_str())` before passing to the helper. This keeps the pure logic testable without serenity dependencies.
Known limitation: simultaneous race (both bots fetch at the same instant) may still produce a duplicate. Documented in the code comment.
Manual: deploy two bots in the same Discord channel, trigger soft limit (20 consecutive bot messages), verify only one warning is posted in the sequential case.
https://discord.com/channels/1492794006885892229/1503631877058334751