fix: prevent A2A on_message trigger infinite loop by wisdomqin · Pull Request #666 · dataelement/Clawith

wisdomqin · 2026-06-09T13:57:12Z

Problem

Two agents with mutual on_message triggers could enter an infinite loop — Agent A's internal tool_call records were misidentified as "new messages" by Agent B's trigger, causing Agent B to wake up and vice versa.

Impact: Tenant musan.ai consumed 85,846 credits in one day (normal: ~2,000/day).

Root Cause

5 defects identified, 4 fixed (1 was already fixed):

Fixes

Fix 1 (P0): Add `role` filter to on_message query

Only match assistant and user messages; exclude tool_call and system records that were causing false matches.

Fix 2 (P1): Limit session scope

Exclude trigger internal sessions (source_channel='trigger') from message scanning, preventing cross-trigger false matches.

Fix 3 (P1): Per-agent on_message rate limiter

Auto-disable on_message triggers if a single agent fires them >30 times per hour. Logs warning for monitoring.

Fix 4 (P2): Default safety caps for `set_trigger`

Agent-created on_message triggers now default to max_fires=100 and expires_at=7 days.

Cleanup

Removed dead code MAX_AGENT_CHAIN_DEPTH = 5 (defined but never referenced).

Defense in Depth

Layer	What it prevents
Role filter (Fix 1)	Root cause — tool_call messages no longer trigger on_message
Session scope (Fix 2)	Cross-session false matches from trigger internal sessions
Rate limiter (Fix 3)	Auto-disables on_message triggers if 30+ fires/hr per agent
Default caps (Fix 4)	Limits agent-created on_message triggers to 100 fires / 7 days
Cooldown (existing)	60s minimum between consecutive fires of same trigger

Files Changed

backend/app/services/trigger_runtime/evaluator.py — Fix 1 + Fix 2
backend/app/services/trigger_daemon.py — Fix 3 + dead code cleanup
backend/app/services/agent_tools.py — Fix 4

No database migrations required.

- Fix 1 (P0): Add role filter to on_message query — only match 'assistant' and 'user' messages, exclude 'tool_call' and 'system' - Fix 2 (P1): Exclude trigger internal sessions (source_channel='trigger') from on_message message scanning - Fix 3 (P1): Add per-agent on_message hourly rate limiter (30/hr cap) with auto-disable and warning log - Fix 4 (P2): Default max_fires=100 and expires_at=7d for on_message triggers created via set_trigger tool - Cleanup: Remove unused MAX_AGENT_CHAIN_DEPTH dead code

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 101a7ce498

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-09T14:00:06Z

+            if ttype == "on_message":
+                trigger.max_fires = trigger.max_fires or 100
+                if not trigger.expires_at:
+                    trigger.expires_at = datetime.now(timezone.utc) + timedelta(days=7)


Apply safety caps when re-enabling on_message triggers

Move these defaults into the existing-trigger path as well. When an agent cancels and recreates a previously unlimited on_message trigger (or converts any disabled trigger to this type), the function returns at line 8389 before reaching this block, preserving max_fires=None and expires_at=None; consequently the triggers most likely to predate this fix remain permanently uncapped.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-09T14:00:06Z

+                        recent.append(now)
+                        _on_msg_fire_log[trigger.agent_id] = recent
                    await enqueue_due_trigger(trigger, now)


Count only newly enqueued on_message executions

Record a rate-limit hit only when enqueue_due_trigger actually creates an execution. The current increment occurs before enqueueing and the enqueue helper discards its (execution, created) result, so an idempotency collision—such as an agent sending the same message text twice, since the key hashes only sender and content—leaves last_fired_at unchanged while this code increments every 15-second tick; the trigger is then disabled after about 7.5 minutes despite no new execution firing.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-09T14:00:06Z

+# Safety: per-agent on_message fire rate limiter
+_ON_MSG_RATE_WINDOW = 3600  # 1 hour window
+_ON_MSG_RATE_LIMIT = 30     # max on_message fires per agent per hour
+_on_msg_fire_log: dict[uuid.UUID, list[datetime]] = {}  # agent_id -> list of fire timestamps


Persist the on_message rate-limit window

Store this safety state in shared durable storage rather than process memory. Every backend worker starts its own trigger daemon, Helm explicitly supports multiple backend replicas, and this dictionary is empty after every restart or rolling deployment, so an existing unlimited looping trigger receives a fresh allowance on each process lifecycle and the advertised per-agent hourly limit is not reliably enforced across the deployment.

Useful? React with 👍 / 👎.

chatgpt-codex-connector Bot reviewed Jun 9, 2026

View reviewed changes

yaojin3616 merged commit 43ff3e8 into main Jun 10, 2026
1 check failed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: prevent A2A on_message trigger infinite loop#666

fix: prevent A2A on_message trigger infinite loop#666
yaojin3616 merged 1 commit into
mainfrom
fix/a2a-on-message-loop-bug

wisdomqin commented Jun 9, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 9, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 9, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

wisdomqin commented Jun 9, 2026

Problem

Root Cause

Fixes

Fix 1 (P0): Add role filter to on_message query

Fix 2 (P1): Limit session scope

Fix 3 (P1): Per-agent on_message rate limiter

Fix 4 (P2): Default safety caps for set_trigger

Cleanup

Defense in Depth

Files Changed

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix 1 (P0): Add `role` filter to on_message query

Fix 4 (P2): Default safety caps for `set_trigger`