This repository was archived by the owner on Feb 20, 2026. It is now read-only.
fix: Race condition in handleUserMessage cancellation detection
Fixes agent-loop-1: Race condition in user message cancellation detection
Changes:
- Add pendingCancellations Set for atomic cancellation state tracking
- Add messageTimestamps Map for out-of-order message detection
- Add cancelCountdownAtomic() helper for consistent cancellation
- Add hasPendingCancellation() and clearCancellationState() helpers
- Refactor handleUserMessage to:
  - Check for cancellation messages first, then handle regular messages
  - Only clear error cooldown for non-cancellation messages
  - Use timestamp-based deduplication in addition to message ID
- Add cancellation state checks in scheduleContinuation and timer callback
- Update cleanup functions to handle new state maps
- Add comprehensive tests for race condition scenarios
This prevents the bug where:
1. Redundant timeout clearing missed cancellation detection
2. Non-atomic state updates allowed race conditions
3. Early error cooldown deletion allowed continuations to slip through
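
The helpers listed above could fit together roughly as follows. This is a minimal sketch, not the code from continuation.ts: only the names pendingCancellations, messageTimestamps, pendingCountdowns, cancelCountdownAtomic, hasPendingCancellation, clearCancellationState and scheduleContinuation come from this commit and its issue; every signature, the isStaleMessage helper, and the inject callback are assumptions made for illustration.

```typescript
// Hypothetical sketch: signatures and bodies are assumed, not taken from the repository.
type SessionID = string;

const pendingCountdowns = new Map<SessionID, ReturnType<typeof setTimeout>>();
const pendingCancellations = new Set<SessionID>();
const messageTimestamps = new Map<SessionID, number>();

// Record the cancellation before touching the timer, so a callback that is
// already queued still sees the flag when it runs.
function cancelCountdownAtomic(sessionID: SessionID): boolean {
  pendingCancellations.add(sessionID);
  const timer = pendingCountdowns.get(sessionID);
  if (timer === undefined) return false;
  clearTimeout(timer);
  pendingCountdowns.delete(sessionID);
  return true;
}

function hasPendingCancellation(sessionID: SessionID): boolean {
  return pendingCancellations.has(sessionID);
}

function clearCancellationState(sessionID: SessionID): void {
  pendingCancellations.delete(sessionID);
  messageTimestamps.delete(sessionID);
}

// Assumed helper: treats a message as stale when its timestamp is not newer
// than the last one processed for the session.
function isStaleMessage(sessionID: SessionID, timestamp: number): boolean {
  const last = messageTimestamps.get(sessionID);
  if (last !== undefined && timestamp <= last) return true;
  messageTimestamps.set(sessionID, timestamp);
  return false;
}

// Checks the cancellation flag both when scheduling and inside the timer callback.
function scheduleContinuation(
  sessionID: SessionID,
  delayMs: number,
  inject: (sessionID: SessionID) => void,
): boolean {
  if (hasPendingCancellation(sessionID)) return false;
  const existing = pendingCountdowns.get(sessionID);
  if (existing !== undefined) clearTimeout(existing);
  const timer = setTimeout(() => {
    pendingCountdowns.delete(sessionID);
    if (hasPendingCancellation(sessionID)) return;
    inject(sessionID);
  }, delayMs);
  pendingCountdowns.set(sessionID, timer);
  return true;
}
```

Setting the flag before clearing the timer is the design choice that matters here: a continuation can then only slip through if it was injected before the cancellation was recorded.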
{"id":"agent-loop-0te","title":"Create short test and cleanup ticket","description":"Implement a short test and perform cleanup tasks","status":"closed","priority":4,"issue_type":"task","created_at":"2026-01-11T13:27:32.88315+01:00","created_by":"wese","updated_at":"2026-01-11T13:58:21.242624+01:00","closed_at":"2026-01-11T13:58:21.242624+01:00","close_reason":"Completed successfully"}
+
{"id":"agent-loop-1","title":"Fix race condition in user message cancellation detection","description":"The current cancellation detection in handleUserMessage has race conditions that can miss user cancellation signals, causing repeated todo continuation. Current logic in continuation.ts lines 612-647 checks for cancellation patterns but fails in scenarios where message processing order is non-deterministic. Need to add atomic state checks and improve message deduplication logic to ensure all cancellation signals are properly detected and acted upon.","status":"closed","priority":2,"issue_type":"bug","assignee":"René Weselowski","owner":"wese@nope.at","created_at":"2026-01-20T16:09:19.94535+01:00","created_by":"René Weselowski","updated_at":"2026-01-20T16:22:03.621822+01:00","closed_at":"2026-01-20T16:22:03.621822+01:00","close_reason":"Fixed race condition in handleUserMessage with atomic cancellation state management, improved message deduplication with timestamps, and consolidated timeout clearing logic. Added comprehensive tests for race condition scenarios."}
+
{"id":"agent-loop-2","title":"Complete state cleanup on session cancellation","description":"When a session is cancelled, not all continuation state is properly cleaned up, leading to repeated continuation attempts. In continuation.ts handleSessionCancelled (lines 700-725), only pendingCountdowns and errorCooldowns are cleared. Missing cleanup: recoveringSessions, sessionAgentModel, lastProcessedMessageID, and potential race conditions with injectContinuation that may already be in flight. Need comprehensive state cleanup with proper synchronization to prevent orphaned continuation attempts.","status":"open","priority":2,"issue_type":"bug","owner":"wese@nope.at","created_at":"2026-01-20T16:09:25.427479+01:00","created_by":"René Weselowski","updated_at":"2026-01-20T16:09:25.427479+01:00"}
+
{"id":"agent-loop-3","title":"Fix timeout management gaps in countdown handling","description":"Countdown timer management has multiple gaps causing repeated continuation. In continuation.ts: (1) scheduleContinuation (lines 457-493) clears existing timeout but may schedule new one even when cancellation should be pending, (2) injectContinuation (lines 298-455) doesn't verify session state before proceeding, (3) Multiple code paths can schedule continuation without checking if cancellation was requested. Need to add atomic countdown lifecycle management with proper state transitions and validation.","status":"open","priority":2,"issue_type":"bug","owner":"wese@nope.at","created_at":"2026-01-20T16:09:27.915916+01:00","created_by":"René Weselowski","updated_at":"2026-01-20T16:09:27.915916+01:00"}
+
{"id":"agent-loop-4","title":"Add cancellation validation and confirmation","description":"Current code has no validation that cancellation actually succeeded, leading to silent failures and repeated continuation. After user cancellation detected in handleUserMessage (lines 621-646), no verification that pendingCountdowns was cleared. Need to add: (1) explicit cancellation confirmation callback, (2) post-cancellation state verification, (3) retry mechanism for failed cancellations, (4) user feedback when cancellation fails. This ensures users know when their cancellation didn't take effect.","status":"open","priority":3,"issue_type":"bug","owner":"wese@nope.at","created_at":"2026-01-20T16:09:30.111311+01:00","created_by":"René Weselowski","updated_at":"2026-01-20T16:09:30.111311+01:00"}
+
{"id":"agent-loop-5","title":"Expand error interruption detection coverage","description":"The checkInterruption function (continuation.ts lines 60-129) has limited detection patterns that miss common cancellation signals from various sources. Current implementation only checks for specific error names and message patterns but misses: (1) Custom error types from different libraries, (2) Network-level cancellations, (3) Timeout-related aborts, (4) Platform-specific error codes. Need comprehensive error detection with extensible patterns and better handling of edge cases to ensure all interruption types properly halt continuation.","status":"open","priority":3,"issue_type":"bug","owner":"wese@nope.at","created_at":"2026-01-20T16:09:39.251478+01:00","created_by":"René Weselowski","updated_at":"2026-01-20T16:09:39.251478+01:00"}
+
{"id":"agent-loop-6","title":"Add comprehensive cancellation test coverage","description":"Current test suite lacks comprehensive coverage for cancellation scenarios. Need to add tests for: (1) Race conditions in message processing order, (2) Multiple rapid cancellation requests, (3) Concurrent session events (idle + error + cancellation), (4) Network interruption during cancellation, (5) Platform-specific error patterns, (6) State cleanup verification after cancellation. Create test scenarios that reproduce the repeat issue and verify fixes prevent repeated continuation.","status":"open","priority":2,"issue_type":"task","owner":"wese@nope.at","created_at":"2026-01-20T16:09:41.657281+01:00","created_by":"René Weselowski","updated_at":"2026-01-20T16:09:41.657281+01:00"}
+
{"id":"agent-loop-7","title":"Document cancellation behavior and edge cases","description":"Current documentation doesn't explain cancellation behavior, causing confusion about when and how continuation can be aborted. Need to document: (1) How user cancellation works (message patterns, timing), (2) Error detection and interruption handling, (3) Race condition scenarios and expected behavior, (4) State cleanup process, (5) Known limitations and workarounds, (6) Debugging tips for repeated continuation issues. Update README and add troubleshooting guide.","status":"open","priority":3,"issue_type":"task","owner":"wese@nope.at","created_at":"2026-01-20T16:09:43.163216+01:00","created_by":"René Weselowski","updated_at":"2026-01-20T16:09:43.163216+01:00"}
+
{"id":"agent-loop-8","title":"Implement atomic continuation state machine","description":"The current continuation logic lacks atomic state transitions, leading to race conditions and incomplete cancellations. Current implementation uses multiple independent Sets and Maps that can get out of sync. Need to refactor into a proper state machine with: (1) Atomic state transitions (scheduled -\u003e injecting -\u003e active -\u003e cancelled), (2) Unified state container instead of multiple Maps/Sets, (3) State validation before each transition, (4) Eventual consistency with proper synchronization. This addresses the root cause of most cancellation failures.","status":"open","priority":2,"issue_type":"task","owner":"wese@nope.at","created_at":"2026-01-20T16:09:54.276743+01:00","created_by":"René Weselowski","updated_at":"2026-01-20T16:09:54.276743+01:00"}
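
The state machine proposed in agent-loop-8 might look something like the sketch below. It is illustrative only: the four state names and the forward path (scheduled -> injecting -> active -> cancelled) come from the issue text, while the class name, the method names, and the assumption that cancellation is allowed from every live state are hypothetical.

```typescript
// Hypothetical sketch of a unified continuation state container.
type ContinuationState = "scheduled" | "injecting" | "active" | "cancelled";

// Assumed transition table: the forward path from the issue, plus "cancelled"
// reachable from any live state. "cancelled" is terminal.
const allowedTransitions: Record<ContinuationState, readonly ContinuationState[]> = {
  scheduled: ["injecting", "cancelled"],
  injecting: ["active", "cancelled"],
  active: ["cancelled"],
  cancelled: [],
};

class ContinuationStateMachine {
  // One map replaces the separate Sets and Maps that can drift out of sync.
  private readonly states = new Map<string, ContinuationState>();

  get(sessionID: string): ContinuationState | undefined {
    return this.states.get(sessionID);
  }

  // Registers a new continuation; refuses if the session is already tracked.
  schedule(sessionID: string): boolean {
    if (this.states.has(sessionID)) return false;
    this.states.set(sessionID, "scheduled");
    return true;
  }

  // Validates the transition before applying it; returns false if rejected.
  transition(sessionID: string, next: ContinuationState): boolean {
    const current = this.states.get(sessionID);
    if (current === undefined) return false;
    if (!allowedTransitions[current].includes(next)) return false;
    this.states.set(sessionID, next);
    return true;
  }

  // Drops all state for a session, e.g. after cleanup completes.
  remove(sessionID: string): void {
    this.states.delete(sessionID);
  }
}
```

Under this sketch, a caller such as injectContinuation would proceed only if `transition(sessionID, "injecting")` returns true, so a session that is already cancelled can never be injected.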