fix: resolve intermittent stream closure failures in EventConsumer #777

Open
kabir wants to merge 2 commits into a2aproject:main from kabir:fix-local-handling-race-condition

kabir (Collaborator) commented Mar 31, 2026

Fixes two separate race conditions causing intermittent test failures:

1. Grace period closing streams before final events arrive
   - EventConsumer now checks the awaitingFinalEvent flag before starting the timeout counter, preventing premature closure when final events are in transit through MainEventBusProcessor
   - Increased the grace period timeout to 150ms for CI environment latency

2. Client auto-close IOException not filtered in testAuthRequiredWorkflow
   - AbstractSSEEventListener cancels SSE streams on final events (by design)
   - This generates a "Stream cancelled" IOException that must be filtered
   - testAuthRequiredWorkflow was the only test missing this filter
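
For readers following the first fix, here is a minimal sketch of the idea: the idle counter does not start while a final event is still expected. It is not the actual EventConsumer code. The class name `GracePeriodSketch`, the method `shouldCloseOnEmptyPoll`, and the explicit `nowMs` parameter are hypothetical, and only the `awaitingFinalEvent` flag and the 150ms value come from the description above.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; not the EventConsumer implementation from this PR. */
class GracePeriodSketch {
    // 150 ms is the grace period named in the PR description.
    static final long GRACE_PERIOD_MS = 150;

    private final AtomicBoolean awaitingFinalEvent = new AtomicBoolean(false);
    private long idleSinceMs = -1; // -1 means the timeout counter is not running

    void setAwaitingFinalEvent(boolean value) {
        awaitingFinalEvent.set(value);
    }

    /** Called on each empty poll; returns true when the stream may be closed. */
    boolean shouldCloseOnEmptyPoll(long nowMs) {
        if (awaitingFinalEvent.get()) {
            // A final event is still in transit: do not start the counter.
            return false;
        }
        if (idleSinceMs < 0) {
            idleSinceMs = nowMs; // first idle poll starts the counter
            return false;
        }
        return nowMs - idleSinceMs >= GRACE_PERIOD_MS;
    }
}
```

Passing the clock value in as a parameter is a choice made for this sketch so the behavior can be checked deterministically; the real code may read the time differently.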

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request implements a timeout mechanism for final events in the EventConsumer to prevent potential infinite polling. It introduces a maximum wait period (3 seconds) and increases the SSE buffer flush delay to 150ms to improve reliability in CI environments. Review feedback suggests refactoring the clearAwaitingFinalEvent logic into the EventQueue base class to avoid brittle type checking and identifies redundant flag-clearing logic in the ChildQueue dequeue process that should be removed.
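
The suggestion to move `clearAwaitingFinalEvent` into the base class could look roughly like the sketch below: a no-op default on the base type, overridden where the flag lives, so callers need no `instanceof` check. All class names here are hypothetical stand-ins, and which subclass actually owns the flag is an assumption. The real EventQueue hierarchy in a2a-java may differ.

```java
/** Hypothetical stand-in for the EventQueue base class. */
abstract class EventQueueSketch {
    // No-op by default, so callers can invoke it on any queue type.
    void clearAwaitingFinalEvent() { }
}

/** Hypothetical queue type that has no flag to clear. */
class MainQueueSketch extends EventQueueSketch { }

/** Hypothetical queue type that owns the flag (an assumption of this sketch). */
class ChildQueueSketch extends EventQueueSketch {
    private volatile boolean awaitingFinalEvent = true;

    @Override
    void clearAwaitingFinalEvent() {
        awaitingFinalEvent = false;
    }

    boolean isAwaitingFinalEvent() {
        return awaitingFinalEvent;
    }
}

class ConsumerSketch {
    // Works for every queue subtype without a cast or type check.
    static void onFinalEvent(EventQueueSketch queue) {
        queue.clearAwaitingFinalEvent();
    }
}
```

With this shape, adding a new queue subtype would not require touching the consumer.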

kabir added a commit to kabir/a2a-java that referenced this pull request Mar 31, 2026
Runs testAgentToAgentLocalHandling and testNonBlockingWithMultipleMessages
in a loop (100 iterations) across 9 parallel jobs (3 Java versions × 3 transports)
to capture intermittent failures for analysis.

Related to a2aproject#777

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>