
[BUG] IllegalStateException: channel not registered to an event loop in NettyAsyncHttpResponse.close() during RetryPolicy.attemptSync #49428

@CornelZ

Affected packages:

  • azure-core-http-netty 1.15.5
  • azure-core 1.58.0
  • azure-ai-agents 2.1.0

Severity: High. Sync HTTP requests that need a retry fail with IllegalStateException instead of being retried.

Description

When the Azure SDK's synchronous HTTP pipeline (HttpPipeline.sendSync) is used with the Netty HTTP client, a transient connection failure on the first attempt triggers a retry. Before the retry is sent, RetryPolicy.attemptSync closes the failed response: response.close() → NettyAsyncHttpResponse.close() → NettyUtility.closeConnection() → channel.eventLoop(), which throws:

java.lang.IllegalStateException: channel not registered to an event loop
    at io.netty.channel.AbstractChannel.eventLoop(AbstractChannel.java:163)
    at com.azure.core.http.netty.implementation.NettyUtility.closeConnection(NettyUtility.java:79)
    at com.azure.core.http.netty.implementation.NettyAsyncHttpResponse.close(NettyAsyncHttpResponse.java:116)
    at com.azure.core.http.policy.RetryPolicy.attemptSync(RetryPolicy.java:249)
    at com.azure.core.http.policy.RetryPolicy.processSync(RetryPolicy.java:160)
    at com.azure.core.http.HttpPipeline.sendSync(HttpPipeline.java:138)
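To make the control flow concrete, here is a minimal, self-contained simulation of a synchronous retry loop that closes the previous response before retrying. It is not azure-core's RetryPolicy: FakeResponse, sendWithRetry, and runScenario are hypothetical names, and the retryable first attempt is modelled (as an assumption) as a 503 response whose close() throws the same IllegalStateException seen in the trace above.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RetryCloseDemo {
    /** Hypothetical stand-in for an HTTP response; not an azure-core type. */
    static final class FakeResponse {
        final int statusCode;
        private final boolean closeThrows;

        FakeResponse(int statusCode, boolean closeThrows) {
            this.statusCode = statusCode;
            this.closeThrows = closeThrows;
        }

        void close() {
            if (closeThrows) {
                // Same message that Netty's AbstractChannel.eventLoop() uses.
                throw new IllegalStateException("channel not registered to an event loop");
            }
        }
    }

    /** Simplified sync retry loop; NOT azure-core's RetryPolicy.attemptSync. */
    static FakeResponse sendWithRetry(Supplier<FakeResponse> send, int maxRetries) {
        FakeResponse response = send.get();
        for (int retry = 0; retry < maxRetries && response.statusCode == 503; retry++) {
            // The failed response is released before the next attempt; if close()
            // throws, the exception escapes here and no retry is ever sent.
            response.close();
            response = send.get();
        }
        return response;
    }

    /** One scenario: the first attempt returns 503, later attempts return 200. */
    static String runScenario(boolean closeThrows) {
        AtomicInteger attempts = new AtomicInteger();
        Supplier<FakeResponse> send = () -> attempts.incrementAndGet() == 1
                ? new FakeResponse(503, closeThrows)
                : new FakeResponse(200, false);
        try {
            FakeResponse last = sendWithRetry(send, 3);
            return "status " + last.statusCode + " after " + attempts.get() + " attempt(s)";
        } catch (IllegalStateException e) {
            return "failed after " + attempts.get() + " attempt(s): " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(runScenario(false)); // status 200 after 2 attempt(s)
        System.out.println(runScenario(true));  // failed after 1 attempt(s): channel not registered to an event loop
    }
}
```

With a well-behaved close() the second attempt succeeds; when close() throws, the caller sees the IllegalStateException after a single attempt, which matches the symptom described in this report.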

Root Cause

Reactor Netty creates a Channel object during a connection attempt but registers it to an event loop asynchronously. If the connection fails (TCP or SSL) before channel.register(eventLoop) completes, the Channel object is left unregistered. AbstractChannel.eventLoop() throws IllegalStateException whenever no event loop has been assigned (its eventLoop field is null), yet NettyUtility.closeConnection() calls it without first checking whether the channel is registered.
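The registration-state problem can be modelled without Netty. In this sketch, ModelChannel and closeUnguarded are hypothetical stand-ins, not Netty or azure-core code: the channel object exists before register() runs, and eventLoop() follows the same contract as Netty's AbstractChannel.eventLoop(), so an unguarded close on a channel whose connection failed before registration throws.

```java
import java.util.concurrent.Executor;

public class UnregisteredChannelDemo {
    /** Hypothetical model of a channel's registration state; this is not Netty code. */
    static final class ModelChannel {
        private volatile Executor eventLoop; // stays null until register() runs
        volatile boolean closed;

        void register(Executor loop) {
            this.eventLoop = loop;
        }

        boolean isRegistered() {
            return eventLoop != null;
        }

        // Same contract as Netty's AbstractChannel.eventLoop(): with no event loop
        // assigned, the call throws instead of returning null.
        Executor eventLoop() {
            Executor loop = eventLoop;
            if (loop == null) {
                throw new IllegalStateException("channel not registered to an event loop");
            }
            return loop;
        }
    }

    /** Unguarded close, shaped like the path this report attributes to NettyUtility.closeConnection(). */
    static void closeUnguarded(ModelChannel channel) {
        channel.eventLoop().execute(() -> channel.closed = true);
    }

    public static void main(String[] args) {
        Executor inline = Runnable::run; // stand-in event loop that runs tasks immediately

        ModelChannel registered = new ModelChannel();
        registered.register(inline);
        closeUnguarded(registered);
        System.out.println("registered channel closed: " + registered.closed);

        // Connection failed before register() ran: the object exists but has no event loop.
        ModelChannel unregistered = new ModelChannel();
        try {
            closeUnguarded(unregistered);
        } catch (IllegalStateException e) {
            System.out.println("unregistered channel: " + e.getMessage());
        }
    }
}
```

The registered channel closes normally; only the never-registered channel reaches the exception, because the failure comes from asking for an event loop that was never assigned.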

Steps to Reproduce

  1. Use NettyAsyncHttpClientBuilder to create an HttpClient
  2. Pass it to an AgentsClientBuilder (or any other Azure SDK client builder) and call a sync method (e.g., ResponseService.create(...))
  3. Introduce a transient network error on the first attempt (or have the endpoint experience a brief hiccup)
  4. RetryPolicy logs "Retrying." at DEBUG, and then IllegalStateException is thrown from response.close() before the retry is attempted

Expected Behavior

NettyUtility.closeConnection() (or NettyAsyncHttpResponse.close()) should guard against an unregistered channel — either by checking channel.isRegistered() before calling channel.eventLoop(), or by catching IllegalStateException and closing via a fallback path (e.g., channel.unsafe().closeForcibly()).
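A sketch of the guarded close suggested above, under stated assumptions: ChannelView is a hypothetical interface whose method names echo io.netty.channel.Channel, closeForcibly() stands in for channel.unsafe().closeForcibly(), and FakeChannel is a test double. It is not the azure-core-http-netty fix; it simply combines both suggested options by checking isRegistered() first and keeping an IllegalStateException fallback.

```java
import java.util.concurrent.Executor;

public class GuardedCloseSketch {
    /**
     * Hypothetical minimal view of the channel operations the proposed fix needs.
     * Method names echo io.netty.channel.Channel, but this is not Netty's API.
     */
    interface ChannelView {
        boolean isRegistered();

        Executor eventLoop(); // throws IllegalStateException when unregistered

        void closeForcibly(); // stands in for channel.unsafe().closeForcibly()

        void close(); // the normal close, expected to run on the event loop
    }

    /** Close path that checks registration before calling eventLoop(). */
    static String closeConnection(ChannelView channel) {
        if (!channel.isRegistered()) {
            channel.closeForcibly(); // no event loop to hand the close to
            return "forcible";
        }
        try {
            channel.eventLoop().execute(channel::close);
            return "event-loop";
        } catch (IllegalStateException e) {
            // Defensive fallback in case registration state changes between the check and the call.
            channel.closeForcibly();
            return "forcible-fallback";
        }
    }

    /** Test double that records which close path ran. */
    static final class FakeChannel implements ChannelView {
        private final boolean registered;
        boolean closed;
        boolean forcibly;

        FakeChannel(boolean registered) {
            this.registered = registered;
        }

        public boolean isRegistered() {
            return registered;
        }

        public Executor eventLoop() {
            if (!registered) {
                throw new IllegalStateException("channel not registered to an event loop");
            }
            return Runnable::run; // runs the task inline
        }

        public void closeForcibly() {
            closed = true;
            forcibly = true;
        }

        public void close() {
            closed = true;
        }
    }

    public static void main(String[] args) {
        System.out.println(closeConnection(new FakeChannel(true)));  // event-loop
        System.out.println(closeConnection(new FakeChannel(false))); // forcible
    }
}
```

Netty's javadoc for Channel.Unsafe.closeForcibly() describes it as closing the channel immediately without firing any events and notes it is probably only useful when a registration attempt failed, which is the unregistered case this report hits.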

Workaround

Switch the HTTP transport to azure-core-http-okhttp, which is thread-pool-based and has no channel-registration lifecycle:

    <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-core-http-okhttp</artifactId>
        <version>1.12.8</version>
    </dependency>

Environment

┌───────────────────────┬───────────────────────┐
│ Component             │ Version               │
├───────────────────────┼───────────────────────┤
│ Java                  │ 17                    │
├───────────────────────┼───────────────────────┤
│ Spring Boot           │ 3.4.3                 │
├───────────────────────┼───────────────────────┤
│ azure-ai-agents       │ 2.1.0                 │
├───────────────────────┼───────────────────────┤
│ azure-core            │ 1.58.0                │
├───────────────────────┼───────────────────────┤
│ azure-core-http-netty │ 1.15.5                │
├───────────────────────┼───────────────────────┤
│ netty-transport       │ 4.1.118.Final         │
├───────────────────────┼───────────────────────┤
│ OS                    │ Linux (containerized) │
└───────────────────────┴───────────────────────┘

Metadata

Labels: customer-reported, needs-triage, question