[BUG] Possible memory leak azure-sdk-bom 1.3.6 with netty 4.1.134-FINAL #49440
Open
Labels
customer-reported: Issues that are reported by GitHub users external to the Azure organization.
needs-triage: Workflow: This is a new issue that needs to be triaged to the appropriate team.
question: The issue doesn't require a change to the product in order to be resolved. Most issues start as that
Describe the bug
We have found a possible memory leak in a very specific use case: our application communicates with Azure only when it is deployed and never afterwards. Because the connection then sits idle, we repeatedly run into Azure Service Bus AMQP idle-timeout reconnects. From our analysis of memory dumps taken before the app was OOM-killed, we saw that every reconnect allocated ~67–134 MB of native memory that the allocator never returned to the OS. Over 6 days and dozens of reconnect cycles, this accumulated to ~1.8–2.0 GB of off-heap memory that is invisible to JVM metrics, which exhausted the 6 GiB container limit.
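As a rough sanity check on the figures above, the sketch below estimates how many reconnects the reported per-reconnect growth implies for the reported total. It assumes decimal units (1 GB = 1000 MB) and linear growth; the class and method names are hypothetical and exist only for illustration.

```java
import java.util.Locale;

class LeakEstimate {

    /** Reconnects needed to accumulate totalMb when each one retains perReconnectMb. */
    static double reconnectsFor(double totalMb, double perReconnectMb) {
        return totalMb / perReconnectMb;
    }

    public static void main(String[] args) {
        // Smallest reported total with the largest reported step gives the lower bound.
        double fewest = reconnectsFor(1800.0, 134.0);
        // Largest reported total with the smallest reported step gives the upper bound.
        double most = reconnectsFor(2000.0, 67.0);
        System.out.printf(Locale.ROOT, "reconnects implied: %.1f to %.1f%n", fewest, most);
        // prints "reconnects implied: 13.4 to 29.9"
    }
}
```

Roughly 13 to 30 retained allocations would account for the observed total, which is in line with "dozens of reconnect cycles" over 6 days.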
Exception or Stack Trace
The specific chain:
Azure Service Bus broker: 300-second AMQP idle timeout
↓
broker sends amqp:connection:forced
↓
Azure SDK (Reactor/Netty): ReactorSession + RequestResponseChannel errors
↓
Netty PooledByteBufAllocator allocates new native Chunk(s) (~67–134 MB)
via sun.misc.Unsafe.allocateMemory() — bypasses ALL JVM memory metrics
↓
Old chunks returned to pool arena, but native OS pages NEVER freed
↓
RSS grows by ~67–134 MB per reconnect, permanently
↓
After ~11 reconnects: RSS at 97% of 6 GiB limit
↓
13:16:46Z: minor GC burst + CPU spike on both pods
↓
RSS crosses 6 GiB → kernel OOM killer → SIGKILL on both replicas
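Since the chain above hinges on native memory that JVM metrics do not show, one way to make the gap observable is to compare the process RSS with what the JVM itself reports. The following is a minimal sketch, assuming a Linux container where `/proc/self/status` is readable; `RssGapProbe`, `parseVmRssBytes`, and `jvmVisibleBytes` are hypothetical names, and the sketch uses only standard JDK APIs rather than anything from the Azure SDK or Netty.

```java
import java.io.IOException;
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;

class RssGapProbe {

    /** Returns the VmRSS value from /proc/<pid>/status lines in bytes, or -1 if absent. */
    static long parseVmRssBytes(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // Line format: "VmRSS:<whitespace><number> kB"
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]) * 1024L;
            }
        }
        return -1L;
    }

    /** Sums what the JVM reports: committed heap, committed non-heap, and NIO buffer pools. */
    static long jvmVisibleBytes() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        long total = mem.getHeapMemoryUsage().getCommitted()
                + mem.getNonHeapMemoryUsage().getCommitted();
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            // getMemoryUsed() returns -1 when no estimate is available.
            total += Math.max(0L, pool.getMemoryUsed());
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path status = Paths.get("/proc/self/status");
        if (!Files.isReadable(status)) {
            System.out.println("/proc/self/status not readable; this probe is Linux-only");
            return;
        }
        long rss = parseVmRssBytes(Files.readAllLines(status));
        long visible = jvmVisibleBytes();
        // A gap that grows after each reconnect would support the chain described above.
        System.out.printf(Locale.ROOT, "rss=%d MiB, jvm-visible=%d MiB, gap=%d MiB%n",
                rss >> 20, visible >> 20, (rss - visible) >> 20);
    }
}
```

Logging this periodically (for example from a scheduled task) and lining the samples up against the `amqp:connection:forced` events would show whether the gap steps up by the reported ~67–134 MB at each reconnect. Some gap is always expected (thread stacks, GC structures, allocator overhead), so the trend matters more than the absolute value.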
Setup (please complete the following information):
Additional context
I can provide (if needed) memory dump + the analysis.
Information Checklist
Kindly make sure that you have added all of the information above and checked off the required fields; otherwise we will treat the issue as an incomplete report