Azure Blob storage: pooled reads, streaming serializer, buffered writes #9879
ReubenBond merged 7 commits into dotnet:main
Pull request overview
This PR introduces an opt-in performance optimization for Azure Blob Storage reads by using pooled buffers to reduce memory pressure on the Large Object Heap (LOH) and minimize Gen2 garbage collection. The change adds a new `UsePooledBufferForReads` option to `AzureBlobStorageOptions` that, when enabled, switches from `DownloadContentAsync` to `DownloadStreamingAsync` with `ArrayPool`-rented buffers.
Changes:
- Added `UsePooledBufferForReads` option to `AzureBlobStorageOptions` for opt-in buffer pooling
- Modified `ReadStateAsync` to use streaming download with pooled buffers when the option is enabled
- Added comprehensive test coverage with `PersistenceGrainTests_AzureBlobStore_PooledReads`
- Added performance benchmark `AzureBlobReadStateBenchmark` demonstrating 56-74% memory allocation reduction for large payloads
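As a rough illustration of the pooled-read idea described above, the following is a minimal sketch and not the code from this PR. It assumes the payload length is known up front and uses any `Stream` (for example a `MemoryStream`) as a stand-in for the blob download stream; `PooledReadSketch` and `ReadWithPooledBufferAsync` are hypothetical names.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading.Tasks;

public static class PooledReadSketch
{
    // Hypothetical helper: reads exactly `length` bytes from `source` into a buffer
    // rented from the shared pool, passes the filled slice to `consume`, and
    // returns the buffer to the pool whether or not `consume` throws.
    public static async Task<T> ReadWithPooledBufferAsync<T>(
        Stream source, int length, Func<ReadOnlyMemory<byte>, T> consume)
    {
        // Rent may hand back a larger array than requested, so only the
        // first `length` bytes are treated as valid.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(length);
        try
        {
            int total = 0;
            while (total < length)
            {
                int read = await source.ReadAsync(buffer.AsMemory(total, length - total));
                if (read == 0)
                {
                    throw new EndOfStreamException("Stream ended before the expected length was read.");
                }
                total += read;
            }

            // The slice is only valid until the buffer is returned, so `consume`
            // must not keep a reference to it.
            return consume(buffer.AsMemory(0, length));
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

The caller deserializes inside `consume`, which keeps the rented array from escaping; that is the buffer-retention concern the option's documentation warns about.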
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/Azure/Orleans.Persistence.AzureStorage/Providers/Storage/AzureBlobStorageOptions.cs | Adds new UsePooledBufferForReads boolean option with documentation warning about buffer retention |
| src/Azure/Orleans.Persistence.AzureStorage/Providers/Storage/AzureBlobStorage.cs | Implements pooled buffer logic using DownloadStreamingAsync and ArrayPool<byte>.Shared when option is enabled |
| test/Extensions/TesterAzureUtils/Persistence/PersistenceGrainTests_AzureBlobStore_PooledReads.cs | New test class verifying functionality with pooled reads enabled |
| test/Benchmarks/GrainStorage/AzureBlobReadStateBenchmark.cs | New benchmark comparing pooled vs non-pooled read performance |
| test/Benchmarks/Program.cs | Adds benchmark runner entry for the new Azure Blob read state benchmark |
| test/Benchmarks/run_test.cmd | Updates script to run the new benchmark |
| test/Benchmarks/Properties/launchSettings.json | Removes hardcoded launch profile |
I assume most of the remaining allocations are from the actual state object being loaded. Does that sound right? I also see an eTag string + the `BinaryData` object + the async state machine, but those are not worth going after at this point.
That is my assumption too.
Agreed. The first stab at this is to reduce GC churn. A similar pattern may be applicable to other storage providers, though I do feel like blob storage is more likely to have larger payloads than e.g. a SQL server. Long term it would be great to do away with `byte[]` entirely and pass a stream or similar abstraction directly to the grain storage serializer, but that requires changing the API surface.
Not sure why this test is failing, or whether it has anything to do with the changes in this PR:
If I remember correctly, there is another change to the API surface you had in mind, @ReubenBond? Or do you think this is ready to be merged in? Ahh yes, this: #9879 (comment)
Thanks for the help getting this ready, @ReubenBond. Let me know if there is anything else I can help with at this point.
Introduce IGrainStorageStreamingSerializer and stream overloads for OrleansJsonSerializer plus Orleans/Json grain storage serializers. Azure blob storage now supports pooled read buffers, a buffered stream write mode, and logs a warning when large payloads force a fallback to DownloadContentAsync. Add Azure Blob storage tests for pooled reads and streaming serializer behavior, and add focused grain storage benchmarks (binary vs streaming) across Orleans, Newtonsoft.Json, and STJ. Alternatives considered: full streaming OpenWriteAsync with separate ETag readback (rejected due to a concurrency race), and IBufferWriter/ReadOnlySequence buffer paths (explored, but they didn't improve performance or allocations in a meaningful way, so they were dropped for now). Buffered stream uploads and pooled reads were kept as the best allocation/compatibility tradeoff while keeping BinaryData writes as the default.
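To make the stream-based serializer idea concrete, here is a minimal sketch of what such a contract could look like. The interface and class names below are hypothetical and the member signatures are assumptions; the actual `IGrainStorageStreamingSerializer` in this PR may be shaped differently. System.Text.Json is used only because its stream overloads are part of the base class library.

```csharp
#nullable enable
using System.IO;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical contract for illustration only; not the interface added by this PR.
public interface IStreamingStateSerializerSketch
{
    ValueTask SerializeAsync<T>(T value, Stream destination, CancellationToken ct = default);
    ValueTask<T?> DeserializeAsync<T>(Stream source, CancellationToken ct = default);
}

// Writes and reads state directly against a stream, so no intermediate
// byte[] covering the whole payload is needed on either path.
public sealed class SystemTextJsonStreamingSerializerSketch : IStreamingStateSerializerSketch
{
    public async ValueTask SerializeAsync<T>(T value, Stream destination, CancellationToken ct = default)
        => await JsonSerializer.SerializeAsync<T>(destination, value, options: null, cancellationToken: ct);

    public ValueTask<T?> DeserializeAsync<T>(Stream source, CancellationToken ct = default)
        => JsonSerializer.DeserializeAsync<T>(source, options: null, cancellationToken: ct);
}
```

A storage provider holding such a serializer could hand it the download stream on reads and a buffered upload stream on writes, which is the direction the commit message describes.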
…zeAsync return type
Delay ETag assignment until Azure Blob state deserialization succeeds so failed reads do not mutate the in-memory grain state. Also ensure pooled read buffers are always returned and add regression tests for failed stream, pooled, and binary deserialization paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
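The ordering described in this commit can be sketched as follows. This is a simplified illustration under assumptions, not the PR's code: `GrainStateSketch<T>` is a hypothetical stand-in for the in-memory grain state, and `deserialize` stands in for whichever serializer path is active.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for the in-memory grain state record.
public sealed class GrainStateSketch<T>
{
    public T? State { get; set; }
    public string? ETag { get; set; }
    public bool RecordExists { get; set; }
}

public static class ReadCommitSketch
{
    // Deserializes into a local first and only then mutates `target`.
    // If `deserialize` throws, State, ETag, and RecordExists keep their old values.
    public static void ApplyRead<T>(
        GrainStateSketch<T> target,
        ReadOnlyMemory<byte> payload,
        string etag,
        Func<ReadOnlyMemory<byte>, T> deserialize)
    {
        T loaded = deserialize(payload); // may throw; nothing has been assigned yet

        target.State = loaded;
        target.ETag = etag;
        target.RecordExists = true;
    }
}
```

Assigning the ETag before deserialization would leave a failed read with a new ETag paired with stale state, which is the situation the commit is guarding against.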
Use IGrainStorageStreamingSerializer as the derived serializer contract so AzureBlobGrainStorage can work from a single serializer field after the PR 9879 rebase. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
- `IGrainStorageStreamingSerializer` plus stream overloads for `OrleansJsonSerializer`; Orleans and Newtonsoft.Json grain storage serializers implement it.
- … `DownloadStreamingAsync`, and supports a buffered stream write mode using pooled segments; default write path remains `BinaryData`.
- … `int.MaxValue`.

Tests/Benchmarks

Experiments/Decisions
- `OpenWriteAsync` for fully streaming uploads; rejected due to an ETag/concurrency race when `GetPropertiesAsync` is needed post-upload.
- `IBufferWriter`/`ReadOnlySequence` serializer paths; dropped to keep the API surface smaller and because Blob SDK paths are stream-first.
- Kept `BinaryData` as the default for throughput.

I'll post measurements from my laptop in comments below.