GH-3466: Improve `RunLengthBitPackingHybridDecoder.readNext` to avoid per-call buffer allocation and `DataInputStream` wrapping by arouel · Pull Request #3467 · apache/parquet-java

arouel · 2026-04-06T19:09:22Z

Rationale for this change

RunLengthBitPackingHybridDecoder.readNext() allocates a new int[] and byte[] on every PACKED-mode call. The existing code even acknowledges this with a // TODO: reuse a buffer comment at line 94. In workloads that decode many bit-packed runs (definition levels, repetition levels, RLE-encoded integers), these allocations become a significant source of GC pressure.

There are two issues:

1. Per-call buffer allocation. `currentBuffer = new int[currentCount]` and `byte[] bytes = new byte[numGroups * bitWidth]` are allocated fresh on every PACKED-mode `readNext()`. The individual allocations are modest (8–128 ints per run) but occur thousands of times per column chunk. In a custom JFR-profiled benchmark merging 60 Parquet files (180M rows, 10 columns), these two sites were the number 2 and number 6 allocation hotspots respectively: 2,402 out of 12,711 total allocation samples (18.9%), accounting for ~7.2 GB of allocation per operation.
2. Per-call `DataInputStream` wrapping. `new DataInputStream(in).readFully(bytes, 0, bytesToRead)` creates a wrapper object on every PACKED-mode call just to access `readFully()`.
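To make the per-run sizes concrete, the following is a minimal sketch of how the two allocation sizes follow from a run header. It assumes the standard RLE/bit-packing hybrid layout (low header bit set means a bit-packed run, the remaining bits give the number of 8-value groups). The class and method names are hypothetical and are not taken from parquet-java.

```java
/** Hypothetical helper for illustration only; not part of parquet-java. */
final class PackedRunSizes {
  private PackedRunSizes() {}

  /** A header with its lowest bit set denotes a bit-packed run. */
  static boolean isPacked(int header) {
    return (header & 1) == 1;
  }

  /** Bit-packed runs hold groups of 8 values, so this is the int[] length needed. */
  static int valueCount(int header) {
    int numGroups = header >>> 1;
    return numGroups * 8;
  }

  /** 8 values of bitWidth bits each occupy bitWidth bytes per group: the byte[] length needed. */
  static int packedByteCount(int header, int bitWidth) {
    int numGroups = header >>> 1;
    return numGroups * bitWidth;
  }
}
```

Under these assumptions, a header describing 16 groups at a bit width of 3 needs a 128-element `int[]` and a 48-byte `byte[]`, which matches the upper end of the "8–128 ints per run" range mentioned above.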

What changes are included in this PR?

Three changes to RunLengthBitPackingHybridDecoder:

1. `currentBuffer` is promoted from a local to a field and reused across `readNext()` calls with a grow-only strategy: it is reallocated only when the next run requires a larger buffer than the one currently held.
2. `byte[] bytes` is promoted to a field, `packedBytes`, with the same grow-only reuse strategy.
3. `new DataInputStream(in).readFully(...)` is replaced with a private `readFully()` method that reads directly from the underlying `InputStream`, eliminating the per-call wrapper allocation and virtual dispatch.
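The grow-only reuse and the wrapper-free read can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the class name `GrowOnlyBufferSketch` and the method `ensureCapacity` are hypothetical, and the `readFully` shown here mirrors the `DataInputStream.readFully` contract (throwing on early end of stream), which may differ from how the decoder treats truncated runs.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical illustration of grow-only buffer reuse; not the actual decoder. */
final class GrowOnlyBufferSketch {
  // Held across calls; replaced only when a larger buffer is required.
  private byte[] packedBytes;

  /** Returns a buffer of at least `needed` bytes, allocating only when the held one is too small. */
  byte[] ensureCapacity(int needed) {
    if (packedBytes == null || packedBytes.length < needed) {
      packedBytes = new byte[needed];
    }
    return packedBytes;
  }

  /** Reads exactly `len` bytes straight from the stream, with no DataInputStream wrapper. */
  static void readFully(InputStream in, byte[] dst, int off, int len) throws IOException {
    int total = 0;
    while (total < len) {
      int n = in.read(dst, off + total, len - total);
      if (n < 0) {
        throw new EOFException("expected " + len + " bytes but stream ended after " + total);
      }
      total += n;
    }
  }
}
```

Because the buffer may be longer than the current run needs, callers of such a helper have to track the valid length separately rather than rely on the array length.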

Custom JFR-profiled benchmark results (merge of 60 Parquet files, 180M rows, 10 columns, JDK 26, macOS aarch64):

| Metric | Before | After | Change |
|---|---|---|---|
| Throughput | 44.2 s/op | 42.5 s/op | -3.6% |
| Allocation rate | 908.9 MB/s | 793.9 MB/s | -12.7% |
| Allocation per op | 42.1 GB | 35.4 GB | -15.8% |
| RLE decoder alloc samples | 2,402 (18.9%) | 146 (1.2%) | -93.9% |
| RLE decoder alloc bytes | ~7,245 MB | ~417 MB | -94.2% |

Are these changes tested?

Yes, changes are tested. We added a regression test in TestRunLengthBitPackingHybridEncoder called testTruncatedPackedRunAfterFullPackedRunDoesNotReuseStaleBytes to cover a truncated packed-run edge case after buffer reuse. We also ran the full TestRunLengthBitPackingHybridEncoder class and RunLengthBitPackingHybridIntegrationTest and all tests passed.

Are there any user-facing changes?

No. This is a transparent performance improvement internal to the RLE decoder. Decoded values are identical. No API changes, no configuration changes, no behavioral changes.

Closes #3466

Fokko · 2026-04-28T19:18:20Z

```diff
+        if (packedBytes == null || packedBytes.length < bytesNeeded) {
+          packedBytes = new byte[bytesNeeded];
+        }
+        int bytesRead = in.readNBytes(packedBytes, 0, bytesNeeded);
```


Nice, now we're at Java 11 :)
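For context, `InputStream.readNBytes(byte[], int, int)` returns the number of bytes actually read, which can be less than requested when the stream ends early. With a reused buffer, the unread tail would otherwise still hold bytes from an earlier run. A minimal sketch of handling that, assuming the tail is simply zeroed (the class and method names are hypothetical, and the exact handling in the PR may differ):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical illustration of a short read into a reused buffer; not the actual decoder. */
final class ShortReadSketch {
  private ShortReadSketch() {}

  /**
   * Reads up to bytesNeeded bytes into buf and zeroes whatever part of
   * [0, bytesNeeded) the stream could not fill, so stale bytes from a
   * previous run are never decoded.
   */
  static int readAndZeroTail(InputStream in, byte[] buf, int bytesNeeded) throws IOException {
    int bytesRead = in.readNBytes(buf, 0, bytesNeeded);
    if (bytesRead < bytesNeeded) {
      Arrays.fill(buf, bytesRead, bytesNeeded, (byte) 0);
    }
    return bytesRead;
  }
}
```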

Fokko

I've left one fix that addresses a concern: currentBuffer retains stale values from a previous, larger packed run in positions beyond currentCount. While those positions are never accessed today (the indexing is bounded by currentBufferLength), it's a latent risk. Let's add defensive zeroing, consistent with how packedBytes is zeroed on short reads.
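One way the suggested defensive zeroing could look is sketched below. This is an assumption about the shape of the fix rather than the committed code: the class `StaleTailSketch` and the method `prepare` are hypothetical, and only the field name `currentBuffer` comes from the discussion.

```java
import java.util.Arrays;

/** Hypothetical illustration of defensive zeroing on a reused int[]; not the actual decoder. */
final class StaleTailSketch {
  // Reused across runs with a grow-only strategy.
  private int[] currentBuffer;

  /**
   * Returns a buffer with room for currentCount values. When an existing,
   * larger buffer is reused, positions at and beyond currentCount are
   * cleared so values from an earlier run cannot leak into later reads.
   */
  int[] prepare(int currentCount) {
    if (currentBuffer == null || currentBuffer.length < currentCount) {
      currentBuffer = new int[currentCount]; // fresh arrays are already zeroed
    } else {
      Arrays.fill(currentBuffer, currentCount, currentBuffer.length, 0);
    }
    return currentBuffer;
  }
}
```

Clearing the tail costs a pass over the unused part of the array on each reuse, so this trades a little work for the guarantee that out-of-range positions never hold old data.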

arouel · 2026-06-16T16:16:16Z

> I've left one fix that addresses a concern: currentBuffer retains stale values from a previous, larger packed run in positions beyond currentCount. While those positions are never accessed today (the indexing is bounded by currentBufferLength), it's a latent risk. Let's add defensive zeroing, consistent with how packedBytes is zeroed on short reads.

Thank you for the review. I applied the code changes as suggested.

arouel force-pushed the rle-buffer-reuse branch from 00bc776 to 75979e3 on April 6, 2026 20:01

arouel changed the title ~~GH-3466 Improve RunLengthBitPackingHybridDecoder.readNext to avoid per-call buffer allocation and DataInputStream wrapping~~ GH-3466: Improve RunLengthBitPackingHybridDecoder.readNext to avoid per-call buffer allocation and DataInputStream wrapping Apr 12, 2026

arouel force-pushed the rle-buffer-reuse branch from 75979e3 to 0081ccf on April 17, 2026 20:30

arouel mentioned this pull request Apr 21, 2026

GH-3522: Reuse intermediate buffers in RunLengthBitPackingHybridDecoder PACKED path (~22% throughput on dictionary-id decode) #3523

Closed

Fokko reviewed Apr 28, 2026

Comment thread ...umn/src/main/java/org/apache/parquet/column/values/rle/RunLengthBitPackingHybridDecoder.java

Comment thread ...umn/src/main/java/org/apache/parquet/column/values/rle/RunLengthBitPackingHybridDecoder.java

arouel added 2 commits June 16, 2026 18:10

apacheGH-3466 Improve RunLengthBitPackingHybridDecoder.readNext to …

a4859f3

…avoid per-call buffer allocation and `DataInputStream` wrapping

Update RunLengthBitPackingHybridDecoder.java

26fd855

arouel force-pushed the rle-buffer-reuse branch from 0081ccf to 26fd855 on June 16, 2026 16:14

arouel requested a review from Fokko June 16, 2026 16:15

arouel wants to merge 2 commits into apache:master from arouel:rle-buffer-reuse
