perf: stdlib allocation optimizations — foldl while-loop, join pre-sized, flatten two-pass, reverse direct#695
Closed
He-Pin wants to merge 1 commit intodatabricks:masterfrom
Closed
Conversation
…zed, flatten two-pass, reverse direct Optimize hot-path stdlib functions to reduce allocation and improve throughput: - flattenArrays: Two-pass approach counting total elements first, then using System.arraycopy for each sub-array (eliminates ArrayBuilder resizing) - flattenDeepArray: while-loop instead of foreach for initial fill - reverse: Direct reverse-copy into new array instead of .reverse - foldl (array path): Convert for-loop to while-loop, cache pos.noOffset - foldl (string path): Cache pos.noOffset in local - join (string path): Pre-size StringBuilder based on estimated element length - join (array path, empty separator): Two-pass pre-sized with arraycopy - join (array path, non-empty separator): while-loop with better sizeHint Add targeted regression tests for flattenArrays, reverse, join, and foldl covering edge cases (empty arrays, nulls, single elements). Upstream: 4fa535fb
Contributor
Author
|
Closing: superseded by consolidated stdlib optimization effort. The base64DecodeBytes unsigned byte fix has been extracted to #705. Performance optimizations from this PR will be resubmitted in a consolidated PR with comprehensive native benchmarks. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
Several hot-path stdlib functions (
std.flattenArrays,std.reverse,std.foldl,std.join) use Scala collection patterns that allocate unnecessary intermediate objects:ArrayBuilderwith rough size hints,.reversecreating new collections,for-comprehension iterator/lambda allocations, and un-sizedStringBuilder.These functions are called millions of times in benchmarks like
foldl,reverse,comparison2, andrealistic2. Reducing per-call allocation overhead yields measurable throughput gains.Key Design Decisions
flattenArraysandjoin(empty separator), count total elements first, then allocate the exact-sized result array and fill withSystem.arraycopy. EliminatesArrayBuilderresize/copy cycles entirely.for (x <- arr)and.foreachwith index-basedwhileloops to avoid iterator/lambda allocation overhead.pos.noOffsetout of tight loops into a local variable to avoid repeated method dispatch.arr.length * (separator.length + 8)to avoid StringBuilder growth copies.Modification
ArrayModule.scala:
FlattenArrays: Two-pass (count → pre-sized array +System.arraycopy)FlattenDeepArrays:foreach→while-loop for initial deque fillReverse:.reverse→ manual reverse-copywhile-loopFoldl(array path):for-loop →while-loop + cachepos.noOffsetFoldl(string path): Cachepos.noOffsetin localStringModule.scala:
Join(string path): Pre-sizeStringBuilderJoin(array path, empty separator): Two-pass pre-sized withSystem.arraycopyJoin(array path, non-empty separator):for-loop →while-loop + bettersizeHintTests: Added
stdlib_alloc_opt.jsonnetcoveringflattenArrays,reverse,join(both paths), andfoldledge cases (empty arrays, nulls, single elements).Benchmark Results
JMH (1 fork, 1 warmup, 1 iteration — full regression suite):
No regressions across all 35 benchmarks.
Targeted realistic2 (5 iterations with error bars): 66.743 ± 0.937 ms/op — confirms no regression.
Analysis
pos.noOffsetcaching avoids repeatedPosition.noOffsetdispatch in tight loop..reversewhich allocates intermediateArraySeq+ copies.References
he-pin/sjsonnet@4fa535fbResult
All 140 tests pass. Consistent improvements across stdlib-heavy benchmarks with no regressions.