perf: small-integer Val.Num cache for arithmetic fast paths #683

Open
He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/num-cache-small-int

@He-Pin He-Pin commented Apr 5, 2026

Motivation

Jsonnet programs frequently produce small integer results (0, 1, 2) from arithmetic, array indexing, and loop counters. Each occurrence currently allocates a fresh Val.Num heap object.

Key Design Decision

Cache Val.Num instances for integers 0–255 in a pre-allocated array. When Num.apply(v) is called and v is an integer in this range, return the cached instance. This eliminates millions of micro-allocations in loop-heavy workloads without affecting semantics.

Modification

  • Add a 256-element Val.Num cache array initialized at class load time
  • Modify Val.Num.apply to check the cache before allocating
  • All callers automatically benefit (evaluator, stdlib, etc.)
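
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Num` here is a simplified stand-in for `Val.Num`, the enclosing `NumCache` object is hypothetical, and the position argument that the real factory takes is omitted. The sketch also chooses to bypass the cache for negative zero so the sign survives; the PR description does not say how that case is handled.

```scala
// Simplified stand-in for Val.Num (assumption: the real class also carries a position).
final class Num(val value: Double)

// Hypothetical holder object; names loosely follow the PR text (numCache, cachedNum).
object NumCache {
  private val Size = 256

  // One pre-allocated instance per integer in [0, 255], built at object initialization.
  private val numCache: Array[Num] = Array.tabulate(Size)(i => new Num(i.toDouble))

  /** Returns the shared instance for integral values in [0, 255], a fresh one otherwise. */
  def cachedNum(v: Double): Num = {
    val i = v.toInt // truncates fractions; saturates for very large values; NaN becomes 0
    // Comparing raw bits rejects fractions, NaN, and -0.0 in a single check.
    val exact =
      java.lang.Double.doubleToRawLongBits(i.toDouble) ==
        java.lang.Double.doubleToRawLongBits(v)
    if (i >= 0 && i < Size && exact) numCache(i) else new Num(v)
  }
}
```

With this sketch, `NumCache.cachedNum(5.0) eq NumCache.cachedNum(5.0)` holds because both calls return the same pre-allocated object, while `cachedNum(256.0)` and `cachedNum(1.5)` allocate on every call.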

Benchmark Results

JMH (JVM, 3 warmup iterations + 3 measurement iterations)

| Benchmark | Master (ms/op) | This PR (ms/op) | Change |
|---|---|---|---|
| bench.02 | 50.427 ± 38.9 | 44.066 ± 2.5 | -12.6% |
| comparison2 | 85.854 ± 188.7 | 69.319 ± 14.0 | -19.3% |
| realistic2 | 73.458 ± 66.7 | 69.813 ± 1.7 | -5.0% |
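
The Change column is consistent with a plain relative change of the mean scores. A small sketch for re-deriving it, where `ChangeColumn` and `percentChange` are illustrative names rather than anything from the benchmark harness:

```scala
object ChangeColumn {
  // Relative change of the PR mean against the master mean, in percent.
  // Negative values mean the PR is faster (fewer ms/op).
  def percentChange(master: Double, pr: Double): Double =
    (pr - master) / master * 100.0
}
```

For example, `ChangeColumn.percentChange(50.427, 44.066)` comes out at roughly -12.6, matching the bench.02 row. The calculation ignores the error bars, which are wide on master.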

Analysis

The small-integer cache is a well-known optimization pattern (used by Java's Integer.valueOf and CPython's small-integer cache). The tighter error bars on the PR results (±2.5 vs. ±38.9 on master for bench.02) suggest more stable GC behavior due to reduced allocation pressure. The comparison2 benchmark benefits most (-19.3%) because of its heavy loop and arithmetic workload.
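
For comparison, the same pattern can be observed directly on the JVM: `java.lang.Integer.valueOf` is documented to cache values in the range -128 to 127, so repeated calls in that range return the same object. A small sketch, with `IntegerCacheDemo` as an illustrative name:

```scala
object IntegerCacheDemo {
  // True when two boxing calls for the same Int return the very same object.
  // This is guaranteed for values in [-128, 127]; outside that range the
  // result depends on the JVM and its settings.
  def sameInstance(n: Int): Boolean =
    java.lang.Integer.valueOf(n) eq java.lang.Integer.valueOf(n)
}
```

`IntegerCacheDemo.sameInstance(127)` is true on any conforming JVM. The PR applies the same idea to `Val.Num`, with a non-negative range of 0 to 255.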

References

  • Upstream: jit branch experiment a59223af

Result

All tests pass. All benchmarks improve, with no regressions.

Add a 256-entry pool of pre-allocated Val.Num instances for integers
0-255 in Val.cachedNum(). Applied to all evaluator arithmetic operations
(+, -, *, /, %, unary, bitwise, shift) and base64DecodeBytes.

Changes:
- Val.scala: Add numCache array[256] and cachedNum() factory method that
  returns cached instance for small non-negative integers, fresh otherwise
- Evaluator.scala: Replace Val.Num(pos, ...) with Val.cachedNum(pos, ...)
  in all arithmetic paths (binary +/-/*/div/mod, unary +/-/~, shift, bitwise)
- EncodingModule.scala: Use cachedNum in base64DecodeBytes (byte values 0-255
  are always cache hits), also convert .map to while-loop
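
The base64DecodeBytes change can be pictured with the following sketch. It is an assumption-laden illustration, not the code in EncodingModule.scala: `Base64Sketch`, `decodeToNums`, and the simplified `Num` class are hypothetical, and the real function returns a Jsonnet array value rather than a Scala array.

```scala
object Base64Sketch {
  import java.util.Base64

  // Simplified stand-in for Val.Num (assumption: no position field).
  final class Num(val value: Double)

  // One shared instance per possible unsigned byte value.
  private val numCache: Array[Num] = Array.tabulate(256)(i => new Num(i.toDouble))

  /** Decodes a base64 string and maps each byte to a cached Num. */
  def decodeToNums(s: String): Array[Num] = {
    val bytes = Base64.getDecoder.decode(s)
    val out = new Array[Num](bytes.length)
    var i = 0
    // A while-loop avoids the closure and intermediate collection of .map.
    while (i < bytes.length) {
      // JVM bytes are signed; masking with 0xff yields an index in 0..255,
      // so every lookup is a cache hit.
      out(i) = numCache(bytes(i) & 0xff)
      i += 1
    }
    out
  }
}
```

Because every decoded byte falls inside the cached range, the loop allocates only the output array and no per-element number objects.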

JMH improvements (ms/op):
- base64DecodeBytes: 9.423 → 8.717 (-7.5%)
- bench.03 (fibonacci): 13.399 → 12.473 (-6.9%)
- bench.02: 46.694 → 44.034 (-5.7%)
- realistic2: 70.491 → 67.856 (-3.7%)
- Zero regressions across 35 benchmarks

Native (hyperfine): base64DecodeBytes 41.2ms → 38.7ms (-6.1%)

Upstream: he-pin/sjsonnet jit branch commit a59223a
@He-Pin He-Pin marked this pull request as ready for review April 5, 2026 08:39