fix: base64DecodeBytes unsigned byte values + improve JMH benchmark config by He-Pin · Pull Request #705 · databricks/sjsonnet

He-Pin · 2026-04-06T21:46:12Z

Motivation

std.base64DecodeBytes returns incorrect values for bytes >= 128. Java's byte type is signed (-128..127), so decoding base64 data containing high bytes (e.g., 0x80 = 128) produces negative numbers instead of the expected unsigned values (0-255). This violates the Jsonnet specification.

Additionally, the JMH RegressionBenchmark uses only 1 warmup and 1 measurement iteration, producing noisy results with wide confidence intervals that make it difficult to detect real performance changes.

Key Design Decision

Apply the standard Java idiom & 0xff to convert signed bytes to unsigned integers, consistent with the existing encodeUTF8 implementation in StringModule.scala
Increase JMH iterations to 3 warmup + 3 measurement for statistically meaningful results while keeping total runtime reasonable (~20 min for 35 benchmarks)

Modification

Bug Fix: `EncodingModule.scala`

Apply & 0xff mask when converting decoded bytes to Val.Num, ensuring values are in the unsigned range 0-255
Replace .map() with explicit while-loop for consistency with other encoding functions

Benchmark Config: `RegressionBenchmark.scala`

@Warmup(iterations = 3, time = 2) (was iterations = 1)
@Measurement(iterations = 3) (was iterations = 1)

Regression Test: `base64DecodeBytes_unsigned.jsonnet`

Tests boundary values: 0 (0x00), 128 (0x80), 255 (0xFF)
Tests round-trip: base64DecodeBytes(base64([0, 127, 128, 200, 255])) == [0, 127, 128, 200, 255]

Benchmark Results

JMH (3 iterations, back-to-back runs)

No performance impact — this is a correctness fix. The base64DecodeBytes benchmark is unchanged within CI:

Benchmark	Before	After	Δ%
base64DecodeBytes	8.915 ± 1.362 ms/op	9.234 ± 1.372 ms/op	+3.6% (within CI)
base64_byte_array	1.373 ± 0.148 ms/op	1.441 ± 0.137 ms/op	+5.0% (within CI)
base64	0.760 ± 0.066 ms/op	0.752 ± 0.041 ms/op	-1.1% (within CI)
base64Decode	0.571 ± 0.031 ms/op	0.571 ± 0.139 ms/op	+0.0% (within CI)

All confidence intervals overlap — no statistically significant change.

Analysis

The bug manifests when decoding base64 data containing bytes in the range 128-255. For example:

std.base64DecodeBytes("gIA=") should return [128, 128] but returned [-128, -128]
std.base64DecodeBytes("/w==") should return [255] but returned [-1]

The fix is the standard Java unsigned byte conversion: byte & 0xff promotes the signed byte to an int with zero-extension, producing the correct unsigned value.

References

Upstream jit branch commits: b833428e, af4832f2
Consistent with existing pattern in StringModule.scala encodeUTF8: bytes(i) & 0xff

Result

✅ All tests pass (11 suites)
✅ No performance regression
✅ Cross-model review: GPT-5.4 + Opus 4.5 — no blockers
✅ New regression test covers boundary values and round-trip

…onfig Fix std.base64DecodeBytes to correctly return unsigned byte values (0-255) instead of signed Java byte values (-128..127). Java's byte type is signed, so without masking with 0xff, bytes >= 128 (e.g., 0x80 = 128) would appear as negative numbers (e.g., -128), violating the Jsonnet specification. Also increase JMH RegressionBenchmark iterations from 1 to 3 for both warmup and measurement phases, providing more stable and reliable benchmark results with proper confidence intervals. Changes: - EncodingModule.scala: Apply `& 0xff` mask in base64DecodeBytes to convert signed Java bytes to unsigned integers, matching Jsonnet byte semantics - RegressionBenchmark.scala: Increase warmup/measurement iterations to 3 - Add regression test: base64DecodeBytes_unsigned.jsonnet with boundary values (0, 127, 128, 255) and encode/decode round-trip verification Upstream reference: jit branch commits b833428, af4832f

This was referenced Apr 6, 2026

perf: stdlib allocation optimizations — foldl while-loop, join pre-sized, flatten two-pass, reverse direct #695

Closed

perf: While-loop stdlib optimizations + base64DecodeBytes unsigned fix #700

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: base64DecodeBytes unsigned byte values + improve JMH benchmark config#705

fix: base64DecodeBytes unsigned byte values + improve JMH benchmark config#705
He-Pin wants to merge 1 commit intodatabricks:masterfrom
He-Pin:fix/base64-unsigned-jmh-config

He-Pin commented Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

He-Pin commented Apr 6, 2026

Motivation

Key Design Decision

Modification

Bug Fix: EncodingModule.scala

Benchmark Config: RegressionBenchmark.scala

Regression Test: base64DecodeBytes_unsigned.jsonnet

Benchmark Results

JMH (3 iterations, back-to-back runs)

Analysis

References

Result

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Bug Fix: `EncodingModule.scala`

Benchmark Config: `RegressionBenchmark.scala`

Regression Test: `base64DecodeBytes_unsigned.jsonnet`