solidity: short-circuit bcs_serialize_len for x < 128#93
Closed
deuszx wants to merge 1 commit into
Closed
Conversation
99d814f to
f821a52
Compare
Replace the unconditional `bytes memory result;` + chained
`abi.encodePacked(result, entry)` loop with a direct single-byte
allocation for the common (x < 128) case. The multi-byte path counts
the required bytes once, allocates the buffer once, then writes each
LEB128 byte in place — removing the per-byte reallocation that
`abi.encodePacked` does on every iteration.
The count-loop and emit-loop arithmetic is wrapped in `unchecked { }`:
count tops out at 37 even for `type(uint256).max` and the emit
index is bounded by `last = count - 1`, so neither can overflow.
`count - 1` is hoisted into `last` so the for-loop condition does not
recompute it each iteration.
Coverage:
* `test_varint_length_boundaries` round-trips Vec<u8> payloads at the
1/2/3-byte LEB128 boundaries (len = 1, 127, 128, 129, 16383,
16384, 16385).
* `test_varint_unchecked_loop_coverage` calls `bcs_serialize_len` and
`bcs_deserialize_offset_len` directly with 20 values spanning every
LEB128 byte count from 1 up to 37 (`type(uint256).max`), exercising
every iteration depth of both unchecked loops and asserting the
encoded byte count and round-trip value per case.
f821a52 to
eec7b31
Compare
Contributor
Author
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replace the unconditional
bytes memory result;+ chainedabi.encodePacked(result, entry)loop with a direct single-byteallocation for the common (
x < 128) case. The multi-byte pathcounts the required bytes once, allocates the buffer once, then
writes each LEB128 byte in place — removing the per-byte
reallocation that
abi.encodePackeddoes on every iteration.The count-loop and emit-loop arithmetic is wrapped in
unchecked { }:counttops out at 37 even fortype(uint256).max,last = count - 1, so neither canoverflow.
count - 1is hoisted intolastso the for-loop condition does notrecompute it each iteration.
Benchmarks
Measured with
forge test --gas-report,via_ir = true,optimizer_runs = 200,solc 0.8.33. Numbers are per external callto a small harness that delegates to the library; subtract a ~600-gas
baseline for the external call wiring to read per-
bcs_serialize_lencost.
x112712816383163842^32 − 1type(uint256).maxHonest read of the table:
x = 1) are slightly slower under the new form(+242 gas). The old form starts from
bytes memory result;— azero-length pointer to the canonical empty
bytes, no allocation —and only allocates once inside
abi.encodePackedat the very end.The new form's
new bytes(1) + result[0] = …does an extraindexed-assignment with a bounds check.
127–16384) shave~120–180 gas — modest but consistent.
type(uint256).max(37-byte LEB128) the new form is~6.8 K gas cheaper, because the old
abi.encodePacked(result, entry)loop reallocates and copies
resultonce per byte (O(N²) memorytraffic), while the new form allocates once.
Deployed bytecode (harness contract that inlines the library):
Practical impact for
linera-bridge: typical certificate lengthprefixes encode small vec / string lengths (the
x < 128and 2-bytepaths). Those cases shift between +242 and −180 gas. The big
wins only materialize for unusually large length prefixes that the
hot path does not encounter often.
Verdict: the change is well-motivated for correctness reasons
(removes the O(N²) memory traffic and the redundant
requireon themulti-byte path) and is a clear win on large inputs, but the gas
savings on the dominant
x < 128case are essentially zero andmodestly negative on
x = 1. Treat the PR as a cleanup + future-proofing for large lengths, not as a hot-path optimization.
Reproduce the benchmark
Save the two libraries below as
Old.solandNew.sol(eachexposes a
OldHarness/NewHarnesscontract with a singleexternal
ser(uint256)method that delegates to the library).Create
foundry.toml:Save the test harness as
test/Bench.t.sol:Run
forge test --gas-reportand read the per-function gas tablefor
OldHarness/NewHarness. Runtime-bytecode size comes fromsolc --via-ir --optimize --bin-runtime(orforge inspect <name> deployedBytecode).Test Plan
test_varint_length_boundariesround-tripsVec<u8>payloads atthe 1/2/3-byte LEB128 boundaries
(
len = 1, 127, 128, 129, 16383, 16384, 16385).test_varint_unchecked_loop_coveragecallsbcs_serialize_lenandbcs_deserialize_offset_lendirectly with 20 values spanning everyLEB128 byte count from 1 up to 37 (
type(uint256).max), exercisingevery iteration depth of both
uncheckedloops and asserting theencoded byte count and round-trip value per case.