solidity: short-circuit bcs_serialize_len for x < 128 by deuszx · Pull Request #93 · zefchain/serde-reflection

deuszx · 2026-05-14T11:05:34Z

Summary

Replace the unconditional `bytes memory result;` + chained
`abi.encodePacked(result, entry)` loop with a direct single-byte
allocation for the common (`x < 128`) case. The multi-byte path
counts the required bytes once, allocates the buffer once, then
writes each LEB128 byte in place, removing the per-byte
reallocation that `abi.encodePacked` does on every iteration.

The count-loop and emit-loop arithmetic is wrapped in `unchecked { }`:
`count` tops out at 37 even for `type(uint256).max`, and the emit
index is bounded by `last = count - 1`, so neither can overflow.
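The bound of 37 follows from the LEB128 layout, in which every output byte carries 7 payload bits. As a short worked check (here $n(x)$ is shorthand introduced for this note, not a name from the PR), for $x \ge 1$:

```latex
n(x) = \left\lceil \frac{\lfloor \log_2 x \rfloor + 1}{7} \right\rceil,
\qquad
n\bigl(2^{256} - 1\bigr) = \left\lceil \frac{256}{7} \right\rceil = 37 .
```

The same formula gives $n(127) = 1$, $n(128) = 2$, $n(16383) = 2$, $n(16384) = 3$ and $n(2^{32} - 1) = 5$, so `count` never exceeds 37 and `last` never exceeds 36, far below the `uint256` range.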

`count - 1` is hoisted into `last` so the for-loop condition does not
recompute it each iteration.
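The shape described above can be sketched roughly as follows. This is a reconstruction from the prose, not the PR's actual diff: the library name `NewLenSketch`, the local `rest`, and the exact loop forms are assumptions.

```solidity
// SPDX-License-Identifier: UNLICENSED
pragma solidity ^0.8.0;

// Hypothetical reconstruction from the PR description; the PR's code may differ.
library NewLenSketch {
    function bcs_serialize_len(uint256 x) internal pure returns (bytes memory result) {
        // Fast path: values below 128 fit in one byte with no continuation bit.
        if (x < 128) {
            result = new bytes(1);
            result[0] = bytes1(uint8(x));
            return result;
        }
        unchecked {
            // Count pass: one LEB128 byte per 7 bits, so count is in [2, 37] here.
            uint256 count = 0;
            for (uint256 rest = x; rest != 0; rest >>= 7) {
                ++count;
            }
            // Single allocation, then in-place writes.
            result = new bytes(count);
            uint256 last = count - 1; // cannot underflow: count >= 2 on this path
            for (uint256 i = 0; i < last; ++i) {
                // Low 7 bits plus the continuation bit 0x80.
                result[i] = bytes1(uint8(x & 0x7f) | 0x80);
                x >>= 7;
            }
            // Final byte: the remaining high bits, continuation bit clear.
            result[last] = bytes1(uint8(x));
        }
    }
}
```

Under this sketch, `bcs_serialize_len(128)` yields the two bytes `0x80 0x01`, and `bcs_serialize_len(16384)` yields `0x80 0x80 0x01`.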

Benchmarks

Measured with `forge test --gas-report`, `via_ir = true`,
`optimizer_runs = 200`, solc 0.8.33. Numbers are per external call
to a small harness that delegates to the library; subtract a ~600-gas
baseline for the external-call wiring to estimate the cost of
`bcs_serialize_len` itself.

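| Input `x` | LEB128 bytes | Old gas | New gas | Δ |
|---|---|---|---|---|
| `1` | 1 | 6051 | 6293 | +242 |
| `127` | 1 | 6348 | 6177 | −171 |
| `128` | 2 | 6736 | 6619 | −117 |
| `16383` | 2 | 6758 | 6597 | −161 |
| `16384` | 3 | 7121 | 6945 | −176 |
| `2^32 − 1` | 5 | 7825 | 7355 | −470 |
| `type(uint256).max` | 37 | 21659 | 14865 | −6794 |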

Honest read of the table:

* Tiny inputs (`x = 1`) are slightly slower under the new form
  (+242 gas). The old form starts from `bytes memory result;`, a
  zero-length pointer to the canonical empty bytes that costs no
  allocation, and allocates only once, inside `abi.encodePacked`,
  when the single byte is appended. The new form's `new bytes(1)`
  followed by `result[0] = …` does an extra indexed assignment with
  a bounds check.
* Boundary and small multi-byte inputs (127–16384) shave
  ~120–180 gas: modest but consistent.
* The win scales sharply with byte count. For
  `type(uint256).max` (37-byte LEB128) the new form is
  ~6.8 K gas cheaper, because the old `abi.encodePacked(result, entry)`
  loop reallocates and copies `result` once per byte (O(N²) memory
  traffic), while the new form allocates once.
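To make the per-byte reallocation concrete, here is a rough sketch of what the old form plausibly looked like. It is reconstructed from the description (`bytes memory result;`, a chained `abi.encodePacked(result, entry)`, and a `require` on the multi-byte path); the library name `OldLenSketch` and the locals `xb` and `remainder` are assumptions.

```solidity
// SPDX-License-Identifier: UNLICENSED
pragma solidity ^0.8.0;

// Hypothetical reconstruction of the pre-PR form; names and details are assumed.
library OldLenSketch {
    function bcs_serialize_len(uint256 x) internal pure returns (bytes memory) {
        bytes memory result; // zero-length, nothing allocated yet
        bytes1 entry;
        while (true) {
            if (x < 128) {
                // Last byte: continuation bit clear.
                entry = bytes1(uint8(x));
                result = abi.encodePacked(result, entry);
                return result;
            }
            uint256 xb = x >> 7;
            uint256 remainder = x - (xb << 7);
            // Always true by construction; this is the redundant check.
            require(remainder < 128);
            entry = bytes1(uint8(remainder) + 128);
            // Each call allocates a fresh buffer and copies the old result into it.
            result = abi.encodePacked(result, entry);
            x = xb;
        }
    }
}
```

Under this reconstruction, a 37-byte encoding builds buffers of length 1, 2, …, 37 in turn, writing 1 + 2 + … + 37 = 703 payload bytes across 37 allocations, where the single-allocation form writes 37 bytes once.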

Deployed bytecode (harness contract that inlines the library):

| Form | Bytes | Deployment gas |
|---|---|---|
| Old | 451 | 145 183 |
| New | 603 | 177 994 |
| Δ | +152 | +32 811 |

Practical impact for linera-bridge: typical certificate length
prefixes encode small vec / string lengths (the x < 128 and 2-byte
paths). Those cases shift between +242 and −180 gas. The big
wins only materialize for unusually large length prefixes that the
hot path does not encounter often.

Verdict: the change is well-motivated for correctness reasons
(removes the O(N²) memory traffic and the redundant `require` on the
multi-byte path) and is a clear win on large inputs, but the gas
savings on the dominant `x < 128` case are essentially zero and
modestly negative on `x = 1`. Treat the PR as cleanup and
future-proofing for large lengths, not as a hot-path optimization.

Reproduce the benchmark

Save the two libraries as `Old.sol` and `New.sol` (each exposes an
`OldHarness` or `NewHarness` contract, respectively, with a single
external `ser(uint256)` method that delegates to the library).

Create foundry.toml:

[profile.default]
src = "."
out = "out"
test = "test"
via_ir = true
optimizer = true
optimizer_runs = 200

Save the test harness as test/Bench.t.sol:

// SPDX-License-Identifier: UNLICENSED
pragma solidity ^0.8.0;

import "../Old.sol";
import "../New.sol";

// Each test makes one external call, so the gas report attributes the
// cost to OldHarness.ser / NewHarness.ser.
contract BenchTest {
    OldHarness o = new OldHarness();
    NewHarness n = new NewHarness();

    function test_old_len_1()      public view { o.ser(1); }
    function test_new_len_1()      public view { n.ser(1); }
    function test_old_len_127()    public view { o.ser(127); }
    function test_new_len_127()    public view { n.ser(127); }
    function test_old_len_128()    public view { o.ser(128); }
    function test_new_len_128()    public view { n.ser(128); }
    function test_old_len_16383()  public view { o.ser(16383); }
    function test_new_len_16383()  public view { n.ser(16383); }
    function test_old_len_16384()  public view { o.ser(16384); }
    function test_new_len_16384()  public view { n.ser(16384); }
    // Despite the "2pow32" name, the input is 2^32 - 1 (type(uint32).max).
    function test_old_len_2pow32() public view { o.ser(4294967295); }
    function test_new_len_2pow32() public view { n.ser(4294967295); }
    function test_old_len_max256() public view { o.ser(type(uint256).max); }
    function test_new_len_max256() public view { n.ser(type(uint256).max); }
}

Run `forge test --gas-report` and read the per-function gas table
for `OldHarness` / `NewHarness`. Runtime-bytecode size comes from
`solc --via-ir --optimize --bin-runtime` (or
`forge inspect <name> deployedBytecode`).

Test Plan

* `test_varint_length_boundaries` round-trips `Vec<u8>` payloads at
  the 1/2/3-byte LEB128 boundaries
  (len = 1, 127, 128, 129, 16383, 16384, 16385).
* `test_varint_unchecked_loop_coverage` calls `bcs_serialize_len` and
  `bcs_deserialize_offset_len` directly with 20 values spanning every
  LEB128 byte count from 1 up to 37 (`type(uint256).max`), exercising
  every iteration depth of both unchecked loops and asserting the
  encoded byte count and round-trip value per case.

deuszx · 2026-05-14T13:16:18Z

Verdict: the change is well-motivated for correctness reasons
(removes the O(N²) memory traffic and the redundant require on the
multi-byte path) and is a clear win on large inputs, but the gas
savings on the dominant x < 128 case are essentially zero and
modestly negative on x = 1.

deuszx requested a review from ma2bd as a code owner May 14, 2026 11:05

deuszx force-pushed the solidity-pr-2-serialize-len-shortcircuit branch 2 times, most recently from 99d814f to f821a52 Compare May 14, 2026 11:18

deuszx force-pushed the solidity-pr-2-serialize-len-shortcircuit branch from f821a52 to eec7b31 Compare May 14, 2026 12:56

deuszx closed this May 14, 2026

deuszx deleted the solidity-pr-2-serialize-len-shortcircuit branch May 14, 2026 13:16

deuszx wants to merge 1 commit into zefchain:main from deuszx:solidity-pr-2-serialize-len-shortcircuit
