solidity: short-circuit bcs_serialize_len for x < 128 #93

Closed
deuszx wants to merge 1 commit into zefchain:main from deuszx:solidity-pr-2-serialize-len-shortcircuit

Contributor

@deuszx deuszx commented May 14, 2026

Summary

Replace the unconditional bytes memory result; + chained
abi.encodePacked(result, entry) loop with a direct single-byte
allocation for the common (x < 128) case. The multi-byte path
counts the required bytes once, allocates the buffer once, then
writes each LEB128 byte in place — removing the per-byte
reallocation that abi.encodePacked does on every iteration.

The count-loop and emit-loop arithmetic is wrapped in unchecked { }:

  • count tops out at 37 even for type(uint256).max,
  • the emit index is bounded by last = count - 1, so neither can
    overflow.

count - 1 is hoisted into last so the for-loop condition does not
recompute it each iteration.
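
For orientation, the two shapes described above can be sketched roughly as follows. This is an illustrative reconstruction from the description, not the code in the diff: the library names `OldLenSketch` and `NewLenSketch` are hypothetical, and the real old form reportedly also contains a `require` on the multi-byte path that is omitted here.

```solidity
// SPDX-License-Identifier: UNLICENSED
pragma solidity ^0.8.0;

// Illustrative sketch only: library names are hypothetical and the bodies
// are reconstructed from the PR description, not copied from the diff.

library OldLenSketch {
    // Old shape: grow `result` by one byte per iteration via abi.encodePacked.
    function bcs_serialize_len(uint256 x) internal pure returns (bytes memory) {
        bytes memory result; // zero-length, nothing allocated yet
        while (true) {
            if (x < 128) {
                // Final byte: continuation bit clear.
                return abi.encodePacked(result, bytes1(uint8(x)));
            }
            // Low 7 bits with the continuation bit set; `result` is copied again.
            result = abi.encodePacked(result, bytes1(uint8(x & 0x7f) | 0x80));
            x >>= 7;
        }
    }
}

library NewLenSketch {
    // New shape: single-byte fast path, otherwise count once, allocate once.
    function bcs_serialize_len(uint256 x) internal pure returns (bytes memory result) {
        if (x < 128) {
            result = new bytes(1);
            result[0] = bytes1(uint8(x));
            return result;
        }
        unchecked {
            // Count 7-bit groups; at most 37 for a uint256, so no overflow.
            uint256 count = 0;
            for (uint256 rest = x; rest != 0; rest >>= 7) {
                ++count;
            }
            result = new bytes(count);
            uint256 last = count - 1; // hoisted out of the loop condition
            for (uint256 i = 0; i < last; ++i) {
                result[i] = bytes1(uint8(x & 0x7f) | 0x80);
                x >>= 7;
            }
            // After the loop x < 128, so the last byte has no continuation bit.
            result[last] = bytes1(uint8(x));
        }
    }
}
```

Under these assumptions both sketches produce the same LEB128 bytes; they differ only in how often memory is allocated and copied.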

Benchmarks

Measured with forge test --gas-report, via_ir = true,
optimizer_runs = 200, solc 0.8.33. Numbers are per external call
to a small harness that delegates to the library; subtract a baseline
of roughly 600 gas for the external-call wiring to estimate the cost
of bcs_serialize_len itself.

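| Input x           | LEB128 bytes | Old gas | New gas | Δ     |
|-------------------|--------------|---------|---------|-------|
| 1                 | 1            | 6051    | 6293    | +242  |
| 127               | 1            | 6348    | 6177    | −171  |
| 128               | 2            | 6736    | 6619    | −117  |
| 16383             | 2            | 6758    | 6597    | −161  |
| 16384             | 3            | 7121    | 6945    | −176  |
| 2^32 − 1          | 5            | 7825    | 7355    | −470  |
| type(uint256).max | 37           | 21659   | 14865   | −6794 |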

Honest read of the table:

  • Tiny inputs (x = 1) are slightly slower under the new form
    (+242 gas). The old form starts from bytes memory result; — a
    zero-length pointer to the canonical empty bytes, no allocation —
    and only allocates once inside abi.encodePacked at the very end.
    The new form's new bytes(1) + result[0] = … does an extra
    indexed-assignment with a bounds check.
  • Boundary and small multi-byte inputs (127 to 16384) shave
    ~120–180 gas: modest but consistent.
  • The win scales sharply with byte count. For
    type(uint256).max (37-byte LEB128) the new form is
    ~6.8 K gas cheaper, because the old abi.encodePacked(result, entry)
    loop reallocates and copies result once per byte (O(N²) memory
    traffic), while the new form allocates once.
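
As a rough check on the O(N²) claim, assume the old loop rewrites every byte already in result on each abi.encodePacked call. This is a simplified model that ignores the EVM's 32-byte word granularity and memory-expansion costs, so it illustrates the growth rate rather than predicting gas. For an N-byte encoding the payload bytes written compare as:

```latex
\[
N_{\max} = \left\lceil \frac{256}{7} \right\rceil = 37,
\qquad
\underbrace{\sum_{k=1}^{N} k = \frac{N(N+1)}{2}}_{\text{old: prefix rewritten every iteration}}
\quad\text{vs.}\quad
\underbrace{N}_{\text{new: one allocation, one write per byte}}
\]
\[
N = 37:\qquad \frac{37 \cdot 38}{2} = 703 \ \text{byte writes (old)}
\quad\text{vs.}\quad 37 \ \text{(new)}.
\]
```

For N = 1 or 2 the two counts are nearly identical (1 vs. 1, 3 vs. 2), which is consistent with the small deltas in the first rows of the table.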

Deployed bytecode (harness contract that inlines the library):

| Form | Bytes | Deployment gas |
|------|-------|----------------|
| Old  | 451   | 145 183        |
| New  | 603   | 177 994        |
| Δ    | +152  | +32 811        |

Practical impact for linera-bridge: typical certificate length
prefixes encode small vec / string lengths (the x < 128 and 2-byte
paths). Those cases shift between +242 and −180 gas. The big
wins only materialize for unusually large length prefixes that the
hot path does not encounter often.

Verdict: the change is well-motivated for correctness reasons
(removes the O(N²) memory traffic and the redundant require on the
multi-byte path) and is a clear win on large inputs, but the gas
savings on the dominant x < 128 case are essentially zero and
modestly negative on x = 1. Treat the PR as a cleanup + future-
proofing for large lengths, not as a hot-path optimization.

Reproduce the benchmark

  1. Save the two libraries below as Old.sol and New.sol (each
    exposes an OldHarness / NewHarness contract with a single
    external ser(uint256) method that delegates to the library).

  2. Create foundry.toml:

    [profile.default]
    src = "."
    out = "out"
    test = "test"
    via_ir = true
    optimizer = true
    optimizer_runs = 200

  3. Save the test harness as test/Bench.t.sol:

    // SPDX-License-Identifier: UNLICENSED
    pragma solidity ^0.8.0;
    
    import "../Old.sol";
    import "../New.sol";
    
    contract BenchTest {
        OldHarness o = new OldHarness();
        NewHarness n = new NewHarness();
    
        function test_old_len_1()      public view { o.ser(1); }
        function test_new_len_1()      public view { n.ser(1); }
        function test_old_len_127()    public view { o.ser(127); }
        function test_new_len_127()    public view { n.ser(127); }
        function test_old_len_128()    public view { o.ser(128); }
        function test_new_len_128()    public view { n.ser(128); }
        function test_old_len_16383()  public view { o.ser(16383); }
        function test_new_len_16383()  public view { n.ser(16383); }
        function test_old_len_16384()  public view { o.ser(16384); }
        function test_new_len_16384()  public view { n.ser(16384); }
        function test_old_len_2pow32() public view { o.ser(4294967295); }
        function test_new_len_2pow32() public view { n.ser(4294967295); }
        function test_old_len_max256() public view { o.ser(type(uint256).max); }
        function test_new_len_max256() public view { n.ser(type(uint256).max); }
    }

  4. Run forge test --gas-report and read the per-function gas table
    for OldHarness / NewHarness. Runtime-bytecode size comes from
    solc --via-ir --optimize --bin-runtime (or
    forge inspect <name> deployedBytecode).

Test Plan

  • test_varint_length_boundaries round-trips Vec<u8> payloads at
    the 1/2/3-byte LEB128 boundaries
    (len = 1, 127, 128, 129, 16383, 16384, 16385).
  • test_varint_unchecked_loop_coverage calls bcs_serialize_len and
    bcs_deserialize_offset_len directly with 20 values spanning every
    LEB128 byte count from 1 up to 37 (type(uint256).max), exercising
    every iteration depth of both unchecked loops and asserting the
    encoded byte count and round-trip value per case.
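
A minimal sketch of the kind of boundary check described above, assuming a self-contained encoder/decoder pair. `VarintSketch`, `encode`, `decode` and `VarintRoundTripSketch` are hypothetical names; the PR's actual tests call bcs_serialize_len and bcs_deserialize_offset_len, whose exact signatures are not shown here, and they round-trip Vec<u8> payloads rather than bare length prefixes.

```solidity
// SPDX-License-Identifier: UNLICENSED
pragma solidity ^0.8.0;

// Hypothetical stand-ins for the library functions; not the PR's code.
library VarintSketch {
    // Count-then-fill LEB128 encoder.
    function encode(uint256 x) internal pure returns (bytes memory out) {
        unchecked {
            uint256 count = 1;
            for (uint256 rest = x >> 7; rest != 0; rest >>= 7) {
                ++count;
            }
            out = new bytes(count);
            uint256 last = count - 1;
            for (uint256 i = 0; i < last; ++i) {
                out[i] = bytes1(uint8(x & 0x7f) | 0x80);
                x >>= 7;
            }
            out[last] = bytes1(uint8(x));
        }
    }

    // Decoder returning the position after the varint and the decoded value.
    // Sketch only: it does not reject over-long or non-canonical encodings.
    function decode(bytes memory input, uint256 pos)
        internal
        pure
        returns (uint256 newPos, uint256 value)
    {
        uint256 shift = 0;
        while (true) {
            uint8 b = uint8(input[pos]); // reverts if pos is out of bounds
            pos += 1;
            value |= uint256(b & 0x7f) << shift;
            if (b < 0x80) {
                return (pos, value);
            }
            shift += 7;
        }
    }
}

contract VarintRoundTripSketch {
    // Asserts both the encoded byte count and the round-trip value.
    function check(uint256 x, uint256 expectedLen) public pure {
        bytes memory enc = VarintSketch.encode(x);
        require(enc.length == expectedLen, "unexpected byte count");
        (uint256 pos, uint256 value) = VarintSketch.decode(enc, 0);
        require(pos == enc.length && value == x, "round-trip mismatch");
    }

    // Values on either side of the 1/2/3-byte boundaries, plus the 37-byte maximum.
    function test_boundaries() public pure {
        check(1, 1);
        check(127, 1);
        check(128, 2);
        check(129, 2);
        check(16383, 2);
        check(16384, 3);
        check(16385, 3);
        check(type(uint256).max, 37);
    }
}
```

Checking the byte count alongside the decoded value catches an encoder that emits a correct but over-long encoding, which a value-only round trip through a lenient decoder would miss.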

@deuszx deuszx requested a review from ma2bd as a code owner May 14, 2026 11:05
@deuszx deuszx force-pushed the solidity-pr-2-serialize-len-shortcircuit branch 2 times, most recently from 99d814f to f821a52 Compare May 14, 2026 11:18
Replace the unconditional `bytes memory result;` + chained
`abi.encodePacked(result, entry)` loop with a direct single-byte
allocation for the common (x < 128) case. The multi-byte path counts
the required bytes once, allocates the buffer once, then writes each
LEB128 byte in place — removing the per-byte reallocation that
`abi.encodePacked` does on every iteration.

The count-loop and emit-loop arithmetic is wrapped in `unchecked { }`:
count tops out at 37 even for `type(uint256).max` and the emit
index is bounded by `last = count - 1`, so neither can overflow.
`count - 1` is hoisted into `last` so the for-loop condition does not
recompute it each iteration.

Coverage:
* `test_varint_length_boundaries` round-trips Vec<u8> payloads at the
  1/2/3-byte LEB128 boundaries (len = 1, 127, 128, 129, 16383,
  16384, 16385).
* `test_varint_unchecked_loop_coverage` calls `bcs_serialize_len` and
  `bcs_deserialize_offset_len` directly with 20 values spanning every
  LEB128 byte count from 1 up to 37 (`type(uint256).max`), exercising
  every iteration depth of both unchecked loops and asserting the
  encoded byte count and round-trip value per case.
@deuszx deuszx force-pushed the solidity-pr-2-serialize-len-shortcircuit branch from f821a52 to eec7b31 Compare May 14, 2026 12:56
Contributor Author

deuszx commented May 14, 2026

Verdict: the change is well-motivated for correctness reasons
(removes the O(N²) memory traffic and the redundant require on the
multi-byte path) and is a clear win on large inputs, but the gas
savings on the dominant x < 128 case are essentially zero and
modestly negative on x = 1.

@deuszx deuszx closed this May 14, 2026
@deuszx deuszx deleted the solidity-pr-2-serialize-len-shortcircuit branch May 14, 2026 13:16