meta: evaluate cargo-sonic for CPU-dispatched fat binaries on Linux releases #181

@dekobon

Summary

Investigate adopting cargo-sonic for the
release builds of the bca and bca-web binaries so a single distributed
Linux executable contains several CPU-microarchitecture-optimized variants
and dispatches the best one at startup.

Background

cargo-sonic is a Cargo subcommand that produces Linux CPU-dispatched fat
binaries. It compiles the same binary multiple times with different
-C target-cpu flags (e.g. x86-64-v3, znver5, raptorlake), embeds every
payload alongside a tiny no_std loader, and at runtime:

  1. Inspects CPU features via CPUID (x86_64) / auxv (AArch64).
  2. Picks the best-matching payload.
  3. Writes it to a memfd and execveat(AT_EMPTY_PATH)s into it.
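Steps 1 and 2 can be sketched in plain Rust as follows. This is not cargo-sonic's actual selection logic: the `Variant` type, the `VARIANTS` table, and both function names are hypothetical, and the feature lists are abbreviated (the real x86-64-v3 level also requires features such as MOVBE and F16C). Step 3 (memfd plus execveat) needs raw syscalls and is not shown.

```rust
/// One embedded payload: a target-cpu name plus the CPU features it needs.
/// Hypothetical type; the feature lists below are abbreviated on purpose.
struct Variant {
    target_cpu: &'static str,
    required: &'static [&'static str],
}

/// Candidates ordered from most to least specialized. The last entry has no
/// requirements, so it always matches and acts as the generic fallback.
const VARIANTS: &[Variant] = &[
    Variant { target_cpu: "x86-64-v3", required: &["avx2", "fma", "bmi1", "bmi2", "lzcnt"] },
    Variant { target_cpu: "x86-64-v2", required: &["sse4.2", "popcnt"] },
    Variant { target_cpu: "x86-64", required: &[] },
];

/// Return the first variant whose required features are all in `detected`.
fn select_variant(detected: &[&str]) -> &'static str {
    VARIANTS
        .iter()
        .find(|v| v.required.iter().all(|f| detected.contains(f)))
        .map(|v| v.target_cpu)
        .unwrap_or("x86-64")
}

/// Collect the subset of features used above via runtime CPUID detection.
#[cfg(target_arch = "x86_64")]
fn detect_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    if is_x86_feature_detected!("sse4.2") { found.push("sse4.2"); }
    if is_x86_feature_detected!("popcnt") { found.push("popcnt"); }
    if is_x86_feature_detected!("avx2") { found.push("avx2"); }
    if is_x86_feature_detected!("fma") { found.push("fma"); }
    if is_x86_feature_detected!("bmi1") { found.push("bmi1"); }
    if is_x86_feature_detected!("bmi2") { found.push("bmi2"); }
    if is_x86_feature_detected!("lzcnt") { found.push("lzcnt"); }
    found
}

/// On other architectures this sketch detects nothing and falls back to the
/// generic variant.
#[cfg(not(target_arch = "x86_64"))]
fn detect_features() -> Vec<&'static str> {
    Vec::new()
}
```

Calling `select_variant(&detect_features())` yields the name of the variant this sketch would pick on the current machine.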

The selected variant is observable through the
CARGO_SONIC_SELECTED_TARGET_CPU env var.
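A binary could surface that variable in its own diagnostics. The sketch below assumes the loader exports the variable into the payload's environment; `describe_variant` and `selected_variant_line` are hypothetical helper names, not existing bca functions.

```rust
use std::env;

/// Name of the variable described above.
const SELECTED_VAR: &str = "CARGO_SONIC_SELECTED_TARGET_CPU";

/// Format a one-line description of the selected variant.
/// A missing or empty value is treated as "not started by the loader".
fn describe_variant(value: Option<&str>) -> String {
    match value {
        Some(cpu) if !cpu.is_empty() => format!("cargo-sonic variant: {cpu}"),
        _ => "cargo-sonic variant: none (plain build)".to_string(),
    }
}

/// Read the variable from the process environment and describe it.
fn selected_variant_line() -> String {
    describe_variant(env::var(SELECTED_VAR).ok().as_deref())
}
```

Printing `selected_variant_line()` from a verbose or version code path would let users confirm which payload ran without setting up any tooling.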

Typical invocation:

cargo install cargo-sonic
cargo sonic --target-cpus=x86-64-v3,znver5 build --release

Output lands at target/sonic/<triple>/<profile>/<bin>.

Why this might be a fit

Both binaries published from this workspace are CPU-bound:

  • bca (big-code-analysis-cli) — tree-sitter parsing plus per-language
    metric computation across potentially large source trees. The hot paths
    (AST traversal in src/spaces.rs, the metric implementations under
    src/metrics/, Halstead operator counting, cognitive/cyclomatic
    scoring) are exactly the kind of integer/branch-heavy code that
    benefits from -C target-cpu tuning.
  • bca-web (big-code-analysis-web) — a long-running service doing the
    same work per request; the upstream README explicitly calls out
    "long-running services" and "analytics … engines" as the sweet spot.

A single fat binary keeps distribution simple (one artifact per OS/arch
in releases) while still letting modern CPUs run code tuned for them.

Proposed scope of investigation

  1. Feasibility / build matrix

    • Confirm cargo-sonic works against our edition 2024 + tree-sitter
      C-shim grammar crates (the vendored tree-sitter-ccomment,
      tree-sitter-mozcpp, tree-sitter-mozjs, tree-sitter-preproc,
      and the pinned external grammars).
    • Verify it composes with our existing release profile in the root
      Cargo.toml and any LTO settings.
  2. Benchmark the gain on a representative corpus (e.g. the
    big-code-analysis-output integration submodule or a Mozilla-central
    subtree) for:

    • generic x86-64 (current baseline)
    • x86-64-v3 (the broad modern baseline)
    • one Intel-leaning target (e.g. raptorlake) and one AMD-leaning
      target (e.g. znver4/znver5)

    The results should tell us which --target-cpus list is worth shipping.
  3. Binary size / build time tradeoffs

    • Measure the cost of the chosen target-cpu list and whether
      --compress=zstd is worth enabling.
    • Decide whether the release CI runner has enough parallelism for
      --parallelism N.
  4. Release pipeline integration

    • Wire cargo sonic build --release into the Linux release job only
      (cargo-sonic is Linux-only; macOS/Windows artifacts stay on
      plain cargo build).
    • Document the CARGO_SONIC_SELECTED_TARGET_CPU env var so users can
      verify which variant ran.
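For the benchmarking step (item 2), a small harness along these lines could time each single-variant build on the same corpus and report the speedup over the generic baseline. It is a sketch only: the function names are hypothetical, the binary paths and arguments are left to the caller, and a dedicated tool such as hyperfine may well be the better choice.

```rust
use std::io;
use std::process::Command;
use std::time::{Duration, Instant};

/// Run `binary` with `args` `runs` times and return one wall-clock sample per run.
/// Fails if the process cannot be spawned or exits with a non-zero status.
fn time_runs(binary: &str, args: &[&str], runs: usize) -> io::Result<Vec<Duration>> {
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        let status = Command::new(binary).args(args).status()?;
        let elapsed = start.elapsed();
        if !status.success() {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!("{binary} exited with {status}"),
            ));
        }
        samples.push(elapsed);
    }
    Ok(samples)
}

/// Median of the samples, or None when there are none.
/// The median is less sensitive to a single slow outlier than the mean.
fn median(samples: &[Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2
    })
}

/// Speedup of `candidate` over `baseline`; values above 1.0 mean the
/// candidate build is faster.
fn speedup(baseline: Duration, candidate: Duration) -> f64 {
    baseline.as_secs_f64() / candidate.as_secs_f64()
}
```

Comparing medians per target-cpu against the generic x86-64 build gives the numbers the acceptance criteria ask for, whichever way the decision goes.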

Non-goals

  • macOS / Windows fat binaries (out of scope for cargo-sonic).
  • Replacing the cargo install big-code-analysis-cli path — that
    continues to produce a host-native single-variant binary.
  • Changing library consumers — this only affects the two distributed
    binaries.

Risks / open questions

  • The loader uses memfd_create + execveat, which require a
    reasonably modern Linux kernel (≥ 3.17 for memfd, ≥ 3.19 for
    execveat). Confirm this matches our minimum-supported Linux.
  • --target-cpus selection is essentially a policy decision; we want
    to balance binary size against coverage of real user CPUs.
  • Need to confirm cargo-sonic works cleanly with our make pre-commit
    and CI clippy + test gates (only the release artifact step
    should change).
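The kernel requirement in the first bullet can be checked with a version comparison like the one below. This is a heuristic sketch with hypothetical function names: it only compares the reported release string against 3.19 and cannot see vendor backports or sandbox policies that block the syscalls.

```rust
use std::fs;
use std::io;

/// execveat arrived in Linux 3.19 and memfd_create in 3.17, so 3.19 covers both.
const MIN_KERNEL: (u32, u32) = (3, 19);

/// Parse (major, minor) from a release string such as "6.8.0-45-generic".
/// Returns None when the string does not start with "<digits>.<digits>".
fn parse_kernel_release(release: &str) -> Option<(u32, u32)> {
    let mut parts = release.trim().split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    // The minor field may carry a suffix (e.g. "19-rc1"), so keep leading digits only.
    let digits: String = parts
        .next()?
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    let minor: u32 = digits.parse().ok()?;
    Some((major, minor))
}

/// True when the release string parses and is at least MIN_KERNEL.
/// Tuples compare lexicographically, so (major, minor) ordering is correct.
fn kernel_supports_loader(release: &str) -> bool {
    parse_kernel_release(release).map_or(false, |version| version >= MIN_KERNEL)
}

/// Read the running kernel's release string (Linux only).
fn running_kernel_release() -> io::Result<String> {
    fs::read_to_string("/proc/sys/kernel/osrelease")
}
```

A release-notes check or install script could use the same comparison to warn users on kernels older than 3.19 before they download the fat binary.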

Acceptance

Either:

  • A PR that wires cargo-sonic into the Linux release job with a
    documented --target-cpus list and a one-paragraph note in the
    README / book explaining the fat binary, or
  • A short writeup on this issue concluding the gain doesn't justify
    the added complexity, with numbers to back the decision.

Labels: enhancement
