Summary
Investigate adopting cargo-sonic for the
release builds of the bca and bca-web binaries so a single distributed
Linux executable contains several CPU-microarchitecture-optimized variants
and dispatches the best one at startup.
Background
cargo-sonic is a Cargo subcommand that produces Linux CPU-dispatched fat
binaries. It compiles the same binary multiple times with different
-C target-cpu flags (e.g. x86-64-v3, znver5, raptorlake), embeds every
payload alongside a tiny no_std loader, and at runtime:
- Inspects CPU features via CPUID (x86_64) / auxv (AArch64).
- Picks the best-matching payload.
- Writes it to a memfd and execveat(AT_EMPTY_PATH)s into it.
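The "picks the best-matching payload" step can be illustrated with a small sketch. This is not cargo-sonic's loader (which is no_std and reads CPUID / auxv itself); the `Payload` struct, the `pick_payload` function, and the feature lists used with it are hypothetical and only show the general idea of choosing the most specific variant whose requirements the host satisfies.

```rust
/// Hypothetical description of one embedded variant: its target-cpu name
/// and the CPU features it needs in order to run safely.
struct Payload {
    target_cpu: &'static str,
    required: &'static [&'static str],
}

/// Among the payloads whose required features are all present on the host,
/// pick the one with the most requirements (a crude proxy for "most tuned").
/// Returns None when no payload is runnable; ties go to the later entry.
fn pick_payload<'a>(payloads: &'a [Payload], host_features: &[&str]) -> Option<&'a Payload> {
    payloads
        .iter()
        .filter(|p| p.required.iter().all(|f| host_features.contains(f)))
        .max_by_key(|p| p.required.len())
}
```

With a generic payload that requires nothing and a second one that requires, say, `avx2`, `bmi2`, and `fma`, a host without AVX2 falls back to the generic payload, while a host with all three gets the tuned one.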
The selected variant is observable through the
CARGO_SONIC_SELECTED_TARGET_CPU env var.
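As a rough sketch of how the binaries could surface that variable (for example in verbose or version output), assuming it is simply unset when the binary was not started through the loader. The helper names `describe_variant` and `current_variant` are hypothetical and not part of bca or cargo-sonic.

```rust
use std::env;

/// Name of the variable set by the cargo-sonic loader, as described above.
const SELECTED_VAR: &str = "CARGO_SONIC_SELECTED_TARGET_CPU";

/// Hypothetical helper: turn the variable's value into a one-line report.
/// `None` or an empty value is reported as "unset" (assumption: the binary
/// was then not launched through the fat-binary loader).
fn describe_variant(selected: Option<&str>) -> String {
    match selected {
        Some(cpu) if !cpu.is_empty() => format!("cargo-sonic variant: {}", cpu),
        _ => String::from("cargo-sonic variant: unset"),
    }
}

/// Read the variable from the process environment and describe it.
fn current_variant() -> String {
    describe_variant(env::var(SELECTED_VAR).ok().as_deref())
}
```

Keeping the formatting in a pure function separate from the environment read makes the behaviour easy to test without mutating process state.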
Typical invocation:
cargo install cargo-sonic
cargo sonic --target-cpus=x86-64-v3,znver5 build --release
Output lands at target/sonic/<triple>/<profile>/<bin>.
Why this might be a fit
Both binaries published from this workspace are CPU-bound:
- bca (big-code-analysis-cli): tree-sitter parsing plus per-language
  metric computation across potentially large source trees. The hot paths
  (AST traversal in src/spaces.rs, the metric implementations under
  src/metrics/, Halstead operator counting, cognitive/cyclomatic
  scoring) are the kind of integer/branch-heavy code that can
  benefit from -C target-cpu tuning.
- bca-web (big-code-analysis-web): a long-running service doing the
  same work per request; the upstream README explicitly calls out
  "long-running services" and "analytics … engines" as the sweet spot.
A single fat binary keeps distribution simple (one artifact per OS/arch
in releases) while still letting modern CPUs run code tuned for them.
Proposed scope of investigation
- Feasibility / build matrix
  - Confirm cargo-sonic works against our edition 2024 + tree-sitter
    C-shim grammar crates (the vendored tree-sitter-ccomment,
    tree-sitter-mozcpp, tree-sitter-mozjs, tree-sitter-preproc,
    and the pinned external grammars).
  - Verify it composes with our existing release profile in the root
    Cargo.toml and any LTO settings.
- Benchmark the gain on a representative corpus (e.g. the
  big-code-analysis-output integration submodule or a Mozilla-central
  subtree) for:
  - generic x86-64 (current baseline)
  - x86-64-v3 (the broad modern baseline)
  - one Intel-leaning target (e.g. raptorlake) and one AMD-leaning
    target (e.g. znver4/znver5)

  so we can decide which --target-cpus list is worth shipping.
- Binary size / build time tradeoffs
  - Measure the binary-size and build-time cost of the chosen
    --target-cpus list and whether --compress=zstd is worth enabling.
  - Decide whether the release CI runner has enough cores to benefit
    from --parallelism N.
- Release pipeline integration
  - Wire cargo sonic build --release into the Linux release job only
    (cargo-sonic is Linux-only; macOS/Windows artifacts stay on
    plain cargo build).
  - Document the CARGO_SONIC_SELECTED_TARGET_CPU env var so users can
    verify which variant ran.
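For the benchmark step in the scope above, a small harness along the following lines could time each variant and report a speedup over the generic x86-64 baseline. It is a minimal sketch: the function names are hypothetical, it measures wall-clock time only, and the binary path and arguments for a corpus run are left to the caller.

```rust
use std::io;
use std::process::{Command, Stdio};
use std::time::Instant;

/// Run `bin` with `args` once and return the wall-clock time in seconds.
/// Output is discarded so terminal I/O does not dominate the measurement.
fn time_run(bin: &str, args: &[&str]) -> io::Result<f64> {
    let start = Instant::now();
    let status = Command::new(bin)
        .args(args)
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()?;
    if !status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "benchmarked command failed"));
    }
    Ok(start.elapsed().as_secs_f64())
}

/// Median of the samples; None for an empty slice.
fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("timings must not be NaN"));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    } else {
        Some(sorted[mid])
    }
}

/// Speedup of a candidate over the baseline; values above 1.0 mean faster.
fn speedup(baseline_secs: f64, candidate_secs: f64) -> f64 {
    baseline_secs / candidate_secs
}
```

Calling `time_run` several times per variant and comparing medians rather than single runs reduces the influence of cache warm-up and noisy CI neighbours. A dedicated tool would likely give tighter numbers, but this is enough to decide whether a target belongs in the shipped list.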
Non-goals
- macOS / Windows fat binaries (out of scope for cargo-sonic).
- Replacing the cargo install big-code-analysis-cli path; that
  continues to produce a host-native single-variant binary.
- Changing library consumers; this only affects the two distributed
  binaries.
Risks / open questions
- The loader uses memfd_create + execveat, which require a
  reasonably modern Linux kernel (≥ 3.17 for memfd_create, ≥ 3.19 for
  execveat). Confirm this matches our minimum-supported Linux.
- --target-cpus selection is essentially a policy decision; we want
  to balance binary size against coverage of real user CPUs.
- Need to confirm cargo-sonic plays nicely with our make pre-commit /
  CI clippy + test gates (only the release artifact step should change).
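To make the kernel-version risk concrete, the check below parses a kernel release string (the format printed by `uname -r`) and compares it against 3.19, the later of the two minimums listed above. It is a sketch with hypothetical function names. It looks at the version number only, so it cannot detect environments where a sandbox or seccomp policy blocks the syscalls on an otherwise new-enough kernel.

```rust
use std::fs;
use std::io;

/// Parse the leading "major.minor" from a release string such as
/// "5.15.0-91-generic". Returns None if either number is missing.
fn parse_kernel_release(release: &str) -> Option<(u32, u32)> {
    let mut parts = release.trim().split(|c: char| !c.is_ascii_digit());
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// True when the kernel is new enough for both memfd_create (3.17) and
/// execveat (3.19); unparseable strings are treated as unsupported.
fn supports_loader(release: &str) -> bool {
    match parse_kernel_release(release) {
        Some(version) => version >= (3, 19),
        None => false,
    }
}

/// Check the running kernel via /proc/sys/kernel/osrelease (Linux only).
fn running_kernel_supports_loader() -> io::Result<bool> {
    let release = fs::read_to_string("/proc/sys/kernel/osrelease")?;
    Ok(supports_loader(&release))
}
```

Tuple comparison in Rust is lexicographic, so `(3, 19)` compares correctly against `(4, 0)` or `(3, 17)` without any manual arithmetic.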
Acceptance
Either:
- A PR that wires cargo-sonic into the Linux release job with a
  documented --target-cpus list and a one-paragraph note in the
  README / book explaining the fat binary, or
- A short writeup on this issue concluding the gain doesn't justify
  the added complexity, with numbers to back the decision.
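As a starting point for the first option, a release helper could assemble the build command per OS so that only Linux goes through cargo-sonic. The function name `release_command` is hypothetical; the flags are the ones from the typical invocation shown in the Background section, and how the helper would plug into the existing release job is left open.

```rust
/// Build the argv for the release step. Linux gets the cargo-sonic
/// invocation with the chosen target-cpu list; every other OS (or an
/// empty list) keeps the plain `cargo build --release`.
fn release_command(os: &str, target_cpus: &[&str]) -> Vec<String> {
    let mut argv = vec!["cargo".to_string()];
    if os == "linux" && !target_cpus.is_empty() {
        argv.push("sonic".to_string());
        argv.push(format!("--target-cpus={}", target_cpus.join(",")));
    }
    argv.push("build".to_string());
    argv.push("--release".to_string());
    argv
}
```

Passing `std::env::consts::OS` as the first argument selects the branch for the machine running the script. Taking the OS as a parameter instead of using `cfg!` keeps both branches testable from any platform.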