fix(moe-shards): guard empty q4 dense FFN + wire --metal (closes #151) by deem0n · Pull Request #152 · chrishayuk/larql

deem0n · 2026-05-27T07:55:09Z

Summary

Two fixes for larql run --moe-shards on pure-MoE models (e.g. Gemma-4 26B A4B), reported in #151:

Empty interleaved_q4k.bin panic guard. resolve_ffn_weights panicked with range end index 3345408 out of range for slice of length 0 when the dense FFN mmap was 0 bytes. Pure-MoE vindexes ship no dense FFN tensor, so the fallback branch sliced into an empty mmap. Now guarded with q4_ffn_mmap.is_empty(): returns empty QuantWeight stubs — patch_pipeline_layers_for_remote_moe overwrites them downstream and moe_fn supersedes the dense FFN path during decode, so the stubs are never read.
--metal not wired into run_with_moe_shards. The function always used default_backend() instead of honouring the CLI flag. The fix mirrors the pattern that PR #122 (fix: wire --metal into remote FFN path, add post-FFN norms, flush stdout; a cherry-pick of #115) already applied to run_with_remote_ffn: explicit metal_backend() with CPU fallback on init failure, and a clear error when the gpu feature isn't compiled in.
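As an illustration of the first fix, here is a minimal sketch of an is_empty() guard placed ahead of the slicing fallback. It is not the code from pipeline_layer.rs: QuantWeightStub, resolve_ffn_weights_sketch, the bytes_per_tensor parameter and the gate/up/down interleaved layout are all assumptions made for the example.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the real `QuantWeight`; only the borrowed bytes matter here.
#[derive(Debug, PartialEq)]
pub struct QuantWeightStub<'a> {
    pub bytes: &'a [u8],
}

/// Sketch of the guard: return three empty stubs when the dense FFN mmap is
/// empty, otherwise slice gate/up/down out of the buffer.
/// Assumed layout (not confirmed by the PR): each layer stores three tensors of
/// `bytes_per_tensor` bytes back to back.
pub fn resolve_ffn_weights_sketch<'a>(
    q4_ffn_mmap: &'a [u8],
    layer: usize,
    bytes_per_tensor: usize,
) -> [QuantWeightStub<'a>; 3] {
    // Pure-MoE case: no dense FFN tensor was shipped, so there is nothing to slice.
    if q4_ffn_mmap.is_empty() {
        return [
            QuantWeightStub { bytes: &[] },
            QuantWeightStub { bytes: &[] },
            QuantWeightStub { bytes: &[] },
        ];
    }

    // Dense case: compute the byte range of each of the three tensors.
    let base = layer * 3 * bytes_per_tensor;
    let slot = |i: usize| -> Range<usize> {
        let start = base + i * bytes_per_tensor;
        start..start + bytes_per_tensor
    };

    // Indexing still panics if a non-empty buffer is too short; the guard above
    // only covers the zero-length case described in #151.
    [
        QuantWeightStub { bytes: &q4_ffn_mmap[slot(0)] },
        QuantWeightStub { bytes: &q4_ffn_mmap[slot(1)] },
        QuantWeightStub { bytes: &q4_ffn_mmap[slot(2)] },
    ]
}
```

Calling resolve_ffn_weights_sketch(&[], 5, 1024) returns three stubs with empty byte slices instead of panicking, which is the behaviour the regression test in this PR checks for on the real function.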

Both fixes were needed to get Gemma-4 26B A4B running with --moe-shards.

Files changed

crates/larql-compute/src/pipeline_layer.rs — is_empty() guard + regression test
crates/larql-cli/src/commands/primary/run_cmd.rs — new metal: bool param on run_with_moe_shards, mirroring run_with_remote_ffn's backend init, threaded from args.metal at the call site
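The backend selection described for run_cmd.rs can be sketched roughly as follows. This is an assumption-laden illustration, not the code from the PR: BackendKind and select_backend_sketch are hypothetical names, the gpu feature is modelled as a plain bool rather than a cfg attribute so the example compiles anywhere, and Metal initialisation is passed in as a closure in place of the real metal_backend().

```rust
/// Hypothetical backend tags used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendKind {
    /// Whatever `default_backend()` would pick when `--metal` is not passed.
    Default,
    /// Metal was requested and initialised successfully.
    Metal,
    /// Metal was requested but failed to initialise, so we fell back to CPU.
    Cpu,
}

/// Sketch of the selection logic:
/// - no `--metal`: keep the default backend
/// - `--metal` without the `gpu` feature: return a clear error
/// - `--metal` with the feature: try Metal, fall back to CPU on init failure
pub fn select_backend_sketch(
    metal: bool,
    gpu_compiled_in: bool,
    init_metal: impl FnOnce() -> Option<BackendKind>,
) -> Result<BackendKind, String> {
    if !metal {
        return Ok(BackendKind::Default);
    }
    if !gpu_compiled_in {
        return Err("--metal requested, but this build lacks the `gpu` feature".to_string());
    }
    // `None` models an initialisation failure; fall back instead of aborting.
    Ok(init_metal().unwrap_or(BackendKind::Cpu))
}
```

Passing the initialiser as a closure keeps the three outcomes (default, Metal, CPU fallback) testable without a GPU; in the real CLI the flag would come from args.metal at the call site.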

Net diff: 2 files, +60 / -1.

Test plan

Local verification on macOS, branched off current chrishayuk/larql:main (post-#145):

cargo check -p larql-compute -p larql-cli --lib --tests — clean
cargo clippy -p larql-compute -p larql-cli --lib --tests --no-deps -- -D warnings — clean
cargo fmt -p larql-compute -p larql-cli -- --check — clean
cargo test -p larql-compute --lib — 657 passed (includes new resolve_ffn_weights_returns_empty_stubs_when_q4_ffn_mmap_is_empty regression)
cargo test -p larql-cli --tests — 3 passed (1 ignored, model-heavy)

CI verification (all green — 15/15)

🤖 Generated with Claude Code

…shayuk#151)

Two fixes for `larql run --moe-shards` on pure-MoE models (e.g. Gemma-4 26B A4B), reported in chrishayuk#151:

1. `resolve_ffn_weights` panicked with `range end index 3345408 out of range for slice of length 0` when `interleaved_q4k.bin` is 0 bytes. Pure-MoE vindexes ship no dense FFN tensor, so the fallback branch sliced into an empty mmap. Guard with `q4_ffn_mmap.is_empty()` and return empty `QuantWeight` stubs: `patch_pipeline_layers_for_remote_moe` overwrites them downstream and `moe_fn` supersedes the dense FFN path during decode, so the stubs are never read.

2. `--metal` was not wired into `run_with_moe_shards`; it always used `default_backend()` instead of letting the CLI flag select Metal. Mirror the same pattern PR chrishayuk#122 applied to `run_with_remote_ffn`: explicit `metal_backend()` with CPU fallback on init failure, and a clear error when the `gpu` feature isn't compiled in.

Both were needed to get Gemma-4 26B A4B running with `--moe-shards`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chrishayuk merged commit 270269c into chrishayuk:main May 27, 2026
15 checks passed

chrishayuk merged 1 commit into chrishayuk:main from deem0n:fix/moe-shards-pure-moe-and-metal
