
fix(lance-linalg): gate f32 cosine dispatch behind cfg(target_feature=avx2) #3

Merged
tobocop2 merged 1 commit into tobocop2:fix/runtime-simd-multiversion from stumpylog:cosine-batch-dispatch-fix on Jun 11, 2026


@stumpylog

Collaborator

Follow-up from the benchmarks I posted on lance-format#6630 (comment): the per-vector dispatch regressed the f32 cosine batch path by 24-36% on AVX2-baseline builds.

This gates the f32 cosine dispatch behind cfg(target_feature = "avx2"):

  • AVX2-baseline builds (the default haswell wheel): inlined base-equivalent kernels plus a single per-batch AVX-512 check (large dims only) — no per-vector dispatch tax.
  • Sub-AVX2 builds: your runtime dispatch path, unchanged.
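For readers who have not seen this pattern, here is a minimal sketch of the compile-time gate described in the two bullets above. It is not the actual lance-linalg code: every name in it (`cosine_scalar`, `cosine_dispatched`, `cosine_avx2`, `cosine_batch`) is hypothetical, and the per-batch AVX-512 check for large dims is left out to keep the example short.

```rust
// Illustrative sketch only; all names are hypothetical, not lance-linalg APIs.

/// Portable kernel. On an AVX2-baseline build the compiler is free to
/// vectorize this with AVX2, since the feature is enabled for the whole crate.
#[inline]
fn cosine_scalar(x: &[f32], y: &[f32]) -> f32 {
    let (mut xy, mut xx, mut yy) = (0.0f32, 0.0f32, 0.0f32);
    for (a, b) in x.iter().zip(y) {
        xy += a * b;
        xx += a * a;
        yy += b * b;
    }
    1.0 - xy / (xx.sqrt() * yy.sqrt())
}

/// AVX2-baseline build: AVX2 is known at compile time, so the batch loop
/// calls the inlined kernel directly with no per-vector dispatch.
/// (The PR additionally does one per-batch AVX-512 check; omitted here.)
#[cfg(target_feature = "avx2")]
pub fn cosine_batch(query: &[f32], batch: &[f32], dim: usize) -> Vec<f32> {
    batch.chunks_exact(dim).map(|v| cosine_scalar(query, v)).collect()
}

/// Sub-AVX2 build: keep a runtime-dispatched kernel for each vector.
#[cfg(not(target_feature = "avx2"))]
pub fn cosine_batch(query: &[f32], batch: &[f32], dim: usize) -> Vec<f32> {
    batch.chunks_exact(dim).map(|v| cosine_dispatched(query, v)).collect()
}

/// Runtime dispatch: check the CPU on each call, fall back to the portable kernel.
#[cfg(not(target_feature = "avx2"))]
fn cosine_dispatched(x: &[f32], y: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above.
            return unsafe { cosine_avx2(x, y) };
        }
    }
    cosine_scalar(x, y)
}

/// Same kernel body, compiled with AVX2 enabled for this function only.
#[cfg(all(target_arch = "x86_64", not(target_feature = "avx2")))]
#[target_feature(enable = "avx2")]
unsafe fn cosine_avx2(x: &[f32], y: &[f32]) -> f32 {
    cosine_scalar(x, y)
}
```

Building with `RUSTFLAGS="-C target-cpu=haswell"` should select the first `cosine_batch`; a default `x86_64` build selects the dispatched one. Both return cosine distances, one per `dim`-sized chunk of `batch`.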

Same-session benchmarks (Broadwell AVX2 + Zen4 AVX-512): dim8 flat on both boxes; dim1024 +6% on Broadwell (codegen-unit residual, not dispatch) and -7% on Zen4 (the AVX-512 tier engages on the default wheel for the first time).

Scope: just the cosine path. dot/l2/norm_l2 have the same shape (l2 measured +6.4%); happy to extend the gate there in this PR or as a follow-up, whichever you prefer.

🤖 Generated with Claude Code

…=avx2)

On AVX2-baseline builds (the default haswell wheel), cosine_batch uses inlined
non-target_feature kernels (base-equivalent) plus a single per-batch AVX-512
check, avoiding the per-vector runtime-dispatch + target_feature tax that
regressed the modern path. Sub-AVX2 builds keep the runtime dispatch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions (bot) added the bug label on Jun 10, 2026
@tobocop2

Owner

awesome, thank you!

@tobocop2 tobocop2 merged commit 8577f3b into tobocop2:fix/runtime-simd-multiversion Jun 11, 2026
2 checks passed