Migrate x86 SSE2 SIMD to ARM64 NEON #19

Draft
Copilot wants to merge 4 commits into main from copilot/migrate-repo-to-arm-ad9d0276-d278-4433-836d-95c609abb4b6

Copilot AI commented Apr 15, 2026

All compute-hot paths used x86-only SSE2 intrinsics (<immintrin.h>), making the codebase non-portable to ARM64.

Changes

  • All 5 compute files (matrix_operations, hash_operations, string_search, memory_operations, polynomial_eval): wrapped the existing SSE2 paths in #ifdef __x86_64__, added #elif defined(__aarch64__) NEON paths, and kept a scalar fallback for other architectures
  • <immintrin.h> → <arm_neon.h> under the aarch64 guard
  • main.cpp: added __aarch64__ NEON detection branch
  • Docker: ubuntu:22.04 already multi-arch; updated comment only
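
The main.cpp detection branch is not reproduced in this description. As a rough sketch, assuming detection is done at compile time with the same macros as the compute files, it might look like the following. The name simd_backend() is hypothetical and not taken from the repository.

```cpp
#include <string>

#if defined(__x86_64__)
  #include <immintrin.h>
#elif defined(__aarch64__)
  #include <arm_neon.h>
#endif

// Hypothetical helper: reports which SIMD path this build was compiled with.
inline std::string simd_backend() {
#if defined(__x86_64__)
    return "SSE2";    // SSE2 is part of the x86-64 baseline
#elif defined(__aarch64__)
    return "NEON";    // Advanced SIMD is enabled by default for AArch64 targets
#else
    return "scalar";  // any other architecture uses the scalar fallback
#endif
}
```

A startup banner could then print simd_backend() so that logs show which path a given binary uses.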

Bug fixes (discovered during migration)

  • hash_operations.cpp: _mm_extract_epi16 was called with a runtime-variable lane index. Lane indices must be compile-time constants in both SSE2 and NEON intrinsics, so both paths now store the vector to a local byte array (_mm_storeu_si128 / vst1q_u8) and index that array instead.
  • string_search.cpp: no direct NEON equivalent for _mm_movemask_epi8; added neon_movemask_epi8() helper.

Example pattern applied across all files

// Before
#include <immintrin.h>
__m128d a = _mm_loadu_pd(ptr);
__m128d b = _mm_mul_pd(a, scale);

// After
#ifdef __x86_64__
  #include <immintrin.h>
  __m128d a = _mm_loadu_pd(ptr);
  __m128d b = _mm_mul_pd(a, scale);       // scale is a __m128d here
#elif defined(__aarch64__)
  #include <arm_neon.h>
  float64x2_t a = vld1q_f64(ptr);
  float64x2_t b = vmulq_f64(a, scale);    // scale is a float64x2_t here
#else
  // scalar fallback
#endif
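
The pattern above is abbreviated. A self-contained version, written as a sketch under the assumption that the operation is an in-place scale of a double array, could look like this. The function name scale_in_place() is hypothetical; the includes sit at file scope and the guards are repeated inside the function body.

```cpp
#include <cstddef>

#if defined(__x86_64__)
  #include <immintrin.h>
#elif defined(__aarch64__)
  #include <arm_neon.h>
#endif

// Hypothetical helper: multiplies n doubles in place by `factor`.
inline void scale_in_place(double* data, std::size_t n, double factor) {
    std::size_t i = 0;
#if defined(__x86_64__)
    const __m128d scale = _mm_set1_pd(factor);
    for (; i + 2 <= n; i += 2) {
        __m128d a = _mm_loadu_pd(data + i);
        _mm_storeu_pd(data + i, _mm_mul_pd(a, scale));
    }
#elif defined(__aarch64__)
    const float64x2_t scale = vdupq_n_f64(factor);
    for (; i + 2 <= n; i += 2) {
        float64x2_t a = vld1q_f64(data + i);
        vst1q_f64(data + i, vmulq_f64(a, scale));
    }
#endif
    // Scalar tail; on other architectures this loop handles every element.
    for (; i < n; ++i) data[i] *= factor;
}
```

Sharing one scalar loop between the tail and the fallback keeps the three paths from drifting apart.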

Copilot AI linked an issue Apr 15, 2026 that may be closed by this pull request
Copilot AI and others added 3 commits April 15, 2026 16:15
Add #elif defined(__aarch64__) ARM NEON paths alongside all existing
x86-64 SSE2 paths in every translation unit.  The scalar fallback is
retained for other architectures.

Intrinsic mapping applied
--------------------------
matrix_operations.cpp
  __m128d              -> float64x2_t
  _mm_setzero_pd()     -> vdupq_n_f64(0.0)
  _mm_loadu_pd(p)      -> vld1q_f64(p)
  _mm_set_pd(hi,lo)    -> local double[2] + vld1q_f64
  _mm_mul_pd(a,b)      -> vmulq_f64(a,b)
  _mm_add_pd(a,b)      -> vaddq_f64(a,b)
  _mm_storeu_pd / horiz-add -> vgetq_lane_f64(v,0)+vgetq_lane_f64(v,1)
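
To illustrate how the matrix_operations.cpp mappings fit together, here is a minimal sketch of a dot product, the inner kernel a matrix multiply would typically use. The function dot() is a hypothetical stand-in, not the repository's code; it shows the accumulate-then-horizontal-add step on both paths.

```cpp
#include <cstddef>

#if defined(__x86_64__)
  #include <immintrin.h>
#elif defined(__aarch64__)
  #include <arm_neon.h>
#endif

// Hypothetical kernel: dot product of two double arrays of length n.
inline double dot(const double* a, const double* b, std::size_t n) {
    std::size_t i = 0;
    double sum = 0.0;
#if defined(__x86_64__)
    __m128d acc = _mm_setzero_pd();
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_pd(acc, _mm_mul_pd(_mm_loadu_pd(a + i), _mm_loadu_pd(b + i)));
    double lanes[2];
    _mm_storeu_pd(lanes, acc);          // horizontal add via a store
    sum = lanes[0] + lanes[1];
#elif defined(__aarch64__)
    float64x2_t acc = vdupq_n_f64(0.0);
    for (; i + 2 <= n; i += 2)
        acc = vaddq_f64(acc, vmulq_f64(vld1q_f64(a + i), vld1q_f64(b + i)));
    // Lane indices 0 and 1 are compile-time constants, as NEON requires.
    sum = vgetq_lane_f64(acc, 0) + vgetq_lane_f64(acc, 1);
#endif
    for (; i < n; ++i) sum += a[i] * b[i];  // scalar tail / fallback
    return sum;
}
```

For example, dot() over {1, 2, 3, 4, 5} and {6, 7, 8, 9, 10} yields 130 on every path.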

hash_operations.cpp
  _mm_loadu_si128      -> vld1q_u8
  _mm_extract_epi16(v, j/2) with runtime j -> store to a local uint8_t[16]
                          (_mm_storeu_si128 / vst1q_u8), then iterate.
                          The lane index must be a compile-time constant on
                          both SSE2 and NEON, so this is fixed on BOTH paths.
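
The store-then-iterate fix can be sketched as follows. The mixing step (h * 31 + byte) and the name hash_block16() are assumptions made for illustration; the repository's actual hash function is not shown in this PR.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__x86_64__)
  #include <immintrin.h>
#elif defined(__aarch64__)
  #include <arm_neon.h>
#endif

// Hypothetical helper: folds 16 input bytes into a running 32-bit hash.
inline std::uint32_t hash_block16(const std::uint8_t* p, std::uint32_t h) {
    std::uint8_t bytes[16];
#if defined(__x86_64__)
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(bytes), v);
#elif defined(__aarch64__)
    uint8x16_t v = vld1q_u8(p);
    vst1q_u8(bytes, v);
#else
    for (int j = 0; j < 16; ++j) bytes[j] = p[j];
#endif
    // j is a runtime index, which is fine for a plain array but not for
    // the lane argument of _mm_extract_epi16 or vgetq_lane_u8.
    for (int j = 0; j < 16; ++j)
        h = h * 31u + bytes[j];
    return h;
}
```

Spilling to a stack array costs one extra store, but it keeps the per-byte loop identical on all three paths.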

string_search.cpp
  _mm_set1_epi8(c)     -> vdupq_n_u8(c)
  _mm_loadu_si128      -> vld1q_u8
  _mm_cmpeq_epi8(a,b)  -> vceqq_u8(a,b)  (0xFF/0x00 per lane — same semantics)
  _mm_movemask_epi8    -> neon_movemask_epi8() helper:
                          vandq_u8 with power-of-2 mask + three rounds of
                          vpadd_u8, then vget_lane_u8 × 2
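
One way to write the helper described above is sketched below, wrapped in a hypothetical match_mask16() function so the same call works on every architecture. The body of neon_movemask_epi8() is an assumption based on the description (bit mask, three vpadd_u8 rounds, two lane reads); it matches _mm_movemask_epi8 only when each input lane is 0xFF or 0x00, as a compare result is.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__x86_64__)
  #include <immintrin.h>
#elif defined(__aarch64__)
  #include <arm_neon.h>

// Sketch of the helper: packs one bit per byte lane into a 16-bit mask.
inline int neon_movemask_epi8(uint8x16_t v) {
    static const std::uint8_t bits[16] = {1, 2, 4, 8, 16, 32, 64, 128,
                                          1, 2, 4, 8, 16, 32, 64, 128};
    uint8x16_t masked = vandq_u8(v, vld1q_u8(bits));
    // Three pairwise-add rounds collapse each 8-lane half into one byte.
    uint8x8_t s = vpadd_u8(vget_low_u8(masked), vget_high_u8(masked));
    s = vpadd_u8(s, s);
    s = vpadd_u8(s, s);
    return vget_lane_u8(s, 0) | (vget_lane_u8(s, 1) << 8);
}
#endif

// Hypothetical wrapper: bit i of the result is set when p[i] == c (i < 16).
inline unsigned match_mask16(const std::uint8_t* p, std::uint8_t c) {
#if defined(__x86_64__)
    __m128i needle = _mm_set1_epi8(static_cast<char>(c));
    __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    return static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle)));
#elif defined(__aarch64__)
    uint8x16_t eq = vceqq_u8(vld1q_u8(p), vdupq_n_u8(c));
    return static_cast<unsigned>(neon_movemask_epi8(eq));
#else
    unsigned mask = 0;
    for (unsigned i = 0; i < 16; ++i)
        if (p[i] == c) mask |= 1u << i;
    return mask;
#endif
}
```

Searching the 16 bytes "abcabcabcabcabca" for 'a' gives the mask 0x9249, with bits 0, 3, 6, 9, 12 and 15 set.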

memory_operations.cpp
  _mm_loadu_si128      -> vld1q_u8
  _mm_storeu_si128     -> vst1q_u8
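
As an illustration of the load/store pair, the sketch below copies a buffer 16 bytes at a time. The function copy_bytes() is hypothetical; the actual routines in memory_operations.cpp are not shown in this PR.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__x86_64__)
  #include <immintrin.h>
#elif defined(__aarch64__)
  #include <arm_neon.h>
#endif

// Hypothetical helper: copies n bytes from src to dst (non-overlapping).
inline void copy_bytes(std::uint8_t* dst, const std::uint8_t* src, std::size_t n) {
    std::size_t i = 0;
#if defined(__x86_64__)
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), v);
    }
#elif defined(__aarch64__)
    for (; i + 16 <= n; i += 16)
        vst1q_u8(dst + i, vld1q_u8(src + i));
#endif
    for (; i < n; ++i) dst[i] = src[i];  // scalar tail / fallback
}
```

Both vld1q_u8 and the unaligned SSE2 load accept pointers without 16-byte alignment, so no alignment prologue is needed in this sketch.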

polynomial_eval.cpp
  __m128d              -> float64x2_t
  _mm_setzero_pd()     -> vdupq_n_f64(0.0)
  _mm_set1_pd(x)       -> vdupq_n_f64(x)
  _mm_set_pd(hi,lo)    -> local double[2] + vld1q_f64
  _mm_loadu_pd / coeffs.data()+i -> vld1q_f64(coeffs.data()+i) (contiguous)
  _mm_mul_pd           -> vmulq_f64
  _mm_add_pd           -> vaddq_f64
  _mm_storeu_pd / horiz-add -> vgetq_lane_f64(v,0)+vgetq_lane_f64(v,1)
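
The sketch below shows how these mappings could combine into a two-lane polynomial evaluation, including the local double[2] replacement for _mm_set_pd. The algorithm (a running vector of powers stepped by x*x) and the name poly_eval() are assumptions; the PR does not show the real polynomial_eval.cpp loop.

```cpp
#include <cstddef>

#if defined(__x86_64__)
  #include <immintrin.h>
#elif defined(__aarch64__)
  #include <arm_neon.h>
#endif

// Hypothetical kernel: evaluates c[0] + c[1]*x + ... + c[n-1]*x^(n-1).
inline double poly_eval(const double* c, std::size_t n, double x) {
    std::size_t i = 0;
    double sum = 0.0;
    double power = 1.0;  // x^i for the scalar tail
#if defined(__x86_64__)
    __m128d acc = _mm_setzero_pd();
    __m128d pw = _mm_set_pd(x, 1.0);            // lanes: lo = x^0, hi = x^1
    const __m128d step = _mm_set1_pd(x * x);
    for (; i + 2 <= n; i += 2) {
        acc = _mm_add_pd(acc, _mm_mul_pd(_mm_loadu_pd(c + i), pw));
        pw = _mm_mul_pd(pw, step);
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    sum = lanes[0] + lanes[1];
    _mm_storeu_pd(lanes, pw);
    power = lanes[0];
#elif defined(__aarch64__)
    float64x2_t acc = vdupq_n_f64(0.0);
    const double init[2] = {1.0, x};            // replaces _mm_set_pd(x, 1.0)
    float64x2_t pw = vld1q_f64(init);
    const float64x2_t step = vdupq_n_f64(x * x);
    for (; i + 2 <= n; i += 2) {
        acc = vaddq_f64(acc, vmulq_f64(vld1q_f64(c + i), pw));
        pw = vmulq_f64(pw, step);
    }
    sum = vgetq_lane_f64(acc, 0) + vgetq_lane_f64(acc, 1);
    power = vgetq_lane_f64(pw, 0);
#endif
    for (; i < n; ++i) { sum += c[i] * power; power *= x; }  // tail / fallback
    return sum;
}
```

Note the argument order: _mm_set_pd takes (hi, lo), while the array passed to vld1q_f64 is in memory order, so the two values swap places.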

Dockerfile
  ubuntu:22.04 is a multi-arch manifest (amd64 + arm64) — no base-image
  change needed.  Updated comment to document dual-arch SIMD support.
  Build flags are already architecture-neutral (-O2 -std=c++11).

Co-authored-by: JoeStech <4088382+JoeStech@users.noreply.github.com>
…warning build)

Co-authored-by: JoeStech <4088382+JoeStech@users.noreply.github.com>
Co-authored-by: JoeStech <4088382+JoeStech@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Migrate this repo to Arm using MCP server tools" to "Migrate x86 SSE2 SIMD to ARM64 NEON" Apr 15, 2026
Copilot AI requested a review from JoeStech April 15, 2026 16:25

Development

Successfully merging this pull request may close these issues.

Migrate this repo to Arm