Migrate x86 SSE2 SIMD to ARM64 NEON by Copilot · Pull Request #19 · arm/arm-migration-example

Copilot · 2026-04-15T15:35:57Z

All performance-critical paths used x86-only SSE2 intrinsics (<immintrin.h>), so the codebase could not be built for ARM64.

Changes

All 5 compute files (matrix_operations, hash_operations, string_search, memory_operations, polynomial_eval): wrapped the existing SSE2 paths in #ifdef __x86_64__, added #elif defined(__aarch64__) NEON paths, and kept scalar fallbacks under #else for other architectures
<immintrin.h> is now included only under the __x86_64__ guard, and <arm_neon.h> under the __aarch64__ guard
main.cpp: added __aarch64__ NEON detection branch
Docker: ubuntu:22.04 is already a multi-arch image (amd64 + arm64); only a comment was updated

Bug fixes (discovered during migration)

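hash_operations.cpp: _mm_extract_epi16 was called with a runtime-variable lane index. Both _mm_extract_epi16 and the NEON lane-extract intrinsics require the lane index to be a compile-time constant, so this was invalid on both architectures. Fixed on both paths by storing the vector into a local uint8_t[16] array (_mm_storeu_si128 / vst1q_u8) and iterating over it.
string_search.cpp: NEON has no direct equivalent of _mm_movemask_epi8, so a neon_movemask_epi8() helper was added.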
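The store-to-array fix can be sketched as follows. hash_block16() and its multiply-by-31 mixing step are hypothetical stand-ins, because the PR does not show the real hash; only the lane-access pattern mirrors the description. To keep the sketch short, the vector is simply loaded from memory, whereas in hash_operations.cpp it may hold the result of earlier SIMD work.

```cpp
#include <cstdint>
#ifdef __x86_64__
  #include <immintrin.h>
#elif defined(__aarch64__)
  #include <arm_neon.h>
#endif

// Hypothetical hash step (hash = hash * 31 + byte), used only to show the
// lane-access pattern: store the vector to a local array, then index it
// with a runtime variable instead of calling _mm_extract_epi16(v, j / 2).
uint32_t hash_block16(const uint8_t* p, uint32_t hash) {
    uint8_t bytes[16];
#ifdef __x86_64__
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(bytes), v);
#elif defined(__aarch64__)
    uint8x16_t v = vld1q_u8(p);
    vst1q_u8(bytes, v);
#else
    for (int k = 0; k < 16; ++k) bytes[k] = p[k];  // scalar fallback copy
#endif
    for (int j = 0; j < 16; ++j) {
        // A runtime index into a plain array is valid on every target.
        hash = hash * 31u + bytes[j];
    }
    return hash;
}
```

Since the loop only touches an ordinary array, the same iteration code serves the SSE2, NEON, and scalar builds; only the store intrinsic differs.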

Example pattern applied across all files

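// Before
#include <immintrin.h>
__m128d a = _mm_loadu_pd(ptr);
__m128d b = _mm_mul_pd(a, scale);

// After: headers, at file scope
#ifdef __x86_64__
  #include <immintrin.h>
#elif defined(__aarch64__)
  #include <arm_neon.h>
#endif

// ...and each SIMD block guarded the same way
#ifdef __x86_64__
  __m128d a = _mm_loadu_pd(ptr);        // scale is an __m128d
  __m128d b = _mm_mul_pd(a, scale);
#elif defined(__aarch64__)
  float64x2_t a = vld1q_f64(ptr);       // scale is a float64x2_t
  float64x2_t b = vmulq_f64(a, scale);
#else
  // scalar fallback: multiply each element by the scale factor
#endif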
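The same pattern carried through a complete function, as a sketch: dot() is a hypothetical kernel, not code from matrix_operations.cpp, but it uses the matrix_operations.cpp mappings listed in the commit message (zeroed accumulator, unaligned loads, multiply, add, horizontal add) and handles an odd element count with a scalar tail.

```cpp
#include <cstddef>
#ifdef __x86_64__
  #include <immintrin.h>
#elif defined(__aarch64__)
  #include <arm_neon.h>
#endif

// Hypothetical kernel: dot product of two double arrays, two lanes per
// iteration, followed by a scalar loop for any leftover element.
double dot(const double* a, const double* b, std::size_t n) {
    std::size_t i = 0;
    double sum = 0.0;
#ifdef __x86_64__
    __m128d acc = _mm_setzero_pd();
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_pd(acc, _mm_mul_pd(_mm_loadu_pd(a + i), _mm_loadu_pd(b + i)));
    double lanes[2];
    _mm_storeu_pd(lanes, acc);             // horizontal add via a store
    sum = lanes[0] + lanes[1];
#elif defined(__aarch64__)
    float64x2_t acc = vdupq_n_f64(0.0);
    for (; i + 2 <= n; i += 2)
        acc = vaddq_f64(acc, vmulq_f64(vld1q_f64(a + i), vld1q_f64(b + i)));
    sum = vgetq_lane_f64(acc, 0) + vgetq_lane_f64(acc, 1);  // horizontal add
#endif
    for (; i < n; ++i) sum += a[i] * b[i];  // scalar tail, or the whole loop elsewhere
    return sum;
}
```

Because each lane keeps its own partial sum, the SIMD result can differ from a strictly left-to-right scalar sum in the last bits for general floating-point inputs.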

Add #elif defined(__aarch64__) ARM NEON paths alongside all existing x86-64 SSE2 paths in every translation unit. The scalar fallback is retained for other architectures.

Intrinsic mapping applied
--------------------------

matrix_operations.cpp
  __m128d                    -> float64x2_t
  _mm_setzero_pd()           -> vdupq_n_f64(0.0)
  _mm_loadu_pd(p)            -> vld1q_f64(p)
  _mm_set_pd(hi, lo)         -> local double[2] = {lo, hi} + vld1q_f64 (lane 0 = lo)
  _mm_mul_pd(a, b)           -> vmulq_f64(a, b)
  _mm_add_pd(a, b)           -> vaddq_f64(a, b)
  _mm_storeu_pd / horiz-add  -> vgetq_lane_f64(v, 0) + vgetq_lane_f64(v, 1)

hash_operations.cpp
  _mm_loadu_si128            -> vld1q_u8
  _mm_extract_epi16(v, j/2)  -> variable lane index, ILLEGAL on both SSE2 and NEON when j is a runtime variable; fixed on BOTH paths: _mm_storeu_si128 / vst1q_u8 to a local uint8_t[16] array, then iterate

string_search.cpp
  _mm_set1_epi8(c)           -> vdupq_n_u8(c)
  _mm_loadu_si128            -> vld1q_u8
  _mm_cmpeq_epi8(a, b)       -> vceqq_u8(a, b) (0xFF/0x00 per lane; same semantics)
  _mm_movemask_epi8          -> neon_movemask_epi8() helper: vandq_u8 with a power-of-2 mask, three rounds of vpadd_u8, then vget_lane_u8 × 2

memory_operations.cpp
  _mm_loadu_si128            -> vld1q_u8
  _mm_storeu_si128           -> vst1q_u8

polynomial_eval.cpp
  __m128d                    -> float64x2_t
  _mm_setzero_pd()           -> vdupq_n_f64(0.0)
  _mm_set1_pd(x)             -> vdupq_n_f64(x)
  _mm_set_pd(hi, lo)         -> local double[2] = {lo, hi} + vld1q_f64 (lane 0 = lo)
  _mm_loadu_pd / coeffs.data()+i -> vld1q_f64(coeffs.data()+i) (contiguous)
  _mm_mul_pd                 -> vmulq_f64
  _mm_add_pd                 -> vaddq_f64
  _mm_storeu_pd / horiz-add  -> vgetq_lane_f64(v, 0) + vgetq_lane_f64(v, 1)

Dockerfile
  ubuntu:22.04 is a multi-arch manifest (amd64 + arm64), so no base-image change is needed. Updated the comment to document dual-arch SIMD support. Build flags are already architecture-neutral (-O2 -std=c++11).

Co-authored-by: JoeStech <4088382+JoeStech@users.noreply.github.com>
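A sketch of the neon_movemask_epi8() technique described in the commit message (AND with per-lane power-of-2 bits, three vpadd_u8 rounds, two vget_lane_u8 reads), wired into a hypothetical find_byte() search to show how it stands in for _mm_movemask_epi8. The helper body is a reconstruction from that description, not the PR's exact code; __builtin_ctz assumes GCC or Clang. The helper relies on every input lane being 0xFF or 0x00, as vceqq_u8 guarantees.

```cpp
#include <cstddef>
#include <cstdint>
#ifdef __x86_64__
  #include <immintrin.h>
#elif defined(__aarch64__)
  #include <arm_neon.h>
#endif

#if defined(__aarch64__)
// Emulates _mm_movemask_epi8 for lanes that are 0xFF or 0x00:
// bit k of the result is set iff lane k is 0xFF.
static inline uint16_t neon_movemask_epi8(uint8x16_t v) {
    // One distinct power-of-2 bit per lane within each 8-lane half.
    static const uint8_t kBits[16] = {1, 2, 4, 8, 16, 32, 64, 128,
                                      1, 2, 4, 8, 16, 32, 64, 128};
    uint8x16_t masked = vandq_u8(v, vld1q_u8(kBits));
    // Three pairwise-add rounds collapse each half into one byte; the bits
    // are distinct, so no sum exceeds 255.
    uint8x8_t s = vpadd_u8(vget_low_u8(masked), vget_high_u8(masked));
    s = vpadd_u8(s, s);
    s = vpadd_u8(s, s);
    // Lane 0 holds the mask for bytes 0..7, lane 1 for bytes 8..15.
    return static_cast<uint16_t>(vget_lane_u8(s, 0) | (vget_lane_u8(s, 1) << 8));
}
#endif

// Hypothetical search: index of the first byte equal to c in p[0..n), or -1.
long find_byte(const uint8_t* p, std::size_t n, uint8_t c) {
    std::size_t i = 0;
#ifdef __x86_64__
    const __m128i needle = _mm_set1_epi8(static_cast<char>(c));
    for (; i + 16 <= n; i += 16) {
        __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p + i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle));
        if (mask != 0) return static_cast<long>(i) + __builtin_ctz(mask);
    }
#elif defined(__aarch64__)
    const uint8x16_t needle = vdupq_n_u8(c);
    for (; i + 16 <= n; i += 16) {
        uint16_t mask = neon_movemask_epi8(vceqq_u8(vld1q_u8(p + i), needle));
        if (mask != 0) return static_cast<long>(i) + __builtin_ctz(mask);
    }
#endif
    for (; i < n; ++i)  // scalar tail, or the whole search elsewhere
        if (p[i] == c) return static_cast<long>(i);
    return -1;
}
```

The lowest set bit of the mask marks the first matching lane on both paths, which is why the same __builtin_ctz step works after either movemask.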


Initial plan

5e5c2df

Copilot AI assigned Copilot and JoeStech Apr 15, 2026

Copilot AI linked an issue Apr 15, 2026 that may be closed by this pull request

Migrate this repo to Arm #18

Open

Copilot started work on behalf of JoeStech April 15, 2026 15:36

Copilot AI and others added 3 commits April 15, 2026 16:15

Fix: remove unused x_vec variable in polynomial_eval SSE2 path (zero-warning build)

8fb222a

Co-authored-by: JoeStech <4088382+JoeStech@users.noreply.github.com>

Remove binary benchmark artifact

797a239

Co-authored-by: JoeStech <4088382+JoeStech@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Migrate this repo to Arm using MCP server tools~~ Migrate x86 SSE2 SIMD to ARM64 NEON Apr 15, 2026

Copilot stopped work on behalf of JoeStech due to an error April 15, 2026 16:25
The "Docker pull ARM MCP image" custom setup step from your .github/workflows/copilot-setup-steps.yml file failed.

Copilot AI requested a review from JoeStech April 15, 2026 16:25

Copilot wants to merge 4 commits into main from copilot/migrate-repo-to-arm-ad9d0276-d278-4433-836d-95c609abb4b6


2 participants
