# This starter workflow is for a CMake project running on multiple platforms. There is a different starter workflow if you just want a single platform.
# Set fail-fast to false to ensure that feedback is delivered for all matrix combinations. Consider changing this to true when your workflow is stable.
fail-fast: false
-# Set up a matrix to run the following 3 configurations:
-# 1. <Windows, Release, latest MSVC compiler toolchain on the default runner image, default generator>
-# 2. <Linux, Release, latest GCC compiler toolchain on the default runner image, default generator>
-# 3. <Linux, Release, latest Clang compiler toolchain on the default runner image, default generator>
-#
-# To add more build types (Release, Debug, RelWithDebInfo, etc.) customize the build_type list.
# Build your program with the given configuration. Note that --config is needed because the default Windows generator is a multi-config generator (Visual Studio generator).
# Execute tests defined by the CMake configuration. Note that --build-config is needed because the default Windows generator is a multi-config generator (Visual Studio generator).
-# See https://cmake.org/cmake/help/latest/manual/ctest.1.html for more detail
CHANGELOG.md: 28 additions, 0 deletions
@@ -1,5 +1,33 @@
# Changelog

+## [0.3.0] - 2026-03-18
+
+### Added
+- **`Vector4d` SIMD arithmetic**: all operators now have AVX (x86) and NEON (AArch64) paths instead of falling back to the scalar `Vector4<double>` base class:
+- **FMA3 in `Matrix4d` mat×mat and mat×vec** (x86): `mul + add` pairs replaced with `_mm256_fmadd_pd` when compiled with `-mfma` (`__FMA__` defined); reduces 7 instructions to 4 per accumulation step
+- **`Matrix4f * Vector4f` ARM NEON**: previously fell back to scalar; now uses two `vpaddq_f32` passes to compute all four dot products simultaneously
+- **`-mfma` compiler flag** added to the x86 CMake path (gcc/clang: `-mavx -mfma`; MSVC: `/arch:AVX2`)
+- Benchmarks for all new operations: `BM_Vector4dSIMDAdd`, `BM_Vector4dSIMDScalarMultiply`, `BM_Vector4dSIMDDot`, `BM_Matrix4dSIMDAdd`, `BM_Matrix4dSIMDScalarMultiply`, `BM_Matrix4fByVectorGeneric`, with scalar and GLM baselines
+- **AArch64 NEON** full implementation for `Matrix4d` matrix–matrix and matrix–vector multiply (`float64x2_t`, `vfmaq_f64`, `vpaddq_f64`)
+- CMake **install support**: `GNUInstallDirs`, `CMakePackageConfigHelpers`, package config files (`vector_mathConfig.cmake`, `vector_mathConfigVersion.cmake`), and `INSTALL_INTERFACE` include paths
+- Benchmark suite expanded: `Matrix4f`, `Matrix4d`, `Quaternion`, and GLM comparison benchmarks; removed dummy `BM_StringCreation`
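The FMA3 entry in the list above can be illustrated with a rough sketch of one matrix-vector accumulation. This is not the library's code: the function name `mat4d_mul_vec4d`, the raw-pointer interface, and the column-major storage order are all assumptions made for the example. It shows only how a `__FMA__` guard might switch between a fused `_mm256_fmadd_pd` and a separate multiply and add, with a scalar path for builds without AVX.

```cpp
#include <cmath>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper, not the library's API: out = M * v for a 4x4 double
// matrix stored column-major in 16 contiguous doubles (an assumption).
inline void mat4d_mul_vec4d(const double* m, const double* v, double* out) {
#if defined(__AVX__)
    // Start with column 0 scaled by v[0], then accumulate the other columns.
    __m256d acc = _mm256_mul_pd(_mm256_loadu_pd(m), _mm256_set1_pd(v[0]));
    for (int c = 1; c < 4; ++c) {
        const __m256d col = _mm256_loadu_pd(m + 4 * c);
        const __m256d s = _mm256_set1_pd(v[c]);
#if defined(__FMA__)
        acc = _mm256_fmadd_pd(col, s, acc);               // fused: col * s + acc
#else
        acc = _mm256_add_pd(acc, _mm256_mul_pd(col, s));  // separate mul, then add
#endif
    }
    _mm256_storeu_pd(out, acc);
#else
    // Portable fallback; std::fma keeps the fused multiply-add semantics.
    for (int r = 0; r < 4; ++r) {
        double acc = m[r] * v[0];
        for (int c = 1; c < 4; ++c) acc = std::fma(m[4 * c + r], v[c], acc);
        out[r] = acc;
    }
#endif
}
```

Because the fused form rounds once per step instead of twice, results can differ from the unfused path in the last bit for general inputs, so comparisons between the two builds may need a tolerance.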
+
+### Changed
+- `Matrix4d` and `Matrix4f` implementations moved from `.cpp` translation units to **header-only inline** methods; `src/matrix4d.cpp` and `src/matrix4f.cpp` removed
+- `Matrix4::identity()` now returns a cached `static const` instance (computed once via IIFE) instead of allocating a local array on every call
+- CI: added **`ubuntu-24.04-arm`** runner (AArch64 NEON coverage); added **Debug** build type alongside Release; `ctest` now runs with `--output-on-failure`
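The `identity()` entry in the list above describes a cached `static const` instance built by an immediately invoked lambda (IIFE). A minimal sketch of that pattern follows. The `Mat4` type, its `std::array` storage, and the choice to return a `const` reference are assumptions for illustration; the real `Matrix4` class is not shown in this diff.

```cpp
#include <array>

// Hypothetical stand-in for the library's Matrix4; only the caching pattern
// is illustrated here.
struct Mat4 {
    std::array<float, 16> m{};  // zero-initialised by default

    static const Mat4& identity() {
        // Function-local static: initialised once on first call, and that
        // initialisation is thread-safe since C++11.
        static const Mat4 cached = [] {
            Mat4 id;
            for (int i = 0; i < 4; ++i) id.m[i * 4 + i] = 1.0f;  // diagonal
            return id;
        }();  // the trailing () invokes the lambda immediately
        return cached;
    }
};
```

Every call after the first returns the same object, so callers that only read the identity matrix avoid rebuilding it.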
CLAUDE.md: 3 additions, 2 deletions
@@ -55,9 +55,10 @@ Architecture is detected at compile time:
`Matrix4f` uses SSE 128-bit intrinsics (4×float). `Matrix4d` uses AVX 256-bit intrinsics (4×double). ARM paths currently fall back to scalar operations.
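As a rough illustration of compile-time architecture detection, the sketch below selects a backend label from predefined compiler macros. The names `VM_BACKEND` and `simd_backend` are hypothetical, and the branch order is an assumption; the library's actual macros are not visible in this diff. The predefined macros themselves (`__AVX__`, `__SSE__`, `__aarch64__`, `_M_ARM64`, `__ARM_NEON`) are the ones compilers commonly provide.

```cpp
// Hypothetical macro and function names; the library's own are not shown here.
#if defined(__AVX__)
#define VM_BACKEND "avx"           // x86 with AVX enabled (-mavx or /arch:AVX)
#elif defined(__SSE__) || defined(__x86_64__) || defined(_M_X64)
#define VM_BACKEND "sse"           // x86 without AVX: 128-bit SSE only
#elif defined(__aarch64__) || defined(_M_ARM64)
#define VM_BACKEND "neon-aarch64"  // 64-bit ARM: float64x2_t is available
#elif defined(__ARM_NEON)
#define VM_BACKEND "neon-armv7"    // 32-bit ARM NEON: no float64x2_t
#else
#define VM_BACKEND "scalar"        // anything else uses plain C++ loops
#endif

// Reports which branch the preprocessor picked for this build.
inline const char* simd_backend() { return VM_BACKEND; }
```

Since the choice is made by the preprocessor, the result depends on the compiler flags of each translation unit, not on the CPU the binary later runs on.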
-### Known issues
+### Known limitations

-- `matrix4d` AVX implementation is broken (commit `f7bf612`). The scalar fallback is used on ARM; the AVX path may mix `_mm256_add_ps` (32-bit) with `_mm256_add_pd` (64-bit) incorrectly.
+- `Matrix4d` ARM 32-bit (ARMv7) uses scalar fallback: `float64x2_t` is AArch64-only. AArch64 (Apple Silicon, `ubuntu-24.04-arm`) uses the full NEON implementation.
+- `Matrix4f` vector-multiply `#else` fallback (non-x86, non-ARM) uses a reinterpret cast (`*(Vector4f*)&toReturn`) rather than the copy constructor; technically UB but harmless in practice.
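For the reinterpret-cast limitation above, two well-defined alternatives can be sketched. This assumes `toReturn` is a plain `float[4]` and uses a hypothetical `Vec4f` struct in place of the library's `Vector4f`; neither detail is confirmed by the diff.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for Vector4f: four contiguous floats.
struct Vec4f { float x, y, z, w; };

// Option 1: copy the bytes. std::memcpy is well-defined for trivially
// copyable types, and compilers typically lower it to a plain load/store.
inline Vec4f to_vec4_memcpy(const float (&src)[4]) {
    static_assert(std::is_trivially_copyable<Vec4f>::value, "need trivial copy");
    static_assert(sizeof(Vec4f) == sizeof(float[4]), "layout must match");
    Vec4f out;
    std::memcpy(&out, src, sizeof out);
    return out;
}

// Option 2: construct from the components, with no aliasing involved at all.
inline Vec4f to_vec4_components(const float (&src)[4]) {
    return Vec4f{src[0], src[1], src[2], src[3]};
}
```

Either form avoids reading a `float[4]` through a pointer to an unrelated type, which is what makes the cast in the fallback path formally undefined.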
### Dependencies (auto-fetched by CMake via FetchContent)