Commit 614f4e5

feat: complete IL kernel migration batch 2 - Dot.NDMD, CumSum axis, Shift, Var/Std SIMD

Major changes:
- Dot.NDMD: 15,880 → 419 lines (97% reduction) with SIMD for float/double
- CumSum axis: IL kernel with caching, optimized inner contiguous path
- LeftShift/RightShift: New ILKernelGenerator.Shift.cs (546 lines) with SIMD
- Var/Std axis: SIMD support for int/long/short/byte types

New IL infrastructure:
- ILKernelGenerator.Shift.cs - Bit shift operations with Vector256
- ILKernelGenerator.Scan.cs - Extended with axis cumsum support
- ILKernelGenerator.Reduction.cs - SIMD for integer types in Var/Std

Bug fixes:
- Single element Var/Std with ddof >= size returns NaN (NumPy parity)
- Dot tests Dot3412x5621 and Dot311x511 now pass (removed OpenBugs)

Documentation:
- CLAUDE.md updated with all migrations
- PR #573 comments with progress updates and Definition of Done

Test coverage:
- All Var/Std/CumSum/Shift/Dot tests passing

1 parent 5f48da5 commit 614f4e5
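
The Var/Std bug fix listed in the commit message (single element with ddof >= size returns NaN) can be illustrated with a minimal pure-Python sketch. This is not the IL kernel from the commit; the function names `variance` and `std` are hypothetical, and the sketch assumes the divisor `n - ddof` is clamped at zero and that the division follows IEEE 754 rules (0/0 gives NaN).

```python
import math

def variance(values, ddof=0):
    """Variance with a delta-degrees-of-freedom correction (illustrative only).

    Assumption: the divisor max(n - ddof, 0) is used and a zero divisor
    follows IEEE 754: 0/0 -> NaN, x/0 with x > 0 -> infinity.
    """
    n = len(values)
    if n == 0:
        return math.nan
    mean = sum(values) / n
    sum_sq = sum((v - mean) ** 2 for v in values)
    divisor = max(n - ddof, 0)
    if divisor == 0:
        # Python raises ZeroDivisionError on float division by zero,
        # so the IEEE result is spelled out explicitly.
        return math.nan if sum_sq == 0 else math.inf
    return sum_sq / divisor

def std(values, ddof=0):
    """Standard deviation as the square root of the variance above."""
    return math.sqrt(variance(values, ddof))

if __name__ == "__main__":
    print(variance([5.0], ddof=1))            # single element, ddof == size
    print(variance([1.0, 2.0, 3.0, 4.0]))     # population variance
```

A single element always has a zero sum of squared deviations, so any `ddof >= 1` leads to 0/0 and therefore NaN under these assumptions.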

12 files changed

Lines changed: 1995 additions & 15959 deletions

.claude/CLAUDE.md

Lines changed: 13 additions & 5 deletions
@@ -94,7 +94,9 @@ Runtime IL generation via `System.Reflection.Emit.DynamicMethod` for high-perfor
 | `.MixedType.cs` | Mixed-type binary ops with promotion; owns `ClearAll()` |
 | `.Unary.cs` | Math functions (Negate, Abs, Sqrt, Sin, Cos, Exp, Log, Sign, etc.) |
 | `.Comparison.cs` | Comparisons (==, !=, <, >, <=, >=) returning bool arrays |
-| `.Reduction.cs` | Reductions (Sum, Prod, Min, Max, ArgMax, ArgMin, All, Any) |
+| `.Reduction.cs` | Reductions (Sum, Prod, Min, Max, Mean, ArgMax, ArgMin, All, Any) |
+| `.Scan.cs` | Cumulative ops (CumSum) with element-wise SIMD + axis support |
+| `.Shift.cs` | Bit shift ops (LeftShift, RightShift) with SIMD for scalar shifts |
 
 **Execution Paths:**
 1. **SimdFull** - Both operands contiguous, SIMD-capable dtype → Vector loop + scalar tail
@@ -114,16 +116,21 @@ Runtime IL generation via `System.Reflection.Emit.DynamicMethod` for high-perfor
 **ILKernel Status (0.41.x):**
 | Category | Implemented | Pending |
 |----------|-------------|---------|
-| Binary | Add, Sub, Mul, Div, Power, FloorDivide, BitwiseAnd/Or/Xor | LeftShift, RightShift (use Default.Shift.cs) |
+| Binary | Add, Sub, Mul, Div, Power, FloorDivide, BitwiseAnd/Or/Xor ||
+| Shift | LeftShift, RightShift (SIMD for scalar, scalar loop for array) ||
 | Unary | Negate, Abs, Sign, Sqrt, Cbrt, Square, Reciprocal, Floor, Ceil, Truncate, Trig, Exp, Log, BitwiseNot | Deg2Rad, Rad2Deg (use DefaultEngine) |
-| Reduction | Sum, Prod, Min, Max, Mean, ArgMax, ArgMin, All, Any, CumSum | Std, Var (use Regen templates) |
+| Reduction | Sum, Prod, Min, Max, Mean, ArgMax, ArgMin, All, Any, Std, Var ||
+| Scan | CumSum (element-wise SIMD + axis support) | CumProd |
 | NaN Reduction || NanSum, NanProd, NanMin, NanMax (Task #88) |
 | Comparison | Equal, NotEqual, Less, Greater, LessEqual, GreaterEqual ||
 | Clip/Modf | Clip, Modf (SIMD helpers) ||
-| Axis reductions | Uses iterator path (no SIMD) | SIMD axis kernels (Task #89) |
+| Axis reductions | Sum, Prod, Min, Max, Mean, Std, Var (iterator path) | SIMD axis kernels (Task #89) |
 
 **DefaultEngine ops needing IL migration:**
-- High impact: `MatMul`, `Dot` (complex - consider BLAS integration)
+- High impact: `MatMul` (complex - consider BLAS integration)
+
+**Recently migrated:**
+- `Dot.NDMD` — Migrated to cache-blocked SIMD (15,880 lines Regen → 419 lines IL)
 
 ## Shape Architecture (NumPy-Aligned)
 
@@ -211,6 +218,7 @@ These bugs were fixed in recent commits:
 | BUG-18 | `np.convolve` | `0857d109` — NullReferenceException fixed |
 | BUG-15 | `np.abs` | `0857d109` — int dtype preserved (no longer converts to Double) |
 | BUG-13 | `np.linspace` | `0857d109` — returns float64 (was float32) |
+| BUG-22 | `np.var`/`np.std` | IL migration — single element with ddof returns NaN (NumPy-aligned) |
 
 ### Dead Code (Returns null/default)
 
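
The `.Scan.cs` entry above mentions CumSum with axis support, and the commit message refers to an "optimized inner contiguous path". As a rough illustration of what an axis cumsum over a row-major buffer looks like, here is a pure-Python sketch. The function name `cumsum_axis` and the outer/axis/inner decomposition are assumptions made for this example, not the structure of the actual IL kernel.

```python
def cumsum_axis(flat, shape, axis):
    """Cumulative sum along `axis` of a row-major (C-order) flat buffer.

    The shape is viewed as (outer, axis_len, inner). Hypothetical helper,
    written only to illustrate the access pattern.
    """
    outer = 1
    for dim in shape[:axis]:
        outer *= dim
    axis_len = shape[axis]
    inner = 1
    for dim in shape[axis + 1:]:
        inner *= dim

    out = list(flat)
    for o in range(outer):
        base = o * axis_len * inner
        for k in range(1, axis_len):
            cur = base + k * inner
            prev = cur - inner
            # The inner run is contiguous in memory, which is the kind of
            # loop a SIMD kernel could process several elements at a time.
            for i in range(inner):
                out[cur + i] += out[prev + i]
    return out

if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6]            # a 2 x 3 array in row-major order
    print(cumsum_axis(data, (2, 3), 0))  # running sum down the rows
    print(cumsum_axis(data, (2, 3), 1))  # running sum across the columns
```

When the scan axis is the last one, `inner` is 1 and the recurrence is strictly sequential; for any other axis, each step adds two contiguous runs of length `inner`, which is where vectorization would plausibly pay off.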
src/NumSharp.Core/Backends/Default/Math/BLAS/Default.Dot.NDMD.cs

Lines changed: 343 additions & 15804 deletions
Large diffs are not rendered by default.
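
Since the diff for this file is not shown, the following pure-Python sketch illustrates the general idea behind the "cache-blocked" approach mentioned in the CLAUDE.md changes. It is a generic blocked matrix multiply, not the code from `Default.Dot.NDMD.cs`; the function name `matmul_blocked`, the flat row-major layout, and the block size are all assumptions for illustration.

```python
def matmul_blocked(a, b, m, k, n, block=32):
    """C = A @ B for row-major flat lists; A is m x k, B is k x n.

    Iterates over tiles so that each tile of A, B and C is reused while
    it is likely still in cache. Illustrative only.
    """
    c = [0.0] * (m * n)
    for i0 in range(0, m, block):
        i_end = min(i0 + block, m)
        for p0 in range(0, k, block):
            p_end = min(p0 + block, k)
            for j0 in range(0, n, block):
                j_end = min(j0 + block, n)
                for i in range(i0, i_end):
                    row_c = i * n
                    for p in range(p0, p_end):
                        a_ip = a[i * k + p]
                        row_b = p * n
                        # Contiguous walk over a row segment of B and C:
                        # the loop a SIMD kernel would vectorize.
                        for j in range(j0, j_end):
                            c[row_c + j] += a_ip * b[row_b + j]
    return c

if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6]        # 2 x 3
    b = [7, 8, 9, 10, 11, 12]     # 3 x 2
    print(matmul_blocked(a, b, 2, 3, 2, block=2))
```

Under NumPy semantics, an N-D by M-D `dot` contracts the last axis of the first operand with the second-to-last axis of the second, so it can be expressed as a series of 2-D products like the one above; how the migrated kernel actually organizes that work is not visible in this page.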
