Commit 19dcaa1

docs: fix broken documentation URLs (scipy → numpy.org)
- Replace all docs.scipy.org/doc/numpy/ URLs with numpy.org/doc/stable/
- Fix numpy.random.* URLs: reference/generated/ → reference/random/generated/
- Fix numpy.bitwise_not.html → numpy.invert.html (function renamed)
- Fix NEP 41 URL: nep-0041-improved-dtype.html → nep-0041-improved-dtype-support.html
- Fix arrays.strings.html → routines.strings.html

The scipy documentation URLs have been deprecated and now redirect to numpy.org. Some URLs were returning 404 because the paths changed in the new location.

Files updated:
- README.md
- 16 docs/issues/*.md files
- 2 docs/neps/*.md files
- 1 docs/plans/*.md file
- 4 src/NumSharp.Core/*.cs files
1 parent 22b2971 commit 19dcaa1

26 files changed

Lines changed: 1092 additions & 31 deletions

CHANGES.md

Lines changed: 680 additions & 0 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 2 additions & 2 deletions
@@ -17,8 +17,8 @@ Here is a comparison code between NumSharp and NumPy (left is python, right is C
1717

1818
### Bold Features
1919
* Use of Unmanaged Memory and fast unsafe algorithms.
20-
* [Broadcasting](https://docs.scipy.org/doc/numpy-1.15.0/user/basics.broadcasting.html) n-d shapes against each other. ([intro](https://machinelearningmastery.com/broadcasting-with-numpy-arrays/))
21-
* [NDArray Slicing](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html) and nested/recusive slicing (`nd["-1, ::2"]["1::3, :, 0"]`)
20+
* [Broadcasting](https://numpy.org/doc/stable-1.15.0/user/basics.broadcasting.html) n-d shapes against each other. ([intro](https://machinelearningmastery.com/broadcasting-with-numpy-arrays/))
21+
* [NDArray Slicing](https://numpy.org/doc/stable/reference/arrays.indexing.html) and nested/recusive slicing (`nd["-1, ::2"]["1::3, :, 0"]`)
2222
* Axis iteration and support in all of our implemented functions.
2323
* Full and precise (to numpy) automatic type resolving and conversion (upcasting, downcasting and other cases)
2424
* Non-copy - most cases, similarly to numpy, does not perform copying but returns a view instead.

RELEASE_0.41.0-prerelease.md

Lines changed: 381 additions & 0 deletions
@@ -0,0 +1,381 @@
1+
# NumSharp 0.41.0-prerelease
2+
3+
This prerelease introduces the **IL Kernel Generator** - a complete architectural overhaul that replaces ~600K lines of Regen-generated template code with ~19K lines of runtime IL generation. This delivers large performance improvements (MatMul is 35-100x faster), comprehensive NumPy 2.x alignment, and significantly cleaner, more maintainable code.
4+
5+
---
6+
7+
## TL;DR
8+
9+
Backend rewrite via dynamic IL emission, 25 new `np.*` functions, boolean indexing rewrite, broadcast slicing fix, Regen static generation deprecated, 52 bug fixes, MatMul 35-100x faster, -532K lines net.
10+
11+
```
12+
+ 25 new/fixed functions (nansum, isnan, isfinite, isinf, isclose, cumprod, etc.)
13+
+ 52 bug fixes for NumPy 2.x alignment
14+
+ MatMul 35-100x faster (SIMD cache-blocked, 20+ GFLOPS)
15+
+ 97% code reduction (-532K lines)
16+
+ Runtime IL generation replaces static templates
17+
+ Vector128/256/512 SIMD with runtime detection
18+
+ Boolean indexing rewrite with SIMD fast path
19+
+ All comparison/bitwise operators now work (were returning null)
20+
+ No breaking changes - drop-in replacement
21+
```
22+
23+
**Install**: `dotnet add package NumSharp --version 0.41.0-prerelease`
24+
25+
---
26+
27+
## Contents
28+
29+
| Section | Highlights |
30+
|---------|------------|
31+
| [Summary](#summary) | 80 commits, -532K lines, 3,868 tests |
32+
| [IL Kernel Generator](#il-kernel-generator) | 27 files, SIMD V128/256/512 |
33+
| [New NumPy Functions (25)](#new-numpy-functions-25) | nansum, isnan, cumprod, etc. |
34+
| [Critical Bug Fixes](#critical-bug-fixes) | negative, unique, dot, linspace |
35+
| [Operator Rewrites](#operator-rewrites) | ==, !=, <, >, &, \| now work |
36+
| [Boolean Indexing Rewrite](#boolean-indexing-rewrite) | SIMD fast path |
37+
| [Slicing Improvements](#slicing-improvements) | Broadcast stride=0 preserved |
38+
| [Performance Improvements](#performance-improvements) | MatMul 35-100x, 20+ GFLOPS |
39+
| [Code Reduction](#code-reduction) | 99% binary, 98% MatMul, 97% Dot |
40+
| [Infrastructure Changes](#infrastructure-changes) | NativeMemory, KernelProvider |
41+
| [API Fixes](#api-fixes) | random(), standard_normal, dtype |
42+
| [New Test Files (64)](#new-test-files-64) | 34 kernel, 8 NumPy, 3 linalg |
43+
| [Breaking Changes](#breaking-changes) | None |
44+
| [Known Issues](#known-issues-openbugs) | 52 OpenBugs excluded |
45+
| [Installation](#installation) | `dotnet add package NumSharp` |
46+
47+
---
48+
49+
## Summary
50+
51+
| Metric | Value |
52+
|--------|-------|
53+
| Commits | 80 |
54+
| Files Changed | 623 |
55+
| Lines Added | +71,355 |
56+
| Lines Deleted | -603,345 |
57+
| **Net Change** | **-532K lines** |
58+
| Test Results | 3,868 passed, 52 OpenBugs, 11 skipped |
59+
60+
---
61+
62+
## IL Kernel Generator
63+
64+
Runtime IL generation via `System.Reflection.Emit.DynamicMethod` replaces static Regen templates.
65+
66+
### Kernel Files (27 new files)
67+
- `ILKernelGenerator.cs` - Core infrastructure, SIMD detection (Vector128/256/512)
68+
- `ILKernelGenerator.Binary.cs` - Add, Sub, Mul, Div, BitwiseAnd/Or/Xor
69+
- `ILKernelGenerator.MixedType.cs` - Mixed-type ops with type promotion
70+
- `ILKernelGenerator.Unary.cs` - Negate, Abs, Sqrt, Sin, Cos, Exp, Log, Sign
71+
- `ILKernelGenerator.Comparison.cs` - ==, !=, <, >, <=, >= returning bool arrays
72+
- `ILKernelGenerator.Reduction.cs` - Sum, Prod, Min, Max, Mean, ArgMax, ArgMin, All, Any
73+
- `ILKernelGenerator.Reduction.Axis.Simd.cs` - AVX2 gather for axis reductions
74+
- `ILKernelGenerator.Scan.cs` - CumSum, CumProd with SIMD
75+
- `ILKernelGenerator.Shift.cs` - LeftShift, RightShift
76+
- `ILKernelGenerator.MatMul.cs` - Cache-blocked SIMD matrix multiply
77+
- `ILKernelGenerator.Clip.cs`, `.Modf.cs`, `.Masking.cs` - Specialized ops
78+
79+
### Execution Paths
80+
1. **SimdFull** - Contiguous + SIMD-capable dtype → Vector loop + scalar tail
81+
2. **ScalarFull** - Contiguous + non-SIMD dtype (Decimal) → Scalar loop
82+
3. **General** - Strided/broadcast → Coordinate-based iteration
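
The path selection above can be sketched roughly as follows. This is an illustrative Python sketch, not NumSharp's code: `choose_path`, `simd_full_add`, the `SIMD_DTYPES` set, and the lane width of 4 are all hypothetical names and values chosen for the example.

```python
# Hypothetical sketch of the three execution paths; names are illustrative only.
SIMD_DTYPES = {"int32", "int64", "float32", "float64"}  # assumed SIMD-capable set


def choose_path(contiguous: bool, dtype: str) -> str:
    """Pick an execution path from memory layout and dtype."""
    if not contiguous:
        return "General"      # strided/broadcast: coordinate-based iteration
    if dtype in SIMD_DTYPES:
        return "SimdFull"     # vector loop + scalar tail
    return "ScalarFull"       # e.g. decimal: plain scalar loop


def simd_full_add(a, b, width=4):
    """Emulate a vector loop followed by a scalar tail."""
    n = len(a)
    out = [0] * n
    vec_end = n - n % width
    for i in range(0, vec_end, width):   # "vector" body: width lanes per step
        out[i:i + width] = [x + y for x, y in zip(a[i:i + width], b[i:i + width])]
    for i in range(vec_end, n):          # scalar tail for the remainder
        out[i] = a[i] + b[i]
    return out
```

With six elements and a width of 4, the first four go through the vector body and the last two through the tail.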
83+
84+
### Infrastructure
85+
- `IKernelProvider.cs` - Abstraction for future backends (CUDA, Vulkan)
86+
- `KernelKey.cs`, `KernelOp.cs`, `KernelSignatures.cs` - Kernel dispatch
87+
- `SimdMatMul.cs`, `SimdReductionOptimized.cs` - SIMD helpers
88+
- `TypeRules.cs` - NEP50 type promotion rules
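
As a loose analogy for key-based kernel dispatch, the sketch below builds a kernel on first request and caches it under an `(op, dtype)` key. It is written in Python with closures standing in for emitted IL; `get_kernel`, `_KERNEL_CACHE`, and the key layout are assumptions made for illustration and do not describe the actual `KernelKey` structure.

```python
import operator

_KERNEL_CACHE = {}  # hypothetical cache: (op, dtype) -> generated kernel
_SCALAR_OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def get_kernel(op: str, dtype: type):
    """Build a kernel on first use, then reuse it from the cache."""
    key = (op, dtype)
    kernel = _KERNEL_CACHE.get(key)
    if kernel is None:
        scalar = _SCALAR_OPS[op]

        def kernel(a, b):
            # Specialized for one (op, dtype) pair, like a generated method.
            return [dtype(scalar(x, y)) for x, y in zip(a, b)]

        _KERNEL_CACHE[key] = kernel
    return kernel
```

The point of the cache is that generation cost is paid once per key, and every later call with the same operation and dtype reuses the same callable.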
89+
90+
---
91+
92+
## New NumPy Functions (25)
93+
94+
### NaN-Aware Reductions (7)
95+
| Function | Description |
96+
|----------|-------------|
97+
| `np.nansum` | Sum ignoring NaN |
98+
| `np.nanprod` | Product ignoring NaN |
99+
| `np.nanmin` | Minimum ignoring NaN |
100+
| `np.nanmax` | Maximum ignoring NaN |
101+
| `np.nanmean` | Mean ignoring NaN |
102+
| `np.nanvar` | Variance ignoring NaN |
103+
| `np.nanstd` | Standard deviation ignoring NaN |
104+
105+
### Math Operations (8)
106+
| Function | Description |
107+
|----------|-------------|
108+
| `np.cbrt` | Cube root |
109+
| `np.floor_divide` | Integer division |
110+
| `np.reciprocal` | Element-wise 1/x |
111+
| `np.trunc` | Truncate to integer |
112+
| `np.invert` | Bitwise NOT |
113+
| `np.square` | Element-wise square |
114+
| `np.cumprod` | Cumulative product |
115+
| `np.count_nonzero` | Count non-zero elements |
116+
117+
### Bitwise & Trigonometric (4)
118+
| Function | Description |
119+
|----------|-------------|
120+
| `np.left_shift` | Bitwise left shift |
121+
| `np.right_shift` | Bitwise right shift |
122+
| `np.deg2rad` | Degrees to radians |
123+
| `np.rad2deg` | Radians to degrees |
124+
125+
### Logic & Validation (4) - Previously returned `null`
126+
| Function | Description |
127+
|----------|-------------|
128+
| `np.isnan` | Test element-wise for NaN |
129+
| `np.isfinite` | Test element-wise for finiteness |
130+
| `np.isinf` | Test element-wise for infinity |
131+
| `np.isclose` | Element-wise comparison within tolerance |
132+
133+
### Operators (2) - Previously returned `null`
134+
| Operator | Description |
135+
|----------|-------------|
136+
| `operator &` | Bitwise/logical AND with broadcasting |
137+
| `operator \|` | Bitwise/logical OR with broadcasting |
138+
139+
### New Overloads
140+
| Function | New Capability |
141+
|----------|----------------|
142+
| `np.power(array, array)` | Array exponents (was scalar only) |
143+
| `np.repeat(array, NDArray)` | Per-element repeat counts |
144+
| `np.argmax/argmin(axis, keepdims)` | keepdims parameter |
145+
| `np.convolve` | Complete rewrite (was throwing NRE) |
146+
147+
---
148+
149+
## Critical Bug Fixes
150+
151+
### Behavioral Fixes
152+
| Bug | Before | After |
153+
|-----|--------|-------|
154+
| `np.negative()` | Only negated positive values (`if val > 0`) | Negates ALL values (`val = -val`) |
155+
| `np.unique()` | Returned unsorted | Sorts output, NaN at end |
156+
| `np.dot(1D, 2D)` | Threw `NotSupportedException` | Treats 1D as row vector |
157+
| `np.linspace()` | Returned `float32` for float inputs | Always `float64` default |
158+
| `np.arange()` | Threw on `start >= stop` | Returns empty array |
159+
| `np.searchsorted()` | No scalar support | Added scalar overloads returning `int` |
160+
| `np.shuffle()` | Non-standard `passes` parameter | NumPy legacy API (axis-0 only) |
161+
| Float-to-int conversion | Used rounding | Uses truncation toward zero |
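
Two of the fixes in this table are easy to show in miniature. The Python sketch below only mirrors the before/after behavior described in the rows for `np.negative()` and float-to-int conversion; the function names are hypothetical and this is not the NumSharp source.

```python
import math


def negative_buggy(values):
    # Old behavior per the table: only positive values were negated.
    return [-v if v > 0 else v for v in values]


def negative_fixed(values):
    # New behavior: every value is negated.
    return [-v for v in values]


def float_to_int(value: float) -> int:
    # Truncation toward zero, as opposed to rounding.
    return math.trunc(value)
```

For `-2.7`, truncation gives `-2` while rounding gives `-3`, which is the difference the last row refers to.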
162+
163+
### Return Type Fixes
164+
| Function | Before | After |
165+
|----------|--------|-------|
166+
| `np.argmax()` / `np.argmin()` | Returned `int` | Returns `long` (large array support) |
167+
| `np.abs()` | Converted to Double | Preserves input dtype |
168+
169+
### Empty Array Handling
170+
| Function | Before | After |
171+
|----------|--------|-------|
172+
| `np.mean([])` | Threw or returned 0 | Returns `NaN` |
173+
| `np.mean(zeros((0,3)), axis=0)` | Incorrect | `[NaN, NaN, NaN]` |
174+
| `np.mean(zeros((0,3)), axis=1)` | Incorrect | Empty array `[]` |
175+
| `np.std/var` single element | Returned 0 | Returns `NaN` with `ddof >= size` |
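
The shape rules for `mean` on empty arrays can be illustrated with a small pure-Python sketch. `mean_axis` is a hypothetical helper; because nested lists cannot express a `(0, 3)` shape, the shape is passed explicitly.

```python
import math


def mean_axis(rows, shape, axis=None):
    """Mean of a 2-D array given as a list of rows plus an explicit shape."""
    n_rows, n_cols = shape
    if axis is None:
        flat = [x for row in rows for x in row]
        return sum(flat) / len(flat) if flat else math.nan
    if axis == 0:  # reduce over rows: one result per column
        return [sum(r[c] for r in rows) / n_rows if n_rows else math.nan
                for c in range(n_cols)]
    if axis == 1:  # reduce over columns: one result per row
        return [sum(r) / n_cols if n_cols else math.nan for r in rows]
    raise ValueError("axis must be None, 0, or 1")
```

Reducing a `(0, 3)` array over axis 0 leaves three output slots with nothing to average, hence three NaNs; reducing over axis 1 leaves zero output slots, hence an empty result.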
176+
177+
### keepdims Fixes
178+
All reduction functions now properly preserve dimensions when `keepdims=True`:
179+
- `np.sum`, `np.prod`, `np.mean`, `np.std`, `np.var`
180+
- `np.min`, `np.max`, `np.argmin`, `np.argmax`
181+
182+
---
183+
184+
## Operator Rewrites
185+
186+
### Comparison Operators (==, !=, <, >, <=, >=)
187+
- **Before**: Manual type switch per dtype
188+
- **After**: Uses `TensorEngine` with IL kernels
189+
- Proper null handling (returns `false` scalar)
190+
- Empty array handling (returns empty bool array)
191+
- Added reverse operators (`object op NDArray`)
192+
- Full broadcasting support
193+
194+
### Bitwise Operators (&, |, ^)
195+
- **Before**: Returned `null`
196+
- **After**: Full implementation via IL kernels
197+
- Added `NDArray<T>` typed operators
198+
- Scalar overloads for all integer types
199+
200+
### Implicit Scalar Conversion
201+
- **Before**: `(int)ndarray_float64` would fail
202+
- **After**: Uses `Converts.ChangeType` for cross-dtype conversion
203+
204+
---
205+
206+
## Boolean Indexing Rewrite
207+
208+
Complete rewrite with NumPy-aligned behavior:
209+
210+
### Two Cases Supported
211+
1. `arr[mask]` where `mask.shape == arr.shape` → element-wise selection
212+
2. `arr[mask]` where `mask` is 1D and `mask.shape[0] == arr.shape[0]` → axis-0 selection
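
A minimal Python sketch of the two supported cases, using nested lists in place of arrays. `boolean_index` is a hypothetical helper written for this example, not a NumSharp API.

```python
def boolean_index(arr, mask):
    """arr is a list of rows; mask is either a same-shape 2-D mask or a 1-D row mask."""
    if mask and isinstance(mask[0], list):
        # Case 1: mask.shape == arr.shape -> flat list of selected elements
        if len(mask) != len(arr) or any(len(m) != len(r) for m, r in zip(mask, arr)):
            raise IndexError("mask shape must match array shape")
        return [x for row, mrow in zip(arr, mask)
                for x, keep in zip(row, mrow) if keep]
    # Case 2: 1-D mask with len(mask) == arr.shape[0] -> selected rows
    if len(mask) != len(arr):
        raise IndexError("mask length must match axis 0")
    return [row for row, keep in zip(arr, mask) if keep]
```

Case 1 always yields a 1-D result, while case 2 keeps the trailing dimensions of the selected rows.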
213+
214+
### SIMD Fast Path
215+
- New `BooleanMaskFastPath` for contiguous arrays
216+
- `CountTrue(bool*, int)` - SIMD count of true values
217+
- `CopyMasked<T>(src, mask, dest, size)` - SIMD masked copy
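
The count-then-copy structure implied by these two helpers can be sketched in scalar Python as below. The signatures are simplified assumptions (no pointers, no SIMD); only the two-pass shape of the algorithm is the point.

```python
def count_true(mask):
    """First pass: number of True entries, which is the output length."""
    return sum(1 for m in mask if m)


def copy_masked(src, mask, dest):
    """Second pass: compact selected elements into dest; returns the count written."""
    j = 0
    for value, keep in zip(src, mask):
        if keep:
            dest[j] = value
            j += 1
    return j


mask = [True, False, True, True]
dest = [None] * count_true(mask)          # allocate exactly once
written = copy_masked([10, 20, 30, 40], mask, dest)
```

Counting first lets the destination be allocated at its final size, so the copy pass never has to grow a buffer.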
218+
219+
---
220+
221+
## Slicing Improvements
222+
223+
### Broadcast Array Handling
224+
- **Before**: Slicing broadcast arrays would materialize data (losing stride=0)
225+
- **After**: Preserves stride=0 information (NumPy behavior)
226+
- Critical for `cumsum` and axis reductions on broadcast arrays
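
To make the stride=0 point concrete, here is a small Python sketch of a 1-D strided view. `StridedView` is a hypothetical class for illustration; it assumes element `i` lives at `offset + i * stride` in a flat buffer.

```python
class StridedView:
    """1-D view over a flat buffer: element i lives at offset + i * stride."""

    def __init__(self, buffer, offset, stride, length):
        self.buffer, self.offset = buffer, offset
        self.stride, self.length = stride, length

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        return self.buffer[self.offset + i * self.stride]

    def slice(self, start, stop, step=1):
        """Return a new view; nothing is copied, so stride 0 stays stride 0."""
        start, stop, step = slice(start, stop, step).indices(self.length)
        length = len(range(start, stop, step))
        return StridedView(self.buffer, self.offset + start * self.stride,
                           self.stride * step, length)

    def tolist(self):
        return [self[i] for i in range(self.length)]


broadcast = StridedView([7], offset=0, stride=0, length=5)  # one value seen 5 times
sliced = broadcast.slice(1, 4)
```

Here `sliced` still has stride 0 and still shares the single-element buffer, which is the property that materializing the slice would lose.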
227+
228+
### Empty Slice Handling
229+
- `a[100:200]` on 10-element array now returns proper empty array
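
For reference, this is the clamping behavior that Python sequences show for an out-of-range slice, which is the behavior being matched. The snippet uses only built-ins.

```python
data = list(range(10))

# slice.indices clamps both bounds to the sequence length.
start, stop, step = slice(100, 200).indices(len(data))

# The resulting slice is empty rather than an error.
empty = data[100:200]
```

Both bounds clamp to the length of the sequence, so the selected range has zero elements.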
230+
231+
### Contiguous Optimization
232+
- Contiguous slices get fresh shape with `offset=0`
233+
- `IsSliced=false` for contiguous slices
234+
235+
---
236+
237+
## Performance Improvements
238+
239+
| Operation | Improvement | Details |
240+
|-----------|-------------|---------|
241+
| MatMul (2D) | 35-100x | Cache-blocked SIMD, 20+ GFLOPS |
242+
| Axis Reductions | Major | AVX2 gather + parallel outer loop |
243+
| All/Any | Major | SIMD with early-exit |
244+
| CumSum/CumProd | Major | Element-wise SIMD |
245+
| Boolean Masking | Major | SIMD CountTrue + CopyMasked |
246+
| Integer Abs/Sign | Minor | Bitwise (branchless) |
247+
| Vector512 | New | Runtime detection and utilization |
248+
| Loop Unrolling | 4x | All SIMD kernels |
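
Cache blocking, named in the MatMul row, can be sketched in plain Python as below. This shows only the tiling idea; it has no SIMD, the block size of 32 is an arbitrary assumption, and `matmul_blocked` is a hypothetical name rather than the kernel in `ILKernelGenerator.MatMul.cs`.

```python
def matmul_blocked(a, b, block=32):
    """C = A @ B computed tile by tile so each tile can stay cache-resident.

    Assumes a is m x k and b is k x n, both as non-empty lists of rows.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, block):
        for k0 in range(0, k, block):
            for j0 in range(0, n, block):
                # Multiply one (block x block) tile of A with one tile of B.
                for i in range(i0, min(i0 + block, m)):
                    row_c = c[i]
                    for kk in range(k0, min(k0 + block, k)):
                        a_ik = a[i][kk]
                        row_b = b[kk]
                        for j in range(j0, min(j0 + block, n)):
                            row_c[j] += a_ik * row_b[j]
    return c
```

The arithmetic is identical to the naive triple loop; only the iteration order changes, so that the working set of each inner tile is small enough to be reused from cache.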
249+
250+
---
251+
252+
## Code Reduction
253+
254+
### Massive File Deletions
255+
| Component | Before | After | Reduction |
256+
|-----------|--------|-------|-----------|
257+
| Binary ops (Add/Sub/Mul/Div/Mod) | 60 files, ~500K lines | 2 IL files | **99%** |
258+
| `Default.MatMul.2D2D.cs` | ~20K lines | 325 lines | **98.4%** |
259+
| `Default.Dot.NDMD.cs` | ~16K lines | 422 lines | **97.4%** |
260+
| Comparison ops (Equals) | 13 files | 1 IL file | **92%** |
261+
| Std/Var reductions | ~20K lines | ~500 lines | **97%** |
262+
263+
### Deleted Files (76)
264+
- 60 binary op files (`Default.Add.{Type}.cs`, etc.)
265+
- 13 comparison files (`Default.Equals.{Type}.cs`, etc.)
266+
- 3 template files
267+
268+
---
269+
270+
## Infrastructure Changes
271+
272+
### Memory Allocation
273+
- `Marshal.AllocHGlobal` → `NativeMemory.Alloc`
274+
- `Marshal.FreeHGlobal` → `NativeMemory.Free`
275+
- `AllocationType.AllocHGlobal` → `AllocationType.Native`
276+
- `StackedMemoryPool` migrated to NativeMemory
277+
278+
### DefaultEngine
279+
- Removed `ParallelAbove = 84999` constant
280+
- Added `KernelProvider` instance field
281+
- Added static `DefaultKernelProvider` for code without engine access
282+
- Removed all `Parallel.For` usage (single-threaded for determinism)
283+
284+
### Math Functions
285+
All migrated from Regen templates to `ExecuteUnaryOp`:
286+
- Sin, Cos, Tan, ASin, ACos, ATan, ATan2
287+
- Exp, Exp2, Expm1, Log, Log2, Log10, Log1p
288+
- Sqrt, Cbrt, Abs, Sign, Floor, Ceil, Truncate
289+
- Removed `DecimalMath` dependency for most operations
290+
291+
### TensorEngine Extensions
292+
New abstract methods:
293+
- `NotEqual`, `Less`, `LessEqual`, `Greater`, `GreaterEqual`
294+
- `BitwiseAnd`, `BitwiseOr`, `BitwiseXor`
295+
- `LeftShift`, `RightShift`
296+
- `Power(NDArray, NDArray)`, `FloorDivide`
297+
- `Truncate`, `Reciprocal`, `Square`, `Cbrt`, `Invert`
298+
- `Deg2Rad`, `Rad2Deg`, `IsInf`
299+
- `ReduceCumMul`
300+
301+
### IKernelProvider Methods
302+
- `CountTrue(bool*, int)` - SIMD true count
303+
- `CopyMasked<T>` - SIMD masked copy
304+
- `Variance<T>`, `StandardDeviation<T>` - SIMD two-pass
305+
- `NanSum/Prod/Min/Max` for float/double
306+
- `FindNonZeroStrided<T>` - Strided nonzero detection
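
The "two-pass" approach mentioned for `Variance<T>` and `StandardDeviation<T>` can be sketched in scalar Python as follows. The function names and the `ddof` handling are assumptions based on the behavior described in these notes (NaN when `ddof >= size`), not the provider's actual signatures.

```python
import math


def variance(values, ddof=0):
    """Two-pass variance: pass 1 computes the mean, pass 2 the squared deviations."""
    n = len(values)
    if n - ddof <= 0:
        return math.nan  # no degrees of freedom left, e.g. one element with ddof=1
    mean = sum(values) / n                                     # pass 1
    return sum((x - mean) ** 2 for x in values) / (n - ddof)   # pass 2


def standard_deviation(values, ddof=0):
    return math.sqrt(variance(values, ddof))
```

Computing the mean first and then summing squared deviations avoids the cancellation problems of the single-pass "sum of squares minus square of sum" formula.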
307+
308+
---
309+
310+
## API Fixes
311+
312+
| Change | Details |
313+
|--------|---------|
314+
| `np.random.random()` | New alias for `random_sample()` |
315+
| `stardard_normal` | Fixed typo → `standard_normal` (old deprecated) |
316+
| `outType` → `dtype` | Parameter rename in `minimum/maximum/fmin/fmax` |
317+
| `np.modf()` | Now validates floating-point input types |
318+
319+
---
320+
321+
## New Test Files (64)
322+
323+
### Kernel Tests (34)
324+
`BinaryOpTests`, `UnaryOpTests`, `ComparisonOpTests`, `ReductionOpTests`, `AxisReductionSimdTests`, `NonContiguousTests`, `SlicedArrayOpTests`, `NanReductionTests`, `VarStdComprehensiveTests`, `ArgMaxArgMinComprehensiveTests`, `CumSumComprehensiveTests`, `BitwiseOpTests`, `ShiftOpTests`, `DtypeCoverageTests`, `DtypePromotionTests`, `EdgeCaseTests`, `BattleProofTests`, `SimdOptimizationTests`, and more.
325+
326+
### NumPy Ported Tests (8)
327+
`ArgMaxArgMinEdgeCaseTests`, `ClipEdgeCaseTests`, `ClipNDArrayTests`, `CumSumEdgeCaseTests`, `ModfEdgeCaseTests`, `NonzeroEdgeCaseTests`, `PowerEdgeCaseTests`, `VarStdEdgeCaseTests`
328+
329+
### Linear Algebra Battle Tests (3)
330+
`np.dot.BattleTest`, `np.matmul.BattleTest`, `np.outer.BattleTest`
331+
332+
---
333+
334+
## Breaking Changes
335+
336+
**None.** This is a drop-in replacement with improved performance and NumPy compatibility.
337+
338+
---
339+
340+
## Known Issues (OpenBugs)
341+
342+
52 tests marked as `[OpenBugs]` are excluded from CI:
343+
- sbyte (int8) type not supported
344+
- Some bitmap operations require GDI+ (Windows only)
345+
- Various edge cases documented in test files
346+
347+
---
348+
349+
## Installation
350+
351+
```bash
352+
dotnet add package NumSharp --version 0.41.0-prerelease
353+
```
354+
355+
Or via Package Manager:
356+
```powershell
357+
Install-Package NumSharp -Version 0.41.0-prerelease
358+
```
359+
360+
## Testing
361+
362+
```bash
363+
cd test/NumSharp.UnitTest
364+
365+
# Run tests excluding known issues
366+
dotnet test -- "--treenode-filter=/*/*/*/*[Category!=OpenBugs]"
367+
368+
# Run all tests
369+
dotnet test
370+
```
371+
372+
---
373+
374+
## Feedback
375+
376+
This is a prerelease. Please report any issues at:
377+
https://github.com/SciSharp/NumSharp/issues
378+
379+
---
380+
381+
**Full Changelog**: See [CHANGES.md](./CHANGES.md) for complete documentation of all 80 commits.
