diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md index 2c701a65..10e55ba6 100644 --- a/.claude/board/EPIPHANIES.md +++ b/.claude/board/EPIPHANIES.md @@ -1,3 +1,17 @@ +## 2026-06-03 — E-HELIX-NDARRAY-MANDATORY — `helix` ndarray wiring: optional `path` → mandatory `git` (codex P2 + "ndarray is mandatory") — an optional path dep is a clean-checkout trap + +**Status:** FINDING (codex P2 on #460 + user directive, 2026-06-03; fix verified — 63 unit + 6 doctests green with mandatory ndarray, clippy -D warnings + fmt clean; the git source was patched to the local `master` checkout for the in-sandbox build, github fetch deferred to CI). + +**The trap (codex P2):** an *optional* `path` dependency does NOT make the default build self-contained. Cargo reads every dependency manifest (including optional ones) during resolution to build the lockfile, so `ndarray = { path = "../../../ndarray", optional = true }` makes a clean checkout WITHOUT the sibling fail (`failed to read .../ndarray/Cargo.toml`) *before* feature selection. The "default build needs no ndarray checkout" claim was therefore false. + +**The fix (two directives converge):** (1) codex wants the wiring clean; (2) the user — "ndarray is mandatory for lance-graph" (it is "The Foundation"). So helix now takes ndarray as a **mandatory, non-optional git dependency**: `ndarray = { git = "https://github.com/AdaWorldAPI/ndarray.git", branch = "master", default-features = false, features = ["std"] }`. A git source resolves the manifest remotely (no sibling-checkout needed); non-optional drops the `ndarray-hpc` feature entirely; `simd.rs` is now single-impl (always `ndarray::simd`, no scalar fallback — ndarray does its own AVX-512/AVX2/scalar dispatch internally). + +**Why git, not `[patch]`, and no cycle:** helix is standalone (own `[workspace]`, root `exclude`), so it resolves ndarray independently of the lance-graph workspace's path-based ndarray — no source-unification needed (the workspace's `[patch.crates-io] ndarray` is separately known-ineffective: the fork's 0.17.2 can't semver-satisfy the lance-index crates.io 0.16.1; PR_ARC ~line 2081). The fork is self-contained — only internal subcrate path deps (`crates/p64`, `crates/fractal`, `ndarray-rand`, `crates/ndarray-gen`) which travel with the clone — and has **no back-dependency on lance-graph**, so the git dep introduces **no import cycle**. + +Cross-ref: PR_ARC #459 Correction; codex P2 #460; `crates/helix/Cargo.toml`; `jc::weyl` (local-const precedent). + +--- + ## 2026-06-03 — E-HELIX-OVERLAP — the `helix` Place/Residue codec is ~80% re-derivation of existing (in places CERTIFIED) primitives; shipped standalone by user directive, overlap documented not hidden **Status:** FINDING (placement check via `encoding-ecosystem.md` + repo grep, 2026-06-03; crate shipped on claude/gallant-rubin-Y9pQd — 61 unit + 6 doctests green on the default zero-dep build AND under `--features ndarray-hpc`). User directive: "create crate crates/helix … scoped only to crate, self resolving" → after the overlap was surfaced via `AskUserQuestion`, the user ratified **Standalone helix** (vs compose-existing vs rename). diff --git a/.claude/board/LATEST_STATE.md b/.claude/board/LATEST_STATE.md index 9f893d70..bdf390b9 100644 --- a/.claude/board/LATEST_STATE.md +++ b/.claude/board/LATEST_STATE.md @@ -10,6 +10,8 @@ --- +> **2026-06-03 — hardened (follow-up after #460)** (D-HELIX-1 wiring): `crates/helix` now takes **ndarray as a MANDATORY, non-optional git dependency** (`git = AdaWorldAPI/ndarray @ master`), replacing the optional `path` dep + `ndarray-hpc` feature. Why: (1) codex P2 — an optional *path* dep still forces Cargo to read the local sibling manifest at resolution, so a clean checkout failed before feature selection; (2) directive "ndarray is mandatory for lance-graph". `simd.rs` always uses `ndarray::simd` (no scalar fallback); the self-contained fork → no import cycle. 63 unit + 6 doctests green; clippy/fmt clean. See E-HELIX-NDARRAY-MANDATORY. +> > **2026-06-03 — shipped (autoattended)** (D-HELIX-1): new standalone crate `crates/helix` — the golden-spiral **Place/Residue** codec from the user's `KNOWLEDGE.md`. HHTL = deterministic PLACE; helix = orthogonal RESIDUE. Pipeline: equal-area `√u` hemisphere placement (`HemispherePoint`) → stride-4-over-17 `CurveRuler` coupling → Fisher-Z/arctanh `Similarity` alignment → EULER_GAMMA hand-off → 256-palette `RollingFloor` quantise (occupancy-drift + version stamp) → 3-byte `ResidueEdge` endpoint pair; metric-safe L1 via 256×256 `DistanceLut` (`distance_adaptive`) + non-metric byte-Hamming `distance_heuristic`. `prove()` closes the 2-D discrepancy Open Item (companion to `jc::weyl`). Zero-dep default (`edition 2021`, empty `[workspace]`, root `exclude`); optional `ndarray-hpc` feature routes batch Fisher-Z through `ndarray::simd::simd_ln_f32`. **61 unit + 6 doctests green** on BOTH feature configs; clippy -D warnings + fmt clean. ~80% overlaps existing CERTIFIED primitives by design (clean-room, user-directed) — see `crates/helix/KNOWLEDGE.md` § Overlap & Consolidation + E-HELIX-OVERLAP + TD-HELIX-OVERLAP-1. Branch claude/gallant-rubin-Y9pQd. > > **2026-06-01 — shipped (autoattended)** (D-A3): `lance_graph_contract::atoms` — `I4x32::pack`/`unpack` implemented (the 2 `todo!()`s gone) + new `I4x64` (256-bit / 64 signed-i4 dims, `repr(C, align(16))`, 32 B) + private `sext4`. Two's-complement signed-i4 nibble codec (byte-compatible with `QualiaI4_16D` + the `CausalEdge64` mantissa), sign-agnostic (caller pre-scales). The carrier is a deterministic **CAM address** + sparse-intensity "smell" — NO vector search, no float; the `{instance,reference}` dual is rejected ("64" = 64 poles). Contract lib **562 green** (+9), offline, zero new deps. The bipolar `−introspection..+exploration` pole semantics + asymmetric scaling ride the caller's pre-scale (A4). Plan `.claude/plans/a3-carrier-v1.md`; doctrine `.claude/knowledge/ephemeral-warm-cold-lifecycle.md`. diff --git a/.claude/board/PR_ARC_INVENTORY.md b/.claude/board/PR_ARC_INVENTORY.md index e99cb696..bccade70 100644 --- a/.claude/board/PR_ARC_INVENTORY.md +++ b/.claude/board/PR_ARC_INVENTORY.md @@ -58,6 +58,8 @@ **Confidence (2026-06-03):** working — both feature configs green, clippy/fmt clean; 2 CodeRabbit findings (public-API NaN guard in `lift`, f32 clamp-epsilon no-op in `batch_fisher_z`) fixed pre-merge with boundary tests. +**Correction (2026-06-03, follow-up PR after #460):** the `../../../ndarray` **path** dep + `ndarray-hpc` feature in the Added block above were wrong twice — (1) codex P2: an optional *path* dep still forces Cargo to read the local sibling manifest at resolution, so the "default build needs none of it" claim was false (a clean checkout failed before feature selection); (2) per the directive **"ndarray is mandatory for lance-graph,"** ndarray is not optional. Both fixed: ndarray is now a **mandatory, non-optional git dependency** (`git = AdaWorldAPI/ndarray @ master`, `ndarray-hpc` feature removed). `simd.rs` always uses `ndarray::simd` (no scalar-fallback variant). The fork is self-contained (internal subcrates only, no lance-graph back-dep) → no import cycle. See E-HELIX-NDARRAY-MANDATORY. + --- ## #441 odoo-classes-bitmask-render (D-CLS arc) — classes as a SoA-view with presence bitmask diff --git a/.claude/board/STATUS_BOARD.md b/.claude/board/STATUS_BOARD.md index d8f26eed..cd662678 100644 --- a/.claude/board/STATUS_BOARD.md +++ b/.claude/board/STATUS_BOARD.md @@ -12,7 +12,7 @@ ## D-HELIX-1 — `crates/helix` golden-spiral Place/Residue codec (zero-dep + optional ndarray-hpc) -**Status:** Shipped (branch `claude/gallant-rubin-Y9pQd`; **61 unit + 6 doctests green** on the default zero-dep build AND under `--features ndarray-hpc`; clippy -D warnings + fmt clean). New standalone crate (empty `[workspace]`, root `exclude`) realising the user's `KNOWLEDGE.md`: `HemispherePoint` (√u equal-area placement) → `CurveRuler` (stride-4-over-17) → `Similarity` (Fisher-Z/arctanh) → `RollingFloor` (256-palette; occupancy-drift + version stamp) → `ResidueEdge` (3-byte endpoint pair) + `DistanceLut` (metric-safe 256×256 L1; `distance_adaptive` vs non-metric `distance_heuristic`) + `prove()` (2-D discrepancy companion to `jc::weyl`). Optional `ndarray-hpc` = batch Fisher-Z via `simd_ln_f32`. ~80% clean-room overlap with CERTIFIED primitives (E-HELIX-OVERLAP / TD-HELIX-OVERLAP-1); consolidation path in `KNOWLEDGE.md`. Process: autoattended — 5 research agents + 4 parallel Sonnet leaf workers + central consolidation. Next (owed): fidelity-vs-ground-truth probe (naive-u8 floor gate ≥0.9980 Pearson, CONJECTURE). +**Status:** Shipped (branch `claude/gallant-rubin-Y9pQd`; **61 unit + 6 doctests green** on the default zero-dep build AND under `--features ndarray-hpc`; clippy -D warnings + fmt clean). New standalone crate (empty `[workspace]`, root `exclude`) realising the user's `KNOWLEDGE.md`: `HemispherePoint` (√u equal-area placement) → `CurveRuler` (stride-4-over-17) → `Similarity` (Fisher-Z/arctanh) → `RollingFloor` (256-palette; occupancy-drift + version stamp) → `ResidueEdge` (3-byte endpoint pair) + `DistanceLut` (metric-safe 256×256 L1; `distance_adaptive` vs non-metric `distance_heuristic`) + `prove()` (2-D discrepancy companion to `jc::weyl`). Optional `ndarray-hpc` = batch Fisher-Z via `simd_ln_f32`. ~80% clean-room overlap with CERTIFIED primitives (E-HELIX-OVERLAP / TD-HELIX-OVERLAP-1); consolidation path in `KNOWLEDGE.md`. Process: autoattended — 5 research agents + 4 parallel Sonnet leaf workers + central consolidation. Next (owed): fidelity-vs-ground-truth probe (naive-u8 floor gate ≥0.9980 Pearson, CONJECTURE). **Update (post-#460):** ndarray is now a MANDATORY non-optional **git** dep (codex P2 + directive "ndarray is mandatory for lance-graph"); `simd.rs` always uses `ndarray::simd`; `ndarray-hpc` feature removed. 63 unit + 6 doctests green; clippy/fmt clean. See E-HELIX-NDARRAY-MANDATORY. ## D-A3 — I4x32/I4x64 signed-i4 CAM codec (carrier `pack`/`unpack` + the 256-bit wide carrier) diff --git a/Cargo.toml b/Cargo.toml index 9e5d6082..b63ab4b5 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -39,8 +39,9 @@ exclude = [ "crates/learning", "crates/jc", "crates/sigker", - # Place/Residue golden-spiral codec — standalone zero-dep (default), optional - # ndarray-hpc accelerator. Verified via `cargo test --manifest-path crates/helix/Cargo.toml`. + # Place/Residue golden-spiral codec — standalone, with a MANDATORY git dep on + # the AdaWorldAPI ndarray fork (the SIMD foundation). Verified via + # `cargo test --manifest-path crates/helix/Cargo.toml`. "crates/helix", # Aerial+ ARM-discovery transcode — standalone zero-dep proposer crate, # verified via `cargo test --manifest-path crates/lance-graph-arm-discovery/Cargo.toml`. diff --git a/crates/helix/Cargo.toml b/crates/helix/Cargo.toml index 7d526859..72146f61 100644 --- a/crates/helix/Cargo.toml +++ b/crates/helix/Cargo.toml @@ -6,28 +6,19 @@ license = "Apache-2.0" publish = false description = "Place/Residue encoding: golden-spiral hemisphere placement, Fisher-Z aligned, into L1-metric-safe 256-palette endpoint pairs — the orthogonal RESIDUE companion to the HHTL PLACE" -# Standalone codec constitution (matches bgz17 / jc / deepnsm): the default build -# is ZERO production dependencies. The φ-spiral residue is the residue regardless -# of SIMD path — the math is identical with or without hardware acceleration, so -# ndarray is an OPTIONAL accelerator, never a correctness dependency. +# Standalone crate (own [workspace] + root `exclude`) but NOT dependency-free: +# ndarray (the AdaWorldAPI fork = "The Foundation") is a MANDATORY dependency. +# helix's batch hot path (src/simd.rs) runs on `ndarray::simd`, and per the stack +# rule "ndarray is mandatory for lance-graph" it is a hard dep, not an opt-in. [dependencies] -# Optional ndarray hardware acceleration (AdaWorldAPI fork): batch Fisher-Z via -# simd_ln_f32 and batch endpoint-L1 via U8x64. Feature-gated so the default build -# stays zero-dep and standalone-verifiable via `cargo test --manifest-path`. -# `std` enables `ndarray::simd` (simd_ln_f32 / F32x16 / U8x64); the module is -# gated behind it. Kept minimal (no extra HPC features) to limit build surface. -ndarray = { path = "../../../ndarray", optional = true, default-features = false, features = ["std"] } - -[features] -default = [] -# Activate the ndarray-accelerated hot path (simd_ln_f32 / U8x64 batch kernels). -# The always-present scalar path is used when this is disabled. -# -# NOTE: this feature resolves `ndarray` to the AdaWorldAPI fork at the sibling -# path `../../../ndarray` (NOT crates.io `ndarray`, which has no `::simd` module). -# It therefore requires that sibling repo to be checked out — the same convention -# `lance-graph` core uses. The default build is zero-dep and needs none of this. -ndarray-hpc = ["dep:ndarray"] +# MANDATORY ndarray (the AdaWorldAPI fork). Sourced by GIT, not a local path: a +# local *path* dep forces Cargo to read the sibling manifest during resolution, +# so a clean checkout without the sibling fails (codex P2 on #460); a git source +# resolves the manifest remotely. `master` is the commit helix was tested against. +# The fork is self-contained (internal subcrates only — no external path dep, no +# back-dependency on lance-graph → no import cycle). `std` enables `ndarray::simd` +# (simd_ln_f32 / F32x16 / U8x64). +ndarray = { git = "https://github.com/AdaWorldAPI/ndarray.git", branch = "master", default-features = false, features = ["std"] } [dev-dependencies] diff --git a/crates/helix/KNOWLEDGE.md b/crates/helix/KNOWLEDGE.md index 22cf5930..74507847 100644 --- a/crates/helix/KNOWLEDGE.md +++ b/crates/helix/KNOWLEDGE.md @@ -266,7 +266,7 @@ How the code realises the spec above. Modules → pipeline stages: | out | `residue.rs` | `ResidueEncoder::encode → ResidueEdge` | 3-byte endpoint pair `(start_idx, end_idx, floor_version)` | | distance | `distance.rs` | `DistanceLut::{linear, from_floor}` | 256×256 L1, metric-safe | | proof | `prove.rs` | `prove() → ProofResult` | the 2-D discrepancy companion (Open Item #1) | -| accel | `simd.rs` | `batch_fisher_z`, `batch_l1_u8` | scalar always; ndarray `simd_ln_f32`/`U8x64` under `--features ndarray-hpc` | +| accel | `simd.rs` | `batch_fisher_z`, `batch_l1_u8` | always ndarray `simd_ln_f32` / `U8x64` (mandatory dep; ndarray does its own AVX-512/AVX2/scalar dispatch) | **Pipeline decisions (the latitude the spec grants — "code is downstream"):** @@ -289,7 +289,14 @@ How the code realises the spec above. Modules → pipeline stages: `const::simd::{GOLDEN_RATIO, EULER_GAMMA, E}`; that path does not exist. The canonical source is `std::f64::consts` (Rust ≥1.94); ndarray does not wrap them. `constants.rs` defines local consts (mirroring `std`, like `jc::weyl`'s - `PHI_INV`) to stay zero-dep and toolchain-robust. + `PHI_INV`) to stay toolchain-robust (independent of the mandatory `ndarray` link). +- **ndarray is a MANDATORY git dependency (FINDING — codex P2 #460 + directive + "ndarray is mandatory for lance-graph"):** the `simd.rs` batch path runs on + `ndarray::simd`. ndarray is sourced by `git` (`AdaWorldAPI/ndarray @ master`), + NOT a local path — an optional/local *path* dep forces Cargo to read the sibling + manifest at resolution, failing a clean checkout; a non-optional git dep resolves + remotely and is a hard dep (no feature gate). The fork is self-contained + (internal subcrates only, no lance-graph back-dependency) → no import cycle. **Metric-safety (enforced):** `ResidueEdge::distance_adaptive` = L1 over the 256×256 LUT — a metric, safe for CAKES/CLAM bounds (regression-tested: @@ -300,8 +307,9 @@ How the code realises the spec above. Modules → pipeline stages: # Overlap & Consolidation (placement check, 2026-06-03) **FINDING (placement check).** ~80% of this pipeline already exists in the -workspace; some of it is certified. helix is a deliberate **zero-dep clean-room -re-derivation** (per the directive "scoped only to crate, self-resolving" and +workspace; some of it is certified. helix is a deliberate **clean-room +re-derivation** — it re-derives the math rather than reusing those primitives (per +the directive "scoped only to crate, self-resolving" and the curve-ruler "regenerable from template" ethos). The genuinely novel pieces are the equal-area `√u` hemisphere placement and the PLACE/RESIDUE doctrine. @@ -315,8 +323,8 @@ are the equal-area `√u` hemisphere placement and the PLACE/RESIDUE doctrine. **Consolidation path (when helix graduates from clean-room to integrated):** 1. Replace `fisher_z.rs` with a thin re-export of `bgz-tensor::fisher_z::FamilyGamma` - (the CERTIFIED i8 table) behind a feature; keep the scalar `Similarity` as the - zero-dep fallback. + (the CERTIFIED i8 table); keep the scalar `Similarity` as the per-element path + (the batch SIMD path already requires the now-mandatory `ndarray` dep). 2. Route the stride coupling through the established `(i·11)%17` golden-step or the stride-4 family-zipper rather than a second implementation. 3. Feed `ResidueEdge` endpoints into the existing HIP/TWIG CAKES path diff --git a/crates/helix/src/constants.rs b/crates/helix/src/constants.rs index efd07f1e..1f7f9e1f 100644 --- a/crates/helix/src/constants.rs +++ b/crates/helix/src/constants.rs @@ -18,8 +18,9 @@ //! `const::simd::GOLDEN_RATIO` etc. That path does **not** exist — the canonical //! source is `std::f64::consts::{GOLDEN_RATIO, EULER_GAMMA, E}` (Rust ≥ 1.94) and //! the `ndarray` fork does not wrap them. helix defines local consts (mirroring -//! `std`, exactly as `jc::weyl` defines its own `PHI_INV`) to stay zero-dep and -//! robust across toolchains that have not yet stabilised those float constants. +//! `std`, exactly as `jc::weyl` defines its own `PHI_INV`) to stay robust across +//! toolchains that have not yet stabilised those float constants — independent of +//! whether `ndarray` is linked. /// φ — the golden ratio `(1 + √5) / 2`. The PLACES constant. pub const GOLDEN_RATIO: f64 = 1.618_033_988_749_895; diff --git a/crates/helix/src/lib.rs b/crates/helix/src/lib.rs index 692a5034..3409af25 100644 --- a/crates/helix/src/lib.rs +++ b/crates/helix/src/lib.rs @@ -47,10 +47,12 @@ //! Per the placement check recorded in `KNOWLEDGE.md` § "Overlap & Consolidation", //! the Fisher-Z/arctanh→int8 quantiser, the golden-spiral azimuth proof, the //! stride-4 coupling, and the EULER_GAMMA hand-off already exist elsewhere in the -//! workspace (in places certified to ρ ≥ 0.999). helix is a deliberate, zero-dep -//! **clean-room re-derivation** that keeps the whole substrate self-contained and -//! regenerable-from-template; the genuinely new pieces are the equal-area `√u` -//! hemisphere placement and the PLACE/RESIDUE doctrine. See `KNOWLEDGE.md` for the +//! workspace (in places certified to ρ ≥ 0.999). helix is a deliberate +//! **clean-room re-derivation** — it re-derives the math rather than reusing those +//! primitives, keeping the cognitive substrate regenerable-from-template; the +//! genuinely new pieces are the equal-area `√u` hemisphere placement and the +//! PLACE/RESIDUE doctrine. Its one dependency is the **mandatory** `ndarray` fork +//! (the SIMD foundation — see `src/simd.rs`). See `KNOWLEDGE.md` for the //! consolidation path back to the certified primitives. #![forbid(unsafe_code)] diff --git a/crates/helix/src/simd.rs b/crates/helix/src/simd.rs index 51a43c5e..8030aef3 100644 --- a/crates/helix/src/simd.rs +++ b/crates/helix/src/simd.rs @@ -1,16 +1,17 @@ -//! Optional ndarray-accelerated hot path. +//! ndarray-accelerated batch hot path. //! //! The transcendental in the residue pipeline is the Fisher-Z `arctanh` (an `ln` -//! pair); that is where SIMD pays. With `--features ndarray-hpc`, -//! [`batch_fisher_z`] runs the `ln` pair through `ndarray::simd::simd_ln_f32` -//! (16-wide `F32x16` lanes); without it, an **identical-result** scalar path is -//! used. Endpoint L1 ([`batch_l1_u8`]) is memory-bound and trivially -//! auto-vectorised, so its math stays scalar (the `U8x64`-aligned 256-stride -//! layout lives in [`crate::DistanceLut`]); under the feature it still issues a -//! 64-wide `U8x64` load to keep the alignment contract visible. +//! pair); that is where SIMD pays. [`batch_fisher_z`] runs the `ln` pair through +//! `ndarray::simd::simd_ln_f32` (16-wide `F32x16` lanes); [`batch_l1_u8`] issues +//! 64-wide `U8x64` loads for endpoint-L1. //! -//! Both functions are correctness-equivalent across the two builds — the feature -//! is a pure accelerator, never a behaviour change. +//! **ndarray is a mandatory dependency of this crate** (the AdaWorldAPI fork = +//! "The Foundation" of the stack). These are therefore the *single* +//! implementations — there is no feature gate and no scalar-fallback variant to +//! keep in sync. `ndarray::simd` performs its own AVX-512 / AVX2 / scalar target +//! dispatch internally, so these functions remain portable across targets. + +use ndarray::simd::{simd_ln_f32, F32x16, U8x64}; // f32 clamp guard. Must exceed the f32 ULP near 1.0 (≈1.19e-7) so that // `1.0 - EPS` is strictly < 1.0 in f32 — otherwise `ln(1 - s) = ln(0) = -inf` @@ -19,10 +20,8 @@ const EPS: f32 = 1e-6; /// Batch Fisher-Z `z = ½·(ln(1+s) − ln(1−s))` over `similarities` into `out` /// (each input clamped to `±(1 − 1e-6)`). `out.len()` must equal -/// `similarities.len()`. (ndarray `simd_ln_f32` path.) -#[cfg(feature = "ndarray-hpc")] +/// `similarities.len()`. Uses `ndarray::simd::simd_ln_f32` (16-wide lanes). pub fn batch_fisher_z(similarities: &[f32], out: &mut [f32]) { - use ndarray::simd::{simd_ln_f32, F32x16}; assert_eq!( similarities.len(), out.len(), @@ -52,25 +51,9 @@ pub fn batch_fisher_z(similarities: &[f32], out: &mut [f32]) { } } -/// Batch Fisher-Z (scalar fallback — identical results to the ndarray path). -#[cfg(not(feature = "ndarray-hpc"))] -pub fn batch_fisher_z(similarities: &[f32], out: &mut [f32]) { - assert_eq!( - similarities.len(), - out.len(), - "batch_fisher_z: length mismatch" - ); - for (o, &s) in out.iter_mut().zip(similarities) { - let s = s.clamp(-1.0 + EPS, 1.0 - EPS); - *o = 0.5 * ((1.0 + s).ln() - (1.0 - s).ln()); - } -} - -/// Batch L1 distance `|a − b|` over equal-length u8 endpoint slices into `out`. -/// (ndarray `U8x64` 64-wide load path.) -#[cfg(feature = "ndarray-hpc")] +/// Batch L1 distance `|a − b|` over equal-length u8 endpoint slices into `out`, +/// issuing 64-wide `U8x64` loads. pub fn batch_l1_u8(a: &[u8], b: &[u8], out: &mut [u16]) { - use ndarray::simd::U8x64; assert_eq!(a.len(), b.len(), "batch_l1_u8: length mismatch"); assert_eq!(a.len(), out.len(), "batch_l1_u8: length mismatch"); let n = a.len(); @@ -89,16 +72,6 @@ pub fn batch_l1_u8(a: &[u8], b: &[u8], out: &mut [u16]) { } } -/// Batch L1 distance (scalar fallback). -#[cfg(not(feature = "ndarray-hpc"))] -pub fn batch_l1_u8(a: &[u8], b: &[u8], out: &mut [u16]) { - assert_eq!(a.len(), b.len(), "batch_l1_u8: length mismatch"); - assert_eq!(a.len(), out.len(), "batch_l1_u8: length mismatch"); - for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) { - *o = (x as i32 - y as i32).unsigned_abs() as u16; - } -} - #[cfg(test)] mod tests { use super::*;