diff --git a/.claude/board/AGENT_LOG.md b/.claude/board/AGENT_LOG.md index bbe617d1..d148fbea 100644 --- a/.claude/board/AGENT_LOG.md +++ b/.claude/board/AGENT_LOG.md @@ -1,3 +1,19 @@ +## 2026-06-20 (cont.⁹) — PR #560 codex P2 review fixes (gremlin bag semantics + aiwar cross-family edges) + +**Main thread (Opus), autoattended.** Two unresolved P2 codex threads on PR #560, both fixed: (1) `graph_gremlin.rs` `step()` silently deduped targets via a `seen` set — broke Gremlin bag/multiset semantics (`v(["A","C"]).out().count()` = 1 not 2 when both reach B). Rewrote to per-traverser emission (duplicates preserved); added explicit `dedup()` step + `out_preserves_bag_multiplicity` test. (2) `aiwar.rs` `aiwar_node_rows` put cross-category adapter bytes into the first 12 `in_family` slots (labeled `linked`), so `references` queries missed them and the label flipped with fan-out count — aiwar edges are ALL cross-family, so they now go to the 4 `out_family` slots (`references`), cap 4; test asserts `references` present + no `linked`. contract aiwar 3/3, callcenter gremlin 8/8 (+1 bag test), clippy clean (my files; pre-existing TD-CALLCENTER-QUERY-CLIPPY untouched). Pushed to #560; both review threads resolved. **558/559 (NOT mine — OpenProject/Redmine ontology bridges) checked: NOT all resolved** — #558 2 open (codex P2 seed-context-id + CodeRabbit unit-tests), #559 1 open P1 (Redmine/OpenProject entity_type_id convergence). Surfaced to operator (different arc); not auto-fixed. + +## 2026-06-20 (cont.⁸) — D-GV2-1 shipped: GUID v2 tail (leaf·family·identity 3×u16), feature-gated + +**Main thread (Opus), autoattended.** Operator "go" on the guid-v2-tail plan (canon version bump + capacity numbers accepted). 
Built **D-GV2-1** additive + `#[cfg(feature="guid-v2-tail")]` + NON-breaking (v1 untouched): `canonical_node::{new_v2, leaf() 10..12, family_v2() 12..14, identity_v2() 14..16, local_key_v2, decode_v2/GuidPartsV2, to_hex_v2, GUID_TAIL_LAYOUT_VERSION_V2=2}`; `hhtl::from_guid_prefix_v2` (HEEL·HIP·TWIG·leaf, 16 nibbles — leaf in path, family/identity in basin tail). Per `I-LEGACY-API-FEATURE-GATED`: distinct v2 names (no silent semantic swap), field-isolation matrix test (vary one tier → only that accessor changes), v1/v2 coexistence test, version-gate const. **Verified BOTH configs:** default `cargo test -p lance-graph-contract --lib` = **703** (unchanged, non-breaking); `--features guid-v2-tail` = **706** (+3 v2 tests); clippy `-D warnings` clean on both. Cutover (rename v2→canonical, deprecate v1, ENVELOPE_LAYOUT_VERSION bump) = D-GV2-5, after D-GV2-2 (family→Codebook registry) / D-GV2-3 (soa_graph per-family edges) / D-GV2-4 (aiwar re-key) consume the v2 accessors. Pushed to jirak (extends PR #560). Plan D-GV2-1 marked SHIPPED. + +## 2026-06-20 (cont.⁷) — codex roll-up + 16-family-adapter edges + Callcenter DataFusion/Gremlin + aiwar POC + +**Main thread (Opus), autoattended.** Follow-up to merged PR #557. Pulled the 2 codex P1 review comments (chatgpt-codex-connector; the CodeRabbit arg-count was fixed pre-merge) and rolled both into this PR per operator. **codex #1** (classid filter): `project_snapshot`/`nearest_anchor` now include only rows where `classid == domain.classid` (a classid IS the class, exact — operator). **codex #2** (ambiguous edge bytes): resolved via the operator's **16×8-bit family-node adapter** model — the `EdgeBlock` is read as 16 family adapters (each byte → a FAMILY by `family & 0xFF`, collision-aware skip), not member-by-identity; the >255-member aliasing dissolves (resolution is family-level only). EPIPHANY `E-FAMILY-ADAPTER-EDGES-ARE-RENDER-STABLE` (mixin-dependency traded for render stability + flexibility). 
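
A minimal sketch of the collision-aware skip described under codex #2, assuming a plain `HashMap` from low byte to family id. The names `build_adapter_map` and `resolve_adapters` are hypothetical and are not the `soa_graph` API; how a zero byte (empty slot) is treated is also an assumption here: it is skipped only because no family claims it.

```rust
use std::collections::HashMap;

/// Hypothetical helper: map each low byte to the single family that owns it,
/// or to `None` when two or more distinct families share that byte.
fn build_adapter_map(families: &[u32]) -> HashMap<u8, Option<u32>> {
    let mut map: HashMap<u8, Option<u32>> = HashMap::new();
    for &family in families {
        let key = (family & 0xFF) as u8;
        map.entry(key)
            .and_modify(|slot| {
                // A different family already claimed this byte: mark it ambiguous.
                if *slot != Some(family) {
                    *slot = None;
                }
            })
            .or_insert(Some(family));
    }
    map
}

/// Resolve the 16 adapter bytes of an edge block to family ids.
/// Ambiguous or unknown bytes are skipped, never guessed.
fn resolve_adapters(edge_block: &[u8; 16], map: &HashMap<u8, Option<u32>>) -> Vec<u32> {
    edge_block
        .iter()
        .filter_map(|byte| map.get(byte).copied().flatten())
        .collect()
}
```

The design point is that an ambiguous byte yields no edge at all rather than an edge to whichever family happens to come first in row order.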
+ +**Callcenter slice (lance-graph-callcenter):** NEW `graph_table` (`query-lite`) — `GraphSnapshot` → `nodes`/`edges` arrow MemTable `TableProvider`s + `register_graph(SessionContext)` (the DataFusion/SQL/Cypher→SQL path, mirrors `transcode::ontology_table`); NEW `graph_gremlin` (always-on, pure contract types) — `g(&snap).v().out()/.in_()/.out_e(label)/.values_kind()` Gremlin POC = the SurrealQL `->edge->` traversal kernel. **aiwar POC (contract `aiwar.rs` + example):** `AiwarClassView` (category ⇒ family-id) + `aiwar_node_rows` ingest the REAL `AdaWorldAPI/aiwar-neo4j-harvest/data/aiwar_graph.json` (174 KB) → OSINT NodeRows → `project_snapshot`. Example run: **221 entities/326 edges → 281 nodes (221 members + 60 family hubs) + 481 edges**. (Honest: 60 families because the class view keys off the raw fine-grained `type` field; coarse `N_*`-bucket grouping is a one-line knob — mechanism is correct.) + +Tests: contract **703** lib (+5: aiwar ×3, soa_graph ambiguity+mixed-class ×2), clippy `--all-targets -D warnings` clean. Callcenter **10** graph tests (`--features query`, incl. live SQL roundtrip), default build compiles `graph_gremlin`; my two files clippy-clean (pre-existing oxrdf/doc `-D warnings` debt in unrelated modules logged to TECH_DEBT). q2 wires the GraphSnapshot to the Quadro-2 visual. PR opened (codex fixes rolled in). + ## 2026-06-20 (cont.⁶) — SoA-as-graph domain foundation for q2 (OSINT/Gotham 0x0007 + FMA 0x0008) **Main thread (Opus), autoattended.** Operator: "prepare everything so q2 can render nodes/edges + family nodes + HHTL CLAM hop adjacency, neo4j-emulation; OSINT OGAR class is 0x0007; also FMA anatomy 70k as body with bones as stability anchor — rendering is wired in the q2 session, here just the basic domain + SoA-as-graph." 
Grounded with two parallel Explore agents (q2 wiring + lance-graph ontology/callcenter/polyglot) BEFORE building — consult-don't-guess paid off twice: (a) `graph_render.rs` ALREADY is the Neo4j/Gotham surface (`GraphSnapshot`/`RenderNode`/`RenderEdge`, consumer = q2 cockpit) → reused, not duplicated; (b) `NiblePath::from_guid_prefix` ALREADY is the canonical GUID→path lowering → de-duped symbiont's third copy onto it. diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md index b12c1326..9758a3ef 100644 --- a/.claude/board/EPIPHANIES.md +++ b/.claude/board/EPIPHANIES.md @@ -1,3 +1,52 @@ +## 2026-06-20 — E-UNIFORM-MORTON-TILE-PYRAMID — making every GUID tier the same size (8×u16) makes the KEY, the per-family CODEBOOK, the VALUE tile, and the PERTURBATION pyramid all the SAME 2bit×2bit 4×4 Morton-tile primitive — so one kernel (Morton + AMX 4×4 BF16 GEMM), one distance (Morton common-prefix = HHTL hop), and one codebook shape (256×256 per tier) govern the whole substrate + +**Status:** FINDING (operator design lock, GUID-v2-tail, 2026-06-20). + +The v2 tail's payoff is not just "kills u24" — it's that **uniform tier size collapses four separate structures onto one tile primitive**: + +- **1 nibble = 2bit×2bit = a 4×4 Morton tile = `FAN_OUT`-16** (one HHTL level). The atom — the same `morton4(x,y)` interleave `domino.rs` already runs on the value side. +- **1 u16 tier = 4 nibbles = a 256×256 Morton tile** (8 bits/axis) = OGAR's "256×256 centroid tile per tier" (`256 = 4⁴`; nibble-interleave = alternating-axis refinement = Morton in centroid space). +- **8 u16 tiers = one stacked pyramid**, every tier identical in size/shape (`classid_hi·classid_lo·HEEL·HIP·TWIG·leaf·family·identity`). + +The same 4×4 Morton primitive then governs four things that were separate: +1. **Key/address** — each tier a 256×256 tile; routing = descend tiles. +2. **Codebook** — a 256×256 centroid tile per family (per-family scoping), selected by the family tier (itself a tile). 
D-GV2-2. +3. **Value** — the domino BF16 4×4 Morton tile → AMX `TDPBF16PS` GEMM (shipped). +4. **Perturbation/helix** — the stacked-pyramid (exponent=tier-level, location=Morton sub-tile, phase=deterministic, magnitude=stored) lands on the same grid (OGAR `guid-prefix-shape-routing.md` §4). + +Payoffs collapse to one each: **one kernel** (Morton addressing + the AMX 4×4 tile GEMM sweeps tiers / codebooks / values uniformly, no special-casing), **one distance** (Morton common-prefix depth = HHTL hop = `family_hop_count` = `E-MIXIN-IS-AN-ADDRESS-REFERENCE-NOT-A-COPY`'s "distance in the address"), **one codebook shape**. The 24+24 tail broke this uniformity (a u24 is neither a clean Morton tile nor u16-aligned); 16+16+16 restores it. Condition: only holds while every tier stays u16 (4-nibble = one 256×256 tile) and the codebook stays the 4⁴ centroid hierarchy (flat k-means-256 breaks the Morton prefix; `OGAR/CLAUDE.md` "Tier interpretation"). Cross-ref: `E-MIXIN-IS-AN-ADDRESS-REFERENCE-NOT-A-COPY`, `E-FAMILY-ADAPTER-EDGES-ARE-RENDER-STABLE`, OGAR canon "256×256 CENTROID TILE" + "Bipolar-phase pyramid", `domino.rs` (`morton4` + AMX tile GEMM), plan `guid-v2-tail-per-family-codebook-v1.md`. + +--- + +## 2026-06-20 — E-MIXIN-IS-AN-ADDRESS-REFERENCE-NOT-A-COPY — when group membership lives in the GUID ADDRESS (the family tier / the 16 family-adapter bytes) and shared state lives ONCE on the family-node basin, a mixin / multiple-membership is O(1) (a byte reference) not O(n) (materialized inherited edges), and inter-node distance is HHTL hop arithmetic on the address, never a BFS over edges — "distance is in the address/hops" + +**Status:** FINDING (perennial; operator design lock, GUID-v2-tail plan, 2026-06-20). + +Naive multiple-inheritance / multi-group graphs explode: a node in N groups must materialize N inherited edge-sets (copy or link each) → O(n) per node, and "how far apart are A and B" is a BFS over those materialized edges. 
Both costs vanish when membership is **addressed, not materialized**: + +1. **A family node is an episodic basin.** The connections accumulated *on the basin* (its in/out edges) ARE the **supporting edges** of every member — the shared state lives ONCE, on the family-node row, not copied into each member. +2. **A member mixes in a family by reference** — a 1-byte family-adapter slot (or the `family` tier of its own GUID), never a copy. N memberships = N adapter bytes (≤16), bounded. Adding a member to a rich basin is free: it inherits the basin's whole supporting-edge set by pointing at it. **Mixin = O(1).** +3. **Distance is in the address.** Inter-node distance = HHTL hop count (`family_hop_count` / `common_prefix_depth`) computed from the two GUIDs — O(depth) arithmetic, never a graph traversal. "Distance is in the address/hops." + +So the three costs that kill multi-group graphs become: membership = O(1) (a byte), shared episodic state = stored once (the basin), distance = O(1) (address arithmetic). This is *why* the GUID-v2 tail (`leaf` HHTL tier · `family` basin · `identity` instance) + per-family codebooks is not just a layout tidy-up — it makes mixins and supporting-edge inheritance asymptotically cheap. The condition is strict: it only holds while membership stays in the head address (never a value-slab list) and the family basin owns the shared edges (never per-member copies). Cross-ref: `E-FAMILY-ADAPTER-EDGES-ARE-RENDER-STABLE` (the edge-resolution face), `E-ANCHOR-IS-A-HEAD-FIELD-NOT-A-VALUE-TYPE` (structure-in-the-head), `E-GUID-IS-THE-GRAPH`, plan `guid-v2-tail-per-family-codebook-v1.md` (D-GV2-2 episodic basin), OGAR codebook-scoping canon. 
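
A sketch of "distance is in the address", assuming a 16-nibble path packed into a `u64` with the first tier in the most significant nibble. Both the packing order and the hop formula (up to the shared ancestor, then down) are assumptions made for illustration; the real `family_hop_count` / `common_prefix_depth` may be defined differently.

```rust
/// Number of leading nibbles two 16-nibble paths share (0..=16).
fn common_prefix_depth(a: u64, b: u64) -> u32 {
    // XOR leaves zero bits wherever the paths agree; each nibble is 4 bits.
    (a ^ b).leading_zeros() / 4
}

/// Hypothetical tree distance: hops up to the shared ancestor plus hops down.
fn hop_count(a: u64, b: u64) -> u32 {
    2 * (16 - common_prefix_depth(a, b))
}
```

No edge list is consulted: the cost is one XOR and one leading-zero count, whatever the size of the graph.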
+ +--- + +## 2026-06-20 — E-FAMILY-ADAPTER-EDGES-ARE-RENDER-STABLE — resolving graph edges to FAMILY nodes (16×8-bit family-node adapters) instead of to individual members trades a "mixin dependency" (a referenced family must exist) for two structural wins — extreme render stability (family hubs are fixed anchors, members attach to them, the layout doesn't churn) and the dissolution of the >255-member identity-byte aliasing (resolution is only ever family-level) + +**Status:** FINDING (perennial; operator model, shipped `contract::soa_graph` 16-adapter reading + `aiwar` POC, 2026-06-20). + +The merged `soa_graph` (PR #557) resolved the 12 in-family edge slots to individual sibling members by `identity & 0xFF`. Codex P1 #2 caught the flaw: a family with >255 members aliases on the low byte, so "first match" renders an edge to the *wrong* member by row order. The operator's fix is not a wider member index — it is a **reinterpretation**: read the canonical 16-byte `EdgeBlock` as **16 family-node adapter slots**, every byte resolving to a FAMILY (`family & 0xFF`), never an individual member. + +Why this is the right move, not just a bug patch: +1. **The aliasing dissolves at the source.** There is no longer any member-level byte resolution — edges land on family nodes. The only remaining ambiguity is two families sharing a low byte (>256 families), handled by a collision-aware map that skips the ambiguous byte (never a wrong edge). 256 families "covers pretty much everything" for a POC; the prefix/HHTL route is the >256 escalation. +2. **Render stability is structural.** Family nodes are fixed hubs; members attach to them (member-of) and adjacency is member→family. A force-directed layout anchored on a small stable set of family hubs does not churn frame-to-frame — the operator's "extreme render stability." 
This is the same stability `E-ANCHOR-IS-A-HEAD-FIELD` gives the FMA skeleton (bones = anchor families); the two epiphanies are the static (anchor) and dynamic (edge-resolution) faces of one principle: **structure lives on families, in the head.** +3. **Flexibility + the one cost.** A node mixes in up to 16 family adjacencies (huge flexibility, any-to-any within 256). The named limitation is **mixin dependency**: a referenced family must exist or the slot is a dangling adapter (skipped). That is the honest trade — and it is cheap, because a missing family is a render no-op, not a corruption. + +The general rule for graph edges on this substrate: **resolve to the stable grouping (family), not the volatile leaf (member)** — unless a richer flavor (8×16-bit, 32×4 residue, member→member second-hop) is measured to be needed. Cross-ref: `E-ANCHOR-IS-A-HEAD-FIELD-NOT-A-VALUE-TYPE` (the static dual), `E-GUID-IS-THE-GRAPH`, the operator's deferred helix-basin-anchor (CLAM ⇄ Louvain turbovec edge residue) as the eventual richer flavor; `aiwar.rs` (the POC: 221 aiwar entities → 60 category family hubs). + +--- + ## 2026-06-20 — E-ANCHOR-IS-A-HEAD-FIELD-NOT-A-VALUE-TYPE — graph STRUCTURE (domain, family grouping, hierarchy, stability anchors, adjacency) must key off the 32-byte HEAD (classid / family / HHTL path), never the value slab; only then does the whole neo4j/Gotham view — and "FMA bones as stability anchor" — stay zero-value-decode at memory-scan speed **Status:** FINDING (perennial; shipped `contract::soa_graph` + `NiblePath::family_hop_count`, 2026-06-20). diff --git a/.claude/board/INTEGRATION_PLANS.md b/.claude/board/INTEGRATION_PLANS.md index c3306e52..ca34e61d 100644 --- a/.claude/board/INTEGRATION_PLANS.md +++ b/.claude/board/INTEGRATION_PLANS.md @@ -1,3 +1,9 @@ +## 2026-06-20 — guid-v2-tail + per-family codebook scoping (PROPOSED, operator what-if) + +Plan: `.claude/plans/guid-v2-tail-per-family-codebook-v1.md`. 
Repartition the 48-bit basin tail `family(u24)|identity(u24)` → `leaf(u16)|family(u16)|identity(u16)` (whole key = uniform 8×u16 tiers), and scope **codebooks per family** (`family → Codebook`, the finer sibling of `classid → ClassView`; 12 in-family edge slots = 1-byte index into own family codebook, 4 out-of-family = `(family,index)`). Dissolves the aiwar "60 noisy families" at the root (per-family ≤256 vocabularies), kills the awkward `u24`, makes family-adapter resolution exact, and gives the 3-tier subclass codebook a native home. **Blast radius measured: CONTAINED in lance-graph** (q2/smb-office-rs/medcare-rs = 0; routing prefix `from_guid_prefix`/`mailbox_scan` is tail-agnostic; ~3 layout files + ~35 mostly-test `NodeGuid::new` call sites). **PROPOSED — gated on operator sign-off (canon version bump) + two numbers:** ≤65 536 identities per `(leaf,family)` bucket, ≤256 codebook entries per family before split. Ships feature-gated `guid-v2-tail` (default OFF) per `I-LEGACY-API-FEATURE-GATED` (field-isolation matrix + version gate). D-GV2-1..5. + +--- + ## 2026-06-20 — unified-soa-rubikon-integration (ONE SoA: planner + kanban + SurrealQL + thinking-styles↔Rubikon) Plan: `.claude/plans/unified-soa-rubikon-integration-v1.md`. The capstone after the golden image: the 16k-board SoA (`NodeRow` key16|edges16|value480) becomes ONE struct-of-arrays the **planner**, **kanban** lifecycle, **SurrealQL membrane**, and **thinking styles** all consume — landing in `lance-graph-planner` + `lance-graph-contract`, consuming Ontology + OGAR inheritance, with **ractor as the dummy ownership guarantee** (no messages/tokio). **SHIPPED foundation (verified file:line):** D0–D3 + NaN-projection + the synchronous writer-fires-kanban loop (D2). 
**PROPOSED (named targets located):** the four planner superpowers over the one SoA — (1) tenant→fingerprint meta-query (standing wave), (2) `temporal` implicit Markov chaining, (3) project any tenant (witness/CausalEdge64), (4) key-only 32-byte (GUID+EdgeBlock) neo4j render, zero value-decode; the Rubikon/Heckhausen + Libet −550..−200 ms mapping (−200 ms veto on `Planning→Prune`); OGAR class DO/THINK thinking-style selection via i4-32D fingerprint CAM + implicit sparse adjacency; JITson + lance-graph-callcenter outer boundary/SLA/commit membrane. Honest: §1 shipped, §2–§5 not yet running. diff --git a/.claude/board/LATEST_STATE.md b/.claude/board/LATEST_STATE.md index 32b8abc6..46e29345 100644 --- a/.claude/board/LATEST_STATE.md +++ b/.claude/board/LATEST_STATE.md @@ -16,6 +16,8 @@ --- +> **2026-06-20 — IN PR (`claude/jirak-math-theorems-harvest-rfii13`)** — **codex roll-up + 16-family-adapter edges + Callcenter DataFusion/Gremlin + aiwar POC.** Follow-up to merged #557. (1) Both codex P1 fixes rolled in: classid filter (`project_snapshot`/`nearest_anchor` only project `classid == domain.classid` rows) + the operator's **16×8-bit family-node adapter** edge model — the `EdgeBlock` reads as 16 family adapters (each byte → a FAMILY by `family & 0xFF`, collision-aware skip), dissolving the >255-member aliasing; member-by-identity resolution removed (`E-FAMILY-ADAPTER-EDGES-ARE-RENDER-STABLE`). (2) `lance-graph-callcenter`: NEW `graph_table` (`query-lite`, `GraphSnapshot` → `nodes`/`edges` arrow MemTable `TableProvider`s + `register_graph(SessionContext)`) + NEW `graph_gremlin` (always-on Gremlin/SurrealQL traversal kernel). (3) `contract::aiwar` + example: `AiwarClassView` (category ⇒ family) + `aiwar_node_rows` ingest the real `aiwar-neo4j-harvest/data/aiwar_graph.json` (221 entities → 281 nodes / 60 family hubs / 481 edges). Contract 703 lib + callcenter 10 graph tests green; contract clippy `--all-targets -D warnings` clean. 
q2 wires the GraphSnapshot → Quadro-2 visual. Refs: AGENT_LOG 2026-06-20 (cont.⁷), EPIPHANIES `E-FAMILY-ADAPTER-EDGES-ARE-RENDER-STABLE`, TECH_DEBT `TD-CALLCENTER-QUERY-CLIPPY`. +> > **2026-06-20 — branch work (`claude/jirak-math-theorems-harvest-rfii13`)** — **SoA-as-graph domain foundation for the OSINT/Gotham + FMA consumers (q2 renders the pixels).** New zero-dep `contract::soa_graph`: `project_snapshot(&[NodeRow], &DomainSpec) -> graph_render::GraphSnapshot` projects the canonical 32-byte head (NodeGuid + EdgeBlock) into the EXISTING Gotham/neo4j surface (`graph_render` — reused, not duplicated) — family nodes (by u24 `family`), member/in-family/out-of-family edges, all **zero value decode**. `nearest_anchor` ranks nodes to their nearest stability-anchor family by the new `NiblePath::family_hop_count` (CLAM tree distance). Two domains registered: `OSINT_GOTHAM` (classid **`0x0007`**) + `FMA_ANATOMY` (**`0x0008`**, bones = anchor families) in `BUILTIN_READ_MODES` (`ReadMode::OSINT` Cognitive/CoarseOnly hot; `ReadMode::FMA` Compressed/CoarseOnly cold). Anchor-ness is a HEAD field (`family`), never a value type — so "FMA bones as stability anchor" stays head-only (`E-ANCHOR-IS-A-HEAD-FIELD-NOT-A-VALUE-TYPE`). De-duped the GUID→NiblePath lowering: symbiont's `hhtl_path_of` now delegates to canonical `from_guid_prefix` (third copy collapsed). 698 contract + 12 symbiont tests green, clippy clean. **Deferred (named):** q2 rendering (q2 session), Callcenter DataFusion/gremlin POC, OntologyRegistry ClassView labels. Refs: AGENT_LOG 2026-06-20 (cont.⁶), EPIPHANIES `E-ANCHOR-IS-A-HEAD-FIELD-NOT-A-VALUE-TYPE`. > > **2026-06-20 — branch work (`claude/happy-hamilton-0azlw4`)** — **UNICHARSET `other_case` transcoded + byte-parity proven (E-CPP-PARITY-5), the fifth leaf.** `UniCharSet` now parses the case-pair id (the token right after the script) into `other_cases: Vec`, applying the load-time clamp (`unicharset.cpp:901`: a value `>= size`, incl. 
the absent default, folds to the id itself). Exposes `get_other_case` + `dump_other_case`, mirroring `unicharset.h:703` (out-of-range id → `INVALID_UNICHAR_ID` -1). **Byte-identical 112/112** on real `eng.lstm-unicharset` vs tesseract's own `get_other_case` (self-validating oracle, `other_case` mode; 60/112 self, 52 real pairs, e.g. `C`→`c`). Last field cleanly reachable by token-offset; direction/mirror/bbox need the multi-tier parser (next, larger leaf). Additive, zero-dep; +4 contract tests (23 unicharset total), clippy `-D warnings` + fmt clean; reproducible via `examples/unicharset_dump.rs other_case`. Consumed by `tesseract-core::CharSet::get_other_case` (+1 boundary test, 6/6). No Core gap. EPIPHANIES `E-CPP-PARITY-5`. diff --git a/.claude/board/TECH_DEBT.md b/.claude/board/TECH_DEBT.md index e2b92e07..2573c489 100644 --- a/.claude/board/TECH_DEBT.md +++ b/.claude/board/TECH_DEBT.md @@ -2890,3 +2890,6 @@ after hydration (the OGAR/hydrator seed step does this). Persisting the pairs the Paid state. Low risk: dedup itself survives replay (the deduped `entity_type` is baked into each persisted `schema_ptr`); only the path bijection needs re-seeding. Pair: D-IDENTITY-4. + +## TD-CALLCENTER-QUERY-CLIPPY — 2026-06-20 — OPEN +**Owed:** `cargo clippy -p lance-graph-callcenter --features query --lib --tests -- -D warnings` fails on PRE-EXISTING debt unrelated to the graph-adapter work: `oxrdf::Subject` deprecation (`src/edge.rs`, use `NamedOrBlankNode`), doc-list overindent + misc in `savant_reasoners.rs` / `vsa_udfs.rs` / `ttl_parse.rs` / `transcode/zerocopy.rs` / `hydrators/owl.rs`. Surfaced (not caused) by the 16-family-adapter PR, whose own files (`graph_table.rs`, `graph_gremlin.rs`) are clippy-clean (verified: zero errors originate in them). **Why deferred:** out of scope — fixing the oxrdf deprecation + doc lints across 6 unrelated modules is its own sweep; this PR is the OSINT graph adapter. 
**Pay when:** a callcenter-clippy-hardening pass, or opportunistically when those modules are next touched. diff --git a/.claude/plans/guid-v2-tail-per-family-codebook-v1.md b/.claude/plans/guid-v2-tail-per-family-codebook-v1.md new file mode 100644 index 00000000..1c99e6a7 --- /dev/null +++ b/.claude/plans/guid-v2-tail-per-family-codebook-v1.md @@ -0,0 +1,118 @@ +# Integration Plan — GUID v2 tail (3×u16) + per-family codebook scoping (v1) + +> **Status:** PROPOSED (operator "what-if", 2026-06-20). Gated on operator +> sign-off for a canon version bump + the two capacity numbers below. +> **Owner branch (when greenlit):** `guid-v2-tail` feature, default OFF. +> **Canon impact:** layout reclaim of the 48-bit basin tail → `I-LEGACY-API-FEATURE-GATED`. + +--- + +## Motivation (why now, before it dilutes) + +Three findings this session converge on one layout change: + +1. **The aiwar codebook is a 3-tier subclass hierarchy** (`node-type (5) → class/airo:type (~6) → fine ML-type (68 noisy)`); the noise is entirely at the leaf, mixed into ONE global codebook. (See `aiwar.rs` POC + the 2026-06-20 codebook check.) +2. **`u24` is awkward** — `family()`/`identity()` hand-assemble 3 bytes into a zero-padded `u32`; the tail is the only non-`u16`-aligned part of the key. +3. **The 16×8-bit family-adapter edges** (`E-FAMILY-ADAPTER-EDGES-ARE-RENDER-STABLE`) resolve by `family & 0xFF`, which aliases at >256 families (codex P1 #2, currently handled by collision-skip). + +**The change (operator-pinned 2026-06-20):** `leaf` is the **4th HHTL tier** +("a natural HHTL") — the cascade becomes `HEEL·HIP·TWIG·leaf` = 4 tiers × 4 +nibbles = **16 nibbles = a full `u64` NiblePath** (routing, not basin). `family` +is the **basin / episodic hub**, `identity` the instance, so the basin-local key +shrinks to `family·identity` = **4 bytes** (was 6). 
The whole 16-byte key is a +**uniform 8×u16 tier array** that splits cleanly into three roles: + +``` +classid_hi classid_lo │ HEEL HIP TWIG leaf │ family identity +└─ codebook prefix ───┘ └─ HHTL path (4×4=16 nibbles) ┘ └─ basin · instance ┘ + (classid → Codebook) (routing / hop distance) (episodic hub · row) +``` + +Plus **codebooks scoped per family** (`family → Codebook`, the finer sibling of +`classid → ClassView`). + +## What it buys + +- **Kills `u24`** — every tier is a native `u16` masked load; Display becomes uniform 4-hex groups (more self-describing, OGAR "dash-groups are the semantics"). +- **Native home for the subclass codebook** — `leaf`=coarse category, `family`=subclass, `identity`=instance; the HHTL prefix (HEEL/HIP/TWIG) stays the routing cascade, the tail carries the content hierarchy. +- **Per-family codebooks dissolve the noise at the root** — each family owns a ≤256-entry codebook; the fine `type` becomes a 1-byte palette index into the family's codebook, not a global string. Within-family references are exact (no `& 0xFF` alias). This IS OGAR canon's "finer scope … longest-prefix-wins" (`classid·…·family → codebook`) + bgz17 palette-per-family + D-AMORT (codebook minted once/family). +- **The 12+4 EdgeBlock split gains a precise meaning:** 12 in-family slots = 1-byte index into the OWN family codebook (family implicit); 4 out-of-family slots = `(family u16, index u8)` cross-family reference (the "8×16-bit out-of-family" widening). + +## Blast radius (measured 2026-06-20 — CONTAINED in lance-graph) + +Cross-repo: **q2 = 0, smb-office-rs = 0, medcare-rs = 0** (no downstream consumer +touches `NodeGuid`; q2 consumes `GraphSnapshot` strings, tail-agnostic). 
+ +| Site | Kind | Action | +|---|---|---| +| `lance-graph-contract/src/canonical_node.rs` | **layout source** | `new()` (add `leaf` arg; 24-bit asserts → 16-bit), `family()`/`identity()` (offsets 10..13/13..16 → 10..12/12..14/14..16, return `u16`), add `leaf()`, `local_key()` (reinterpret trailing 6 B), `decode()`/`GuidParts` (+leaf, u16), `Display` (`{:06x}{:06x}` → `{:04x}-{:04x}-{:04x}`), zero-fallback ladder re-pin, ~15 tests + the 0x0100_0000 overflow panics | +| `soa_graph.rs` (8 hits) | **semantic** | family grouping + `family & 0xFF` → per-family codebook + 16-bit family / exact resolution | +| `aiwar.rs` (2 masks + 1 `new`) | **semantic** | `0x00FF_FFFF` masks → 16-bit; key on `leaf` (coarse) not noisy `type` | +| `action.rs`, `ocr.rs` | **mechanical (prod)** | add `leaf` arg to `NodeGuid::new` (route via `NodeGuid::local`) | +| `hhtl.rs`, `ontology/registry.rs`, `symbiont/key_render.rs`, `callcenter/graph_table.rs` | **mechanical (tests)** | add `leaf` arg to ~30 test `NodeGuid::new` call sites | +| `hhtl::from_guid_prefix`, `mailbox_scan`, ontology `NiblePath↔entity_type` | **TAIL-AGNOSTIC — confirm, no change** | read the routing PREFIX (classid·HEEL·HIP·TWIG), never the tail | + +~35 `NodeGuid::new` call sites total (mostly tests). The unrelated `0xFFFF…` +hits (cycle counters, fingerprint masks, learning-state) are NOT the GUID tail. + +## Gating numbers (operator must confirm before build) + +- **identity 16.7M → 65 536 per family basin.** OK unless a single family needs >65 536 instances. OSINT/FMA: comfortable. (`leaf` is now a routing tier, so it deepens the HHTL tree rather than partitioning the basin.) Confirm against the densest expected family. +- **family codebook ≤ 256 entries** (1-byte in-family index). A family that outgrows it **splits** (mint a sub-family — cheap with a 16-bit family) rather than widening the byte. Confirm densest single-family vocabulary < 256. 
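
For illustration, the v2 tail offsets from the table (`leaf` at `10..12`, `family` at `12..14`, `identity` at `14..16`) can be sketched as below. `GuidV2Sketch` is a hypothetical stand-in, not the crate's `NodeGuid`, and little-endian tier encoding is an assumption the plan does not state.

```rust
/// Hypothetical stand-in for a v2-tail GUID; not the crate's `NodeGuid`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct GuidV2Sketch([u8; 16]);

impl GuidV2Sketch {
    /// Write the three tail tiers into bytes 10..16; bytes 0..10 stay zero here.
    pub fn with_tail(leaf: u16, family: u16, identity: u16) -> Self {
        let mut bytes = [0u8; 16];
        // Assumption: tiers are stored little-endian.
        bytes[10..12].copy_from_slice(&leaf.to_le_bytes());
        bytes[12..14].copy_from_slice(&family.to_le_bytes());
        bytes[14..16].copy_from_slice(&identity.to_le_bytes());
        Self(bytes)
    }

    /// Read one u16 tier starting at `offset`.
    fn tier(&self, offset: usize) -> u16 {
        u16::from_le_bytes([self.0[offset], self.0[offset + 1]])
    }

    pub fn leaf(&self) -> u16 {
        self.tier(10)
    }

    pub fn family(&self) -> u16 {
        self.tier(12)
    }

    pub fn identity(&self) -> u16 {
        self.tier(14)
    }

    /// The 4-byte basin-local key: family bytes followed by identity bytes.
    pub fn local_key(&self) -> [u8; 4] {
        [self.0[12], self.0[13], self.0[14], self.0[15]]
    }
}
```

A field-isolation check in the spirit of the plan varies one tier and confirms that only the matching accessor changes.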
+ +## The O(1) mixin — family node = episodic basin (operator, 2026-06-20) + +The payoff of putting membership in the address: **mixin / multiple-membership +is O(1), not O(n).** + +- A **family node is an episodic basin** — connections accumulated *on the basin* + (its in/out edges) ARE the **supporting edges** of every member. The shared + state lives ONCE, on the family-node row. +- A member **mixes in** a family by **reference** — a 1-byte family-adapter slot + (or the `family` field itself), never a copy. N memberships = N adapter bytes + (≤16), bounded; not N materialized edge-sets per node. So adding a member to a + rich basin is free — it inherits the basin's whole supporting-edge set by + pointing at it. +- **Distance is in the address.** Inter-node distance = HHTL hop count + (`family_hop_count` / `common_prefix_depth`) computed from the two addresses — + O(depth) arithmetic, NOT a BFS over materialized edges. "Distance is in the + address/hops." + +Net: membership O(1) (a byte), shared episodic state stored once (the basin), +distance O(1) (address arithmetic). This is what avoids the O(n) edge explosion +that kills naive multiple-inheritance / multi-group graphs. Captured as +`E-MIXIN-IS-AN-ADDRESS-REFERENCE-NOT-A-COPY`. + +## Everything folds into one Morton tile (operator, 2026-06-20) + +Uniform tier size is the real prize: **1 nibble = 2bit×2bit = a 4×4 Morton tile += `FAN_OUT`-16; 1 u16 tier = a 256×256 Morton tile (`256 = 4⁴`); 8 tiers = one +stacked pyramid.** The same 4×4 Morton primitive then governs the **key** (each +tier a tile, routing = descend tiles), the **per-family codebook** (a 256×256 +centroid tile per family, D-GV2-2), the **value** (the shipped `domino.rs` BF16 +4×4 Morton tile → AMX `TDPBF16PS`), and the **perturbation/helix pyramid** — +collapsing to one kernel (Morton + AMX tile GEMM), one distance (Morton +common-prefix = HHTL hop = `family_hop_count`), one codebook shape. 
The 24+24 +tail broke this (u24 ≠ a clean Morton tile, not u16-aligned); 16+16+16 restores +it. Captured as `E-UNIFORM-MORTON-TILE-PYRAMID`. (Only holds while tiers stay +u16 and the codebook stays the 4⁴ centroid hierarchy — flat k-means-256 breaks +the Morton prefix.) + +## Resolved decision (operator, 2026-06-20) + +`leaf` = the **4th HHTL routing tier** (the natural cascade terminal), NOT a +basin tier. `family` = the basin/episodic hub, `identity` = the instance. +coarse→fine left-to-right: `…·TWIG·leaf` (routing) then `family·identity` +(basin). Display: uniform 4-hex groups, `…-twig-leaf-family-identity`. + +## Deliverables (when greenlit — feature `guid-v2-tail`, default OFF) + +- **D-GV2-1** ✅ **SHIPPED (this PR, feature `guid-v2-tail` default OFF, additive & non-breaking).** `canonical_node`: `new_v2` (+`leaf`), `leaf()` `10..12`, `family_v2()` `12..14`, `identity_v2()` `14..16` (all `u16`), `local_key_v2()` (4 bytes), `decode_v2()`/`GuidPartsV2`, `to_hex_v2()` (uniform 4-hex), `GUID_TAIL_LAYOUT_VERSION_V2 = 2` (version gate). `hhtl::from_guid_prefix_v2` = `HEEL·HIP·TWIG·leaf` (16 nibbles; classid is the separate codebook prefix; leaf in path, family/identity NOT). v1 `new`/`family()`/`identity()` UNTOUCHED (distinct v2 names → no silent semantic swap, `I-LEGACY-API-FEATURE-GATED`). **Field-isolation matrix test** + v1/v2 coexistence + leaf-in-path tests. default 703 / `--features guid-v2-tail` 706, clippy clean both. **Cutover (rename v2→canonical, deprecate v1, bump `ENVELOPE_LAYOUT_VERSION`) = D-GV2-5.** +- **D-GV2-2** `family → Codebook` registry = **episodic basin** (the codebook + the basin's accumulated supporting edges), sibling of `classid → ClassView` in `lance-graph-ontology`: `LazyLock`/Lance-backed, masked-load lookup, head-only. 256-entry cap + split-on-overflow guard. Mixin = O(1) reference to this basin (`E-MIXIN-IS-AN-ADDRESS-REFERENCE-NOT-A-COPY`). 
+- **D-GV2-3** `soa_graph` per-family edge resolution: 12 in-family = 1-byte own-codebook index, 4 out-of-family = `(family,index)`; retire `family & 0xFF` collision-skip under v2. +- **D-GV2-4** `aiwar` re-keyed on `leaf` (coarse node-type, 5 hubs) + per-family codebook (System/Stakeholder/… vocabularies) → resolves the "60 noisy families" on real data. +- **D-GV2-5** cutover: flip default after the gating numbers + downstream (none today) confirmed; v1 → `#[deprecated]` no-op path with migration pointer. + +## Cross-refs + +`EPIPHANIES.md` `E-FAMILY-ADAPTER-EDGES-ARE-RENDER-STABLE` / `E-ANCHOR-IS-A-HEAD-FIELD-NOT-A-VALUE-TYPE`; OGAR `CLAUDE.md` "Codebook scoping = the class routing prefix … finer scopes follow the same longest-prefix-wins rule"; canon "Minimal SoA node" (the v1 layout this supersedes under feature gate); `I-LEGACY-API-FEATURE-GATED`; bgz17 palette (per-family 256-centroid tile); `aiwar.rs` + the 2026-06-20 codebook check. diff --git a/.claude/plans/unified-soa-rubikon-integration-v1.md b/.claude/plans/unified-soa-rubikon-integration-v1.md index fa2b8309..a4557a61 100644 --- a/.claude/plans/unified-soa-rubikon-integration-v1.md +++ b/.claude/plans/unified-soa-rubikon-integration-v1.md @@ -54,6 +54,14 @@ No copies, no per-subsystem mirror (R1 "one SoA never transformed"). 16384 nodes / 32768 edges from 512 KiB of heads, 7680 KiB of value slabs COLD; zero-value-decode proven by the `0xFF`-poison falsifiable probe. `SymbiontBoard` now materialises the contract's `edge_block_at`/`hhtl_path_at` key facets. 
+- ✅ **OSINT family-adapter edges + Callcenter DataFusion/Gremlin + aiwar POC** + (follow-up to #557, codex P1 fixes rolled in) — `soa_graph` now reads the + `EdgeBlock` as **16×8-bit family-node adapters** (edges → families, collision- + aware; dissolves >255-member aliasing) + classid-filtered projection; + `lance-graph-callcenter::graph_table` (DataFusion `nodes`/`edges` MemTables) + + `graph_gremlin` (Gremlin/SurrealQL traversal); `contract::aiwar` ingests the + real `aiwar-neo4j-harvest` graph (221 entities → 60 family hubs). q2 wires the + GraphSnapshot → Quadro-2 visual. `E-FAMILY-ADAPTER-EDGES-ARE-RENDER-STABLE`. - ✅ **SoA-as-graph domain foundation for q2 (OSINT/Gotham + FMA)** — `contract::soa_graph` projects the head into the EXISTING `graph_render` Gotham/neo4j surface (`GraphSnapshot`): family nodes (u24 `family`), diff --git a/crates/lance-graph-callcenter/src/graph_gremlin.rs b/crates/lance-graph-callcenter/src/graph_gremlin.rs new file mode 100644 index 00000000..c314becb --- /dev/null +++ b/crates/lance-graph-callcenter/src/graph_gremlin.rs @@ -0,0 +1,261 @@ +//! `graph_gremlin` — a minimal Gremlin-style traversal over a [`GraphSnapshot`]. +//! +//! The pure-Rust contrast to the DataFusion node/edge tables ([`crate::graph_table`], +//! `query-lite`): `g(&snap).v(&["id"]).out().to_vec()` walks the adjacency in the +//! snapshot with **zero SQL, zero DataFusion** — just the `source → target` edge +//! list. A "very basic Gremlin POC": `V` / `out` / `in_` / `out_e(label)` / +//! `in_e(label)` / `values_kind` / `to_vec` / `count`. +//! +//! SurrealQL graph traversal lowers to the SAME steps — `->edge->` ≈ [`out`], +//! `<-edge<-` ≈ [`in_`], `->edge(WHERE ...)->` ≈ [`out_e`] — so this doubles as +//! the SurrealQL traversal kernel over the family-adapter graph. Both consume the +//! `GraphSnapshot` the SoA projector ([`lance_graph_contract::soa_graph`]) +//! produces from the 32-byte node head; the family nodes are the stable hubs the +//! 
traversal hops through. +//! +//! [`out`]: Traversal::out +//! [`in_`]: Traversal::in_ +//! [`out_e`]: Traversal::out_e + +use lance_graph_contract::graph_render::GraphSnapshot; +use std::collections::HashSet; + +/// The Gremlin `g` — a traversal source bound to one graph snapshot. +pub struct GraphTraversalSource<'a> { + snap: &'a GraphSnapshot, +} + +/// `g(&snap)` — open a traversal over the snapshot (the Gremlin `g`). +pub fn g(snap: &GraphSnapshot) -> GraphTraversalSource<'_> { + GraphTraversalSource { snap } +} + +impl<'a> GraphTraversalSource<'a> { + /// `g.V(ids)` — seed the traversal at the given vertex ids. An empty slice is + /// `g.V()` (all vertices). + pub fn v(&self, ids: &[&str]) -> Traversal<'a> { + let current: Vec<String> = if ids.is_empty() { + self.snap.nodes.iter().map(|n| n.id.clone()).collect() + } else { + ids.iter().map(|s| s.to_string()).collect() + }; + Traversal { + snap: self.snap, + current, + } + } +} + +/// An in-flight traversal: the multiset of vertex ids currently held, plus the +/// snapshot to hop over. Steps consume `self` and return `Self` (Gremlin fluent). +pub struct Traversal<'a> { + snap: &'a GraphSnapshot, + current: Vec<String>, +} + +impl<'a> Traversal<'a> { + /// `out()` — follow outgoing edges (`source ∈ current → target`), any label. + #[must_use] + pub fn out(mut self) -> Self { + self.step(None, true); + self + } + + /// `out(label)` — outgoing edges whose label equals `label`. + #[must_use] + pub fn out_e(mut self, label: &str) -> Self { + self.step(Some(label), true); + self + } + + /// `in()` — follow incoming edges (`target ∈ current → source`), any label. + #[must_use] + pub fn in_(mut self) -> Self { + self.step(None, false); + self + } + + /// `in(label)` — incoming edges whose label equals `label`.
+ #[must_use] + pub fn in_e(mut self, label: &str) -> Self { + self.step(Some(label), false); + self + } + + fn step(&mut self, label: Option<&str>, outgoing: bool) { + // Bag/multiset semantics (Gremlin): each current traverser independently + // emits its reached targets, so duplicates are PRESERVED — `count()` and + // downstream steps reflect edge multiplicity / fan-in. Set semantics is + // the explicit `dedup()` step, never implicit (codex P2, PR #560). + let cur = std::mem::take(&mut self.current); + let mut next: Vec<String> = Vec::new(); + for from_v in &cur { + for e in &self.snap.edges { + if let Some(l) = label { + if e.label != l { + continue; + } + } + let (from, to) = if outgoing { + (&e.source, &e.target) + } else { + (&e.target, &e.source) + }; + if from == from_v { + next.push(to.clone()); + } + } + } + self.current = next; + } + + /// Gremlin `dedup()` — collapse to distinct vertices (set semantics). The + /// explicit opt-in; every other step preserves bag/multiset multiplicity. + #[must_use] + pub fn dedup(mut self) -> Self { + let mut seen: HashSet<String> = HashSet::new(); + self.current.retain(|id| seen.insert(id.clone())); + self + } + + /// Terminal `values("kind")` — project the `kind` of each reached vertex + /// (skips ids that are not present as nodes, e.g. a dangling adapter target). + #[must_use] + pub fn values_kind(&self) -> Vec<String> { + self.current + .iter() + .filter_map(|id| { + self.snap + .nodes + .iter() + .find(|n| &n.id == id) + .map(|n| n.kind.clone()) + }) + .collect() + } + + /// Terminal `toList()` — the vertex ids currently reached. + #[must_use] + pub fn to_vec(self) -> Vec<String> { + self.current + } + + /// Terminal `count()` — number of vertices reached.
+ #[must_use] + pub fn count(self) -> usize { + self.current.len() + } +} + +#[cfg(test)] +mod tests { + use super::*; + use lance_graph_contract::graph_render::{RenderEdge, RenderNode}; + + fn node(id: &str, kind: &str) -> RenderNode { + RenderNode { + id: id.to_string(), + label: id.to_string(), + kind: kind.to_string(), + confidence: 1.0, + props: vec![], + } + } + fn edge(source: &str, target: &str, label: &str) -> RenderEdge { + RenderEdge { + source: source.to_string(), + target: target.to_string(), + label: label.to_string(), + frequency: 1.0, + confidence: 1.0, + inferred: false, + } + } + + fn sample() -> GraphSnapshot { + // A -knows-> B -knows-> C ; A -member-of-> family:00000a + GraphSnapshot { + nodes: vec![ + node("A", "Person"), + node("B", "Person"), + node("C", "Person"), + node("family:00000a", "Family"), + ], + edges: vec![ + edge("A", "B", "knows"), + edge("B", "C", "knows"), + edge("A", "family:00000a", "member-of"), + ], + inferences: vec![], + contradictions: vec![], + timestamp: 0, + } + } + + #[test] + fn out_follows_outgoing_edges() { + let s = sample(); + assert_eq!(g(&s).v(&["A"]).out().to_vec(), vec!["B", "family:00000a"]); + } + + #[test] + fn in_follows_incoming_edges() { + let s = sample(); + assert_eq!(g(&s).v(&["B"]).in_().to_vec(), vec!["A".to_string()]); + } + + #[test] + fn out_e_filters_by_label() { + let s = sample(); + // g.V("A").out("knows") = [B]; out("member-of") = [family:00000a] + assert_eq!(g(&s).v(&["A"]).out_e("knows").to_vec(), vec!["B"]); + assert_eq!( + g(&s).v(&["A"]).out_e("member-of").to_vec(), + vec!["family:00000a"] + ); + } + + #[test] + fn two_hop_traversal() { + let s = sample(); + // g.V("A").out("knows").out("knows") = [C] + assert_eq!( + g(&s).v(&["A"]).out_e("knows").out_e("knows").to_vec(), + vec!["C"] + ); + } + + #[test] + fn values_kind_projects_node_property() { + let s = sample(); + // A's "member-of" neighbour is the family hub → kind "Family". 
+ assert_eq!(g(&s).v(&["A"]).out_e("member-of").values_kind(), vec!["Family"]); + } + + #[test] + fn out_preserves_bag_multiplicity() { + // A→B and C→B: from {A,C}, out() reaches B TWICE (bag semantics); + // dedup() collapses to one. No implicit dedup (codex P2, PR #560). + let s = GraphSnapshot { + nodes: vec![node("A", "X"), node("B", "X"), node("C", "X")], + edges: vec![edge("A", "B", "r"), edge("C", "B", "r")], + inferences: vec![], + contradictions: vec![], + timestamp: 0, + }; + assert_eq!(g(&s).v(&["A", "C"]).out().count(), 2); + assert_eq!(g(&s).v(&["A", "C"]).out().dedup().count(), 1); + } + + #[test] + fn unknown_label_yields_empty() { + let s = sample(); + assert_eq!(g(&s).v(&["A"]).out_e("nope").count(), 0); + } + + #[test] + fn v_with_no_seed_is_all_vertices() { + let s = sample(); + assert_eq!(g(&s).v(&[]).count(), 4); + } +} diff --git a/crates/lance-graph-callcenter/src/graph_table.rs b/crates/lance-graph-callcenter/src/graph_table.rs new file mode 100644 index 00000000..cabf0499 --- /dev/null +++ b/crates/lance-graph-callcenter/src/graph_table.rs @@ -0,0 +1,205 @@ +//! `graph_table` — DataFusion node/edge adapter over a [`GraphSnapshot`]. +//! +//! The DataFusion path for the OSINT/Gotham + FMA graph (contrast: the pure-Rust +//! Gremlin/SurrealQL traversal in [`crate::graph_gremlin`]). Projects the +//! `GraphSnapshot` that [`lance_graph_contract::soa_graph`] builds from the +//! 32-byte node head into two MemTable-backed [`TableProvider`]s — `nodes` and +//! `edges` — so a consumer (q2 cockpit) can run Cypher→SQL / GQL / SQL directly: +//! +//! ```sql +//! SELECT n.kind, count(*) FROM nodes n GROUP BY n.kind; +//! SELECT e.label, count(*) FROM edges e GROUP BY e.label; +//! SELECT * FROM edges WHERE source = '...' ; -- one-hop, the SQL "out()" +//! ``` +//! +//! Round-1 is MemTable-backed (mirrors `transcode::ontology_table`); a +//! Lance-dataset-backed scan is the next round. Feature-gated on `query-lite`. 
+ +use std::sync::Arc; + +use arrow::array::{Float32Array, StringArray}; +use arrow::datatypes::{DataType, Field, Schema, SchemaRef}; +use arrow::record_batch::RecordBatch; +use datafusion::datasource::{MemTable, TableProvider}; +use datafusion::error::Result as DfResult; + +use lance_graph_contract::graph_render::GraphSnapshot; + +/// Arrow schema of the `nodes` table: `id, label, kind, confidence`. +pub fn nodes_schema() -> SchemaRef { + Arc::new(Schema::new(vec![ + Field::new("id", DataType::Utf8, false), + Field::new("label", DataType::Utf8, false), + Field::new("kind", DataType::Utf8, false), + Field::new("confidence", DataType::Float32, false), + ])) +} + +/// Arrow schema of the `edges` table: `source, target, label, frequency, confidence`. +pub fn edges_schema() -> SchemaRef { + Arc::new(Schema::new(vec![ + Field::new("source", DataType::Utf8, false), + Field::new("target", DataType::Utf8, false), + Field::new("label", DataType::Utf8, false), + Field::new("frequency", DataType::Float32, false), + Field::new("confidence", DataType::Float32, false), + ])) +} + +/// One `RecordBatch` holding all snapshot nodes. +pub fn nodes_batch(snap: &GraphSnapshot) -> Result<RecordBatch, arrow::error::ArrowError> { + let ids: Vec<&str> = snap.nodes.iter().map(|n| n.id.as_str()).collect(); + let labels: Vec<&str> = snap.nodes.iter().map(|n| n.label.as_str()).collect(); + let kinds: Vec<&str> = snap.nodes.iter().map(|n| n.kind.as_str()).collect(); + let confs: Vec<f32> = snap.nodes.iter().map(|n| n.confidence).collect(); + RecordBatch::try_new( + nodes_schema(), + vec![ + Arc::new(StringArray::from(ids)), + Arc::new(StringArray::from(labels)), + Arc::new(StringArray::from(kinds)), + Arc::new(Float32Array::from(confs)), + ], + ) +} + +/// One `RecordBatch` holding all snapshot edges.
+pub fn edges_batch(snap: &GraphSnapshot) -> Result<RecordBatch, arrow::error::ArrowError> { + let src: Vec<&str> = snap.edges.iter().map(|e| e.source.as_str()).collect(); + let tgt: Vec<&str> = snap.edges.iter().map(|e| e.target.as_str()).collect(); + let lbl: Vec<&str> = snap.edges.iter().map(|e| e.label.as_str()).collect(); + let freq: Vec<f32> = snap.edges.iter().map(|e| e.frequency).collect(); + let conf: Vec<f32> = snap.edges.iter().map(|e| e.confidence).collect(); + RecordBatch::try_new( + edges_schema(), + vec![ + Arc::new(StringArray::from(src)), + Arc::new(StringArray::from(tgt)), + Arc::new(StringArray::from(lbl)), + Arc::new(Float32Array::from(freq)), + Arc::new(Float32Array::from(conf)), + ], + ) +} + +/// MemTable-backed `TableProvider` for the snapshot nodes. +pub fn nodes_table(snap: &GraphSnapshot) -> DfResult<MemTable> { + MemTable::try_new(nodes_schema(), vec![vec![nodes_batch(snap)?]]) +} + +/// MemTable-backed `TableProvider` for the snapshot edges. +pub fn edges_table(snap: &GraphSnapshot) -> DfResult<MemTable> { + MemTable::try_new(edges_schema(), vec![vec![edges_batch(snap)?]]) +} + +/// The `(nodes, edges)` providers as trait objects, ready to register. +pub fn graph_tables( + snap: &GraphSnapshot, +) -> DfResult<(Arc<dyn TableProvider>, Arc<dyn TableProvider>)> { + Ok(( + Arc::new(nodes_table(snap)?), + Arc::new(edges_table(snap)?), + )) +} + +/// Register `nodes` + `edges` into a DataFusion `SessionContext`, so a consumer +/// can `ctx.sql("SELECT * FROM nodes")` / `edges`. Requires the full `query` +/// feature (a `SessionContext` is the executable surface).
+#[cfg(feature = "query")] +pub fn register_graph( + ctx: &datafusion::prelude::SessionContext, + snap: &GraphSnapshot, +) -> DfResult<()> { + ctx.register_table("nodes", Arc::new(nodes_table(snap)?))?; + ctx.register_table("edges", Arc::new(edges_table(snap)?))?; + Ok(()) +} + +#[cfg(test)] +mod tests { + use super::*; + use lance_graph_contract::canonical_node::{EdgeBlock, NodeRow}; + use lance_graph_contract::soa_graph::{project_snapshot, OSINT_GOTHAM}; + use lance_graph_contract::NodeGuid; + + /// Two OSINT members in families 0xA, 0xB; the family-0xA member carries an + /// out-of-family adapter byte 0x0B → family 0xB. project_snapshot → + /// GraphSnapshot → arrow tables. End-to-end head → DataFusion. + fn sample_snapshot() -> GraphSnapshot { + let mut a_edges = EdgeBlock::default(); + a_edges.out_family[0] = 0x0B; + let rows = [ + NodeRow { + key: NodeGuid::new(NodeGuid::CLASSID_OSINT, 1, 0, 0, 0xA, 1), + edges: a_edges, + value: [0u8; 480], + }, + NodeRow { + key: NodeGuid::new(NodeGuid::CLASSID_OSINT, 2, 0, 0, 0xB, 1), + edges: EdgeBlock::default(), + value: [0u8; 480], + }, + ]; + project_snapshot(&rows, &OSINT_GOTHAM) + } + + #[test] + fn node_and_edge_batches_match_schema_and_counts() { + let snap = sample_snapshot(); + let nb = nodes_batch(&snap).unwrap(); + let eb = edges_batch(&snap).unwrap(); + // 2 members + 2 family nodes = 4 node rows. + assert_eq!(nb.num_rows(), snap.nodes.len()); + assert_eq!(nb.num_rows(), 4); + assert_eq!(nb.schema().field(0).name(), "id"); + // 2 member-of edges + 1 out-of-family ("references") edge = 3. 
+ assert_eq!(eb.num_rows(), snap.edges.len()); + assert_eq!(eb.num_rows(), 3); + assert_eq!(eb.schema().field(0).name(), "source"); + } + + #[test] + fn tables_build_as_providers() { + let snap = sample_snapshot(); + let (nodes, edges) = graph_tables(&snap).unwrap(); + use datafusion::datasource::TableType; + assert_eq!(nodes.table_type(), TableType::Base); + assert_eq!(edges.table_type(), TableType::Base); + assert_eq!(nodes.schema().field(2).name(), "kind"); + } + + #[cfg(feature = "query")] + #[tokio::test] + async fn sql_over_registered_graph() { + use datafusion::prelude::SessionContext; + let snap = sample_snapshot(); + let ctx = SessionContext::new(); + register_graph(&ctx, &snap).unwrap(); + + // SQL "out()": one-hop from the family-0xA member to family:00000b. + let member_a = snap + .nodes + .iter() + .find(|n| n.kind == "OSINT/Gotham" && n.props.iter().any(|(k, v)| k == "family" && v == "00000a")) + .unwrap() + .id + .clone(); + let df = ctx + .sql(&format!( + "SELECT target FROM edges WHERE source = '{member_a}' AND label = 'references'" + )) + .await + .unwrap(); + let batches = df.collect().await.unwrap(); + let rows: usize = batches.iter().map(|b| b.num_rows()).sum(); + assert_eq!(rows, 1, "the out-of-family adapter edge is queryable via SQL"); + + // count(*) over nodes (2 OSINT members + 2 family nodes) returns one aggregate row. + let df = ctx + .sql("SELECT count(*) AS n FROM nodes") + .await + .unwrap(); + let batches = df.collect().await.unwrap(); + assert_eq!(batches[0].num_rows(), 1); + } +} diff --git a/crates/lance-graph-callcenter/src/lib.rs b/crates/lance-graph-callcenter/src/lib.rs index 96e518d2..27325bc8 100644 --- a/crates/lance-graph-callcenter/src/lib.rs +++ b/crates/lance-graph-callcenter/src/lib.rs @@ -130,6 +130,15 @@ pub mod policy; // MySQL ↔ DataFusion ↔ SPO reconciler. See `transcode/mod.rs`.
pub mod transcode; +// Graph adapter (OSINT/Gotham + FMA): the SoA `GraphSnapshot` (built from the +// 32-byte node head by `lance_graph_contract::soa_graph`) projected two ways — +// `graph_table` exposes `nodes` + `edges` as DataFusion `TableProvider`s (the +// SQL/Cypher→SQL path, `query-lite`); `graph_gremlin` is the pure-Rust +// Gremlin/SurrealQL traversal kernel (always-on, zero extra deps). +pub mod graph_gremlin; +#[cfg(feature = "query-lite")] +pub mod graph_table; + // PR-D4 (pr-d4-family-hydration.md) — boot-time TTL hydration of FAMILY_TABLE. // `parse_family_registry()` (OQ-1 option c) + `FAMILY_TABLE` OnceLock + // `try_resolve()` + backward-compat shim for `super_domain_for_family()`. diff --git a/crates/lance-graph-contract/Cargo.toml b/crates/lance-graph-contract/Cargo.toml index 842507c8..58f985dc 100644 --- a/crates/lance-graph-contract/Cargo.toml +++ b/crates/lance-graph-contract/Cargo.toml @@ -35,3 +35,10 @@ harness = false # cross-PR bridge between PR #278 audit log + PR #279 grammar trajectory. # No-op alias today; concrete impl lands once the bridge PR ships. trajectory-audit = [] + +# guid-v2-tail (D-GV2-1, plan: guid-v2-tail-per-family-codebook-v1.md) — the v2 +# basin tail leaf(u16)·family(u16)·identity(u16) replacing v1 family(u24)· +# identity(u24), leaf as the 4th HHTL tier. Default OFF; additive v2 accessors +# (`leaf`/`*_v2`) coexist with v1 until cutover (D-GV2-5). Layout reclaim → +# I-LEGACY-API-FEATURE-GATED (field-isolation matrix + version gate). +guid-v2-tail = [] diff --git a/crates/lance-graph-contract/examples/aiwar_family_poc.rs b/crates/lance-graph-contract/examples/aiwar_family_poc.rs new file mode 100644 index 00000000..c052292a --- /dev/null +++ b/crates/lance-graph-contract/examples/aiwar_family_poc.rs @@ -0,0 +1,70 @@ +//! Run the aiwar OSINT family-node POC on the real graph. +//! +//! ```text +//! cargo run -p lance-graph-contract --example aiwar_family_poc -- [path] +//! ``` +//! 
Default path: `/tmp/aiwar_graph.json` — download from +//! <https://raw.githubusercontent.com/AdaWorldAPI/aiwar-neo4j-harvest/main/data/aiwar_graph.json>. +//! +//! Prints the OSINT family **class view** + the projected Gotham `GraphSnapshot` +//! (family nodes = entity categories, members hang off them). q2's cockpit wires +//! the same snapshot to the Quadro-2 visual. + +use lance_graph_contract::aiwar::{aiwar_node_rows, AiwarClassView}; +use lance_graph_contract::literal_graph::ingest_aiwar_json; +use lance_graph_contract::soa_graph::{project_snapshot, OSINT_GOTHAM}; + +fn main() { + let path = std::env::args() + .nth(1) + .unwrap_or_else(|| "/tmp/aiwar_graph.json".to_string()); + let json = match std::fs::read_to_string(&path) { + Ok(s) => s, + Err(e) => { + eprintln!("could not read {path}: {e}"); + eprintln!( + "download: https://raw.githubusercontent.com/AdaWorldAPI/aiwar-neo4j-harvest/main/data/aiwar_graph.json" + ); + std::process::exit(1); + } + }; + let g = ingest_aiwar_json(&json).expect("parse aiwar_graph.json"); + let view = AiwarClassView::from_graph(&g); + let rows = aiwar_node_rows(&g); + let snap = project_snapshot(&rows, &OSINT_GOTHAM); + + let family_nodes = snap + .nodes + .iter() + .filter(|n| n.kind == "Family" || n.kind == "Anchor") + .count(); + let members = snap.nodes.len() - family_nodes; + + println!("aiwar OSINT family-node POC ({path}):"); + println!( + " ingested {} entities / {} edges; {} categories (the class view)", + g.node_count(), + g.edge_count(), + view.len() + ); + println!( + " projected GraphSnapshot: {} nodes ({members} members + {family_nodes} family hubs), {} edges", + snap.nodes.len(), + snap.edges.len() + ); + println!(" family nodes (category ⇒ id, members):"); + for (cat, fam) in view.categories() { + let count = snap + .nodes + .iter() + .find(|n| n.id == format!("family:{fam:06x}")) + .and_then(|n| { + n.props + .iter() + .find(|(k, _)| k == "members") + .map(|(_, v)| v.clone()) + }) + .unwrap_or_default(); + println!(" {fam:>3} {cat:<30} {count} members"); + } +} diff --git
a/crates/lance-graph-contract/src/aiwar.rs b/crates/lance-graph-contract/src/aiwar.rs new file mode 100644 index 00000000..f11bb078 --- /dev/null +++ b/crates/lance-graph-contract/src/aiwar.rs @@ -0,0 +1,193 @@ +//! aiwar OSINT POC — the `AdaWorldAPI/aiwar-neo4j-harvest` graph as family nodes. +//! +//! Ingest `data/aiwar_graph.json` (via [`ingest_aiwar_json`](crate::literal_graph::ingest_aiwar_json)) +//! → map each entity to a canonical OSINT [`NodeRow`] (classid +//! [`NodeGuid::CLASSID_OSINT`], `family` = its category) → +//! [`project_snapshot`](crate::soa_graph::project_snapshot) yields a Gotham graph +//! whose **family nodes ARE the categories** (System / Stakeholder / Person / +//! Civic / Historical) — the stable hubs the entities hang off (extreme render +//! stability, `E-ANCHOR-IS-A-HEAD-FIELD`). +//! +//! This is the OSINT domain's family **class view**: a `category ⇒ family-id` +//! map, head-only, zero value decode. q2's cockpit wires the resulting +//! `GraphSnapshot` to the Quadro-2 visual. Run it on the real graph with +//! `cargo run -p lance-graph-contract --example aiwar_family_poc`. + +use crate::canonical_node::{EdgeBlock, NodeGuid, NodeRow}; +use crate::literal_graph::LiteralGraph; +use std::collections::{BTreeMap, BTreeSet}; + +/// The OSINT family **class view**: each distinct entity category (the +/// [`LiteralNode`](crate::literal_graph::LiteralNode) label) mapped to a +/// deterministic 1-based family id (sorted by label). The family nodes in the +/// projection are exactly these categories — the OSINT "classes". +#[derive(Debug, Clone, Default)] +pub struct AiwarClassView { + families: BTreeMap<String, u32>, +} + +impl AiwarClassView { + /// Build from an ingested aiwar graph: distinct node labels → family ids + /// `1..=N` (sorted, deterministic; `0` is reserved for the default basin).
+ pub fn from_graph(graph: &LiteralGraph) -> Self { + let mut labels: BTreeSet<String> = BTreeSet::new(); + for id in graph.all_node_ids() { + if let Some(n) = graph.node(&id) { + labels.insert(n.label.clone()); + } + } + let families = labels + .into_iter() + .enumerate() + .map(|(i, l)| (l, (i as u32) + 1)) + .collect(); + Self { families } + } + + /// Family id for a category label. + pub fn family_of(&self, label: &str) -> Option<u32> { + self.families.get(label).copied() + } + + /// Number of categories (= number of family nodes). + pub fn len(&self) -> usize { + self.families.len() + } + + /// Whether the view holds no categories. + pub fn is_empty(&self) -> bool { + self.families.is_empty() + } + + /// The `(category, family_id)` pairs, sorted by category. + pub fn categories(&self) -> impl Iterator<Item = (&str, u32)> { + self.families.iter().map(|(k, v)| (k.as_str(), *v)) + } +} + +/// Map an ingested aiwar graph to canonical OSINT node rows: classid +/// [`NodeGuid::CLASSID_OSINT`], `family` = the entity's category (via +/// [`AiwarClassView`]), `identity` = the node's position. Each out-edge becomes a +/// **family-node adapter** byte (the target category's `family & 0xFF`), so the +/// projected graph links category hubs (Nation → System, etc.). Head-only — the +/// 480-byte value slab stays zero.
+pub fn aiwar_node_rows(graph: &LiteralGraph) -> Vec<NodeRow> { + let view = AiwarClassView::from_graph(graph); + let ids = graph.all_node_ids(); + let fam_of = |id: &str| -> Option<u32> { graph.node(id).and_then(|n| view.family_of(&n.label)) }; + ids.iter() + .enumerate() + .map(|(i, id)| { + let fam = fam_of(id).unwrap_or(0); + // distinct target-category family low bytes → 16 adapter slots + let mut slots: Vec<u8> = graph + .edges_from(id) + .iter() + .filter_map(|e| fam_of(&e.target)) + .filter(|tf| *tf != fam) + .map(|tf| (tf & 0xFF) as u8) + .collect(); + slots.sort_unstable(); + slots.dedup(); + // aiwar entities connect ACROSS categories (Nation→System, …); every + // adapter here is cross-family (built from `tf != fam`), so they go in + // the 4 OUT-of-family slots (labeled `references`), never in-family + // (`linked`) — otherwise the label would flip with fan-out count and + // `references` queries would miss them (codex P2, PR #560). Cap at the + // 4 canonical out-of-family slots. + let mut edges = EdgeBlock::default(); + for (k, &b) in slots.iter().take(4).enumerate() { + edges.out_family[k] = b; + } + NodeRow { + key: NodeGuid::new( + NodeGuid::CLASSID_OSINT, + 0, + 0, + 0, + fam & 0x00FF_FFFF, + (i as u32) & 0x00FF_FFFF, + ), + edges, + value: [0u8; 480], + } + }) + .collect() +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::literal_graph::ingest_aiwar_json; + use crate::soa_graph::{project_snapshot, OSINT_GOTHAM}; + + // A representative slice of the aiwar_graph.json shape (categories via the + // `type` field; edges across categories). Deterministic, CI-safe.
+ const SAMPLE: &str = r#"{ + "N_Systems": [ + {"id": "Lavender", "name": "Lavender", "type": "PredictiveAnalytics"}, + {"id": "Gospel", "name": "Gospel", "type": "PredictiveAnalytics"} + ], + "N_Stakeholders": [ + {"id": "Israel", "name": "Israel", "type": "Nation"}, + {"id": "UnitNSO", "name": "NSO Group", "type": "TechCompany"} + ], + "E_connection": [ + {"source": "UnitNSO", "target": "Israel", "label": "based in"} + ], + "E_isDevelopedBy": [ + {"source": "Israel", "target": "Lavender", "label": "developed"}, + {"source": "Israel", "target": "Gospel", "label": "developed"} + ] + }"#; + + #[test] + fn class_view_maps_categories_to_families() { + let g = ingest_aiwar_json(SAMPLE).unwrap(); + let view = AiwarClassView::from_graph(&g); + // categories from the `type` field: PredictiveAnalytics, Nation, TechCompany + assert_eq!(view.len(), 3); + assert!(view.family_of("Nation").is_some()); + assert!(view.categories().all(|(_, f)| f >= 1)); + } + + #[test] + fn projects_to_family_node_graph() { + let g = ingest_aiwar_json(SAMPLE).unwrap(); + let view = AiwarClassView::from_graph(&g); + let rows = aiwar_node_rows(&g); + assert_eq!(rows.len(), g.node_count(), "one row per entity"); + + let snap = project_snapshot(&rows, &OSINT_GOTHAM); + // one family node per category (kind Family/Anchor) + let family_nodes = snap + .nodes + .iter() + .filter(|n| n.kind == "Family" || n.kind == "Anchor") + .count(); + assert_eq!(family_nodes, view.len()); + // every entity → member-of edge to its category family hub + let member_of = snap.edges.iter().filter(|e| e.label == "member-of").count(); + assert_eq!(member_of, g.node_count()); + // Israel (Nation) → the PredictiveAnalytics family hub: a cross-CATEGORY + // edge, so it carries the out-of-family `references` label — never the + // in-family `linked` (aiwar edges are all cross-category). 
+ assert!(snap + .edges + .iter() + .any(|e| e.label == "references" && e.target.starts_with("family:"))); + assert!( + !snap.edges.iter().any(|e| e.label == "linked"), + "aiwar edges are all cross-category ⇒ none are in-family `linked`" + ); + } + + #[test] + fn rows_are_osint_class_and_head_only() { + let g = ingest_aiwar_json(SAMPLE).unwrap(); + for row in aiwar_node_rows(&g) { + assert_eq!(row.key.classid(), NodeGuid::CLASSID_OSINT); + assert_eq!(row.value, [0u8; 480], "head-only: value slab stays zero"); + } + } +} diff --git a/crates/lance-graph-contract/src/canonical_node.rs b/crates/lance-graph-contract/src/canonical_node.rs index 67751c85..15df9b8c 100644 --- a/crates/lance-graph-contract/src/canonical_node.rs +++ b/crates/lance-graph-contract/src/canonical_node.rs @@ -204,6 +204,131 @@ impl NodeGuid { } } +// ── GUID v2 tail (leaf·family·identity, 3×u16) — D-GV2-1, feature-gated ──────── +// +// The v2 basin tail repartitions bytes 10..16: leaf(u16) 10..12 (the 4th HHTL +// tier), family(u16) 12..14 (the basin / episodic hub), identity(u16) 14..16 +// (the instance). Bytes 0..10 (classid·HEEL·HIP·TWIG) are IDENTICAL to v1. +// Additive and NON-breaking: v1 `new`/`family`/`identity` are untouched; these +// v2 accessors coexist behind `guid-v2-tail` until cutover (D-GV2-5). Per +// I-LEGACY-API-FEATURE-GATED the v2 names are distinct (`leaf`/`*_v2`), so no +// function silently changes semantics, and `GUID_TAIL_LAYOUT_VERSION_V2` is the +// version gate marking a v2-tail packet. +#[cfg(feature = "guid-v2-tail")] +impl NodeGuid { + /// Construct a v2-tail GUID: `classid·HEEL·HIP·TWIG` identical to v1, then the + /// 3×u16 basin tail `leaf·family·identity`. Each tail field is a full `u16` — + /// no 24-bit truncation footgun (the point of v2). 
+ #[allow(clippy::too_many_arguments)] + pub const fn new_v2( + classid: u32, + heel: u16, + hip: u16, + twig: u16, + leaf: u16, + family: u16, + identity: u16, + ) -> Self { + let c = classid.to_le_bytes(); + let h = heel.to_le_bytes(); + let p = hip.to_le_bytes(); + let t = twig.to_le_bytes(); + let l = leaf.to_le_bytes(); + let f = family.to_le_bytes(); + let i = identity.to_le_bytes(); + Self([ + c[0], c[1], c[2], c[3], // 0..4 classid + h[0], h[1], // 4..6 HEEL + p[0], p[1], // 6..8 HIP + t[0], t[1], // 8..10 TWIG + l[0], l[1], // 10..12 leaf (4th HHTL tier) + f[0], f[1], // 12..14 family (basin / episodic hub) + i[0], i[1], // 14..16 identity (instance) + ]) + } + + /// v2 `leaf` — bytes 10..12, the 4th HHTL routing tier (cascade terminal). + #[inline] + pub const fn leaf(&self) -> u16 { + u16::from_le_bytes([self.0[10], self.0[11]]) + } + + /// v2 `family` — bytes 12..14, the basin / episodic-hub tier (the codebook + /// selector). Distinct from v1 [`family`](NodeGuid::family) (u24 at 10..13): + /// different name, different bytes — no silent semantic swap. + #[inline] + pub const fn family_v2(&self) -> u16 { + u16::from_le_bytes([self.0[12], self.0[13]]) + } + + /// v2 `identity` — bytes 14..16, the instance tier (full `u16`). + #[inline] + pub const fn identity_v2(&self) -> u16 { + u16::from_le_bytes([self.0[14], self.0[15]]) + } + + /// v2 basin-local key: trailing 4 bytes (family ++ identity), zero-padded to + /// `u32` — the discriminator once the HHTL prefix (incl. leaf) is bound. + #[inline] + pub const fn local_key_v2(&self) -> u32 { + u32::from_le_bytes([self.0[12], self.0[13], self.0[14], self.0[15]]) + } + + /// v2 decode — every tier (`classid·HEEL·HIP·TWIG·leaf·family·identity`) as a + /// native integer. The "read the GUID as a GUID" surface for v2. 
+ #[inline] + pub const fn decode_v2(&self) -> GuidPartsV2 { + GuidPartsV2 { + classid: self.classid(), + heel: self.heel(), + hip: self.hip(), + twig: self.twig(), + leaf: self.leaf(), + family: self.family_v2(), + identity: self.identity_v2(), + } + } + + /// v2 self-describing hex: `classid-heel-hip-twig-leaf-family-identity`, + /// uniform 4-hex groups (classid as 8) — the v2 Display shape. + pub fn to_hex_v2(&self) -> String { + let p = self.decode_v2(); + format!( + "{:08x}-{:04x}-{:04x}-{:04x}-{:04x}-{:04x}-{:04x}", + p.classid, p.heel, p.hip, p.twig, p.leaf, p.family, p.identity + ) + } +} + +/// The v2-tail GUID decoded — `classid · HEEL · HIP · TWIG · leaf · family · +/// identity`, every tier a native integer (no `u24`). The v2 counterpart of +/// [`GuidParts`]. (D-GV2-1; feature `guid-v2-tail`.) +#[cfg(feature = "guid-v2-tail")] +#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] +pub struct GuidPartsV2 { + /// 0..4 — prefix-routable class id. + pub classid: u32, + /// 4..6 — HEEL (HHT cascade tier 1). + pub heel: u16, + /// 6..8 — HIP (HHT cascade tier 2). + pub hip: u16, + /// 8..10 — TWIG (HHT cascade tier 3). + pub twig: u16, + /// 10..12 — leaf, the 4th HHTL tier. + pub leaf: u16, + /// 12..14 — family, the basin / episodic hub. + pub family: u16, + /// 14..16 — identity, the instance. + pub identity: u16, +} + +/// v2 layout-version marker: a v2-tail packet is layout version 2. A v1 reader +/// MUST refuse a v2 blob (and vice-versa) — the version gate per +/// `I-LEGACY-API-FEATURE-GATED`. Wired into the `SoaEnvelope` version at cutover +/// (D-GV2-5). +#[cfg(feature = "guid-v2-tail")] +pub const GUID_TAIL_LAYOUT_VERSION_V2: u16 = 2; + /// The whole canonical key decoded in one shot — `classid · HEEL · HIP · TWIG · /// family · identity`, each as its native LE-decoded integer. 
/// @@ -1284,4 +1409,65 @@ mod tests { ); assert!(osint.is_layout_preserving() && fma.is_layout_preserving()); } + + // ── GUID v2 tail (D-GV2-1) — field-isolation matrix + coexistence ───────── + + #[cfg(feature = "guid-v2-tail")] + #[test] + fn v2_field_isolation_matrix() { + // Each tier carries a distinct value; every accessor reads back exactly + // its own, and varying ONE tier changes ONLY that accessor (the + // mandatory layout-bit-boundary test for a reclaim, I-LEGACY). + let base = NodeGuid::new_v2(0x1111_2222, 0x3333, 0x4444, 0x5555, 0x6666, 0x7777, 0x8888); + assert_eq!(base.classid(), 0x1111_2222); + assert_eq!(base.heel(), 0x3333); + assert_eq!(base.hip(), 0x4444); + assert_eq!(base.twig(), 0x5555); + assert_eq!(base.leaf(), 0x6666); + assert_eq!(base.family_v2(), 0x7777); + assert_eq!(base.identity_v2(), 0x8888); + + // vary ONLY leaf + let l = NodeGuid::new_v2(0x1111_2222, 0x3333, 0x4444, 0x5555, 0xAAAA, 0x7777, 0x8888); + assert_eq!(l.leaf(), 0xAAAA); + assert_eq!(l.family_v2(), base.family_v2()); + assert_eq!(l.identity_v2(), base.identity_v2()); + assert_eq!(l.twig(), base.twig()); + // vary ONLY family + let f = NodeGuid::new_v2(0x1111_2222, 0x3333, 0x4444, 0x5555, 0x6666, 0xBBBB, 0x8888); + assert_eq!(f.family_v2(), 0xBBBB); + assert_eq!(f.leaf(), base.leaf()); + assert_eq!(f.identity_v2(), base.identity_v2()); + // vary ONLY identity + let i = NodeGuid::new_v2(0x1111_2222, 0x3333, 0x4444, 0x5555, 0x6666, 0x7777, 0xCCCC); + assert_eq!(i.identity_v2(), 0xCCCC); + assert_eq!(i.leaf(), base.leaf()); + assert_eq!(i.family_v2(), base.family_v2()); + + // local_key_v2 = family ++ identity (LE) + assert_eq!(base.local_key_v2(), 0x8888_7777); + // decode_v2 round-trips the tail + let d = base.decode_v2(); + assert_eq!((d.leaf, d.family, d.identity), (0x6666, 0x7777, 0x8888)); + // Display is uniform 4-hex groups (classid 8). 
+ assert_eq!(base.to_hex_v2(), "11112222-3333-4444-5555-6666-7777-8888"); + } + + #[cfg(feature = "guid-v2-tail")] + #[test] + fn v1_and_v2_share_prefix_differ_in_tail() { + // v1 and v2 agree on the prefix (classid·HEEL·HIP·TWIG)… + let v1 = NodeGuid::new(0xDEAD_BEEF, 0x1111, 0x2222, 0x3333, 0x00_00AB, 0x00_00CD); + let v2 = NodeGuid::new_v2(0xDEAD_BEEF, 0x1111, 0x2222, 0x3333, 0, 0xABCD, 0); + assert_eq!(v1.classid(), v2.classid()); + assert_eq!(v1.heel(), v2.heel()); + assert_eq!(v1.hip(), v2.hip()); + assert_eq!(v1.twig(), v2.twig()); + // …but the tail bytes are interpreted differently — which is exactly why + // the version gate is mandatory before reading a tail. + assert_eq!(GUID_TAIL_LAYOUT_VERSION_V2, 2); + // v1 accessors remain UNTOUCHED under the feature (additive, non-breaking). + assert_eq!(v1.family(), 0x00_00AB); + assert_eq!(v1.identity(), 0x00_00CD); + } } diff --git a/crates/lance-graph-contract/src/hhtl.rs b/crates/lance-graph-contract/src/hhtl.rs index 5176da88..8b0142d6 100644 --- a/crates/lance-graph-contract/src/hhtl.rs +++ b/crates/lance-graph-contract/src/hhtl.rs @@ -312,6 +312,27 @@ impl NiblePath { Self::from_packed(path, MAX_DEPTH) } + /// v2 GUID→path lowering (D-GV2-1, feature `guid-v2-tail`): the HHTL path is + /// `HEEL·HIP·TWIG·leaf` — 4 tiers × 4 nibbles = 16 nibbles = a full `u64` + /// NiblePath. `leaf` (the v2 4th tier) IS part of the routing path; `classid` + /// is the separate codebook prefix (not folded in), and `family`/`identity` + /// are the basin tail (NOT in the path). Two GUIDs differing only in + /// family/identity therefore share a path; differing in any HHT tier (incl. + /// `leaf`) do not — the property v2 hop-distance relies on. 
+ #[cfg(feature = "guid-v2-tail")] + #[must_use] + pub const fn from_guid_prefix_v2(guid: &crate::canonical_node::NodeGuid) -> Self { + let path = ((guid.heel() as u64) << 48) + | ((guid.hip() as u64) << 32) + | ((guid.twig() as u64) << 16) + | (guid.leaf() as u64); + // 16 nibbles = full depth; from_packed is always Some at MAX_DEPTH. + match Self::from_packed(path, MAX_DEPTH) { + Some(p) => p, + None => Self::EMPTY, + } + } + /// **Family hop count** — the CLAM tree distance to `other`: the number of /// edges between the two nodes through their lowest common ancestor in the /// 16ⁿ tree. `(self.depth − common) + (other.depth − common)` where `common = @@ -673,6 +694,26 @@ mod tests { assert_eq!(NiblePath::EMPTY.common_ancestor(a), None); } + #[cfg(feature = "guid-v2-tail")] + #[test] + fn from_guid_prefix_v2_includes_leaf_not_basin_tail() { + use crate::canonical_node::NodeGuid; + let g = NodeGuid::new_v2(0xDEAD_BEEF, 0x1234, 0x5678, 0x9ABC, 0xDEF0, 0, 0); + assert_eq!(NiblePath::from_guid_prefix_v2(&g).depth(), 16); + // family/identity (basin tail) do NOT affect the path + let same = NodeGuid::new_v2(0xDEAD_BEEF, 0x1234, 0x5678, 0x9ABC, 0xDEF0, 0xFFFF, 0xFFFF); + assert_eq!( + NiblePath::from_guid_prefix_v2(&g), + NiblePath::from_guid_prefix_v2(&same) + ); + // leaf IS in the path → changing it changes the path + let diff_leaf = NodeGuid::new_v2(0xDEAD_BEEF, 0x1234, 0x5678, 0x9ABC, 0x0EF0, 0, 0); + assert_ne!( + NiblePath::from_guid_prefix_v2(&g), + NiblePath::from_guid_prefix_v2(&diff_leaf) + ); + } + #[test] fn family_hop_count_is_clam_tree_distance() { let a = NiblePath::root(0x1).child(0x2).child(0x3).child(0x4); diff --git a/crates/lance-graph-contract/src/lib.rs b/crates/lance-graph-contract/src/lib.rs index 2e4bd24f..02f1d8db 100644 --- a/crates/lance-graph-contract/src/lib.rs +++ b/crates/lance-graph-contract/src/lib.rs @@ -39,6 +39,7 @@ pub mod transaction; pub mod a2a_blackboard; pub mod action; +pub mod aiwar; pub mod atoms; pub mod auth; pub mod 
callcenter; diff --git a/crates/lance-graph-contract/src/soa_graph.rs b/crates/lance-graph-contract/src/soa_graph.rs index fd7c1102..d7107bac 100644 --- a/crates/lance-graph-contract/src/soa_graph.rs +++ b/crates/lance-graph-contract/src/soa_graph.rs @@ -28,16 +28,29 @@ //! [`NiblePath::family_hop_count`] (CLAM tree distance) — the "HHTL CLAM via //! family-nodes hop count as adjacency" metric. //! -//! ## Edge resolution (the `EdgeBlock` reading) +//! ## Edge resolution — 16 × 8-bit family-node adapters //! -//! `EdgeCodecFlavor::CoarseOnly` (the read-mode both registered domains use): -//! each non-zero edge byte is a one-byte basin-local neighbour index. -//! - `in_family[k]` → the same-family member whose `identity & 0xFF` equals the -//! byte (an intra-basin adjacency edge, [`DomainSpec::in_family_edge`]). -//! - `out_family[k]` → the family node whose `family & 0xFF` equals the byte (a -//! cross-basin link to another family, [`DomainSpec::out_family_edge`]). +//! `EdgeCodecFlavor::CoarseOnly` over the canonical 16-byte [`EdgeBlock`], read +//! as **16 family-node adapter slots** (operator model, 2026-06-20): every +//! non-zero edge byte references a FAMILY (not an individual member), resolved by +//! `family & 0xFF` → the family node. The 12 in-family slots emit +//! [`DomainSpec::in_family_edge`] edges, the 4 out-of-family slots emit +//! [`DomainSpec::out_family_edge`] edges; both land on a stable family node. //! -//! Unresolved bytes are skipped (a dangling 1-byte index, never a wrong edge). +//! Why family adapters (not member-by-identity): edges to families give +//! **extreme render stability** — family nodes are fixed anchors, members attach +//! to stable hubs, the layout doesn't churn — and **huge flexibility** (a node +//! mixes in up to 16 family adjacencies). The one limitation is **mixin +//! dependency**: a referenced family must exist, else the slot is a dangling +//! adapter (skipped). 
It also dissolves the >255-member aliasing — resolution is +//! only ever family-level (256 families per low-byte space), and an ambiguous low +//! byte (two families sharing it) is skipped, never mis-routed. Member→member +//! edges and wider encodings (8×16-bit, 32×4 residue, second-hop) are deferred +//! richer flavors. +//! +//! A `classid` IS the class (granular, exact — not a 2-nibble prefix): the +//! projector includes only rows whose `classid == domain.classid`, so a +//! mixed-class board never leaks one domain's nodes into another's view. //! //! Two domains ship registered: [`OSINT_GOTHAM`] (classid //! [`NodeGuid::CLASSID_OSINT`]) and [`FMA_ANATOMY`] (classid @@ -114,28 +127,40 @@ fn hhtl_path(guid: &NodeGuid) -> NiblePath { /// edges. Touches ONLY the 32-byte head of each row (`key` + `edges`); never the /// value slab. pub fn project_snapshot(rows: &[NodeRow], domain: &DomainSpec) -> GraphSnapshot { - // family → its members as (identity_low_byte, guid) - let mut by_family: HashMap<u32, Vec<(u8, NodeGuid)>> = HashMap::new(); - // family_low_byte → a family id (first seen) for out-of-family resolution - let mut family_by_low: HashMap<u8, u32> = HashMap::new(); - for row in rows { - let g = row.key; - let fam = g.family(); - by_family - .entry(fam) - .or_default() - .push(((g.identity() & 0xFF) as u8, g)); - family_by_low.entry((fam & 0xFF) as u8).or_insert(fam); + // codex P1: a classid IS the class — project only rows of THIS domain, so a + // mixed-class board can't leak other domains' nodes/edges into the view. + let domain_rows: Vec<&NodeRow> = rows + .iter() + .filter(|r| r.key.classid() == domain.classid) + .collect(); + + // family → member count, and a COLLISION-AWARE family-low-byte → family map. + // codex P1: family ids are wider than a byte, so two can share a low byte; a duplicate + // marks the slot ambiguous (None) so an adapter byte is skipped, never + // mis-routed (the family-adapter model: edges resolve only at family level).
+ let mut by_family: HashMap<u32, usize> = HashMap::new(); + let mut family_by_low: HashMap<u8, Option<u32>> = HashMap::new(); + for row in &domain_rows { + let fam = row.key.family(); + *by_family.entry(fam).or_insert(0) += 1; + family_by_low + .entry((fam & 0xFF) as u8) + .and_modify(|e| { + if *e != Some(fam) { + *e = None; // collision ⇒ ambiguous + } + }) + .or_insert(Some(fam)); } - let mut nodes: Vec<RenderNode> = Vec::with_capacity(rows.len() + by_family.len()); + let mut nodes: Vec<RenderNode> = Vec::with_capacity(domain_rows.len() + by_family.len()); let mut edges: Vec<RenderEdge> = Vec::new(); - // One family node per distinct family (the "use family nodes" surface). - // Sorted for deterministic output regardless of HashMap iteration order. - let mut families: Vec<(&u32, &Vec<(u8, NodeGuid)>)> = by_family.iter().collect(); + // One family node per distinct family (the stable anchors). Sorted for + // deterministic output regardless of HashMap iteration order. + let mut families: Vec<(&u32, &usize)> = by_family.iter().collect(); families.sort_by_key(|(fam, _)| **fam); - for (&fam, members) in families { + for (&fam, &members) in families { let is_anchor = domain.anchor_families.contains(&fam); nodes.push(RenderNode { id: family_node_id(fam), @@ -144,14 +169,22 @@ pub fn project_snapshot(rows: &[NodeRow], domain: &DomainSpec) -> GraphSnapshot confidence: 1.0, props: vec![ ("family".to_string(), format!("{fam:06x}")), - ("members".to_string(), members.len().to_string()), + ("members".to_string(), members.to_string()), ("anchor".to_string(), is_anchor.to_string()), ], }); } - // Member nodes + their edges (all head-only). - for row in rows { + // Resolve a family-adapter byte to its (unambiguous, non-self) family node. + let resolve = |b: u8, own: u32| -> Option<String> { + match family_by_low.get(&b).copied().flatten() { + Some(fam) if fam != own => Some(family_node_id(fam)), + _ => None, + } + }; + + // Member nodes + their edges (all head-only, family-adapter resolution).
+ for row in &domain_rows { let g = row.key; let fam = g.family(); nodes.push(RenderNode { @@ -165,7 +198,7 @@ pub fn project_snapshot(rows: &[NodeRow], domain: &DomainSpec) -> GraphSnapshot ("hhtl_depth".to_string(), hhtl_path(&g).depth().to_string()), ], }); - // member → family containment + // member → own family containment edges.push(RenderEdge { source: g.to_string(), target: family_node_id(fam), @@ -174,37 +207,30 @@ pub fn project_snapshot(rows: &[NodeRow], domain: &DomainSpec) -> GraphSnapshot confidence: 1.0, inferred: false, }); + // 16 family-node adapters: 12 in-family + 4 out-of-family, each → a family. let eb = row.edges; - // in-family adjacency: byte = same-family member's identity low byte - if let Some(members) = by_family.get(&fam) { - for &b in eb.in_family.iter().filter(|&&b| b != 0) { - if let Some(&(_, target)) = - members.iter().find(|(lb, t)| *lb == b && *t != g) - { - edges.push(RenderEdge { - source: g.to_string(), - target: target.to_string(), - label: domain.in_family_edge.to_string(), - frequency: 1.0, - confidence: 1.0, - inferred: false, - }); - } + for &b in eb.in_family.iter().filter(|&&b| b != 0) { + if let Some(target) = resolve(b, fam) { + edges.push(RenderEdge { + source: g.to_string(), + target, + label: domain.in_family_edge.to_string(), + frequency: 1.0, + confidence: 1.0, + inferred: false, + }); } } - // out-of-family links: byte = target family's low byte → its family node for &b in eb.out_family.iter().filter(|&&b| b != 0) { - if let Some(&target_fam) = family_by_low.get(&b) { - if target_fam != fam { - edges.push(RenderEdge { - source: g.to_string(), - target: family_node_id(target_fam), - label: domain.out_family_edge.to_string(), - frequency: 1.0, - confidence: 1.0, - inferred: false, - }); - } + if let Some(target) = resolve(b, fam) { + edges.push(RenderEdge { + source: g.to_string(), + target, + label: domain.out_family_edge.to_string(), + frequency: 1.0, + confidence: 1.0, + inferred: false, + }); } } } @@ -244,9 
+270,14 @@ pub struct AnchorHop { /// tree walk: smaller hops ⇔ deeper shared `classid_lo·HEEL·HIP·TWIG` prefix. /// Ranking (nearest anchor) is what callers use; the absolute value is even. pub fn nearest_anchor(rows: &[NodeRow], domain: &DomainSpec) -> Vec<AnchorHop> { + // codex P1: only rank rows of THIS domain (classid IS the class). + let domain_rows: Vec<&NodeRow> = rows + .iter() + .filter(|r| r.key.classid() == domain.classid) + .collect(); // Representative HHTL path per anchor family (first member encountered). let mut anchor_paths: Vec<(u32, NiblePath)> = Vec::new(); - for row in rows { + for row in &domain_rows { let fam = row.key.family(); if domain.anchor_families.contains(&fam) && !anchor_paths.iter().any(|(f, _)| *f == fam) @@ -254,7 +285,8 @@ pub fn nearest_anchor(rows: &[NodeRow], domain: &DomainSpec) -> Vec<AnchorHop> { anchor_paths.push((fam, hhtl_path(&row.key))); } } - rows.iter() + domain_rows + .iter() .map(|row| { let g = row.key; let p = hhtl_path(&g); @@ -307,12 +339,13 @@ mod tests { } #[test] - fn project_emits_family_nodes_and_member_edges() { - // Two families (0xA, 0xB), two members each. Member 1 in family A points - // at member 2 (identity low byte 2) via in_family; member 1 also links - // out to family B (low byte 0xB) via out_family. + fn project_emits_family_nodes_and_family_adapter_edges() { + // Two families (0xA, 0xB), two members each. Member 1 in family A carries + // an in-family adapter byte 0x0B (→ family B, "linked") and an + // out-of-family adapter byte 0x0B (→ family B, "references"). Edges resolve + // to the FAMILY node, not an individual member (the 16-adapter model).
let rows = [ - node(&OSINT_GOTHAM, (1, 0, 0), 0xA, 1, &[2], &[0xB]), + node(&OSINT_GOTHAM, (1, 0, 0), 0xA, 1, &[0x0B], &[0x0B]), node(&OSINT_GOTHAM, (1, 0, 0), 0xA, 2, &[], &[]), node(&OSINT_GOTHAM, (2, 0, 0), 0xB, 1, &[], &[]), node(&OSINT_GOTHAM, (2, 0, 0), 0xB, 2, &[], &[]), @@ -325,14 +358,64 @@ mod tests { // every member has a member-of edge → 4 of them let member_of = snap.edges.iter().filter(|e| e.label == "member-of").count(); assert_eq!(member_of, 4); - // the in-family adjacency edge member1 → member2 - assert!(snap.edges.iter().any(|e| e.label == "linked" - && e.target.ends_with("000a000002"))); - // the out-of-family link member1 → family:00000b + // in-family adapter byte 0x0B → family:00000b ("linked") + assert!(snap + .edges + .iter() + .any(|e| e.label == "linked" && e.target == "family:00000b")); + // out-of-family adapter byte 0x0B → family:00000b ("references") assert!(snap .edges .iter() .any(|e| e.label == "references" && e.target == "family:00000b")); + // No edge targets an individual member (everything lands on a family node). + assert!(snap + .edges + .iter() + .all(|e| e.target.starts_with("family:"))); + } + + #[test] + fn ambiguous_family_low_byte_is_skipped_not_misrouted() { + // codex P1 #2: two families sharing low byte 0x11 (0x0111, 0x0211) are + // ambiguous; a node referencing 0x11 must NOT render an edge to either + // (skipped, never the wrong one). A node referencing the unambiguous + // 0x55 family DOES resolve. (0x00 is the empty-slot byte, so it cannot be used here.) + let rows = [ + node(&OSINT_GOTHAM, (1, 0, 0), 0x0111, 1, &[0x55], &[]), + node(&OSINT_GOTHAM, (1, 0, 0), 0x0211, 1, &[], &[]), + node(&OSINT_GOTHAM, (1, 0, 0), 0x0055, 1, &[0x11], &[]), + ]; + let snap = project_snapshot(&rows, &OSINT_GOTHAM); + // The ambiguous 0x11 adapter resolves to nothing: no "linked" edge lands + // on family 0x0111 or 0x0211.
+ let to_ambiguous = snap + .edges + .iter() + .filter(|e| e.label == "linked" && e.target != "family:000055") + .count(); + assert_eq!(to_ambiguous, 0, "ambiguous low byte must be skipped"); + // 0x55 is unambiguous → the 0x0111 member links to family:000055; every + // emitted edge still targets a real family node. + assert!(snap.edges.iter().any(|e| e.label == "linked" && e.target == "family:000055")); + assert!(snap.edges.iter().all(|e| e.target.starts_with("family:"))); + } + + #[test] + fn mixed_class_board_excludes_other_domains() { + // codex P1 #1: a board with one OSINT row + one FMA row, projected as + // OSINT, yields ONLY the OSINT member + its family — the FMA node and its + // family are excluded (classid IS the class). + let rows = [ + node(&OSINT_GOTHAM, (1, 0, 0), 0xA, 1, &[], &[]), + node(&FMA_ANATOMY, (1, 0, 0), 0xB, 1, &[], &[]), + ]; + let snap = project_snapshot(&rows, &OSINT_GOTHAM); + // 1 OSINT member + 1 family node (0xA) = 2; FMA's 0xB family absent. + assert_eq!(snap.nodes.len(), 2); + assert!(snap.nodes.iter().all(|n| n.kind != "FMA-Anatomy")); + assert!(snap.nodes.iter().any(|n| n.id == "family:00000a")); + assert!(!snap.nodes.iter().any(|n| n.id == "family:00000b")); } #[test]