Skip to content

Commit ec15e61

Browse files
committed
docs(plans): fix reasoning gaps in MTPPS plan (audit pass)
Four corrections from a logical-gap audit: - AMX claim softened: TDPBF16PS needs K=32 + VNNI pair-interleaved B operand; 16x16 u16 is a half-width tile. One build-time repack (amortized), zero per-QUERY adaptation; shape identity holds at the 16x16 f32 accumulator, not the operands. - cascade.rs relabeled: HDR search cascade is precedent machinery, NOT the Markov router; the transition-table router is D-MTP-1 work. - tile_frontier_route returns raw scores; RouteAction policy moves to W1a closure parameters so routing semantics stay upstream (ndarray=hardware rule preserved). - D-MTP-3 metric split: reconstruction-error budget gates codec fidelity; rho >= 0.99 gates only distance-surrogate use. - New OQ-MTP-4: BlockedGrid is 2-D — quadtree native, octree volume support explicitly deferred, no silent 2D-is-3D claim. Blackboard correction entry appended. https://claude.ai/code/session_0178q99yhgoVu9NPk8gmkRU8
1 parent 76f1c1f commit ec15e61

2 files changed

Lines changed: 47 additions & 11 deletions

File tree

.claude/blackboard.md

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -149,3 +149,16 @@ predict+residual probe (pattern only, NOT H.266 bitstream).
149149
[DECISION] ndarray = hardware rule preserved: Markov transition tables are
150150
opaque DATA; semantics stay in lance-graph.
151151
Plan: .claude/plans/mtpps-markov-tile-pyramid-substrate-v1.md
152+
153+
## 2026-06-10 — CORRECTION to MTPPS entry above (reasoning-gap audit)
154+
[CORRECTION] "TDPBF16PS, zero layout adaptation" was an overclaim: TDPBF16PS
155+
takes K=32 BF16 operands with B in VNNI pair-interleaved layout; 16x16 u16 is a
156+
half-width tile. Honest form: ONE build-time repack (amortized), zero per-query
157+
adaptation; shape identity holds at the 16x16 f32 accumulator.
158+
[CORRECTION] cascade.rs is HDR *search* cascade (precedent), NOT the Markov
159+
transition-table router — the router is D-MTP-1 work, not existing inventory.
160+
[CORRECTION] tile_frontier_route returns raw scores only; RouteAction
161+
(Skip/Attend/Compose/Escalate) thresholds arrive as W1a closure params —
162+
routing policy stays upstream (ndarray=hardware preserved).
163+
[ADDED] OQ-MTP-4: BlockedGrid is 2-D; quadtree native, octree volumes deferred
164+
(z-slice / BlockedGrid3 / project-at-paint — decide on demand).

.claude/plans/mtpps-markov-tile-pyramid-substrate-v1.md

Lines changed: 34 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -26,17 +26,22 @@
2626
| **SoA envelope** | `src/simd_soa.rs` (PR-X1) — `MultiLaneColumn` carriers, layout-only, re-exported via `crate::simd::*` per W1a; PLUS `blocked_grid_struct!` SoA-of-grids macro (worker B) | simd_soa.rs, grid_struct_macro.rs |
2727
| **AMX acceleration** | `src/simd_amx.rs` — AMX-TILE/INT8/BF16 **hardware-confirmed via inline asm on stable Rust 1.94** (TDPBF16PS, TDPBUSD; 256 MACs/instr); `hpc/amx_matmul.rs`, `bf16_tile_gemm.rs`, `int8_tile_gemm.rs`, `vnni_gemm.rs` | simd_amx.rs header |
2828
| **Grid↔AMX shape unity** | `AmxBf16Grid = BlockedGrid<u16, 16, 16>`, `AmxInt8Grid = BlockedGrid<u8, 16, 64>`**the AMX tile IS a grid tier alias already** | blocked_grid/aliases.rs:92,109 |
29-
| **Markov cascade** | `src/hpc/cascade.rs` — 3-stroke HDR adaptive cascade (Hamming NN + precision tiers VNNI/BF16/DeltaXor) | cascade.rs header |
29+
| **Cascade precedent (search, NOT routing)** | `src/hpc/cascade.rs` — 3-stroke HDR adaptive cascade (Hamming NN + precision tiers VNNI/BF16/DeltaXor). Precedent for tiered skip-machinery; the Markov *transition-table router* itself does NOT exist yet — it is the core of D-MTP-1, not inventory | cascade.rs header |
3030
| **Render sink** | `src/hpc/renderer.rs` — double-buffer SIMD renderer (front/back `RenderFrame`, atomic swap, F32x16 FMA; AVX-512/AVX2/**AMX**/NEON/scalar tiers; explicitly "Neo4j-style visual rendering") + `src/hpc/framebuffer.rs` — palette-indexed framebuffer ("ndarray IS the graphics card"; 4-bit nibble wire format, tier-adaptive palette) | renderer.rs, framebuffer.rs headers |
3131
| **Tile streaming envelope** | `crates/cesium/``implicit_tiling.rs`, `hlod.rs`, `sse.rs`, `tileset.rs`, `khr_gs.rs`, `spz.rs`, `to_cam_soa.rs`, `osm_pbf.rs` | crate listing |
3232
| **Splat SIMD math (queued)** | D-SPLAT-2 five primitives (`batched_cholesky_3x3`, `batched_mahalanobis`, `batched_opacity_blend`, `batched_sh_eval_l3`, `batched_se3_transform`) per `splat-native-ultrasound-simd-substrate-v1.md` | ndarray plans |
3333

3434
**The headline finding:** the "optimized SoA envelope of a stacked grid pyramid"
3535
is not hypothetical — `BlockedGrid` L1–L4 + `blocked_grid_struct!` + `simd_soa`
36-
carriers ARE that envelope. And the deepest alignment is free: **a 16×16 Markov
37-
transition tile is byte-identical in shape to an AMX hardware tile**, so the
38-
cascade's `frontier × transition` step maps onto `TDPBF16PS` (BF16 tiles) /
39-
`TDPBUSD` (u8×i8 tiles) with zero layout adaptation.
36+
carriers ARE that envelope. And the AMX alignment is strong, with one honest
37+
caveat: a 16×16 u16 grid block (512 B = 16 rows × 32 B) loads directly as a
38+
**half-width AMX tile** (max tile is 16 rows × 64 B), and `TDPBF16PS` computes a
39+
16×16 f32 accumulator from **K=32** BF16 operands with the B operand in
40+
**VNNI pair-interleaved layout**. So the transition table needs **one build-time
41+
repack** (row-pairing into VNNI order, amortized across all queries) — after
42+
which the hot path runs with zero per-query adaptation. "Same shape" is true at
43+
the accumulator (16×16 f32 out); the operand layout is a documented repack, not
44+
free.
4045

4146
## 2. What's missing (the two real gaps)
4247

@@ -60,11 +65,16 @@ AMX is an *additional* fast path behind `simd_caps()`, never the only path.
6065

6166
Four kernels (names indicative):
6267

63-
- `tile_frontier_route` — frontier vector × 16×16 transition tile → child
64-
scores + `RouteAction` (Skip/Attend/Compose/Escalate). Hot path: gather +
65-
table-lookup + popcount; AMX `TDPBF16PS` on BF16 transition tiles when
66-
available (the `AmxBf16Grid` shape, zero adaptation). This is the **Markov**
67-
step.
68+
- `tile_frontier_route` — frontier vector × 16×16 transition tile → **raw
69+
child scores only**. The Skip/Attend/Compose/Escalate decision is NOT made
70+
here: thresholds/policy arrive as **closure parameters** (the W1a
71+
closure-parameterized-primitive pattern), so the `RouteAction` semantics stay
72+
upstream where they live today (`bgz-tensor::hhtl_cache` owns the enum).
73+
Scores are hardware; routing policy is thinking — the ndarray=hardware rule
74+
holds. Hot path: gather + table-lookup + popcount; AMX `TDPBF16PS` on
75+
VNNI-repacked BF16 transition tiles when available (one build-time repack,
76+
§1). This is the **Markov** step — Markov in the plain first-order sense
77+
(level-(k+1) routing depends only on the level-k frontier and the table).
6878
- `tile_pyramid_predict` — upsample tier-k block → tier-(k+1) prediction
6979
(bilinear/nearest per payload). This is the **x266-style intra-pyramid
7080
prediction** step: predict the finer level from the coarser, so only
@@ -124,7 +134,12 @@ geo/graph/splat — through one walk); encode→decode→render round-trip;
124134
The example IS the probe (workspace convention). Run predict+residual coding
125135
over a synthetic pyramid (then a real OSM tile when D-OSM-2 lands) and report:
126136

127-
- **ρ-vs-reference ≥ 0.99** (ADR-024 contract) for the palette-coded residual.
137+
- **Reconstruction-error budget** for the residual codec path — a per-primitive
138+
error bound (max + p95, payload-appropriate units), NOT rank correlation.
139+
ρ measures ordering fidelity of palette *distances*; codec fidelity is
140+
reconstruction error. Both are reported, each gating its own claim:
141+
- **ρ-vs-reference ≥ 0.99** (ADR-024 contract) — only where the palette is used
142+
as a *distance surrogate* (similarity/routing), not as the codec-fidelity gate.
128143
- **bits/primitive vs flat encoding** — the pyramid-prediction win must be
129144
measured, not asserted (truth-architect is mandatory reviewer per Hard Rule:
130145
no performance claims without bench).
@@ -175,6 +190,14 @@ the inline-asm path stays clean; otherwise it follows.
175190
- **OQ-MTP-3:** does `tile_perturb_paint` write into `hpc::framebuffer`
176191
(palette-indexed, wire-ready) or `hpc::renderer` (RenderFrame double-buffer)?
177192
(Recommend: framebuffer for encode-parity — paint and code share the palette.)
193+
- **OQ-MTP-4:** `BlockedGrid` is **2-D** by design (PR-X3 header). Quadtree
194+
pyramids (maps, images, screen-space) are native. Octree *volumes* (3-D splat
195+
fields, the `ok:` implicit-tiling variant) are NOT directly covered — options:
196+
(a) z-sliced stack of 2-D grids per level (cheap, anisotropic), (b) a future
197+
`BlockedGrid3<T, BR, BC, BD>` (real work, own plan), (c) project-to-2-D at
198+
paint time and keep volumes upstream. v1 scopes to quadtree; the octree
199+
decision is deferred until a volumetric consumer demands it. Do NOT claim the
200+
2-D grid "is" the 3-D pyramid.
178201

179202
---
180203

0 commit comments

Comments
 (0)