|
| 1 | +// SPDX-License-Identifier: PMPL-1.0-or-later |
| 2 | + |
| 3 | += VeriSimDB — Show Me The Receipts |
| 4 | +:toc: preamble |
| 5 | +:icons: font |
| 6 | + |
| 7 | +The README makes claims. This file backs them up with code paths, honest |
| 8 | +caveats, and enough structural detail for an external reviewer to know where |
| 9 | +to look when something goes wrong. |
| 10 | + |
| 11 | +== Claim 1: "100% drift detection and repair rate across 1000 entities" |
| 12 | + |
| 13 | +[quote, README.adoc §Drift Detection Demo] |
| 14 | +____ |
| 15 | +Detection rate: 100.0% |
| 16 | +Repair rate: 100.0% |
| 17 | +Consistency rate: 100.0% |
| 18 | +____ |
| 19 | + |
| 20 | +=== How it works |
| 21 | + |
| 22 | +`demos/drift-detection/run_demo.exs` creates 1,000 octad entities, corrupts 50 |
| 23 | +of them (semantic/vector/graph divergence), then runs the full |
| 24 | +`VeriSim.DriftMonitor` → `StorageRegenerator` pipeline. The demo exits cleanly |
| 25 | +only when `VeriSim.EntityServer.get/1` returns consistent state for every |
| 26 | +entity. |
| 27 | + |
| 28 | +The detection path: `rust-core/verisim-drift/` measures divergence between |
| 29 | +modality pairs using cosine similarity (Vector ↔ Semantic) and Jaccard index |
| 30 | +(Document ↔ Semantic). Divergence scores above configurable thresholds trigger a |
| 31 | +`DriftEvent`, which `VeriSim.DriftMonitor` routes to the normalizer. |
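
As a rough illustration of the two measures named above, the sketch below computes cosine similarity and the Jaccard index using only the standard library. It is not taken from `verisim-drift`: the function names, and the rule that drift equals `1 - similarity` and is flagged when it exceeds a threshold, are assumptions made for this example.

```rust
use std::collections::HashSet;

/// Cosine similarity of two equal-length vectors.
/// Returns None for empty input, mismatched lengths, or a zero-norm vector.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Jaccard index |A ∩ B| / |A ∪ B|; two empty sets are treated as identical.
pub fn jaccard_index<'a>(a: &HashSet<&'a str>, b: &HashSet<&'a str>) -> f32 {
    let union = a.union(b).count();
    if union == 0 {
        return 1.0;
    }
    a.intersection(b).count() as f32 / union as f32
}

/// Assumed rule (not confirmed by the source): drift is 1 - similarity,
/// and a pair counts as drifted when that value exceeds the threshold.
pub fn is_drifted(similarity: f32, threshold: f32) -> bool {
    (1.0 - similarity) > threshold
}
```

Under these assumptions, a Document/Semantic pair sharing two of four distinct keywords has a Jaccard index of 0.5, which a threshold of 0.3 would flag as drifted.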
| 32 | + |
| 33 | +The repair path: `rust-core/verisim-normalizer/src/storage_regenerator.rs` |
| 34 | +(`StorageRegenerator`) identifies the most authoritative modality and |
| 35 | +regenerates drifted modalities from it — six cross-regeneration pairs (Document |
| 36 | +→ Vector, Document → Semantic, Semantic → Graph, etc.). Document → Vector |
| 37 | +regeneration hashes trigrams with FNV-1a into 384-dimensional embeddings; |
| 38 | +Document → Semantic regeneration uses keyword extraction. |
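
To make the Document → Vector step more concrete, here is a minimal sketch of hashing trigrams with FNV-1a into a fixed 384-slot vector. Only the hash function and the dimension come from the text above. Character-level trigrams, lowercasing, count accumulation, and L2 normalisation are assumptions for illustration, and `storage_regenerator.rs` may do any of these differently.

```rust
/// Embedding width quoted in the text above.
pub const DIM: usize = 384;

/// 64-bit FNV-1a: XOR each byte into the state, then multiply by the FNV prime.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    bytes.iter().fold(OFFSET_BASIS, |hash, &byte| {
        (hash ^ u64::from(byte)).wrapping_mul(PRIME)
    })
}

/// Feature-hash character trigrams into a DIM-length vector.
/// Assumptions: lowercased input, counts per bucket, L2 normalisation.
pub fn trigram_embedding(text: &str) -> Vec<f32> {
    let chars: Vec<char> = text.to_lowercase().chars().collect();
    let mut embedding = vec![0.0f32; DIM];
    for window in chars.windows(3) {
        let trigram: String = window.iter().collect();
        let bucket = (fnv1a_64(trigram.as_bytes()) % DIM as u64) as usize;
        embedding[bucket] += 1.0;
    }
    // Normalise so that cosine similarity reduces to a dot product.
    let norm = embedding.iter().map(|v| v * v).sum::<f32>().sqrt();
    if norm > 0.0 {
        for value in &mut embedding {
            *value /= norm;
        }
    }
    embedding
}
```

The hashing is deterministic, so regenerating a vector from the same document text yields the same embedding each time. Inputs shorter than three characters produce an all-zero vector in this sketch.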
| 39 | + |
| 40 | +=== Honest caveat |
| 41 | + |
| 42 | +The demo runs in-memory on a single node, where both detection and repair are |
| 43 | +synchronous. Production deployments with the Elixir OTP supervision tree and |
| 44 | +persistent storage (redb + Tantivy WAL) have higher latency than the demo. The |
| 45 | +100% rates apply only to the in-memory ephemeral case; network partitions and |
| 46 | +storage engine failures are handled by OTP supervision restart, not by the |
| 47 | +normalizer. |
| 48 | + |
| 49 | +=== Code path |
| 50 | + |
| 51 | +| File | What it does | |
| 52 | +|------|-------------| |
| 53 | +| `rust-core/verisim-drift/src/lib.rs` | Drift score computation across modality pairs | |
| 54 | +| `rust-core/verisim-normalizer/src/storage_regenerator.rs` | Cross-modality regeneration (68 tests) | |
| 55 | +| `elixir-orchestration/lib/verisim/drift_monitor.ex` | OTP GenServer: drift event coordinator | |
| 56 | +| `elixir-orchestration/lib/verisim/entity_server.ex` | Per-entity GenServer under DynamicSupervisor | |
| 57 | +| `demos/drift-detection/run_demo.exs` | Runnable demo: 1000 entities, 50 corrupted | |
| 58 | + |
| 59 | +== Claim 2: "Eight modalities per entity — Graph, Vector, Tensor, Semantic, Document, Temporal, Provenance, Spatial" |
| 60 | + |
| 61 | +[quote, README.adoc §The Octad] |
| 62 | +____ |
| 63 | +Each entity in VeriSimDB can have representations across eight modalities. |
| 64 | +Drift detection operates across all of them. |
| 65 | +____ |
| 66 | + |
| 67 | +=== How it works |
| 68 | + |
| 69 | +`rust-core/verisim-octad/` defines the unified `OctadStore` type. Each octad |
| 70 | +entity carries one optional value per modality. The eight Rust storage backends |
| 71 | +are independent crates: |
| 72 | + |
| 73 | +| Modality | Crate | Storage engine | |
| 74 | +|----------|-------|----------------| |
| 75 | +| Graph | `verisim-graph` | Pure Rust `SimpleGraphStore` (RDF triples + property graph) | |
| 76 | +| Vector | `verisim-vector` | HNSW in-memory similarity index | |
| 77 | +| Tensor | `verisim-tensor` | ndarray / Burn multi-dimensional arrays | |
| 78 | +| Semantic | `verisim-semantic` | CBOR proof blobs (ciborium) | |
| 79 | +| Document | `verisim-document` | Tantivy full-text (LZ4 compressed) | |
| 80 | +| Temporal | `verisim-temporal` | chrono version history + time-series | |
| 81 | +| Provenance | `verisim-provenance` | SHA-256 hash-chain origin tracking | |
| 82 | +| Spatial | `verisim-spatial` | R-tree geospatial index (radius/bounds/nearest) | |
| 83 | + |
| 84 | +`OctadBuilder` in `rust-core/verisim-octad/` provides a fluent API to populate |
| 85 | +any subset of modalities before creating an entity. Unset modalities are `None`, |
| 86 | +so they produce no phantom drift signals. |
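
The fluent-builder idea can be sketched as follows. This is a hypothetical stand-in, not the real `OctadBuilder`: the type names `OctadSketch` and `OctadSketchBuilder`, the three chosen modalities, their field types, and the `populated` helper are all assumptions for illustration.

```rust
/// Hypothetical, cut-down entity: three of the eight modalities, each optional.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct OctadSketch {
    pub document: Option<String>,
    pub vector: Option<Vec<f32>>,
    pub spatial: Option<(f64, f64)>,
}

/// Fluent builder: each setter consumes and returns the builder.
#[derive(Debug, Default)]
pub struct OctadSketchBuilder {
    octad: OctadSketch,
}

impl OctadSketchBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn document(mut self, text: impl Into<String>) -> Self {
        self.octad.document = Some(text.into());
        self
    }

    pub fn vector(mut self, values: Vec<f32>) -> Self {
        self.octad.vector = Some(values);
        self
    }

    pub fn spatial(mut self, lat: f64, lon: f64) -> Self {
        self.octad.spatial = Some((lat, lon));
        self
    }

    pub fn build(self) -> OctadSketch {
        self.octad
    }
}

impl OctadSketch {
    /// Names of the modalities that are set. A drift check that only
    /// compares populated modalities never sees the `None` ones.
    pub fn populated(&self) -> Vec<&'static str> {
        let mut names = Vec::new();
        if self.document.is_some() {
            names.push("document");
        }
        if self.vector.is_some() {
            names.push("vector");
        }
        if self.spatial.is_some() {
            names.push("spatial");
        }
        names
    }
}
```

For example, `OctadSketchBuilder::new().document("hello").spatial(51.5, -0.1).build()` leaves `vector` as `None`, so no Vector ↔ Semantic comparison would be attempted for that entity.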
| 87 | + |
| 88 | +=== Honest caveat |
| 89 | + |
| 90 | +The Tensor modality is the most experimental of the eight. The storage engine |
| 91 | +compiles and its tests pass, but real-world use cases (beyond numeric arrays) are |
| 92 | +still being refined. The README acknowledges "novel applications, details |
| 93 | +forthcoming." Do not rely on the Tensor modality for production data without |
| 94 | +reviewing the open issues in `docs/`. |
| 95 | + |
| 96 | +== Claim 3: "VCL query language with dependent types and proof certificates (VCL-UT)" |
| 97 | + |
| 98 | +[quote, README.adoc §How It Compares] |
| 99 | +____ |
| 100 | +Query language: VCL (with dependent types) |
| 101 | +Formal verification: VCL-UT (proof certificates) |
| 102 | +____ |
| 103 | + |
| 104 | +=== How it works |
| 105 | + |
| 106 | +VCL (VeriSim Consonance Language) is the native query interface — NOT SQL. |
| 107 | +The built-in Elixir VCL parser lives in `elixir-orchestration/lib/verisim/vcl/`, |
| 108 | +and `FileExecutor` executes the resulting VCL ASTs against flat-file stores. The |
| 109 | +parser handles `FETCH`, `FILTER`, `GROUP`, `FEDERATION`, and `PROOF` clauses. |
| 110 | + |
| 111 | +VCL-UT extends VCL with typed proof certificates. Eleven proof types are |
| 112 | +supported: `EXISTENCE`, `INTEGRITY`, `CONSISTENCY`, `PROVENANCE`, `FRESHNESS`, |
| 113 | +`ACCESS`, `CITATION`, `CUSTOM`, `ZKP`, `PROVEN`, `SANCTIFY`. Multi-proof |
| 114 | +queries (`PROOF A(x) AND B(y)`) parse and split correctly. |
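
The splitting behaviour can be illustrated with a small sketch. The real parser is written in Elixir, so the Rust below is not its implementation. The `ProofRequest` type, the function names, and the naive split on the literal ` AND ` are assumptions. Only the eleven proof type names are taken from the list above.

```rust
/// The eleven proof types listed in the text above.
pub const PROOF_TYPES: [&str; 11] = [
    "EXISTENCE", "INTEGRITY", "CONSISTENCY", "PROVENANCE", "FRESHNESS",
    "ACCESS", "CITATION", "CUSTOM", "ZKP", "PROVEN", "SANCTIFY",
];

/// Hypothetical parsed form of one `KIND(argument)` term.
#[derive(Debug, PartialEq)]
pub struct ProofRequest {
    pub kind: String,
    pub argument: String,
}

/// Split `PROOF A(x) AND B(y)` into one request per term.
/// Limitation of this sketch: arguments containing " AND " are not handled.
pub fn split_proof_clause(clause: &str) -> Result<Vec<ProofRequest>, String> {
    let body = clause
        .trim()
        .strip_prefix("PROOF ")
        .ok_or_else(|| "clause must start with PROOF".to_string())?;
    body.split(" AND ").map(parse_term).collect()
}

fn parse_term(term: &str) -> Result<ProofRequest, String> {
    let term = term.trim();
    let open = term
        .find('(')
        .ok_or_else(|| format!("missing '(' in {:?}", term))?;
    let argument = term[open + 1..]
        .strip_suffix(')')
        .ok_or_else(|| format!("missing ')' in {:?}", term))?;
    let kind = term[..open].trim();
    if !PROOF_TYPES.contains(&kind) {
        return Err(format!("unknown proof type {:?}", kind));
    }
    Ok(ProofRequest {
        kind: kind.to_string(),
        argument: argument.trim().to_string(),
    })
}
```

With this sketch, `PROOF EXISTENCE(x) AND INTEGRITY(y)` yields two requests, while an unlisted proof type or a clause that does not start with `PROOF` yields an error.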
| 115 | + |
| 116 | +The Idris2 ABI layer (`src/abi/`) provides formal dependent-type specifications |
| 117 | +for VCL query shapes; proof certificates are verified against the ABI before |
| 118 | +storage. The `proven` library integration bridges to external certificate-based |
| 119 | +JSON/CBOR verification. |
| 120 | + |
| 121 | +=== Honest caveat |
| 122 | + |
| 123 | +VCL federation is currently local-only: cross-store queries against remote |
| 124 | +VeriSimDB instances are not yet supported. The `FileExecutor` handles `FEDERATION` |
| 125 | +clauses against local flat files; the multi-store remote executor is planned but |
| 126 | +not implemented. Do not use VCL for cross-instance federation in production. |
| 127 | + |
| 128 | +The ReScript VCL Playground in `panll/` connects to the real backend API with |
| 129 | +a demo-mode fallback — the fallback is active when `verisim-api` is not running |
| 130 | +locally. |
| 131 | + |
| 132 | +== Dogfooded Across The Account |
| 133 | + |
| 134 | +[cols="1,2,2", options="header"] |
| 135 | +|=== |
| 136 | +| Technology / Pattern | Role in VeriSimDB | Also Used In |
| 137 | + |
| 138 | +| *Rust workspace (12 crates)* |
| 139 | +| `rust-core/` — each modality is a separate crate (`verisim-graph`, `-vector`, |
| 140 | + `-tensor`, `-semantic`, `-document`, `-temporal`, `-provenance`, `-spatial`, |
| 141 | + `-octad`, `-drift`, `-normalizer`, `-api`); Cargo workspace coordinates them |
| 142 | +| ephapax (17-crate Rust workspace), panic-attack (Rust analysis engine), |
| 143 | + hypatia (`src/rust/` adapters + CLI), a2ml-rs, k9-rs |
| 144 | + |
| 145 | +| *Elixir/OTP orchestration* |
| 146 | +| `elixir-orchestration/` — one `EntityServer` GenServer per entity under |
| 147 | + `DynamicSupervisor`; `DriftMonitor`, `QueryRouter`, `SchemaRegistry` as OTP |
| 148 | + GenServers; fault isolation: one crashed entity does not affect others |
| 149 | +| hypatia (8 OTP GenServers), burble (room-per-GenServer, DynamicSupervisor), |
| 150 | + gitbot-fleet (bot GenServers) |
| 151 | + |
| 152 | +| *Idris2 ABI + Zig FFI standard* |
| 153 | +| `src/abi/` — Idris2 ABI definitions for VCL proof types and octad entity |
| 154 | + contracts; Zig FFI bridge for C-ABI runtime interop |
| 155 | +| gossamer (`src/interface/abi/`), burble (`src/Burble/ABI/`), ephapax (`idris2/`), |
| 156 | + typed-wasm, tangle — universal ABI/FFI pattern across the estate |
| 157 | + |
| 158 | +| *Stapeln container ecosystem* |
| 159 | +| `container/compose.toml` (selur-compose stack), `container/.gatekeeper.yaml` |
| 160 | + (svalinn edge gateway), `container/manifest.toml` (.ctp bundle manifest); |
| 161 | + Chainguard base images throughout |
| 162 | +| burble (`containers/compose.toml`), hypatia (Containerfile), boj-server, |
| 163 | + all containerised repos via stapeln policy |
| 164 | + |
| 165 | +| *VCL (VeriSim Consonance Language)* |
| 166 | +| Native query interface; built-in Elixir parser + FileExecutor; VCL-UT proof |
| 167 | + certificates; ReScript Playground panel via PanLL |
| 168 | +| hypatia (`lib/vcl/` — VCL client GenServer, file executor, cross-repo analytics); |
| 169 | + nextgen-databases monorepo siblings (QuandleDB, LithoGlyph) |
| 170 | +|=== |
| 171 | + |
| 172 | +== File Map |
| 173 | + |
| 174 | +[cols="1,3", options="header"] |
| 175 | +|=== |
| 176 | +| Path | What's There |
| 177 | + |
| 178 | +| `rust-core/verisim-octad/` | Unified `OctadStore` + `OctadBuilder` — the entity model |
| 179 | +| `rust-core/verisim-drift/` | Drift score computation (cosine similarity, Jaccard index) |
| 180 | +| `rust-core/verisim-normalizer/src/storage_regenerator.rs` | Cross-modality regeneration (68 tests, 6 source→target pairs) |
| 181 | +| `rust-core/verisim-api/` | HTTP API: REST endpoint (port 8080, 8090, 8091, or 8092, assigned per project) |
| 182 | +| `elixir-orchestration/lib/verisim/entity_server.ex` | Per-entity GenServer |
| 183 | +| `elixir-orchestration/lib/verisim/drift_monitor.ex` | Drift event coordinator |
| 184 | +| `elixir-orchestration/lib/verisim/vcl/` | VCL parser, FileExecutor, query functions |
| 185 | +| `elixir-orchestration/lib/verisim/hypatia/` | Hypatia integration: ScanIngester, PatternQuery, DispatchBridge (37 tests) |
| 186 | +| `connectors/clients/` | 5 client SDKs (Rust, Elixir, ReScript, Julia, Gleam) |
| 187 | +| `connectors/test-infra/` | selur-compose: 7 databases for federation adapter integration tests |
| 188 | +| `container/Containerfile` | Chainguard-based container (in-memory or persistent build arg) |
| 189 | +| `container/compose.toml` | Full stack: rust-core + elixir + svalinn gateway |
| 190 | +| `demos/drift-detection/run_demo.exs` | Runnable demo: 1000 entities, 50 corrupted, 100% detect+repair |
| 191 | +| `src/abi/` | Idris2 ABI definitions — VCL types, proof certificates, octad contracts |
| 192 | +|=== |
| 193 | + |
| 194 | +== Questions? |
| 195 | + |
| 196 | +Open an issue or reach out at j.d.a.jewell@open.ac.uk. When deploying an |
| 197 | +instance, each consuming project should run its own VeriSimDB container on a |
| 198 | +unique port; do not share the dev instance across projects. See |
| 199 | +`CLAUDE.md §CRITICAL: Instance Policy` for the port assignment table. |