proof(verisimdb): close V9 (normalizer determinism + convergence) in TLA+

hyperpolymath · claude · hyperpolymath · commit 81e1d524e84a · 2026-04-17T20:57:46.000+01:00
Normalizer.tla models verisim-normalizer's self-normalisation as a
deterministic state transition over an octad:

  - Pick the highest-priority modality (by an injective Priority
    function) as the authoritative source.
  - Rewrite every modality to the source's value in one atomic step.
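
As a rough illustration only (a Python sketch, neither the TLA+ spec nor
the Rust implementation), the two steps above amount to the following.
The names `PRIORITY`, `source_of`, and `normalize` are hypothetical
stand-ins that mirror the spec's `Priority`, `SourceOf`, and `Normalize`:

```python
# Hypothetical Python mirror of the spec's Priority / SourceOf / Normalize.
# The priority values are copied from Normalizer.tla; everything else is illustrative.
PRIORITY = {"graph": 1, "vector": 2, "semantic": 3, "document": 4}


def source_of(state):
    """Return the highest-priority modality (unique because PRIORITY is injective)."""
    return max(state, key=PRIORITY.__getitem__)


def normalize(state):
    """Rewrite every modality to the source's value in a single step."""
    value = state[source_of(state)]
    return {modality: value for modality in state}


drifted = {"graph": "v0", "vector": "v1", "semantic": "v0", "document": "v2"}
normalized = normalize(drifted)
print(normalized)  # every modality now carries the document value "v2"
```

Because `source_of` depends only on the fixed priority table, the result
is a pure function of the input octad, which is the determinism claim in
miniature.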

Properties proven by TLC (bounded model):

  * Safety (composite NormalizerSafe):
    - SourceIsMaximal     -- CHOOSE in SourceOf always returns the
                             unique priority-maximum. Determinism
                             hinges on this; injectivity of Priority
                             is ASSUMEd at module level.
    - NormalizeIdempotent -- Normalize(Normalize(s)) = Normalize(s).
                             Checked on every reachable state,
                             including the non-deterministic Init.
    - PostStepNoDrift     -- after any normalisation round, drift
                             is gone (one-step convergence).
    - FixedPointStable    -- on a drift-free state, Normalize is the
                             identity. "Once converged, stay converged."

  * Liveness:
    - Convergence         -- <>[]~HasDrift. Eventually-always drift-
                             free, given weak fairness on Step.
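
The state-level properties listed above can be cross-checked by brute
force outside TLC. The following is a minimal Python sketch under the
same bounded configuration (4 modalities, 3 values); the helper names
are hypothetical and only mirror the spec's operators:

```python
from itertools import product

# Bounded instance copied from the spec and config; helper names are illustrative.
PRIORITY = {"graph": 1, "vector": 2, "semantic": 3, "document": 4}
MODALITIES = tuple(PRIORITY)
VALUES = ("v0", "v1", "v2")


def has_drift(state):
    """Drift holds when any two modalities disagree."""
    return len(set(state.values())) > 1


def source_of(state):
    return max(MODALITIES, key=PRIORITY.__getitem__)


def normalize(state):
    value = state[source_of(state)]
    return {m: value for m in MODALITIES}


def check_all_octads():
    """Assert the four safety properties on every possible octad; return the count."""
    checked = 0
    for combo in product(VALUES, repeat=len(MODALITIES)):
        state = dict(zip(MODALITIES, combo))
        src = source_of(state)
        # SourceIsMaximal
        assert all(PRIORITY[src] >= PRIORITY[m] for m in MODALITIES)
        once = normalize(state)
        # NormalizeIdempotent
        assert normalize(once) == once
        # PostStepNoDrift (one-step convergence)
        assert not has_drift(once)
        # FixedPointStable
        if not has_drift(state):
            assert once == state
        checked += 1
    return checked


print(check_all_octads())  # 81
```

This covers only the state predicates. Convergence is a temporal
property that also depends on the weak-fairness assumption, which the
sketch does not model.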

Config: Modalities = {graph, vector, semantic, document} (abstracted
from the real 8; Priority + Modalities are module-level because TLC
config syntax cannot represent record literals). Values = {v0, v1, v2},
MaxSteps = 3. 84 reachable states, sub-second model-check via
eclipse-temurin:21-jre container.
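
The 84-state figure can be reproduced by hand: 3^4 = 81 initial octads,
plus the 3 uniform octads reached at steps = 1. A small Python sketch of
that enumeration follows, assuming the same `Init` and `Step` shape as
the spec; the function name and tuple encoding are hypothetical:

```python
from itertools import product

# Values mirror Normalizer.cfg; the encoding of a state as (payload tuple, steps)
# is an assumption made for this sketch.
PRIORITY = {"graph": 1, "vector": 2, "semantic": 3, "document": 4}
MODALITIES = tuple(PRIORITY)
VALUES = ("v0", "v1", "v2")
MAX_STEPS = 3


def reachable_states():
    """Enumerate every (payload, steps) pair reachable from Init via Step."""
    source_index = MODALITIES.index(max(MODALITIES, key=PRIORITY.__getitem__))
    frontier = [(combo, 0) for combo in product(VALUES, repeat=len(MODALITIES))]
    seen = set(frontier)
    while frontier:
        payload, steps = frontier.pop()
        # Step is enabled only while drift remains and the bound is not hit.
        if steps < MAX_STEPS and len(set(payload)) > 1:
            successor = ((payload[source_index],) * len(MODALITIES), steps + 1)
            if successor not in seen:
                seen.add(successor)
                frontier.append(successor)
    return seen


states = reachable_states()
print(len(states))  # 84
```

No state ever reaches steps = 2, since every successor is already
drift-free and the `HasDrift` guard disables `Step`.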

Wiring: `verify-tlaplus` in the verisimdb Justfile now runs both
OctadAtomicity (V5) and Normalizer (V9) in sequence.

Design note: the spec captures the structural "single deterministic
source" pattern used by the real StorageRegenerator; it does not
model the per-(source,target) regeneration tables (FNV-1a trigrams,
keyword extraction, weighted merge) -- those are in-modality
payload-level details that do not affect the determinism or
convergence claims. If future drift detection introduces non-
deterministic tie-breaking, this spec will need revisiting.

Closes V9 in proof-debt-plan.md. V10 (serializability) remains as the
final TLA+-shaped VeriSimDB obligation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/verisimdb/Justfile b/verisimdb/Justfile
@@ -65,6 +65,7 @@ verify-tlaplus:
         fi
     }
     run_tlc OctadAtomicity.tla OctadAtomicity.cfg
+    run_tlc Normalizer.tla      Normalizer.cfg
 
 # ── Test ───────────────────────────────────────────────────────
 
diff --git a/verisimdb/verification/proofs/tlaplus/Normalizer.cfg b/verisimdb/verification/proofs/tlaplus/Normalizer.cfg
@@ -0,0 +1,25 @@
+\* SPDX-License-Identifier: PMPL-1.0-or-later
+\* TLC configuration for Normalizer.tla (V9).
+\*
+\* Bounded instance: 4 modalities (abstracted from 8), 3 distinct values,
+\* MaxSteps=3. Initial states: |Values|^|Modalities| = 3^4 = 81. Convergence
+\* takes 1 step, adding only the 3 uniform octads at steps = 1, so there are
+\* 81 + 3 = 84 distinct reachable states. Finishes instantly.
+\*
+\* Run with:
+\*   java -cp tla2tools.jar tlc2.TLC -config Normalizer.cfg Normalizer.tla
+\* or via `just verify-tlaplus`.
+
+SPECIFICATION Spec
+
+CONSTANTS
+    Values = {"v0", "v1", "v2"}
+    MaxSteps = 3
+
+INVARIANTS
+    NormalizerSafe
+
+PROPERTIES
+    Convergence
+
+CHECK_DEADLOCK FALSE
diff --git a/verisimdb/verification/proofs/tlaplus/Normalizer.tla b/verisimdb/verification/proofs/tlaplus/Normalizer.tla
@@ -0,0 +1,139 @@
+------------------------------- MODULE Normalizer -------------------------------
+\* SPDX-License-Identifier: PMPL-1.0-or-later
+\* Copyright (c) 2026 Jonathan D.A. Jewell (hyperpolymath) <j.d.a.jewell@open.ac.uk>
+\*
+\* V9: Normalizer determinism + convergence.
+\* Corresponds to rust-core/verisim-normalizer/src/lib.rs (StorageRegenerator).
+\*
+\* The verisim-normalizer resolves drift between the 8 modalities of an octad
+\* by picking an authoritative source modality and regenerating the others
+\* from it. The real system has a (source, target) strategy table -- Document
+\* is the usual authoritative source for Vector/Semantic/Graph regeneration,
+\* with cosine-similarity drift measured against Vector and Jaccard against
+\* Semantic. This spec abstracts that machinery into its essential claim:
+\*
+\*   - Determinism: normalisation is a *function* of state. Given the same
+\*     input octad, normalisation always produces the same output octad;
+\*     there is no schedule-dependent or source-rank-tie non-determinism.
+\*   - Convergence: starting from any drifted state, repeated normalisation
+\*     reaches a drift-free fixed point in bounded time.
+\*
+\* The spec also checks that the fixed point is *stable* (Normalize is
+\* identity on a drift-free state).
+
+EXTENDS Naturals, FiniteSets, TLC
+
+CONSTANTS
+    Values,         \* abstract set of possible modality payload hashes
+    MaxSteps        \* bound on normalisation rounds for model-checking
+
+\* Modalities and their deterministic priority ordering are module-level (TLC
+\* config files cannot represent record literals, and the real system has a
+\* fixed strategy table anyway). Priority is injective by construction here,
+\* so the priority-maximum is unique and the CHOOSE in SourceOf has exactly
+\* one candidate. That uniqueness is the spec's central structural claim.
+Modalities == {"graph", "vector", "semantic", "document"}
+
+Priority == [graph |-> 1, vector |-> 2, semantic |-> 3, document |-> 4]
+
+ASSUME Cardinality(Values) >= 1
+ASSUME MaxSteps \in Nat
+ASSUME \A m1, m2 \in Modalities: (m1 /= m2) => (Priority[m1] /= Priority[m2])
+
+VARIABLES
+    state,          \* [Modalities -> Values] -- current octad snapshot
+    steps           \* Nat -- normalisation rounds elapsed
+
+vars == <<state, steps>>
+
+TypeOK ==
+    /\ state \in [Modalities -> Values]
+    /\ steps \in 0..MaxSteps
+
+\* Drift holds when any two modalities disagree on the payload.
+HasDrift(s) ==
+    \E m1, m2 \in Modalities: s[m1] /= s[m2]
+
+\* The deterministic authoritative-source function. Under the injectivity
+\* ASSUME above, CHOOSE returns the unique highest-priority modality. This
+\* is the single piece of the spec that, if wrong, would make the whole
+\* normaliser non-deterministic -- so it is the thing determinism hinges on.
+SourceOf(s) ==
+    CHOOSE m \in Modalities:
+        \A other \in Modalities: Priority[m] >= Priority[other]
+
+\* The normaliser: rewrite every modality to the source's value.
+Normalize(s) ==
+    [m \in Modalities |-> s[SourceOf(s)]]
+
+\* Non-deterministic initial state; all octads are possible starting points
+\* for the model-check.
+Init ==
+    /\ state \in [Modalities -> Values]
+    /\ steps = 0
+
+\* One round of normalisation. Guarded by HasDrift so converged states only
+\* stutter; guarded by MaxSteps to keep the model-check finite.
+Step ==
+    /\ steps < MaxSteps
+    /\ HasDrift(state)
+    /\ state' = Normalize(state)
+    /\ steps' = steps + 1
+
+Next == Step
+
+\* Weak fairness forces Step to fire while drift remains, which is what makes
+\* Convergence true. Without it, the system could stutter forever in a drifted
+\* state (that would be a real bug in the implementation, not the spec).
+Spec == Init /\ [][Next]_vars /\ WF_vars(Step)
+
+--------------------------------------------------------------------------------
+\* Safety
+--------------------------------------------------------------------------------
+
+\* I1. SourceOf is well-defined: the CHOOSE always returns an element of
+\* Modalities, and that element is the unique priority-maximum. Trivial from
+\* the ASSUME, but stated explicitly so TLC exercises it on every state.
+SourceIsMaximal ==
+    \A other \in Modalities:
+        Priority[SourceOf(state)] >= Priority[other]
+
+\* I2. Idempotence of Normalize: normalising a normalised state is a no-op.
+\* This is a property of the *definition* of Normalize; TLC checks it across
+\* all reachable states including the non-deterministic Init.
+NormalizeIdempotent ==
+    Normalize(Normalize(state)) = Normalize(state)
+
+\* I3. Post-Step drift-free: immediately after Step, no drift remains.
+\* Implementation: Step replaces state with Normalize(state), which makes
+\* every modality equal to state[SourceOf(state)] -- so any two modalities
+\* agree, i.e. ~HasDrift. TLC verifies by exploring.
+PostStepNoDrift ==
+    (steps > 0) => ~HasDrift(state)
+
+\* I4. Stability of fixed point: in any drift-free state, Normalize is the
+\* identity. This is the "once converged, stay converged" guarantee.
+FixedPointStable ==
+    ~HasDrift(state) => (Normalize(state) = state)
+
+NormalizerSafe ==
+    /\ TypeOK
+    /\ SourceIsMaximal
+    /\ NormalizeIdempotent
+    /\ PostStepNoDrift
+    /\ FixedPointStable
+
+--------------------------------------------------------------------------------
+\* Liveness / convergence
+--------------------------------------------------------------------------------
+
+\* Eventually, drift is gone and stays gone. Stronger than "eventually no
+\* drift" because the system could in principle re-drift if Step non-
+\* deterministically reintroduced disagreement -- the <>[] form forbids that.
+Convergence ==
+    <>[]~HasDrift(state)
+
+THEOREM NormalizerSafety == Spec => []NormalizerSafe
+THEOREM NormalizerConverges == Spec => Convergence
+
+================================================================================
diff --git a/verisimdb/verification/proofs/tlaplus/README.adoc b/verisimdb/verification/proofs/tlaplus/README.adoc
@@ -45,14 +45,37 @@ The default config bounds the model to 2 transactions and 2 crash
 events. State space: 134,160 distinct states at depth 19, ~18s on an
 8-core laptop. Verified 2026-04-17.
 
+== V9: Normalizer determinism + convergence
+
+Specification: `Normalizer.tla` +
+Configuration: `Normalizer.cfg` +
+Source being verified: `rust-core/verisim-normalizer/src/lib.rs`
+
+Models self-normalisation as a deterministic state transition: pick
+the highest-priority modality as source, rewrite every modality to
+the source's value. Safety properties:
+
+* `SourceIsMaximal` -- the `CHOOSE` in `SourceOf` always returns the
+  unique priority-maximum (determinism hinges on this).
+* `NormalizeIdempotent` -- `Normalize(Normalize(s)) = Normalize(s)`.
+* `PostStepNoDrift` -- after any normalisation round, drift is gone.
+* `FixedPointStable` -- on a drift-free state, `Normalize` is identity.
+
+Liveness: `Convergence` -- `<>[]~HasDrift` (eventually-always drift-free).
+
+Default config: 4 modalities (`graph`, `vector`, `semantic`,
+`document`), 3 distinct values, `MaxSteps=3`. 84 reachable states,
+finishes in < 1s. Verified 2026-04-17.
+
 == Cross-references
 
-* `developer-ecosystem/standards/docs/proofs/spec-templates/T1-critical/verisimdb.md` -- V5
-* `/home/hyper/Desktop/proof-debt-plan.md` -- Dependability / VeriSimDB V5
+* `developer-ecosystem/standards/docs/proofs/spec-templates/T1-critical/verisimdb.md` -- V5, V9
+* `/home/hyper/Desktop/proof-debt-plan.md` -- Dependability / VeriSimDB V5, V9
 
 == Future specs in this directory
 
-V9 (normalizer determinism) and V10 (transaction serializability) are
-the remaining TLA+-shaped VeriSimDB obligations. They will land as
-separate `.tla` files under this directory; extend `verify-tlaplus` in
-the Justfile to run each.
+V10 (transaction serializability) is the remaining TLA+-shaped VeriSimDB
+obligation. It will land as a separate `Serializability.tla`; extend
+`verify-tlaplus` in the Justfile to include it. State-space concerns:
+V10 models concurrent transactions and may blow up past V5/V9's sub-
+second runs -- start with 2 transactions, widen cautiously.
