Skip to content

Commit 81e1d52

Browse files
hyperpolymathclaude
andcommitted
proof(verisimdb): close V9 (normalizer determinism + convergence) in TLA+
Normalizer.tla models verisim-normalizer's self-normalisation as a deterministic state transition over an octad: - Pick the highest-priority modality (by an injective Priority function) as the authoritative source. - Rewrite every modality to the source's value in one atomic step. Properties proven by TLC (bounded model): * Safety (composite NormalizerSafe): - SourceIsMaximal -- CHOOSE in SourceOf always returns the unique priority-maximum. Determinism hinges on this; injectivity of Priority is ASSUMEd at module level. - NormalizeIdempotent -- Normalize(Normalize(s)) = Normalize(s). Checked on every reachable state, including the non-deterministic Init. - PostStepNoDrift -- after any normalisation round, drift is gone (one-step convergence). - FixedPointStable -- on a drift-free state, Normalize is the identity. "Once converged, stay converged." * Liveness: - Convergence -- <>[]~HasDrift. Eventually-always drift- free, given weak fairness on Step. Config: Modalities = {graph, vector, semantic, document} (abstracted from the real 8; Priority + Modalities are module-level because TLC config syntax cannot represent record literals). Values = {v0, v1, v2}, MaxSteps = 3. 84 reachable states, sub-second model-check via eclipse-temurin:21-jre container. Wiring: `verify-tlaplus` in the verisimdb Justfile now runs both OctadAtomicity (V5) and Normalizer (V9) in sequence. Design note: the spec captures the structural "single deterministic source" pattern used by the real StorageRegenerator; it does not model the per-(source,target) regeneration tables (FNV-1a trigrams, keyword extraction, weighted merge) -- those are in-modality payload-level details that do not affect the determinism or convergence claims. If future drift detection introduces non- deterministic tie-breaking, this spec will need revisiting. Closes V9 in proof-debt-plan.md. V10 (serializability) remains as the final TLA+-shaped VeriSimDB obligation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 9d3dfd8 commit 81e1d52

4 files changed

Lines changed: 194 additions & 6 deletions

File tree

verisimdb/Justfile

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -65,6 +65,7 @@ verify-tlaplus:
6565
fi
6666
}
6767
run_tlc OctadAtomicity.tla OctadAtomicity.cfg
68+
run_tlc Normalizer.tla Normalizer.cfg
6869

6970
# ── Test ───────────────────────────────────────────────────────
7071

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
\* SPDX-License-Identifier: PMPL-1.0-or-later
2+
\* TLC configuration for Normalizer.tla (V9).
3+
\*
4+
\* Bounded instance: 4 modalities (abstracted from 8), 3 distinct values,
5+
\* MaxSteps=3. State space = |Values|^|Modalities| initial states (3^4=81)
6+
\* times up to MaxSteps transitions. Convergence happens in 1 step, so
7+
\* effective space ~= 81 * 2 = 162 reachable states. Finishes instantly.
8+
\*
9+
\* Run with:
10+
\* java -cp tla2tools.jar tlc2.TLC -config Normalizer.cfg Normalizer.tla
11+
\* or via `just verify-tlaplus`.
12+
13+
SPECIFICATION Spec
14+
15+
CONSTANTS
16+
Values = {"v0", "v1", "v2"}
17+
MaxSteps = 3
18+
19+
INVARIANTS
20+
NormalizerSafe
21+
22+
PROPERTIES
23+
Convergence
24+
25+
CHECK_DEADLOCK FALSE
Lines changed: 139 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,139 @@
1+
------------------------------- MODULE Normalizer -------------------------------
2+
\* SPDX-License-Identifier: PMPL-1.0-or-later
3+
\* Copyright (c) 2026 Jonathan D.A. Jewell (hyperpolymath) <j.d.a.jewell@open.ac.uk>
4+
\*
5+
\* V9: Normalizer determinism + convergence.
6+
\* Corresponds to rust-core/verisim-normalizer/src/lib.rs (StorageRegenerator).
7+
\*
8+
\* The verisim-normalizer resolves drift between the 8 modalities of an octad
9+
\* by picking an authoritative source modality and regenerating the others
10+
\* from it. The real system has a (source, target) strategy table -- Document
11+
\* is the usual authoritative source for Vector/Semantic/Graph regeneration,
12+
\* with cosine-similarity drift measured against Vector and Jaccard against
13+
\* Semantic. This spec abstracts that machinery into its essential claim:
14+
\*
15+
\* - Determinism: normalisation is a *function* of state. Given the same
16+
\* input octad, normalisation always produces the same output octad;
17+
\* there is no schedule-dependent or source-rank-tie non-determinism.
18+
\* - Convergence: starting from any drift-ed state, repeated normalisation
19+
\* reaches a drift-free fixed point in bounded time.
20+
\*
21+
\* The spec also checks that the fixed point is *stable* (Normalize is
22+
\* identity on a drift-free state).
23+
24+
EXTENDS Naturals, FiniteSets, TLC
25+
26+
CONSTANTS
27+
Values, \* abstract set of possible modality payload hashes
28+
MaxSteps \* bound on normalisation rounds for model-checking
29+
30+
\* Modalities and their deterministic priority ordering are module-level (TLC
31+
\* config files cannot represent record literals, and the real system has a
32+
\* fixed strategy table anyway). Priority is strictly injective by
33+
\* construction here -- that injectivity is exactly what makes the CHOOSE in
34+
\* SourceOf deterministic, and it is the spec's central structural claim.
35+
Modalities == {"graph", "vector", "semantic", "document"}
36+
37+
Priority == [graph |-> 1, vector |-> 2, semantic |-> 3, document |-> 4]
38+
39+
ASSUME Cardinality(Values) >= 1
40+
ASSUME MaxSteps \in Nat
41+
ASSUME \A m1, m2 \in Modalities: (m1 /= m2) => (Priority[m1] /= Priority[m2])
42+
43+
VARIABLES
44+
state, \* [Modalities -> Values] -- current octad snapshot
45+
steps \* Nat -- normalisation rounds elapsed
46+
47+
vars == <<state, steps>>
48+
49+
TypeOK ==
50+
/\ state \in [Modalities -> Values]
51+
/\ steps \in 0..MaxSteps
52+
53+
\* Drift holds when any two modalities disagree on the payload.
54+
HasDrift(s) ==
55+
\E m1, m2 \in Modalities: s[m1] /= s[m2]
56+
57+
\* The deterministic authoritative-source function. Under the injectivity
58+
\* ASSUME above, CHOOSE returns the unique highest-priority modality. This
59+
\* is the single piece of the spec that, if wrong, would make the whole
60+
\* normaliser non-deterministic -- so it is the thing determinism hinges on.
61+
SourceOf(s) ==
62+
CHOOSE m \in Modalities:
63+
\A other \in Modalities: Priority[m] >= Priority[other]
64+
65+
\* The normaliser: rewrite every modality to the source's value.
66+
Normalize(s) ==
67+
[m \in Modalities |-> s[SourceOf(s)]]
68+
69+
\* Non-deterministic initial state; all octads are possible starting points
70+
\* for the model-check.
71+
Init ==
72+
/\ state \in [Modalities -> Values]
73+
/\ steps = 0
74+
75+
\* One round of normalisation. Guard by HasDrift so converged states are
76+
\* stuttering; guard by MaxSteps for finite model-check.
77+
Step ==
78+
/\ steps < MaxSteps
79+
/\ HasDrift(state)
80+
/\ state' = Normalize(state)
81+
/\ steps' = steps + 1
82+
83+
Next == Step
84+
85+
\* Weak fairness forces Step to fire while drift remains, which is what makes
86+
\* Convergence true. Without it, the system could stutter forever in a drifted
87+
\* state (that would be a real bug in the implementation, not the spec).
88+
Spec == Init /\ [][Next]_vars /\ WF_vars(Step)
89+
90+
--------------------------------------------------------------------------------
91+
\* Safety
92+
--------------------------------------------------------------------------------
93+
94+
\* I1. SourceOf is well-defined: the CHOOSE always returns an element of
95+
\* Modalities, and that element is the unique priority-maximum. Trivial from
96+
\* the ASSUME, but stated explicitly so TLC exercises it on every state.
97+
SourceIsMaximal ==
98+
\A other \in Modalities:
99+
Priority[SourceOf(state)] >= Priority[other]
100+
101+
\* I2. Idempotence of Normalize: normalising a normalised state is a no-op.
102+
\* This is a property of the *definition* of Normalize; TLC checks it across
103+
\* all reachable states including the non-deterministic Init.
104+
NormalizeIdempotent ==
105+
Normalize(Normalize(state)) = Normalize(state)
106+
107+
\* I3. Post-Step drift-free: immediately after Step, no drift remains.
108+
\* Implementation: Step replaces state with Normalize(state), which makes
109+
\* every modality equal to state[SourceOf(state)] -- so any two modalities
110+
\* agree, i.e. ~HasDrift. TLC verifies by exploring.
111+
PostStepNoDrift ==
112+
(steps > 0) => ~HasDrift(state)
113+
114+
\* I4. Stability of fixed point: in any drift-free state, Normalize is the
115+
\* identity. This is the "once converged, stay converged" guarantee.
116+
FixedPointStable ==
117+
~HasDrift(state) => (Normalize(state) = state)
118+
119+
NormalizerSafe ==
120+
/\ TypeOK
121+
/\ SourceIsMaximal
122+
/\ NormalizeIdempotent
123+
/\ PostStepNoDrift
124+
/\ FixedPointStable
125+
126+
--------------------------------------------------------------------------------
127+
\* Liveness / convergence
128+
--------------------------------------------------------------------------------
129+
130+
\* Eventually, drift is gone and stays gone. Stronger than "eventually no
131+
\* drift" because the system could in principle re-drift if Step non-
132+
\* deterministically reintroduced disagreement -- the <>[] form forbids that.
133+
Convergence ==
134+
<>[]~HasDrift(state)
135+
136+
THEOREM NormalizerSafety == Spec => []NormalizerSafe
137+
THEOREM NormalizerConverges == Spec => Convergence
138+
139+
================================================================================

verisimdb/verification/proofs/tlaplus/README.adoc

Lines changed: 29 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -45,14 +45,37 @@ The default config bounds the model to 2 transactions and 2 crash
4545
events. State space: 134,160 distinct states at depth 19, ~18s on an
4646
8-core laptop. Verified 2026-04-17.
4747

48+
== V9: Normalizer determinism + convergence
49+
50+
Specification: `Normalizer.tla` +
51+
Configuration: `Normalizer.cfg` +
52+
Source being verified: `rust-core/verisim-normalizer/src/lib.rs`
53+
54+
Models self-normalisation as a deterministic state transition: pick
55+
the highest-priority modality as source, rewrite every modality to
56+
the source's value. Safety properties:
57+
58+
* `SourceIsMaximal` -- the `CHOOSE` in `SourceOf` always returns the
59+
unique priority-maximum (determinism hinges on this).
60+
* `NormalizeIdempotent` -- `Normalize(Normalize(s)) = Normalize(s)`.
61+
* `PostStepNoDrift` -- after any normalisation round, drift is gone.
62+
* `FixedPointStable` -- on a drift-free state, `Normalize` is identity.
63+
64+
Liveness: `Convergence` -- `<>[]~HasDrift` (eventually-always drift-free).
65+
66+
Default config: 4 modalities (`graph`, `vector`, `semantic`,
67+
`document`), 3 distinct values, `MaxSteps=3`. 84 reachable states,
68+
finishes in < 1s. Verified 2026-04-17.
69+
4870
== Cross-references
4971

50-
* `developer-ecosystem/standards/docs/proofs/spec-templates/T1-critical/verisimdb.md` -- V5
51-
* `/home/hyper/Desktop/proof-debt-plan.md` -- Dependability / VeriSimDB V5
72+
* `developer-ecosystem/standards/docs/proofs/spec-templates/T1-critical/verisimdb.md` -- V5, V9
73+
* `/home/hyper/Desktop/proof-debt-plan.md` -- Dependability / VeriSimDB V5, V9
5274

5375
== Future specs in this directory
5476

55-
V9 (normalizer determinism) and V10 (transaction serializability) are
56-
the remaining TLA+-shaped VeriSimDB obligations. They will land as
57-
separate `.tla` files under this directory; extend `verify-tlaplus` in
58-
the Justfile to run each.
77+
V10 (transaction serializability) is the remaining TLA+-shaped VeriSimDB
78+
obligation. It will land as a separate `Serializability.tla`; extend
79+
`verify-tlaplus` in the Justfile to include it. State-space concerns:
80+
V10 models concurrent transactions and may blow up past V5/V9's sub-
81+
second runs -- start with 2 transactions, widen cautiously.

0 commit comments

Comments
 (0)