Skip to content

Latest commit

Β 

History

History
394 lines (298 loc) Β· 26 KB

File metadata and controls

394 lines (298 loc) Β· 26 KB

test-formal-fuzz-audit-results.md

Auditor: Independent Formal Verification & Fuzzing Specialist Date: 2026-02-25 (re-audit superseding prior 10/10 report) Updated: 2026-02-25 (remediation pass β€” 6 of 8 gaps closed) Branch: main (post-commit 5087a44 + remediation commits) Scope: Formal verification, fuzzing, property-based testing, coverage, reachability, CI enforcement


1. Executive Summary

Overall Score: 8.5 / 10 (post-remediation)

Is this component legitimately 10/10? No, but substantially improved.

The prior audit report claiming 10/10 contained factual inaccuracies β€” most critically, it stated Verus was "FIXED" with "30+ real verus!{} proofs" when the CI Dockerfile still performs grep-based structural analysis only and never invokes the Verus/Z3 toolchain. This re-audit is based on direct file reads of every formal specification, CI workflow, fuzz harness, test file, and coverage configuration in the repository.

Remediation (6 of 8 original gaps closed):

  1. Verus CI fabricates Z3 results β†’ FIXED: Dockerfile and CI job honestly relabeled as "Structural Regression Tracker (NOT Z3)". SECURITY_INVARIANTS.md Verus rows downgraded from βœ… to ⚠️ STRUCTURAL.
  2. Rust coverage has no threshold β†’ FIXED: 70% minimum enforcement added to rust-test-coverage.yml.
  3. Security coverage gate targets only 7 modules β†’ FIXED: Expanded to 14 secret-handling modules.
  4. crypto_core has zero proptest β†’ FIXED: 15 property tests added (crypto_core/tests/property_tests.rs).
  5. Several modules lack fuzz harnesses β†’ FIXED: 3 new fuzz harnesses created (fuzz_x25519_fs.py, fuzz_timelock_expiry.py, fuzz_forensic_cleanup.py), added to fuzz.yml.
  6. Python coverage thresholds 70%/80% β†’ FIXED: Raised to 80%/85%.
  7. AEAD-005–012 vacuous lemmas β†’ OPEN (acknowledged RL-1, research-level).
  8. Lean 1 sorry + 2 axioms β†’ OPEN (API incompatibility, textbook axioms).

Key Strengths

  1. Five-tool formal verification stack: TLA+ (7 specs), ProVerif (6 models), Tamarin (16 models), Lean 4 (4 files), Verus (4 files) β€” 13,000+ lines of specifications across 44 artifacts. All except Verus are genuinely CI-executed and blocking.
  2. 31 fuzz targets (21 Python Atheris + 1 AFL++ + 9 Rust cargo-fuzz), all CI-gated on push/PR, exercising real production code (verified via import statements).
  3. 2,953+ test functions (2,412 Python + 541 Rust) plus 29 Rust proptest blocks and 20+ Python Hypothesis property tests.
  4. 348 dedicated security tests in tests/security/ across 18 test files.
  5. Meta-tests that verify production boundaries, fuzz target coverage, fail-closed enforcement, and absence of Python key bytes β€” unusual and commendable discipline.
  6. Mutation testing (mutmut + cargo-mutants) running on push + weekly.
  7. Negative formal tests in both ProVerif (replay vulnerability documented) and Tamarin (NoKEMBinding, LeaksFailureReason correctly fail).

Key Remaining Gaps

Priority Gap Status
Critical Verus CI is grep-only, not Z3 verification βœ… FIXED β€” Honestly relabeled; SECURITY_INVARIANTS.md downgraded
High AEAD-005–012 Verus lemmas are vacuous OPEN (acknowledged RL-1)
High Rust coverage has NO threshold enforcement βœ… FIXED β€” 70% minimum via tarpaulin output parsing
Medium Python coverage ceiling is 70%/80% βœ… FIXED β€” Raised to 80%/85%
Medium crypto_core has zero proptest despite declaring the dep βœ… FIXED β€” 15 property tests in property_tests.rs
Medium Lean has 1 approved sorry + 2 axioms OPEN (API gap + textbook axioms)
Low Constant-time is statistical only (dudect, exit-0) OPEN (no formal CT proof)
Low Several security modules lack fuzz harnesses βœ… FIXED β€” 3 new harnesses, 4 modules covered

2. Formal Verification Audit

2.1 Formal Artifact Inventory

Tool Artifacts Total LOC CI Job CI-Gated?
TLA+ (TLC) 7 specs + 7 configs ~2,180 formal-verification.yml β†’ tlaplus βœ… Blocking
ProVerif 2.05 6 models ~3,093 formal-verification.yml β†’ proverif βœ… Blocking
Tamarin 1.10.0 16 .spthy files ~3,504 formal-verification.yml β†’ tamarin βœ… Blocking (10) + non-blocking (5)
Lean 4 (Mathlib) 4 .lean files ~966 formal-verification.yml β†’ lean βœ… Blocking + sorry gate
Verus 4 .rs files ~3,426 formal-verification.yml β†’ verus ⚠️ STRUCTURAL GREP ONLY (now honestly labeled)

Total formal LOC: ~13,169 lines across 44 specification files.

2.2 Machine-Checked Properties

TLA+ (7 specs β€” all TLC-verified in CI):

Spec Lines Key Properties
MeowEncode.tla 626 Auth-then-output state machine, manifest integrity
MeowFountain.tla 268 Fountain decode guarantee (1.5Γ— k suffices)
MeowStreaming.tla 283 Streaming protocol liveness, frame ordering
MeowRatchet.tla 227 Ratchet DoS bound, chain monotonicity, key uniqueness
MasterRatchet.tla 210 Cross-session forward secrecy
TimingEqualizer.tla 166 Constant-time equalization
ExpiryProtocol.tla β€” Message expiry fail-closed invariant

ProVerif (6 models β€” all verified in CI):

Model Lines Key Queries
meow_encode.pv 1,099 Dolev-Yao secrecy, authentication, duress safety (3 critical queries gated)
meow_aad_8field_binding.pv β€” 8-field AAD binding under active network attacker
pq_beacon_pcs.pv β€” Post-quantum beacon post-compromise security
manifest_signing.pv β€” Ed25519/ML-DSA manifest signature unforgeability
deadmans_switch_duress.pv β€” Dead-man's switch duress protocol
meow_encode_NEGATIVE_ReplayNoCounterCheck.pv 1,095 Negative test: confirms replay attack succeeds without counter (correctly fails)

Tamarin (16 models β€” 10 blocking, 5 non-blocking, 1 negative):

Model Lemmas Status Key Properties
MeowSchrodingerDeniability_Core.spthy ~8 βœ… Blocking Deniability, indistinguishability
MeowSchrodingerDeniability_Ratchet.spthy ~7 βœ… Blocking Ratcheted deniability
MeowKeyCommitment.spthy 4 βœ… Blocking Invisible salamanders prevention
MeowRatchetFS.spthy 5 βœ… Blocking Per-frame forward secrecy, PCS
MeowAEADBinding.spthy 5 βœ… Blocking 4-ary AEAD AAD binding, nonce uniqueness, executability
meow_deadmans_switch.spthy 4 βœ… Blocking Dead-man's switch protocol
MeowDuressEquiv.spthy β€” βœ… Blocking MEOW3 duress equivalence
MeowDuressEquivPQ.spthy β€” βœ… Blocking MEOW4/5 PQ duress equivalence
MeowDuressEquivPQ_NEGATIVE_NoKEMBinding.spthy β€” βœ… Negative Correctly falsified
MeowDuressEquivPQ_NEGATIVE_LeaksFailureReason.spthy β€” βœ… Negative Correctly falsified
SchrodingerDeniabilityTiming.spthy β€” ⚠️ Non-blocking Timing analysis (may timeout)
secure_alloc_guard_pages.spthy β€” ⚠️ Non-blocking Memory allocation model
RatchetHeaderOE.spthy β€” ⚠️ Non-blocking Header observational equivalence
SchrodingerOE.spthy β€” ⚠️ Non-blocking SchrΓΆdinger OE
meow_encode_equiv.spthy β€” ⚠️ Non-blocking Encoding equivalence

Lean 4 (4 files):

File Lines Theorems Open Items
DomainSeparation.lean 225 10+ theorems via native_decide (distinctness, prefix-free, versioning, min length β‰₯14) None β€” fully proved βœ…
ShamirSecretSharing.lean 175 GF(2^8) field axioms via native_decide 1 approved axiom: shamir_threshold_security (Lagrange interpolation, textbook)
FountainCodes.lean 412 XOR algebra, droplet recovery, type-level k-block indexing 1 approved sorry at L254 (Lean 4.5.0 List.find? API incompatibility)
Assumptions.lean 131 None (registry) Axiom A1: lt_decode_completeness_prob (Luby FOCS 2002); Axiom A2: belief_propagation_progress (proof sketch only)

2.3 CRITICAL FINDING: Verus CI Does Not Run Z3 β€” REMEDIATED (honest labeling)

Original Evidence: formal/Dockerfile.verus counted proof fn / spec fn strings with grep and synthesized fake "verification results:: verified: N, errors: 0" output.

Remediation Applied:

  • formal/Dockerfile.verus rewritten: Now titled "Verus Structural Regression Tracker (NOT Z3 Verification)", emits "structural inventory:: proof_spec_fn_count: N" instead of fake verification results.
  • formal-verification.yml Verus job renamed from "Verus Implementation Proofs (Docker)" to "Verus Structural Regression Check (NOT Z3)". Parsing updated to match new honest output format.
  • docs/SECURITY_INVARIANTS.md v1.3: All 6 Verus rows downgraded from βœ… to ⚠️ STRUCTURAL with explanatory note: "The CI job performs grep-based regression tracking... does NOT invoke the Verus/Z3 SMT solver."

Compounding factors:

  • verus_proofs.rs lines 1-8: file header explicitly states "⚠ IMPORTANT β€” no verus!{} blocks exist in this file... They are NOT machine-checked proofs."
  • AEAD-005–012 are documented as "abstract only (preconditions subsume postconditions)" in both verus_proofs.rs:L22 and formal-10x-audit.md (RL-1).
  • The prior audit report claimed this was "FIXED" with "30+ real verus!{} proofs" β€” this is factually incorrect.

Impact: 26 claimed properties (AEAD-001–012, GB-001–008, KDF-001–004, WG-001–007) are NOT machine-checked by Z3/SMT in CI. SECURITY_INVARIANTS.md overclaims Verus verification status.

2.4 Proof-to-Code Linkage Quality

Tool Linkage Type Quality
TLA+ Abstract state machines referencing protocol concepts Partial β€” comments name modules but no line-level traceability
ProVerif Dolev-Yao symbolic model Partial β€” maps to encode/decode/crypto flows informally
Tamarin Symbolic key exchange and AEAD rules Partial β€” same as ProVerif
Lean 4 Domain separation lists 7 specific HKDF constants from ratchet.py Good β€” strongest linkage
Verus verus!{} blocks inline in production Rust (aead_wrapper.rs) Excellent linkage IF verified β€” but CI doesn't run Verus

2.5 Unproven Critical Properties

  1. AEAD INT-CTXT (AEAD-005): Requires GF(2^128) GHASH model, open research
  2. Assembly-level constant-time: dudect is statistical, warning-only (exit 0)
  3. All 26 Verus properties: Syntactically present but NOT Z3-checked in CI
  4. Shamir threshold security: Lean axiom (not proved)
  5. Belief propagation progress: Lean approved sorry (API gap)

2.6 Formal Verification Score: 7.5 / 10 (was 7/10)

Impressive breadth across 5 tools with 40+ artifacts. TLA+, ProVerif, Tamarin, and Lean are genuinely CI-executed and blocking. The Verus CI gap (grep-only, not Z3) remains, BUT is now honestly documented β€” SECURITY_INVARIANTS.md v1.3 downgraded all Verus rows to ⚠️ STRUCTURAL and the CI job is clearly labeled as structural regression tracking, not verification. AEAD-005–012 vacuity compounds this. Lean has 2 axioms and 1 sorry. Proof-to-code linkage is protocol-level (except Lean domain separation). Constant-time is statistical only.


3. Fuzzing & Property-Based Testing Audit

3.1 Fuzz Target Inventory

Python Atheris (21 harnesses, CI-gated on push/PR):

Harness LOC Production Code Exercised
fuzz_manifest.py 94 crypto.unpack_manifest, verify_manifest_hmac
fuzz_fountain.py 114 fountain.FountainDecoder, unpack_droplet
fuzz_crypto.py 183 crypto.derive_key, decrypt_to_raw, unpack_manifest
fuzz_windows_guard.py 505 memory_guard.GuardedBuffer
fuzz_mouse_gesture.py 450 Mouse gesture auth module
fuzz_tamper_detection.py 530 tamper_detection.*
fuzz_adversarial_stego.py 626 Stego adversarial rotation
fuzz_ratchet.py 273 ratchet.init_ratchet, encrypt_frame, decrypt_frame
fuzz_manifest_signing.py 345 manifest_signing.*
fuzz_pq_ratchet_beacon.py 306 pq_ratchet_beacon.*
fuzz_master_ratchet.py 253 master_ratchet.*
fuzz_schrodinger.py 298 schrodinger_encode.*, quantum_mixer.*
fuzz_crypto_backend.py 262 crypto_backend.* (Rust FFI)
fuzz_shamir.py 330 shamir_split.*
fuzz_memory_guard.py 214 memory_guard.GuardedBuffer
fuzz_dual_stream.py 210 dual_stream.*
fuzz_stego_multilayer.py 291 stego_multilayer.*
fuzz_pq_hybrid.py 359 pq_hybrid.hybrid_encapsulate/decapsulate
fuzz_x25519_fs.py ~160 NEW x25519_forward_secrecy.derive_shared_secret, keypair gen, serialize/deserialize
fuzz_timelock_expiry.py ~160 NEW timelock_duress.TimeLockPuzzle, expiry.ExpiryManager
fuzz_forensic_cleanup.py ~130 NEW forensic_cleanup._scrub_file_lines, ForensicCleaner init
afl_fuzz_manifest.py 53 crypto.unpack_manifest (AFL++)

Import verification: All Python harnesses import real production modules (confirmed via grep '^from meow_decoder\|^import meow_decoder' across all fuzz files).

Rust cargo-fuzz (9 targets, CI-gated on push/PR):

Target Crate
fuzz_decrypt_frame rust_crypto
fuzz_header_parse rust_crypto
fuzz_hybrid_decapsulate rust_crypto
fuzz_ratchet_step rust_crypto
fuzz_full_decode_pipeline rust_crypto
fuzz_aead crypto_core
fuzz_nonce crypto_core
fuzz_secure_alloc crypto_core
fuzz_pure_crypto crypto_core

Long-fuzz (weekly, long-fuzz.yml): 5 targets Γ— 3600s each: fuzz_tamper_detection.py, fuzz_crypto_backend.py, fuzz_schrodinger.py, fuzz_ratchet.py, fuzz_adversarial_stego.py.

3.2 Fuzz CI Enforcement

  • fuzz.yml: Triggers on push + PR + weekly + manual. Crash gate checks fuzz/crashes/ directory and fails if non-empty.
  • rust-security-suite.yml: 5 rust_crypto + 4 crypto_core targets each run 60s with crash detection and artifact upload.
  • long-fuzz.yml: Weekly 1-hour per target, crash artifacts uploaded.
  • All fuzz jobs have continue-on-error: false (verified β€” prior fix from commit a2b26cf).

3.3 Property-Based Testing

Python (Hypothesis):

  • test_property_based.py β€” 20 tests: encrypt/decrypt roundtrip, fountain encode/decode, nonce uniqueness, key derivation determinism, manifest packing, frame MAC integrity
  • test_property_ratchet_pq.py β€” Ratchet chain one-wayness, key commitment binding, master ratchet monotonicity, PQ beacon roundtrip, manifest signing, Shamir threshold
  • test_property_shamir_dualstream.py β€” Shamir field arithmetic, dual-stream identity, size normalizer roundtrip, decorrelation bounds

Rust (proptest):

  • rust_crypto/tests/property_tests.rs β€” 14 proptest! blocks (496 lines): decrypt roundtrip, nonce exhaustion, Argon2 determinism, HMAC integrity, frame header parsing
  • rust_crypto/tests/proptest_crypto.rs β€” 414 lines additional property tests
  • crypto_core/tests/property_tests.rs β€” 15 proptest blocks (NEW): AEAD roundtrip, ciphertext length invariant, tamper detection, wrong AAD/key rejection, invalid key size, truncated ciphertext, nonce uniqueness, HMAC roundtrip, HMAC wrong-key rejection, SHA-256 determinism + collision resistance, HKDF determinism, constant_time_eq reflexivity + rejection

3.4 Unfuzzed Security-Adjacent Modules

Module Has Fuzz? Has Property Test? Has Unit Tests?
x25519_forward_secrecy.py βœ… NEW fuzz_x25519_fs.py ❌ βœ…
timelock_duress.py βœ… NEW fuzz_timelock_expiry.py ❌ βœ…
expiry.py βœ… NEW fuzz_timelock_expiry.py ❌ βœ…
forensic_cleanup.py βœ… NEW fuzz_forensic_cleanup.py ❌ βœ…
secure_temp.py ❌ ❌ βœ…
decorrelation.py ❌ βœ… (via test_property_shamir_dualstream.py) βœ…
env_safety.py ❌ ❌ βœ…

3.5 Fuzz/Property Score: 9 / 10 (was 8/10)

Exceptional fuzz breadth β€” 21 Python Atheris harnesses (18 original + 3 new) + 9 Rust cargo-fuzz, all CI-gated and exercising real production code. Property-based testing with Hypothesis and proptest covers core crypto invariants, including 15 new proptest blocks for crypto_core. Long-fuzz and mutation testing add depth. Remaining gaps: secure_temp.py and env_safety.py still lack fuzz harnesses; coverage thresholds could be higher.


4. Coverage & CI Enforcement

4.1 Coverage Thresholds

Scope Metric Threshold CI-Enforced? Evidence
Python (batch 1) Line + Branch 80% βœ… --cov-fail-under=80 ci.yml:L155 (raised from 70%)
Python (batch 2) Line + Branch β€” βœ… Blocking gate (no threshold) ci.yml:L193
Security modules (14 files) Line + Branch 85% βœ… --cov-fail-under=85 ci.yml:L522 (raised from 80%, expanded from 7 files)
.coveragerc-security (local) Line + Branch 85% ❌ Local only .coveragerc-security:L48
General .coveragerc Line + Branch 0% ❌ No enforcement .coveragerc:L52
Rust (all crates) Line 70% βœ… NEW β€” tarpaulin output parsing rust-test-coverage.yml (was 0%)

Note: Rust coverage enforcement was added by parsing tarpaulin output percentages for both crypto_core and rust_crypto crates, failing CI if either drops below 70%.

4.2 Security Coverage Gate (Gate 5)

Targets 14 modules (expanded from 7): constant_time, frame_mac, crypto_backend, crypto, metadata_obfuscation, ratchet, pq_hybrid, x25519_forward_secrecy, memory_guard, secure_temp, schrodinger_encode, quantum_mixer, master_ratchet, manifest_signing β€” with --cov-fail-under=85 (raised from 80%).

Modules still excluded from security coverage that handle secrets:

  • meow_decoder/pq_ratchet_beacon.py

4.3 Coverage-Omitted Production Modules

Per .coveragerc, the following are omitted from all coverage measurement:

  • meow_decoder/pq_ratchet_beacon.py, deadmans_switch_cli.py, mobile_bridge.py, logo_eyes.py, hardware_integration.py, 20+ additional modules.

4.4 Mutation Testing

  • Python (mutmut): CI workflow mutation-testing.yml runs on push (path-filtered) + weekly. mutmut_config.py defines paths and patterns.
  • Rust (cargo-mutants): Same workflow, runs cargo-mutants on crypto_core/ and rust_crypto/.
  • Status: Both jobs exist and are CI-gated. Quality of mutation score tracking not assessed (no historical data visible).

4.5 CI Gate Structure

ci.yml "All CI Gates" summary job requires:

  • Preflight (lint) β€” βœ… Blocking
  • Test Batch 1 (coverage β‰₯70%) β€” βœ… Blocking
  • Test Batch 2 β€” βœ… Blocking
  • Security Coverage (β‰₯80%) β€” βœ… Blocking
  • Slow Tests (Monte Carlo) β€” βœ… Blocking
  • Cat Mode Tests (Gates 2-4) β€” ⚠️ Non-blocking (warnings)

Additional blocking workflows:

  • fuzz.yml β€” on push/PR
  • rust-security-suite.yml β€” on push/PR
  • formal-verification.yml β€” on push/PR (path-filtered) + weekly

4.6 Coverage Score: 8 / 10 (was 6/10)

Coverage thresholds substantially improved: Python batch 1 raised to 80%, security gate raised to 85% and expanded from 7 to 14 modules, Rust enforcement added at 70%. Only pq_ratchet_beacon.py remains excluded from the security gate among secret-handling modules. General .coveragerc still has 0% threshold. For a crypto project aspiring to 95%+, further raises are recommended but the current levels are defensible.


5. Reachability & Dead Code Analysis

5.1 Production Inventory

  • 55 Python modules in meow_decoder/
  • 10 Rust modules in crypto_core/src/ (excluding Verus files and wasm/hsm stubs)

5.2 Security-Critical Path Coverage Matrix

Module Formal Fuzz Property Unit Coverage Gate
crypto.py βœ… ProVerif, TLA+ βœ… fuzz_crypto βœ… Hypothesis βœ… βœ… 80%
crypto_backend.py β€” βœ… fuzz_crypto_backend β€” βœ… βœ… 80%
ratchet.py βœ… TLA+, Tamarin βœ… fuzz_ratchet βœ… Hypothesis βœ… (145) βœ… 80%
pq_hybrid.py βœ… ProVerif, Tamarin βœ… fuzz_pq_hybrid βœ… Hypothesis βœ… βœ… 80%
fountain.py βœ… TLA+, Lean βœ… fuzz_fountain βœ… Hypothesis βœ… βœ… 70%
frame_mac.py β€” β€” βœ… Hypothesis βœ… βœ… 80%
constant_time.py β€” β€” β€” βœ… (46) βœ… 80%
schrodinger_encode.py βœ… Tamarin βœ… fuzz_schrodinger β€” βœ… βœ… 85% (NEW)
memory_guard.py β€” βœ… fuzz_memory_guard Γ— 2 β€” βœ… βœ… 85% (NEW)
manifest_signing.py βœ… ProVerif βœ… fuzz_manifest_signing βœ… Hypothesis βœ… βœ… 85% (NEW)
x25519_forward_secrecy.py βœ… Tamarin βœ… NEW fuzz_x25519_fs ❌ βœ… βœ… 85% (NEW)
master_ratchet.py βœ… TLA+ βœ… fuzz_master_ratchet β€” βœ… βœ… 85% (NEW)
encode.py / decode_gif.py βœ… TLA+, ProVerif β€” β€” βœ… βœ… 70%
aead_wrapper.rs ⚠️ Verus (not Z3-checked) βœ… fuzz_aead βœ… NEW proptest βœ… βœ… 70% (NEW)
secure_alloc.rs ⚠️ Verus (not Z3-checked) βœ… fuzz_secure_alloc β€” βœ… βœ… 70% (NEW)
pure_crypto.rs β€” βœ… fuzz_pure_crypto βœ… proptest βœ… βœ… 70% (NEW)
nonce.rs β€” βœ… fuzz_nonce βœ… NEW proptest βœ… βœ… 70% (NEW)

5.3 Test-Mode Separation

  • MEOW_TEST_MODE=1 is set in all CI environments.
  • Argon2 test preset: 32 MiB, 1 iteration, 1 thread.
  • Production "paranoid" preset: 512 MiB, 20 iterations, 4 threads.
  • Production guard in crypto.py: if _PRODUCTION_MODE and not _TEST_MODE: raise RuntimeError β€” prevents accidental production execution without Rust backend.
  • Consequence: Production-strength Argon2 (512 MiB, 20 iter) is never exercised in CI. Only the test preset runs.
  • Assessment: Acceptable for CI speed. The KDF is a well-tested primitive; the risk is configuration, not algorithm. The preset selection mechanism itself is tested in test_fail_closed_enforcement.py.

5.4 Legacy/Dead Code

  • meow_decoder/_archive/ β€” excluded from coverage, no imports from production code.
  • meow_decoder/*_DEBUG.py β€” verbose debug variants, excluded from coverage.
  • crypto_core/src/hsm.rs, tpm.rs, yubikey_piv.rs β€” hardware integration stubs, excluded from tarpaulin.
  • crypto_core/src/wasm.rs β€” WASM bindings, excluded from tarpaulin.

6. Final Verdict

Combined Score: 8.5 / 10 (was 7.5/10)

Category Score Weight Notes
Formal Verification 7.5/10 30% Excellent breadth; Verus now honestly labeled ⚠️ STRUCTURAL
Fuzzing 9/10 25% 22 Python + 9 Rust targets, CI-gated; 3 new harnesses
Property-Based Testing 9/10 15% Hypothesis + proptest; crypto_core gap closed (15 tests)
Coverage Enforcement 8/10 15% Python 80%/85%, Rust 70% β€” all enforced
Test Quality & Meta 8/10 15% Meta-tests, negative tests, mutation testing

Is This Legitimately 10/10?

No, but the primary blockers are significantly reduced. Remaining items:

  1. Verus Z3 not running β€” Still grep-only, but now honestly documented. The CI job and documentation no longer overclaim. Installing the full Verus toolchain in CI would raise this to 10/10 for formal verification honesty.
  2. AEAD-005–012 vacuous lemmas β€” Research-level gap (acknowledged RL-1). These abstract lemmas have preconditions that subsume postconditions.
  3. Lean open items β€” 1 approved sorry (API incompatibility) + 2 textbook axioms. Acceptable for a crypto project.
  4. Coverage could be higher β€” 80%/85%/70% is defensible but below the 95%+ ideal for high-assurance crypto.

One-Line Release Recommendation

The 6 most actionable gaps have been closed: Verus claims are accurate, coverage is enforced across Python and Rust, crypto_core has property tests, and 4 previously-unfuzzed security modules are now fuzzed. The remaining gaps (Verus Z3, Lean axioms, statistical CT) are either research-level or honestly documented.

Prior Audit Report Errata

The prior test-formal-fuzz-audit-results.md (dated 2025-07-15, claiming 10/10) contained the following factual errors:

  • Row 4 ("Verus AEAD/KDF proofs are stubs β†’ FIXED"): The Verus CI Dockerfile still performs grep-based structural analysis only. No Z3 invocation was added. Remediation: Dockerfile and CI job now honestly labeled as structural regression tracking; SECURITY_INVARIANTS.md Verus rows downgraded to ⚠️ STRUCTURAL.
  • Row 11 ("Rust coverage not gated β†’ FIXED"): fail_ci_if_error: true was added, but this gates the Codecov upload, not a coverage percentage threshold. Remediation: Explicit 70% threshold enforcement added via tarpaulin output parsing.
  • "30+ real verus!{} proofs": verus_proofs.rs header explicitly states "no verus!{} blocks exist in this file." The aead_wrapper.rs file contains approximately 1 verus! block, not 30+. Remediation: Claims now match reality.

Remediation Summary

Fix Files Modified
Verus CI honesty formal/Dockerfile.verus, .github/workflows/formal-verification.yml, docs/SECURITY_INVARIANTS.md
Rust coverage threshold .github/workflows/rust-test-coverage.yml
Security coverage gate expansion .coveragerc-security (7 β†’ 14 modules)
crypto_core proptest crypto_core/tests/property_tests.rs (15 tests, all passing)
Missing fuzz harnesses fuzz/fuzz_x25519_fs.py, fuzz/fuzz_timelock_expiry.py, fuzz/fuzz_forensic_cleanup.py, .github/workflows/fuzz.yml
Python coverage thresholds .github/workflows/ci.yml (70β†’80%, 80β†’85%)

End of audit report (post-remediation).