chore: update submodule pointers + add audit docs

hyperpolymath · claude · hyperpolymath · commit d6a5072097ce · 2026-03-30T13:30:06.000+01:00
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/PROOF-NEEDS.md b/PROOF-NEEDS.md
@@ -0,0 +1,30 @@
+# PROOF-NEEDS.md — nextgen-languages
+
+## Current State
+
+- **`src/abi/*.idr`**: YES, in oblibeny (`Interface.idr`)
+- **Dangerous patterns**: 4 `Admitted` in `ephapax/formal/Semantics.v`: `ctx_transfer` (15 of 24 cases closed), `subst_lemma`, `preservation`, plus one further `Admitted`
+- **LOC**: ~456,000 (OCaml + Rust + Coq + Idris2 + Lean)
+- **Existing proofs**: Ephapax has Coq proofs, Tangle has Lean proofs, Idris2 in multiple sub-languages
+
+## What Needs Proving
+
+| Component | What | Why |
+|-----------|------|-----|
+| Ephapax ctx_transfer (15/24 cases) | Close remaining 9 context transfer cases in Semantics.v | Core type safety theorem is incomplete |
+| Ephapax subst_lemma | Prove substitution lemma | Required for preservation theorem |
+| Ephapax preservation | Close preservation theorem (depends on subst_lemma) | Preservation, together with progress, is the core type-safety property |
+| Ephapax 4th Admitted | Close the 4th Admitted in Semantics.v | Blocks full formal verification claim |
+| AffineScript type safety | Prove OCaml type checker is sound | AffineScript claims affine types but lacks proofs |
+| AffineScript runtime (Rust) | GC and allocator correctness | Memory safety of the runtime is critical |
+| Betlang type checker | Prove bet-check is sound | Compiler correctness depends on type checking |
+| Tangle Lean proofs | Extend Tangle.lean coverage | Existing Lean proofs are partial |
+| Oblibeny ABI | Extend Interface.idr with full package proofs | Current ABI is minimal |
+
+## Recommended Prover
+
+**Coq** for Ephapax (existing proof infrastructure). **Lean 4** for Tangle (existing). **Idris2** for ABI layers. For the AffineScript type checker, verify in **Coq** and use Coq's extraction to **OCaml**.
+
+## Priority
+
+**HIGH** — These are programming language compilers. Type safety proofs are the gold standard for language correctness. The 4 Admitted proofs in Ephapax are concrete, known gaps that block the formal verification claim. AffineScript's runtime Rust code (GC, allocator) is safety-critical.
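The Ephapax rows in PROOF-NEEDS.md form a dependency chain: `subst_lemma` feeds preservation, and preservation plus progress gives type safety. A generic Lean 4 sketch of the last step, using core Lean only, makes that chain concrete. It is illustrative and not Ephapax's Coq development: `Multi`, `Progress`, `Preservation`, `preservation_multi` and `type_safety` are hypothetical names, and the linear/affine context handling that `ctx_transfer` deals with is not modelled.

```lean
-- Illustrative only: a generic small-step language, not Ephapax's definitions.
-- `Multi R` is the reflexive-transitive closure of a step relation `R`.
inductive Multi {α : Type} (R : α → α → Prop) : α → α → Prop where
  | refl (a : α) : Multi R a a
  | step {a b c : α} : R a b → Multi R b c → Multi R a c

section
variable {Term Ty : Type}
  (HasType : Term → Ty → Prop) (Step : Term → Term → Prop) (Value : Term → Prop)

-- Progress: a well-typed term is a value or can take a step.
def Progress : Prop := ∀ e τ, HasType e τ → Value e ∨ ∃ e', Step e e'

-- Preservation: a single step keeps the type (the substitution lemma is used here).
def Preservation : Prop := ∀ e e' τ, HasType e τ → Step e e' → HasType e' τ

-- Preservation lifts to any number of steps by induction on the step sequence.
theorem preservation_multi (hpres : Preservation HasType Step) {e e' : Term}
    (h : Multi Step e e') : ∀ {τ : Ty}, HasType e τ → HasType e' τ := by
  induction h with
  | refl => intro τ ht; exact ht
  | step hs _ ih => intro τ ht; exact ih (hpres _ _ _ ht hs)

-- Type safety: a well-typed term never reaches a stuck state.
theorem type_safety (hprog : Progress HasType Step Value)
    (hpres : Preservation HasType Step) {e e' : Term} {τ : Ty}
    (ht : HasType e τ) (h : Multi Step e e') :
    Value e' ∨ ∃ e'', Step e' e'' :=
  hprog e' τ (preservation_multi HasType Step hpres h ht)
end
```

With `preservation` admitted, `type_safety` still checks, but it rests on an unproven premise. That is why the remaining `Admitted` proofs block the formal verification claim.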
diff --git a/TEST-NEEDS.md b/TEST-NEEDS.md
@@ -0,0 +1,77 @@
+# TEST-NEEDS.md — nextgen-languages
+
+> Generated 2026-03-29 by punishing audit.
+
+## Current State
+
+| Category     | Count | Notes |
+|-------------|-------|-------|
+| Unit tests   | ~50   | affinescript: test_lexer.ml, test_golden.ml, test_e2e.ml + ~119 .as test files (borrow, codegen). tangle: test_parser.ml, test_typecheck.ml, test_eval.ml |
+| Integration  | ~5    | tangle FFI integration_test.zig, ephapax tests.rs, 7-tentacles structure_test.ts |
+| E2E          | ~3    | affinescript test_e2e.ml + integration tests |
+| Benchmarks   | 5     | tangle: bench_lexer.ml, bench_lexer.rs, bench_parser.ml, bench_parser_rust.rs. betlang: bench_lexer.rs |
+
+**Source modules:** ~772 across 14+ language implementations. Major: affinescript (~87 ML), ephapax (~488 Rust across 19 crates), tangle, eclexia, betlang, anvomidav, wokelang, 7-tentacles, error-lang, julia-the-viper.
+
+## What's Missing
+
+### P2P (Property-Based) Tests
+- [ ] affinescript: borrow checker property tests (arbitrary program shapes)
+- [ ] ephapax: linear type checker property tests — CRITICAL (19 crates, only tests.rs)
+- [ ] tangle: parser roundtrip property tests
+- [ ] All languages: lexer/parser fuzzing for crash resistance
+
+### E2E Tests
+- [ ] Each language: source -> lex -> parse -> typecheck -> codegen -> execute
+- [ ] affinescript: full compile pipeline with borrow checking
+- [ ] ephapax: full linear type checking pipeline (19 crates, needs integration)
+- [ ] tangle: full compilation to target
+- [ ] Cross-language: shared concepts verified across implementations
+
+### Aspect Tests
+- **Security:** No tests for code injection through language constructs, unsafe memory in codegen, sandbox escape in interpreters
+- **Performance:** tangle has lexer/parser benchmarks (good). affinescript: ZERO benchmarks. ephapax: ZERO benchmarks for 19 crates
+- **Concurrency:** No tests for parallel compilation, concurrent type checking
+- **Error handling:** affinescript has test files for error cases (good). Most other languages: ZERO error handling tests
+
+### Build & Execution
+- [ ] OCaml build + test for affinescript, tangle
+- [ ] `cargo test` for ephapax (19 crates!)
+- [ ] Zig build for tangle FFI
+- [ ] Test runners for each language
+
+### Benchmarks Needed
+- [ ] ephapax: type checking time, compilation time, memory usage (19 crates, ZERO benchmarks)
+- [ ] affinescript: compilation pipeline benchmarks
+- [ ] eclexia: parsing/evaluation benchmarks
+- [ ] All languages: parse time vs source size
+
+### Self-Tests
+- [ ] Each language: self-hosting test (can it compile its own test suite?)
+- [ ] Grammar consistency checks
+- [ ] Type system soundness verification
+
+### CRITICAL GAPS
+
+| Language | Source Files | Tests | Status |
+|----------|-------------|-------|--------|
+| affinescript | ~87 ML | ~50 unit + 119 .as | **Good coverage** |
+| ephapax | ~488 Rust (19 crates) | 1 tests.rs | **0.2% (1 test file / ~488 source files): CATASTROPHIC** |
+| tangle | moderate | 3 ML + 1 Zig + 4 bench | Adequate for size |
+| eclexia | unknown | 0 | **Untested** |
+| betlang | unknown | 1 bench | **Untested** |
+| anvomidav | unknown | 0 | **Untested** |
+| wokelang | unknown | 0 | **Untested** |
+| 7-tentacles | unknown | 1 | Minimal |
+| error-lang | unknown | 0 | **Untested** |
+| julia-the-viper | unknown | 0 | **Untested** |
+
+## Priority
+
+**CRITICAL.** Ephapax, with 488 Rust source files across 19 crates and effectively 1 test file, is catastrophic: this is a compiler with linear types that needs rigorous testing above all else. affinescript is the only well-tested language. At least 6 language implementations are untested (betlang has only a benchmark). The tangle benchmarks are a good model for the rest.
+
+## FAKE-FUZZ ALERT
+
+- `tests/fuzz/placeholder.txt` is a scorecard placeholder inherited from rsr-template-repo — it does NOT provide real fuzz testing
+- Replace with an actual fuzz harness (see rsr-template-repo/tests/fuzz/README.adoc) or remove the file
+- Priority: P2 — creates false impression of fuzz coverage
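The FAKE-FUZZ ALERT and the "lexer/parser fuzzing for crash resistance" item in TEST-NEEDS.md both ask for a harness whose property is simple: for any input, the lexer returns `Ok` or `Err` and never panics. Below is a minimal, std-only Rust sketch of that property. The toy `lex` function, `Token`, `LexError`, `XorShift` and `fuzz_lexer` are all hypothetical stand-ins, not code from any nextgen-languages crate. The loop uses blind random inputs rather than coverage guidance.

```rust
use std::panic;

/// Hypothetical token type for a toy expression language.
#[derive(Debug, PartialEq)]
enum Token {
    Num(u64),
    Ident(String),
    Sym(char),
}

#[derive(Debug)]
struct LexError {
    pos: usize, // char index of the offending character
}

/// Toy lexer: must return Ok or Err for *any* input and never panic.
fn lex(src: &str) -> Result<Vec<Token>, LexError> {
    let chars: Vec<char> = src.chars().collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_ascii_whitespace() {
            i += 1;
        } else if c.is_ascii_digit() {
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            // Overflow becomes a LexError; a bare `.unwrap()` here would panic
            // on an over-long literal.
            let n = text.parse::<u64>().map_err(|_| LexError { pos: start })?;
            out.push(Token::Num(n));
        } else if c.is_ascii_alphabetic() || c == '_' {
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            out.push(Token::Ident(chars[start..i].iter().collect()));
        } else if "+-*/()=;".contains(c) {
            out.push(Token::Sym(c));
            i += 1;
        } else {
            return Err(LexError { pos: i });
        }
    }
    Ok(out)
}

/// Deterministic xorshift64 PRNG, so any failure is reproducible from the seed.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Random input biased towards characters the lexer knows, plus arbitrary bytes.
fn random_input(rng: &mut XorShift) -> String {
    const ALPHABET: &[u8] = b"0123456789abcxyz_+-*/()=; \t\n";
    let len = (rng.next() % 64) as usize;
    let bytes: Vec<u8> = (0..len)
        .map(|_| {
            if rng.next() % 4 == 0 {
                rng.next() as u8 // arbitrary byte, possibly invalid UTF-8
            } else {
                ALPHABET[(rng.next() % ALPHABET.len() as u64) as usize]
            }
        })
        .collect();
    // Invalid UTF-8 becomes U+FFFD, which the lexer must reject cleanly.
    String::from_utf8_lossy(&bytes).into_owned()
}

/// Feeds `iterations` random inputs to `lex` and returns how many panicked.
fn fuzz_lexer(iterations: usize, seed: u64) -> usize {
    let mut rng = XorShift(seed); // xorshift needs a non-zero seed
    let mut panics = 0;
    for _ in 0..iterations {
        let input = random_input(&mut rng);
        if panic::catch_unwind(|| lex(&input)).is_err() {
            eprintln!("lexer panicked on input {:?}", input);
            panics += 1;
        }
    }
    panics
}

fn main() {
    let panics = fuzz_lexer(10_000, 0x9E37_79B9_7F4A_7C15);
    println!("{} panicking inputs", panics);
}
```

Blind random inputs rarely produce a 20-digit run, so they would almost never hit the over-long-literal case. That limitation is one reason coverage-guided tools such as `cargo fuzz` (libFuzzer via the `libfuzzer-sys` crate's `fuzz_target!` macro) are the usual choice for a real harness. The property being checked would stay the same.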
diff --git a/affinescript b/affinescript
@@ -1 +1 @@
-Subproject commit 4aa4646be3c05827d2b406017f03db74c5de440e
+Subproject commit 9d9a7a6ece87cc1f5ff1185421ad814c51b73154
diff --git a/betlang b/betlang
@@ -1 +1 @@
-Subproject commit 5963764a924000b425debf6f25545224cb3cf3de
+Subproject commit d9e2c4656f87f95c24b09b7d5a88042709229927
diff --git a/ephapax b/ephapax
@@ -1 +1 @@
-Subproject commit 13080a67e1befb25b0c60f0312aec4e80375875e
+Subproject commit 164b2f2145c06b872c0dc4ce7f65f6b3c88e6f33
diff --git a/error-lang b/error-lang
@@ -1 +1 @@
-Subproject commit d23222161c7cdc3c00fd9a7bcb1a561eb505b9e4
+Subproject commit 5eaaeb1b8231d77d17d07e435473a14177a17b10
diff --git a/julia-the-viper b/julia-the-viper
@@ -1 +1 @@
-Subproject commit 0832a6f212c72647808ddef4049486b8c4fddd29
+Subproject commit 8db897135dc820dc96620f194de5d1d702cd686d
diff --git a/me-dialect b/me-dialect
@@ -1 +1 @@
-Subproject commit 9e82fdf3cc13e50d3cbafdb2b0c83c5995f8d8b2
+Subproject commit 87a911e469007a40f616e84f449aed53ae5a4d9b
diff --git a/my-lang b/my-lang
@@ -1 +1 @@
-Subproject commit d01f594cd683694d359dd9cbabbd09f625311617
+Subproject commit 33b1ff088d13f1263d5e301af5327364ad8ec7b5
diff --git a/oblibeny b/oblibeny
@@ -1 +1 @@
-Subproject commit dc4821043cbde3ec4e99c5765989532cc6502f22
+Subproject commit 17521c361a8d5b97c13a6ad296779a413f73e1a9
diff --git a/phronesis b/phronesis
@@ -1 +1 @@
-Subproject commit 49dec597c48014e694dd94a73f7dfde09bee99ac
+Subproject commit 89303464d9bf86908f8e6a1a568603856438b48f
diff --git a/tangle b/tangle
@@ -1 +1 @@
-Subproject commit 6331fcfd968339600e4b0ece78d5d737776e7545
+Subproject commit 8ce7be7f4bc0b78ce0ccdbbebf4bb2ec0cf52edb
diff --git a/wokelang b/wokelang
@@ -1 +1 @@
-Subproject commit 3161f5c9c71e1934ec5873b16b8d2b5304a83581
+Subproject commit c3f7307861cb4edf505f2f5870b837e1c5f19850