Skip to content

Commit d6a5072

Browse files
hyperpolymathclaude
andcommitted
chore: update submodule pointers + add audit docs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 9b87800 commit d6a5072

13 files changed

Lines changed: 118 additions & 11 deletions

PROOF-NEEDS.md

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
# PROOF-NEEDS.md — nextgen-languages
2+
3+
## Current State
4+
5+
- **src/abi/*.idr**: YES (in oblibeny) — `Interface.idr`
6+
- **Dangerous patterns**: 4 `Admitted` in ephapax/formal/Semantics.v (ctx_transfer 15/24, subst_lemma, preservation)
7+
- **LOC**: ~456,000 (OCaml + Rust + Coq + Idris2 + Lean)
8+
- **Existing proofs**: Ephapax has Coq proofs, Tangle has Lean proofs, Idris2 in multiple sub-languages
9+
10+
## What Needs Proving
11+
12+
| Component | What | Why |
13+
|-----------|------|-----|
14+
| Ephapax ctx_transfer (15/24 cases) | Close remaining 9 context transfer cases in Semantics.v | Core type safety theorem is incomplete |
15+
| Ephapax subst_lemma | Prove substitution lemma | Required for preservation theorem |
16+
| Ephapax preservation | Close preservation theorem (depends on subst_lemma) | Type preservation is THE fundamental safety property |
17+
| Ephapax 4th Admitted | Close the 4th Admitted in Semantics.v | Blocks full formal verification claim |
18+
| AffineScript type safety | Prove OCaml type checker is sound | AffineScript claims affine types but lacks proofs |
19+
| AffineScript runtime (Rust) | GC and allocator correctness | Memory safety of the runtime is critical |
20+
| Betlang type checker | Prove bet-check is sound | Compiler correctness depends on type checking |
21+
| Tangle Lean proofs | Extend Tangle.lean coverage | Existing Lean proofs are partial |
22+
| Oblibeny ABI | Extend Interface.idr with full package proofs | Current ABI is minimal |
23+
24+
## Recommended Prover
25+
26+
**Coq** for Ephapax (existing proof infrastructure). **Lean4** for Tangle (existing). **Idris2** for ABI layers. **OCaml** extraction from Coq for AffineScript type checker verification.
27+
28+
## Priority
29+
30+
**HIGH** — These are programming language compilers. Type safety proofs are the gold standard for language correctness. The 4 Admitted proofs in Ephapax are concrete, known gaps that block the formal verification claim. AffineScript's runtime Rust code (GC, allocator) is safety-critical.

TEST-NEEDS.md

Lines changed: 77 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,77 @@
1+
# TEST-NEEDS.md — nextgen-languages
2+
3+
> Generated 2026-03-29 by punishing audit.
4+
5+
## Current State
6+
7+
| Category | Count | Notes |
8+
|-------------|-------|-------|
9+
| Unit tests | ~50 | affinescript: test_lexer.ml, test_golden.ml, test_e2e.ml + ~119 .as test files (borrow, codegen). tangle: test_parser.ml, test_typecheck.ml, test_eval.ml |
10+
| Integration | ~5 | tangle FFI integration_test.zig, ephapax tests.rs, 7-tentacles structure_test.ts |
11+
| E2E | ~3 | affinescript test_e2e.ml + integration tests |
12+
| Benchmarks | 4 | tangle: bench_lexer.ml, bench_lexer.rs, bench_parser.ml, bench_parser_rust.rs. betlang: bench_lexer.rs |
13+
14+
**Source modules:** ~772 across 14+ language implementations. Major: affinescript (~87 ML), ephapax (~488 Rust across 19 crates), tangle, eclexia, betlang, anvomidav, wokelang, 7-tentacles, error-lang, julia-the-viper.
15+
16+
## What's Missing
17+
18+
### P2P (Property-Based) Tests
19+
- [ ] affinescript: borrow checker property tests (arbitrary program shapes)
20+
- [ ] ephapax: linear type checker property tests — CRITICAL (19 crates, only tests.rs)
21+
- [ ] tangle: parser roundtrip property tests
22+
- [ ] All languages: lexer/parser fuzzing for crash resistance
23+
24+
### E2E Tests
25+
- [ ] Each language: source -> lex -> parse -> typecheck -> codegen -> execute
26+
- [ ] affinescript: full compile pipeline with borrow checking
27+
- [ ] ephapax: full linear type checking pipeline (19 crates, needs integration)
28+
- [ ] tangle: full compilation to target
29+
- [ ] Cross-language: shared concepts verified across implementations
30+
31+
### Aspect Tests
32+
- **Security:** No tests for code injection through language constructs, unsafe memory in codegen, sandbox escape in interpreters
33+
- **Performance:** tangle has lexer/parser benchmarks (good). affinescript: ZERO benchmarks. ephapax: ZERO benchmarks for 19 crates
34+
- **Concurrency:** No tests for parallel compilation, concurrent type checking
35+
- **Error handling:** affinescript has test files for error cases (good). Most other languages: ZERO error handling tests
36+
37+
### Build & Execution
38+
- [ ] OCaml build + test for affinescript, tangle
39+
- [ ] `cargo test` for ephapax (19 crates!)
40+
- [ ] Zig build for tangle FFI
41+
- [ ] Test runners for each language
42+
43+
### Benchmarks Needed
44+
- [ ] ephapax: type checking time, compilation time, memory usage (19 crates, ZERO benchmarks)
45+
- [ ] affinescript: compilation pipeline benchmarks
46+
- [ ] eclexia: parsing/evaluation benchmarks
47+
- [ ] All languages: parse time vs source size
48+
49+
### Self-Tests
50+
- [ ] Each language: self-hosting test (can it compile its own test suite?)
51+
- [ ] Grammar consistency checks
52+
- [ ] Type system soundness verification
53+
54+
### CRITICAL GAPS
55+
56+
| Language | Source Files | Tests | Status |
57+
|----------|-------------|-------|--------|
58+
| affinescript | ~87 ML | ~50 unit + 119 .as | **Good coverage** |
59+
| ephapax | ~488 Rust (19 crates) | 1 tests.rs | **0.2% — CATASTROPHIC** |
60+
| tangle | moderate | 3 ML + 1 Zig + 4 bench | Adequate for size |
61+
| eclexia | unknown | 0 | **Untested** |
62+
| betlang | unknown | 1 bench | **Untested** |
63+
| anvomidav | unknown | 0 | **Untested** |
64+
| wokelang | unknown | 0 | **Untested** |
65+
| 7-tentacles | unknown | 1 | Minimal |
66+
| error-lang | unknown | 0 | **Untested** |
67+
| julia-the-viper | unknown | 0 | **Untested** |
68+
69+
## Priority
70+
71+
**CRITICAL.** Ephapax at 488 Rust source files across 19 crates with effectively 1 test file is catastrophic — this is a compiler with linear types that needs rigorous testing above all else. affinescript is the only well-tested language. At least 6 language implementations have ZERO tests. The tangle benchmarks are a good model for the rest.
72+
73+
## FAKE-FUZZ ALERT
74+
75+
- `tests/fuzz/placeholder.txt` is a scorecard placeholder inherited from rsr-template-repo — it does NOT provide real fuzz testing
76+
- Replace with an actual fuzz harness (see rsr-template-repo/tests/fuzz/README.adoc) or remove the file
77+
- Priority: P2 — creates false impression of fuzz coverage

julia-the-viper

Submodule julia-the-viper updated 51 files

0 commit comments

Comments
 (0)