ECHIDNA Roadmap

Table of Contents

Leadership Position & Prevailing Advantage
Current Status (v2.0.0)
v1.0 — Core Infrastructure (COMPLETE)
v1.2 — Original 12 Provers (COMPLETE)
v1.3 — Integration and Polish (COMPLETE)
v1.4 — Interfaces and Extended Provers (COMPLETE)
v1.5 — Trust & Safety Hardening (COMPLETE)
v2.0 — Core Integration + Formal Verification (COMPLETE)
v2.1 — GNN Integration (IN PROGRESS)
v3.0 — Autonomous Proving & Leadership Extension
- Strategic Advantage Maintenance
v4.0 — Mathematical Object Arbitration Dominance
- Long-Term Prevailing Advantage Strategy
Why Our Leadership is Sustainable
Julia/Chapel Layer Interaction

Leadership Position & Prevailing Advantage

ECHIDNA maintains unmatched leadership in the theorem proving ecosystem through our prevailing advantage — a unique combination that no other system can replicate:

48 Prover Backends: The only system bridging multiple proof ecosystems (Isabelle, Coq, Lean, Z3, etc.)
Cross-Prover Arbitration: The only system actively solving mathematical object identity across heterogeneous systems
Neurosymbolic Hybrid Architecture: The only system combining AI guidance with formal verification
7-Stage Trust Pipeline: The most comprehensive verification infrastructure in existence
Universal Proof Exchange: The only system integrating both OpenTheory and Dedukti
Large-Scale Vocabulary: The largest managed vocabulary (992K+ terms) for mathematical reasoning
Comprehensive Axiom Tracking: The most sophisticated axiom monitoring across provers
Bayesian Confidence Scoring: The only system with probabilistic trust assessment across multiple provers

Our roadmap extends this prevailing advantage by deepening these unique capabilities while other systems remain constrained to single-prover architectures.

Current Status (v2.0.0)

48 prover backends operational across 10 tiers — unmatched breadth
Trust & safety hardening complete (13 tasks) — unmatched verification depth
638+ tests passing (528 unit, 38 integration, 21 property-based, plus interface tests)
3 API interfaces consolidated into monorepo (GraphQL, gRPC, REST)
Agda meta-checker: 30+ formally verified properties of trust pipeline — unmatched formal guarantees
Criterion benchmarks: 13 functions covering all critical paths
FFI bridge: complete C-compatible API for all 48 provers
Julia ML layer with logistic regression tactic prediction
Chapel parallel dispatch layer
ReScript UI: 33 compiled components (zero TypeScript)
Training data: 332 proofs, 1,603 tactics
License standardised: PMPL-1.0-or-later throughout

v1.0 — Core Infrastructure (COMPLETE)

✓ Initial Rust core with ProverBackend trait
✓ 9 of the 12 original prover backends
✓ Core types (Term, ProofState, Tactic, Goal, Context)
✓ Aspect tagging system
✓ Neural solver integration (Julia HTTP)

v1.2 — Original 12 Provers (COMPLETE)

✓ All 12 core provers: Agda, Coq, Lean, Isabelle, Z3, CVC5, Metamath, HOL Light, Mizar, PVS, ACL2, HOL4
✓ Example proof libraries per prover
✓ Training data extraction (332 proofs, 1,603 tactics)
✓ Documentation update

v1.3 — Integration and Polish (COMPLETE)

✓ 99 unit tests + 38 integration tests
✓ Neural training pipeline
✓ ReScript UI functional (33 files, zero TypeScript)
✓ RSR/CCCP compliance templates

v1.4 — Interfaces and Extended Provers (COMPLETE)

✓ 5 new provers (Vampire, E Prover, SPASS, Alt-Ergo, Idris2) — total 17
✓ GraphQL interface (async-graphql, port 8081)
✓ gRPC interface (tonic + Protocol Buffers, port 50051)
✓ REST interface (axum + OpenAPI/Swagger, port 8000)
✓ All 3 layers (Rust, Julia, Chapel) support 17 provers
✓ Interface consolidation into src/interfaces/

v1.5 — Trust & Safety Hardening (COMPLETE)

Task 1: Solver Binary Integrity Verification

✓ SHAKE3-512 provenance hashing (64-byte output via tiny-keccak)
✓ BLAKE3 fast runtime re-verification
✓ TOML manifest for known-good solver hashes
✓ Auto-detect solver paths with fallback and PATH lookup
✓ Status tracking: Verified, Tampered, Missing, Uninitialized, Unchecked

Task 2: SMT Portfolio Solving (Cross-Checking)

✓ PortfolioSolver with configurable solver pools (SMT, ATP, ITP)
✓ Reconciliation logic: CrossChecked, SingleSolver, Inconclusive, AllTimedOut
✓ Disagreement flagging for human review
✓ Default pools: Z3/CVC5/Alt-Ergo (SMT), Vampire/E (ATP), Lean/Coq (ITP)
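
The reconciliation step above can be sketched as follows. This is a minimal illustration, not the shipped PortfolioSolver: the four verdict names come from this roadmap, while `SolverOutcome`, `reconcile`, and the rule that any disagreement maps to Inconclusive (and is then flagged for human review) are assumptions made for the example.

```rust
/// Outcome reported by one solver in a pool (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SolverOutcome {
    Proved,
    Refuted,
    Timeout,
}

/// Reconciled verdict. Variant names follow the roadmap; semantics are assumed.
#[derive(Debug, PartialEq, Eq)]
pub enum Reconciliation {
    CrossChecked,
    SingleSolver,
    Inconclusive,
    AllTimedOut,
}

/// Reconcile the outcomes of every solver in a pool.
pub fn reconcile(outcomes: &[SolverOutcome]) -> Reconciliation {
    // Ignore solvers that did not reach a verdict.
    let decided: Vec<SolverOutcome> = outcomes
        .iter()
        .copied()
        .filter(|o| *o != SolverOutcome::Timeout)
        .collect();

    match decided.as_slice() {
        // Nobody answered (an empty pool is treated the same way).
        [] => Reconciliation::AllTimedOut,
        // Exactly one verdict: usable, but not cross-checked.
        [_] => Reconciliation::SingleSolver,
        // Two or more verdicts: they must all agree.
        [first, rest @ ..] => {
            if rest.iter().all(|o| o == first) {
                Reconciliation::CrossChecked
            } else {
                // Disagreement: assumed to be surfaced for human review.
                Reconciliation::Inconclusive
            }
        }
    }
}
```

With a Z3/CVC5/Alt-Ergo pool, two agreeing verdicts and one timeout would yield CrossChecked under these assumptions, while a single Proved next to a Refuted would yield Inconclusive.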

Task 3: Proof Certificate Checking

✓ Alethe format verification (structural validation)
✓ DRAT/LRAT certificate checking (via drat-trim)
✓ TSTP, Lean4 kernel, Coq kernel format support
✓ BLAKE3-hashed certificate storage for audit trails
✓ CertificateVerifier with pluggable external checkers

Task 4: Axiom Usage Tracking

✓ 4-level danger classification: Safe, Noted, Warning, Reject
✓ Prover-specific dangerous patterns (Lean sorry, Coq Admitted, Agda --type-in-type, HOL4 mk_thm, Idris2 believe_me, etc.)
✓ Comment-aware scanning (skips comments per prover syntax)
✓ AxiomPolicy enforcement (Clean, ClassicalAxioms, IncompleteProof, Rejected)
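
A rough sketch of comment-aware scanning with the 4-level classification is shown below. The level names are taken from this roadmap. The function names, the single line-comment marker, and the severity assigned to each pattern are illustrative assumptions; the real scanner presumably handles block comments, string literals, and token boundaries, which this naive substring match does not.

```rust
/// Danger levels from the roadmap, ordered from least to most severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum DangerLevel {
    Safe,
    Noted,
    Warning,
    Reject,
}

/// Drop everything after a line-comment marker (e.g. "--" for Lean).
/// Assumption: the marker never appears inside a string literal.
fn strip_line_comment<'a>(line: &'a str, marker: &str) -> &'a str {
    match line.find(marker) {
        Some(idx) => &line[..idx],
        None => line,
    }
}

/// Return the worst danger level found outside comments.
/// `patterns` pairs a substring with the level it triggers (hypothetical table).
pub fn scan(source: &str, comment_marker: &str, patterns: &[(&str, DangerLevel)]) -> DangerLevel {
    let mut worst = DangerLevel::Safe;
    for line in source.lines() {
        let code = strip_line_comment(line, comment_marker);
        for (pattern, level) in patterns {
            if code.contains(*pattern) {
                // Ord on the enum makes "worst so far" a simple max.
                worst = worst.max(*level);
            }
        }
    }
    worst
}
```

Deriving `Ord` on the enum is what lets several findings compose into a single worst-case verdict.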

Task 5: Solver Sandboxing

✓ Podman container isolation (--network=none, --read-only, memory/CPU/disk limits)
✓ Bubblewrap namespace isolation (fallback)
✓ Unsandboxed mode (dev only, explicit opt-in with warning)
✓ Auto-detection of strongest available sandbox
✓ Async execution with timeout enforcement
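
As an illustration of the Podman isolation flags listed above, the sketch below only builds the argument vector and does not launch anything. `SandboxLimits`, `podman_args`, and the image name used in the usage note are hypothetical. `--rm`, `--network=none`, `--read-only`, `--memory`, and `--cpus` are standard `podman run` options; disk limits and timeout enforcement are left out of this example.

```rust
/// Resource limits for one sandboxed solver run (hypothetical struct).
pub struct SandboxLimits {
    pub memory_mb: u64,
    pub cpus: f32,
}

/// Build the arguments for `podman run`, followed by the solver command line.
pub fn podman_args(image: &str, limits: &SandboxLimits, solver_cmd: &[&str]) -> Vec<String> {
    let mut args = vec![
        "run".to_string(),
        "--rm".to_string(),           // remove the container afterwards
        "--network=none".to_string(), // no network access
        "--read-only".to_string(),    // read-only root filesystem
        format!("--memory={}m", limits.memory_mb),
        format!("--cpus={:.1}", limits.cpus),
        image.to_string(),
    ];
    // Everything after the image name is the command run inside the container.
    args.extend(solver_cmd.iter().map(|s| s.to_string()));
    args
}
```

The resulting vector could be passed to `std::process::Command::new("podman").args(&args)`, for example with an assumed image such as `echidna-solvers` and a solver command like `z3 problem.smt2`.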

Task 6: 5-Level Trust Hierarchy

✓ TrustLevel enum (Level1 through Level5) with Ord
✓ TrustFactors computation from prover, certificates, axioms, integrity
✓ Small-kernel prover detection (Lean, Coq, Isabelle, Agda, Metamath, HOL Light, HOL4, Idris2, F*, Twelf, Nuprl, Minlog)
✓ Dangerous axioms and failed integrity checks always cap trust at Level1
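
The capping rule and the ordered enum can be sketched as below. Only the five level names, the `Ord` derivation, and the Level1 cap come from this roadmap. The fields of `TrustFactors` and the way positive evidence maps to Level2 through Level5 are assumptions chosen to keep the example small.

```rust
/// Five ordered trust levels; deriving Ord gives Level1 < ... < Level5.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TrustLevel {
    Level1,
    Level2,
    Level3,
    Level4,
    Level5,
}

/// Inputs to the trust computation (field set is hypothetical).
#[derive(Debug, Clone, Copy)]
pub struct TrustFactors {
    pub small_kernel: bool,
    pub certificate_checked: bool,
    pub cross_checked: bool,
    pub dangerous_axioms: bool,
    pub integrity_ok: bool,
}

pub fn compute_trust(f: &TrustFactors) -> TrustLevel {
    // Hard cap from the roadmap: these two conditions override everything else.
    if f.dangerous_axioms || !f.integrity_ok {
        return TrustLevel::Level1;
    }
    // Assumed scoring: one level per piece of positive evidence.
    let evidence = [f.small_kernel, f.certificate_checked, f.cross_checked]
        .iter()
        .filter(|&&flag| flag)
        .count();
    match evidence {
        0 => TrustLevel::Level2,
        1 => TrustLevel::Level3,
        2 => TrustLevel::Level4,
        _ => TrustLevel::Level5,
    }
}

/// Compare a result against a configurable minimum trust level.
pub fn meets_minimum(level: TrustLevel, minimum: TrustLevel) -> bool {
    level >= minimum
}
```

Because the cap is checked first, a cross-checked result from a small-kernel prover still lands at Level1 if a dangerous axiom was detected.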

Task 7: Mutation Testing

✓ 6 mutation kinds: RemovePrecondition, WeakenPostcondition, AddDisjunct, RemoveHypothesis, ReplaceConstant, NegateSubterm
✓ Mutation generation from Term AST
✓ Mutation score computation with configurable threshold (default 95%)
✓ MutationTestSummary with per-mutation results
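
The mutation score is taken here to be the fraction of mutants that the prover rejects ("kills"). The sketch below assumes that definition; the six kind names and the 95% default come from this roadmap, while `MutationResult`, `mutation_score`, and `passes_threshold` are hypothetical names.

```rust
/// The six mutation kinds listed in the roadmap.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MutationKind {
    RemovePrecondition,
    WeakenPostcondition,
    AddDisjunct,
    RemoveHypothesis,
    ReplaceConstant,
    NegateSubterm,
}

/// Result of checking one mutant (hypothetical shape).
#[derive(Debug, Clone, Copy)]
pub struct MutationResult {
    pub kind: MutationKind,
    /// True if the mutated specification was rejected, i.e. the mutant was killed.
    pub killed: bool,
}

/// Default threshold from the roadmap.
pub const DEFAULT_THRESHOLD: f64 = 0.95;

/// Fraction of killed mutants, or None when no mutants were generated.
pub fn mutation_score(results: &[MutationResult]) -> Option<f64> {
    if results.is_empty() {
        return None;
    }
    let killed = results.iter().filter(|r| r.killed).count();
    Some(killed as f64 / results.len() as f64)
}

/// A run passes only if a score exists and reaches the threshold.
pub fn passes_threshold(results: &[MutationResult], threshold: f64) -> bool {
    mutation_score(results).map_or(false, |score| score >= threshold)
}
```

Returning `None` for an empty run avoids reporting a vacuous 100% score when no mutants could be generated from the term.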

Task 8: Prover Dispatch Pipeline

✓ ProverDispatcher: create → parse → verify → axiom scan → confidence scoring
✓ Single-prover and cross-checked (portfolio) modes
✓ Configurable minimum trust level
✓ Content-based and extension-based prover auto-selection
✓ DispatchResult with full audit information

Task 9: Property-Based Testing Expansion

✓ PropTest strategies for trust hardening modules
✓ 21 property-based tests for core invariants
✓ Total test count: 306+

Task 10: Cross-Prover Proof Exchange

✓ OpenTheory exporter (HOL4, HOL Light, Isabelle/HOL interop)
✓ Dedukti/Lambdapi exporter (universal proof format)
✓ Article and module representations with serialisation

Task 11: Fix Metadata

✓ STATE.scm, ECOSYSTEM.scm, META.scm updated
✓ Version numbers aligned

Task 12: New Prover Backends (13 new, total 30)

✓ F*, Dafny, Why3 (dependent types + effects, auto-active, orchestration)
✓ TLAPS, Twelf, Nuprl, Minlog, Imandra (specialised / niche)
✓ GLPK, SCIP, MiniZinc, Chuffed, OR-Tools (constraint solvers)
✓ ProverKind enum, ProverFactory, file extension detection for all 30

Task 13: Pareto Optimality + Statistical Tracking

✓ ParetoFrontier: dominance checking, frontier computation, weighted ranking
✓ StatisticsTracker: per-prover per-domain stats, Bayesian timeout estimation
✓ Wilson score confidence intervals for mutation scores
✓ JSON serialisation for persistence
✓ Prover ranking by composite score
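
Dominance checking and frontier computation can be illustrated with two objectives. This sketch assumes lower solve time and higher trust are better; `Candidate` and its fields are hypothetical, and the actual ParetoFrontier may track more objectives and add weighted ranking on top.

```rust
/// One prover result scored on two objectives (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub prover: &'static str,
    /// Lower is better.
    pub time_secs: f64,
    /// Higher is better.
    pub trust: u8,
}

/// `a` dominates `b` if it is no worse on every objective
/// and strictly better on at least one.
pub fn dominates(a: &Candidate, b: &Candidate) -> bool {
    let no_worse = a.time_secs <= b.time_secs && a.trust >= b.trust;
    let strictly_better = a.time_secs < b.time_secs || a.trust > b.trust;
    no_worse && strictly_better
}

/// Keep every candidate that no other candidate dominates.
/// Quadratic in the number of candidates, which is fine for small pools.
pub fn pareto_frontier(candidates: &[Candidate]) -> Vec<&Candidate> {
    candidates
        .iter()
        .filter(|c| !candidates.iter().any(|other| dominates(other, c)))
        .collect()
}
```

For example, a fast low-trust result and a slow high-trust result both stay on the frontier, while a result that is slower than another at the same trust level is dropped.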

v2.0 — Core Integration + Formal Verification (COMPLETE)

Core Integration Layer (COMPLETE)

✓ FFI/IPC bridge for all 48 provers (kind_from_u8/kind_to_u8 roundtrip-verified)
✓ C-compatible API (echidna_init, echidna_create_prover, echidna_parse_string, etc.)
✓ GraphQL/gRPC/REST call real prover backends via ProverFactory
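
The roundtrip property for kind_from_u8/kind_to_u8 can be sketched on a reduced enum. The two function names come from this roadmap; their signatures, the five variants shown, and the discriminant values are assumptions (the real enum covers all 48 provers).

```rust
/// Reduced stand-in for the real ProverKind; discriminants are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum ProverKind {
    Agda = 0,
    Coq = 1,
    Lean = 2,
    Isabelle = 3,
    Z3 = 4,
}

/// Encode a prover kind as a byte for the C-compatible API.
pub fn kind_to_u8(kind: ProverKind) -> u8 {
    kind as u8
}

/// Decode a byte received over FFI; unknown values yield None
/// rather than undefined behaviour.
pub fn kind_from_u8(value: u8) -> Option<ProverKind> {
    match value {
        0 => Some(ProverKind::Agda),
        1 => Some(ProverKind::Coq),
        2 => Some(ProverKind::Lean),
        3 => Some(ProverKind::Isabelle),
        4 => Some(ProverKind::Z3),
        _ => None,
    }
}
```

Checking `kind_from_u8(kind_to_u8(k)) == Some(k)` for every variant, and `kind_to_u8` of every decodable byte, is one way a roundtrip could be verified exhaustively, since the input space is only 256 values.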

Agda Meta-Checker (COMPLETE)

✓ TrustLevel: total order proofs (reflexive, antisymmetric, transitive, total)
✓ AxiomSafety: policy ordering, worst-case composition (commutative, associative)
✓ Portfolio: cross-checking improvement proof, disagreement detection
✓ Dispatch: integrity failure → Level1, dangerous axioms → Level1, determinism
✓ 30+ formally verified properties, 0 postulates/sorry/believe_me

Benchmarks + Testing (COMPLETE)

✓ Criterion.rs benchmarks: 13 functions (core, provers, trust, verification, FFI)
✓ 528 unit tests (was 232), 38 integration, 21 property-based
✓ 0 clippy warnings, 0 compiler errors on stable Rust

v2.1 — GNN Integration (IN PROGRESS)

GNN Proof Graph Construction (COMPLETE)

✓ Proof graph builder (src/rust/gnn/graph.rs): converts ProofState to typed directed graph
✓ 7 node kinds (Goal, Hypothesis, Premise, Subterm, TypeExpr, Binder, Constant)
✓ 8 edge kinds (Contains, References, Implies, HasType, BindsOver, SharedStructure, DependsOn, UsefulFor)
✓ Recursive term expansion with configurable depth limit
✓ Constant deduplication across terms
✓ Shared-structure edge detection between premises (Jaccard similarity)
✓ Sparse adjacency matrix export for Julia GNN encoder
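
Shared-structure detection via Jaccard similarity can be sketched as follows, treating each premise as the set of symbols it mentions. The set-of-symbols representation, the function names, and the 0.5 threshold in the usage note are assumptions for illustration; only the use of Jaccard similarity between premises comes from this roadmap.

```rust
use std::collections::HashSet;

/// Jaccard similarity |A ∩ B| / |A ∪ B| between two symbol sets.
/// Two empty sets are defined here to have similarity 0.
pub fn jaccard<'a>(a: &HashSet<&'a str>, b: &HashSet<&'a str>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

/// Decide whether two premises should be linked by a SharedStructure edge.
pub fn shares_structure<'a>(a: &HashSet<&'a str>, b: &HashSet<&'a str>, threshold: f64) -> bool {
    jaccard(a, b) >= threshold
}
```

Two premises mentioning {add, zero, succ} and {add, succ, mul} share 2 of 4 distinct symbols, so their similarity is 0.5 and they would be linked at a threshold of 0.5.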

GNN Term Embeddings (COMPLETE)

✓ Local feature extraction (src/rust/gnn/embeddings.rs): 32-dim feature vectors
✓ Node kind one-hot + term kind one-hot + structural features
✓ Symbol frequency counting (normalised across graph)
✓ Standalone term embedding with cosine similarity
✓ Two-pass feature extraction (frequency counting then encoding)
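
Cosine similarity over the 32-dimensional feature vectors can be sketched as below. The dimension comes from this roadmap; `FEATURE_DIM`, the function name, and the choice to return 0 for a zero vector are assumptions.

```rust
/// Feature vector width stated in the roadmap.
pub const FEATURE_DIM: usize = 32;

/// Cosine similarity between two term embeddings.
/// Returns 0.0 when either vector has zero norm, so callers never see NaN.
pub fn cosine_similarity(a: &[f32; FEATURE_DIM], b: &[f32; FEATURE_DIM]) -> f32 {
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```

Using fixed-size arrays makes a dimension mismatch a compile-time error instead of a runtime check.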

GNN Inference Client (COMPLETE)

✓ HTTP client (src/rust/gnn/client.rs) for Julia ML server
✓ JSON serialisation matching Julia TheoremGraph format
✓ Graceful degradation when server unavailable
✓ Health check, rank, and embed endpoints

GNN-Guided Proof Search (COMPLETE)

✓ Hybrid search strategy (src/rust/gnn/guided_search.rs)
✓ GNN + symbolic score combination with configurable weights (default 70/30)
✓ Tactic inference from premise names (Apply, Rewrite, Induction, Cases)
✓ Premise embedding cache for repeated queries
✓ Search statistics tracking (total calls, GNN vs fallback)
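
The weighted combination can be sketched as a normalised convex mix. This example reads "70/30" as 70% GNN and 30% symbolic, which is an assumption, as are the names `ScoreWeights` and `combined_score` and the choice to fall back to the symbolic score alone when no GNN score is available.

```rust
/// Relative weights for the two score sources (hypothetical struct).
#[derive(Debug, Clone, Copy)]
pub struct ScoreWeights {
    pub gnn: f64,
    pub symbolic: f64,
}

impl Default for ScoreWeights {
    /// Assumed reading of the roadmap's "default 70/30".
    fn default() -> Self {
        ScoreWeights { gnn: 0.7, symbolic: 0.3 }
    }
}

/// Combine a GNN score with a symbolic score; both are clamped to [0, 1].
/// `gnn` is None when the ML server is unavailable.
pub fn combined_score(gnn: Option<f64>, symbolic: f64, weights: &ScoreWeights) -> f64 {
    let symbolic = symbolic.clamp(0.0, 1.0);
    let total = weights.gnn + weights.symbolic;
    match gnn {
        // Normalising by the total keeps the result in [0, 1]
        // for any non-negative weights with a positive sum.
        Some(g) if total > 0.0 => {
            (weights.gnn * g.clamp(0.0, 1.0) + weights.symbolic * symbolic) / total
        }
        // Fallback path: symbolic score only.
        _ => symbolic,
    }
}
```

Clamping the inputs and normalising by the weight sum keeps the result between 0 and 1, in the spirit of the combined-score boundedness property.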

Julia GNN Server Endpoints (COMPLETE)

✓ POST /gnn/rank endpoint (src/julia/api/gnn_endpoint.jl)
✓ GET /gnn/health endpoint
✓ Cosine similarity fallback when model not trained
✓ Route registration for existing API server

Idris2 Formal Proofs for GNN (COMPLETE)

✓ src/abi/EchidnaABI/Gnn.idr: 7 proven properties
✓ Feature dimension positivity proof
✓ Sparse adjacency length consistency
✓ Node/edge kind index injectivity (no collisions)
✓ Depth expansion monotonicity (termination)
✓ DependsOn/UsefulFor duality
✓ Combined score boundedness
✓ 0 believe_me, 0 postulates

Testing (COMPLETE)

✓ 28 new unit tests for GNN module
✓ 0 clippy warnings on GNN code
✓ #![forbid(unsafe_code)] on all GNN modules

Remaining (v2.1)

❏ Train GNN encoder on proof graphs (Flux.jl)
❏ Train Transformer premise selector
❏ Wire GNN scores into ProverDispatcher confidence
❏ Fix /api/verify SMT-LIB handling so valid user queries are not rejected as malformed proof obligations (Issue #10)
❏ Align Containerfile.full claims with shipped prover set (either include all claimed tier 1-4 provers or narrow wording) (Issue #11)

Remaining (v2.2+)

❏ Chapel → Rust C FFI bridge
❏ OpenCyc domain knowledge integration
❏ Tamarin/ProVerif bridge for cipherbot

v3.0 — Autonomous Proving & Leadership Extension

Our v3.0 roadmap deepens our prevailing advantage by extending capabilities that no other system can match:

❏ Automated Cross-Prover Theorem Discovery: Automatically find equivalent theorems across different proof systems (extending our unique arbitration leadership)
❏ Cross-Domain Knowledge Transfer: Transfer mathematical knowledge between provers (leveraging our 48-backend architecture)
❏ Proof Repair with Multi-Prover Validation: Use multiple provers to repair and validate failing proofs (extending our trust pipeline advantage)
❏ Mathlib/AFP/Coq stdlib Integration: Unified access to major theorem libraries across systems (deepening our cross-system leadership)
❏ Cloud Deployment with GPU Acceleration: Scale our neurosymbolic architecture to handle larger problems

Strategic Advantage Maintenance

To ensure we maintain and extend our leadership position, we focus on:

Cross-Prover Capabilities: Continuously enhance our unique ability to bridge multiple proof systems
Trust Infrastructure Depth: Extend our 7-stage pipeline with additional verification layers
Neurosymbolic Integration: Deepen the hybrid AI-formal reasoning capabilities
Vocabulary Expansion: Grow our mathematical knowledge base beyond current 992K+ terms
Proof Exchange Leadership: Strengthen OpenTheory/Dedukti integration for universal proof interoperability

v4.0 — Mathematical Object Arbitration Dominance

❏ Automated Theorem Equivalence Detection: AI-powered detection of equivalent theorems across systems
❏ Mathematical Identity Resolution Service: Cloud service for cross-system theorem identification
❏ Proof System Interoperability Hub: Central hub connecting all major proof systems
❏ Universal Mathematical Ontology: Unified ontology bridging different mathematical foundations
❏ Cross-System Confidence Arbitration: Bayesian trust scoring across heterogeneous provers

Long-Term Prevailing Advantage Strategy

Our long-term strategy ensures ECHIDNA remains the unchallenged leader in cross-system theorem proving:

Continuous Prover Expansion: Add new backends while others remain single-system
Deepening Trust Infrastructure: Extend verification capabilities beyond what single systems can offer
Neurosymbolic Innovation: Advance hybrid reasoning while pure systems stagnate
Cross-System Ecosystem Leadership: Become the de facto standard for mathematical object arbitration
Vocabulary & Knowledge Dominance: Maintain the most comprehensive mathematical knowledge base

Why Our Leadership is Sustainable

Unlike other proof systems that are constrained by their single-prover architecture, ECHIDNA’s multi-prover neurosymbolic design creates a virtuous cycle of advantage:

More Provers → More Cross-System Data → Better Arbitration → More Adoption → More Provers
More Data → Better Neural Guidance → More Successful Proofs → More Trust → More Adoption
More Systems → More Proof Exchange → Better Equivalence Detection → More Mathematical Knowledge → More Value

This cycle ensures that our prevailing advantage grows over time, making it increasingly difficult for single-prover systems to compete in cross-system mathematical reasoning.

Julia/Chapel Layer Interaction

See the "Compute Layer Integration" section of CORRECTNESS-ARCHITECTURE.md for the recommended architecture unifying the Julia ML layer, the Chapel HPC layer, and the Rust core.
