- Leadership Position & Prevailing Advantage
- Current Status (v2.0.0)
- v1.0 — Core Infrastructure (COMPLETE)
- v1.2 — Original 12 Provers (COMPLETE)
- v1.3 — Integration and Polish (COMPLETE)
- v1.4 — Interfaces and Extended Provers (COMPLETE)
- v1.5 — Trust & Safety Hardening (COMPLETE)
- Task 1: Solver Binary Integrity Verification
- Task 2: SMT Portfolio Solving (Cross-Checking)
- Task 3: Proof Certificate Checking
- Task 4: Axiom Usage Tracking
- Task 5: Solver Sandboxing
- Task 6: 5-Level Trust Hierarchy
- Task 7: Mutation Testing
- Task 8: Prover Dispatch Pipeline
- Task 9: Property-Based Testing Expansion
- Task 10: Cross-Prover Proof Exchange
- Task 11: Fix Metadata
- Task 12: New Prover Backends (13 new, total 30)
- Task 13: Pareto Optimality + Statistical Tracking
- v2.0 — Core Integration + Formal Verification (COMPLETE)
- v2.1 — GNN Integration (IN PROGRESS)
- v3.0 — Autonomous Proving & Leadership Extension
- v4.0 — Mathematical Object Arbitration Dominance
- Why Our Leadership is Sustainable
- Julia/Chapel Layer Interaction
ECHIDNA's leadership in the theorem-proving ecosystem rests on a prevailing advantage, a combination of capabilities that no other system can replicate:
-
48+ Prover Backends: The only system bridging multiple proof ecosystems (Isabelle, Coq, Lean, Z3, etc.)
-
Cross-Prover Arbitration: The only system actively solving mathematical object identity across heterogeneous systems
-
Neurosymbolic Hybrid Architecture: The only system combining AI guidance with formal verification
-
7-Stage Trust Pipeline: The most comprehensive verification infrastructure in existence
-
Universal Proof Exchange: The only system integrating both OpenTheory and Dedukti
-
Large-Scale Vocabulary: The largest managed vocabulary (992K+ terms) for mathematical reasoning
-
Comprehensive Axiom Tracking: The most sophisticated axiom monitoring across provers
-
Bayesian Confidence Scoring: The only system with probabilistic trust assessment across multiple provers
Our roadmap extends this prevailing advantage by deepening these unique capabilities while other systems remain constrained to single-prover architectures.
-
48 prover backends operational across 10 tiers — unmatched breadth
-
Trust & safety hardening complete (13 tasks) — unmatched verification depth
-
638+ tests passing (528 unit, 38 integration, 21 property-based, plus interface tests)
-
3 API interfaces consolidated into monorepo (GraphQL, gRPC, REST)
-
Agda meta-checker: 30+ formally verified properties of trust pipeline — unmatched formal guarantees
-
Criterion benchmarks: 13 functions covering all critical paths
-
FFI bridge: complete C-compatible API for all 48 provers
-
Julia ML layer with logistic regression tactic prediction
-
Chapel parallel dispatch layer
-
ReScript UI: 33 compiled components (zero TypeScript)
-
Training data: 332 proofs, 1,603 tactics
-
License standardised: PMPL-1.0-or-later throughout
-
✓ Initial Rust core with ProverBackend trait
-
✓ 9/12 original prover backends
-
✓ Core types (Term, ProofState, Tactic, Goal, Context)
-
✓ Aspect tagging system
-
✓ Neural solver integration (Julia HTTP)
-
✓ All 12 core provers: Agda, Coq, Lean, Isabelle, Z3, CVC5, Metamath, HOL Light, Mizar, PVS, ACL2, HOL4
-
✓ Example proof libraries per prover
-
✓ Training data extraction (332 proofs, 1,603 tactics)
-
✓ Documentation update
-
✓ 99 unit tests + 38 integration tests
-
✓ Neural training pipeline
-
✓ ReScript UI functional (33 files, zero TypeScript)
-
✓ RSR/CCCP compliance templates
-
✓ 5 new provers (Vampire, E Prover, SPASS, Alt-Ergo, Idris2) — total 17
-
✓ GraphQL interface (async-graphql, port 8081)
-
✓ gRPC interface (tonic + Protocol Buffers, port 50051)
-
✓ REST interface (axum + OpenAPI/Swagger, port 8000)
-
✓ All 3 layers (Rust, Julia, Chapel) support 17 provers
-
✓ Interface consolidation into src/interfaces/
-
✓ SHAKE3-512 provenance hashing (64-byte output via tiny-keccak)
-
✓ BLAKE3 fast runtime re-verification
-
✓ TOML manifest for known-good solver hashes
-
✓ Auto-detect solver paths with fallback and PATH lookup
-
✓ Status tracking: Verified, Tampered, Missing, Uninitialized, Unchecked
-
✓ PortfolioSolver with configurable solver pools (SMT, ATP, ITP)
-
✓ Reconciliation logic: CrossChecked, SingleSolver, Inconclusive, AllTimedOut
-
✓ Disagreement flagging for human review
-
✓ Default pools: Z3/CVC5/Alt-Ergo (SMT), Vampire/E (ATP), Lean/Coq (ITP)
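The reconciliation outcomes listed above can be illustrated with a small sketch. The variant names CrossChecked, SingleSolver, Inconclusive and AllTimedOut come from the roadmap; the `Verdict` type, the `disagreement` payload and the `reconcile` function are assumptions made for illustration and are not the actual PortfolioSolver API.

```rust
/// Verdict reported by one solver in a pool (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Proved,
    Refuted,
    Unknown,
    TimedOut,
}

/// Reconciled outcome. Variant names follow the roadmap; payloads are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reconciliation {
    /// Two or more solvers returned the same definite verdict.
    CrossChecked(Verdict),
    /// Exactly one solver returned a definite verdict.
    SingleSolver(Verdict),
    /// No definite verdict, or solvers disagreed (flagged for human review).
    Inconclusive { disagreement: bool },
    /// Every solver in the pool hit its timeout.
    AllTimedOut,
}

/// Combine per-solver verdicts into one outcome (illustrative logic only).
pub fn reconcile(verdicts: &[Verdict]) -> Reconciliation {
    if !verdicts.is_empty() && verdicts.iter().all(|v| *v == Verdict::TimedOut) {
        return Reconciliation::AllTimedOut;
    }
    // Keep only definite answers; Unknown and TimedOut carry no evidence.
    let definite: Vec<Verdict> = verdicts
        .iter()
        .copied()
        .filter(|v| matches!(v, Verdict::Proved | Verdict::Refuted))
        .collect();
    match definite.as_slice() {
        [] => Reconciliation::Inconclusive { disagreement: false },
        [only] => Reconciliation::SingleSolver(*only),
        [first, rest @ ..] => {
            if rest.iter().all(|v| v == first) {
                Reconciliation::CrossChecked(*first)
            } else {
                Reconciliation::Inconclusive { disagreement: true }
            }
        }
    }
}
```

In this sketch a disagreement never resolves to a verdict; it is surfaced through the `disagreement` flag so that a human can review it.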
-
✓ Alethe format verification (structural validation)
-
✓ DRAT/LRAT certificate checking (via drat-trim)
-
✓ TSTP, Lean4 kernel, Coq kernel format support
-
✓ BLAKE3-hashed certificate storage for audit trails
-
✓ CertificateVerifier with pluggable external checkers
-
✓ 4-level danger classification: Safe, Noted, Warning, Reject
-
✓ Prover-specific dangerous patterns (Lean sorry, Coq Admitted, Agda --type-in-type, HOL4 mk_thm, Idris2 believe_me, etc.)
-
✓ Comment-aware scanning (skips comments per prover syntax)
-
✓ AxiomPolicy enforcement (Clean, ClassicalAxioms, IncompleteProof, Rejected)
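A minimal sketch of comment-aware scanning for one prover, assuming Lean-style `--` line comments. The four danger levels are taken from the list above; the `scan_lean` function and the mapping of `sorry` to Reject and `axiom` to Warning are hypothetical choices for illustration, not the project's actual policy. Block comments and string literals are deliberately not handled here.

```rust
/// Danger levels from the roadmap, ordered from least to most severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum DangerLevel {
    Safe,
    Noted,
    Warning,
    Reject,
}

/// Scan Lean-like source and return the worst danger level found.
/// Assumption: everything after `--` on a line is a comment and is skipped.
pub fn scan_lean(source: &str) -> DangerLevel {
    let mut worst = DangerLevel::Safe;
    for line in source.lines() {
        // Drop the line-comment tail before looking for patterns.
        let code = match line.find("--") {
            Some(idx) => &line[..idx],
            None => line,
        };
        // Split into identifier-like tokens so `sorry_free` does not match `sorry`.
        for token in code.split(|c: char| !(c.is_alphanumeric() || c == '_')) {
            let level = match token {
                "sorry" => DangerLevel::Reject,  // incomplete proof (assumed mapping)
                "axiom" => DangerLevel::Warning, // user-declared axiom (assumed mapping)
                _ => DangerLevel::Safe,
            };
            worst = worst.max(level);
        }
    }
    worst
}
```

Deriving `Ord` on the enum lets the scanner keep the most severe finding with a plain `max`.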
-
✓ Podman container isolation (--network=none, --read-only, memory/CPU/disk limits)
-
✓ Bubblewrap namespace isolation (fallback)
-
✓ Unsandboxed mode (dev only, explicit opt-in with warning)
-
✓ Auto-detection of strongest available sandbox
-
✓ Async execution with timeout enforcement
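As a rough illustration of the Podman isolation flags named above, the following sketch builds an argument list for `podman run`. The `SandboxLimits` struct, the `podman_args` function and the image name used in the usage note are hypothetical; only `--network=none`, `--read-only`, `--memory` and `--cpus` are standard `podman run` options. Disk limits and timeout enforcement are left out.

```rust
/// Resource limits for one sandboxed solver run (hypothetical struct).
pub struct SandboxLimits {
    pub memory_mb: u32,
    pub cpus: f32,
}

/// Build the arguments for `podman run` with no network and a read-only root.
/// `image` and `solver_cmd` are supplied by the caller; nothing is executed here.
pub fn podman_args(image: &str, solver_cmd: &[&str], limits: &SandboxLimits) -> Vec<String> {
    let mut args = vec![
        "run".to_string(),
        "--rm".to_string(),           // remove the container afterwards
        "--network=none".to_string(), // no network access
        "--read-only".to_string(),    // read-only root filesystem
        format!("--memory={}m", limits.memory_mb),
        format!("--cpus={}", limits.cpus),
        image.to_string(),
    ];
    // The solver command line follows the image name.
    args.extend(solver_cmd.iter().map(|s| s.to_string()));
    args
}
```

The resulting vector could be handed to `std::process::Command::new("podman").args(...)`, for example with an assumed image such as `localhost/echidna-solvers` and the command `["z3", "-smt2", "-in"]`.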
-
✓ TrustLevel enum (Level1 through Level5) with Ord
-
✓ TrustFactors computation from prover, certificates, axioms, integrity
-
✓ Small-kernel prover detection (Lean, Coq, Isabelle, Agda, Metamath, HOL Light, HOL4, Idris2, F*, Twelf, Nuprl, Minlog)
-
✓ Dangerous axioms or a failed integrity check always cap the result at Level1
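The capping rule can be sketched as follows. The `TrustLevel` variants and the fact that it implements `Ord` come from the roadmap; the fields of `TrustFactors` and the way the remaining factors map to Level2 through Level5 are assumptions made purely for illustration.

```rust
/// Trust levels, lowest to highest. Deriving Ord follows declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TrustLevel {
    Level1,
    Level2,
    Level3,
    Level4,
    Level5,
}

/// Inputs to the trust computation (field names are hypothetical).
pub struct TrustFactors {
    pub small_kernel: bool,
    pub certificate_checked: bool,
    pub cross_checked: bool,
    pub dangerous_axioms: bool,
    pub integrity_ok: bool,
}

/// Compute a trust level. The cap at Level1 mirrors the roadmap;
/// the scoring of the other factors is an invented placeholder.
pub fn compute_trust(f: &TrustFactors) -> TrustLevel {
    if f.dangerous_axioms || !f.integrity_ok {
        return TrustLevel::Level1;
    }
    let positives =
        f.small_kernel as u8 + f.certificate_checked as u8 + f.cross_checked as u8;
    match positives {
        0 => TrustLevel::Level2,
        1 => TrustLevel::Level3,
        2 => TrustLevel::Level4,
        _ => TrustLevel::Level5,
    }
}

/// A configurable minimum trust level reduces to a comparison thanks to Ord.
pub fn meets_minimum(level: TrustLevel, minimum: TrustLevel) -> bool {
    level >= minimum
}
```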
-
✓ 6 mutation kinds: RemovePrecondition, WeakenPostcondition, AddDisjunct, RemoveHypothesis, ReplaceConstant, NegateSubterm
-
✓ Mutation generation from Term AST
-
✓ Mutation score computation with configurable threshold (default 95%)
-
✓ MutationTestSummary with per-mutation results
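To make the threshold concrete, here is a small sketch that computes a mutation score as killed mutants over total mutants, checks it against the 95% default, and attaches a Wilson score interval (the interval type this roadmap lists for mutation scores under statistical tracking). The function names and signatures are hypothetical, and `z` is the usual normal quantile supplied by the caller (about 1.96 for a 95% interval).

```rust
/// Default pass threshold from the roadmap (95%).
pub const DEFAULT_THRESHOLD: f64 = 0.95;

/// Fraction of mutants detected; None when there are no mutants or the counts are inconsistent.
pub fn mutation_score(killed: u32, total: u32) -> Option<f64> {
    if total == 0 || killed > total {
        None
    } else {
        Some(killed as f64 / total as f64)
    }
}

/// True when the score reaches the threshold; an undefined score never passes.
pub fn passes(killed: u32, total: u32, threshold: f64) -> bool {
    mutation_score(killed, total).map_or(false, |s| s >= threshold)
}

/// Wilson score interval (lower, upper) for the underlying kill rate.
pub fn wilson_interval(killed: u32, total: u32, z: f64) -> Option<(f64, f64)> {
    let p = mutation_score(killed, total)?;
    let n = total as f64;
    let z2 = z * z;
    let denom = 1.0 + z2 / n;
    let centre = (p + z2 / (2.0 * n)) / denom;
    let half = z * (p * (1.0 - p) / n + z2 / (4.0 * n * n)).sqrt() / denom;
    // Clamp to [0, 1] to guard against floating-point overshoot.
    Some(((centre - half).max(0.0), (centre + half).min(1.0)))
}
```

With 95 of 100 mutants killed the score sits exactly on the default threshold, while the interval shows how much uncertainty a sample of that size still carries.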
-
✓ ProverDispatcher: create → parse → verify → axiom scan → confidence scoring
-
✓ Single-prover and cross-checked (portfolio) modes
-
✓ Configurable minimum trust level
-
✓ Content-based and extension-based prover auto-selection
-
✓ DispatchResult with full audit information
-
✓ PropTest strategies for trust hardening modules
-
✓ 21 property-based tests for core invariants
-
✓ Total test count: 306+
-
✓ OpenTheory exporter (HOL4, HOL Light, Isabelle/HOL interop)
-
✓ Dedukti/Lambdapi exporter (universal proof format)
-
✓ Article and module representations with serialisation
-
✓ F*, Dafny, Why3 (dependent types + effects, auto-active, orchestration)
-
✓ TLAPS, Twelf, Nuprl, Minlog, Imandra (specialised / niche)
-
✓ GLPK, SCIP, MiniZinc, Chuffed, OR-Tools (constraint solvers)
-
✓ ProverKind enum, ProverFactory, file extension detection for all 30
-
✓ ParetoFrontier: dominance checking, frontier computation, weighted ranking
-
✓ StatisticsTracker: per-prover per-domain stats, Bayesian timeout estimation
-
✓ Wilson score confidence intervals for mutation scores
-
✓ JSON serialisation for persistence
-
✓ Prover ranking by composite score
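Dominance checking and frontier computation can be sketched in a few lines. The two objectives used here (minimise solve time, maximise trust) and the `Candidate` struct are assumptions for illustration; the roadmap does not say which objectives ParetoFrontier actually uses.

```rust
/// One prover result to compare (hypothetical struct and objectives).
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub name: &'static str,
    pub time_secs: f64, // lower is better
    pub trust: u8,      // higher is better
}

/// `a` dominates `b` if it is at least as good on every objective
/// and strictly better on at least one.
pub fn dominates(a: &Candidate, b: &Candidate) -> bool {
    let no_worse = a.time_secs <= b.time_secs && a.trust >= b.trust;
    let strictly_better = a.time_secs < b.time_secs || a.trust > b.trust;
    no_worse && strictly_better
}

/// Keep every candidate that no other candidate dominates (O(n^2) scan).
pub fn pareto_frontier(candidates: &[Candidate]) -> Vec<&Candidate> {
    candidates
        .iter()
        .filter(|c| !candidates.iter().any(|other| dominates(other, c)))
        .collect()
}
```

A weighted ranking, as mentioned in the list above, would then order only the candidates that survive this filter.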
-
✓ FFI/IPC bridge for all 48 provers (kind_from_u8/kind_to_u8 roundtrip-verified)
-
✓ C-compatible API (echidna_init, echidna_create_prover, echidna_parse_string, etc.)
-
✓ GraphQL/gRPC/REST call real prover backends via ProverFactory
-
✓ TrustLevel: total order proofs (reflexive, antisymmetric, transitive, total)
-
✓ AxiomSafety: policy ordering, worst-case composition (commutative, associative)
-
✓ Portfolio: cross-checking improvement proof, disagreement detection
-
✓ Dispatch: integrity failure → Level1, dangerous axioms → Level1, determinism
-
✓ 30+ formally verified properties, 0 postulates/sorry/believe_me
-
✓ Proof graph builder (src/rust/gnn/graph.rs): converts ProofState to typed directed graph
-
✓ 7 node kinds (Goal, Hypothesis, Premise, Subterm, TypeExpr, Binder, Constant)
-
✓ 8 edge kinds (Contains, References, Implies, HasType, BindsOver, SharedStructure, DependsOn, UsefulFor)
-
✓ Recursive term expansion with configurable depth limit
-
✓ Constant deduplication across terms
-
✓ Shared-structure edge detection between premises (Jaccard similarity)
-
✓ Sparse adjacency matrix export for Julia GNN encoder
-
✓ Local feature extraction (src/rust/gnn/embeddings.rs): 32-dim feature vectors
-
✓ Node kind one-hot + term kind one-hot + structural features
-
✓ Symbol frequency counting (normalised across graph)
-
✓ Standalone term embedding with cosine similarity
-
✓ Two-pass feature extraction (frequency counting then encoding)
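For reference, cosine similarity between two feature vectors can be computed as below. This is a generic sketch, not the code in embeddings.rs; the function name and the choice to return `None` for mismatched lengths or zero vectors are assumptions.

```rust
/// Cosine similarity of two equal-length vectors.
/// Returns None for empty input, mismatched lengths, or a zero-norm vector.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

Returning `None` instead of a sentinel value keeps an all-zero embedding from being mistaken for an orthogonal one.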
-
✓ HTTP client (src/rust/gnn/client.rs) for Julia ML server
-
✓ JSON serialisation matching Julia TheoremGraph format
-
✓ Graceful degradation when server unavailable
-
✓ Health check, rank, and embed endpoints
-
✓ Hybrid search strategy (src/rust/gnn/guided_search.rs)
-
✓ GNN + symbolic score combination with configurable weights (default 70/30)
-
✓ Tactic inference from premise names (Apply, Rewrite, Induction, Cases)
-
✓ Premise embedding cache for repeated queries
-
✓ Search statistics tracking (total calls, GNN vs fallback)
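The score combination can be pictured as a normalised weighted average with a symbolic-only fallback. This sketch assumes the 70/30 default means 70% GNN and 30% symbolic, that both scores lie in [0, 1], and that a missing GNN score (server unavailable) is represented as `None`; the `ScoreWeights` struct and `combined_score` function are hypothetical names.

```rust
/// Relative weights for the two score sources (hypothetical struct).
pub struct ScoreWeights {
    pub gnn: f64,
    pub symbolic: f64,
}

impl Default for ScoreWeights {
    /// Assumed reading of the 70/30 default: 70% GNN, 30% symbolic.
    fn default() -> Self {
        ScoreWeights { gnn: 0.7, symbolic: 0.3 }
    }
}

/// Blend GNN and symbolic scores; fall back to the symbolic score alone
/// when no GNN score is available. The result always lies in [0, 1].
pub fn combined_score(gnn: Option<f64>, symbolic: f64, weights: &ScoreWeights) -> f64 {
    let s = symbolic.clamp(0.0, 1.0);
    let total = weights.gnn + weights.symbolic;
    match gnn {
        Some(g) if total > 0.0 => {
            // Normalise so that the weights need not sum to one.
            (weights.gnn * g.clamp(0.0, 1.0) + weights.symbolic * s) / total
        }
        _ => s,
    }
}
```

Clamping the inputs and normalising by the weight total keeps the output bounded for non-negative weights, which is the kind of boundedness property the roadmap lists among the proven GNN properties.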
-
✓ POST /gnn/rank endpoint (src/julia/api/gnn_endpoint.jl)
-
✓ GET /gnn/health endpoint
-
✓ Cosine similarity fallback when model not trained
-
✓ Route registration for existing API server
-
✓ src/abi/EchidnaABI/Gnn.idr: 7 proven properties
-
✓ Feature dimension positivity proof
-
✓ Sparse adjacency length consistency
-
✓ Node/edge kind index injectivity (no collisions)
-
✓ Depth expansion monotonicity (termination)
-
✓ DependsOn/UsefulFor duality
-
✓ Combined score boundedness
-
✓ 0 believe_me, 0 postulates
-
✓ 28 new unit tests for GNN module
-
✓ 0 clippy warnings on GNN code
-
✓ #![forbid(unsafe_code)] on all GNN modules
-
❏ Train GNN encoder on proof graphs (Flux.jl)
-
❏ Train Transformer premise selector
-
❏ Wire GNN scores into ProverDispatcher confidence
-
❏ Fix /api/verify SMT-LIB handling so valid user queries are not rejected as malformed proof obligations (Issue #10)
-
❏ Align Containerfile.full claims with shipped prover set (either include all claimed tier 1-4 provers or narrow wording) (Issue #11)
Our v3.0 roadmap deepens our prevailing advantage by extending capabilities that no other system can match:
-
❏ Automated Cross-Prover Theorem Discovery: Automatically find equivalent theorems across different proof systems (extending our unique arbitration leadership)
-
❏ Cross-Domain Knowledge Transfer: Transfer mathematical knowledge between provers (leveraging our 48-backend architecture)
-
❏ Proof Repair with Multi-Prover Validation: Use multiple provers to repair and validate failing proofs (extending our trust pipeline advantage)
-
❏ Mathlib/AFP/Coq stdlib Integration: Unified access to major theorem libraries across systems (deepening our cross-system leadership)
-
❏ Cloud Deployment with GPU Acceleration: Scale our neurosymbolic architecture to handle larger problems
To ensure we maintain and extend our leadership position, we focus on:
-
Cross-Prover Capabilities: Continuously enhance our unique ability to bridge multiple proof systems
-
Trust Infrastructure Depth: Extend our 7-stage pipeline with additional verification layers
-
Neurosymbolic Integration: Deepen the hybrid AI-formal reasoning capabilities
-
Vocabulary Expansion: Grow our mathematical knowledge base beyond current 992K+ terms
-
Proof Exchange Leadership: Strengthen OpenTheory/Dedukti integration for universal proof interoperability
-
❏ Automated Theorem Equivalence Detection: AI-powered detection of equivalent theorems across systems
-
❏ Mathematical Identity Resolution Service: Cloud service for cross-system theorem identification
-
❏ Proof System Interoperability Hub: Central hub connecting all major proof systems
-
❏ Universal Mathematical Ontology: Unified ontology bridging different mathematical foundations
-
❏ Cross-System Confidence Arbitration: Bayesian trust scoring across heterogeneous provers
Our long-term strategy ensures ECHIDNA remains the unchallenged leader in cross-system theorem proving:
-
Continuous Prover Expansion: Add new backends while others remain single-system
-
Deepening Trust Infrastructure: Extend verification capabilities beyond what single systems can offer
-
Neurosymbolic Innovation: Advance hybrid reasoning while pure systems stagnate
-
Cross-System Ecosystem Leadership: Become the de facto standard for mathematical object arbitration
-
Vocabulary & Knowledge Dominance: Maintain the most comprehensive mathematical knowledge base
Unlike other proof systems that are constrained by their single-prover architecture, ECHIDNA’s multi-prover neurosymbolic design creates a virtuous cycle of advantage:
-
More Provers → More Cross-System Data → Better Arbitration → More Adoption → More Provers
-
More Data → Better Neural Guidance → More Successful Proofs → More Trust → More Adoption
-
More Systems → More Proof Exchange → Better Equivalence Detection → More Mathematical Knowledge → More Value
This cycle ensures that our prevailing advantage grows over time, making it increasingly difficult for single-prover systems to compete in cross-system mathematical reasoning.