ECHIDNA Roadmap

Leadership Position & Prevailing Advantage

ECHIDNA holds a leading position in the theorem-proving ecosystem through our prevailing advantage, a combination of capabilities that no other system currently replicates:

  • 48 Prover Backends: The only system bridging multiple proof ecosystems (Isabelle, Coq, Lean, Z3, etc.)

  • Cross-Prover Arbitration: The only system actively solving mathematical object identity across heterogeneous systems

  • Neurosymbolic Hybrid Architecture: The only system combining AI guidance with formal verification

  • 7-Stage Trust Pipeline: The most comprehensive verification infrastructure in existence

  • Universal Proof Exchange: The only system integrating both OpenTheory and Dedukti

  • Large-Scale Vocabulary: The largest managed vocabulary (992K+ terms) for mathematical reasoning

  • Comprehensive Axiom Tracking: The most sophisticated axiom monitoring across provers

  • Bayesian Confidence Scoring: The only system with probabilistic trust assessment across multiple provers

Our roadmap extends this prevailing advantage by deepening these unique capabilities while other systems remain constrained to single-prover architectures.

Current Status (v2.0.0)

  • 48 prover backends operational across 10 tiers — unmatched breadth

  • Trust & safety hardening complete (13 tasks) — unmatched verification depth

  • 638+ tests passing (528 unit, 38 integration, 21 property-based, plus interface tests)

  • 3 API interfaces consolidated into monorepo (GraphQL, gRPC, REST)

  • Agda meta-checker: 30+ formally verified properties of trust pipeline — unmatched formal guarantees

  • Criterion benchmarks: 13 functions covering all critical paths

  • FFI bridge: complete C-compatible API for all 48 provers

  • Julia ML layer with logistic regression tactic prediction

  • Chapel parallel dispatch layer

  • ReScript UI: 33 compiled components (zero TypeScript)

  • Training data: 332 proofs, 1,603 tactics

  • License standardised: PMPL-1.0-or-later throughout

v1.0 — Core Infrastructure (COMPLETE)

  • ✓ Initial Rust core with ProverBackend trait

  • ✓ 9 of the 12 original prover backends

  • ✓ Core types (Term, ProofState, Tactic, Goal, Context)

  • ✓ Aspect tagging system

  • ✓ Neural solver integration (Julia HTTP)

v1.2 — Original 12 Provers (COMPLETE)

  • ✓ All 12 core provers: Agda, Coq, Lean, Isabelle, Z3, CVC5, Metamath, HOL Light, Mizar, PVS, ACL2, HOL4

  • ✓ Example proof libraries per prover

  • ✓ Training data extraction (332 proofs, 1,603 tactics)

  • ✓ Documentation update

v1.3 — Integration and Polish (COMPLETE)

  • ✓ 99 unit tests + 38 integration tests

  • ✓ Neural training pipeline

  • ✓ ReScript UI functional (33 files, zero TypeScript)

  • ✓ RSR/CCCP compliance templates

v1.4 — Interfaces and Extended Provers (COMPLETE)

  • ✓ 5 new provers (Vampire, E Prover, SPASS, Alt-Ergo, Idris2) — total 17

  • ✓ GraphQL interface (async-graphql, port 8081)

  • ✓ gRPC interface (tonic + Protocol Buffers, port 50051)

  • ✓ REST interface (axum + OpenAPI/Swagger, port 8000)

  • ✓ All 3 layers (Rust, Julia, Chapel) support 17 provers

  • ✓ Interface consolidation into src/interfaces/

v1.5 — Trust & Safety Hardening (COMPLETE)

Task 1: Solver Binary Integrity Verification

  • ✓ SHAKE3-512 provenance hashing (64-byte output via tiny-keccak)

  • ✓ BLAKE3 fast runtime re-verification

  • ✓ TOML manifest for known-good solver hashes

  • ✓ Auto-detect solver paths with fallback and PATH lookup

  • ✓ Status tracking: Verified, Tampered, Missing, Uninitialized, Unchecked

Task 2: SMT Portfolio Solving (Cross-Checking)

  • ✓ PortfolioSolver with configurable solver pools (SMT, ATP, ITP)

  • ✓ Reconciliation logic: CrossChecked, SingleSolver, Inconclusive, AllTimedOut

  • ✓ Disagreement flagging for human review

  • ✓ Default pools: Z3/CVC5/Alt-Ergo (SMT), Vampire/E (ATP), Lean/Coq (ITP)
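The reconciliation step above can be sketched roughly as follows. This is a minimal illustration, not the actual PortfolioSolver code: the `SolverOutcome` type, the `reconcile` function, and the extra `Disagreement` variant (standing in for "flagged for human review") are assumptions made for the example.

```rust
/// Outcome reported by a single solver (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SolverOutcome {
    Proved,
    Refuted,
    Unknown,
    TimedOut,
}

/// The four verdicts named in the roadmap, plus an assumed `Disagreement`
/// variant for results that should be flagged for human review.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reconciliation {
    CrossChecked,
    SingleSolver,
    Inconclusive,
    AllTimedOut,
    Disagreement,
}

/// Combine the outcomes of every solver in a pool into one verdict.
pub fn reconcile(outcomes: &[SolverOutcome]) -> Reconciliation {
    let count = |wanted: SolverOutcome| outcomes.iter().filter(|o| **o == wanted).count();
    let proved = count(SolverOutcome::Proved);
    let refuted = count(SolverOutcome::Refuted);
    let timed_out = count(SolverOutcome::TimedOut);

    if proved > 0 && refuted > 0 {
        // Two solvers gave contradictory definite answers.
        Reconciliation::Disagreement
    } else if !outcomes.is_empty() && timed_out == outcomes.len() {
        Reconciliation::AllTimedOut
    } else {
        // At most one of `proved` / `refuted` is non-zero here.
        match proved.max(refuted) {
            0 => Reconciliation::Inconclusive,
            1 => Reconciliation::SingleSolver,
            _ => Reconciliation::CrossChecked,
        }
    }
}
```

Under these assumptions, two agreeing definite answers yield `CrossChecked`, while a single definite answer alongside timeouts yields only `SingleSolver`.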

Task 3: Proof Certificate Checking

  • ✓ Alethe format verification (structural validation)

  • ✓ DRAT/LRAT certificate checking (via drat-trim)

  • ✓ TSTP, Lean4 kernel, Coq kernel format support

  • ✓ BLAKE3-hashed certificate storage for audit trails

  • ✓ CertificateVerifier with pluggable external checkers

Task 4: Axiom Usage Tracking

  • ✓ 4-level danger classification: Safe, Noted, Warning, Reject

  • ✓ Prover-specific dangerous patterns (Lean sorry, Coq Admitted, Agda --type-in-type, HOL4 mk_thm, Idris2 believe_me, etc.)

  • ✓ Comment-aware scanning (skips comments per prover syntax)

  • ✓ AxiomPolicy enforcement (Clean, ClassicalAxioms, IncompleteProof, Rejected)
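As a rough illustration of comment-aware scanning with the four danger levels, the sketch below handles Lean line comments only. The pattern table, the level assigned to each pattern, and the `scan_lean` function are assumptions for the example; block comments and the other provers are left out.

```rust
/// Danger levels in increasing severity, so `max` picks the worst finding.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum DangerLevel {
    Safe,
    Noted,
    Warning,
    Reject,
}

/// Hypothetical pattern table for Lean; the level per pattern is assumed.
const LEAN_PATTERNS: &[(&str, DangerLevel)] = &[
    ("sorry", DangerLevel::Reject),
    ("Classical.choice", DangerLevel::Noted),
];

/// Scan Lean source and return the worst danger level found.
/// Text after a `--` line-comment marker is ignored.
pub fn scan_lean(source: &str) -> DangerLevel {
    let mut worst = DangerLevel::Safe;
    for line in source.lines() {
        // Keep only the code part of the line, before any `--` comment.
        let code = line.split("--").next().unwrap_or("");
        for (pattern, level) in LEAN_PATTERNS {
            if code.contains(*pattern) {
                worst = worst.max(*level);
            }
        }
    }
    worst
}
```

Deriving `Ord` on the enum keeps the "worst finding wins" rule to a single `max` call, which is one plausible way to compose findings.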

Task 5: Solver Sandboxing

  • ✓ Podman container isolation (--network=none, --read-only, memory/CPU/disk limits)

  • ✓ Bubblewrap namespace isolation (fallback)

  • ✓ Unsandboxed mode (dev only, explicit opt-in with warning)

  • ✓ Auto-detection of strongest available sandbox

  • ✓ Async execution with timeout enforcement
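The sandbox selection and Podman invocation described above might look something like this. It is a sketch under stated assumptions: `SandboxKind`, `SandboxLimits`, `strongest_available`, and `podman_args` are hypothetical names, the image name is a placeholder, and disk limits, volume mounts, and timeout handling are omitted. Only the argument list is built; nothing is executed.

```rust
/// Available isolation mechanisms, strongest first (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SandboxKind {
    Podman,
    Bubblewrap,
    Unsandboxed,
}

/// Resource limits applied to a solver run (hypothetical type).
pub struct SandboxLimits {
    pub memory_mb: u64,
    pub cpus: u32,
}

/// Pick the strongest sandbox that is present. Unsandboxed mode is only
/// returned when the caller has explicitly opted in.
pub fn strongest_available(
    has_podman: bool,
    has_bwrap: bool,
    allow_unsandboxed: bool,
) -> Option<SandboxKind> {
    if has_podman {
        Some(SandboxKind::Podman)
    } else if has_bwrap {
        Some(SandboxKind::Bubblewrap)
    } else if allow_unsandboxed {
        Some(SandboxKind::Unsandboxed)
    } else {
        None
    }
}

/// Build the arguments for a `podman run` call with no network access,
/// a read-only root filesystem, and memory/CPU limits.
pub fn podman_args(image: &str, solver_cmd: &[&str], limits: &SandboxLimits) -> Vec<String> {
    let mut args = vec![
        "run".to_string(),
        "--rm".to_string(),
        "--network=none".to_string(),
        "--read-only".to_string(),
        format!("--memory={}m", limits.memory_mb),
        format!("--cpus={}", limits.cpus),
        image.to_string(),
    ];
    args.extend(solver_cmd.iter().map(|s| s.to_string()));
    args
}
```

The resulting vector could be passed to `std::process::Command::new("podman").args(...)`; returning `None` when no sandbox is available and no opt-in was given keeps the unsandboxed path explicit.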

Task 6: 5-Level Trust Hierarchy

  • ✓ TrustLevel enum (Level1 through Level5) with Ord

  • ✓ TrustFactors computation from prover, certificates, axioms, integrity

  • ✓ Small-kernel prover detection (Lean, Coq, Isabelle, Agda, Metamath, HOL Light, HOL4, Idris2, F*, Twelf, Nuprl, Minlog)

  • ✓ Dangerous axioms and failed integrity always cap at Level1
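A minimal sketch of the hierarchy follows. The ordered enum and the Level1 cap come from the list above; the fields of `TrustFactors` and the way positive evidence is counted are assumptions for illustration, not the project's actual scoring rules.

```rust
/// Trust levels; deriving `Ord` gives Level1 < Level2 < ... < Level5.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TrustLevel {
    Level1,
    Level2,
    Level3,
    Level4,
    Level5,
}

/// Inputs to the trust computation (field set is assumed for this sketch).
#[derive(Debug, Clone, Copy)]
pub struct TrustFactors {
    pub small_kernel: bool,
    pub certificate_checked: bool,
    pub cross_checked: bool,
    pub dangerous_axioms: bool,
    pub integrity_ok: bool,
}

/// Dangerous axioms or a failed integrity check always cap at Level1.
/// Otherwise each piece of positive evidence raises the level by one
/// (an assumed scoring rule).
pub fn compute_trust(f: &TrustFactors) -> TrustLevel {
    if f.dangerous_axioms || !f.integrity_ok {
        return TrustLevel::Level1;
    }
    let evidence = [f.small_kernel, f.certificate_checked, f.cross_checked]
        .iter()
        .filter(|present| **present)
        .count();
    match evidence {
        0 => TrustLevel::Level2,
        1 => TrustLevel::Level3,
        2 => TrustLevel::Level4,
        _ => TrustLevel::Level5,
    }
}

/// A result is acceptable when it reaches the configured minimum level.
pub fn meets_minimum(level: TrustLevel, minimum: TrustLevel) -> bool {
    level >= minimum
}
```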

Task 7: Mutation Testing

  • ✓ 6 mutation kinds: RemovePrecondition, WeakenPostcondition, AddDisjunct, RemoveHypothesis, ReplaceConstant, NegateSubterm

  • ✓ Mutation generation from Term AST

  • ✓ Mutation score computation with configurable threshold (default 95%)

  • ✓ MutationTestSummary with per-mutation results
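For illustration, the mutation score and its threshold check could be computed as below, together with the Wilson score interval that Task 13 applies to mutation scores. The function names and the choice to treat an empty mutation set as a score of 0 are assumptions for this sketch.

```rust
/// Fraction of mutants that were detected ("killed"); 0.0 if none were generated.
pub fn mutation_score(killed: u32, total: u32) -> f64 {
    if total == 0 {
        0.0
    } else {
        killed as f64 / total as f64
    }
}

/// Check the score against a threshold such as the default 0.95.
pub fn passes_threshold(killed: u32, total: u32, threshold: f64) -> bool {
    mutation_score(killed, total) >= threshold
}

/// Wilson score interval for the kill rate.
/// `z` is the normal quantile, e.g. 1.96 for roughly 95% confidence.
pub fn wilson_interval(killed: u32, total: u32, z: f64) -> (f64, f64) {
    if total == 0 {
        // No data: the interval is the whole unit range.
        return (0.0, 1.0);
    }
    let n = total as f64;
    let p = killed as f64 / n;
    let z2 = z * z;
    let denom = 1.0 + z2 / n;
    let centre = (p + z2 / (2.0 * n)) / denom;
    let half_width = z * (p * (1.0 - p) / n + z2 / (4.0 * n * n)).sqrt() / denom;
    // Clamp to guard against tiny floating-point overshoot at the ends.
    ((centre - half_width).max(0.0), (centre + half_width).min(1.0))
}
```

With few mutants the interval is wide, which is why a raw score of 95% on 20 mutants is weaker evidence than the same score on 2,000.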

Task 8: Prover Dispatch Pipeline

  • ✓ ProverDispatcher: create → parse → verify → axiom scan → confidence scoring

  • ✓ Single-prover and cross-checked (portfolio) modes

  • ✓ Configurable minimum trust level

  • ✓ Content-based and extension-based prover auto-selection

  • ✓ DispatchResult with full audit information
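Prover auto-selection could be sketched along these lines. The reduced `ProverKind` enum, the extension table, the content heuristics, and the choice to try the extension first are all assumptions for the example; the real dispatcher covers every backend and may order the checks differently.

```rust
use std::path::Path;

/// A small subset of prover kinds, enough for the sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProverKind {
    Agda,
    Coq,
    Lean,
    Isabelle,
    Z3,
    Idris2,
}

/// Extension-based detection (the mapping is assumed for illustration).
pub fn detect_by_extension(path: &Path) -> Option<ProverKind> {
    match path.extension()?.to_str()? {
        "agda" => Some(ProverKind::Agda),
        "v" => Some(ProverKind::Coq),
        "lean" => Some(ProverKind::Lean),
        "thy" => Some(ProverKind::Isabelle),
        "smt2" => Some(ProverKind::Z3),
        "idr" => Some(ProverKind::Idris2),
        _ => None,
    }
}

/// Content-based detection using simple, assumed heuristics.
pub fn detect_by_content(source: &str) -> Option<ProverKind> {
    if source.contains("(set-logic") || source.contains("(check-sat)") {
        Some(ProverKind::Z3)
    } else if source.contains("theory ") && source.contains("imports ") {
        Some(ProverKind::Isabelle)
    } else {
        None
    }
}

/// Try the file extension first, then fall back to inspecting the content.
pub fn select_prover(path: &Path, source: &str) -> Option<ProverKind> {
    detect_by_extension(path).or_else(|| detect_by_content(source))
}
```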

Task 9: Property-Based Testing Expansion

  • ✓ PropTest strategies for trust hardening modules

  • ✓ 21 property-based tests for core invariants

  • ✓ Total test count: 306+

Task 10: Cross-Prover Proof Exchange

  • ✓ OpenTheory exporter (HOL4, HOL Light, Isabelle/HOL interop)

  • ✓ Dedukti/Lambdapi exporter (universal proof format)

  • ✓ Article and module representations with serialisation

Task 11: Fix Metadata

  • ✓ STATE.scm, ECOSYSTEM.scm, META.scm updated

  • ✓ Version numbers aligned

Task 12: New Prover Backends (13 new, total 30)

  • ✓ F*, Dafny, Why3 (dependent types + effects, auto-active, orchestration)

  • ✓ TLAPS, Twelf, Nuprl, Minlog, Imandra (specialised / niche)

  • ✓ GLPK, SCIP, MiniZinc, Chuffed, OR-Tools (constraint solvers)

  • ✓ ProverKind enum, ProverFactory, file extension detection for all 30

Task 13: Pareto Optimality + Statistical Tracking

  • ✓ ParetoFrontier: dominance checking, frontier computation, weighted ranking

  • ✓ StatisticsTracker: per-prover per-domain stats, Bayesian timeout estimation

  • ✓ Wilson score confidence intervals for mutation scores

  • ✓ JSON serialisation for persistence

  • ✓ Prover ranking by composite score
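Dominance checking and frontier computation can be illustrated with two objectives. The `Candidate` type and the choice of objectives (lower solve time, higher trust) are assumptions for this sketch; the actual ParetoFrontier may track different or additional dimensions.

```rust
/// One prover result with two objectives (hypothetical type):
/// lower `time_secs` is better, higher `trust` is better.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub name: String,
    pub time_secs: f64,
    pub trust: f64,
}

/// `a` dominates `b` when it is no worse on every objective
/// and strictly better on at least one.
pub fn dominates(a: &Candidate, b: &Candidate) -> bool {
    let no_worse = a.time_secs <= b.time_secs && a.trust >= b.trust;
    let strictly_better = a.time_secs < b.time_secs || a.trust > b.trust;
    no_worse && strictly_better
}

/// The frontier is every candidate that no other candidate dominates.
/// This quadratic scan is fine for a few dozen provers.
pub fn pareto_frontier(candidates: &[Candidate]) -> Vec<&Candidate> {
    candidates
        .iter()
        .filter(|c| !candidates.iter().any(|other| dominates(other, c)))
        .collect()
}
```

A weighted ranking, as mentioned above, would then only need to order the frontier members rather than all candidates.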

v2.0 — Core Integration + Formal Verification (COMPLETE)

Core Integration Layer (COMPLETE)

  • ✓ FFI/IPC bridge for all 48 provers (kind_from_u8/kind_to_u8 roundtrip-verified)

  • ✓ C-compatible API (echidna_init, echidna_create_prover, echidna_parse_string, etc.)

  • ✓ GraphQL/gRPC/REST call real prover backends via ProverFactory
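The roundtrip property behind kind_from_u8/kind_to_u8 can be shown on a cut-down enum. The function names appear in the list above, but the signatures, the five-variant enum, and the numeric assignments here are assumptions for illustration only.

```rust
/// Cut-down prover enum with explicit discriminants for the C boundary.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProverKind {
    Agda = 0,
    Coq = 1,
    Lean = 2,
    Isabelle = 3,
    Z3 = 4,
}

/// Encode a prover kind as the byte passed across the FFI boundary.
pub fn kind_to_u8(kind: ProverKind) -> u8 {
    kind as u8
}

/// Decode a byte received from C; unknown values yield `None`
/// instead of an invalid enum value.
pub fn kind_from_u8(value: u8) -> Option<ProverKind> {
    match value {
        0 => Some(ProverKind::Agda),
        1 => Some(ProverKind::Coq),
        2 => Some(ProverKind::Lean),
        3 => Some(ProverKind::Isabelle),
        4 => Some(ProverKind::Z3),
        _ => None,
    }
}
```

Decoding through an explicit `match` rather than a transmute means an out-of-range byte from the C side cannot produce undefined behaviour.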

Agda Meta-Checker (COMPLETE)

  • ✓ TrustLevel: total order proofs (reflexive, antisymmetric, transitive, total)

  • ✓ AxiomSafety: policy ordering, worst-case composition (commutative, associative)

  • ✓ Portfolio: cross-checking improvement proof, disagreement detection

  • ✓ Dispatch: integrity failure → Level1, dangerous axioms → Level1, determinism

  • ✓ 30+ formally verified properties, 0 postulates/sorry/believe_me

Benchmarks + Testing (COMPLETE)

  • ✓ Criterion.rs benchmarks: 13 functions (core, provers, trust, verification, FFI)

  • ✓ 528 unit tests (up from 232), 38 integration, 21 property-based

  • ✓ 0 clippy warnings, 0 compiler errors on stable Rust

v2.1 — GNN Integration (IN PROGRESS)

GNN Proof Graph Construction (COMPLETE)

  • ✓ Proof graph builder (src/rust/gnn/graph.rs): converts ProofState to typed directed graph

  • ✓ 7 node kinds (Goal, Hypothesis, Premise, Subterm, TypeExpr, Binder, Constant)

  • ✓ 8 edge kinds (Contains, References, Implies, HasType, BindsOver, SharedStructure, DependsOn, UsefulFor)

  • ✓ Recursive term expansion with configurable depth limit

  • ✓ Constant deduplication across terms

  • ✓ Shared-structure edge detection between premises (Jaccard similarity)

  • ✓ Sparse adjacency matrix export for Julia GNN encoder
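Shared-structure detection via Jaccard similarity could work roughly as follows. Representing each premise as a set of symbols, the `shares_structure` helper, and its threshold parameter are assumptions for this sketch rather than the contents of graph.rs.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Jaccard similarity |A ∩ B| / |A ∪ B| of two sets.
/// Two empty sets are treated as having similarity 0.0.
pub fn jaccard<T: Eq + Hash>(a: &HashSet<T>, b: &HashSet<T>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

/// Decide whether two premises should be linked by a SharedStructure edge.
/// The threshold is a hypothetical tuning parameter.
pub fn shares_structure<T: Eq + Hash>(a: &HashSet<T>, b: &HashSet<T>, threshold: f64) -> bool {
    jaccard(a, b) >= threshold
}
```

For example, symbol sets {f, x, y} and {f, x, z} share two of four distinct symbols, giving a similarity of 0.5.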

GNN Term Embeddings (COMPLETE)

  • ✓ Local feature extraction (src/rust/gnn/embeddings.rs): 32-dim feature vectors

  • ✓ Node kind one-hot + term kind one-hot + structural features

  • ✓ Symbol frequency counting (normalised across graph)

  • ✓ Standalone term embedding with cosine similarity

  • ✓ Two-pass feature extraction (frequency counting then encoding)
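The frequency-counting pass and the cosine similarity used to compare embeddings can be sketched as below. Normalising by the total symbol count and returning `None` for mismatched or zero vectors are assumptions for the example, not necessarily what embeddings.rs does.

```rust
use std::collections::HashMap;

/// Feature vector width mentioned in the roadmap.
pub const FEATURE_DIM: usize = 32;

/// First pass: count how often each symbol occurs, then normalise by the
/// total number of occurrences (an assumed normalisation).
pub fn normalised_frequencies(symbols: &[&str]) -> HashMap<String, f32> {
    let mut counts: HashMap<String, f32> = HashMap::new();
    for symbol in symbols {
        *counts.entry(symbol.to_string()).or_insert(0.0) += 1.0;
    }
    let total = symbols.len() as f32;
    for value in counts.values_mut() {
        *value /= total;
    }
    counts
}

/// Cosine similarity of two feature vectors.
/// Returns `None` when the lengths differ or either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

In a two-pass scheme, the frequencies from the first pass would fill one slot of each `FEATURE_DIM`-wide vector during the encoding pass.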

GNN Inference Client (COMPLETE)

  • ✓ HTTP client (src/rust/gnn/client.rs) for Julia ML server

  • ✓ JSON serialisation matching Julia TheoremGraph format

  • ✓ Graceful degradation when server unavailable

  • ✓ Health check, rank, and embed endpoints

GNN-Guided Proof Search (COMPLETE)

  • ✓ Hybrid search strategy (src/rust/gnn/guided_search.rs)

  • ✓ GNN + symbolic score combination with configurable weights (default 70/30)

  • ✓ Tactic inference from premise names (Apply, Rewrite, Induction, Cases)

  • ✓ Premise embedding cache for repeated queries

  • ✓ Search statistics tracking (total calls, GNN vs fallback)
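One way to combine the two scores is a normalised weighted sum with a symbolic-only fallback. The sketch assumes the default 70/30 split means 70% GNN and 30% symbolic, that both scores lie in [0, 1], and that weights are non-negative; `ScoreWeights` and `combined_score` are hypothetical names.

```rust
/// Relative weights of the two score sources (assumed non-negative).
#[derive(Debug, Clone, Copy)]
pub struct ScoreWeights {
    pub gnn: f64,
    pub symbolic: f64,
}

impl Default for ScoreWeights {
    /// Assumed reading of the 70/30 default: 70% GNN, 30% symbolic.
    fn default() -> Self {
        ScoreWeights { gnn: 0.7, symbolic: 0.3 }
    }
}

/// Combine a GNN score with a symbolic score into a value in [0, 1].
/// When no GNN score is available (for example, the ML server is down),
/// the symbolic score is used on its own.
pub fn combined_score(gnn: Option<f64>, symbolic: f64, weights: ScoreWeights) -> f64 {
    let symbolic = symbolic.clamp(0.0, 1.0);
    let total = weights.gnn + weights.symbolic;
    match gnn {
        Some(g) if total > 0.0 => {
            (weights.gnn * g.clamp(0.0, 1.0) + weights.symbolic * symbolic) / total
        }
        _ => symbolic,
    }
}
```

Clamping the inputs and dividing by the weight total keeps the result bounded, which is the kind of property the "combined score boundedness" proof is concerned with.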

Julia GNN Server Endpoints (COMPLETE)

  • ✓ POST /gnn/rank endpoint (src/julia/api/gnn_endpoint.jl)

  • ✓ GET /gnn/health endpoint

  • ✓ Cosine similarity fallback when model not trained

  • ✓ Route registration for existing API server

Idris2 Formal Proofs for GNN (COMPLETE)

  • src/abi/EchidnaABI/Gnn.idr: 7 proven properties

  • ✓ Feature dimension positivity proof

  • ✓ Sparse adjacency length consistency

  • ✓ Node/edge kind index injectivity (no collisions)

  • ✓ Depth expansion monotonicity (termination)

  • ✓ DependsOn/UsefulFor duality

  • ✓ Combined score boundedness

  • ✓ 0 believe_me, 0 postulates

Testing (COMPLETE)

  • ✓ 28 new unit tests for GNN module

  • ✓ 0 clippy warnings on GNN code

  • ✓ #![forbid(unsafe_code)] on all GNN modules

Remaining (v2.1)

  • ❏ Train GNN encoder on proof graphs (Flux.jl)

  • ❏ Train Transformer premise selector

  • ❏ Wire GNN scores into ProverDispatcher confidence

  • ❏ Fix /api/verify SMT-LIB handling so valid user queries are not rejected as malformed proof obligations (Issue #10)

  • ❏ Align Containerfile.full claims with the shipped prover set: either include all claimed tier 1-4 provers or narrow the wording (Issue #11)

Remaining (v2.2+)

  • ❏ Chapel → Rust C FFI bridge

  • ❏ OpenCyc domain knowledge integration

  • ❏ Tamarin/ProVerif bridge for cipherbot

v3.0 — Autonomous Proving & Leadership Extension

Our v3.0 roadmap deepens our prevailing advantage by extending capabilities that no other system can match:

  • Automated Cross-Prover Theorem Discovery: Automatically find equivalent theorems across different proof systems (extending our unique arbitration leadership)

  • Cross-Domain Knowledge Transfer: Transfer mathematical knowledge between provers (leveraging our 48-backend architecture)

  • Proof Repair with Multi-Prover Validation: Use multiple provers to repair and validate failing proofs (extending our trust pipeline advantage)

  • Mathlib/AFP/Coq stdlib Integration: Unified access to major theorem libraries across systems (deepening our cross-system leadership)

  • Cloud Deployment with GPU Acceleration: Scale our neurosymbolic architecture to handle larger problems

Strategic Advantage Maintenance

To ensure we maintain and extend our leadership position, we focus on:

  1. Cross-Prover Capabilities: Continuously enhance our unique ability to bridge multiple proof systems

  2. Trust Infrastructure Depth: Extend our 7-stage pipeline with additional verification layers

  3. Neurosymbolic Integration: Deepen the hybrid AI-formal reasoning capabilities

  4. Vocabulary Expansion: Grow our mathematical knowledge base beyond current 992K+ terms

  5. Proof Exchange Leadership: Strengthen OpenTheory/Dedukti integration for universal proof interoperability

v4.0 — Mathematical Object Arbitration Dominance

  • Automated Theorem Equivalence Detection: AI-powered detection of equivalent theorems across systems

  • Mathematical Identity Resolution Service: Cloud service for cross-system theorem identification

  • Proof System Interoperability Hub: Central hub connecting all major proof systems

  • Universal Mathematical Ontology: Unified ontology bridging different mathematical foundations

  • Cross-System Confidence Arbitration: Bayesian trust scoring across heterogeneous provers

Long-Term Prevailing Advantage Strategy

Our long-term strategy ensures ECHIDNA remains the unchallenged leader in cross-system theorem proving:

  • Continuous Prover Expansion: Add new backends while others remain single-system

  • Deepening Trust Infrastructure: Extend verification capabilities beyond what single systems can offer

  • Neurosymbolic Innovation: Advance hybrid reasoning while pure systems stagnate

  • Cross-System Ecosystem Leadership: Become the de facto standard for mathematical object arbitration

  • Vocabulary & Knowledge Dominance: Maintain the most comprehensive mathematical knowledge base

Why Our Leadership is Sustainable

Unlike other proof systems that are constrained by their single-prover architecture, ECHIDNA’s multi-prover neurosymbolic design creates a virtuous cycle of advantage:

  1. More Provers → More Cross-System Data → Better Arbitration → More Adoption → More Provers

  2. More Data → Better Neural Guidance → More Successful Proofs → More Trust → More Adoption

  3. More Systems → More Proof Exchange → Better Equivalence Detection → More Mathematical Knowledge → More Value

This cycle ensures that our prevailing advantage grows over time, making it increasingly difficult for single-prover systems to compete in cross-system mathematical reasoning.

Julia/Chapel Layer Interaction

See the "Compute Layer Integration" section of CORRECTNESS-ARCHITECTURE.md for the recommended architecture unifying the Julia ML layer, the Chapel HPC layer, and the Rust core.