@@ -493,6 +493,254 @@ repo-c ✓ - - ✓ - - - ✓ - - -
493493
494494The system CRG grade is determined by the lowest grade in the matrix, not the average.
495495
496+ === Cross-Repo Test Ideas (added 2026-04-17)
497+
498+ These tests span two or more repos and catch failure modes that single-repo blitzes
499+ cannot. Added after the echidna TypeDiscipline transition (commit `8f573f1`) exposed
500+ several drift classes that had been latent for weeks. Each idea lists: **what it tests**,
501+ **where it lives**, **why single-repo tests miss it**, and **estate applicability**.
502+
503+ ==== CR-1. Mirror-enum drift test
504+
505+ **What:** Property test asserting that any enum mirrored across repos stays in sync
506+ with its upstream source of truth. Consumer generates its mirror at test-time from
507+ a shared manifest (JSON / A2ML) that the upstream emits, then diffs against its own
508+ hand-maintained copy. Fails fast when they diverge.
509+
510+ **Where:** In each consumer repo's integration-test directory, plus a companion
511+ manifest at `src/<mirror-enum>.manifest.a2ml` in the upstream.
512+
513+ **Why single-repo misses it:** Upstream adds a variant, downstream mirror stays
514+ stale, consumer keeps compiling because downstream has `_ =>` fallback — drift goes
515+ undetected until a user hits the fallback path with unexpected results.
516+
517+ **Applicability:** `ProverKind` (echidna -> echidnabot + protocol-squisher-bridge);
518+ `Severity` / `Tier` / `BotId` (gitbot-fleet/shared-context -> each bot); future
519+ `TypeDiscipline` (echidna -> hypatia, panic-attack); `NodeKind` / `EdgeKind` in
520+ GNN schemas (echidna -> any ML consumer).
521+
522+ **Concrete gap today:** `echidnabot` and `protocol-squisher-echidna-bridge` both
523+ carry local mirror enums that already lag behind echidna main; silent today because
524+ matches use `_ =>`, loud tomorrow when someone relies on a specific variant.
525+
526+ ==== CR-2. Foreign-enum exhaustive-match lint
527+
528+ **What:** Custom clippy lint (or `cargo miri`-style analysis, or a grep-based CI
529+ job as MVP) that flags every `match` expression over an enum imported from an
530+ external crate that lacks an `_ =>` fallback. The grep MVP is one regex plus a list
531+ of extern-enum names.
532+
533+ **Where:** `developer-ecosystem/standards/hypatia-rules/lint-foreign-exhaustive.a2ml`
534+ as the declarative policy; enforced by hypatia in every downstream repo's CI.
535+
536+ **Why single-repo misses it:** An exhaustive match compiles fine today, but a
537+ one-variant addition in the upstream turns the downstream into a compile failure
538+ on the next `cargo build`. The issue surfaces as upstream's CI problem even though
539+ the fault is in the downstream's coupling decision.
540+
541+ **Applicability:** Every Rust repo with a path-dep on another hyperpolymath repo.
542+ Especially `echidna` consumers; any repo importing `katagoria::Discipline` once
543+ that exists; `gitbot-shared-context` consumers.
544+
545+ **Concrete gap today:** Echidna's `src/rust/main.rs` had three exhaustive matches
546+ over `ProverKind` that were silently wrong before the TypeDiscipline transition
547+ (they never covered the 12 original HP ecosystem variants) and broke loudly the
548+ moment 28 more were added. Nobody noticed for weeks.
549+
550+ ==== CR-3. FFI roundtrip bijection property
551+
552+ **What:** For every FFI boundary that serialises a Rust enum to a numeric or
553+ string representation, a property test asserting `decode(encode(x)) == x` for
554+ every variant — not just a hand-picked subset. Run in both directions where
555+ both are exposed.
556+
557+ **Where:** The originating crate's `tests/ffi_roundtrip.rs`, generated from the
558+ enum's `ALL` const (see reference implementation in echidna
559+ `src/rust/disciplines/mod.rs`).
560+
561+ **Why single-repo misses it:** When a table only tests the variants the author
562+ remembers, it silently tolerates missing entries. echidna's `kind_from_u8` was
563+ missing codes 49–60 for weeks before the TypeDiscipline transition; the
564+ existing roundtrip test only covered codes 0–48 and explicitly *asserted*
565+ `kind_from_u8(49).is_none()`, locking in the bug.
566+
567+ **Applicability:** Every `*_to_u8` / `*_from_u8` or `*_to_str` / `*_from_str`
568+ pair. Every CBOR-encoded proof-state field. Every GraphQL / gRPC / REST enum
569+ bridge. Every Zig ABI that names variants.
570+
571+ ==== CR-4. Shared-fixture corpus contract
572+
573+ **What:** Canonical golden fixtures live in ONE upstream repo (`katagoria/`,
574+ `standards/a2ml/examples/`, `echidna/training_data/`) and every downstream that
575+ consumes that format runs its parser / validator / type-checker over the
576+ *shared* corpus. When katagoria graduates a new discipline, the fixture is
577+ added once and every downstream's next CI run exercises it.
578+
579+ **Where:** Upstream hosts `fixtures/` with an index.a2ml. Downstream
580+ `tests/shared_corpus.rs` dereferences via path dep or git submodule.
581+
582+ **Why single-repo misses it:** Each consumer writes its own fixtures, they
583+ drift from the upstream format, and a format extension in the upstream doesn't
584+ fail any test until a real user trips over it.
585+
586+ **Applicability:** katagoria → echidna + typell + typed-wasm (type disciplines);
587+ a2ml → every repo that emits a2ml; avow-protocol → every repo that uses proven
588+ results; VeriSimDB → every repo that writes octads.
589+
590+ **Concrete gap today:** `katagoria/research/tropical/TropicalKleene.idr` is
591+ standalone; no downstream currently uses it as a fixture, so when tropical
592+ semantics shift, nothing breaks.
593+
594+ ==== CR-5. Claim-grounder adversarial cross-test
595+
596+ **What:** Each claim-grounder (invariant-path, PLASMA, Hypatia, ECHIDNA,
597+ panic-attacker) is run against proofs produced by the others, specifically
598+ constructed to hide violations or to appear cleaner than they are. This makes
599+ the dual-use posture (per the Akerlof memory note) operational: a grounder
600+ that cannot detect another grounder's escape hatches is not actually grounded.
601+
602+ **Where:** `verification-ecosystem/<grounder>/tests/adversarial_cross/`
603+ plus a meta-repo `verification-ecosystem/meta-adversarial-suite/` that
604+ coordinates the round-robin and publishes a weekly summary.
605+
606+ **Why single-repo misses it:** Every grounder tests itself against
607+ hand-picked fixtures by default. The adversarial signal — "tool X can hide
608+ from tool Y" — is only visible when you explicitly set them against each
609+ other.
610+
611+ **Applicability:** All five listed grounders; any future grounder (invariant-path,
612+ PLASMA, Hypatia, ECHIDNA, panic-attacker) immediately on admission.
613+
614+ ==== CR-6. Upstream-HEAD sentinel in downstream CI
615+
616+ **What:** Every downstream-of-a-path-dep repo runs a nightly CI job that
617+ pulls the upstream's `main` branch (overriding its Cargo.toml path dep or
618+ submodule) and runs the full test suite. Result emitted as a status check
619+ on the upstream's recent PRs via Hypatia / gitbot-fleet webhook.
620+
621+ **Where:** `.github/workflows/upstream-head-sentinel.yml` in downstream
622+ repos; Hypatia rule in `standards/hypatia-rules/upstream-sentinel.a2ml`.
623+
624+ **Why single-repo misses it:** The upstream's CI is green, the downstream's
625+ CI against the pinned version is green, but a merge that would break the
626+ downstream is invisible until the pin bumps — often weeks later.
627+
628+ **Applicability:** `echidna` consumers (echidnabot, panic-attack, hypatia,
629+ protocol-squisher-echidna-bridge); `shared-context` -> every gitbot;
630+ `a2ml-rs` -> every repo consuming A2ML; `proven` -> every repo consuming a
631+ formally-verified alternative.
632+
633+ **Concrete gap today:** Today's echidna commit `8f573f1` added 28 new
634+ ProverKind variants. Downstream echidnabot silently continues to compile
635+ (for now) but its dispatcher is mis-calibrated. An upstream-HEAD sentinel
636+ would have pinged immediately.
637+
638+ ==== CR-7. Workspace-aware blitz matrix
639+
640+ **What:** The blitz matrix (§ Part VI) distinguishes four cells per (repo,
641+ category): `in-isolation` (cargo test with this member only), `in-workspace`
642+ (cargo test at workspace root), `workspace-root-with-bins` (cargo test
643+ `--all-targets`), and `cross-workspace` (downstream workspace importing this
644+ as a path dep). Many bugs only appear in one of the four.
645+
646+ **Where:** An enriched matrix format in `.machine_readable/testing/blitz.a2ml`
647+ per repo, rolled up at the ecosystem level in `standards/dashboards/blitz.a2ml`.
648+
649+ **Why single-repo misses it:** `cargo test --lib` can pass while `cargo test
650+ --all-targets` fails on the same enum, because bin / test-target exhaustive
651+ matches are outside the lib's compilation unit. echidna hit exactly this today.
652+
653+ **Applicability:** All workspace-member repos (echidna, gitbot-fleet,
654+ developer-ecosystem monorepo, nextgen-languages monorepo, nextgen-databases).
655+
656+ ==== CR-8. Cross-prover proof-exchange roundtrip
657+
658+ **What:** For every proof-interchange format (OpenTheory, Dedukti, TSTP,
659+ Alethe, DRAT/LRAT, custom CBOR), a roundtrip test: prove theorem T in prover
660+ A, export to format F, re-import to prover B, re-check T. Must hold across
661+ {Coq, Lean, HOL4, HOL Light, Isabelle} × {OpenTheory, Dedukti} pairs at
662+ minimum.
663+
664+ **Where:** `echidna/tests/proof_exchange_roundtrip.rs` with the heavy lifting
665+ delegated to the `src/rust/exchange/` module. Parallel suite in any future
666+ proof-exchange host.
667+
668+ **Why single-repo misses it:** The exchange module has unit tests for each
669+ direction independently; nothing today asserts semantic preservation across
670+ the roundtrip.
671+
672+ **Applicability:** `echidna/src/rust/exchange/`; any repo that ships an
673+ exchange format bridge.
674+
675+ ==== CR-9. Schema-surface drift detector
676+
677+ **What:** Every public surface (Rust enum, JSON schema, A2ML schema, GraphQL
678+ type, Zig ABI) emits a machine-readable schema snapshot at build time to
679+ `.schemas/<surface>.a2ml`. Every PR's CI computes the new snapshot and diffs
680+ against the merged-main version; any change is flagged with a SEMVER
681+ classification (patch / minor / major).
682+
683+ **Where:** `just emit-schemas` recipe per repo; shared implementation in
684+ `developer-ecosystem/schema-snapshot-rs` (TBD).
685+
686+ **Why single-repo misses it:** Today a `pub enum` variant addition in the
687+ upstream is invisible to the downstream until someone imports the crate
688+ again. A schema snapshot makes the surface change a visible PR artifact.
689+
690+ **Applicability:** Every public-facing crate / library / protocol.
691+
692+ ==== CR-10. Stale-fixture-and-baseline decay alarm
693+
694+ **What:** Any test fixture, proof corpus sample, benchmark baseline, or
695+ regression gold file that has not been regenerated in N days (default 90)
696+ raises a decay alarm in the daily CI pulse. Bench baselines specifically
697+ decay faster (30 days) because upstream prover performance shifts.
698+
699+ **Where:** `standards/hypatia-rules/fixture-decay.a2ml` + per-repo
700+ `.machine_readable/fixtures.a2ml` manifest listing every fixture with its
701+ last-regenerated timestamp.
702+
703+ **Why single-repo misses it:** Tests keep passing against stale fixtures
704+ that no longer reflect real inputs. Benchmarks pass against baselines that
705+ no longer reflect current performance. Drift is slow and undetected.
706+
707+ **Applicability:** echidna (`training_data/*.jsonl`, `premises_merged.jsonl`,
708+ etc.); proven (Idris2 proof libraries); a2ml (grammar examples); any repo
709+ with a `fixtures/` or `corpora/` directory.
710+
711+ === Weak Points Spotted in the Current Taxonomy (2026-04-17)
712+
713+ Recorded at the same time as the cross-repo ideas above, things that feel
714+ under-specified in this document itself:
715+
716+ - **No "coupling test" category.** Tests that assert drift-freedom between
717+ two independent artefacts (mirror enums, redundant parsers, serde + custom
718+ Display, etc.) don't fit cleanly into the 16 categories. Proposal: add
719+ Category 17 "Coupling / Drift-freedom" in v1.1.
720+ - **Proof-regression is single-prover.** Category 16 catches "Coq no longer
721+ accepts proof P" but not "Coq and Lean give different counterexamples for
722+ the same formula". Cross-prover regression deserves its own sub-category.
723+ - **Benchmark taxonomy is single-process.** Six Sigma classification
724+ (§ Part IV) applies to one binary's hot path. There is no equivalent for
725+ end-to-end pipelines (panic-attack → hypatia → echidna → verisim) where
726+ the interesting latency is the tail across all four.
727+ - **CRG mapping doesn't include cross-repo rows.** § "CRG ↔ Test Category
728+ Mapping" maps repo-local categories to grades. A repo with mirror-enum
729+ drift or FFI roundtrip gaps has a visible-externally failure mode that
730+ its in-repo blitz will not catch. Proposal: a new "Ecosystem CRG" rollup
731+ grade that can only be A if all CR-* cross-repo checks pass.
732+ - **Dogfooding is listed (§ Part VII) but not adversarial.** Dogfooding
733+ asks "does the author use their own tool?"; CR-5 (claim-grounder
734+ adversarial cross-test) asks the much harder "can the tool fool its
735+ peers?". The latter is closer to real robustness.
736+ - **"Shared fixtures" are implied but never required.** Several repos
737+ (katagoria, a2ml, avow-protocol) should host canonical fixtures that
738+ every downstream consumes, but the taxonomy doesn't elevate this to a
739+ first-class obligation. CR-4 above proposes a fix.
740+ - **"Run time" categories don't include warm-boot vs cold-boot separation.**
741+ Many bugs (FFI init, plugin registry, GNN client handshake) only appear
742+ once per process lifetime. A cold-start test category is missing.
743+
496744== Appendix A: Test Category Quick Reference
497745
498746[cols="1,1,1"]
0 commit comments