Skip to content

Commit 2e95c6e

Browse files
hyperpolymathclaude
andcommitted
docs(testing): add cross-repo test ideas + weak-point audit to TESTING-TAXONOMY
Extends Part VI (Cross-Repo Blitz Rule) with ten concrete cross-repo test ideas, all catching failure modes that single-repo blitzes cannot. Added after the echidna TypeDiscipline transition exposed several drift classes that had been latent for weeks — the examples throughout the new section draw on what we found today. New subsections in Part VI: CR-1 Mirror-enum drift test CR-2 Foreign-enum exhaustive-match lint CR-3 FFI roundtrip bijection property CR-4 Shared-fixture corpus contract CR-5 Claim-grounder adversarial cross-test (dual-use Akerlof principle) CR-6 Upstream-HEAD sentinel in downstream CI CR-7 Workspace-aware blitz matrix (four-cell per-category) CR-8 Cross-prover proof-exchange roundtrip CR-9 Schema-surface drift detector CR-10 Stale-fixture-and-baseline decay alarm Each idea lists what it tests, where it lives, why single-repo tests miss it, and estate applicability. Concrete gaps observed today are called out inline so the document doubles as a diagnosis rather than just a checklist. Also adds a "Weak Points Spotted" appendix listing under-specified aspects of the current taxonomy itself: - No "coupling / drift-freedom" test category — proposed for v1.1. - Proof-regression is single-prover; no cross-prover counterexample drift. - Benchmark taxonomy is single-process; no end-to-end pipeline Six Sigma. - CRG mapping has no cross-repo rollup grade. - Dogfooding is listed but not adversarial (CR-5 is the upgrade). - Shared fixtures are implied but not a first-class obligation. - No warm-boot vs cold-boot separation in runtime tests. No structural reorganisation of the doc — all additions land inside the existing Part VI and before Appendix A. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 0368f96 commit 2e95c6e

1 file changed

Lines changed: 248 additions & 0 deletions

File tree

testing-and-benchmarking/TESTING-TAXONOMY.adoc

Lines changed: 248 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -493,6 +493,254 @@ repo-c ✓ - - ✓ - - - ✓ - - -
493493

494494
The system CRG grade is determined by the lowest grade in the matrix, not the average.
495495

496+
=== Cross-Repo Test Ideas (added 2026-04-17)
497+
498+
These tests span two or more repos and catch failure modes that single-repo blitzes
499+
cannot. Added after the echidna TypeDiscipline transition (commit `8f573f1`) exposed
500+
several drift classes that had been latent for weeks. Each idea lists: **what it tests**,
501+
**where it lives**, **why single-repo tests miss it**, and **estate applicability**.
502+
503+
==== CR-1. Mirror-enum drift test
504+
505+
**What:** Property test asserting that any enum mirrored across repos stays in sync
506+
with its upstream source of truth. Consumer generates its mirror at test-time from
507+
a shared manifest (JSON / A2ML) that the upstream emits, then diffs against its own
508+
hand-maintained copy. Fails fast when they diverge.
509+
510+
**Where:** In each consumer repo's integration-test directory, plus a companion
511+
manifest at `src/<mirror-enum>.manifest.a2ml` in the upstream.
512+
513+
**Why single-repo misses it:** Upstream adds a variant, downstream mirror stays
514+
stale, consumer keeps compiling because downstream has `_ =>` fallback — drift goes
515+
undetected until a user hits the fallback path with unexpected results.
516+
517+
**Applicability:** `ProverKind` (echidna -> echidnabot + protocol-squisher-bridge);
518+
`Severity` / `Tier` / `BotId` (gitbot-fleet/shared-context -> each bot); future
519+
`TypeDiscipline` (echidna -> hypatia, panic-attack); `NodeKind` / `EdgeKind` in
520+
GNN schemas (echidna -> any ML consumer).
521+
522+
**Concrete gap today:** `echidnabot` and `protocol-squisher-echidna-bridge` both
523+
carry local mirror enums that already lag behind echidna main; silent today because
524+
matches use `_ =>`, loud tomorrow when someone relies on a specific variant.
525+
526+
==== CR-2. Foreign-enum exhaustive-match lint
527+
528+
**What:** Custom clippy lint (or `cargo miri`-style analysis, or a grep-based CI
529+
job as MVP) that flags every `match` expression over an enum imported from an
530+
external crate that lacks an `_ =>` fallback. The grep MVP is one regex plus a list
531+
of extern-enum names.
532+
533+
**Where:** `developer-ecosystem/standards/hypatia-rules/lint-foreign-exhaustive.a2ml`
534+
as the declarative policy; enforced by hypatia in every downstream repo's CI.
535+
536+
**Why single-repo misses it:** An exhaustive match compiles fine today, but a
537+
one-variant addition in the upstream turns the downstream into a compile failure
538+
on the next `cargo build`. The issue surfaces as upstream's CI problem even though
539+
the fault is in the downstream's coupling decision.
540+
541+
**Applicability:** Every Rust repo with a path-dep on another hyperpolymath repo.
542+
Especially `echidna` consumers; any repo importing `katagoria::Discipline` once
543+
that exists; `gitbot-shared-context` consumers.
544+
545+
**Concrete gap today:** Echidna's `src/rust/main.rs` had three exhaustive matches
546+
over `ProverKind` that were silently wrong before the TypeDiscipline transition
547+
(they never covered the 12 original HP ecosystem variants) and broke loudly the
548+
moment 28 more were added. Nobody noticed for weeks.
549+
550+
==== CR-3. FFI roundtrip bijection property
551+
552+
**What:** For every FFI boundary that serialises a Rust enum to a numeric or
553+
string representation, a property test asserting `decode(encode(x)) == x` for
554+
every variant — not just a hand-picked subset. Run in both directions where
555+
both are exposed.
556+
557+
**Where:** The originating crate's `tests/ffi_roundtrip.rs`, generated from the
558+
enum's `ALL` const (see reference implementation in echidna
559+
`src/rust/disciplines/mod.rs`).
560+
561+
**Why single-repo misses it:** When a table only tests the variants the author
562+
remembers, it silently tolerates missing entries. echidna's `kind_from_u8` was
563+
missing codes 49–60 for weeks before the TypeDiscipline transition; the
564+
existing roundtrip test only covered codes 0–48 and explicitly *asserted*
565+
`kind_from_u8(49).is_none()`, locking in the bug.
566+
567+
**Applicability:** Every `*_to_u8` / `*_from_u8` or `*_to_str` / `*_from_str`
568+
pair. Every CBOR-encoded proof-state field. Every GraphQL / gRPC / REST enum
569+
bridge. Every Zig ABI that names variants.
570+
571+
==== CR-4. Shared-fixture corpus contract
572+
573+
**What:** Canonical golden fixtures live in ONE upstream repo (`katagoria/`,
574+
`standards/a2ml/examples/`, `echidna/training_data/`) and every downstream that
575+
consumes that format runs its parser / validator / type-checker over the
576+
*shared* corpus. When katagoria graduates a new discipline, the fixture is
577+
added once and every downstream's next CI run exercises it.
578+
579+
**Where:** Upstream hosts `fixtures/` with an index.a2ml. Downstream
580+
`tests/shared_corpus.rs` dereferences via path dep or git submodule.
581+
582+
**Why single-repo misses it:** Each consumer writes its own fixtures, they
583+
drift from the upstream format, and a format extension in the upstream doesn't
584+
fail any test until a real user trips over it.
585+
586+
**Applicability:** katagoria → echidna + typell + typed-wasm (type disciplines);
587+
a2ml → every repo that emits a2ml; avow-protocol → every repo that uses proven
588+
results; VeriSimDB → every repo that writes octads.
589+
590+
**Concrete gap today:** `katagoria/research/tropical/TropicalKleene.idr` is
591+
standalone; no downstream currently uses it as a fixture, so when tropical
592+
semantics shift, nothing breaks.
593+
594+
==== CR-5. Claim-grounder adversarial cross-test
595+
596+
**What:** Each claim-grounder (invariant-path, PLASMA, Hypatia, ECHIDNA,
597+
panic-attacker) is run against proofs produced by the others, specifically
598+
constructed to hide violations or to appear cleaner than they are. This makes
599+
the dual-use posture (per the Akerlof memory note) operational: a grounder
600+
that cannot detect another grounder's escape hatches is not actually grounded.
601+
602+
**Where:** `verification-ecosystem/<grounder>/tests/adversarial_cross/`
603+
plus a meta-repo `verification-ecosystem/meta-adversarial-suite/` that
604+
coordinates the round-robin and publishes a weekly summary.
605+
606+
**Why single-repo misses it:** Every grounder tests itself against
607+
hand-picked fixtures by default. The adversarial signal — "tool X can hide
608+
from tool Y" — is only visible when you explicitly set them against each
609+
other.
610+
611+
**Applicability:** All five listed grounders; any future grounder (invariant-path,
612+
PLASMA, Hypatia, ECHIDNA, panic-attacker) immediately on admission.
613+
614+
==== CR-6. Upstream-HEAD sentinel in downstream CI
615+
616+
**What:** Every downstream-of-a-path-dep repo runs a nightly CI job that
617+
pulls the upstream's `main` branch (overriding its Cargo.toml path dep or
618+
submodule) and runs the full test suite. Result emitted as a status check
619+
on the upstream's recent PRs via Hypatia / gitbot-fleet webhook.
620+
621+
**Where:** `.github/workflows/upstream-head-sentinel.yml` in downstream
622+
repos; Hypatia rule in `standards/hypatia-rules/upstream-sentinel.a2ml`.
623+
624+
**Why single-repo misses it:** The upstream's CI is green, the downstream's
625+
CI against the pinned version is green, but a merge that would break the
626+
downstream is invisible until the pin bumps — often weeks later.
627+
628+
**Applicability:** `echidna` consumers (echidnabot, panic-attack, hypatia,
629+
protocol-squisher-echidna-bridge); `shared-context` -> every gitbot;
630+
`a2ml-rs` -> every repo consuming A2ML; `proven` -> every repo consuming a
631+
formally-verified alternative.
632+
633+
**Concrete gap today:** Today's echidna commit `8f573f1` added 28 new
634+
ProverKind variants. Downstream echidnabot silently continues to compile
635+
(for now) but its dispatcher is mis-calibrated. An upstream-HEAD sentinel
636+
would have pinged immediately.
637+
638+
==== CR-7. Workspace-aware blitz matrix
639+
640+
**What:** The blitz matrix (§ Part VI) distinguishes four cells per (repo,
641+
category): `in-isolation` (cargo test with this member only), `in-workspace`
642+
(cargo test at workspace root), `workspace-root-with-bins` (cargo test
643+
`--all-targets`), and `cross-workspace` (downstream workspace importing this
644+
as a path dep). Many bugs only appear in one of the four.
645+
646+
**Where:** An enriched matrix format in `.machine_readable/testing/blitz.a2ml`
647+
per repo, rolled up at the ecosystem level in `standards/dashboards/blitz.a2ml`.
648+
649+
**Why single-repo misses it:** `cargo test --lib` can pass while `cargo test
650+
--all-targets` fails on the same enum, because bin / test-target exhaustive
651+
matches are outside the lib's compilation unit. echidna hit exactly this today.
652+
653+
**Applicability:** All workspace-member repos (echidna, gitbot-fleet,
654+
developer-ecosystem monorepo, nextgen-languages monorepo, nextgen-databases).
655+
656+
==== CR-8. Cross-prover proof-exchange roundtrip
657+
658+
**What:** For every proof-interchange format (OpenTheory, Dedukti, TSTP,
659+
Alethe, DRAT/LRAT, custom CBOR), a roundtrip test: prove theorem T in prover
660+
A, export to format F, re-import to prover B, re-check T. Must hold across
661+
{Coq, Lean, HOL4, HOL Light, Isabelle} × {OpenTheory, Dedukti} pairs at
662+
minimum.
663+
664+
**Where:** `echidna/tests/proof_exchange_roundtrip.rs` with the heavy lifting
665+
delegated to the `src/rust/exchange/` module. Parallel suite in any future
666+
proof-exchange host.
667+
668+
**Why single-repo misses it:** The exchange module has unit tests for each
669+
direction independently; nothing today asserts semantic preservation across
670+
the roundtrip.
671+
672+
**Applicability:** `echidna/src/rust/exchange/`; any repo that ships an
673+
exchange format bridge.
674+
675+
==== CR-9. Schema-surface drift detector
676+
677+
**What:** Every public surface (Rust enum, JSON schema, A2ML schema, GraphQL
678+
type, Zig ABI) emits a machine-readable schema snapshot at build time to
679+
`.schemas/<surface>.a2ml`. Every PR's CI computes the new snapshot and diffs
680+
against the merged-main version; any change is flagged with a SEMVER
681+
classification (patch / minor / major).
682+
683+
**Where:** `just emit-schemas` recipe per repo; shared implementation in
684+
`developer-ecosystem/schema-snapshot-rs` (TBD).
685+
686+
**Why single-repo misses it:** Today a `pub enum` variant addition in the
687+
upstream is invisible to the downstream until someone imports the crate
688+
again. A schema snapshot makes the surface change a visible PR artifact.
689+
690+
**Applicability:** Every public-facing crate / library / protocol.
691+
692+
==== CR-10. Stale-fixture-and-baseline decay alarm
693+
694+
**What:** Any test fixture, proof corpus sample, benchmark baseline, or
695+
regression gold file that has not been regenerated in N days (default 90)
696+
raises a decay alarm in the daily CI pulse. Bench baselines specifically
697+
decay faster (30 days) because upstream prover performance shifts.
698+
699+
**Where:** `standards/hypatia-rules/fixture-decay.a2ml` + per-repo
700+
`.machine_readable/fixtures.a2ml` manifest listing every fixture with its
701+
last-regenerated timestamp.
702+
703+
**Why single-repo misses it:** Tests keep passing against stale fixtures
704+
that no longer reflect real inputs. Benchmarks pass against baselines that
705+
no longer reflect current performance. Drift is slow and undetected.
706+
707+
**Applicability:** echidna (`training_data/*.jsonl`, `premises_merged.jsonl`,
708+
etc.); proven (Idris2 proof libraries); a2ml (grammar examples); any repo
709+
with a `fixtures/` or `corpora/` directory.
710+
711+
=== Weak Points Spotted in the Current Taxonomy (2026-04-17)
712+
713+
Recorded at the same time as the cross-repo ideas above, things that feel
714+
under-specified in this document itself:
715+
716+
- **No "coupling test" category.** Tests that assert drift-freedom between
717+
two independent artefacts (mirror enums, redundant parsers, serde + custom
718+
Display, etc.) don't fit cleanly into the 16 categories. Proposal: add
719+
Category 17 "Coupling / Drift-freedom" in v1.1.
720+
- **Proof-regression is single-prover.** Category 16 catches "Coq no longer
721+
accepts proof P" but not "Coq and Lean give different counterexamples for
722+
the same formula". Cross-prover regression deserves its own sub-category.
723+
- **Benchmark taxonomy is single-process.** Six Sigma classification
724+
(§ Part IV) applies to one binary's hot path. There is no equivalent for
725+
end-to-end pipelines (panic-attack → hypatia → echidna → verisim) where
726+
the interesting latency is the tail across all four.
727+
- **CRG mapping doesn't include cross-repo rows.** § "CRG ↔ Test Category
728+
Mapping" maps repo-local categories to grades. A repo with mirror-enum
729+
drift or FFI roundtrip gaps has a visible-externally failure mode that
730+
its in-repo blitz will not catch. Proposal: a new "Ecosystem CRG" rollup
731+
grade that can only be A if all CR-* cross-repo checks pass.
732+
- **Dogfooding is listed (§ Part VII) but not adversarial.** Dogfooding
733+
asks "does the author use their own tool?"; CR-5 (claim-grounder
734+
adversarial cross-test) asks the much harder "can the tool fool its
735+
peers?". The latter is closer to real robustness.
736+
- **"Shared fixtures" are implied but never required.** Several repos
737+
(katagoria, a2ml, avow-protocol) should host canonical fixtures that
738+
every downstream consumes, but the taxonomy doesn't elevate this to a
739+
first-class obligation. CR-4 above proposes a fix.
740+
- **"Run time" categories don't include warm-boot vs cold-boot separation.**
741+
Many bugs (FFI init, plugin registry, GNN client handshake) only appear
742+
once per process lifetime. A cold-start test category is missing.
743+
496744
== Appendix A: Test Category Quick Reference
497745

498746
[cols="1,1,1"]

0 commit comments

Comments
 (0)