feat(validation): unified leakage_probes module (PR 3.1)#65
Merged
Conversation
Subsume `relational_leakage.py` and broaden it to the full design-doc /
acceptance-gates leakage taxonomy. PR 3.2's `release_quality.py` and PR
3.3's `validate_release_candidate.py` driver consume this module.
Probe families
--------------
* Direct (G4.1-G4.3) — `probe_banned_columns`, `probe_banned_tables`.
Generalised to accept caller-supplied banned sets so non-relational
publication channels (Kaggle / HF / instructor companion) can declare
their own without depending on the snapshot-safe defaults.
* Time-window (G4.4) — `probe_snapshot_window`. Generalised over
`(table, ts_col)` pairs so future event tables can register their
own anchor.
* Relational (G4.5) — `probe_deterministic_reconstruction`,
`deterministic_relational_reconstruction`. Lifted unchanged.
* Split (G6.1-G6.3) — `probe_split_id_overlap` (lead/account/contact
cross-split overlap), `probe_split_near_duplicates` (deterministic
rounded-vector hashing — no sklearn dep, no O(N²)),
`probe_split_label_drift` (opt-in).
* Model-realism (G5.1-G5.4) — `probe_bonus_model_auc` (lifted, opt-in),
new opt-in `probe_id_only_baseline` (G5.3, hashes IDs to bound
feature width), `probe_feature_subset_baseline` (G5.1/G5.2 umbrella;
HistGBM only — handles NaN natively, LR would not).
`PROBE_REGISTRY` is the single source of truth (probe → taxonomy /
opt-in flag); meta-test asserts every module-level `probe_*` is
registered, so a future "I added a probe but forgot to wire it"
regression fails loudly.
Two orchestrators: `run_all_probes` / `run_all_probes_on_dataframes`
(structural; signature unchanged so `validate_bundle` keeps working
without touching `bundle_checks.py` semantics) and new
`run_split_probes` (split-level over `{split_name: DataFrame}`).
Migration
---------
`leadforge/validation/relational_leakage.py` deleted; every internal
call site updated:
* `leadforge/validation/{bundle_checks,invariants}.py`
* `leadforge/render/{manifests,relational_snapshot_safe}.py`
* `leadforge/exposure/filters.py` (docstring only)
* `scripts/probe_relational_leakage.py`
`tests/validation/test_relational_leakage.py` renamed via `git mv`
to `test_leakage_probes.py`; existing 33 tests preserved verbatim
plus 24 new tests for the new probes + meta-coverage.
`RelationalLeakageError` retained (now spans every taxonomy in the
module, not just relational join reconstruction) with `LeakageError`
alias for the new umbrella name.
Acceptance
----------
* `BUNDLE_SCHEMA_VERSION` unchanged at "5" — PR 3.1 is purely additive
on the validator side; on-disk bundle shape is unchanged.
* 1067/1067 tests pass; ruff + mypy clean.
* `scripts/probe_relational_leakage.py release/{intro,intermediate,advanced}
--max-accuracy 0.65` exits 0 on every public tier (path E rate = 0.000).
* `scripts/verify_hash_determinism.py` PASS — 67/67 files identical
across pinned-timestamp builds.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
`LeakageError = RelationalLeakageError` was exported in `__all__` but referenced nowhere internally — exactly the kind of dead alias the project guidance flags as backwards-compat noise. Drop it; callers that want the new umbrella name can import `RelationalLeakageError` directly (the docstring already documents that it spans every taxonomy in the module, not just relational join reconstruction). Also tidy the header comment above `deterministic_relational_reconstruction` — the PR 2.1 → 3.1 hop made "Lifted from PR 1.1's ``scripts/...``" historically muddled; the script-side re-export is the load-bearing fact and was already noted. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
This PR consolidates the prior relational_leakage validation logic into a new unified leadforge.validation.leakage_probes module, expanding coverage to a broader leakage taxonomy (direct, time-window, relational, split, and model-realism) and updating internal call sites/tests accordingly as part of the v1 release validation hardening sequence.
Changes:
- Introduces
leadforge/validation/leakage_probes.pyimplementing the unified probe taxonomy, probe registry, and bundle/split orchestrators. - Migrates existing imports/call sites from
leadforge.validation.relational_leakageto the new module and deletes the old module. - Expands/renames validation tests to cover new split/model-realism probes and adds registry meta-tests.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
leadforge/validation/leakage_probes.py |
New unified leakage probe implementations + registry + orchestrators; replaces the deleted module. |
leadforge/validation/bundle_checks.py |
Switches bundle validation wiring to import/run probes from leakage_probes. |
leadforge/validation/invariants.py |
Updates snapshot-safe contract imports to the new module. |
leadforge/render/relational_snapshot_safe.py |
Updates documentation/imports for snapshot-safe contract constants source. |
leadforge/render/manifests.py |
Updates manifest redaction-constant imports to the new module. |
leadforge/exposure/filters.py |
Updates docstring references to contract constants in the new module. |
scripts/probe_relational_leakage.py |
Re-exports deterministic reconstruction from the new canonical module location. |
tests/validation/test_leakage_probes.py |
Renamed/expanded tests to cover split/model-realism probes and registry meta-coverage. |
tests/integration/test_snapshot_safe_bundle.py |
Updates imports to the new module constants. |
tests/render/test_bundle_schema_v5_contract.py |
Updates reference in comments to the new module. |
leadforge/validation/relational_leakage.py |
Deleted (superseded by leakage_probes). |
.agent-plan.md |
Updates phase plan status to reflect PR 3.1 completion. |
Comments suppressed due to low confidence (2)
tests/validation/test_leakage_probes.py:802
- This test’s name/docstring says the probe “fires” / “must catch it”, but the assertions only check that it returns a list (and the comment says it won’t always fire). Either assert the expected finding (make the synthetic data reliably separable under the hashing approach) or rename/reword the test to reflect that it’s only a smoke test for wiring.
tests/validation/test_leakage_probes.py:946 - Docstring says “four documented taxonomies”, but the validation set includes five (direct/time_window/relational/split/model_realism). Update the docstring to match the actual taxonomy list.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Seven small fixes from the PR review: leakage_probes.py * Module docstring referenced a meta-test name that didn't exist (``test_probe_registry_covers_all_probes``); rename to the actual ``test_probe_registry_covers_every_module_level_probe``. * `probe_split_id_overlap` and `probe_split_near_duplicates` were slicing a `set` before sorting (``list(shared)[:N]``); set iteration order is implementation-defined, so the resulting ``sample=...`` message field would drift across runs. Sort first, then slice. * `probe_split_near_duplicates` claimed to skip when no requested columns were numeric, but actually coerced everything to NaN and then stringified — yielding a single saturating ``"nan|nan|..."`` signature that collided as a false-positive across splits. Drop rows whose entire signature is NaN before intersecting and update the docstring to match. * `_hash_id_columns` docstring said "32-bit integers" but the dtype was int64; clarify as "32-bit hashes (stored as int64)". test_leakage_probes.py * `test_probe_id_only_baseline_fires_when_id_encodes_label` claimed the probe must catch the leak but only asserted ``isinstance(..., list)`` — a "fires" demonstration is structurally impossible from hashed lead_ids alone (independent train/test hashes ⇒ HistGBM can't generalise). Rename to ``_runs_cleanly_on_partitioned_ids``, document why a positive-fire test isn't possible from this synthetic data, and point at the feature-subset baseline test for the actual fire path. * Meta-test docstring said "four documented taxonomies" but the validation set has five (direct/time_window/relational/split/ model_realism); fix the prose. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
pr-agent-context report: No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #65 in repository https://github.com/leadforge-dev/leadforge. Treat this PR as all clear unless new signals appear.Run metadata: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PR 3.1 of the v1 dataset release sequence — first of three Phase-3 PRs (release validation hardening). Lays the foundation that PR 3.2 (
release_quality.py+reporting.py) and PR 3.3 (validate_release_candidate.pydriver) build on.Subsumes the PR 2.1
relational_leakage.pymodule and broadens it to the full leakage taxonomy fromdocs/release/v1_release_design.md/docs/release/v1_acceptance_gates.md: direct, time-window, relational, split, and model-realism. Every public-bundle leakage gate G4–G6 has a probe in this module.What's new
leadforge/validation/leakage_probes.py(new, ~600 LOC + ~1k LOC tests)Direct (G4.1-G4.3).
probe_banned_columns,probe_banned_tables. Lifted from PR 2.1 and generalised to accept caller-supplied banned sets — non-relational publication channels (Kaggle / HF / instructor companion) can declare their own without depending on the snapshot-safe defaults.Time-window (G4.4).
probe_snapshot_window. Lifted and generalised over(table, ts_col)pairs so future event tables can register their own anchor.Relational (G4.5).
probe_deterministic_reconstruction,deterministic_relational_reconstruction. Lifted unchanged.Split (G6.1-G6.3, NEW).
probe_split_id_overlap— lead/account/contact cross-split overlap;probe_split_near_duplicates— deterministic rounded-vector hashing (no sklearn dep, no O(N²); under-reports rather than over-reports — documented in the docstring);probe_split_label_drift— opt-in.Model-realism (G5.1-G5.4, NEW). All opt-in with required
max_auc=...(PR 3.3 supplies per-tier bands).probe_bonus_model_auc— lifted from PR 2.1 (also opt-in);probe_id_only_baseline(G5.3) — hashes IDs to bound feature width and avoid memorisation; HistGBM only;probe_feature_subset_baseline(G5.1/G5.2 umbrella) — caller declares the suspect column subset; HistGBM only (handles NaN natively, LR would not).Probe registry & meta-test.
PROBE_REGISTRY: Mapping[str, ProbeSpec]is the single source of truth (probe → taxonomy / opt-in flag). Meta-testtest_probe_registry_covers_every_module_level_probeenforces that any new module-levelprobe_*function is registered, so the "I-added-a-probe-but-forgot-to-wire-it" regression fails loudly.Orchestrators.
run_all_probes/run_all_probes_on_dataframes— structural, signature unchanged sovalidate_bundlekeeps working without touchingbundle_checks.pysemantics. Bonus probe stays opt-in.run_split_probes(new) — split-level over{split_name: DataFrame}. PR 3.2/3.3 will plumb this through the release-quality driver.Migration
leadforge/validation/relational_leakage.pydeleted. Every internal call site updated:leadforge/validation/{bundle_checks,invariants}.pyleadforge/render/{manifests,relational_snapshot_safe}.pyleadforge/exposure/filters.py(docstring only)scripts/probe_relational_leakage.pytests/validation/test_relational_leakage.pyrenamed viagit mvtotest_leakage_probes.py(preserves blame); existing 33 tests preserved verbatim plus 24 new tests for the new probes + meta-coverage.RelationalLeakageErrorretained — it now spans every taxonomy in the module, not just relational join reconstruction;LeakageErroralias added for the new umbrella name.What's NOT in scope
validate_bundle(PR 3.3 once bands are calibrated).release_quality.py+reporting.py(PR 3.2).scripts/validate_release_candidate.py+ resolvingTBD-*gates inv1_acceptance_gates.md(PR 3.3).Test plan
pytest— 1067/1067 pass (994 pre-PR + 24 new probe tests + 8 generalisation tests + 5 meta-tests; one renamed; the rest unaffected).ruff check— clean.ruff format --check— clean.mypy leadforge/— clean (80 source files).scripts/probe_relational_leakage.py release/{intro,intermediate,advanced} --max-accuracy 0.65— all three exit 0 (path A-E rates = 0.000).scripts/verify_hash_determinism.py— PASS, 67/67 files identical.BUNDLE_SCHEMA_VERSIONunchanged at"5"— PR is purely additive on the validator side; on-disk bundle shape is unchanged.tests/integration/test_snapshot_safe_bundle.py(existing) still validates clean bundles via the renamed module.Risks / non-obvious choices
RelationalLeakageErroras the canonical exception class. Renaming would cascade through everyexceptclause for negligible benefit; the docstring now explicitly documents that it spans every taxonomy.probe_id_only_baselinehashes IDs to int64 instead of one-hot encoding. One-hot would explode the dimension and guarantee AUC ≈ 1.0 on train via memorisation, defeating the probe's purpose.🤖 Generated with Claude Code