feat(validation): unified leakage_probes module (PR 3.1) by shaypal5 · Pull Request #65 · leadforge-dev/leadforge

shaypal5 · 2026-05-05T20:50:30Z

Summary

PR 3.1 of the v1 dataset release sequence — first of three Phase-3 PRs (release validation hardening). Lays the foundation that PR 3.2 (release_quality.py + reporting.py) and PR 3.3 (validate_release_candidate.py driver) build on.

Subsumes the PR 2.1 relational_leakage.py module and broadens it to the full leakage taxonomy from docs/release/v1_release_design.md / docs/release/v1_acceptance_gates.md: direct, time-window, relational, split, and model-realism. Every public-bundle leakage gate G4–G6 has a probe in this module.

What's new

`leadforge/validation/leakage_probes.py` (new, ~600 LOC + ~1k LOC tests)

Direct (G4.1-G4.3). probe_banned_columns, probe_banned_tables. Lifted from PR 2.1 and generalised to accept caller-supplied banned sets — non-relational publication channels (Kaggle / HF / instructor companion) can declare their own without depending on the snapshot-safe defaults.

Time-window (G4.4). probe_snapshot_window. Lifted and generalised over (table, ts_col) pairs so future event tables can register their own anchor.

Relational (G4.5). probe_deterministic_reconstruction, deterministic_relational_reconstruction. Lifted unchanged.

Split (G6.1-G6.3, NEW).

probe_split_id_overlap — lead/account/contact cross-split overlap;
probe_split_near_duplicates — deterministic rounded-vector hashing (no sklearn dep, no O(N²); under-reports rather than over-reports — documented in the docstring);
probe_split_label_drift — opt-in.

Model-realism (G5.1-G5.4, NEW). All opt-in with required max_auc=... (PR 3.3 supplies per-tier bands).

probe_bonus_model_auc — lifted from PR 2.1 (also opt-in);
probe_id_only_baseline (G5.3) — hashes IDs to bound feature width and avoid memorisation; HistGBM only;
probe_feature_subset_baseline (G5.1/G5.2 umbrella) — caller declares the suspect column subset; HistGBM only (handles NaN natively, LR would not).

Probe registry & meta-test. PROBE_REGISTRY: Mapping[str, ProbeSpec] is the single source of truth (probe → taxonomy / opt-in flag). Meta-test test_probe_registry_covers_every_module_level_probe enforces that any new module-level probe_* function is registered, so the "I-added-a-probe-but-forgot-to-wire-it" regression fails loudly.

Orchestrators.

run_all_probes / run_all_probes_on_dataframes — structural, signature unchanged so validate_bundle keeps working without touching bundle_checks.py semantics. Bonus probe stays opt-in.
run_split_probes (new) — split-level over {split_name: DataFrame}. PR 3.2/3.3 will plumb this through the release-quality driver.

Migration

leadforge/validation/relational_leakage.py deleted. Every internal call site updated:

leadforge/validation/{bundle_checks,invariants}.py
leadforge/render/{manifests,relational_snapshot_safe}.py
leadforge/exposure/filters.py (docstring only)
scripts/probe_relational_leakage.py

tests/validation/test_relational_leakage.py renamed via git mv to test_leakage_probes.py (preserves blame); existing 33 tests preserved verbatim plus 24 new tests for the new probes + meta-coverage.

RelationalLeakageError retained — it now spans every taxonomy in the module, not just relational join reconstruction; LeakageError alias added for the new umbrella name.

What's NOT in scope

Calibrating per-tier bands for the opt-in model-realism probes (PR 3.3).
Wiring split / model-realism probes into validate_bundle (PR 3.3 once bands are calibrated).
release_quality.py + reporting.py (PR 3.2).
scripts/validate_release_candidate.py + resolving TBD-* gates in v1_acceptance_gates.md (PR 3.3).

Test plan

pytest — 1067/1067 pass (994 pre-PR + 24 new probe tests + 8 generalisation tests + 5 meta-tests; one renamed; the rest unaffected).
ruff check — clean.
ruff format --check — clean.
mypy leadforge/ — clean (80 source files).
scripts/probe_relational_leakage.py release/{intro,intermediate,advanced} --max-accuracy 0.65 — all three exit 0 (path A-E rates = 0.000).
scripts/verify_hash_determinism.py — PASS, 67/67 files identical.
BUNDLE_SCHEMA_VERSION unchanged at "5" — PR is purely additive on the validator side; on-disk bundle shape is unchanged.
Round-trip integration: tests/integration/test_snapshot_safe_bundle.py (existing) still validates clean bundles via the renamed module.

Risks / non-obvious choices

Kept RelationalLeakageError as the canonical exception class. Renaming would cascade through every except clause for negligible benefit; the docstring now explicitly documents that it spans every taxonomy.
Near-duplicate detection uses rounded-vector hashing, not cosine similarity. Cosine machinery would flake on sparse one-hot data and drift across sklearn versions; rounded hashing is reproducible, deterministic, and orthogonal to model choice. The docstring is explicit that it under-reports rather than over-reports.
probe_id_only_baseline hashes IDs to int64 instead of one-hot encoding. One-hot would explode the dimension and guarantee AUC ≈ 1.0 on train via memorisation, defeating the probe's purpose.
Model-realism probes use HistGBM, not LR. HistGBM handles NaN natively; the probes are calibrated baselines, not pipeline replicas.

🤖 Generated with Claude Code

Subsume `relational_leakage.py` and broaden it to the full design-doc / acceptance-gates leakage taxonomy. PR 3.2's `release_quality.py` and PR 3.3's `validate_release_candidate.py` driver consume this module. Probe families -------------- * Direct (G4.1-G4.3) — `probe_banned_columns`, `probe_banned_tables`. Generalised to accept caller-supplied banned sets so non-relational publication channels (Kaggle / HF / instructor companion) can declare their own without depending on the snapshot-safe defaults. * Time-window (G4.4) — `probe_snapshot_window`. Generalised over `(table, ts_col)` pairs so future event tables can register their own anchor. * Relational (G4.5) — `probe_deterministic_reconstruction`, `deterministic_relational_reconstruction`. Lifted unchanged. * Split (G6.1-G6.3) — `probe_split_id_overlap` (lead/account/contact cross-split overlap), `probe_split_near_duplicates` (deterministic rounded-vector hashing — no sklearn dep, no O(N²)), `probe_split_label_drift` (opt-in). * Model-realism (G5.1-G5.4) — `probe_bonus_model_auc` (lifted, opt-in), new opt-in `probe_id_only_baseline` (G5.3, hashes IDs to bound feature width), `probe_feature_subset_baseline` (G5.1/G5.2 umbrella; HistGBM only — handles NaN natively, LR would not). `PROBE_REGISTRY` is the single source of truth (probe → taxonomy / opt-in flag); meta-test asserts every module-level `probe_*` is registered, so a future "I added a probe but forgot to wire it" regression fails loudly. Two orchestrators: `run_all_probes` / `run_all_probes_on_dataframes` (structural; signature unchanged so `validate_bundle` keeps working without touching `bundle_checks.py` semantics) and new `run_split_probes` (split-level over `{split_name: DataFrame}`). Migration --------- `leadforge/validation/relational_leakage.py` deleted; every internal call site updated: * `leadforge/validation/{bundle_checks,invariants}.py` * `leadforge/render/{manifests,relational_snapshot_safe}.py` * `leadforge/exposure/filters.py` (docstring only) * `scripts/probe_relational_leakage.py` `tests/validation/test_relational_leakage.py` renamed via `git mv` to `test_leakage_probes.py`; existing 33 tests preserved verbatim plus 24 new tests for the new probes + meta-coverage. `RelationalLeakageError` retained (now spans every taxonomy in the module, not just relational join reconstruction) with `LeakageError` alias for the new umbrella name. Acceptance ---------- * `BUNDLE_SCHEMA_VERSION` unchanged at "5" — PR 3.1 is purely additive on the validator side; on-disk bundle shape is unchanged. * 1067/1067 tests pass; ruff + mypy clean. * `scripts/probe_relational_leakage.py release/{intro,intermediate,advanced} --max-accuracy 0.65` exits 0 on every public tier (path E rate = 0.000). * `scripts/verify_hash_determinism.py` PASS — 67/67 files identical across pinned-timestamp builds. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

`LeakageError = RelationalLeakageError` was exported in `__all__` but referenced nowhere internally — exactly the kind of dead alias the project guidance flags as backwards-compat noise. Drop it; callers that want the new umbrella name can import `RelationalLeakageError` directly (the docstring already documents that it spans every taxonomy in the module, not just relational join reconstruction). Also tidy the header comment above `deterministic_relational_reconstruction` — the PR 2.1 → 3.1 hop made "Lifted from PR 1.1's ``scripts/...``" historically muddled; the script-side re-export is the load-bearing fact and was already noted. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

This PR consolidates the prior relational_leakage validation logic into a new unified leadforge.validation.leakage_probes module, expanding coverage to a broader leakage taxonomy (direct, time-window, relational, split, and model-realism) and updating internal call sites/tests accordingly as part of the v1 release validation hardening sequence.

Changes:

Introduces leadforge/validation/leakage_probes.py implementing the unified probe taxonomy, probe registry, and bundle/split orchestrators.
Migrates existing imports/call sites from leadforge.validation.relational_leakage to the new module and deletes the old module.
Expands/renames validation tests to cover new split/model-realism probes and adds registry meta-tests.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`leadforge/validation/leakage_probes.py`	New unified leakage probe implementations + registry + orchestrators; replaces the deleted module.
`leadforge/validation/bundle_checks.py`	Switches bundle validation wiring to import/run probes from `leakage_probes`.
`leadforge/validation/invariants.py`	Updates snapshot-safe contract imports to the new module.
`leadforge/render/relational_snapshot_safe.py`	Updates documentation/imports for snapshot-safe contract constants source.
`leadforge/render/manifests.py`	Updates manifest redaction-constant imports to the new module.
`leadforge/exposure/filters.py`	Updates docstring references to contract constants in the new module.
`scripts/probe_relational_leakage.py`	Re-exports deterministic reconstruction from the new canonical module location.
`tests/validation/test_leakage_probes.py`	Renamed/expanded tests to cover split/model-realism probes and registry meta-coverage.
`tests/integration/test_snapshot_safe_bundle.py`	Updates imports to the new module constants.
`tests/render/test_bundle_schema_v5_contract.py`	Updates reference in comments to the new module.
`leadforge/validation/relational_leakage.py`	Deleted (superseded by `leakage_probes`).
`.agent-plan.md`	Updates phase plan status to reflect PR 3.1 completion.

Comments suppressed due to low confidence (2)

tests/validation/test_leakage_probes.py:802

This test’s name/docstring says the probe “fires” / “must catch it”, but the assertions only check that it returns a list (and the comment says it won’t always fire). Either assert the expected finding (make the synthetic data reliably separable under the hashing approach) or rename/reword the test to reflect that it’s only a smoke test for wiring.
tests/validation/test_leakage_probes.py:946
Docstring says “four documented taxonomies”, but the validation set includes five (direct/time_window/relational/split/model_realism). Update the docstring to match the actual taxonomy list.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Seven small fixes from the PR review: leakage_probes.py * Module docstring referenced a meta-test name that didn't exist (``test_probe_registry_covers_all_probes``); rename to the actual ``test_probe_registry_covers_every_module_level_probe``. * `probe_split_id_overlap` and `probe_split_near_duplicates` were slicing a `set` before sorting (``list(shared)[:N]``); set iteration order is implementation-defined, so the resulting ``sample=...`` message field would drift across runs. Sort first, then slice. * `probe_split_near_duplicates` claimed to skip when no requested columns were numeric, but actually coerced everything to NaN and then stringified — yielding a single saturating ``"nan|nan|..."`` signature that collided as a false-positive across splits. Drop rows whose entire signature is NaN before intersecting and update the docstring to match. * `_hash_id_columns` docstring said "32-bit integers" but the dtype was int64; clarify as "32-bit hashes (stored as int64)". test_leakage_probes.py * `test_probe_id_only_baseline_fires_when_id_encodes_label` claimed the probe must catch the leak but only asserted ``isinstance(..., list)`` — a "fires" demonstration is structurally impossible from hashed lead_ids alone (independent train/test hashes ⇒ HistGBM can't generalise). Rename to ``_runs_cleanly_on_partitioned_ids``, document why a positive-fire test isn't possible from this synthetic data, and point at the feature-subset baseline test for the actual fire path. * Meta-test docstring said "four documented taxonomies" but the validation set has five (direct/time_window/relational/split/ model_realism); fix the prose. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions · 2026-05-05T21:08:39Z

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #65 in repository https://github.com/leadforge-dev/leadforge. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25402374833 attempt 1
Comment timestamp: 2026-05-05T21:07:49.110388+00:00
PR head commit: 8ad4465200d4ceb18600a94534ef7f5936962e07

Copilot AI review requested due to automatic review settings May 5, 2026 20:50

shaypal5 added this to the dataset: leadforge-lead-scoring-v1 milestone May 5, 2026

shaypal5 added type: feature New capability layer: validation validation/ invariants and checks labels May 5, 2026

Copilot started reviewing on behalf of shaypal5 May 5, 2026 20:51 View session

This comment has been minimized.

Sign in to view

Copilot AI reviewed May 5, 2026

View reviewed changes

Comment thread leadforge/validation/leakage_probes.py Outdated

Comment thread leadforge/validation/leakage_probes.py

Comment thread leadforge/validation/leakage_probes.py

Comment thread leadforge/validation/leakage_probes.py

Comment thread leadforge/validation/leakage_probes.py

shaypal5 merged commit 8ad9260 into main May 5, 2026
8 checks passed

shaypal5 deleted the feat/leakage-probes-module branch May 5, 2026 21:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(validation): unified leakage_probes module (PR 3.1)#65

feat(validation): unified leakage_probes module (PR 3.1)#65
shaypal5 merged 3 commits into
mainfrom
feat/leakage-probes-module

shaypal5 commented May 5, 2026

Uh oh!

This comment has been minimized.

This comment has been minimized.

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shaypal5 commented May 5, 2026

Summary

What's new

leadforge/validation/leakage_probes.py (new, ~600 LOC + ~1k LOC tests)

Migration

What's NOT in scope

Test plan

Risks / non-obvious choices

Uh oh!

This comment has been minimized.

This comment has been minimized.

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

`leadforge/validation/leakage_probes.py` (new, ~600 LOC + ~1k LOC tests)