feat: Milestone 11 — validation harness (invariants, realism, difficulty, drift) by shaypal5 · Pull Request #18 · leadforge-dev/leadforge

shaypal5 · 2026-04-29T07:34:54Z

Summary

- `validation/invariants.py`: determinism check (same seed → identical SHA-256 hashes across all Parquet files) and exposure monotonicity (student_public ⊂ research_instructor across metadata/, tables, tasks, and core files)
- `validation/realism.py`: conversion-rate bounds (1%–95%), non-empty core tables, non-negative counts, valid booleans, multi-stage distribution diversity
- `validation/difficulty.py`: known-difficulty validation from the manifest plus a difficulty-ordering check (currently a no-op until the engine modulates output by difficulty)
- `validation/drift.py`: cross-seed stability: conversion-rate spread (max/min < 5×), degenerate-seed detection (0% or 100%), and stage diversity across seeds
- `validation/bundle_checks.py`: `validate_bundle()` gains an `include_realism` flag to optionally run realism + difficulty checks alongside structural validation
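The determinism invariant boils down to hashing every Parquet file in two bundles built from the same seed and diffing the digests. Below is a minimal standard-library sketch, assuming a bundle is a plain directory tree; `sha256_file`, `parquet_digests`, and `check_determinism_sketch` are hypothetical names, not the actual `invariants.py` API. The error strings mirror the ones visible in the review diff.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large Parquet files are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parquet_digests(root: Path) -> dict[str, str]:
    """Map every Parquet file under root (POSIX path relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*.parquet"))
    }


def check_determinism_sketch(bundle_a: Path, bundle_b: Path) -> list[str]:
    """Compare two same-seed bundles; an empty list means byte-identical Parquet files."""
    errors: list[str] = []
    a, b = parquet_digests(bundle_a), parquet_digests(bundle_b)
    for rel in sorted(a.keys() | b.keys()):
        if rel not in a or rel not in b:
            errors.append(f"File '{rel}' exists in one bundle but not the other")
        elif a[rel] != b[rel]:
            errors.append(f"Hash mismatch: {rel}")
    return errors
```

Hashing in 1 MiB chunks keeps memory flat regardless of table size, and the same loop extends naturally to the non-Parquet core files (manifest.json, dataset_card.md, feature_dictionary.csv) that a later commit in this PR also compares.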

18 new validation tests; total 581 passing. Ruff + mypy clean.

Test plan

🤖 Generated with Claude Code

…lty, drift)

Add four specialised validation modules and wire them into the existing bundle_checks orchestrator:

- invariants.py: determinism check (same seed → identical SHA-256 hashes) and exposure monotonicity (student_public ⊂ research_instructor)
- realism.py: conversion-rate bounds, non-empty core tables, non-negative counts, valid booleans, multi-stage distribution diversity
- difficulty.py: known-difficulty validation + ordering check (no-op until engine modulates by difficulty)
- drift.py: cross-seed stability (conversion rate spread, degenerate seed detection, stage diversity)

bundle_checks.validate_bundle() gains an include_realism flag so the CLI validate command can optionally run distributional checks alongside the structural ones.

18 new validation tests; total 581 passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
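The two headline realism rules (a conversion rate between 1% and 95%, and more than one populated pipeline stage) can be sketched on plain sequences without pandas. In the PR the rate comes from `df["converted_within_90_days"].mean()`; here the function names, the inclusive bounds, and `min_stages=2` are assumptions made for illustration.

```python
from collections.abc import Sequence


def check_conversion_rate(
    converted: Sequence[bool], low: float = 0.01, high: float = 0.95
) -> list[str]:
    """Flag a target rate outside [low, high]; bounds are inclusive in this sketch."""
    if len(converted) == 0:
        return ["Conversion target column is empty"]
    rate = sum(converted) / len(converted)  # True counts as 1, False as 0
    if not low <= rate <= high:
        return [f"Conversion rate {rate:.1%} outside [{low:.0%}, {high:.0%}]"]
    return []


def check_stage_diversity(stages: Sequence[str], min_stages: int = 2) -> list[str]:
    """A non-degenerate funnel should populate at least min_stages distinct stages."""
    distinct = sorted(set(stages))
    if len(distinct) < min_stages:
        return [f"Only {len(distinct)} distinct stage(s): {distinct}"]
    return []
```

Returning a list of messages rather than raising matches the `list[str]` error-accumulation style that `validate_bundle()` uses, so several violations can be reported in one pass.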

Copilot

Pull request overview

Adds a new “validation harness” to the LeadForge codebase to assert key invariants (determinism, exposure monotonicity), distributional realism heuristics, difficulty manifest sanity checks, and cross-seed drift/stability checks; and wires these into the existing bundle validation entrypoint with accompanying tests.

Changes:

- Introduces new validation modules: invariants, realism, difficulty, and drift (plus corresponding tests).
- Extends validate_bundle() to optionally include realism + difficulty checks via an include_realism flag (defaulting to enabled).
- Updates planning doc to mark Milestone 11 complete and outline Milestone 12.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.


| File | Description |
| --- | --- |
| `leadforge/validation/realism.py` | Adds conversion-rate bounds, feature-range checks, and non-degeneracy heuristics. |
| `leadforge/validation/invariants.py` | Adds determinism (hash parity) and exposure monotonicity checks. |
| `leadforge/validation/drift.py` | Adds cross-seed stability checks (rate spread, degenerate seeds, stage diversity). |
| `leadforge/validation/difficulty.py` | Adds manifest difficulty validation and a placeholder ordering check. |
| `leadforge/validation/bundle_checks.py` | Wires realism/difficulty into the main validation orchestrator behind a flag. |
| `tests/validation/test_realism.py` | Test coverage for realism checks (happy path + manifest corruption case). |
| `tests/validation/test_invariants.py` | Test coverage for determinism + exposure monotonicity. |
| `tests/validation/test_drift.py` | Test coverage for cross-seed stability behavior. |
| `tests/validation/test_difficulty.py` | Test coverage for known/unknown difficulty and ordering no-op behavior. |
| `.agent-plan.md` | Updates milestone status and next steps. |


1. Delete dead I/O in check_difficulty_ordering (it was computing rates and then discarding them)
2. Pass the manifest dict down to check_realism/check_difficulty instead of re-reading manifest.json three times per validation pass
3. Remove the unused manifest param from _check_conversion_rate
4. _check_table_nonempty now reads the actual Parquet files via pyarrow metadata instead of trusting manifest row counts
5. check_exposure_monotonicity now verifies content equality (SHA-256) for shared tables, task splits, and feature_dictionary.csv
6. check_determinism now compares core non-Parquet files (manifest.json, dataset_card.md, feature_dictionary.csv) in addition to Parquet files
7. (Test perf: accepted as-is; module-scoped fixtures already amortize the cost)
8. _check_feature_ranges now reads only the needed columns via pyarrow schema introspection instead of loading the entire Parquet file
9. Derive count/boolean column lists from the LEAD_SNAPSHOT_FEATURES schema instead of hardcoding them, which prevents silent drift
10. Add 4 negative-path tests for drift.py (0% rate, 100% rate, wide spread, single-stage degeneracy)
11. Add 5 negative-path tests for realism.py (low rate, high rate, negative counts, non-boolean values, single stage)
12. Use epsilon comparisons for float rate checks in drift.py
13. Move check_realism/check_difficulty imports to the top level

Total: 590 tests passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
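Item 12 replaces exact float equality with tolerance-based comparisons. A sketch of the drift rules from the PR summary (max/min spread below 5×, no seed at 0% or 100%) using `math.isclose` follows; the function name, the `EPS` value, and the seed-to-rate mapping input are assumptions, not the contents of `drift.py`.

```python
import math
from collections.abc import Mapping

EPS = 1e-9  # assumed tolerance for treating a rate as exactly 0% or 100%


def check_cross_seed_rates(
    rates: Mapping[int, float], max_ratio: float = 5.0
) -> list[str]:
    """rates maps seed -> conversion rate observed in that seed's bundle."""
    errors: list[str] = []
    for seed, rate in sorted(rates.items()):
        # Rates are float means, so compare with a tolerance instead of ==.
        if math.isclose(rate, 0.0, abs_tol=EPS):
            errors.append(f"Seed {seed}: degenerate conversion rate (0%)")
        elif math.isclose(rate, 1.0, abs_tol=EPS):
            errors.append(f"Seed {seed}: degenerate conversion rate (100%)")
    # Leave ~0% seeds out of the ratio so min() can never be zero.
    nonzero = [r for r in rates.values() if r > EPS]
    if len(nonzero) >= 2:
        ratio = max(nonzero) / min(nonzero)
        if ratio >= max_ratio:
            errors.append(f"Conversion rate spread {ratio:.1f}x is not below {max_ratio:g}x")
    return errors
```

Excluding near-zero seeds from the ratio keeps the degeneracy and spread checks independent: a 0% seed is reported once as degenerate instead of also triggering a division by zero.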

- COPILOT-3/7: exposure monotonicity now checks both directions for core files, tables, and task splits (flags extras in instructor that are missing from student, and vice versa)
- COPILOT-4: boolean validation uses pd.api.types.is_bool_dtype() instead of value-set comparison, correctly detecting integer-coded booleans (0/1 == True/False in Python)
- COPILOT-8: drift check now surfaces an explicit error when a seed's train.parquet is missing, instead of silently skipping

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-04-29T07:55:53Z

pr-agent-context report:

This run includes unresolved review comments on PR #18.

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: leadforge/validation/difficulty.py
URL: https://github.com/leadforge-dev/leadforge/pull/18#discussion_r3159313768
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `rates` is populated by reading Parquet files, but the collected values are never used (the function always returns an empty list). This adds unnecessary I/O and can confuse readers; either implement the intended ordering check / report the observed rates via the returned errors, or remove the unused computation until the feature is ready.

## COPILOT-2
Location: leadforge/validation/difficulty.py
URL: https://github.com/leadforge-dev/leadforge/pull/18#discussion_r3159313788
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    The TODO/comment claims “return the observed rates for informational purposes”, but this function only returns `errors` and currently always returns `[]`. Either include the observed rates in the returned messages (or change the API to return a structured result) or update the comment/docstring so it matches the actual behavior.

## COPILOT-3
Location: leadforge/validation/invariants.py
URL: https://github.com/leadforge-dev/leadforge/pull/18#discussion_r3159313807
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    This “same tables” check only enforces `student_tables ⊆ instructor_tables` (it doesn’t flag tables present only in the instructor bundle). If the intent is that *only* `metadata/` differs between exposure modes, consider also checking `instructor_tables - student_tables` (and similarly for task files) to enforce equality outside of metadata.
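The fix this comment asks for is to check both set differences rather than a one-way subset. A sketch that treats `metadata/` as the only directory allowed to differ between exposure modes (the intent the comment infers) is shown below; the function names are hypothetical, and the SHA-256 content comparison of shared files that the follow-up commit adds is left out for brevity.

```python
from pathlib import Path


def files_outside(root: Path, excluded_top: str = "metadata") -> set[str]:
    """Relative POSIX paths of all files under root, skipping one top-level dir."""
    found: set[str] = set()
    for p in root.rglob("*"):
        rel = p.relative_to(root)
        if p.is_file() and rel.parts[0] != excluded_top:
            found.add(rel.as_posix())
    return found


def check_exposure_equal_outside_metadata(student: Path, instructor: Path) -> list[str]:
    """Both set differences must be empty; extras on either side are reported."""
    s, i = files_outside(student), files_outside(instructor)
    errors = [f"Only in student bundle: {f}" for f in sorted(s - i)]
    errors += [f"Only in instructor bundle: {f}" for f in sorted(i - s)]
    return errors
```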

## COPILOT-4
Location: leadforge/validation/realism.py
URL: https://github.com/leadforge-dev/leadforge/pull/18#discussion_r3159313836
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    Boolean validation here can miss integer-coded booleans (0/1) because `0 == False` and `1 == True`, so `unique - {True, False}` will be empty even when the column is not actually boolean. Consider checking the dtype (e.g., `is_bool_dtype` / pandas BooleanDtype) and/or using identity-based checks (`v is True/False`) to reliably detect non-boolean encodings.
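The pitfall is easy to reproduce in plain Python: because `0 == False` and `1 == True` (and they hash identically), a set difference cannot tell integer codes from real booleans, while an identity check can. The helper name below is hypothetical.

```python
from collections.abc import Iterable


def non_boolean_values(values: Iterable[object]) -> set[object]:
    """Values that are not literally True or False (identity, not equality)."""
    return {v for v in values if v is not True and v is not False}


ints = [0, 1, 1, 0]
naive = set(ints) - {True, False}  # set(): 0/1 compare and hash equal to False/True
strict = non_boolean_values(ints)  # {0, 1}: identity sees these are ints, not bools
```

One caveat: values obtained via `Series.unique()` come back as NumPy scalars such as `numpy.bool_`, and `numpy.True_ is True` evaluates to `False`, so an identity check would also flag a genuine boolean column. That makes the dtype-based `pd.api.types.is_bool_dtype()` check adopted in the final commit the more practical option for DataFrame columns.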

## COPILOT-5
Location: leadforge/validation/realism.py
URL: https://github.com/leadforge-dev/leadforge/pull/18#discussion_r3159313850
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `pd.read_parquet(train_path)` loads the full task table even though only a handful of columns are checked below. To keep validation fast on large bundles, read only the needed columns (the union of the count + bool feature columns that actually exist).
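Selecting the columns is a small set operation over the file's schema. A sketch follows; the column names in the usage are hypothetical, and in the PR the count and boolean lists are derived from `LEAD_SNAPSHOT_FEATURES`, whose structure is not shown here.

```python
from collections.abc import Iterable


def columns_to_read(
    available: Iterable[str], count_cols: Iterable[str], bool_cols: Iterable[str]
) -> list[str]:
    """Union of count and boolean feature columns, limited to those in the file.

    Preserves the file's own column order so the result is deterministic.
    """
    wanted = set(count_cols) | set(bool_cols)
    return [c for c in available if c in wanted]


# Hypothetical column names, for illustration only.
cols = columns_to_read(
    ["lead_id", "num_touches", "has_demo", "stage"],
    count_cols=["num_touches", "num_emails"],
    bool_cols=["has_demo"],
)
```

With pyarrow installed, `available` could come from `pq.read_schema(train_path).names`, and the result passed as `pd.read_parquet(train_path, columns=cols)` so only those columns are decoded.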

## COPILOT-6
Location: leadforge/validation/realism.py
URL: https://github.com/leadforge-dev/leadforge/pull/18#discussion_r3159313874
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    This non-empty-table check relies only on `manifest["tables"][...]["row_count"]`. That can produce false positives/negatives if the manifest is wrong (e.g., row_count stale or corrupted) and won't catch an actually-empty parquet when row_count is nonzero/missing. Since this module already reads Parquet files for other checks, consider validating non-emptiness against the actual table files (accounts/contacts/leads) rather than the manifest metadata.

## COPILOT-7
Location: leadforge/validation/invariants.py
URL: https://github.com/leadforge-dev/leadforge/pull/18#discussion_r3159313886
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    Docstring says “Both must have the same core files”, but the logic only checks one direction (student has file && instructor missing). If the instructor has a core file that the student is missing, this will currently pass even though the bundles no longer match. Either enforce equality for these core files (or at least require they exist in both) or update the wording to reflect the intended one-way subset behavior.

## COPILOT-8
Location: leadforge/validation/drift.py:34
URL: https://github.com/leadforge-dev/leadforge/pull/18#discussion_r3159313906
Root author: copilot-pull-request-reviewer

Comment:
    When `train.parquet` is missing for a seed, the code silently `continue`s, which can cause this check to return `[]` even though some inputs were invalid/incomplete. Consider appending an explicit error for missing required artifacts (or otherwise surfacing that a seed was skipped) so instability checks don’t silently degrade into a no-op.
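Instead of a silent `continue`, the loop can record one error per missing artifact and still compute rates for the seeds that are present. In this sketch `read_rate` is a hypothetical callable standing in for reading `converted_within_90_days` from the Parquet file, and taking explicit seed-to-path pairs avoids assuming the bundle's directory layout.

```python
from collections.abc import Callable, Mapping
from pathlib import Path


def collect_seed_rates(
    train_paths: Mapping[int, Path], read_rate: Callable[[Path], float]
) -> tuple[dict[int, float], list[str]]:
    """Gather per-seed rates, reporting missing inputs instead of skipping them."""
    rates: dict[int, float] = {}
    errors: list[str] = []
    for seed, path in sorted(train_paths.items()):
        if not path.exists():
            # Surface the gap; skipping silently could turn the check into a no-op.
            errors.append(f"Seed {seed}: missing train split at {path}")
            continue
        rates[seed] = read_rate(path)
    return rates, errors
```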

Run metadata:

Tool ref: v4
Tool version: 4.0.20
Trigger: commit pushed
Workflow run: 25097334293 attempt 1
Comment timestamp: 2026-04-29T07:55:04.881452+00:00
PR head commit: 8cc1378076b16e6ee2e1cfe074f124fc1ff18d02

Copilot

Pull request overview

Adds a validation harness to the Leadforge bundle pipeline, introducing invariant checks (determinism + exposure monotonicity) and distributional “realism” / drift sanity checks, and wiring optional realism+difficulty validation into the existing bundle validator.

Changes:

- Introduces new validation modules: invariants (determinism, exposure monotonicity), realism (sanity bounds), drift (cross-seed stability), difficulty (manifest difficulty validation + placeholder ordering check).
- Adds new test coverage for the new validation modules under tests/validation/.
- Extends validate_bundle() with an include_realism flag to optionally run realism+difficulty checks in addition to structural validation.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.


| File | Description |
| --- | --- |
| `leadforge/validation/realism.py` | Implements distributional sanity checks (conversion rate bounds, non-empty core tables, feature range/type checks, stage diversity). |
| `leadforge/validation/invariants.py` | Adds determinism comparison and exposure-mode monotonicity checks. |
| `leadforge/validation/drift.py` | Adds cross-seed stability checks to detect degenerate seeds / wide conversion-rate spread. |
| `leadforge/validation/difficulty.py` | Validates manifest difficulty field and provides a (currently no-op) ordering check. |
| `leadforge/validation/bundle_checks.py` | Wires realism+difficulty into the main bundle validator behind `include_realism`. |
| `tests/validation/test_realism.py` | Tests realism checks by mutating a generated bundle to trigger specific errors. |
| `tests/validation/test_invariants.py` | Tests determinism and exposure monotonicity validation behavior. |
| `tests/validation/test_drift.py` | Tests drift detection for degenerate conversion/stage distributions across seeds. |
| `tests/validation/test_difficulty.py` | Tests difficulty validation and asserts ordering check is a no-op for v1. |
| `.agent-plan.md` | Updates milestone tracking notes to mark validation harness as completed. |


```diff
+    rate = df["converted_within_90_days"].mean()
+
```


```diff
+def _check_table_nonempty(root: Path, manifest: dict[str, Any]) -> list[str]:
+    """Core tables should have at least 1 row (verified from actual files)."""
+    errors: list[str] = []
+    required_nonempty = {"accounts", "contacts", "leads"}
+
+    for table_name in required_nonempty:
+        parquet_path = root / f"tables/{table_name}.parquet"
+        if not parquet_path.exists():
+            errors.append(f"Table '{table_name}' file missing")
+        else:
+            meta = pq.read_metadata(parquet_path)
+            if meta.num_rows == 0:
+                errors.append(f"Table '{table_name}' has 0 rows")
+
+    return errors
```


```diff
@@ -37,6 +44,11 @@ def validate_bundle(bundle_root: Path) -> list[str]:
    errors.extend(_check_task_splits(bundle_root, manifest))
    errors.extend(_check_fk_integrity(tables))
    errors.extend(_check_leakage(bundle_root, manifest))
+
+    if include_realism:
+        errors.extend(check_realism(bundle_root, manifest))
+        errors.extend(check_difficulty(manifest))
+
```


```diff
+    # Compare core non-Parquet files that must also be deterministic.
+    for fname in ("manifest.json", "dataset_card.md", "feature_dictionary.csv"):
+        fa = bundle_a / fname
+        fb = bundle_b / fname
+        if fa.exists() and fb.exists():
+            if file_sha256(fa) != file_sha256(fb):
+                errors.append(f"Hash mismatch: {fname}")
+        elif fa.exists() != fb.exists():
+            errors.append(f"File '{fname}' exists in one bundle but not the other")
```


Copilot AI review requested due to automatic review settings April 29, 2026 07:34

shaypal5 added this to the v0.5.0 — CLI-complete release candidate milestone Apr 29, 2026

shaypal5 added the type: feature and layer: validation labels Apr 29, 2026

Copilot started reviewing on behalf of shaypal5 April 29, 2026 07:35


Copilot AI reviewed Apr 29, 2026



Copilot AI review requested due to automatic review settings April 29, 2026 07:54

Copilot started reviewing on behalf of shaypal5 April 29, 2026 07:55

shaypal5 self-assigned this Apr 29, 2026

shaypal5 merged commit 258e610 into main Apr 29, 2026
7 checks passed

shaypal5 deleted the feat/milestone-11-validation-harness branch April 29, 2026 07:57

Copilot AI reviewed Apr 29, 2026

shaypal5 merged 3 commits into main from feat/milestone-11-validation-harness


2 participants
