leadforge-dev · shaypal5 · Apr 30, 2026
diff --git a/.agent-plan.md b/.agent-plan.md
@@ -6,7 +6,7 @@
 
 ## Current System State
 
-**v0.5.0 in progress — Milestones 7–11 complete, v5 dataset shipped.** Full simulation engine + render/bundle + exposure filtering + CLI commands + validation harness implemented. v4 engine changes merged (PR #21). v5 dataset generated and validated (all 10 checks pass). PR-agent refresh fallback wiring fixed for bot-authored reviews. 609 tests passing.
+**v0.5.0 in progress — Milestones 7–11 complete, v5 dataset shipped + canonical validation module.** Full simulation engine + render/bundle + exposure filtering + CLI commands + validation harness implemented. v4 engine changes merged (PR #21). v5 dataset regenerated with boosted leakage trap (snapshot day 10, Poisson(1) target-correlated boost) and validated via canonical sklearn pipeline (all checks pass). Canonical validation module added as single source of truth (`leadforge/validation/lead_scoring.py`).
 
 ---
 
@@ -55,14 +55,23 @@ Build pipeline:
 
 No engine changes required — v5 is a build pipeline + validation improvement.
 
-- [x] `scripts/build_v5_snapshot.py` — day-14 snapshot, ACV capping [18k–120k], `__leakage__` naming, `days_since_first_touch` momentum feature
+- [x] `scripts/build_v5_snapshot.py` — day-10 snapshot, ACV capping [18k–120k], `__leakage__` naming, `days_since_first_touch` momentum feature, Poisson(1) trap boost
 - [x] `scripts/validate_v5_dataset.py` — 10 checks: hold-out AUC/PR-AUC, multi-seed leakage robustness, Precision@K, Lift@K, duplicate check, ACV range, missingness bounds
 - [x] `scripts/quick_baseline_eval_v5.py` — LR + RF baselines, value-aware scoring demo, feature importance
-- [x] Generate `lead_scoring_intro_v5.csv` (1000 rows × 19 cols, 30% conversion, hold-out AUC 0.632)
-- [x] Leakage trap robustly validated: mean delta 0.033, min delta 0.015 across 10 seeds
+- [x] Generate `lead_scoring_intro_v5.csv` (1000 rows × 19 cols, 30% conversion, hold-out AUC 0.648)
+- [x] Leakage trap robustly validated: mean delta 0.081, min delta 0.035 across 10 seeds
 - [x] `RELEASE_v5.md` with instructor/student notes, value-aware scoring section, full validation results
 - [x] Updated `BACKGROUND.md` with value-aware lead scoring section
 
+### v5-validation: Canonical validation module (PR #26)
+
+- [x] `leadforge/validation/lead_scoring.py` — single source of truth validation with canonical sklearn pipeline (ColumnTransformer + OneHotEncoder + StandardScaler + LR)
+- [x] `scripts/validate_lead_scoring_dataset.py` — CLI entrypoint for validation
+- [x] `tests/validation/test_lead_scoring.py` — 12 tests
+- [x] CI job added to `.github/workflows/ci.yml` for dataset validation
+- [x] v5 dataset regenerated (snapshot day 10, trap boost) — all checks pass, exit code 0
+- [x] `RELEASE_v5.md` updated with canonical pipeline metrics
+
 ---
 
 ## Deferred Items
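
Reviewer note: the plan above lists Precision@K and Lift@K among the validation checks without showing how they are defined. The following is a minimal sketch of the usual definitions, not the code in `leadforge/validation/lead_scoring.py`. The function names `precision_at_k` and `lift_at_k` and the toy labels and scores are assumptions made for illustration only.

```python
def precision_at_k(y_true, scores, k):
    """Fraction of converted leads among the k highest-scored rows."""
    if not 0 < k <= len(y_true):
        raise ValueError("k must be between 1 and len(y_true)")
    # Rank row indices by descending score; the sort is stable, so ties
    # keep their original order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(y_true[i] for i in order[:k]) / k


def lift_at_k(y_true, scores, k):
    """Precision@K divided by the overall conversion rate."""
    base_rate = sum(y_true) / len(y_true)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positives")
    return precision_at_k(y_true, scores, k) / base_rate


# Toy data (hypothetical): 10 leads, 3 of which converted (30% base rate).
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

p3 = precision_at_k(y_true, scores, 3)  # 2 of the top 3 converted
l3 = lift_at_k(y_true, scores, 3)       # p3 relative to the 30% base rate
print(f"Precision@3={p3:.3f}  Lift@3={l3:.3f}")
```

A lift of 1.0 means the ranking is no better than picking leads at random, which is why taking K equal to the full dataset size always yields exactly 1.0.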

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -57,3 +57,28 @@ jobs:
           path: .coverage.${{ matrix.python-version }}
           include-hidden-files: true
           if-no-files-found: ignore
+
+  validate-dataset:
+    name: Validate lead scoring dataset
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - run: pip install -e ".[dev,scripts]"
+      - name: Check for v5 dataset
+        id: check
+        run: |
+          if [ -f "lead_scoring_intro_v5.csv" ]; then
+            echo "found=true" >> "$GITHUB_OUTPUT"
+            echo "csv=lead_scoring_intro_v5.csv" >> "$GITHUB_OUTPUT"
+          else
+            echo "found=false" >> "$GITHUB_OUTPUT"
+          fi
+      - name: Run validator
+        if: steps.check.outputs.found == 'true'
+        run: python scripts/validate_lead_scoring_dataset.py --csv "${{ steps.check.outputs.csv }}" --enforce-1000
+      - name: Skip (no dataset)
+        if: steps.check.outputs.found != 'true'
+        run: echo "No lead_scoring_intro_v5.csv found in repo root — skipping validation"
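
Reviewer note: the `validate-dataset` job runs the validator only when `lead_scoring_intro_v5.csv` exists in the repo root and otherwise skips. A rough Python equivalent of that gating, which may be handy for reproducing the check locally, could look like the sketch below. The helper name `build_validator_command` is hypothetical; the script path and the `--csv` and `--enforce-1000` flags are copied from the workflow step above, and nothing else about the validator's CLI is assumed.

```python
import tempfile
from pathlib import Path


def build_validator_command(repo_root, csv_name="lead_scoring_intro_v5.csv"):
    """Return the validator argv, or None when the dataset file is absent.

    Mirrors the workflow's `[ -f ... ]` check: only a regular file counts.
    """
    if not (Path(repo_root) / csv_name).is_file():
        return None
    return [
        "python",
        "scripts/validate_lead_scoring_dataset.py",
        "--csv",
        csv_name,
        "--enforce-1000",
    ]


# Demonstrate both branches in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    missing = build_validator_command(tmp)  # no CSV yet, so skip
    (Path(tmp) / "lead_scoring_intro_v5.csv").write_text("")  # empty placeholder
    found = build_validator_command(tmp)    # CSV present, so run the validator

print("skip" if missing is None else missing)
print(found)
```

To actually run the check, the returned list could be passed to `subprocess.run(cmd, cwd=repo_root, check=True)` from the repository root, so that a failing validation raises just as a non-zero exit code fails the CI step.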