Merged
.agent-plan.md: 44 additions, 0 deletions
@@ -162,6 +162,38 @@ Documentation + CI:
- [x] `CHANGELOG.md` — `Unreleased` renamed to `v1.0.0 — (2026-05-02)`; milestone headings folded into collapsible development history
- [x] `.agent-plan.md` — updated to reflect v1.0.0 release

### v7: Purely causal leakage trap with canonical validation (PR #50)

Engine changes:
- [x] `leadforge/mechanisms/counts.py` — `LatentDecayIntensity` follow-up ramp: `followup_boost_after_day`, `followup_boost_factor`, `followup_ramp_days`, `followup_latent_weights` parameters; `_effective_boost(t)` and `_latent_multiplier(t, latents)` methods
- [x] `leadforge/mechanisms/policies.py` — `_FOLLOWUP_LATENT_WEIGHTS` per motif family (budget_readiness, process_maturity, contact_authority); wired into `assign_mechanisms()` with `followup_boost_after_day=20, followup_boost_factor=10.0, followup_ramp_days=10`
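The follow-up ramp parameters above suggest an intensity multiplier that stays at 1 through `followup_boost_after_day` and then rises to `followup_boost_factor` over `followup_ramp_days`. The sketch below assumes a linear ramp and assumes the latent weights scale only the excess boost above 1. Both shapes are guesses: the real `_effective_boost(t)` and `_latent_multiplier(t, latents)` bodies are not shown in this PR, and the function names here are hypothetical.

```python
def effective_boost(t: float, boost_after_day: float = 20.0,
                    boost_factor: float = 10.0, ramp_days: float = 10.0) -> float:
    """Follow-up intensity multiplier at day t (hypothetical linear ramp)."""
    if t <= boost_after_day:
        return 1.0  # no boost up to and including the snapshot day
    if ramp_days <= 0:
        return boost_factor  # degenerate ramp: jump straight to the full boost
    # Fraction of the ramp completed, capped at 1 once ramp_days have elapsed.
    frac = min((t - boost_after_day) / ramp_days, 1.0)
    return 1.0 + frac * (boost_factor - 1.0)


def latent_multiplier(t: float, latents: dict[str, float],
                      weights: dict[str, float]) -> float:
    """Scale only the excess boost by a weighted latent score (assumed form)."""
    # Latents without a weight for this motif family contribute nothing.
    score = sum(weights.get(name, 0.0) * value for name, value in latents.items())
    return 1.0 + (effective_boost(t) - 1.0) * score
```

With the defaults wired in `assign_mechanisms()`, a lead with `budget_readiness = 0.5` and a weight of 1.0 gets a multiplier of 1.0 on day 15 and 5.5 on day 30. Because the boost only applies after day 20, it changes post-snapshot activity without touching the day 0 to 20 features.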

Build pipeline:
- [x] `leadforge/pipelines/build_v7.py` — all pipeline functions (identical to v6 minus `boost_leakage_trap`); purely causal trap via `compute_post_snapshot_touches`
- [x] `scripts/build_v7_snapshot.py` — CLI: generates both student + instructor CSVs
- [x] `scripts/validate_v7_dataset.py` — validates both exports: basic checks, determinism, baseline AUC, tree improvement, value-aware ranking, trap delta (10 seeds), cohort split; honest thresholds for purely causal trap
- [x] `scripts/quick_baseline_eval_v7.py` — LR + RF + GBM baselines, value-aware ranking, feature importance, trap detection
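The instructor-only column `__leakage__touches_post_snapshot_21_90` counts activity that happens after the snapshot, so it comes from simulated post-snapshot touches rather than from injecting the label. A minimal sketch of that count, assuming integer day indices and an inclusive 21 to 90 window (as the column suffix suggests); the real `compute_post_snapshot_touches` signature is not shown here, so these helper names are hypothetical.

```python
from collections.abc import Iterable


def touches_in_window(touch_days: Iterable[int], start: int, end: int) -> int:
    """Count touch events whose day index falls in [start, end], inclusive."""
    return sum(1 for day in touch_days if start <= day <= end)


def post_snapshot_touches(touch_days: Iterable[int], snapshot_day: int = 20,
                          horizon_day: int = 90) -> int:
    """Touches after the snapshot, up to the 90-day outcome horizon.

    Assumes integer day indices, so "after day 20" means days 21..90.
    """
    return touches_in_window(touch_days, snapshot_day + 1, horizon_day)
```

For touch days `[0, 5, 20, 21, 45, 90, 91]` this counts 3 (days 21, 45, 90). The value is not available at scoring time on day 20, which is what makes it a leakage trap.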

Datasets:
- [x] `lead_scoring_intro/lead_scoring_intro_v7.csv` — 1000 rows × 20 cols (student-safe, no leakage)
- [x] `lead_scoring_intro/lead_scoring_intro_v7_instructor.csv` — 1000 rows × 21 cols (+ `__leakage__touches_post_snapshot_21_90`)

Validation results:
- [x] Baseline AUC: 0.671 (within [0.58, 0.90]; snapshot day 20)
- [x] GBM improvement: +0.072 over LR (5-seed average)
- [x] Trap delta: mean 0.0125, 10/10 positive (purely causal — no label injection, honest threshold mean≥0.008)
- [x] Value-aware uplift: +13.4% at K=25, +20.3% at K=50
- [x] Cohort split AUC gap: 0.089 (random 0.683 → cohort 0.594)
- [x] All mandatory checks pass
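The trap-delta result can be read as a per-seed difference in AUC with and without the leakage column, summarised across seeds. A sketch assuming delta = AUC(with trap) minus AUC(without trap) and a pass rule on the mean only (mean ≥ 0.008, as stated above); the positive count mirrors the "10/10 positive" report. `roc_auc` is a pure-Python stand-in for something like `sklearn.metrics.roc_auc_score`, and both function names are hypothetical.

```python
from statistics import mean


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """ROC AUC as the probability a random positive outscores a random negative.

    Pairwise O(n_pos * n_neg) version; ties count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def trap_summary(deltas: list[float], min_mean: float = 0.008) -> dict:
    """Summarise per-seed trap deltas (AUC with trap column minus AUC without)."""
    return {
        "mean": mean(deltas),
        "n_positive": sum(1 for d in deltas if d > 0),
        "passed": mean(deltas) >= min_mean,  # assumed pass rule: mean only
    }
```

Each seed would call `roc_auc` twice (once per feature set) and append the difference to `deltas` before calling `trap_summary`.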

Documentation + CI:
- [x] `lead_scoring_intro/RELEASE_v7.md` — column dictionary, missingness patterns, metrics, teaching guidance (4 lectures), trap evaluation
- [x] `lead_scoring_intro/BACKGROUND_v7.md` — ProcureFlow business context for students (snapshot day 20, regions US/UK)
- [x] `.github/workflows/ci.yml` — `validate-dataset-v7` job added
- [x] `tests/scripts/test_build_v7_snapshot.py` — 32+ tests for pipeline functions
- [x] `tests/mechanisms/test_mechanisms.py` — 9 new tests for follow-up ramp mechanism
- [x] All 839 tests pass; lint + format clean

### Fix: direct conversion bypass for pre-SQL leads (PR #45, closes #44)

- [x] `leadforge/simulation/engine.py` — added `_DIRECT_CONVERSION_STAGES` and `_DIRECT_CONVERSION_DISCOUNT` (0.01) constants; pre-SQL leads (`mql`, `sal`) now have a small daily probability of converting directly, bypassing the full funnel
@@ -223,6 +255,18 @@ Documentation + CI:
| M14: Notebook 4 (recipe customization) | Discarded | Premature |
| M15: Docs polish + v1.0 release | **Done** | README, CHANGELOG, version bump to 1.0.0 complete; architecture diagram and notebooks remain post-v1 |

### v7 follow-up: pipeline deduplication + LatentDecayIntensity cleanup

From self-review of PR #50. Should be a single follow-up PR.

| Item | Description |
|---|---|
| Extract shared pipeline functions | Move `subsample`, `inject_missingness`, `derive_features`, `softcap_expected_acv`, `assign_acquisition_wave`, `rename_and_select` into `leadforge/pipelines/common.py`. v5/v6/v7 modules import from common + add version-specific logic. |
| Extract shared ML pipeline | Canonical sklearn pipeline (preprocessor + LR) used by validators and eval scripts should be a single shared utility. |
| Deduplicate feature lists | `CAT_FEATURES`, `NUM_FEATURES`, `BINARY_FEATURES` defined once (in pipeline module or shared constants), imported by validators and eval scripts. |
| Group followup params into dataclass | Introduce `FollowupRampConfig(boost_after_day, boost_factor, ramp_days, latent_weights)` in `mechanisms/counts.py`. Replace the 4 constructor params with `followup: FollowupRampConfig \| None = None`. |
| Fix `subsample` silent short-return | `subsample()` can return fewer than `n` rows when there aren't enough negatives. Should raise `ValueError` instead. Fix in the shared `common.py`. |
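The last two rows of the table can be sketched together. This assumes the dataclass defaults mirror the values wired in `assign_mechanisms()` and that `__post_init__` validation is wanted; `LatentDecayIntensity` is a stub showing only the proposed parameter. The `subsample(rows, n, n_pos, seed)` signature over plain dicts is hypothetical (the real function presumably works on DataFrames); the point is raising `ValueError` instead of silently returning fewer than `n` rows.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class FollowupRampConfig:
    """Groups the four follow-up ramp parameters into one object."""

    boost_after_day: int = 20
    boost_factor: float = 10.0
    ramp_days: int = 10
    latent_weights: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumed invariants; the real class may validate differently.
        if self.boost_factor < 1.0:
            raise ValueError("boost_factor must be >= 1.0")
        if self.ramp_days < 0:
            raise ValueError("ramp_days must be non-negative")


class LatentDecayIntensity:
    """Stub showing only the proposed constructor parameter."""

    def __init__(self, followup: FollowupRampConfig | None = None) -> None:
        self.followup = followup  # None disables the follow-up ramp


def subsample(rows: list[dict], n: int, n_pos: int, seed: int = 0) -> list[dict]:
    """Draw exactly n rows (n_pos positives); raise instead of short-returning."""
    pos = [r for r in rows if r["converted"] == 1]
    neg = [r for r in rows if r["converted"] == 0]
    n_neg = n - n_pos
    if n_pos < 0 or n_neg < 0:
        raise ValueError(f"need 0 <= n_pos <= n, got n={n}, n_pos={n_pos}")
    if len(pos) < n_pos or len(neg) < n_neg:
        raise ValueError(
            f"requested {n_pos} positives / {n_neg} negatives, "
            f"have {len(pos)} / {len(neg)}"
        )
    rng = random.Random(seed)  # local RNG keeps the draw deterministic per seed
    return rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
```

Grouping the ramp settings means `followup=None` cleanly expresses "no ramp", instead of four parameters that must be set (or left unset) together.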

### From post-v1 list

- Second vertical
.github/workflows/ci.yml: 28 additions, 0 deletions
@@ -110,3 +110,31 @@ jobs:
      - name: Skip v6 (no dataset)
        if: steps.check-v6.outputs.found != 'true'
        run: echo "No v6 datasets found — skipping v6 validation"

  validate-dataset-v7:
    name: Validate v7 lead scoring dataset
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -e ".[dev,scripts]"
      - name: Check for v7 datasets
        id: check-v7
        run: |
          STUDENT="lead_scoring_intro/lead_scoring_intro_v7.csv"
          INSTRUCTOR="lead_scoring_intro/lead_scoring_intro_v7_instructor.csv"
          if [ -f "$STUDENT" ] && [ -f "$INSTRUCTOR" ]; then
            echo "found=true" >> "$GITHUB_OUTPUT"
            echo "student=$STUDENT" >> "$GITHUB_OUTPUT"
            echo "instructor=$INSTRUCTOR" >> "$GITHUB_OUTPUT"
          else
            echo "found=false" >> "$GITHUB_OUTPUT"
          fi
      - name: Run v7 validator
        if: steps.check-v7.outputs.found == 'true'
        run: python scripts/validate_v7_dataset.py "${{ steps.check-v7.outputs.student }}" "${{ steps.check-v7.outputs.instructor }}"
      - name: Skip v7 (no dataset)
        if: steps.check-v7.outputs.found != 'true'
        run: echo "No v7 datasets found — skipping v7 validation"
lead_scoring_intro/BACKGROUND_v7.md: 62 additions, 0 deletions (new file)
@@ -0,0 +1,62 @@
# BACKGROUND v7 — Lead Scoring Intro Dataset

## Business context

You are a data scientist at **ProcureFlow**, a mid-market B2B SaaS company selling AP automation and procurement workflow software. ProcureFlow targets companies with 200–2,000+ employees in manufacturing, logistics, healthcare, and professional services.

The sales team generates leads through three channels:
- **Inbound marketing**: content downloads, webinars, website forms
- **SDR outbound**: cold outreach by sales development representatives
- **Partner referral**: introductions through consulting and technology partners

### The lead scoring problem

The sales team can only actively work a limited number of leads per quarter. Your job is to build a **lead scoring model** that predicts which leads are most likely to convert to paying customers within 90 days of entering the pipeline.

A good lead score helps the sales team prioritize its time: contact high-probability leads first and deprioritize unlikely conversions.

## Dataset description

The dataset contains **1,000 leads** observed at **day 20** of their lifecycle (the "snapshot day"). All features are computed from activity that occurred during the first 20 days. The target variable (`converted`) indicates whether the lead converted to a paying customer within 90 days.

### Deal sizes

ProcureFlow's annual contract value (ACV) ranges from **$18,000** (starter plan, small companies) to **$120,000** (enterprise plan, large companies). The `expected_acv` column provides an estimate of the deal size for each lead based on company size and any existing opportunity data.

This variation in deal size means that not all conversions are equally valuable — a model that identifies high-value conversions may be more useful than one that maximizes the number of conversions.
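One common way to make a ranking value-aware is to sort by expected value: the predicted conversion probability times `expected_acv`. A minimal sketch; `p_convert` stands for your model's predicted probability and `lead_id` is a hypothetical identifier (neither is a column in this dataset).

```python
def rank_by_expected_value(leads: list[dict], top_k: int) -> list[str]:
    """Return the ids of the top_k leads by expected value = p_convert * expected_acv."""
    ranked = sorted(
        leads,
        key=lambda lead: lead["p_convert"] * lead["expected_acv"],
        reverse=True,
    )
    return [lead["lead_id"] for lead in ranked[:top_k]]


leads = [
    {"lead_id": "a", "p_convert": 0.6, "expected_acv": 18_000},   # EV about 10,800
    {"lead_id": "b", "p_convert": 0.3, "expected_acv": 120_000},  # EV about 36,000
    {"lead_id": "c", "p_convert": 0.5, "expected_acv": 40_000},   # EV about 20,000
]
print(rank_by_expected_value(leads, top_k=2))  # ['b', 'c']
```

Ranked by probability alone, lead `a` would come first; weighting by deal size moves the $120,000 opportunity `b` to the top even though it is less likely to close.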

### Acquisition waves

Leads enter the pipeline in three cohorts (`acquisition_wave`): A (earliest), B (middle), and C (most recent). Each wave roughly corresponds to a successive time period. Market conditions and lead mix may shift from one cohort to the next, so a model's accuracy on earlier waves may overstate how well it will perform on future leads.
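A random train/test split mixes all three waves. To estimate performance on future leads instead, you can train on the earlier waves and evaluate on the most recent one. A minimal sketch over a list of row dicts; `cohort_split` is a hypothetical helper, and with pandas the same idea is a boolean mask such as `df["acquisition_wave"].isin(["A", "B"])`.

```python
def cohort_split(rows, train_waves=("A", "B"), test_waves=("C",)):
    """Split rows by acquisition_wave: fit on earlier cohorts, evaluate on later ones."""
    overlap = set(train_waves) & set(test_waves)
    if overlap:
        # A wave in both sets would leak test cohorts into training.
        raise ValueError(f"waves in both train and test: {sorted(overlap)}")
    train = [r for r in rows if r["acquisition_wave"] in train_waves]
    test = [r for r in rows if r["acquisition_wave"] in test_waves]
    return train, test
```

Comparing AUC on this split against AUC on a random split gives a rough sense of how much performance degrades on newer leads.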

## What to expect

- **Base conversion rate**: ~30%
- **Baseline AUC**: A simple logistic regression achieves ~0.67 AUC
- **Missingness**: 5 columns have missing values (1–7% each) due to different data collection processes across lead sources
- **Feature interactions**: The relationship between engagement and conversion is nonlinear — tree-based models capture this better than linear models

## Key columns

| Column | What it measures |
|---|---|
| `industry` | Business sector |
| `region` | Geography (US, UK) |
| `company_size` | Employee headcount band |
| `company_revenue` | Revenue band |
| `contact_role` | Job function of primary contact |
| `seniority` | Job level |
| `lead_source` | How the lead was acquired |
| `opportunity_created` | Whether sales opened an opportunity |
| `demo_completed` | Whether the lead viewed demo content |
| `expected_acv` | Estimated deal size (USD) |
| `inbound_touches` | Marketing touches received (days 0–20) |
| `outbound_touches` | Sales touches initiated (days 0–20) |
| `touches_week_1` | Touches in first 7 days |
| `touches_last_7_days` | Touches in days 14–20 (recent momentum) |
| `days_since_first_touch` | Days from first engagement to the snapshot (day 20) |
| `web_sessions` | Website visits |
| `sales_activities` | Activities logged by sales reps |
| `days_since_last_touch` | Days from most recent engagement to the snapshot (recency) |
| `acquisition_wave` | Cohort (A, B, or C) |
| `converted` | **Target**: 1 = converted within 90 days |
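The windowed touch columns are counts or gaps over day ranges ending at the snapshot. A sketch assuming integer day indices (day 0 = the lead entering the pipeline) and a single pooled list of touch days; the real dataset splits inbound and outbound touches, and it may encode leads with no touches differently than the `None` used here. `snapshot_touch_features` is a hypothetical helper.

```python
def snapshot_touch_features(touch_days: list[int], snapshot_day: int = 20) -> dict:
    """Derive windowed touch features using only activity on days 0..snapshot_day."""
    seen = [d for d in touch_days if 0 <= d <= snapshot_day]  # drop post-snapshot touches
    return {
        "touches_week_1": sum(1 for d in seen if d <= 6),                     # days 0-6
        "touches_last_7_days": sum(1 for d in seen if d >= snapshot_day - 6),  # days 14-20
        "days_since_first_touch": snapshot_day - min(seen) if seen else None,
        "days_since_last_touch": snapshot_day - max(seen) if seen else None,
    }
```

For touch days `[0, 3, 10, 15, 20, 25]`, the day-25 touch is ignored, giving 2 touches in week 1, 2 in the last 7 days, 20 days since first touch, and 0 days since last touch.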