diff --git a/.agent-plan.md b/.agent-plan.md
index cfa7c27..80f8b22 100644
--- a/.agent-plan.md
+++ b/.agent-plan.md
@@ -38,13 +38,17 @@ First public dataset release: `leadforge-b2b-lead-scoring`. Three difficulty tie
 - [x] Excludes `current_stage` and leakage-flagged columns
 - [x] Works from pre-generated Parquet files (no leadforge install needed)
 
-### Public release — Phase 5: Generate final release + upload (pending)
+### Public release — Phase 5: Generate final release + upload (in progress)
 
-- [ ] Run build script, verify SHA-256 hash determinism
+- [x] Regenerate release bundles with difficulty-aware engine (PR #52 merged)
+- [x] Verify three tiers produce different conversion rates (intro 41.5%, intermediate 20.1%, advanced 7.9%)
+- [x] Update release/README.md — remove stale "Known limitations", add conversion rates to dataset summary
+- [x] Update release/HF_DATASET_CARD.md — add conversion rates to summary table
+- [ ] Verify SHA-256 hash determinism (re-run build, compare hashes)
 - [ ] Upload to Kaggle and HuggingFace
 - [ ] Announce
 
-### Difficulty modulation ✓ (PR pending)
+### Difficulty modulation ✓ (PR #52, merged)
 
 - [x] `leadforge/core/models.py` — `DifficultyParams` frozen dataclass; optional field on `GenerationConfig`
 - [x] `leadforge/mechanisms/policies.py` — `assign_mechanisms()` accepts `difficulty_params`; per-motif calibration computes target daily hazard from conversion_rate_range; signal_strength scales LatentScore weights
diff --git a/release/HF_DATASET_CARD.md b/release/HF_DATASET_CARD.md
index 529ef69..1f377d7 100644
--- a/release/HF_DATASET_CARD.md
+++ b/release/HF_DATASET_CARD.md
@@ -50,7 +50,7 @@ A relational, reproducible, multi-difficulty lead scoring dataset generated by [
 ## Why this dataset?
 
 1. **Relational structure.** 9 normalized tables plus ML-ready task splits. Practice feature engineering from raw tables, or grab the flat file and start modeling.
-2. **Three difficulty tiers.** Same world, different signal-to-noise ratios.
+2. **Three difficulty tiers.** Same world, different conversion rates, signal-to-noise ratios, and missingness.
 3. **Reproducible and leakage-safe.** Deterministic generation (seed 42), SHA-256 hashes, explicit leakage trap.
 
 ## Quick start
@@ -79,6 +79,7 @@ df = pd.read_csv("hf://datasets/leadforge/leadforge-b2b-lead-scoring/intermediat
 | Leads | 5,000 | 5,000 | 5,000 |
 | Features | 35 | 35 | 35 |
 | Target | `converted_within_90_days` | `converted_within_90_days` | `converted_within_90_days` |
+| Conversion rate | 41.5% | 20.1% | 7.9% |
 | Signal strength | 0.90 | 0.70 | 0.50 |
 | Noise scale | 0.10 | 0.30 | 0.55 |
 | Missing rate | 2% | 8% | 18% |
diff --git a/release/README.md b/release/README.md
index 0895879..697f28e 100644
--- a/release/README.md
+++ b/release/README.md
@@ -8,7 +8,7 @@ Most public lead scoring datasets are flat CSVs with opaque provenance. This one
 
 1. **Relational structure.** 9 normalized tables (accounts, contacts, leads, touches, sessions, sales activities, opportunities, customers, subscriptions) plus ML-ready task splits. Practice feature engineering from raw tables, or grab the flat file and start modeling.
 
-2. **Three difficulty tiers.** Same company, same product, same buyer personas -- different difficulty profiles. Each tier declares different signal strength, noise, and missingness parameters in its manifest. (See [Known limitations](#known-limitations) for current status.)
+2. **Three difficulty tiers.** Same company, same product, same buyer personas -- different difficulty profiles that produce meaningfully different conversion rates, noise levels, and missingness.
 
 3. **Reproducible and leakage-safe.** Deterministic generation from a fixed seed. SHA-256 hashes for every file in `manifest.json`. Leakage-prone columns (`total_touches_all`, `current_stage`) are explicitly flagged in the feature dictionary. All features are anchored at the snapshot date -- no post-cutoff data leaks in.
 
@@ -108,10 +108,14 @@ leadforge generate \
 | Contacts | 4,200 | 4,200 | 4,200 |
 | Columns | 35 (34 features + 1 target) | 35 | 35 |
 | Target | `converted_within_90_days` | `converted_within_90_days` | `converted_within_90_days` |
+| Conversion rate (target) | 30-45% | 18-28% | 8-15% |
+| Conversion rate (observed) | 41.5% | 20.1% | 7.9% |
 | Signal strength | 0.90 | 0.70 | 0.50 |
 | Noise scale | 0.10 | 0.30 | 0.55 |
 | Missing rate | 2% | 8% | 18% |
 
+Higher difficulty means weaker signal, more noise, more missingness, and lower base conversion rate -- all modulated in the simulation engine. Target ranges are defined in `difficulty_profiles.yaml`.
+
 ## The scenario
 
 **Veridian Technologies** is a Series B startup (Austin, US) selling **Veridian Procure**, a cloud-based procurement and AP automation platform, to mid-market firms (200-2,000 employees) in the US and UK.
@@ -152,10 +156,6 @@ The `intermediate_instructor/` bundle includes the full hidden causal structure:
 
 This enables research on causal inference, model interpretability, and DGP-aware evaluation.
 
-## Known limitations
-
-- **Difficulty tiers share the same conversion rate.** The simulation engine does not yet modulate conversion rates by difficulty profile. All three tiers produce similar base rates (~70%). The difficulty profiles are declared in each bundle's manifest and will produce meaningfully different signal-to-noise ratios once the engine is updated. For now, the primary difference between tiers is the declared profile metadata.
-
 ## Provenance
 
 | Field | Value |