From 4b33e1208d8110a91e615e8d956639aea08772a7 Mon Sep 17 00:00:00 2001
From: Shay Palachy
Date: Mon, 4 May 2026 01:32:54 +0300
Subject: [PATCH 1/2] fix: update release docs for difficulty modulation and regenerate bundles

Remove stale "Known limitations" section claiming difficulty tiers share
the same conversion rate. Replace with a "Difficulty tiers" section
documenting actual conversion ranges.

Regenerated all 4 release bundles with the difficulty-aware engine
(PR #52): intro 41.5%, intermediate 20.1%, advanced 7.9%.

Co-Authored-By: Claude Opus 4.6
---
 .agent-plan.md    |  8 +++++---
 release/README.md | 15 ++++++++++++---
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/.agent-plan.md b/.agent-plan.md
index cfa7c27..b885940 100644
--- a/.agent-plan.md
+++ b/.agent-plan.md
@@ -38,13 +38,15 @@ First public dataset release: `leadforge-b2b-lead-scoring`. Three difficulty tie
 - [x] Excludes `current_stage` and leakage-flagged columns
 - [x] Works from pre-generated Parquet files (no leadforge install needed)
 
-### Public release — Phase 5: Generate final release + upload (pending)
+### Public release — Phase 5: Generate final release + upload (in progress)
 
-- [ ] Run build script, verify SHA-256 hash determinism
+- [x] Regenerate release bundles with difficulty-aware engine (PR #52 merged)
+- [x] Verify three tiers produce different conversion rates
+- [x] Update release/README.md — remove stale "Known limitations", add difficulty tier table
 - [ ] Upload to Kaggle and HuggingFace
 - [ ] Announce
 
-### Difficulty modulation ✓ (PR pending)
+### Difficulty modulation ✓ (PR #52, merged)
 
 - [x] `leadforge/core/models.py` — `DifficultyParams` frozen dataclass; optional field on `GenerationConfig`
 - [x] `leadforge/mechanisms/policies.py` — `assign_mechanisms()` accepts `difficulty_params`; per-motif calibration computes target daily hazard from conversion_rate_range; signal_strength scales LatentScore weights
diff --git a/release/README.md b/release/README.md
index 0895879..acfd108 100644
--- a/release/README.md
+++ b/release/README.md
@@ -8,7 +8,7 @@ Most public lead scoring datasets are flat CSVs with opaque provenance. This one
 
 1. **Relational structure.** 9 normalized tables (accounts, contacts, leads, touches, sessions, sales activities, opportunities, customers, subscriptions) plus ML-ready task splits. Practice feature engineering from raw tables, or grab the flat file and start modeling.
 
-2. **Three difficulty tiers.** Same company, same product, same buyer personas -- different difficulty profiles. Each tier declares different signal strength, noise, and missingness parameters in its manifest. (See [Known limitations](#known-limitations) for current status.)
+2. **Three difficulty tiers.** Same company, same product, same buyer personas -- different difficulty profiles that produce meaningfully different conversion rates, noise levels, and missingness. Intro is the easiest (30-45% conversion), intermediate is moderate (18-28%), and advanced is hardest (8-15%).
 
 3. **Reproducible and leakage-safe.** Deterministic generation from a fixed seed. SHA-256 hashes for every file in `manifest.json`. Leakage-prone columns (`total_touches_all`, `current_stage`) are explicitly flagged in the feature dictionary. All features are anchored at the snapshot date -- no post-cutoff data leaks in.
 
@@ -152,9 +152,18 @@ The `intermediate_instructor/` bundle includes the full hidden causal structure:
 
 This enables research on causal inference, model interpretability, and DGP-aware evaluation.
 
-## Known limitations
+## Difficulty tiers
 
-- **Difficulty tiers share the same conversion rate.** The simulation engine does not yet modulate conversion rates by difficulty profile. All three tiers produce similar base rates (~70%). The difficulty profiles are declared in each bundle's manifest and will produce meaningfully different signal-to-noise ratios once the engine is updated. For now, the primary difference between tiers is the declared profile metadata.
+The three student_public bundles share the same underlying world (company, product, personas, seed) but differ in difficulty:
+
+| Parameter | Intro | Intermediate | Advanced |
+|---|---|---|---|
+| Conversion rate | 30-45% | 18-28% | 8-15% |
+| Signal strength | 0.90 | 0.70 | 0.50 |
+| Noise scale | 0.10 | 0.30 | 0.55 |
+| Missing rate | 2% | 8% | 18% |
+
+Higher difficulty means weaker signal, more noise, more missingness, and lower base conversion rate -- all modulated in the simulation engine, not just declared in metadata.
 
 ## Provenance
 

From 805cda8789be04de7c8246e1349f90932c346d8e Mon Sep 17 00:00:00 2001
From: Shay Palachy
Date: Mon, 4 May 2026 02:01:11 +0300
Subject: [PATCH 2/2] fix: consolidate difficulty info, update HF card, restore determinism task
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Remove duplicate "Difficulty tiers" section from README; merge
  conversion rate (target + observed) into the existing Dataset summary
  table. Single source of truth, no duplication.
- Remove hardcoded ranges from intro paragraph to reduce staleness risk.
- Add conversion rate row to HF_DATASET_CARD.md summary table and update
  difficulty bullet — card was inconsistent with README.
- Restore SHA-256 determinism verification as an unchecked Phase 5 task
  in .agent-plan.md (was silently dropped in previous commit).
- Point to difficulty_profiles.yaml as source of truth for target ranges.

Co-Authored-By: Claude Opus 4.6
---
 .agent-plan.md             |  6 ++++--
 release/HF_DATASET_CARD.md |  3 ++-
 release/README.md          | 19 +++++--------------
 3 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/.agent-plan.md b/.agent-plan.md
index b885940..80f8b22 100644
--- a/.agent-plan.md
+++ b/.agent-plan.md
@@ -41,8 +41,10 @@ First public dataset release: `leadforge-b2b-lead-scoring`. Three difficulty tie
 ### Public release — Phase 5: Generate final release + upload (in progress)
 
 - [x] Regenerate release bundles with difficulty-aware engine (PR #52 merged)
-- [x] Verify three tiers produce different conversion rates
-- [x] Update release/README.md — remove stale "Known limitations", add difficulty tier table
+- [x] Verify three tiers produce different conversion rates (intro 41.5%, intermediate 20.1%, advanced 7.9%)
+- [x] Update release/README.md — remove stale "Known limitations", add conversion rates to dataset summary
+- [x] Update release/HF_DATASET_CARD.md — add conversion rates to summary table
+- [ ] Verify SHA-256 hash determinism (re-run build, compare hashes)
 - [ ] Upload to Kaggle and HuggingFace
 - [ ] Announce
 
diff --git a/release/HF_DATASET_CARD.md b/release/HF_DATASET_CARD.md
index 529ef69..1f377d7 100644
--- a/release/HF_DATASET_CARD.md
+++ b/release/HF_DATASET_CARD.md
@@ -50,7 +50,7 @@ A relational, reproducible, multi-difficulty lead scoring dataset generated by [
 ## Why this dataset?
 
 1. **Relational structure.** 9 normalized tables plus ML-ready task splits. Practice feature engineering from raw tables, or grab the flat file and start modeling.
-2. **Three difficulty tiers.** Same world, different signal-to-noise ratios.
+2. **Three difficulty tiers.** Same world, different conversion rates, signal-to-noise ratios, and missingness.
 3. **Reproducible and leakage-safe.** Deterministic generation (seed 42), SHA-256 hashes, explicit leakage trap.
 
 ## Quick start
@@ -79,6 +79,7 @@ df = pd.read_csv("hf://datasets/leadforge/leadforge-b2b-lead-scoring/intermediat
 | Leads | 5,000 | 5,000 | 5,000 |
 | Features | 35 | 35 | 35 |
 | Target | `converted_within_90_days` | `converted_within_90_days` | `converted_within_90_days` |
+| Conversion rate | 41.5% | 20.1% | 7.9% |
 | Signal strength | 0.90 | 0.70 | 0.50 |
 | Noise scale | 0.10 | 0.30 | 0.55 |
 | Missing rate | 2% | 8% | 18% |
diff --git a/release/README.md b/release/README.md
index acfd108..697f28e 100644
--- a/release/README.md
+++ b/release/README.md
@@ -8,7 +8,7 @@ Most public lead scoring datasets are flat CSVs with opaque provenance. This one
 
 1. **Relational structure.** 9 normalized tables (accounts, contacts, leads, touches, sessions, sales activities, opportunities, customers, subscriptions) plus ML-ready task splits. Practice feature engineering from raw tables, or grab the flat file and start modeling.
 
-2. **Three difficulty tiers.** Same company, same product, same buyer personas -- different difficulty profiles that produce meaningfully different conversion rates, noise levels, and missingness. Intro is the easiest (30-45% conversion), intermediate is moderate (18-28%), and advanced is hardest (8-15%).
+2. **Three difficulty tiers.** Same company, same product, same buyer personas -- different difficulty profiles that produce meaningfully different conversion rates, noise levels, and missingness.
 
 3. **Reproducible and leakage-safe.** Deterministic generation from a fixed seed. SHA-256 hashes for every file in `manifest.json`. Leakage-prone columns (`total_touches_all`, `current_stage`) are explicitly flagged in the feature dictionary. All features are anchored at the snapshot date -- no post-cutoff data leaks in.
 
@@ -108,10 +108,14 @@ leadforge generate \
 | Contacts | 4,200 | 4,200 | 4,200 |
 | Columns | 35 (34 features + 1 target) | 35 | 35 |
 | Target | `converted_within_90_days` | `converted_within_90_days` | `converted_within_90_days` |
+| Conversion rate (target) | 30-45% | 18-28% | 8-15% |
+| Conversion rate (observed) | 41.5% | 20.1% | 7.9% |
 | Signal strength | 0.90 | 0.70 | 0.50 |
 | Noise scale | 0.10 | 0.30 | 0.55 |
 | Missing rate | 2% | 8% | 18% |
 
+Higher difficulty means weaker signal, more noise, more missingness, and lower base conversion rate -- all modulated in the simulation engine. Target ranges are defined in `difficulty_profiles.yaml`.
+
 ## The scenario
 
 **Veridian Technologies** is a Series B startup (Austin, US) selling **Veridian Procure**, a cloud-based procurement and AP automation platform, to mid-market firms (200-2,000 employees) in the US and UK.
@@ -152,19 +156,6 @@ The `intermediate_instructor/` bundle includes the full hidden causal structure:
 
 This enables research on causal inference, model interpretability, and DGP-aware evaluation.
 
-## Difficulty tiers
-
-The three student_public bundles share the same underlying world (company, product, personas, seed) but differ in difficulty:
-
-| Parameter | Intro | Intermediate | Advanced |
-|---|---|---|---|
-| Conversion rate | 30-45% | 18-28% | 8-15% |
-| Signal strength | 0.90 | 0.70 | 0.50 |
-| Noise scale | 0.10 | 0.30 | 0.55 |
-| Missing rate | 2% | 8% | 18% |
-
-Higher difficulty means weaker signal, more noise, more missingness, and lower base conversion rate -- all modulated in the simulation engine, not just declared in metadata.
-
 ## Provenance
 
 | Field | Value |