leadforge-dev · shaypal5 · May 3, 2026
diff --git a/.agent-plan.md b/.agent-plan.md
@@ -38,13 +38,17 @@ First public dataset release: `leadforge-b2b-lead-scoring`. Three difficulty tie
 - [x] Excludes `current_stage` and leakage-flagged columns
 - [x] Works from pre-generated Parquet files (no leadforge install needed)
 
-### Public release — Phase 5: Generate final release + upload (pending)
+### Public release — Phase 5: Generate final release + upload (in progress)
 
-- [ ] Run build script, verify SHA-256 hash determinism
+- [x] Regenerate release bundles with difficulty-aware engine (PR #52 merged)
+- [x] Verify three tiers produce different conversion rates (intro 41.5%, intermediate 20.1%, advanced 7.9%)
+- [x] Update release/README.md — remove stale "Known limitations", add conversion rates to dataset summary
+- [x] Update release/HF_DATASET_CARD.md — add conversion rates to summary table
+- [ ] Verify SHA-256 hash determinism (re-run build, compare hashes)
 - [ ] Upload to Kaggle and HuggingFace
 - [ ] Announce
 
-### Difficulty modulation ✓ (PR pending)
+### Difficulty modulation ✓ (PR #52, merged)
 
 - [x] `leadforge/core/models.py` — `DifficultyParams` frozen dataclass; optional field on `GenerationConfig`
 - [x] `leadforge/mechanisms/policies.py` — `assign_mechanisms()` accepts `difficulty_params`; per-motif calibration computes target daily hazard from conversion_rate_range; signal_strength scales LatentScore weights
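
The remaining "Verify SHA-256 hash determinism (re-run build, compare hashes)" item could be checked along the following lines. This is only a sketch: it assumes the build has been run twice into two separate output directories, and the helper names (`sha256_of_file`, `hash_tree`, `diff_builds`) are hypothetical and not part of leadforge.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_builds(build_a: Path, build_b: Path) -> list[str]:
    """List relative paths that are missing from one build or differ in content."""
    a, b = hash_tree(build_a), hash_tree(build_b)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

An empty list from `diff_builds(Path("build_a"), Path("build_b"))` (both directory names are placeholders) would indicate that the two runs are byte-identical; any returned path points at a file that is either non-deterministic or present in only one build.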

diff --git a/release/HF_DATASET_CARD.md b/release/HF_DATASET_CARD.md
@@ -50,7 +50,7 @@ A relational, reproducible, multi-difficulty lead scoring dataset generated by [
 ## Why this dataset?
 
 1. **Relational structure.** 9 normalized tables plus ML-ready task splits. Practice feature engineering from raw tables, or grab the flat file and start modeling.
-2. **Three difficulty tiers.** Same world, different signal-to-noise ratios.
+2. **Three difficulty tiers.** Same world, different conversion rates, signal-to-noise ratios, and missingness.
 3. **Reproducible and leakage-safe.** Deterministic generation (seed 42), SHA-256 hashes, explicit leakage trap.
 
 ## Quick start
@@ -79,6 +79,7 @@ df = pd.read_csv("hf://datasets/leadforge/leadforge-b2b-lead-scoring/intermediat
 | Leads | 5,000 | 5,000 | 5,000 |
 | Features | 35 | 35 | 35 |
 | Target | `converted_within_90_days` | `converted_within_90_days` | `converted_within_90_days` |
+| Conversion rate | 41.5% | 20.1% | 7.9% |
 | Signal strength | 0.90 | 0.70 | 0.50 |
 | Noise scale | 0.10 | 0.30 | 0.55 |
 | Missing rate | 2% | 8% | 18% |
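
As a rough illustration of what the new "Conversion rate" row implies for class balance, the arithmetic can be sketched as below. It uses only the 5,000 leads per tier and the rates from the table; because the rates are rounded to one decimal place, the implied counts are approximate. The tier keys and helper names are chosen for illustration.

```python
LEADS_PER_TIER = 5_000

# Conversion rates as reported in the summary table (rounded to 0.1%).
REPORTED_RATES = {"intro": 0.415, "intermediate": 0.201, "advanced": 0.079}


def implied_positives(rate: float, n_leads: int = LEADS_PER_TIER) -> int:
    """Approximate number of converted leads implied by a reported rate."""
    return round(rate * n_leads)


def negatives_per_positive(rate: float) -> float:
    """Class-imbalance ratio: non-converted leads per converted lead."""
    return (1.0 - rate) / rate


for tier, rate in REPORTED_RATES.items():
    print(
        f"{tier}: ~{implied_positives(rate)} positives, "
        f"{negatives_per_positive(rate):.1f} negatives per positive"
    )
```

The harder tiers are therefore noticeably more imbalanced, which is worth keeping in mind when choosing evaluation metrics.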

diff --git a/release/README.md b/release/README.md
@@ -8,7 +8,7 @@ Most public lead scoring datasets are flat CSVs with opaque provenance. This one
 
 1. **Relational structure.** 9 normalized tables (accounts, contacts, leads, touches, sessions, sales activities, opportunities, customers, subscriptions) plus ML-ready task splits. Practice feature engineering from raw tables, or grab the flat file and start modeling.
 
-2. **Three difficulty tiers.** Same company, same product, same buyer personas -- different difficulty profiles. Each tier declares different signal strength, noise, and missingness parameters in its manifest. (See [Known limitations](#known-limitations) for current status.)
+2. **Three difficulty tiers.** Same company, same product, same buyer personas -- different difficulty profiles that produce meaningfully different conversion rates, noise levels, and missingness.
 
 3. **Reproducible and leakage-safe.** Deterministic generation from a fixed seed. SHA-256 hashes for every file in `manifest.json`. Leakage-prone columns (`total_touches_all`, `current_stage`) are explicitly flagged in the feature dictionary. All features are anchored at the snapshot date -- no post-cutoff data leaks in.
 
@@ -108,10 +108,14 @@ leadforge generate \
 | Contacts | 4,200 | 4,200 | 4,200 |
 | Columns | 35 (34 features + 1 target) | 35 | 35 |
 | Target | `converted_within_90_days` | `converted_within_90_days` | `converted_within_90_days` |
+| Conversion rate (target) | 30-45% | 18-28% | 8-15% |
+| Conversion rate (observed) | 41.5% | 20.1% | 7.9% |
 | Signal strength | 0.90 | 0.70 | 0.50 |
 | Noise scale | 0.10 | 0.30 | 0.55 |
 | Missing rate | 2% | 8% | 18% |
 
+Higher difficulty means weaker signal, more noise, more missingness, and a lower base conversion rate -- all modulated in the simulation engine. Target ranges are defined in `difficulty_profiles.yaml`.
+
 ## The scenario
 
 **Veridian Technologies** is a Series B startup (Austin, US) selling **Veridian Procure**, a cloud-based procurement and AP automation platform, to mid-market firms (200-2,000 employees) in the US and UK.
@@ -152,10 +156,6 @@ The `intermediate_instructor/` bundle includes the full hidden causal structure:
 
 This enables research on causal inference, model interpretability, and DGP-aware evaluation.
 
-## Known limitations
-
-- **Difficulty tiers share the same conversion rate.** The simulation engine does not yet modulate conversion rates by difficulty profile. All three tiers produce similar base rates (~70%). The difficulty profiles are declared in each bundle's manifest and will produce meaningfully different signal-to-noise ratios once the engine is updated. For now, the primary difference between tiers is the declared profile metadata.
-
 ## Provenance
 
 | Field | Value |