From 4b33e1208d8110a91e615e8d956639aea08772a7 Mon Sep 17 00:00:00 2001
From: Shay Palachy <shaypal5@users.noreply.github.com>
Date: Mon, 4 May 2026 01:32:54 +0300
Subject: [PATCH 1/2] fix: update release docs for difficulty modulation and
 regenerate bundles

Remove the stale "Known limitations" section, which claimed that all
difficulty tiers share the same conversion rate. Replace it with a
"Difficulty tiers" section documenting the target conversion ranges.
Regenerate all 4 release bundles with the difficulty-aware engine
(PR #52); observed conversion rates: intro 41.5%, intermediate 20.1%,
advanced 7.9%.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 .agent-plan.md    |  8 +++++---
 release/README.md | 15 ++++++++++++---
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/.agent-plan.md b/.agent-plan.md
index cfa7c27..b885940 100644
--- a/.agent-plan.md
+++ b/.agent-plan.md
@@ -38,13 +38,15 @@ First public dataset release: `leadforge-b2b-lead-scoring`. Three difficulty tie
 - [x] Excludes `current_stage` and leakage-flagged columns
 - [x] Works from pre-generated Parquet files (no leadforge install needed)
 
-### Public release — Phase 5: Generate final release + upload (pending)
+### Public release — Phase 5: Generate final release + upload (in progress)
 
-- [ ] Run build script, verify SHA-256 hash determinism
+- [x] Regenerate release bundles with difficulty-aware engine (PR #52 merged)
+- [x] Verify three tiers produce different conversion rates
+- [x] Update release/README.md — remove stale "Known limitations", add difficulty tier table
 - [ ] Upload to Kaggle and HuggingFace
 - [ ] Announce
 
-### Difficulty modulation ✓ (PR pending)
+### Difficulty modulation ✓ (PR #52, merged)
 
 - [x] `leadforge/core/models.py` — `DifficultyParams` frozen dataclass; optional field on `GenerationConfig`
 - [x] `leadforge/mechanisms/policies.py` — `assign_mechanisms()` accepts `difficulty_params`; per-motif calibration computes target daily hazard from conversion_rate_range; signal_strength scales LatentScore weights
diff --git a/release/README.md b/release/README.md
index 0895879..acfd108 100644
--- a/release/README.md
+++ b/release/README.md
@@ -8,7 +8,7 @@ Most public lead scoring datasets are flat CSVs with opaque provenance. This one
 
 1. **Relational structure.** 9 normalized tables (accounts, contacts, leads, touches, sessions, sales activities, opportunities, customers, subscriptions) plus ML-ready task splits. Practice feature engineering from raw tables, or grab the flat file and start modeling.
 
-2. **Three difficulty tiers.** Same company, same product, same buyer personas -- different difficulty profiles. Each tier declares different signal strength, noise, and missingness parameters in its manifest. (See [Known limitations](#known-limitations) for current status.)
+2. **Three difficulty tiers.** Same company, same product, same buyer personas -- different difficulty profiles that produce meaningfully different conversion rates, noise levels, and missingness. Intro is the easiest (30-45% conversion), intermediate is moderate (18-28%), and advanced is hardest (8-15%).
 
 3. **Reproducible and leakage-safe.** Deterministic generation from a fixed seed. SHA-256 hashes for every file in `manifest.json`. Leakage-prone columns (`total_touches_all`, `current_stage`) are explicitly flagged in the feature dictionary. All features are anchored at the snapshot date -- no post-cutoff data leaks in.
 
@@ -152,9 +152,18 @@ The `intermediate_instructor/` bundle includes the full hidden causal structure:
 
 This enables research on causal inference, model interpretability, and DGP-aware evaluation.
 
-## Known limitations
+## Difficulty tiers
 
-- **Difficulty tiers share the same conversion rate.** The simulation engine does not yet modulate conversion rates by difficulty profile. All three tiers produce similar base rates (~70%). The difficulty profiles are declared in each bundle's manifest and will produce meaningfully different signal-to-noise ratios once the engine is updated. For now, the primary difference between tiers is the declared profile metadata.
+The three student_public bundles share the same underlying world (company, product, personas, seed) but differ in difficulty:
+
+| Parameter | Intro | Intermediate | Advanced |
+|---|---|---|---|
+| Conversion rate | 30-45% | 18-28% | 8-15% |
+| Signal strength | 0.90 | 0.70 | 0.50 |
+| Noise scale | 0.10 | 0.30 | 0.55 |
+| Missing rate | 2% | 8% | 18% |
+
+Higher difficulty means weaker signal, more noise, more missingness, and lower base conversion rate -- all modulated in the simulation engine, not just declared in metadata.
 
 ## Provenance
 

From 805cda8789be04de7c8246e1349f90932c346d8e Mon Sep 17 00:00:00 2001
From: Shay Palachy <shaypal5@users.noreply.github.com>
Date: Mon, 4 May 2026 02:01:11 +0300
Subject: [PATCH 2/2] fix: consolidate difficulty info, update HF card, restore
 determinism task
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Remove duplicate "Difficulty tiers" section from README; merge
  conversion rate (target + observed) into the existing Dataset summary
  table so the tier figures appear in exactly one place.
- Remove hardcoded ranges from intro paragraph to reduce staleness risk.
- Add conversion rate row to HF_DATASET_CARD.md summary table and
  update difficulty bullet — card was inconsistent with README.
- Restore SHA-256 determinism verification as an unchecked Phase 5 task
  in .agent-plan.md (was silently dropped in previous commit).
- Point to difficulty_profiles.yaml as source of truth for target ranges.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 .agent-plan.md             |  6 ++++--
 release/HF_DATASET_CARD.md |  3 ++-
 release/README.md          | 19 +++++--------------
 3 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/.agent-plan.md b/.agent-plan.md
index b885940..80f8b22 100644
--- a/.agent-plan.md
+++ b/.agent-plan.md
@@ -41,8 +41,10 @@ First public dataset release: `leadforge-b2b-lead-scoring`. Three difficulty tie
 ### Public release — Phase 5: Generate final release + upload (in progress)
 
 - [x] Regenerate release bundles with difficulty-aware engine (PR #52 merged)
-- [x] Verify three tiers produce different conversion rates
-- [x] Update release/README.md — remove stale "Known limitations", add difficulty tier table
+- [x] Verify three tiers produce different conversion rates (intro 41.5%, intermediate 20.1%, advanced 7.9%)
+- [x] Update release/README.md — remove stale "Known limitations", add conversion rates to dataset summary
+- [x] Update release/HF_DATASET_CARD.md — add conversion rates to summary table
+- [ ] Verify SHA-256 hash determinism (re-run build, compare hashes)
 - [ ] Upload to Kaggle and HuggingFace
 - [ ] Announce
 
diff --git a/release/HF_DATASET_CARD.md b/release/HF_DATASET_CARD.md
index 529ef69..1f377d7 100644
--- a/release/HF_DATASET_CARD.md
+++ b/release/HF_DATASET_CARD.md
@@ -50,7 +50,7 @@ A relational, reproducible, multi-difficulty lead scoring dataset generated by [
 ## Why this dataset?
 
 1. **Relational structure.** 9 normalized tables plus ML-ready task splits. Practice feature engineering from raw tables, or grab the flat file and start modeling.
-2. **Three difficulty tiers.** Same world, different signal-to-noise ratios.
+2. **Three difficulty tiers.** Same world, different conversion rates, signal-to-noise ratios, and missingness.
 3. **Reproducible and leakage-safe.** Deterministic generation (seed 42), SHA-256 hashes, explicit leakage trap.
 
 ## Quick start
@@ -79,6 +79,7 @@ df = pd.read_csv("hf://datasets/leadforge/leadforge-b2b-lead-scoring/intermediat
 | Leads | 5,000 | 5,000 | 5,000 |
 | Features | 35 | 35 | 35 |
 | Target | `converted_within_90_days` | `converted_within_90_days` | `converted_within_90_days` |
+| Conversion rate | 41.5% | 20.1% | 7.9% |
 | Signal strength | 0.90 | 0.70 | 0.50 |
 | Noise scale | 0.10 | 0.30 | 0.55 |
 | Missing rate | 2% | 8% | 18% |
diff --git a/release/README.md b/release/README.md
index acfd108..697f28e 100644
--- a/release/README.md
+++ b/release/README.md
@@ -8,7 +8,7 @@ Most public lead scoring datasets are flat CSVs with opaque provenance. This one
 
 1. **Relational structure.** 9 normalized tables (accounts, contacts, leads, touches, sessions, sales activities, opportunities, customers, subscriptions) plus ML-ready task splits. Practice feature engineering from raw tables, or grab the flat file and start modeling.
 
-2. **Three difficulty tiers.** Same company, same product, same buyer personas -- different difficulty profiles that produce meaningfully different conversion rates, noise levels, and missingness. Intro is the easiest (30-45% conversion), intermediate is moderate (18-28%), and advanced is hardest (8-15%).
+2. **Three difficulty tiers.** Same company, same product, same buyer personas -- different difficulty profiles that produce meaningfully different conversion rates, noise levels, and missingness.
 
 3. **Reproducible and leakage-safe.** Deterministic generation from a fixed seed. SHA-256 hashes for every file in `manifest.json`. Leakage-prone columns (`total_touches_all`, `current_stage`) are explicitly flagged in the feature dictionary. All features are anchored at the snapshot date -- no post-cutoff data leaks in.
 
@@ -108,10 +108,14 @@ leadforge generate \
 | Contacts | 4,200 | 4,200 | 4,200 |
 | Columns | 35 (34 features + 1 target) | 35 | 35 |
 | Target | `converted_within_90_days` | `converted_within_90_days` | `converted_within_90_days` |
+| Conversion rate (target) | 30-45% | 18-28% | 8-15% |
+| Conversion rate (observed) | 41.5% | 20.1% | 7.9% |
 | Signal strength | 0.90 | 0.70 | 0.50 |
 | Noise scale | 0.10 | 0.30 | 0.55 |
 | Missing rate | 2% | 8% | 18% |
 
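+Higher difficulty means weaker signal, more noise, more missingness, and lower base conversion rate -- all modulated in the simulation engine. Target ranges are defined in `difficulty_profiles.yaml`; observed rates are measured on the released bundles, and the advanced tier (7.9%) lands just below its 8-15% target.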
+
 ## The scenario
 
 **Veridian Technologies** is a Series B startup (Austin, US) selling **Veridian Procure**, a cloud-based procurement and AP automation platform, to mid-market firms (200-2,000 employees) in the US and UK.
@@ -152,19 +156,6 @@ The `intermediate_instructor/` bundle includes the full hidden causal structure:
 
 This enables research on causal inference, model interpretability, and DGP-aware evaluation.
 
-## Difficulty tiers
-
-The three student_public bundles share the same underlying world (company, product, personas, seed) but differ in difficulty:
-
-| Parameter | Intro | Intermediate | Advanced |
-|---|---|---|---|
-| Conversion rate | 30-45% | 18-28% | 8-15% |
-| Signal strength | 0.90 | 0.70 | 0.50 |
-| Noise scale | 0.10 | 0.30 | 0.55 |
-| Missing rate | 2% | 8% | 18% |
-
-Higher difficulty means weaker signal, more noise, more missingness, and lower base conversion rate -- all modulated in the simulation engine, not just declared in metadata.
-
 ## Provenance
 
 | Field | Value |