diff --git a/.agent-plan.md b/.agent-plan.md
index 689505a..3c29ba0 100644
--- a/.agent-plan.md
+++ b/.agent-plan.md
@@ -40,9 +40,10 @@ Goal: ship a best-in-class educational synthetic CRM lead-scoring dataset family
 - [x] PR 3.3: `scripts/validate_release_candidate.py` (new) — release-candidate driver. Orchestrates `regenerate_tier_for_seeds(spec, seeds, workdir)` × N=5 (default) per tier, calls `measure_release_quality`, runs `run_split_probes` against each tier's canonical seed, renders the JSON / markdown / figure contract via `render_report`, and gates on YAML-declared bands. Flags: `--release-dir`, `--workdir`, `--out-dir`, `--bands`, `--seeds`, `--cohort-canonical-seed`, `--tiers`, `--quick` (N=2 with 500-lead populations; ~20s end-to-end), `--no-rebuild` (reuses workdir for fast band-tweak iteration). Exit codes: 0 pass / 1 gate failure / 2 pre-flight error. Driver vs `leadforge validate` boundary documented in the script docstring (one-bundle structural contract vs. cross-seed × cross-tier release-readiness panel — complementary, not merged). `leadforge/validation/difficulty.py` extended with `BandSpec` / `TierBands` / `LeakageProbeBands` / `AcceptanceBands` / `GateFailure` dataclasses and `load_bands` / `check_release_bands` (consumes `ReleaseQualityReport` + per-tier `LeakageReport`s, returns `list[GateFailure]`). G7.4.4 (cross-tier GBM−LR positivity) softened to follow per-tier `gbm_minus_lr_auc` bands rather than hard-fail on the boolean — the v1 dataset's snapshot is dominated by linear features and HistGBM does not consistently beat LR; documented as a known v1→v2 finding with the cross-tier check tracked as informational. `docs/release/v1_acceptance_gates_bands.yaml` (new) is the operational source of truth for numeric bands; `docs/release/v1_acceptance_gates.md` updated to remove every `TBD-*` placeholder and to record medians + rationale per gate. `release/_release_quality/` workdir gitignored; `release/validation/` (validation_report.{json,md} + 7 pinned figures: lift_curve_{intro,intermediate,advanced}, calibration_intermediate, leakage_delta, cohort_shift, value_capture) committed. New tests: `tests/validation/test_difficulty_bands.py` (29 tests over band parsing / per-tier checks / cross-seed spread / cohort shift / cross-tier ordering / leakage findings / GateFailure immutability) and `tests/scripts/test_validate_release_candidate.py` (19 tests over CLI helpers, mocked pipeline, end-to-end --quick run); 1152/1152 tests pass; ruff + mypy clean; `scripts/probe_relational_leakage.py release/{intro,intermediate,advanced} --max-accuracy 0.65` exits 0 on every public tier; `scripts/verify_hash_determinism.py` PASS 67/67 files identical; `BUNDLE_SCHEMA_VERSION` unchanged at 5 (purely additive driver+gating layer). First authentic full-release run baseline (seeds 42–46): intro AP 0.7608 / LR AUC 0.879 / GBM AUC 0.873; intermediate AP 0.5752 / LR AUC 0.886 / GBM AUC 0.876; advanced AP 0.3514 / LR AUC 0.886 / GBM AUC 0.873; cross-tier AP / P@100 / conversion-rate ordering all hold; GBM−LR delta is slightly negative in every tier (−0.0045 / −0.0072 / −0.0133 — the v1→v2 finding above).
 
 ### Phase 4 — Channel-signal audit + dataset card hardening
-- [ ] `scripts/audit_channel_signal.py` → `docs/release/channel_signal_audit.md`
-- [ ] `release/README.md` rewrite (release-grade dataset card; macro-framing paragraph; simulation-simplifications section)
-- [ ] `docs/release/{generation_method,feature_dictionary}.md`
+- [x] PR 4.1: `scripts/audit_channel_signal.py` (new) — analysis driver. For each tier (and each of `lead_source` / `first_touch_channel`), computes per-channel conversion rate + univariate AUC scored as the empirical positive rate per channel (a 1-D Bayes classifier, equivalent to a saturated LR on one-hot channel features). Writes `docs/release/channel_signal_audit.{md,json}`. CLI: `--release-dir`, `--tier`, `--task`, `--channel-column`, `--out-md`, `--out-json`, `--print`. Determinism guarded by `tests/scripts/test_audit_channel_signal.py` (10 tests: per-channel rollup, closed-form univariate AUC, single-class fallback, missing-column error, build/render round-trip, byte-identical re-run against the committed `release/` bundles, error paths). Audit verdict on the canonical PR 2.2 bundles: **weak channel signal** — across all three tiers and both channel columns the largest per-channel rate spread is 0.043 and the largest univariate AUC is 0.521, well below the G2 / Gemini v2 industry MQL→SQL band (SEO ~51%, PPC ~26%, Email <1%). v1 drives conversion through motif-family hazards keyed off latent traits, not channel-conditional probabilities; channel-conditional encoding is tracked in `docs/release/post_v1_roadmap.md`.
+- [x] PR 4.1: `docs/release/generation_method.md` (new) — standalone DGP summary written for external readers (Kaggle/HF). Reads alone, references `docs/leadforge_architecture_spec.md`. Covers the five generation layers (motif families → mechanism layer → population → 90-day daily simulation → snapshot rendering), bundle output contract, public-vs-instructor split, calibration / validation, and an explicit "what this is not" boundary. Satisfies G10.2.
+- [x] PR 4.1: `docs/release/feature_dictionary.md` (new) — narrative companion to the per-bundle `feature_dictionary.csv`. Groups every public-mode column by analytical role (lead identity / firmographics / personographics / engagement / funnel / value / leakage trap / target), documents difficulty modulation parameters, modelling defaults, and the deliberate `total_touches_all` trap. Satisfies G10.3.
+- [x] PR 4.1: `release/README.md` (substantial rewrite) — release-grade dataset card per Datasheets-for-Datasets / Data Cards Playbook checklist (G10.1). New sections: macro framing paragraph (2024–2026 SaaS context, recommendation #19), simulation simplifications (modelled / approximate / not modelled, per chatgpt v2 §2.6), calibration documentation linking to `release/validation/validation_report.md`, public-vs-instructor redaction policy with concrete column lists citing `BANNED_LEAD_COLUMNS` / `BANNED_OPP_COLUMNS` / `BANNED_TABLES` / `SNAPSHOT_FILTERED_TABLES` from `leadforge/validation/leakage_probes.py`, intended-use vs out-of-scope-use, known limitations (G7.4.4 GBM−LR sign finding, weak channel signal from the Phase 4 audit, flat AUC across tiers, small cohort-shift gap), composition section per Datasheets format, adversarial-framing pointer (placeholder link to `docs/release/break_me_guide.md` that lands in PR 6.3), and a maintenance plan. Every realism / calibration / difficulty claim in the card is anchored to `validation_report.md` per G10.6. `BUNDLE_SCHEMA_VERSION` unchanged at 5 (documentation-only PR); 1167/1167 tests pass; ruff + mypy clean; `scripts/probe_relational_leakage.py release/{intro,intermediate,advanced} --max-accuracy 0.65` exits 0 on every public tier; `scripts/verify_hash_determinism.py` PASS 67/67; `scripts/validate_release_candidate.py --no-rebuild` exits 0.
 
 ### Phase 5 — Platform packaging
 - [ ] `scripts/package_kaggle_release.py` → `release/kaggle/dataset-metadata.json`
diff --git a/docs/release/channel_signal_audit.json b/docs/release/channel_signal_audit.json
new file mode 100644
index 0000000..0b95903
--- /dev/null
+++ b/docs/release/channel_signal_audit.json
@@ -0,0 +1,241 @@
+{
+  "channel_columns": [
+    "lead_source",
+    "first_touch_channel"
+  ],
+  "industry_mql_to_sql_benchmarks": {
+    "Email": 0.005,
+    "PPC": 0.26,
+    "SEO": 0.51
+  },
+  "label_column": "converted_within_90_days",
+  "release_dir": "release",
+  "task": "converted_within_90_days",
+  "tiers": [
+    {
+      "columns": [
+        {
+          "channels": [
+            {
+              "conversion_rate": 0.43439490445859874,
+              "n": 1570,
+              "n_converted": 682,
+              "name": "inbound_marketing",
+              "share": 0.44857142857142857
+            },
+            {
+              "conversion_rate": 0.39111747851002865,
+              "n": 698,
+              "n_converted": 273,
+              "name": "partner_referral",
+              "share": 0.19942857142857143
+            },
+            {
+              "conversion_rate": 0.4025974025974026,
+              "n": 1232,
+              "n_converted": 496,
+              "name": "sdr_outbound",
+              "share": 0.352
+            }
+          ],
+          "column": "lead_source",
+          "n_test": 750,
+          "n_train": 3500,
+          "rate_spread": 0.04327742594857009,
+          "test_conversion_rate": 0.4266666666666667,
+          "train_conversion_rate": 0.4145714285714286,
+          "univariate_auc_in_sample": 0.5199794894149169,
+          "univariate_auc_out_of_sample": 0.5013517441860464
+        },
+        {
+          "channels": [
+            {
+              "conversion_rate": 0.43439490445859874,
+              "n": 1570,
+              "n_converted": 682,
+              "name": "inbound_marketing",
+              "share": 0.44857142857142857
+            },
+            {
+              "conversion_rate": 0.39111747851002865,
+              "n": 698,
+              "n_converted": 273,
+              "name": "partner_referral",
+              "share": 0.19942857142857143
+            },
+            {
+              "conversion_rate": 0.4025974025974026,
+              "n": 1232,
+              "n_converted": 496,
+              "name": "sdr_outbound",
+              "share": 0.352
+            }
+          ],
+          "column": "first_touch_channel",
+          "n_test": 750,
+          "n_train": 3500,
+          "rate_spread": 0.04327742594857009,
+          "test_conversion_rate": 0.4266666666666667,
+          "train_conversion_rate": 0.4145714285714286,
+          "univariate_auc_in_sample": 0.5199794894149169,
+          "univariate_auc_out_of_sample": 0.5013517441860464
+        }
+      ],
+      "n_test": 750,
+      "n_train": 3500,
+      "test_conversion_rate": 0.4266666666666667,
+      "tier": "intro",
+      "train_conversion_rate": 0.4145714285714286
+    },
+    {
+      "columns": [
+        {
+          "channels": [
+            {
+              "conversion_rate": 0.21273885350318472,
+              "n": 1570,
+              "n_converted": 334,
+              "name": "inbound_marketing",
+              "share": 0.44857142857142857
+            },
+            {
+              "conversion_rate": 0.17621776504297995,
+              "n": 698,
+              "n_converted": 123,
+              "name": "partner_referral",
+              "share": 0.19942857142857143
+            },
+            {
+              "conversion_rate": 0.2012987012987013,
+              "n": 1232,
+              "n_converted": 248,
+              "name": "sdr_outbound",
+              "share": 0.352
+            }
+          ],
+          "column": "lead_source",
+          "n_test": 750,
+          "n_train": 3500,
+          "rate_spread": 0.03652108846020477,
+          "test_conversion_rate": 0.22266666666666668,
+          "train_conversion_rate": 0.20142857142857143,
+          "univariate_auc_in_sample": 0.5212431012826857,
+          "univariate_auc_out_of_sample": 0.5139326835180411
+        },
+        {
+          "channels": [
+            {
+              "conversion_rate": 0.21273885350318472,
+              "n": 1570,
+              "n_converted": 334,
+              "name": "inbound_marketing",
+              "share": 0.44857142857142857
+            },
+            {
+              "conversion_rate": 0.17621776504297995,
+              "n": 698,
+              "n_converted": 123,
+              "name": "partner_referral",
+              "share": 0.19942857142857143
+            },
+            {
+              "conversion_rate": 0.2012987012987013,
+              "n": 1232,
+              "n_converted": 248,
+              "name": "sdr_outbound",
+              "share": 0.352
+            }
+          ],
+          "column": "first_touch_channel",
+          "n_test": 750,
+          "n_train": 3500,
+          "rate_spread": 0.03652108846020477,
+          "test_conversion_rate": 0.22266666666666668,
+          "train_conversion_rate": 0.20142857142857143,
+          "univariate_auc_in_sample": 0.5212431012826857,
+          "univariate_auc_out_of_sample": 0.5139326835180411
+        }
+      ],
+      "n_test": 750,
+      "n_train": 3500,
+      "test_conversion_rate": 0.22266666666666668,
+      "tier": "intermediate",
+      "train_conversion_rate": 0.20142857142857143
+    },
+    {
+      "columns": [
+        {
+          "channels": [
+            {
+              "conversion_rate": 0.08152866242038216,
+              "n": 1570,
+              "n_converted": 128,
+              "name": "inbound_marketing",
+              "share": 0.44857142857142857
+            },
+            {
+              "conversion_rate": 0.07593123209169055,
+              "n": 698,
+              "n_converted": 53,
+              "name": "partner_referral",
+              "share": 0.19942857142857143
+            },
+            {
+              "conversion_rate": 0.07792207792207792,
+              "n": 1232,
+              "n_converted": 96,
+              "name": "sdr_outbound",
+              "share": 0.352
+            }
+          ],
+          "column": "lead_source",
+          "n_test": 750,
+          "n_train": 3500,
+          "rate_spread": 0.005597430328691616,
+          "test_conversion_rate": 0.07866666666666666,
+          "train_conversion_rate": 0.07914285714285714,
+          "univariate_auc_in_sample": 0.5083011208921436,
+          "univariate_auc_out_of_sample": 0.5225784296892246
+        },
+        {
+          "channels": [
+            {
+              "conversion_rate": 0.08152866242038216,
+              "n": 1570,
+              "n_converted": 128,
+              "name": "inbound_marketing",
+              "share": 0.44857142857142857
+            },
+            {
+              "conversion_rate": 0.07593123209169055,
+              "n": 698,
+              "n_converted": 53,
+              "name": "partner_referral",
+              "share": 0.19942857142857143
+            },
+            {
+              "conversion_rate": 0.07792207792207792,
+              "n": 1232,
+              "n_converted": 96,
+              "name": "sdr_outbound",
+              "share": 0.352
+            }
+          ],
+          "column": "first_touch_channel",
+          "n_test": 750,
+          "n_train": 3500,
+          "rate_spread": 0.005597430328691616,
+          "test_conversion_rate": 0.07866666666666666,
+          "train_conversion_rate": 0.07914285714285714,
+          "univariate_auc_in_sample": 0.5083011208921436,
+          "univariate_auc_out_of_sample": 0.5225784296892246
+        }
+      ],
+      "n_test": 750,
+      "n_train": 3500,
+      "test_conversion_rate": 0.07866666666666666,
+      "tier": "advanced",
+      "train_conversion_rate": 0.07914285714285714
+    }
+  ]
+}
diff --git a/docs/release/channel_signal_audit.md b/docs/release/channel_signal_audit.md
new file mode 100644
index 0000000..2cc3d56
--- /dev/null
+++ b/docs/release/channel_signal_audit.md
@@ -0,0 +1,66 @@
+# Channel-signal audit — leadforge-lead-scoring-v1
+
+Audit produced by `scripts/audit_channel_signal.py`; see `channel_signal_audit.json` for the machine-readable form.
+
+**Scope.** For every tier we compute per-channel conversion rates on the train split and the univariate AUC of channel against `converted_within_90_days`, scored as the empirical positive rate per channel (a 1-D Bayes classifier). Two AUCs are reported: an **in-sample** number (train rates → train labels — biased upward by construction) and an **out-of-sample** number (train rates → test labels — directly comparable to the `source_only` baselines in `release/validation/validation_report.json`).
+
+**Caveat on the industry benchmark.** The G2 / Gemini v2 numbers below are single-step **MQL→SQL** rates (recommendation #8 in `docs/external_review/summaries/recommendations_pass.md`). v1's label is **90-day closed-won**, the entire funnel resolved. The two metrics are not directly comparable; the table is reproduced for context only.
+
+## Industry benchmark (context, not target)
+
+| Channel | MQL→SQL conversion rate |
+|---|---|
+| Email | 0.50% |
+| PPC | 26.00% |
+| SEO | 51.00% |
+
+## Tier: `intro`
+
+`n_train = 3500` (90-day conversion rate 41.46%); `n_test = 750` (rate 42.67%).
+
+### Columns: `lead_source`, `first_touch_channel` (audit values identical)
+
+Per-channel rate spread (max − min): **0.0433**  ·  In-sample univariate AUC: **0.5200**  ·  Out-of-sample univariate AUC: **0.5014**
+
+| Channel | n (train) | Share (train) | Converted (train) | Train rate |
+|---|---:|---:|---:|---:|
+| `inbound_marketing` | 1570 | 44.86% | 682 | 43.44% |
+| `partner_referral` | 698 | 19.94% | 273 | 39.11% |
+| `sdr_outbound` | 1232 | 35.20% | 496 | 40.26% |
+
+## Tier: `intermediate`
+
+`n_train = 3500` (90-day conversion rate 20.14%); `n_test = 750` (rate 22.27%).
+
+### Columns: `lead_source`, `first_touch_channel` (audit values identical)
+
+Per-channel rate spread (max − min): **0.0365**  ·  In-sample univariate AUC: **0.5212**  ·  Out-of-sample univariate AUC: **0.5139**
+
+| Channel | n (train) | Share (train) | Converted (train) | Train rate |
+|---|---:|---:|---:|---:|
+| `inbound_marketing` | 1570 | 44.86% | 334 | 21.27% |
+| `partner_referral` | 698 | 19.94% | 123 | 17.62% |
+| `sdr_outbound` | 1232 | 35.20% | 248 | 20.13% |
+
+## Tier: `advanced`
+
+`n_train = 3500` (90-day conversion rate 7.91%); `n_test = 750` (rate 7.87%).
+
+### Columns: `lead_source`, `first_touch_channel` (audit values identical)
+
+Per-channel rate spread (max − min): **0.0056**  ·  In-sample univariate AUC: **0.5083**  ·  Out-of-sample univariate AUC: **0.5226**
+
+| Channel | n (train) | Share (train) | Converted (train) | Train rate |
+|---|---:|---:|---:|---:|
+| `inbound_marketing` | 1570 | 44.86% | 128 | 8.15% |
+| `partner_referral` | 698 | 19.94% | 53 | 7.59% |
+| `sdr_outbound` | 1232 | 35.20% | 96 | 7.79% |
+
+## Discussion
+
+The numbers above answer one question: *how strongly does channel alone signal 90-day conversion in v1?* They do not answer *whether v1 matches industry channel performance*, since the benchmarks measure a different funnel transition (single MQL→SQL step) and v1 measures the entire funnel resolved over 90 days. Treat the v1 numbers as an internal description of the simulator's channel signal.
+
+Two empirical observations a reader can make from the numbers above:
+
+1. **The out-of-sample univariate AUC is the comparable number** for any external baseline. It uses train-derived rates scored against held-out test labels — the same shape as the `source_only` HistGBM baseline reported in `release/validation/validation_report.json`, which is built on the same task splits with `lead_source` + `first_touch_channel` as the only features. The in-sample number is biased upward by construction — small at v1's N but visible — and is reported here for transparency rather than comparison.
+2. **The numerical conclusion is bundle-specific.** When the per-channel rate spread is small and the OOS univariate AUC is close to chance, channel alone is a weak feature for the bundle this audit was run against. v1's bundles currently produce that outcome (see the per-tier sections above) — consistent with the design: the simulator drives conversion through motif-family hazards keyed off latent traits, not channel-conditional probabilities. Channel-conditional encoding is tracked as post-v1 work in `docs/release/post_v1_roadmap.md`.
diff --git a/docs/release/feature_dictionary.md b/docs/release/feature_dictionary.md
new file mode 100644
index 0000000..790354a
--- /dev/null
+++ b/docs/release/feature_dictionary.md
@@ -0,0 +1,210 @@
+# Feature dictionary — `leadforge-lead-scoring-v1`
+
+Narrative companion to the per-tier `feature_dictionary.csv` shipped
+inside each public bundle. The CSV is the authoritative
+machine-readable spec (column / dtype / description / category /
+target flag / leakage flag); this document groups features by
+analytical role and adds the prose explanation, modelling
+recommendations, and pedagogical caveats that don't fit a CSV row.
+
+The grouping below covers every feature in the public student-facing
+snapshot — the same 32 columns ship in `intro`, `intermediate`, and
+`advanced` bundles. The instructor companion adds the hidden truth
+in `metadata/`; it does not change the feature list.
+
+| Category | Columns | Modelling default |
+|---|---|---|
+| Lead identity & timing | 4 | drop `lead_id`; keep `lead_created_at` for cohort splits, drop for production |
+| Lead source & channel | 2 | keep both |
+| Firmographics | 5 | keep all |
+| Personographics | 3 | keep all (categorical encoders welcome) |
+| Engagement (snapshot-window) | 10 | keep all |
+| Funnel & sales-process | 4 | keep all |
+| Value | 2 | keep all |
+| Leakage trap | 1 | **drop** unless deliberately demonstrating leakage |
+| Target | 1 | label — never used as a feature |
+
+## Lead identity and timing
+
+| Column | Dtype | Source | Modelling notes |
+|---|---|---|---|
+| `lead_id` | string | identity | Opaque, deterministic per run; not informative. Use as a join key or row index, never as a feature. |
+| `account_id` | string | identity | Foreign key into `tables/accounts.parquet`. Out-of-sample accounts may appear in test; if you fit account-level features, watch for cold-start. |
+| `contact_id` | string | identity | Foreign key into `tables/contacts.parquet`. Same warning. |
+| `lead_created_at` | string (ISO-8601) | simulation clock | Lead birthday; useful for cohort/time-shift evaluation (see `docs/release/v1_acceptance_gates.md` G6.4). Drop or bin it for production models — feeding raw timestamps to a linear model is rarely what you want. |
+
+## Lead source and channel
+
+Two columns describe how each lead entered the funnel. They are
+populated from the recipe's GTM-motion mix
+(`inbound_marketing` 45%, `sdr_outbound` 35%, `partner_referral`
+20%) and are identical between the two columns in v1 — both encode
+the same origination channel under different field names.
+
+| Column | Dtype | Why it might matter |
+|---|---|---|
+| `lead_source` | string | Origination channel; one of `inbound_marketing` / `sdr_outbound` / `partner_referral`. |
+| `first_touch_channel` | string | Marketing channel of the first recorded touch. Always equals `lead_source` in v1; the field exists to support post-v1 work where origination and first-touch can diverge. |
+
+**Caveat.** Per [`docs/release/channel_signal_audit.md`](channel_signal_audit.md),
+v1's channel signal is weak: per-channel rate spread ≤ 0.043 and
+univariate AUC ≤ 0.521 across all tiers, well below the G2 /
+Gemini v2 industry MQL→SQL band (SEO ~51%, PPC ~26%, Email <1%).
+Expect modest feature importance from these columns; do not expect
+channel to be a top-tier predictor in v1.
+
+## Firmographics (account-level)
+
+These describe the buying organisation. They come from the recipe's
+narrative spec (industry, region, employee bands, revenue bands)
+and from latent traits sampled per account. Five columns plus the
+`account_id` foreign key listed under "Lead identity and timing"
+above; all five are fair to use as features.
+
+| Column | Dtype | Why it might matter |
+|---|---|---|
+| `industry` | string | Categorical mix is fixed by the recipe (`manufacturing`, `logistics`, `professional_services`, `healthcare_non_clinical`); motif-family latent biases create modest cross-industry conversion-rate differences. |
+| `region` | string | `US` / `UK`. Currently a low-signal axis — the simulator does not model channel-by-region interactions. |
+| `employee_band` | string | Bands are aligned with the ICP range (200–2,000 employees, plus tails). Larger accounts trend toward higher expected ACV. |
+| `estimated_revenue_band` | string | Bands span `$1M-$10M` to `$200M+`; correlated with `employee_band` by design. |
+| `process_maturity_band` | string | A discretisation of the latent `process_maturity` trait — *visible* signal of `motif_family.fit_dominant`'s "fit beats engagement" story. |
+
+## Personographics (contact-level)
+
+These describe the primary contact attached to the lead. Three
+categorical features (the `contact_id` foreign key is listed
+under "Lead identity and timing"); all three are fair to use.
+
+| Column | Dtype | Why it might matter |
+|---|---|---|
+| `role_function` | string | Functional area: `finance`, `ops`, `it`, `procurement`. Drives demo-page views and the demo/trial path through `motif_family.demo_trial_mediated`. |
+| `seniority` | string | `c_suite` / `vp` / `director` / `manager` / `individual_contributor`. Strongly correlated with the latent `contact_authority` trait that gates `motif_family.buying_committee_friction`. |
+| `buyer_role` | string | `economic_buyer`, `champion`, `technical_evaluator`, `end_user`. Hand-mapped from `role_function` × `seniority`. |
+
+## Engagement (snapshot-window aggregates)
+
+Ten engagement features computed strictly over events on days
+`[0, snapshot_day]` (with `snapshot_day = 30` for v1). The simulator
+emits touches, sessions, and page views every day from
+`lead_created_at` onward; the renderer aggregates them up to but
+not past day 30. The 90-day label window resolves separately, so
+features cannot encode events that drove the late-window outcome.
+
+| Column | Dtype | What it captures |
+|---|---|---|
+| `touch_count` | Int64 | All marketing/sales touches in the snapshot window. |
+| `inbound_touch_count` | Int64 | Inbound touches only. |
+| `outbound_touch_count` | Int64 | Outbound touches only. |
+| `session_count` | Int64 | Web/trial session count. |
+| `pricing_page_views` | Int64 | Cumulative pricing-page views across sessions. |
+| `demo_page_views` | Int64 | Cumulative demo-page views across sessions. |
+| `total_session_duration_seconds` | Int64 | Cumulative seconds across all sessions. |
+| `touches_week_1` | Int64 | Touches in days 0–7 inclusive (early urgency proxy; the snapshot builder uses `_day <= 7`, which is 8 day values). |
+| `touches_last_7_days` | Int64 | Touches in the last 7 days of the snapshot window — for `snapshot_day=30`, days 24–30 inclusive (the snapshot builder uses `_day > snapshot_day - 7`). |
+| `days_since_first_touch` | Float64 | NaN if the lead has had zero touches by snapshot day. |
+
+## Funnel and sales-process
+
+The funnel state at snapshot day, exposed via four columns. None of
+these are terminal stages — `current_stage` (which can encode
+`closed_won` / `closed_lost`) is redacted from public bundles via
+the exposure layer.
+
+| Column | Dtype | What it captures |
+|---|---|---|
+| `activity_count` | Int64 | Sales-activity events (calls, demos, follow-ups) in the snapshot window. |
+| `days_since_last_touch` | Float64 | Recency of the most recent touch; NaN if zero touches. |
+| `opportunity_created` | boolean | Whether *any* opportunity was created by snapshot day, regardless of state. |
+| `has_open_opportunity` | boolean | Whether an opportunity existed in an open stage at snapshot day. |
+
+## Value
+
+Two value features. Both are useful as inputs to value-aware
+ranking (`expected_acv × P(convert)`); see notebook 4 once Phase 6
+ships.
+
+| Column | Dtype | What it captures |
+|---|---|---|
+| `opportunity_estimated_acv` | Float64 | Estimated ACV of the most recent open opportunity at snapshot day; NaN if no opportunity. |
+| `expected_acv` | Float64 | Falls back to a revenue-band midpoint heuristic when no opportunity exists, so it has fewer NaNs than `opportunity_estimated_acv`. |
+
+## Leakage trap (deliberate)
+
+| Column | Dtype | Why it ships |
+|---|---|---|
+| `total_touches_all` | Int64 | Counts touches across the full 90-day horizon — not the snapshot window. Flagged `leakage_risk=True` in the CSV (the per-bundle dictionary has columns `name,dtype,description,category,is_target,leakage_risk`); documented in `release/README.md`. The gap `total_touches_all − touch_count` carries label-correlated signal because high-converting leads accumulate more late-window touches in the simulator. **Drop this column from your features unless you are explicitly demonstrating leakage detection.** |
+
+## Target
+
+| Column | Dtype | Definition |
+|---|---|---|
+| `converted_within_90_days` | boolean | True iff a `closed_won` event occurred within 90 days of `lead_created_at`. Derived from simulated events; never sampled directly. |
+
+## Difficulty modulation
+
+Difficulty profiles distort the same feature set with different
+parameters; columns and dtypes are identical across tiers. The
+distortions are applied in `leadforge/render/snapshots.py` via
+`_apply_difficulty_distortions()`:
+
+- **Gaussian noise** on float features. `intro` 0.10, `intermediate`
+  0.30, `advanced` 0.55 (multipliers applied to per-feature
+  standard deviations).
+- **MCAR missingness.** `intro` 2%, `intermediate` 8%,
+  `advanced` 18%.
+- **Outlier injection** at the same per-tier rate as missingness.
+- **Signal strength.** Latent-score weights are multiplied by 0.90
+  (`intro`), 0.70 (`intermediate`), and 0.50 (`advanced`),
+  weakening the link between latent traits and conversion as
+  difficulty rises.
+
+The conversion-rate band for each tier is recipe-defined; observed
+medians across the canonical seed sweep (42–46) are
+0.4267 (`intro`), 0.2160 (`intermediate`), 0.0840 (`advanced`).
+See `release/validation/validation_report.md` for the full
+cross-seed × cross-tier metrics panel.
+
+## Recommended modelling defaults
+
+A short opinionated checklist for a first model. Note: the flat
+`lead_scoring.csv` and the per-task Parquet splits ship every column
+in the table above, including the IDs — the recommendation is what to
+**use as features**, not what's in the file.
+
+1. **Identifiers — drop before fitting.** `lead_id` is opaque and
+   carries no signal; drop it. `account_id` / `contact_id` are joinable
+   keys, useful only when you're computing cross-table aggregates;
+   drop from the feature matrix unless you actually use them. Drop or
+   bin `lead_created_at` — feeding raw timestamps to a linear model
+   is rarely what you want; use it as the cohort key for time-shift
+   evaluation instead.
+2. **Trap — drop.** `total_touches_all` is the deliberate leakage
+   trap. Drop unless you're demonstrating leakage detection.
+3. **Categoricals — encode.** One-hot or target-encode `industry`,
+   `region`, `employee_band`, `estimated_revenue_band`,
+   `process_maturity_band`, `role_function`, `seniority`,
+   `buyer_role`, `lead_source`, `first_touch_channel`. The two
+   channel columns carry identical values in v1; pick one.
+4. **Engagement and funnel — keep all.** The `Float64` columns carry
+   NaN for "no event in window", which is itself a signal — encode
+   missingness explicitly rather than imputing to zero blindly.
+5. **Value-aware ranking.** Use `expected_acv` over
+   `opportunity_estimated_acv`; the latter is missing for leads
+   without an opportunity. Multiply by your model's predicted
+   probability for a default value-weighted ranker.
+6. **Cohort evaluation.** Sort by `lead_created_at` and split
+   chronologically; the random-split AUC is *not* the right number to
+   report if your downstream use is forecasting.
+
+## See also
+
+- `release/{intro,intermediate,advanced}/feature_dictionary.csv` —
+  the authoritative machine-readable spec, regenerated with each
+  bundle.
+- `release/README.md` — the dataset card.
+- `docs/release/generation_method.md` — how the underlying
+  events are generated.
+- `docs/release/channel_signal_audit.md` — how strongly each
+  channel column signals conversion in v1.
+- `release/validation/validation_report.md` — calibration, lift,
+  P@K, model-family deltas, cross-seed bands.
diff --git a/docs/release/generation_method.md b/docs/release/generation_method.md
new file mode 100644
index 0000000..12029d3
--- /dev/null
+++ b/docs/release/generation_method.md
@@ -0,0 +1,166 @@
+# Generation method — `leadforge-lead-scoring-v1`
+
+A standalone summary of how the dataset is generated, written for
+external readers. Read this before opening the bundle if you want to
+know what the data is and how much you can trust each piece of it; for
+the full architecture, see [`docs/leadforge_architecture_spec.md`].
+
+## What the dataset is
+
+`leadforge-lead-scoring-v1` is a synthetic mid-market B2B SaaS
+lead-scoring dataset generated by
+[leadforge](https://github.com/leadforge-dev/leadforge), an
+open-source Python framework. Every row, event, and edge is produced
+by code in this repository — there is no real CRM behind the data.
+The generator is deterministic given a fixed
+`(recipe, configuration, seed, package version)` tuple, and the
+recipe and seed are recorded in each bundle's `manifest.json`.
+
+The published family contains three difficulty tiers — `intro`,
+`intermediate`, and `advanced` — sharing one fictional company
+narrative ("Veridian Procure", a procurement / AP automation SaaS).
+The tiers differ only in noise, missingness, and signal strength,
+modulated by a difficulty profile that the simulator consumes; the
+underlying causal structure is identical. A separate
+`*_instructor` companion ships the full hidden truth (causal graph,
+latent registry, mechanism summary, full-horizon relational tables).
+
+## Generation pipeline at a glance
+
+Generation runs in five layers, top to bottom. Every layer is
+deterministic, every layer is seeded from a single root via named
+substreams, and every layer is testable in isolation.
+
+1. **Hidden world structure.** A directed acyclic graph (DAG) of
+   latent traits, lead states, sales-process states, and the
+   `Converted within 90 days` outcome node, sampled from one of five
+   *motif families* and then perturbed by stochastic rewiring. The
+   motif families are intentionally non-uniform: `fit_dominant`,
+   `intent_dominant`, `sales_execution_sensitive`,
+   `demo_trial_mediated`, `buying_committee_friction`. Two
+   independently-sampled bundles share neither the exact graph nor
+   the edge weights, but they share the constraint that the graph is
+   acyclic, every node is reachable from a root, and the outcome
+   node is reachable from every non-root subgraph.
+2. **Mechanism layer.** Every node in the sampled graph receives a
+   concrete mechanism — a logistic latent score, a Poisson intensity
+   for touch counts, a recency-decayed engagement intensity for
+   sessions, a categorical influence for source channel, a stage
+   transition hazard, a conversion hazard, etc. Mechanisms are
+   assigned by motif family, so a `fit_dominant` graph and an
+   `intent_dominant` graph end up with materially different
+   behavior at simulation time. Mechanism parameters are calibrated
+   so each tier hits its target conversion-rate band; the
+   `intermediate` tier is the canonical difficulty profile.
+3. **Population layer.** Accounts (1,500), contacts (4,200), and
+   leads (5,000) are drawn with deterministic foreign keys and
+   ID-stable namespaces (`acct_000001`, `lead_000001`, …). Each
+   entity carries a vector of latent traits seeded from the world
+   graph: account fit, process maturity, contact authority,
+   problem awareness, urgency, etc. Industry, region, employee
+   band, role, and seniority are all drawn from the recipe's
+   narrative spec; firmographic correlations come from
+   motif-family latent biases applied during sampling.
+4. **Simulation engine.** A 90-day discrete-time simulator
+   advances every lead day-by-day from MQL through the funnel
+   (`mql → sal → sql → demo_scheduled → demo_completed →
+   proposal_sent → negotiation → closed_won/closed_lost`). Each
+   day, hazards from the mechanism layer fire: stage transitions,
+   touches (inbound vs outbound, recency-decayed), web sessions
+   (pricing-page views, demo-page views), sales activities,
+   churn, and direct conversion for unusual fast paths. Once a
+   lead reaches `closed_won`, opportunities, customers, and
+   subscriptions materialise with deterministic foreign keys.
+   `converted_within_90_days` is *event-derived*: it is true iff
+   a `closed_won` event occurred within the configured label
+   window, never sampled directly.
+5. **Snapshot rendering.** For every lead, the renderer freezes a
+   feature snapshot at `snapshot_day` (30 days for v1).
+   Aggregates such as `touch_count`, `session_count`,
+   `pricing_page_views`, `expected_acv`, and
+   `days_since_last_touch` only see events on days
+   `[0, snapshot_day]`; the label resolves over the full 90-day
+   horizon. The deliberate exception is `total_touches_all`,
+   which counts the full-horizon touch history and is flagged as
+   a pedagogical leakage trap in the feature dictionary.
+
+## Bundle output
+
+Each bundle writes a fixed directory layout — a manifest, dataset
+card, feature dictionary, relational tables, and the
+`converted_within_90_days` task split. The manifest records the
+recipe, seed, package version, exposure mode, snapshot day, label
+window, schema version, table inventory with row counts, SHA-256
+hashes for every file, and the exact set of redacted columns. Two
+runs with the same `(recipe, seed, version)` produce byte-identical
+bundles modulo the wall-clock `generation_timestamp` field;
+`scripts/verify_hash_determinism.py` enforces this.
+
+The public (`student_public`) bundle and the instructor companion
+share the same generator run; they differ only in *what is
+published*. Filtering happens during rendering, not during
+simulation:
+
+- Public bundles route relational tables through
+  `to_dataframes_snapshot_safe`, which (a) filters event tables
+  per-lead by `lead_created_at + snapshot_day`, (b) drops
+  terminal-state columns from `leads` and `opportunities`, and
+  (c) omits `customers` and `subscriptions` entirely (their
+  presence is conversion-conditional).
+- Instructor companions skip the snapshot-safe writer and ship
+  full-horizon tables plus a `metadata/` directory containing the
+  hidden world graph, latent registry, mechanism summary, and
+  full world spec. They are not appropriate input for the
+  student-facing task.
+
+The exact column lists are pinned by `BANNED_LEAD_COLUMNS`,
+`BANNED_OPP_COLUMNS`, `BANNED_TABLES`, and
+`SNAPSHOT_FILTERED_TABLES` in
+`leadforge/validation/leakage_probes.py`; the validator imports the
+same constants the writer uses, so the contract is single-sourced.
+
+## Calibration and validation
+
+Difficulty calibration is empirical, not analytic: the
+intermediate tier is sampled, the conversion-rate band is checked,
+and the signal-strength multiplier is tuned until five seeds
+(42–46) hit the target band with stable variance. The intro and
+advanced tiers reuse the same mechanism assignments with different
+distortion parameters (Gaussian noise on float features, MCAR
+missingness, outlier injection) calibrated the same way.
+
+Every claim made about realism, calibration, or difficulty is
+backed by `release/validation/validation_report.md`, which is
+regenerated by `scripts/validate_release_candidate.py`. The driver
+runs the full release-quality panel — per-tier ROC-AUC, PR-AUC, log
+loss, Brier, calibration bins, lift, P@K, top-decile rate,
+expected-ACV capture, model-family deltas, cross-seed bands,
+random-vs-cohort split degradation, and the full leakage probe
+taxonomy — and exits non-zero if anything falls outside the bands
+declared in `docs/release/v1_acceptance_gates_bands.yaml`.
+
+## What this is not
+
+- Not a substitute for real CRM data. The vertical, narrative,
+  and motif families are deliberate fictions chosen to teach
+  lead-scoring patterns without exposing real customer data.
+- Not a benchmark. The difficulty tiers are calibrated for
+  pedagogy, not for cross-paper comparability.
+- Not a temporally rich dataset. The simulator runs in
+  daily steps over a 90-day horizon. Sales-cycle distributions
+  are whatever falls out of the daily hazards, not log-normal /
+  Weibull tails. Demographic strings are clean (no
+  free-text-job-title messiness). Both are tracked as post-v1
+  scope in `docs/release/post_v1_roadmap.md`.
+
+## Further reading
+
+For the deeper design rationale — why a DAG, why motif families,
+why event-derived labels, why public-vs-instructor — see
+[`docs/leadforge_design_doc.md`] and
+[`docs/leadforge_architecture_spec.md`]. Both documents are aimed at
+contributors and document the package internals; this doc stays at
+the conceptual level external readers need.
+
+[`docs/leadforge_design_doc.md`]: ../leadforge_design_doc.md
+[`docs/leadforge_architecture_spec.md`]: ../leadforge_architecture_spec.md
diff --git a/release/README.md b/release/README.md
index cb85b61..cb1329c 100644
--- a/release/README.md
+++ b/release/README.md
@@ -1,103 +1,80 @@
-# LeadForge: Synthetic B2B Lead Scoring Dataset
-
-A relational, reproducible, multi-difficulty lead scoring dataset generated by [leadforge](https://github.com/leadforge-dev/leadforge) -- an open-source Python framework for synthetic CRM/funnel data.
-
-## Why this dataset?
-
-Most public lead scoring datasets are flat CSVs with opaque provenance. This one is different:
-
-1. **Relational structure.** 9 normalized tables (accounts, contacts, leads, touches, sessions, sales activities, opportunities, customers, subscriptions) plus ML-ready task splits. Practice feature engineering from raw tables, or grab the flat file and start modeling.
-
-2. **Three difficulty tiers.** Same company, same product, same buyer personas -- different difficulty profiles that produce meaningfully different conversion rates, noise levels, and missingness.
-
-3. **Reproducible and leakage-safe.** Deterministic generation from a fixed seed. SHA-256 hashes for every file in `manifest.json`. The label-encoding `current_stage` column is stripped from the public bundles in the exposure layer. Event-aggregate features (`touch_count`, `session_count`, `pricing_page_views`, ...) are computed over a 30-day window — they cannot encode events that happen *after* day 30, even though the label resolves over a 90-day window. The only leakage-flagged column that ships in `student_public` is the deliberately included pedagogical trap `total_touches_all`, which counts the full 90-day touch history and is marked `is_leakage_trap=True` in the feature dictionary.
+# LeadForge: Synthetic B2B Lead Scoring Dataset (`leadforge-lead-scoring-v1`)
+
+A relational, reproducible, three-tier synthetic CRM dataset family for
+teaching lead scoring at scale. Generated by
+[leadforge](https://github.com/leadforge-dev/leadforge), an
+open-source Python framework for synthetic CRM/funnel data. The
+framework version is decoupled from the dataset version: the package
+stays at `1.x`; the dataset is published under the explicit `…-v1`
+tag.
+
+## Why lead scoring matters in 2024–2026
+
+Mid-market SaaS vendors entered 2024–2026 with growth slowing and
+customer-acquisition costs rising[^macro], so predicting *which* leads
+convert within a fixed window has moved from a marketing nicety to a
+survival skill. This dataset teaches that skill on a relational
+substrate, with the realistic confusions (snapshot-window discipline,
+leakage traps, channel signal weaker than vendor blogs imply) that
+students will hit when they finally get hands on real CRM data.
+
+[^macro]: Macroeconomic framing summarised in
+[`docs/external_review/summaries/gemini_v2_summary.md`](../docs/external_review/summaries/gemini_v2_summary.md)
+(median public-SaaS growth 30%→25% from 2023 to 2025; New CAC Ratio
+rose materially in 2024).
 
 ## What's inside
 
 ```
 release/
-|-- README.md                            # This file
-|-- LICENSE                              # MIT
-|-- intro/                               # Difficulty tier 1
-|   |-- manifest.json                    # Provenance: seed, recipe, version, file hashes
-|   |-- dataset_card.md                  # Human-readable dataset summary
-|   |-- feature_dictionary.csv           # Column descriptions, types, leakage flags
-|   |-- lead_scoring.csv                 # Flat convenience file (all splits + split column)
-|   |-- tables/                          # 9 relational Parquet tables
-|   |   |-- accounts.parquet
-|   |   |-- contacts.parquet
-|   |   |-- leads.parquet
-|   |   |-- touches.parquet
-|   |   |-- sessions.parquet
-|   |   |-- sales_activities.parquet
-|   |   |-- opportunities.parquet
-|   |   |-- customers.parquet
-|   |   |-- subscriptions.parquet
-|   |-- tasks/converted_within_90_days/  # Pre-split ML task
-|       |-- train.parquet                # 70% of leads
-|       |-- valid.parquet                # 15% of leads
-|       |-- test.parquet                 # 15% of leads
-|-- intermediate/                        # Difficulty tier 2 (same structure)
-|-- advanced/                            # Difficulty tier 3 (same structure)
-|-- intermediate_instructor/             # Research companion (adds metadata/)
-|   |-- metadata/                        # Hidden causal structure
-|       |-- graph.json                   # World graph (DAG)
-|       |-- graph.graphml                # World graph (GraphML)
-|       |-- world_spec.json              # Full generation config
-|       |-- latent_registry.json         # Per-entity latent trait values
-|       |-- mechanism_summary.json       # Causal mechanism assignments
-|-- notebooks/
-    |-- 01_baseline_lead_scoring.ipynb   # Baseline modeling walkthrough
+├── intro/ intermediate/ advanced/    # student_public bundles, one per difficulty tier
+│   ├── manifest.json                 # provenance + file hashes
+│   ├── dataset_card.md               # auto-rendered per-bundle card
+│   ├── feature_dictionary.csv        # authoritative column spec
+│   ├── lead_scoring.csv              # flat convenience CSV (all splits)
+│   ├── tables/*.parquet              # 7 snapshot-safe relational tables
+│   └── tasks/converted_within_90_days/{train,valid,test}.parquet
+├── intermediate_instructor/          # research companion: full-horizon tables + metadata/
+├── notebooks/01_baseline_lead_scoring.ipynb
+└── validation/                       # validation_report.{json,md} + figures
 ```
 
-## Quick start
+`student_public` bundles ship the snapshot-safe relational view;
+`research_instructor` companions ship the full-horizon view plus the
+hidden causal structure (DAG, latent registry, mechanism summary)
+under `metadata/`. The full layout is documented in each bundle's
+`manifest.json`.
 
-### Option 1: Flat CSV (simplest)
+## Quick start
 
 ```python
-import pandas as pd
-
+# Flat CSV
 df = pd.read_csv("intermediate/lead_scoring.csv")
-train = df[df["split"] == "train"].drop(columns=["split"])
-test = df[df["split"] == "test"].drop(columns=["split"])
-```
-
-### Option 2: Parquet task splits (recommended)
-
-```python
-import pandas as pd
 
+# Parquet task splits (recommended)
 train = pd.read_parquet("intermediate/tasks/converted_within_90_days/train.parquet")
-test = pd.read_parquet("intermediate/tasks/converted_within_90_days/test.parquet")
-```
-
-**Note:** The label `converted_within_90_days` is evaluated over the full **90 days** from lead creation. Event-aggregate features (`touch_count`, `session_count`, `pricing_page_views`, `expected_acv`, `days_since_last_touch`, ...) observe **only the first 30 days** of that window — so even when a lead converts on day 50, the features are frozen at day 30 and cannot encode the conversion event. The deliberate exception is `total_touches_all`, a leakage trap (flagged `leakage_risk=True` and `is_leakage_trap=True` in `feature_dictionary.csv`) that counts touches over the full 90-day horizon. Exclude it from your feature set unless you're explicitly demonstrating leakage detection. The label-encoding `current_stage` column is *not* present in `student_public` bundles -- it appears only in `intermediate_instructor/`.
+test  = pd.read_parquet("intermediate/tasks/converted_within_90_days/test.parquet")
 
-### Option 3: Relational tables (feature engineering)
-
-```python
-import pandas as pd
-
-accounts = pd.read_parquet("intermediate/tables/accounts.parquet")
-leads = pd.read_parquet("intermediate/tables/leads.parquet")
+# Relational tables (feature engineering — example)
+leads   = pd.read_parquet("intermediate/tables/leads.parquet")
 touches = pd.read_parquet("intermediate/tables/touches.parquet")
-
-# Engineer your own features from raw event tables
-touch_counts = touches.groupby("lead_id").size().rename("my_touch_count")
-features = leads.merge(accounts, on="account_id").merge(touch_counts, on="lead_id", how="left")
+my_touch_count = (
+    touches.groupby("lead_id").size().rename("my_touch_count").reset_index()
+)
+features = leads.merge(my_touch_count, on="lead_id", how="left")
+
+# Reproduce from source
+# pip install leadforge
+# leadforge generate --recipe b2b_saas_procurement_v1 --seed 42 \
+#                    --mode student_public --difficulty intermediate --out my_bundle
 ```
 
-### Option 4: Reproduce from source
-
-```bash
-pip install leadforge
-leadforge generate \
-  --recipe b2b_saas_procurement_v1 \
-  --seed 42 \
-  --mode student_public \
-  --difficulty intermediate \
-  --out my_bundle
-```
+The label `converted_within_90_days` resolves over a 90-day window;
+engagement features (`touch_count`, `session_count`, etc.) are
+computed strictly over events on days `[0, 30]`. The deliberate
+exception is `total_touches_all`, the leakage trap — flagged
+`leakage_risk=True` in `feature_dictionary.csv`. Drop it from your
+feature set unless you're demonstrating leakage detection.
 
 ## Dataset summary
 
@@ -106,66 +83,150 @@ leadforge generate \
 | Leads | 5,000 | 5,000 | 5,000 |
 | Accounts | 1,500 | 1,500 | 1,500 |
 | Contacts | 4,200 | 4,200 | 4,200 |
-| Columns | 32 (student_public) / 34 (instructor) | 32 / 34 | 32 / 34 |
+| Snapshot columns | 32 / 34* | 32 / 34* | 32 / 34* |
 | Target | `converted_within_90_days` | `converted_within_90_days` | `converted_within_90_days` |
-| Conversion rate (target) | 30-45% | 18-28% | 8-15% |
-| Conversion rate (observed) | 41.5% | 20.1% | 7.9% |
+| Conversion rate (recipe band) | 24–61% | 12–31% | 4–12% |
+| Conversion rate (median, seeds 42–46) | 42.67% | 21.60% | 8.40% |
 | Signal strength | 0.90 | 0.70 | 0.50 |
 | Noise scale | 0.10 | 0.30 | 0.55 |
 | Missing rate | 2% | 8% | 18% |
 
-Higher difficulty means weaker signal, more noise, more missingness, and lower base conversion rate -- all modulated in the simulation engine. Target ranges are defined in `difficulty_profiles.yaml`.
+\* `student_public` / `research_instructor`. Difficulty is modulated
+by the simulation engine — signal strength on latent-trait weights,
+Gaussian noise on float features, MCAR missingness, outlier rate —
+not post-hoc label flipping.
 
 ## The scenario
 
-**Veridian Technologies** is a Series B startup (Austin, US) selling **Veridian Procure**, a cloud-based procurement and AP automation platform, to mid-market firms (200-2,000 employees) in the US and UK.
-
-The sales funnel runs through inbound marketing (45%), SDR outbound (35%), and partner referrals (20%). Four buyer personas drive deals: VP Finance (economic buyer), AP Manager (champion), IT Director (technical evaluator), and Procurement Manager (end user).
-
-**Task:** predict whether a lead will convert (closed-won) within 90 days of entering the funnel.
-
-## Feature dictionary
-
-Each bundle contains a `dataset_card.md` and a `feature_dictionary.csv` with the authoritative, auto-generated column list, descriptions, dtypes, and `leakage_risk` flags. Refer to those rather than mirroring counts here, which would drift.
-
-**Leakage handling (bundle schema v4)**
-
-Two separate mechanisms keep the published feature set leakage-safe:
-
-1. **Windowed snapshot.** Every event-aggregate feature is computed over a 30-day window (`manifest.snapshot_day == 30`); the label resolves over the full 90 days (`manifest.label_window_days == 90`). Features cannot see touches, sessions, or opportunities that occurred after day 30. The only feature that intentionally crosses this line is `total_touches_all`, the pedagogical trap.
-2. **Column redaction.** A small set of columns that *would* encode the label structurally (`current_stage`, `is_sql`) are stripped from `student_public` bundles entirely — both from `tasks/` splits and from `tables/leads.parquet`, so feature engineering off the relational tables cannot recover them.
-
-| Column | Status in `student_public` | Status in `intermediate_instructor` | Why |
-|---|---|---|---|
-| `current_stage` | redacted (gone from task splits and `tables/leads.parquet`) | retained | At day 90 this contains terminal stages (`closed_won`/`closed_lost`) that encode the label directly. |
-| `is_sql` | redacted | retained | `is_sql=False` predicts non-conversion with very high probability — measured across 5 seeds, P(conv \| is_sql=False) = 0.061 ± 0.026 (intro) / 0.020 ± 0.010 (intermediate) / 0.011 ± 0.004 (advanced). |
-| `is_mql` | removed entirely (no mode has it) | removed entirely | Every lead is initialised at MQL stage in the simulator, so the field was constant `True` and carried no information. |
-| `total_touches_all` | retained | retained | Deliberate pedagogical leakage trap. Counts touches over the full 90-day horizon while every other touch feature stops at day 30, so the gap (`total_touches_all - touch_count`) carries real signal. Flagged `leakage_risk=True` in `feature_dictionary.csv`. Train with and without it, compare AUC, explain the gap. |
-
-The `redacted_columns` and `snapshot_day` fields in each bundle's `manifest.json` record exactly what was stripped and at what window features were computed.
-
-## Research companion
-
-The `intermediate_instructor/` bundle includes the full hidden causal structure:
-
-- **World graph:** The DAG of causal relationships driving lead outcomes
-- **Latent registry:** Per-entity latent trait values (account fit, contact authority, engagement propensity)
-- **Mechanism summary:** How each node in the graph maps to simulation behavior
-
-This enables research on causal inference, model interpretability, and DGP-aware evaluation.
-
-## Provenance
+**Veridian Technologies** is a fictional Series B startup (Austin, US)
+selling **Veridian Procure**, a procurement / AP automation SaaS, to
+mid-market firms (200–2,000 employees) in the US and UK. The funnel
+runs through inbound marketing (45%), SDR outbound (35%), and
+partner referrals (20%); four personas drive deals (VP Finance, AP
+Manager, IT Director, Procurement Manager). **Task:** predict whether
+a lead converts (`closed_won`) within 90 days. ACV bands are
+$18k–$120k. See
+[`docs/release/generation_method.md`](../docs/release/generation_method.md)
+for the full DGP, and the deeper "what's modelled / approximate / not
+modelled" breakdown that this README only summarises.
+
+## Public vs instructor: what's redacted
+
+Filtering happens **during rendering**, not during simulation. The
+redaction contract is single-sourced in
+[`leadforge/validation/leakage_probes.py`](../leadforge/validation/leakage_probes.py);
+the snapshot-safe writer and the validator import the same constants,
+so they cannot drift apart.
+
+| Source-of-truth constant | Public bundle treatment |
+|---|---|
+| `BANNED_LEAD_COLUMNS = ("converted_within_90_days", "conversion_timestamp")` | Dropped from `tables/leads.parquet` |
+| `BANNED_OPP_COLUMNS = ("close_outcome", "closed_at")` | Dropped from `tables/opportunities.parquet` |
+| `BANNED_TABLES = ("customers", "subscriptions")` | Omitted from public bundles |
+| `SNAPSHOT_FILTERED_TABLES` (touches, sessions, sales_activities, opportunities) | Filtered per-lead by `lead_created_at + snapshot_day` |
+| Snapshot redaction (`current_stage`, `is_sql`) | Stripped from `tasks/` splits and `tables/leads.parquet` |
+| `total_touches_all` (deliberate trap) | **Retained in both modes**; flagged `leakage_risk=True` |
+
+Each bundle's `manifest.json` records `relational_snapshot_safe`,
+`redacted_columns`, and `snapshot_day`, so the bundle is
+self-describing.
+
+## Calibration
+
+Every realism / calibration / difficulty claim in this README is
+backed by
+[`validation/validation_report.md`](validation/validation_report.md),
+regenerated by
+[`scripts/validate_release_candidate.py`](../scripts/validate_release_candidate.py)
+with bands declared in
+[`docs/release/v1_acceptance_gates_bands.yaml`](../docs/release/v1_acceptance_gates_bands.yaml).
+Headline cross-seed medians (seeds 42–46):
+
+| Tier | LR AUC | AP | P@100 | Brier |
+|---|---|---|---|---|
+| intro | 0.879 | 0.761 | 0.80 | 0.130 |
+| intermediate | 0.886 | 0.575 | 0.59 | 0.110 |
+| advanced | 0.886 | 0.351 | 0.34 | 0.061 |
+
+AP, P@100, conversion-rate, and lift orderings hold across the
+intended difficulty axis (intro > intermediate > advanced).
+
+## Intended uses
+
+- Teaching baseline lead-scoring on a flat snapshot.
+- Teaching relational feature engineering against snapshot-safe tables.
+- Teaching leakage detection (the `total_touches_all` trap is
+  designed to be discoverable).
+- Teaching calibration, lift, P@K, value-aware ranking
+  (`expected_acv × P(convert)`), and cohort-shift evaluation.
+- Comparing model families under a controlled DGP.
+
+## Out-of-scope uses
+
+- **Production lead scoring.** The company, product, and customers are
+  fictional.
+- **Vendor benchmarking / paper baselines.** Difficulty tiers are
+  calibrated for pedagogy, not cross-paper comparability.
+- **Causal-inference research that requires recovery of the true DGP.**
+  The instructor companion exposes the hidden graph for teaching, not
+  designed counterfactuals.
+- **Demographic / fairness research.** v1 does not model protected
+  attributes.
+
+## Known limitations
+
+- **Difficulty signal on raw AUC is flat.** LR AUC is ~0.88 across
+  every tier. Difficulty is visible in AP, P@K, Brier, and value
+  capture. Treat AUC as a sanity check, not a difficulty signal.
+- **GBM does not consistently beat LR (gate G7.4.4).** GBM−LR AUC delta
+  is slightly negative in every tier (intro −0.0045, intermediate
+  −0.0072, advanced −0.0133); v1's snapshot is dominated by linear
+  features. v2 will inject non-linear interactions in the simulator.
+- **Channel signal is weak.** Per
+  [`docs/release/channel_signal_audit.md`](../docs/release/channel_signal_audit.md),
+  out-of-sample univariate AUC of `lead_source` is ≈0.50–0.52 across
+  all tiers and the per-channel rate spread is ≤0.05. The simulator
+  does not encode channel-conditional probabilities; channel-conditional
+  encoding is post-v1 work.
+- **Cohort-shift degradation is small.** v1 has no time-of-year drift
+  baked in; the cohort-shift gate (G6.4) is informational and will
+  bite in v2.
+
+## Composition
+
+- **Entities.** Accounts, contacts, leads, touches, sessions,
+  sales_activities, opportunities (public); plus customers and
+  subscriptions (instructor only). Per-row counts per bundle live in
+  `manifest.json`.
+- **Features.** 32 public columns grouped by analytical role in
+  [`docs/release/feature_dictionary.md`](../docs/release/feature_dictionary.md);
+  the per-bundle `feature_dictionary.csv` is the authoritative
+  machine-readable spec.
+- **Label.** `converted_within_90_days` (boolean), event-derived from
+  the simulator. Never sampled directly.
+- **Splits.** 70/15/15 train/valid/test, deterministic given seed;
+  recorded in `tasks/converted_within_90_days/task_manifest.json`.
+- **Provenance.** Recipe `b2b_saas_procurement_v1`, seed 42, package
+  version stamped in `manifest.json`.
+
+## Maintenance, adversarial framing, license
+
+We *want* the dataset to be broken. Issue templates ship under
+`.github/ISSUE_TEMPLATE/` (Phase 6); the break-me guide lands as
+`docs/release/break_me_guide.md` (PR 6.3). Once Phase 6 ships,
+`docs/release/v2_decision_log.md` will track every accepted finding
+and the design call that came from it. File issues at
+[leadforge-dev/leadforge](https://github.com/leadforge-dev/leadforge);
+PRs welcome.
 
 | Field | Value |
 |---|---|
-| Generator | [leadforge](https://github.com/leadforge-dev/leadforge) v1.0.0 |
+| Generator | leadforge `1.0.0+` |
 | Recipe | `b2b_saas_procurement_v1` |
-| Seed | 42 |
-| Format | Parquet + CSV |
-| License | MIT |
-
-Every bundle includes a `manifest.json` with the exact package version, recipe, seed, generation timestamp, and SHA-256 hashes for all data files. To verify integrity or regenerate, install leadforge and run the generation command above.
-
-## License
+| Canonical seed | 42 (cross-seed sweep: 42–46) |
+| Bundle schema version | 5 |
+| Format | Parquet (canonical) + CSV (convenience) |
+| License | MIT — see [LICENSE](LICENSE) |
 
-MIT. See [LICENSE](LICENSE).
+Verify integrity with `leadforge validate <bundle_dir>`; every file
+is hashed in `manifest.json`.
diff --git a/scripts/audit_channel_signal.py b/scripts/audit_channel_signal.py
new file mode 100644
index 0000000..12a50f4
--- /dev/null
+++ b/scripts/audit_channel_signal.py
@@ -0,0 +1,628 @@
+#!/usr/bin/env python3
+"""Audit how strongly the lead-source channel signals conversion.
+
+Companion analysis for PR 4.1 (recommendation #8 v1 scope from
+``docs/external_review/summaries/recommendations_pass.md``).  For every
+tier in a release bundle family we compute, separately for ``lead_source``
+and ``first_touch_channel``:
+
+* per-channel conversion rate, share, and counts on the **train** split
+* the **in-sample** univariate AUC: per-channel rates derived on train
+  and scored against train labels (a 1-D Bayes classifier; biased upward
+  for small categorical alphabets)
+* the **out-of-sample** univariate AUC: per-channel rates derived on
+  train and scored against **test** labels — directly comparable to the
+  ``source_only`` baselines in ``release/validation/validation_report.json``
+
+The script does not assign a categorical "weak / moderate / strong"
+verdict.  Industry MQL→SQL benchmarks are surfaced for context only;
+they measure a different funnel transition (single MQL→SQL step, not
+the 90-day closed-won label v1 reports), so a hard comparison would be
+a category error.  The audit doc states the v1 numbers and an explicit
+caveat; readers draw the comparison.
+
+Outputs (defaults are pinned via the v1 acceptance gates):
+
+* ``docs/release/channel_signal_audit.md`` — human-readable audit
+* ``docs/release/channel_signal_audit.json`` — machine-readable sibling
+
+The script is deterministic given a fixed bundle: it reads
+``train.parquet`` and ``test.parquet`` only, derives empirical rates,
+and uses ``sklearn.metrics.roc_auc_score`` with no fit-time randomness.
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from collections.abc import Sequence
+from dataclasses import asdict, dataclass
+from pathlib import Path
+from typing import Any, Final
+
+import pandas as pd
+from sklearn.metrics import roc_auc_score
+
+# ---------------------------------------------------------------------------
+# Constants
+# ---------------------------------------------------------------------------
+
+CHANNEL_COLUMNS: Final[tuple[str, ...]] = ("lead_source", "first_touch_channel")
+LABEL_COLUMN: Final[str] = "converted_within_90_days"
+DEFAULT_TIERS: Final[tuple[str, ...]] = ("intro", "intermediate", "advanced")
+DEFAULT_TASK: Final[str] = "converted_within_90_days"
+
+#: G2 industry MQL→SQL conversion rates surfaced in
+#: ``docs/external_review/summaries/gemini_v2_summary.md`` (recommendation #8).
+#: They measure a single MQL→SQL transition, NOT v1's 90-day closed-won
+#: label.  Stored as a tuple of pairs so the dataclass field is genuinely
+#: immutable; converted to a plain dict at JSON-render time.
+INDUSTRY_MQL_TO_SQL_BENCHMARKS: Final[tuple[tuple[str, float], ...]] = (
+    ("Email", 0.005),
+    ("PPC", 0.26),
+    ("SEO", 0.51),
+)
+
+DEFAULT_RELEASE_DIR: Final[Path] = Path("release")
+DEFAULT_OUT_MD: Final[Path] = Path("docs/release/channel_signal_audit.md")
+DEFAULT_OUT_JSON: Final[Path] = Path("docs/release/channel_signal_audit.json")
+
+
+# ---------------------------------------------------------------------------
+# Result dataclasses
+# ---------------------------------------------------------------------------
+
+
+@dataclass(frozen=True)
+class ChannelStats:
+    """Per-channel rollup for one channel column on the train split."""
+
+    name: str
+    n: int
+    share: float
+    n_converted: int
+    conversion_rate: float
+
+
+@dataclass(frozen=True)
+class ChannelAudit:
+    """Audit results for one channel column in one tier.
+
+    Per-channel statistics come from the train split.
+    ``univariate_auc_in_sample`` re-uses train labels (bias-prone but
+    matches the historical 1-D Bayes-classifier interpretation);
+    ``univariate_auc_out_of_sample`` scores the train-derived rates
+    against the held-out test split.
+    """
+
+    column: str
+    n_train: int
+    n_test: int
+    train_conversion_rate: float
+    test_conversion_rate: float
+    channels: tuple[ChannelStats, ...]
+    rate_spread: float
+    univariate_auc_in_sample: float
+    univariate_auc_out_of_sample: float
+
+
+@dataclass(frozen=True)
+class ChannelGroup:
+    """One or more channel columns with byte-identical audit values.
+
+    v1's ``lead_source`` and ``first_touch_channel`` produce identical
+    numbers in every tier — this dataclass lets the markdown renderer
+    collapse them into one section without losing information.
+    """
+
+    columns: tuple[str, ...]
+    audit: ChannelAudit
+
+
+@dataclass(frozen=True)
+class TierAudit:
+    """Audit results for one tier across every channel column."""
+
+    tier: str
+    n_train: int
+    n_test: int
+    train_conversion_rate: float
+    test_conversion_rate: float
+    columns: tuple[ChannelAudit, ...]
+
+
+@dataclass(frozen=True)
+class AuditReport:
+    """Full audit: every requested tier × channel column."""
+
+    release_dir: str
+    task: str
+    label_column: str
+    channel_columns: tuple[str, ...]
+    tiers: tuple[TierAudit, ...]
+    industry_mql_to_sql_benchmarks: tuple[tuple[str, float], ...]
+
+
+# ---------------------------------------------------------------------------
+# Pure functions
+# ---------------------------------------------------------------------------
+
+
+def _label_to_int(series: pd.Series) -> pd.Series:
+    """Coerce a label column to ``int``.
+
+    Handles three dtypes the v1 bundles actually carry: numpy ``bool``,
+    pandas nullable ``BooleanDtype`` (used by the parquet schema), and
+    plain numeric.  Other dtypes raise via ``pd.to_numeric``.
+    """
+
+    if pd.api.types.is_bool_dtype(series):
+        return series.astype("Int64").astype(int)
+    return pd.to_numeric(series, errors="raise").astype(int)
+
+
+def _conversion_rate(df: pd.DataFrame, label_col: str) -> float:
+    if len(df) == 0:
+        return 0.0
+    return float(int(_label_to_int(df[label_col]).sum()) / len(df))
+
+
+def _auc_or_chance(y: pd.Series, scores: pd.Series) -> float:
+    """ROC AUC, falling back to ``0.5`` when undefined (single class)."""
+
+    if y.nunique() < 2:
+        return 0.5
+    return float(roc_auc_score(y.to_numpy(), scores.to_numpy()))
+
+
+def audit_channel(
+    train: pd.DataFrame,
+    channel_col: str,
+    *,
+    test: pd.DataFrame,
+    label_col: str = LABEL_COLUMN,
+) -> ChannelAudit:
+    """Per-channel stats and univariate AUCs (in-sample + OOS).
+
+    Both AUCs use the same scoring function: the per-channel positive
+    rate derived from the train split.  The "in-sample" AUC scores
+    that against train labels (biased upward by construction); the
+    "out-of-sample" AUC scores it against held-out test labels and
+    is directly comparable to the ``source_only`` baselines in
+    ``release/validation/validation_report.json``.
+    """
+
+    for df_name, df in (("train", train), ("test", test)):
+        if channel_col not in df.columns:
+            raise KeyError(f"channel column {channel_col!r} not present in {df_name}")
+        if label_col not in df.columns:
+            raise KeyError(f"label column {label_col!r} not present in {df_name}")
+
+    y_train = _label_to_int(train[label_col])
+    n_train = len(train)
+    n_test = len(test)
+    train_rate = float(int(y_train.sum()) / n_train) if n_train else 0.0
+    test_rate = _conversion_rate(test, label_col)
+
+    grouped = train.assign(_y=y_train).groupby(channel_col, dropna=False)
+    rows: list[ChannelStats] = []
+    for name, sub in sorted(grouped, key=lambda kv: str(kv[0])):
+        n = len(sub)
+        n_conv = int(sub["_y"].sum())
+        rows.append(
+            ChannelStats(
+                name=str(name),
+                n=n,
+                share=float(n / n_train) if n_train else 0.0,
+                n_converted=n_conv,
+                conversion_rate=float(n_conv / n) if n else 0.0,
+            )
+        )
+
+    rate_spread = (
+        max(c.conversion_rate for c in rows) - min(c.conversion_rate for c in rows) if rows else 0.0
+    )
+
+    if len(rows) < 2:
+        in_sample_auc = 0.5
+        oos_auc = 0.5
+    else:
+        rate_lookup = {c.name: c.conversion_rate for c in rows}
+        train_scores = train[channel_col].astype(str).map(rate_lookup).astype(float)
+        in_sample_auc = _auc_or_chance(y_train, train_scores)
+
+        # Test-set channels are scored using the train-derived rates;
+        # any channel value unseen on train falls back to the train
+        # base rate so the AUC stays well-defined.
+        test_scores = (
+            test[channel_col].astype(str).map(rate_lookup).fillna(train_rate).astype(float)
+        )
+        y_test = _label_to_int(test[label_col])
+        oos_auc = _auc_or_chance(y_test, test_scores)
+
+    return ChannelAudit(
+        column=channel_col,
+        n_train=n_train,
+        n_test=n_test,
+        train_conversion_rate=train_rate,
+        test_conversion_rate=test_rate,
+        channels=tuple(rows),
+        rate_spread=float(rate_spread),
+        univariate_auc_in_sample=in_sample_auc,
+        univariate_auc_out_of_sample=oos_auc,
+    )
+
+
+def audit_tier(
+    train: pd.DataFrame,
+    tier: str,
+    *,
+    test: pd.DataFrame,
+    channel_columns: Sequence[str] = CHANNEL_COLUMNS,
+    label_col: str = LABEL_COLUMN,
+) -> TierAudit:
+    """Run :func:`audit_channel` for every channel column on one tier."""
+
+    train_rate = _conversion_rate(train, label_col)
+    test_rate = _conversion_rate(test, label_col)
+    columns = tuple(
+        audit_channel(train, col, test=test, label_col=label_col) for col in channel_columns
+    )
+    return TierAudit(
+        tier=tier,
+        n_train=len(train),
+        n_test=len(test),
+        train_conversion_rate=train_rate,
+        test_conversion_rate=test_rate,
+        columns=columns,
+    )
+
+
+def load_split(release_dir: Path, tier: str, split: str, task: str = DEFAULT_TASK) -> pd.DataFrame:
+    """Load ``release_dir/<tier>/tasks/<task>/<split>.parquet``."""
+
+    path = release_dir / tier / "tasks" / task / f"{split}.parquet"
+    if not path.exists():
+        raise FileNotFoundError(f"missing {split} split for tier {tier!r}: {path}")
+    return pd.read_parquet(path)
+
+
+def build_report(
+    release_dir: Path,
+    tiers: Sequence[str] = DEFAULT_TIERS,
+    *,
+    task: str = DEFAULT_TASK,
+    channel_columns: Sequence[str] = CHANNEL_COLUMNS,
+    label_col: str = LABEL_COLUMN,
+) -> AuditReport:
+    """Run the audit across every requested tier."""
+
+    tier_audits: list[TierAudit] = []
+    for tier in tiers:
+        train = load_split(release_dir, tier, "train", task=task)
+        test = load_split(release_dir, tier, "test", task=task)
+        tier_audits.append(
+            audit_tier(
+                train,
+                tier=tier,
+                test=test,
+                channel_columns=channel_columns,
+                label_col=label_col,
+            )
+        )
+
+    return AuditReport(
+        release_dir=str(release_dir),
+        task=task,
+        label_column=label_col,
+        channel_columns=tuple(channel_columns),
+        tiers=tuple(tier_audits),
+        industry_mql_to_sql_benchmarks=INDUSTRY_MQL_TO_SQL_BENCHMARKS,
+    )
+
+
+# ---------------------------------------------------------------------------
+# Rendering
+# ---------------------------------------------------------------------------
+
+
+def report_to_dict(report: AuditReport) -> dict[str, Any]:
+    """Convert the report to a JSON-primitive dict.
+
+    The dataclass stores ``industry_mql_to_sql_benchmarks`` as a tuple
+    of pairs (immutability); this helper converts it back into a
+    ``{name: rate}`` mapping for the JSON output, where a dict shape
+    is more ergonomic for downstream tooling.
+    """
+
+    payload = asdict(report)
+    payload["industry_mql_to_sql_benchmarks"] = dict(report.industry_mql_to_sql_benchmarks)
+    return payload
+
+
+def render_json(report: AuditReport) -> str:
+    """Render the audit report as a deterministic JSON string."""
+
+    return json.dumps(report_to_dict(report), indent=2, sort_keys=True) + "\n"
+
+
+def _format_pct(x: float) -> str:
+    return f"{x * 100:.2f}%"
+
+
+def _audit_signature(audit: ChannelAudit) -> tuple[Any, ...]:
+    """Hashable signature used to group columns whose audits are identical."""
+
+    return (
+        audit.n_train,
+        audit.n_test,
+        audit.train_conversion_rate,
+        audit.test_conversion_rate,
+        tuple(_stats_signature(c) for c in audit.channels),
+        audit.rate_spread,
+        audit.univariate_auc_in_sample,
+        audit.univariate_auc_out_of_sample,
+    )
+
+
+def _stats_signature(stats: ChannelStats) -> tuple[Any, ...]:
+    """Hashable tuple representing one ``ChannelStats``."""
+
+    return (stats.name, stats.n, stats.share, stats.n_converted, stats.conversion_rate)
+
+
+def _group_identical_columns(audits: Sequence[ChannelAudit]) -> list[ChannelGroup]:
+    """Collapse columns whose audit values are byte-identical."""
+
+    groups: list[ChannelGroup] = []
+    seen_signatures: dict[tuple[Any, ...], int] = {}
+    for audit in audits:
+        sig = _audit_signature(audit)
+        if sig in seen_signatures:
+            idx = seen_signatures[sig]
+            existing = groups[idx]
+            groups[idx] = ChannelGroup(
+                columns=existing.columns + (audit.column,),
+                audit=existing.audit,
+            )
+        else:
+            seen_signatures[sig] = len(groups)
+            groups.append(ChannelGroup(columns=(audit.column,), audit=audit))
+    return groups
+
+
+def render_markdown(
+    report: AuditReport,
+    *,
+    md_path: Path | None = None,
+    json_path: Path | None = None,
+) -> str:
+    """Render the audit report as Markdown.
+
+    The inline "see also" link to the machine-readable sibling adapts
+    to the actual output paths: when ``md_path`` and ``json_path`` are
+    given, the link is the JSON path expressed *relative to the
+    markdown file's directory* so it works whether the artifacts are
+    written to the canonical ``docs/release/`` location, a tmp
+    directory, or anywhere a CI script overrides.  When neither is
+    given, the link is the canonical ``channel_signal_audit.json``
+    filename.
+    """
+
+    if md_path is not None and json_path is not None:
+        try:
+            json_link = str(Path(json_path).relative_to(Path(md_path).parent))
+        except ValueError:
+            # Different drive roots — keep the markdown readable by
+            # falling back to the caller's path verbatim.
+            json_link = str(json_path)
+    else:
+        json_link = DEFAULT_OUT_JSON.name
+
+    lines: list[str] = []
+    lines.append("# Channel-signal audit — leadforge-lead-scoring-v1")
+    lines.append("")
+    lines.append(
+        "Audit produced by `scripts/audit_channel_signal.py`; see "
+        f"`{json_link}` for the machine-readable form."
+    )
+    lines.append("")
+    lines.append(
+        "**Scope.** For every tier we compute per-channel conversion rates on the train "
+        "split and the univariate AUC of channel against `converted_within_90_days`, "
+        "scored as the empirical positive rate per channel (a 1-D Bayes classifier). Two "
+        "AUCs are reported: an **in-sample** number (train rates → train labels — biased "
+        "upward by construction) and an **out-of-sample** number (train rates → test labels "
+        "— directly comparable to the `source_only` baselines in "
+        "`release/validation/validation_report.json`)."
+    )
+    lines.append("")
+    lines.append(
+        "**Caveat on the industry benchmark.** The G2 / Gemini v2 numbers below are "
+        "single-step **MQL→SQL** rates (recommendation #8 in "
+        "`docs/external_review/summaries/recommendations_pass.md`). v1's label is "
+        "**90-day closed-won**, the entire funnel resolved. The two metrics are not "
+        "directly comparable; the table is reproduced for context only."
+    )
+    lines.append("")
+
+    lines.append("## Industry benchmark (context, not target)")
+    lines.append("")
+    lines.append("| Channel | MQL→SQL conversion rate |")
+    lines.append("|---|---|")
+    for name, rate in report.industry_mql_to_sql_benchmarks:
+        lines.append(f"| {name} | {_format_pct(rate)} |")
+    lines.append("")
+
+    for tier in report.tiers:
+        lines.append(f"## Tier: `{tier.tier}`")
+        lines.append("")
+        lines.append(
+            f"`n_train = {tier.n_train}` (90-day conversion rate "
+            f"{_format_pct(tier.train_conversion_rate)}); "
+            f"`n_test = {tier.n_test}` (rate "
+            f"{_format_pct(tier.test_conversion_rate)})."
+        )
+        lines.append("")
+
+        groups = _group_identical_columns(tier.columns)
+        for group in groups:
+            cols_label = ", ".join(f"`{c}`" for c in group.columns)
+            if len(group.columns) > 1:
+                heading = f"### Columns: {cols_label} (audit values identical)"
+            else:
+                heading = f"### Column: {cols_label}"
+            lines.append(heading)
+            lines.append("")
+            lines.append(
+                f"Per-channel rate spread (max − min): **{group.audit.rate_spread:.4f}**  ·  "
+                f"In-sample univariate AUC: **{group.audit.univariate_auc_in_sample:.4f}**  ·  "
+                f"Out-of-sample univariate AUC: **{group.audit.univariate_auc_out_of_sample:.4f}**"
+            )
+            lines.append("")
+            lines.append("| Channel | n (train) | Share (train) | Converted (train) | Train rate |")
+            lines.append("|---|---:|---:|---:|---:|")
+            for ch in group.audit.channels:
+                lines.append(
+                    f"| `{ch.name}` | {ch.n} | {_format_pct(ch.share)} | "
+                    f"{ch.n_converted} | {_format_pct(ch.conversion_rate)} |"
+                )
+            lines.append("")
+
+    lines.append("## Discussion")
+    lines.append("")
+    lines.append(
+        "The numbers above answer one question: *how strongly does channel alone signal "
+        "90-day conversion in v1?* They do not answer *whether v1 matches industry channel "
+        "performance*, since the benchmarks measure a different funnel transition (single "
+        "MQL→SQL step) and v1 measures the entire funnel resolved over 90 days. Treat the "
+        "v1 numbers as an internal description of the simulator's channel signal."
+    )
+    lines.append("")
+    lines.append("Two empirical observations a reader can make from the numbers above:")
+    lines.append("")
+    lines.append(
+        "1. **The out-of-sample univariate AUC is the comparable number** for any "
+        "external baseline. It uses train-derived rates scored against held-out test "
+        "labels — the same shape as the `source_only` HistGBM baseline reported in "
+        "`release/validation/validation_report.json`, which is built on the same task "
+        "splits with `lead_source` + `first_touch_channel` as the only features. The "
+        "in-sample number is biased upward by construction — small at v1's N but "
+        "visible — and is reported here for transparency rather than comparison."
+    )
+    lines.append(
+        "2. **The numerical conclusion is bundle-specific.** When the per-channel rate "
+        "spread is small and the OOS univariate AUC is close to chance, channel alone "
+        "is a weak feature for the bundle this audit was run against. v1's bundles "
+        "currently produce that outcome (see the per-tier sections above) — consistent "
+        "with the design: the simulator drives conversion through motif-family hazards "
+        "keyed off latent traits, not channel-conditional probabilities. "
+        "Channel-conditional encoding is tracked as post-v1 work in "
+        "`docs/release/post_v1_roadmap.md`."
+    )
+    lines.append("")
+    return "\n".join(lines)
+
+
+# ---------------------------------------------------------------------------
+# CLI
+# ---------------------------------------------------------------------------
+
+
+def _parse_args(argv: Sequence[str] | None) -> argparse.Namespace:
+    parser = argparse.ArgumentParser(
+        description="Audit how strongly source channel signals conversion in a release "
+        "bundle family.",
+    )
+    parser.add_argument(
+        "--release-dir",
+        type=Path,
+        default=DEFAULT_RELEASE_DIR,
+        help="release bundle root containing one subdirectory per tier (default: %(default)s)",
+    )
+    parser.add_argument(
+        "--tier",
+        action="append",
+        dest="tiers",
+        default=None,
+        help="limit the audit to one tier (repeatable; default: intro/intermediate/advanced)",
+    )
+    parser.add_argument(
+        "--task",
+        default=DEFAULT_TASK,
+        help="task subdirectory under each tier (default: %(default)s)",
+    )
+    parser.add_argument(
+        "--channel-column",
+        action="append",
+        dest="channel_columns",
+        default=None,
+        help="channel column to audit (repeatable; default: lead_source + first_touch_channel)",
+    )
+    parser.add_argument(
+        "--out-md",
+        type=Path,
+        default=DEFAULT_OUT_MD,
+        help="markdown output path (default: %(default)s)",
+    )
+    parser.add_argument(
+        "--out-json",
+        type=Path,
+        default=DEFAULT_OUT_JSON,
+        help="JSON output path (default: %(default)s)",
+    )
+    parser.add_argument(
+        "--print",
+        action="store_true",
+        help="print the markdown report to stdout in addition to writing it",
+    )
+    return parser.parse_args(argv)
+
+
+def main(argv: Sequence[str] | None = None) -> int:
+    args = _parse_args(argv)
+    release_dir: Path = args.release_dir
+    tiers: tuple[str, ...] = tuple(args.tiers) if args.tiers else DEFAULT_TIERS
+    channel_columns: tuple[str, ...] = (
+        tuple(args.channel_columns) if args.channel_columns else CHANNEL_COLUMNS
+    )
+
+    if not release_dir.exists():
+        print(f"error: release directory not found: {release_dir}", file=sys.stderr)
+        return 2
+
+    try:
+        report = build_report(
+            release_dir,
+            tiers,
+            task=args.task,
+            channel_columns=channel_columns,
+        )
+    except FileNotFoundError as exc:
+        print(f"error: {exc}", file=sys.stderr)
+        return 2
+    except KeyError as exc:
+        print(f"error: required column missing: {exc}", file=sys.stderr)
+        return 2
+
+    md = render_markdown(report, md_path=args.out_md, json_path=args.out_json)
+    js = render_json(report)
+
+    args.out_md.parent.mkdir(parents=True, exist_ok=True)
+    args.out_json.parent.mkdir(parents=True, exist_ok=True)
+    # Pin UTF-8 explicitly so the audit output is byte-identical across
+    # operating systems and locale configurations.
+    args.out_md.write_text(md, encoding="utf-8")
+    args.out_json.write_text(js, encoding="utf-8")
+
+    if args.print:
+        sys.stdout.write(md)
+
+    print(f"wrote {args.out_md}", file=sys.stderr)
+    print(f"wrote {args.out_json}", file=sys.stderr)
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/tests/scripts/test_audit_channel_signal.py b/tests/scripts/test_audit_channel_signal.py
new file mode 100644
index 0000000..aad8ea9
--- /dev/null
+++ b/tests/scripts/test_audit_channel_signal.py
@@ -0,0 +1,360 @@
+"""Tests for ``scripts/audit_channel_signal.py``.
+
+Exercises the per-channel rollup, in-sample / out-of-sample univariate
+AUC scorers, the JSON + markdown rendering paths, and two integrity
+properties against the committed ``release/`` bundles:
+
+1. ``lead_source`` and ``first_touch_channel`` carry identical values in
+   every tier (the feature dictionary's claim).
+2. The committed ``docs/release/channel_signal_audit.{md,json}`` are
+   byte-identical to a fresh run of the audit script.
+
+Both properties fail loudly if the bundles are regenerated without
+re-running the audit, or if the simulator ever diverges the two
+channel columns.
+"""
+
+from __future__ import annotations
+
+import importlib.util
+import json
+import sys
+from pathlib import Path
+
+import pandas as pd
+import pytest
+
+_SCRIPT_PATH = Path(__file__).resolve().parents[2] / "scripts" / "audit_channel_signal.py"
+_REPO_ROOT = Path(__file__).resolve().parents[2]
+_spec = importlib.util.spec_from_file_location("audit_channel_signal", _SCRIPT_PATH)
+assert _spec is not None
+assert _spec.loader is not None
+audit_module = importlib.util.module_from_spec(_spec)
+sys.modules["audit_channel_signal"] = audit_module
+_spec.loader.exec_module(audit_module)
+
+
+_INTRO_TRAIN = (
+    _REPO_ROOT / "release" / "intro" / "tasks" / "converted_within_90_days" / "train.parquet"
+)
+_RELEASE_BUNDLES_PRESENT = _INTRO_TRAIN.exists()
+
+_TIERS = ("intro", "intermediate", "advanced")
+
+
+# ---------------------------------------------------------------------------
+# Synthetic fixtures
+# ---------------------------------------------------------------------------
+
+
+def _toy_split(n_per_channel: int = 20) -> pd.DataFrame:
+    """Three channels with deliberately different conversion rates.
+
+    Channel rates: ``A`` 100%, ``B`` 50%, ``C`` 0%.
+    """
+
+    rows = []
+    for ch, rate in [("A", 1.0), ("B", 0.5), ("C", 0.0)]:
+        for i in range(n_per_channel):
+            rows.append(
+                {
+                    "lead_source": ch,
+                    "first_touch_channel": ch,
+                    "converted_within_90_days": bool(i < int(rate * n_per_channel)),
+                }
+            )
+    return pd.DataFrame(rows)
+
+
+# ---------------------------------------------------------------------------
+# Per-channel rollup
+# ---------------------------------------------------------------------------
+
+
+def test_audit_channel_returns_per_channel_stats() -> None:
+    train = _toy_split()
+    audit = audit_module.audit_channel(train, "lead_source", test=train)
+    assert audit.column == "lead_source"
+    assert audit.n_train == 60
+    assert audit.train_conversion_rate == pytest.approx(0.5)
+    names = [c.name for c in audit.channels]
+    assert names == ["A", "B", "C"]  # sorted by name
+    by_name = {c.name: c for c in audit.channels}
+    assert by_name["A"].conversion_rate == pytest.approx(1.0)
+    assert by_name["B"].conversion_rate == pytest.approx(0.5)
+    assert by_name["C"].conversion_rate == pytest.approx(0.0)
+    assert audit.rate_spread == pytest.approx(1.0)
+
+
+def test_audit_channel_in_sample_auc_pair_counting() -> None:
+    """Closed-form check of the in-sample univariate AUC.
+
+    20 pos from A (rate 1.0), 10 pos / 10 neg from B (rate 0.5, tied),
+    20 neg from C (rate 0.0).  Pair-counting AUC:
+        A_pos vs B_neg : 200 wins
+        A_pos vs C_neg : 400 wins
+        B_pos vs B_neg : 100 ties → +50
+        B_pos vs C_neg : 200 wins
+    → 850 / 900 = 17/18.
+    """
+
+    train = _toy_split()
+    audit = audit_module.audit_channel(train, "lead_source", test=train)
+    assert audit.univariate_auc_in_sample == pytest.approx(17 / 18)
+
+
+def test_audit_channel_oos_auc_matches_in_sample_when_test_is_train() -> None:
+    """When the test split is the train split, OOS AUC == in-sample AUC."""
+
+    train = _toy_split()
+    audit = audit_module.audit_channel(train, "lead_source", test=train)
+    assert audit.univariate_auc_out_of_sample == pytest.approx(audit.univariate_auc_in_sample)
+
+
+def test_audit_channel_oos_auc_handles_unseen_test_categories() -> None:
+    """Test categories not present in train get the train base rate fallback."""
+
+    train = _toy_split()
+    test = pd.DataFrame(
+        {
+            "lead_source": ["A", "B", "C", "Z", "Z"],  # Z is unseen
+            "first_touch_channel": ["A", "B", "C", "Z", "Z"],
+            "converted_within_90_days": [True, True, False, True, False],
+        }
+    )
+    audit = audit_module.audit_channel(train, "lead_source", test=test)
+    # AUC is well-defined (no NaN) — the unseen categories fall back to
+    # the train base rate (0.5), which produces ties against any seen
+    # category whose rate also equals 0.5.
+    assert 0.0 <= audit.univariate_auc_out_of_sample <= 1.0
+
+
+def test_audit_channel_handles_single_class_label() -> None:
+    train = _toy_split()
+    train["converted_within_90_days"] = False
+    audit = audit_module.audit_channel(train, "lead_source", test=train)
+    assert audit.univariate_auc_in_sample == 0.5
+    assert audit.univariate_auc_out_of_sample == 0.5
+
+
+def test_audit_channel_raises_on_missing_column() -> None:
+    train = _toy_split()
+    with pytest.raises(KeyError):
+        audit_module.audit_channel(train, "no_such_column", test=train)
+
+
+def test_audit_tier_runs_every_channel_column() -> None:
+    train = _toy_split()
+    tier = audit_module.audit_tier(train, "intro", test=train)
+    cols = {c.column for c in tier.columns}
+    assert cols == {"lead_source", "first_touch_channel"}
+    assert tier.tier == "intro"
+    assert tier.n_train == 60
+    assert tier.n_test == 60
+
+
+# ---------------------------------------------------------------------------
+# Build / render
+# ---------------------------------------------------------------------------
+
+
+def test_render_json_round_trip() -> None:
+    train = _toy_split()
+    tier = audit_module.audit_tier(train, "intro", test=train)
+    report = audit_module.AuditReport(
+        release_dir="release",
+        task="converted_within_90_days",
+        label_column="converted_within_90_days",
+        channel_columns=audit_module.CHANNEL_COLUMNS,
+        tiers=(tier,),
+        industry_mql_to_sql_benchmarks=audit_module.INDUSTRY_MQL_TO_SQL_BENCHMARKS,
+    )
+    js = audit_module.render_json(report)
+    parsed = json.loads(js)
+    assert parsed["tiers"][0]["tier"] == "intro"
+    # Industry benchmarks render as a {name: rate} dict in the JSON
+    # (renderer converts the immutable tuple-of-pairs back).
+    assert parsed["industry_mql_to_sql_benchmarks"]["SEO"] == pytest.approx(0.51)
+
+
+def test_render_markdown_collapses_identical_columns() -> None:
+    """When two columns produce identical audits, the renderer groups them."""
+
+    train = _toy_split()  # lead_source == first_touch_channel by construction
+    tier = audit_module.audit_tier(train, "intro", test=train)
+    report = audit_module.AuditReport(
+        release_dir="release",
+        task="converted_within_90_days",
+        label_column="converted_within_90_days",
+        channel_columns=audit_module.CHANNEL_COLUMNS,
+        tiers=(tier,),
+        industry_mql_to_sql_benchmarks=audit_module.INDUSTRY_MQL_TO_SQL_BENCHMARKS,
+    )
+    md = audit_module.render_markdown(report)
+    assert "audit values identical" in md
+    # Each tier should render the columns once, not twice.
+    assert md.count("Per-channel rate spread") == 1
+
+
+def test_render_markdown_renders_distinct_columns_separately() -> None:
+    """When two columns differ, the renderer keeps them in separate sections."""
+
+    train = _toy_split()
+    train["first_touch_channel"] = "A"  # force divergence from lead_source
+    tier = audit_module.audit_tier(train, "intro", test=train)
+    report = audit_module.AuditReport(
+        release_dir="release",
+        task="converted_within_90_days",
+        label_column="converted_within_90_days",
+        channel_columns=audit_module.CHANNEL_COLUMNS,
+        tiers=(tier,),
+        industry_mql_to_sql_benchmarks=audit_module.INDUSTRY_MQL_TO_SQL_BENCHMARKS,
+    )
+    md = audit_module.render_markdown(report)
+    assert "audit values identical" not in md
+    assert md.count("Per-channel rate spread") == 2
+
+
+def test_render_markdown_includes_discussion_section() -> None:
+    train = _toy_split()
+    tier = audit_module.audit_tier(train, "intro", test=train)
+    report = audit_module.AuditReport(
+        release_dir="release",
+        task="converted_within_90_days",
+        label_column="converted_within_90_days",
+        channel_columns=audit_module.CHANNEL_COLUMNS,
+        tiers=(tier,),
+        industry_mql_to_sql_benchmarks=audit_module.INDUSTRY_MQL_TO_SQL_BENCHMARKS,
+    )
+    md = audit_module.render_markdown(report)
+    assert "## Discussion" in md
+    assert "## Industry benchmark (context, not target)" in md
+
+
+# ---------------------------------------------------------------------------
+# CLI determinism + error paths
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.skipif(not _RELEASE_BUNDLES_PRESENT, reason="release/intro bundle not present")
+def test_release_audit_is_deterministic(tmp_path: Path) -> None:
+    """Two back-to-back runs against the committed release bundle must
+    produce byte-identical JSON and markdown output."""
+
+    out_md = tmp_path / "audit.md"
+    out_json = tmp_path / "audit.json"
+    cli_args = [
+        "--release-dir",
+        str(_REPO_ROOT / "release"),
+        "--out-md",
+        str(out_md),
+        "--out-json",
+        str(out_json),
+    ]
+    assert audit_module.main(cli_args) == 0
+    bytes_md_a = out_md.read_bytes()
+    bytes_json_a = out_json.read_bytes()
+
+    assert audit_module.main(cli_args) == 0
+    bytes_md_b = out_md.read_bytes()
+    bytes_json_b = out_json.read_bytes()
+
+    assert bytes_md_a == bytes_md_b
+    assert bytes_json_a == bytes_json_b
+
+
+def test_main_reports_missing_release_dir(
+    tmp_path: Path, capsys: pytest.CaptureFixture[str]
+) -> None:
+    rc = audit_module.main(
+        [
+            "--release-dir",
+            str(tmp_path / "nope"),
+            "--out-md",
+            str(tmp_path / "audit.md"),
+            "--out-json",
+            str(tmp_path / "audit.json"),
+        ]
+    )
+    captured = capsys.readouterr()
+    assert rc == 2
+    assert "release directory not found" in captured.err
+
+
+def test_main_reports_missing_train_split(
+    tmp_path: Path, capsys: pytest.CaptureFixture[str]
+) -> None:
+    (tmp_path / "release").mkdir()
+    rc = audit_module.main(
+        [
+            "--release-dir",
+            str(tmp_path / "release"),
+            "--tier",
+            "intro",
+            "--out-md",
+            str(tmp_path / "audit.md"),
+            "--out-json",
+            str(tmp_path / "audit.json"),
+        ]
+    )
+    captured = capsys.readouterr()
+    assert rc == 2
+    assert "missing train split" in captured.err
+
+
+# ---------------------------------------------------------------------------
+# Integrity properties against the committed release/ bundles
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.skipif(not _RELEASE_BUNDLES_PRESENT, reason="release/ bundles not present")
+@pytest.mark.parametrize("tier", _TIERS)
+def test_lead_source_equals_first_touch_channel_in_v1(tier: str) -> None:
+    """Locks the feature-dict claim that the two channel columns are
+    identical in v1.  If the simulator ever diverges them, this test
+    fails and ``docs/release/feature_dictionary.md`` must be updated."""
+
+    for split in ("train", "test", "valid"):
+        df = audit_module.load_split(_REPO_ROOT / "release", tier, split)
+        assert (df["lead_source"] == df["first_touch_channel"]).all(), (
+            f"{tier}/{split}: lead_source diverges from first_touch_channel"
+        )
+
+
+@pytest.mark.skipif(not _RELEASE_BUNDLES_PRESENT, reason="release/ bundles not present")
+def test_committed_audit_artifacts_match_fresh_regeneration(
+    tmp_path: Path, monkeypatch: pytest.MonkeyPatch
+) -> None:
+    """A fresh audit run against the committed bundles must match the
+    committed ``docs/release/channel_signal_audit.{md,json}`` exactly.
+
+    If this fails, the bundles drifted without re-running the audit.
+    Regenerate via ``python scripts/audit_channel_signal.py`` from the
+    repo root.
+    """
+
+    # The committed JSON records ``release_dir`` as the literal path
+    # the developer passed on the command line.  Re-run the audit
+    # exactly as the developer would: from the repo root, with the
+    # default (relative) ``release`` argument.
+    monkeypatch.chdir(_REPO_ROOT)
+
+    # The committed markdown links the JSON sibling by relative
+    # filename (rendered from --out-md and --out-json being siblings),
+    # so re-run with the same basenames so the byte comparison covers
+    # the full file including the link line.
+    out_md = tmp_path / "channel_signal_audit.md"
+    out_json = tmp_path / "channel_signal_audit.json"
+    rc = audit_module.main(
+        [
+            "--out-md",
+            str(out_md),
+            "--out-json",
+            str(out_json),
+        ]
+    )
+    assert rc == 0
+    committed_md = (_REPO_ROOT / "docs" / "release" / "channel_signal_audit.md").read_bytes()
+    committed_json = (_REPO_ROOT / "docs" / "release" / "channel_signal_audit.json").read_bytes()
+    assert out_md.read_bytes() == committed_md
+    assert out_json.read_bytes() == committed_json