PR 4.1: channel-signal audit + release-grade dataset card#69

Merged: shaypal5 merged 7 commits into main from docs/channel-signal-audit-and-dataset-card on May 6, 2026
Conversation

@shaypal5 shaypal5 commented May 6, 2026

Contributor

Summary

PR 4.1 — the only PR in Phase 4 of the v1 dataset release sequence. Documentation hardening on top of the leakage-safe bundles delivered in Phase 2 and the validation panel delivered in Phase 3. Adds one analysis script and four prose deliverables; no engine, render, or validation behaviour changes.

Roadmap: docs/release/v1_release_roadmap.md §"Phase 4 — PR 4.1".

What lands

  1. scripts/audit_channel_signal.py (new) + 18 tests. For each tier and each of lead_source / first_touch_channel, computes per-channel conversion rate (train split) plus two univariate AUCs:

    • In-sample AUC: train rates → train labels (the historical 1-D Bayes interpretation; biased upward).
    • Out-of-sample AUC: train rates → held-out test labels — directly comparable to the source_only HistGBM baselines in release/validation/validation_report.json. The OOS numbers reproduce the validation-report cells for seed 42 exactly (intro 0.5014, intermediate 0.5139, advanced 0.5226).

    The audit does not assign a categorical "weak / moderate / strong" verdict and does not make a hard comparison between v1's 90-day closed-won label and the G2 / Gemini v2 industry MQL→SQL benchmark band — those measure different funnel transitions. The benchmark band is reproduced for context with an explicit caveat; the audit's Discussion section states facts and lets the reader compare.

  2. docs/release/channel_signal_audit.md + sibling JSON (new) — auto-generated audit. v1's channel signal is empirically weak (per-channel rate spread ≤ 0.05, OOS univariate AUC ≈ 0.50–0.52 across all tiers), as expected: the simulator drives conversion through motif-family hazards keyed off latent traits, not channel-conditional probabilities. Channel-conditional encoding stays scoped post-v1.

  3. docs/release/generation_method.md (new) — standalone DGP summary written for external readers. Reads alone, references docs/leadforge_architecture_spec.md. Covers the five generation layers, public-vs-instructor split, calibration / validation, and an explicit "what this is not" boundary. Satisfies G10.2.

  4. docs/release/feature_dictionary.md (new) — narrative companion to the per-bundle feature_dictionary.csv. Groups every public-mode column by analytical role, documents difficulty modulation parameters, modelling defaults (now consistent with the flat-CSV shape), and the deliberate total_touches_all trap. Satisfies G10.3.

  5. release/README.md (substantial rewrite, ~228 lines) — release-grade dataset card per Datasheets-for-Datasets / Data Cards Playbook (G10.1). Macro framing with citation to gemini_v2_summary.md, simulation simplifications (linked to generation_method.md for detail), calibration documentation linking to validation_report.md, public-vs-instructor redaction with concrete column lists citing BANNED_LEAD_COLUMNS / BANNED_OPP_COLUMNS / BANNED_TABLES / SNAPSHOT_FILTERED_TABLES from leakage_probes.py, intended/out-of-scope use, known limitations (G7.4.4 + weak channel + flat AUC + small cohort gap), composition section, adversarial-framing pointer, maintenance plan.
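
The two univariate AUCs described in item 1 can be illustrated with a minimal, dependency-free sketch. It assumes each lead is scored by its channel's conversion rate on the train split, that channels unseen on train fall back to the train base rate, and that AUC is computed by pair counting with ties worth one half. The function names and the toy data are hypothetical and are not taken from scripts/audit_channel_signal.py.

```python
from collections import defaultdict


def channel_rates(channels, labels):
    """Per-channel positive rate plus the overall base rate."""
    pos, tot = defaultdict(int), defaultdict(int)
    for ch, y in zip(channels, labels):
        tot[ch] += 1
        pos[ch] += int(y)
    rates = {ch: pos[ch] / tot[ch] for ch in tot}
    base_rate = sum(pos.values()) / sum(tot.values())
    return rates, base_rate


def pair_counting_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return float("nan")  # single-class split: AUC is undefined
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def univariate_aucs(train, test):
    """train and test are (channels, labels) pairs; returns (in_sample, out_of_sample)."""
    rates, base_rate = channel_rates(*train)

    def score(channels):
        # Unseen channels get the train base rate (assumed fallback).
        return [rates.get(ch, base_rate) for ch in channels]

    in_sample = pair_counting_auc(score(train[0]), train[1])
    out_of_sample = pair_counting_auc(score(test[0]), test[1])
    return in_sample, out_of_sample


# Toy data, not drawn from the release bundles.
train = (["seo", "seo", "email", "email", "ppc", "ppc"], [1, 1, 0, 0, 1, 0])
test = (["seo", "email", "tiktok"], [1, 0, 1])  # "tiktok" is unseen on train
in_sample_auc, oos_auc = univariate_aucs(train, test)
```

Because the in-sample score is fitted to the same labels it is evaluated on, it cannot fall below 0.5, which is why the out-of-sample figure is the one comparable to a held-out baseline.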

Failure modes the test plan guards against

  • A future engine change diverges lead_source and first_touch_channel without updating the feature-dictionary claim that they're identical in v1.
    test_lead_source_equals_first_touch_channel_in_v1 (parametrized over intro/intermediate/advanced) walks every train/valid/test split and asserts equality. Fails loudly; forces a doc update.

  • A future bundle regeneration changes the channel mix but the audit artifacts in docs/release/channel_signal_audit.{md,json} are not re-run.
    test_committed_audit_artifacts_match_fresh_regeneration re-runs the audit against the committed release/ bundles and asserts byte-equality with the committed audit MD/JSON. Fails CI on drift.

  • A sklearn upgrade silently changes ROC-AUC behavior for a saturated 1-D categorical scorer.
    test_audit_channel_in_sample_auc_pair_counting locks the closed-form value (17/18) on a synthetic fixture; test_audit_channel_oos_auc_matches_in_sample_when_test_is_train checks the OOS code path; test_audit_channel_oos_auc_handles_unseen_test_categories guards the train-base-rate fallback for unseen test categories.

  • A future PR adds a new channel column but the audit doesn't render it (or worse, renders an identical-looking duplicate).
    test_render_markdown_collapses_identical_columns and test_render_markdown_renders_distinct_columns_separately cover both branches of the channel-grouping logic.

  • A BANNED_LEAD_COLUMNS rename silently invalidates the README's redaction-policy table.
    Not gated by a test in this PR — the table cites the constant by name, so a grep-once-after-rename habit catches it. Tracked as a follow-up: a docs-vs-code consistency check that imports the constant and asserts the README mentions it.

  • The validator regenerates validation_report.{json,md} with a fresh wall-clock timestamp.
    → I deliberately reverted the regenerated report files in this PR to keep the diff focused; this PR does not modify the validator and the reverted timestamp is the only field that drifted.
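
The docs-vs-code consistency check tracked as a follow-up for the BANNED_LEAD_COLUMNS case might look roughly like the sketch below. It is not part of this PR: the helper name and the sample README text are hypothetical, and a real check would presumably import the constants from leakage_probes.py and read release/README.md from disk.

```python
def missing_mentions(readme_text, constant_names):
    """Return the constant names that the README text never mentions."""
    return [name for name in constant_names if name not in readme_text]


# Hypothetical stand-in for the README contents.
sample_readme = "Redacted columns follow BANNED_LEAD_COLUMNS and BANNED_TABLES."
names = ["BANNED_LEAD_COLUMNS", "BANNED_OPP_COLUMNS", "BANNED_TABLES"]
missing = missing_mentions(sample_readme, names)
```

A test built on this would assert that `missing` is empty, so that renaming a constant without touching the README fails CI instead of relying on a manual grep.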

Test plan (commands)

  • python scripts/audit_channel_signal.py --release-dir release/ runs end-to-end, writes the audit MD + JSON
  • pytest tests/scripts/test_audit_channel_signal.py -q → 18 tests pass
  • Full suite: pytest -q → 1175 passed
  • ruff check . — passed
  • ruff format --check . — 168 files already formatted
  • mypy leadforge/ scripts/audit_channel_signal.py — no issues
  • python scripts/probe_relational_leakage.py release/{intro,intermediate,advanced} --max-accuracy 0.65 — all tiers exit 0
  • python scripts/verify_hash_determinism.py → PASS 67/67 files identical
  • python scripts/validate_release_candidate.py --no-rebuild — exits 0 (3 tiers × 5 seeds, leakage findings: 0)
  • No TBD-* strings re-introduced in docs/release/v1_acceptance_gates.md
  • BUNDLE_SCHEMA_VERSION unchanged at 5 (documentation-only PR)

Self-review history

This PR was rebased on top of itself after a brutal self-review pass:

  • Audit methodology was rewritten to compute the OOS univariate AUC, drop the categorical verdict bands and prose (which made a category-error comparison between 90-day-conversion and MQL→SQL rates), use a tuple-of-pairs immutable dataclass field for the industry benchmarks, and group identical channel columns in the markdown output. See commit ccab336.
  • Doc cleanup: trimmed release/README.md from 434 to 228 lines, dropped the developer-facing module-paths table from generation_method.md, fixed a factually wrong is_leakage_trap=True claim in feature_dictionary.md, and reworded the modelling-defaults checklist to be consistent with the flat-CSV shape. See commit 61964a9.

🤖 Generated with Claude Code

shaypal5 and others added 3 commits May 6, 2026 10:58
scripts/audit_channel_signal.py audits how strongly source channel
signals conversion across the release tier family. For each tier we
compute per-channel conversion rates and the univariate AUC of channel
against converted_within_90_days (scored as the empirical positive rate
per channel — a 1-D Bayes classifier equivalent to a saturated logistic
regression on one-hot channel features). Outputs JSON + Markdown to
docs/release/channel_signal_audit.{json,md}.

Tests guard determinism against the committed release/ bundles (a
double-run produces byte-identical output) plus per-channel rollup,
univariate AUC closed-form, single-class fallback, error paths, and the
CLI wiring.

The audit confirms what the v1 DGP predicts: channel signal in v1 is
weak — across all three tiers the largest per-channel rate spread is
0.043 and the largest univariate AUC is 0.521, well below the G2 /
Gemini v2 industry MQL→SQL band (SEO ~51% vs Email <1%). v1 drives
conversion through motif-family hazards keyed off latent traits, not
channel-conditional probabilities; channel-conditional encoding is
tracked as post-v1 work in docs/release/post_v1_roadmap.md.

Roadmap: docs/release/v1_release_roadmap.md §"Phase 4 — PR 4.1".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…re dictionary (PR 4.1 deliverable 3-5)

* docs/release/generation_method.md (new) — standalone DGP summary for
  external readers. Reads alone, references the architecture spec.
  Covers the five generation layers (motif families → mechanism layer
  → population → simulation engine → snapshot rendering), the public-
  vs-instructor split, calibration / validation, and the explicit
  "what this is not" boundary.

* docs/release/feature_dictionary.md (new) — narrative companion to
  the per-bundle feature_dictionary.csv. Groups the 32 public columns
  by analytical role (lead identity / firmographics / personographics /
  engagement / funnel / value) plus the deliberate trap and the target.
  Documents difficulty modulation parameters, modelling defaults, and
  pedagogical caveats. Satisfies G10.3.

* release/README.md (substantial rewrite) — release-grade dataset card
  per Datasheets-for-Datasets / Data Cards Playbook checklist (G10.1):
  - macro framing paragraph (2024–2026 SaaS context, recommendation #19)
  - simulation simplifications section (chatgpt v2 §2.6 — modelled /
    approximate / not modelled)
  - calibration documentation linking to validation_report.md
  - public-vs-instructor redaction policy with concrete column lists
    citing BANNED_LEAD_COLUMNS / BANNED_OPP_COLUMNS / BANNED_TABLES /
    SNAPSHOT_FILTERED_TABLES from leakage_probes.py
  - intended use vs out-of-scope use
  - known limitations including the G7.4.4 GBM-vs-LR finding and the
    weak channel signal from the Phase 4 audit
  - composition section (entities / features / label / splits /
    provenance) per Datasheets format
  - adversarial-framing pointer (placeholder link to break-me guide
    that lands in PR 6.3)
  - maintenance plan

All claims about realism, calibration, or difficulty are anchored to
release/validation/validation_report.md per G10.6.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 6, 2026 12:21
@shaypal5 shaypal5 added the type: docs Documentation or narrative changes label May 6, 2026
Copilot AI left a comment

Pull request overview

Adds a deterministic “channel-signal” audit plus release-grade documentation for the v1 dataset release, without changing engine/render/validation behavior. The main addition is a new analysis script (and tests) that generates a Markdown/JSON audit report, alongside substantial dataset-card hardening and two new release-facing prose docs.

Changes:

  • Add scripts/audit_channel_signal.py to compute per-channel conversion rates and a deterministic univariate AUC, emitting docs/release/channel_signal_audit.{md,json}.
  • Add test coverage for the audit logic, rendering, CLI error paths, and (when release/ is present) byte-identical determinism across repeated runs.
  • Rewrite release/README.md into a release-grade dataset card and add docs/release/{generation_method,feature_dictionary}.md.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| scripts/audit_channel_signal.py | New deterministic channel-signal audit script with JSON/Markdown rendering + CLI. |
| tests/scripts/test_audit_channel_signal.py | New unit + CLI tests, including an optional determinism guard against committed release/ bundles. |
| docs/release/channel_signal_audit.md | Generated audit report (human-readable). |
| docs/release/channel_signal_audit.json | Generated audit report (machine-readable). |
| release/README.md | Major rewrite into a release-grade dataset card with calibration, redaction policy, limitations, and usage guidance. |
| docs/release/generation_method.md | New external-reader-friendly generation method overview. |
| docs/release/feature_dictionary.md | New narrative feature dictionary grouped by analytical role + modelling guidance. |
| .agent-plan.md | Roadmap checklist updated to mark Phase 4 PR 4.1 items complete. |

Comment thread scripts/audit_channel_signal.py Outdated
Comment thread release/README.md Outdated
Comment thread docs/release/feature_dictionary.md
Comment thread docs/release/feature_dictionary.md Outdated
…tionary

Self-review caught a gap: the prior commit grouped 30 of 32 public
columns; lead_source and first_touch_channel were referenced in the
"recommended modelling defaults" checklist but did not appear in any
category table. Adds a "Lead source & channel" subsection that
describes both columns, calls out that they're identical in v1, and
cross-references the channel-signal audit so readers don't expect
top-tier feature importance from these columns. Updates the summary
table to reflect 32 documented columns. Also corrects two minor wording
issues (firmographics "Six" → "Five", personographics "all four" →
"all three") and a typo ("bandage" → "discretisation").

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
shaypal5 and others added 2 commits May 6, 2026 16:19
…oup identical columns

Self-review of the previous PR-4.1 commit surfaced four problems with
audit_channel_signal.py:

* The univariate AUC was computed in-sample (train rates → train labels),
  guaranteed >= 0.5 by construction and not directly comparable to the
  source_only baselines in release/validation/validation_report.json.
* The "weak / moderate / strong" verdict made a hard comparison between
  v1's 90-day closed-won label and the G2 / Gemini v2 industry MQL→SQL
  benchmark band.  The two metrics measure different funnel transitions;
  the comparison was a category error.
* The verdict prose hard-coded a "50 percentage points" claim and a
  specific architectural narrative ("v1 drives conversion through
  motif-family hazards") inside the script — both would silently drift
  from the data and the codebase over time.
* lead_source and first_touch_channel produce byte-identical audits in
  v1 yet were rendered as two parallel tables per tier.

Fixes:

* audit_channel now takes both train and test DataFrames and returns
  univariate_auc_in_sample (the historical 1-D Bayes interpretation,
  retained for transparency) plus univariate_auc_out_of_sample (train
  rates scored against held-out test labels).  The OOS numbers
  reproduce the source_only HistGBM baselines in validation_report.json
  for seed 42 cell-for-cell (intro 0.5014, intermediate 0.5139,
  advanced 0.5226).
* Verdict bands and the _classify_signal / _verdict_paragraph helpers
  are gone.  The markdown report now ends with a Discussion section
  written by hand around the actual numbers, with an explicit caveat
  that the industry benchmarks measure MQL→SQL (not 90-day closed-won)
  and are reproduced for context only.
* INDUSTRY_MQL_TO_SQL_BENCHMARKS is now a tuple of pairs (genuinely
  immutable; matches dataclass(frozen=True) semantics).  report_to_dict
  converts it back to a {name: rate} dict for the JSON output.
* render_markdown groups channel columns whose audits are
  byte-identical into one section with a header listing all columns
  ("Columns: lead_source, first_touch_channel (audit values
  identical)").  The JSON keeps per-column entries.
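
The tuple-of-pairs choice for the benchmark constant can be illustrated with a small sketch. The class name, the field names, and the two benchmark values (approximated from the figures quoted in this PR) are assumptions for illustration; only `INDUSTRY_MQL_TO_SQL_BENCHMARKS` and `report_to_dict` are names that appear in the commit message.

```python
from dataclasses import FrozenInstanceError, dataclass

# Illustrative values only; a tuple of pairs cannot be mutated in place.
INDUSTRY_MQL_TO_SQL_BENCHMARKS = (("SEO", 0.51), ("PPC", 0.26))


@dataclass(frozen=True)
class AuditReport:
    tier: str
    industry_benchmarks: tuple = INDUSTRY_MQL_TO_SQL_BENCHMARKS


def report_to_dict(report):
    """Convert the report to JSON-friendly types; pairs become {name: rate}."""
    return {
        "tier": report.tier,
        "industry_benchmarks": dict(report.industry_benchmarks),
    }


report = AuditReport(tier="intro")
as_dict = report_to_dict(report)

# frozen=True blocks attribute reassignment on the instance.
try:
    report.tier = "advanced"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

A dict field would still be mutable in place even on a frozen dataclass, and dataclasses reject a dict as a default value, which is one plausible reading of why the tuple form "matches dataclass(frozen=True) semantics".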

New tests in tests/scripts/test_audit_channel_signal.py:

* OOS AUC == in-sample AUC when test=train (sanity check)
* OOS AUC stays well-defined when the test split contains channels
  unseen on train (train-base-rate fallback)
* render_markdown collapses two identical columns into one section
  AND keeps two distinct columns in two sections
* test_lead_source_equals_first_touch_channel_in_v1 (parametrized
  over intro/intermediate/advanced) — locks the feature-dictionary
  claim that the two channel columns are identical in v1.  If the
  simulator ever diverges them, the doc must be updated.
* test_committed_audit_artifacts_match_fresh_regeneration — re-runs
  the audit against the committed bundles and asserts byte-equality
  with the committed docs/release/channel_signal_audit.{md,json}.
  CI gate against bundles regenerated without re-running the audit.
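
The column-collapsing behaviour that the render_markdown tests above exercise can be sketched as follows. This is a simplified stand-in, not the script's implementation: it assumes each column's audit is a JSON-serialisable dict and treats two audits as identical when their sorted-key JSON serialisations match. The helper names and toy values are hypothetical.

```python
import json


def group_identical_audits(audits):
    """Group column names whose audit payloads serialise identically.

    `audits` maps column name -> audit dict. Returns a list of
    (column_names, audit) pairs in first-seen order.
    """
    groups = {}
    for column, audit in audits.items():
        key = json.dumps(audit, sort_keys=True)
        groups.setdefault(key, ([], audit))[0].append(column)
    return list(groups.values())


def section_header(columns):
    """Header line for one rendered section."""
    if len(columns) > 1:
        return "Columns: " + ", ".join(columns) + " (audit values identical)"
    return "Column: " + columns[0]


# Toy audits; the third column is invented to show the distinct branch.
audits = {
    "lead_source": {"oos_auc": 0.5014},
    "first_touch_channel": {"oos_auc": 0.5014},
    "hypothetical_channel": {"oos_auc": 0.6},
}
groups = group_identical_audits(audits)
headers = [section_header(columns) for columns, _ in groups]
```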

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ct consistency

* release/README.md (~434 → ~228 lines): trimmed to a release-grade
  landing card.  The full DGP, motif families, simulation
  simplifications, and module map move to docs/release/generation_method.md
  (linked).  Macro-framing claim now cites
  docs/external_review/summaries/gemini_v2_summary.md as the source of
  the 30%→25% growth and CAC-ratio numbers (previously presented as if
  primary research).  Composition + maintenance sections compressed
  into the table at the bottom.
* docs/release/generation_method.md: dropped the "Where the code
  lives" module table.  This doc is for external readers; module
  paths belong in the developer-facing design doc and architecture
  spec.  Ends with a single short pointer to those.
* docs/release/feature_dictionary.md: fixed a factually wrong claim
  about the leakage trap (the per-bundle CSV has columns
  ``name,dtype,description,category,is_target,leakage_risk`` — there
  is no ``is_leakage_trap`` column).  Reworded the modelling-default
  checklist to acknowledge that the flat ``lead_scoring.csv`` and the
  Parquet task splits ship every column listed in the dictionary
  including the IDs — the recommendation says what to use as features,
  not what's in the file.  Also notes that ``lead_source`` and
  ``first_touch_channel`` carry identical values in v1 (locked by the
  new test), so picking one is fine.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 6, 2026 13:28
Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Comment thread docs/release/feature_dictionary.md Outdated
Comment thread scripts/audit_channel_signal.py Outdated
Comment thread scripts/audit_channel_signal.py
Comment thread scripts/audit_channel_signal.py Outdated
Six fixes from the Copilot reviews on PR #69:

* scripts/audit_channel_signal.py — _label_to_int now uses
  pd.api.types.is_bool_dtype() so it explicitly handles pandas
  nullable BooleanDtype (the actual parquet dtype on the v1 bundles)
  alongside numpy bool.  Previously it worked via a coincidental
  pd.to_numeric fallback, with a comment that misled future readers.
* scripts/audit_channel_signal.py — render_markdown now takes both
  md_path and json_path and emits the JSON link as a relative path
  to the markdown's directory, so a `--out-md`/`--out-json` override
  produces a markdown report whose link target is correct.  Defaults
  to the canonical "channel_signal_audit.json" basename when called
  without paths (the unit-test path).
* scripts/audit_channel_signal.py — main() pins encoding="utf-8" on
  both write_text() calls so the audit output is byte-identical
  across operating systems and locale configurations.
* scripts/audit_channel_signal.py — Discussion section is no longer
  bundle-specific.  The previous prose claimed "for seed 42 the OOS
  numbers below match the report cell-for-cell" — true for the
  committed bundle but wrong for any other --release-dir.  The new
  prose talks about which AUC is comparable and what conclusion the
  numbers in the per-tier sections support, both bundle-agnostic.
* release/README.md — fixed the relational-feature-engineering
  Quick start example.  The previous snippet did
  `leads.merge(touch_counts, on="lead_id")` where touch_counts was
  a Series with lead_id in its index, not as a column — would error
  in modern pandas.  The new snippet uses .reset_index() and merges
  the resulting DataFrame.
* docs/release/feature_dictionary.md — touches_week_1 documented as
  "days 0–7 inclusive" (8 day values) and touches_last_7_days
  qualified with "for snapshot_day=30, days 24–30 inclusive".
  Previously claimed "days 0–6" for week_1, which mismatched the
  snapshot builder's _day <= 7 window.

Test changes:

* test_release_audit_is_deterministic now writes both runs to the
  same path (back-to-back overwrite) instead of distinct tmp paths,
  so the relative-link rendering doesn't make the two outputs differ.
* test_committed_audit_artifacts_match_fresh_regeneration uses the
  canonical "channel_signal_audit.{md,json}" basenames in tmp_path,
  so the relative link in the regenerated markdown matches the
  committed file's link.

Two stale Copilot threads (firmographics "Six columns" and "bandage"
typo) were already addressed in commit f6b274e during the first
self-review pass.

1175/1175 tests pass; ruff + mypy clean; the regenerated audit
artifacts are byte-identical via the canonical-path mode.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 6, 2026 14:03
github-actions Bot commented May 6, 2026

pr-agent-context report:

This run includes unresolved review comments on PR #69 in repository https://github.com/leadforge-dev/leadforge

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: scripts/audit_channel_signal.py
URL: https://github.com/leadforge-dev/leadforge/pull/69#discussion_r3195378952
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `_label_to_int()` only special-cases NumPy/pandas `bool` dtype, but snapshots/task splits in this repo use pandas nullable `boolean` dtype (see `tests/render/test_render.py` asserting `.dtype.name == "boolean"`). Consider aligning with the existing pattern used elsewhere (e.g., `series.astype("boolean").fillna(False).astype(int)`) so the audit script reliably handles the canonical label dtype and any unexpected nulls without `to_numeric(..., errors="raise")` failures.
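
    A minimal sketch of a label coercion that handles both dtypes, combining `pd.api.types.is_bool_dtype` with the nullable-boolean pattern quoted in this comment. The function name is a stand-in for `_label_to_int`, and treating missing labels as negatives is an assumption carried over from the quoted `fillna(False)` pattern.

```python
import pandas as pd


def label_to_int(series):
    """Coerce a boolean-like label column to plain 0/1 integers."""
    if pd.api.types.is_bool_dtype(series):
        # True for numpy bool and for pandas nullable BooleanDtype.
        # Missing labels become 0 here, which is an assumption.
        return series.astype("boolean").fillna(False).astype(int)
    # Non-boolean labels must already be numeric; anything else raises.
    return pd.to_numeric(series, errors="raise").astype(int)


nullable = pd.Series([True, False, pd.NA], dtype="boolean")
plain = pd.Series([True, False])
numeric = pd.Series([0, 1, 1])
```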

## COPILOT-2
Location: release/README.md
URL: https://github.com/leadforge-dev/leadforge/pull/69#discussion_r3195379011
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    The Quick start example merges a Series (`touch_counts`) using `on="lead_id"`, but `touch_counts` has `lead_id` in its index (not a column). As written this snippet is likely to error or produce an unintended join. Convert `touch_counts` to a DataFrame/reset its index, or merge with `right_index=True` so the example is runnable as-is.
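
    The reset-index variant of the fix can be sketched on toy data. The `touches` table, the `touch_type` column, and the `n_touches` name are assumptions for illustration; only `leads`, `touch_counts`, and `lead_id` come from the comment.

```python
import pandas as pd

# Toy stand-ins for the bundle tables.
leads = pd.DataFrame({"lead_id": ["a", "b", "c"]})
touches = pd.DataFrame(
    {"lead_id": ["a", "a", "b"], "touch_type": ["email", "call", "email"]}
)

# groupby(...).size() returns a Series indexed by lead_id, so convert it
# to a DataFrame with lead_id as a regular column before merging.
touch_counts = touches.groupby("lead_id").size().reset_index(name="n_touches")

# Left join keeps leads that have no touches; fill their count with 0.
features = leads.merge(touch_counts, on="lead_id", how="left")
features["n_touches"] = features["n_touches"].fillna(0).astype(int)
```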

## COPILOT-3
Location: docs/release/feature_dictionary.md:71
URL: https://github.com/leadforge-dev/leadforge/pull/69#discussion_r3195379043
Root author: copilot-pull-request-reviewer

Comment:
    The firmographics section says "Six columns" but the table lists five (`industry`, `region`, `employee_band`, `estimated_revenue_band`, `process_maturity_band`). If the sixth is `account_id` (currently documented under identity), it would help to either include it explicitly here or adjust the count so the narrative matches the feature list.

## COPILOT-4
Location: docs/release/feature_dictionary.md
URL: https://github.com/leadforge-dev/leadforge/pull/69#discussion_r3195379081
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    Typo: "A bandage of the latent `process_maturity` trait" reads like a misspelling (likely "banding"/"banded"), and could confuse readers about what the column represents.

## COPILOT-5
Location: docs/release/feature_dictionary.md
URL: https://github.com/leadforge-dev/leadforge/pull/69#discussion_r3195813969
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `touches_week_1` is described as covering days 0–6, but the snapshot builder counts touches where `_day <= 7` (days 0–7 inclusive). Please align the documentation with the implemented window (or adjust the implementation if 0–6 is intended) to avoid confusing readers about what the feature represents.
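
    The off-by-one is easy to see with inclusive bounds written out. The sketch below is not the snapshot builder's code; the helper and the toy day offsets are hypothetical, and the 24 to 30 window for `touches_last_7_days` assumes snapshot_day=30 as stated in the follow-up commit.

```python
def count_touches_in_window(touch_days, first_day, last_day):
    """Count touches whose day offset lies in [first_day, last_day], inclusive."""
    return sum(first_day <= day <= last_day for day in touch_days)


touch_days = [0, 3, 7, 8, 24, 30]  # toy day offsets since lead creation

# A `_day <= 7` filter keeps days 0..7, which is 8 distinct day values.
week_1 = count_touches_in_window(touch_days, 0, 7)
week_1_day_values = len(range(0, 7 + 1))

# With snapshot_day = 30, the last 7 days are days 24..30 inclusive.
last_7 = count_touches_in_window(touch_days, 24, 30)
last_7_day_values = len(range(24, 30 + 1))
```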

## COPILOT-6
Location: scripts/audit_channel_signal.py
URL: https://github.com/leadforge-dev/leadforge/pull/69#discussion_r3195814053
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    For byte-identical reproducibility across operating systems/locales, consider writing the markdown/JSON outputs with an explicit encoding (e.g., UTF-8). `Path.write_text()` uses the platform default encoding, which can change the output bytes on Windows and break the determinism/integrity guarantees you're testing for.
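
    A short sketch of the suggestion, using a temporary directory and a sample string with a non-ASCII arrow like the one in the audit prose. The file name mirrors the audit artifact but the content is invented.

```python
import tempfile
from pathlib import Path

text = "MQL→SQL band: SEO ~51%\n"  # contains a non-ASCII arrow

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "channel_signal_audit.md"
    # With an explicit encoding, the encoded characters no longer depend
    # on the platform's default locale encoding.
    out.write_text(text, encoding="utf-8")
    written = out.read_bytes()
```

    Note that `write_text` still opens the file in text mode, so newline translation on Windows is a separate concern from encoding; Python 3.10 and later accept a `newline` argument on `Path.write_text` for that.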

## COPILOT-7
Location: scripts/audit_channel_signal.py:439
URL: https://github.com/leadforge-dev/leadforge/pull/69#discussion_r3195814096
Root author: copilot-pull-request-reviewer

Comment:
    `render_markdown()` hard-codes links to `docs/release/channel_signal_audit.json` and `release/validation/validation_report.json`. Since the CLI supports `--out-json` and `--release-dir`, these links can become incorrect when running the script on a non-canonical directory. Consider wording this as “see the JSON output alongside this file” or derive the paths from the actual CLI args/report metadata.
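
    One way to derive the link from the actual output paths is sketched below. It uses `os.path.relpath`, which is a different technique from the `Path.relative_to` call in the PR's fix: `relpath` can also express targets outside the markdown's directory with `..` segments. The function name is hypothetical.

```python
import os.path
from pathlib import Path


def json_link_for(md_path, json_path):
    """Return a link to json_path relative to the markdown file's directory."""
    md_dir = Path(md_path).parent
    try:
        rel = os.path.relpath(str(json_path), start=str(md_dir))
    except ValueError:
        # Raised on Windows when the two paths are on different drives;
        # fall back to the caller's path as given.
        rel = str(json_path)
    # Markdown links use forward slashes regardless of platform.
    return Path(rel).as_posix()


same_dir = json_link_for("docs/release/audit.md", "docs/release/audit.json")
sibling_dir = json_link_for("out/md/audit.md", "out/json/audit.json")
```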

## COPILOT-8
Location: scripts/audit_channel_signal.py
URL: https://github.com/leadforge-dev/leadforge/pull/69#discussion_r3195814127
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    The Discussion section includes a seed-specific claim (“For seed 42 the OOS numbers below match…”), but the report content is generated for whatever bundle `--release-dir` points at. If someone runs the script on a different seed/bundle family, this statement can become false. Consider removing the seed-specific assertion or computing/printing the bundle seed(s) from the manifests and interpolating that into the text.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25440138616 attempt 1
Comment timestamp: 2026-05-06T14:03:35.105557+00:00
PR head commit: f84c8b6fff8c7b127e14a25119f41b9ad7f92a0b

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Comment on lines +12 to +13
`advanced` bundles. The instructor companion adds the hidden truth
in `metadata/`; it does not change the feature list.
Comment thread .agent-plan.md
- [ ] `scripts/audit_channel_signal.py` → `docs/release/channel_signal_audit.md`
- [ ] `release/README.md` rewrite (release-grade dataset card; macro-framing paragraph; simulation-simplifications section)
- [ ] `docs/release/{generation_method,feature_dictionary}.md`
- [x] PR 4.1: `scripts/audit_channel_signal.py` (new) — analysis driver. For each tier (and each of `lead_source` / `first_touch_channel`), computes per-channel conversion rate + univariate AUC scored as the empirical positive rate per channel (a 1-D Bayes classifier, equivalent to a saturated LR on one-hot channel features). Writes `docs/release/channel_signal_audit.{md,json}`. CLI: `--release-dir`, `--tier`, `--task`, `--channel-column`, `--out-md`, `--out-json`, `--print`. Determinism guarded by `tests/scripts/test_audit_channel_signal.py` (10 tests: per-channel rollup, closed-form univariate AUC, single-class fallback, missing-column error, build/render round-trip, byte-identical re-run against the committed `release/` bundles, error paths). Audit verdict on the canonical PR 2.2 bundles: **weak channel signal** — across all three tiers and both channel columns the largest per-channel rate spread is 0.043 and the largest univariate AUC is 0.521, well below the G2 / Gemini v2 industry MQL→SQL band (SEO ~51%, PPC ~26%, Email <1%). v1 drives conversion through motif-family hazards keyed off latent traits, not channel-conditional probabilities; channel-conditional encoding is tracked in `docs/release/post_v1_roadmap.md`.
Comment on lines +413 to +420
    if md_path is not None and json_path is not None:
        try:
            json_link = str(Path(json_path).relative_to(Path(md_path).parent))
        except ValueError:
            # Different drive roots — keep the markdown readable by
            # falling back to the caller's path verbatim.
            json_link = str(json_path)
    else:
@shaypal5 shaypal5 merged commit 226d21e into main May 6, 2026
13 of 22 checks passed
@shaypal5 shaypal5 deleted the docs/channel-signal-audit-and-dataset-card branch May 6, 2026 14:18

Labels

type: docs Documentation or narrative changes

2 participants