Merged
25 changes: 14 additions & 11 deletions release/_preview_committed/huggingface_public.html
@@ -167,24 +167,23 @@ <h2 class="section__heading">Configurations / Subsets <span class="section__coun
<section class="readme">
<h1>LeadForge: Synthetic B2B Lead Scoring Dataset (<code>leadforge-lead-scoring-v1</code>)</h1>
<p>A relational, reproducible, three-tier synthetic CRM dataset family for
-teaching lead scoring at scale. Generated by
+teaching lead scoring at scale. Created by
+<a href="https://www.shaypalachy.com/">Shay Palachy Affek</a> and generated by
<a href="https://github.com/leadforge-dev/leadforge">leadforge</a>, an
open-source Python framework for synthetic CRM/funnel data. The
framework version is decoupled from the dataset version: the package
stays at <code>1.x</code>; the dataset is published under the explicit <code>…-v1</code>
tag.</p>
<h2>Why lead scoring matters in 2024–2026</h2>
<p>Mid-market SaaS vendors entered 2024–2026 with growth slowing and
-customer-acquisition costs rising[^macro], so predicting <em>which</em> leads
-convert within a fixed window has moved from a marketing nicety to a
-survival skill. This dataset teaches that skill on a relational
-substrate, with the realistic confusions (snapshot-window discipline,
-leakage traps, channel signal weaker than vendor blogs imply) that
-students will hit when they finally get hands on real CRM data.</p>
-<p>[^macro]: Macroeconomic framing summarised in
-<a href="https://github.com/leadforge-dev/leadforge/blob/main/docs/external_review/summaries/gemini_v2_summary.md"><code>docs/external_review/summaries/gemini_v2_summary.md</code></a>
-(median public-SaaS growth 30%→25% from 2023 to 2025; New CAC Ratio
-rose materially in 2024).</p>
+customer-acquisition costs rising (median public-SaaS growth 30%→25%
+from 2023 to 2025; New CAC Ratio rose materially in 2024), so
+predicting <em>which</em> leads convert within a fixed window has moved from
+a marketing nicety to a survival skill. This dataset teaches that
+skill on a relational substrate, with the realistic confusions
+(snapshot-window discipline, leakage traps, channel signal weaker than
+vendor blogs imply) that students will hit when they finally get hands
+on real CRM data.</p>
<h2>What's inside</h2>
<pre><code>.
├── intro/ intermediate/ advanced/ # student_public bundles, one per difficulty tier
@@ -601,6 +600,10 @@ <h2>Maintenance, adversarial framing, license</h2>
</table>
<p>Verify integrity with <code>leadforge validate &lt;bundle_dir&gt;</code>; every file
is hashed in <code>manifest.json</code>.</p>
+<h2>Credits</h2>
+<p>Created by <a href="https://www.shaypalachy.com/">Shay Palachy Affek</a>.
+Dataset generated with <a href="https://github.com/leadforge-dev/leadforge">leadforge</a> (MIT).
+Profiles: <a href="https://huggingface.co/shaypal5">HuggingFace</a> · <a href="https://www.kaggle.com/derelictpanda">Kaggle</a> · <a href="https://github.com/shaypalachy">GitHub</a></p>
</section>
<footer class="dataset-footer">
<div class="dataset-footer__license">License: mit</div>
5 changes: 5 additions & 0 deletions tests/scripts/test_preview_hf_page.py
@@ -49,7 +49,12 @@
# ``test_preview_kaggle_page.py`` for rationale.
_LINK_OK_PREFIXES = (
"https://github.com/leadforge-dev/leadforge",
"https://github.com/shaypalachy",
"https://huggingface.co/datasets/leadforge",
"https://huggingface.co/datasets/shaypal5",
"https://huggingface.co/shaypal5",
"https://www.kaggle.com/derelictpanda",
"https://www.shaypalachy.com/",
"https://example.com",
"LICENSE",
"#",
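Note on the allowlist: the hunk shows only the `_LINK_OK_PREFIXES` tuple, not the code that consumes it. The sketch below is one plausible way such a prefix allowlist could be applied to a rendered preview page. It is an assumption, not the test's actual logic: `LINK_OK_PREFIXES` here is a shortened stand-in, and `HrefCollector` and `disallowed_links` are hypothetical names.

```python
from html.parser import HTMLParser

# Shortened stand-in for the tuple in the diff (assumption: the real test
# compares each href against its prefixes in a similar way).
LINK_OK_PREFIXES = (
    "https://github.com/leadforge-dev/leadforge",
    "https://www.shaypalachy.com/",
    "#",
)


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def disallowed_links(html, prefixes=LINK_OK_PREFIXES):
    """Return hrefs that do not start with any allowlisted prefix."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    # str.startswith accepts a tuple and matches if any prefix matches.
    return [href for href in collector.hrefs if not href.startswith(prefixes)]
```

Under this reading, each new link in the Credits block needs a matching prefix entry, which would explain why the HTML and test changes land in the same PR.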