From 4cba7672fefff7d995e6ff474640a74f912d30eb Mon Sep 17 00:00:00 2001
From: Shay Palachy
A relational, reproducible, three-tier synthetic CRM dataset family for
-teaching lead scoring at scale. Generated by
+teaching lead scoring at scale. Created by
+Shay Palachy Affek and generated by
leadforge, an
open-source Python framework for synthetic CRM/funnel data. The
framework version is decoupled from the dataset version: the package
@@ -175,16 +176,14 @@ Configurations / Subsets
LeadForge: Synthetic B2B Lead Scoring Dataset (leadforge-lead-scoring-v1)
LeadForge: Synthetic B2B Lead Scoring Dataset (leadforge-lead-scoring-
tag.
-Mid-market SaaS vendors entered 2024–2026 with growth slowing and
-customer-acquisition costs rising[^macro], so predicting which leads
-convert within a fixed window has moved from a marketing nicety to a
-survival skill. This dataset teaches that skill on a relational
-substrate, with the realistic confusions (snapshot-window discipline,
-leakage traps, channel signal weaker than vendor blogs imply) that
-students will hit when they finally get hands on real CRM data.
-[^macro]: Macroeconomic framing summarised in
-docs/external_review/summaries/gemini_v2_summary.md
-(median public-SaaS growth 30%→25% from 2023 to 2025; New CAC Ratio
-rose materially in 2024).
.
├── intro/ intermediate/ advanced/ # student_public bundles, one per difficulty tier
@@ -601,6 +600,10 @@ Maintenance, adversarial framing, license
Verify integrity with leadforge validate <bundle_dir>; every file
is hashed in manifest.json.
+Credits
+Created by Shay Palachy Affek.
+Dataset generated with leadforge (MIT).
+Profiles: HuggingFace · Kaggle · GitHub