feat(scripts): Kaggle release packager + cover image #70

Closed
shaypal5 wants to merge 4 commits into main from feat/kaggle-release-packager

Conversation

shaypal5 commented May 6, 2026 (Contributor)

Summary

  • Add scripts/package_kaggle_release.py, a deterministic dry-run Kaggle packager for the public v1 tiers (intro, intermediate, advanced).
  • Generate and commit release/kaggle/dataset-metadata.json plus a Kaggle-shaped upload directory containing the public tier files, README, license, and cover image.
  • Add release/dataset-cover-image.png, a reproducible hand-designed funnel diagram that the packager generates.
  • Update .agent-plan.md for PR 5.1 and defer the Hugging Face packager and load_dataset smoke test to PR 5.2.
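For reviewers, a minimal sketch of what deterministic metadata generation could look like. Assumptions: `validate_metadata` and `write_metadata` are hypothetical helpers, not the PR's functions, and the required keys (`title`, an `id` in `owner/slug` form, a non-empty `licenses` list of `{"name": ...}` objects) follow Kaggle's CLI metadata template rather than anything shown in this PR. Determinism comes from `sort_keys=True`, a fixed indent, and a trailing newline, so two runs over equal input produce byte-identical files.

```python
import json
import re
from pathlib import Path
from typing import Any

# Assumption: required keys and id shape follow Kaggle's CLI metadata template,
# not anything shown in this PR.
REQUIRED_KEYS = ("title", "id", "licenses")
ID_PATTERN = re.compile(r"^[a-z0-9-]+/[a-z0-9-]+$")  # "<owner>/<slug>"


def validate_metadata(meta: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the metadata passed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in meta]
    if "id" in meta and not ID_PATTERN.match(str(meta["id"])):
        problems.append(f"id must look like 'owner/slug', got {meta['id']!r}")
    licenses = meta.get("licenses")
    if "licenses" in meta and not (
        isinstance(licenses, list)
        and licenses
        and all(isinstance(lic, dict) and "name" in lic for lic in licenses)
    ):
        problems.append("licenses must be a non-empty list of {'name': ...} objects")
    return problems


def write_metadata(meta: dict[str, Any], path: Path) -> None:
    """Validate, then serialize with sorted keys, a fixed indent, and a trailing newline."""
    problems = validate_metadata(meta)
    if problems:
        raise ValueError("; ".join(problems))
    text = json.dumps(meta, indent=2, sort_keys=True, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")
```

Collecting every problem instead of failing on the first one keeps a dry run useful: a single pass reports all metadata issues at once.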

Cover Image Decision

Source: a reproducible, hand-designed funnel diagram generated locally by the packager. Rationale: it gives clean procurement/AP-automation branding, avoids stock-image licensing risk, and stays deterministic and testable.

Validation

  • python scripts/package_kaggle_release.py --dry-run
  • pytest tests/scripts/test_package_kaggle_release.py
  • ruff format --check .
  • ruff check .
  • mypy leadforge/
  • mypy leadforge/ scripts/package_kaggle_release.py
  • python scripts/probe_relational_leakage.py release/intro --max-accuracy 0.65
  • python scripts/probe_relational_leakage.py release/intermediate --max-accuracy 0.65
  • python scripts/probe_relational_leakage.py release/advanced --max-accuracy 0.65
  • python scripts/verify_hash_determinism.py (PASS, 67/67)
  • python scripts/validate_release_candidate.py --no-rebuild
  • pytest (1182 passed, 1 existing warning)
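The `verify_hash_determinism.py` result above (67/67) suggests a file-by-file hash comparison between builds. A minimal sketch of that idea, assuming two independent builds written to separate directories; `hash_tree` and `compare_builds` are hypothetical names, not the script's actual API.

```python
import hashlib
from pathlib import Path


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file's POSIX path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_builds(first: Path, second: Path) -> tuple[int, list[str]]:
    """Return (count of matching files, sorted paths that differ or exist in only one build)."""
    a, b = hash_tree(first), hash_tree(second)
    mismatched = sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
    matching = sum(1 for p in a.keys() & b.keys() if a[p] == b[p])
    return matching, mismatched
```

A "PASS n/n" line then corresponds to `mismatched` being empty and `matching` equal to the number of files in each build.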

Notes

  • The leakage probe is listed above as three per-bundle invocations rather than one brace-expanded command (scripts/probe_relational_leakage.py release/{intro,intermediate,advanced} --max-accuracy 0.65), because the script currently accepts only one bundle path per run.
  • Running the validator rewrote timestamps in release/validation/validation_report.{json,md}; this expected drift was reverted before committing.
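Because the probe takes one bundle path per run, the three invocations can be driven from a small loop. This sketch reuses the exact command line from the Validation list; the helper names are hypothetical, and it assumes the probe signals failure through a non-zero exit status.

```python
import subprocess
import sys

# Tier names and threshold come from the Validation list; helper names are hypothetical.
TIERS = ("intro", "intermediate", "advanced")
MAX_ACCURACY = "0.65"


def probe_commands(
    tiers: tuple[str, ...] = TIERS, max_accuracy: str = MAX_ACCURACY
) -> list[list[str]]:
    """Build one probe invocation per bundle, since the script accepts a single bundle path."""
    return [
        [
            sys.executable,
            "scripts/probe_relational_leakage.py",
            f"release/{tier}",
            "--max-accuracy",
            max_accuracy,
        ]
        for tier in tiers
    ]


def run_probes() -> bool:
    """Run every per-bundle probe; return True only if all of them exit with status 0."""
    results = [subprocess.run(cmd, check=False) for cmd in probe_commands()]
    return all(r.returncode == 0 for r in results)
```

Calling `run_probes()` from the repository root runs all three bundles even if an earlier one fails, so a single run reports every tier's outcome.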

Copilot AI review requested due to automatic review settings May 6, 2026 14:33
shaypal5 added the type: feature (New capability) and layer: cli (cli/ command-line interface) labels
Copilot AI left a comment

Pull request overview

Adds a deterministic “dry-run” Kaggle packaging surface for the v1 public dataset, including a reproducible cover image generator and a committed Kaggle-shaped upload directory under release/kaggle/.

Changes:

  • Added scripts/package_kaggle_release.py to generate/validate dataset-metadata.json, assemble the Kaggle upload directory, and generate a deterministic cover image.
  • Added tests/scripts/test_package_kaggle_release.py to validate metadata constraints, resource discovery/schema ordering, cover image constraints, and regeneration determinism.
  • Committed a Kaggle upload directory (release/kaggle/) containing tier artifacts plus Kaggle metadata/README/license/cover image, and updated .agent-plan.md for Phase 5.1.
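Resource discovery with stable ordering, sketched under assumptions: every file under the upload directory except `dataset-metadata.json` becomes a resource, paths are sorted as POSIX strings, and CSV schemas list fields in the file's own column order. The `{"path": ..., "schema": {"fields": [{"name": ...}]}}` shape is assumed to mirror Kaggle's `resources` format, and `discover_resources` is a hypothetical helper, not the PR's function.

```python
import csv
from pathlib import Path
from typing import Any


def discover_resources(upload_dir: Path) -> list[dict[str, Any]]:
    """List every file under upload_dir in a stable order; CSVs get a schema in header order."""
    resources = []
    for path in sorted(upload_dir.rglob("*"), key=lambda p: p.relative_to(upload_dir).as_posix()):
        if not path.is_file() or path.name == "dataset-metadata.json":
            continue  # skip directories and the metadata file itself
        entry: dict[str, Any] = {"path": path.relative_to(upload_dir).as_posix()}
        if path.suffix == ".csv":
            with path.open(newline="", encoding="utf-8") as fh:
                header = next(csv.reader(fh), [])
            # Preserve the file's column order rather than sorting field names.
            entry["schema"] = {"fields": [{"name": col} for col in header]}
        resources.append(entry)
    return resources
```

Sorting on the relative POSIX string, rather than on `Path` objects or filesystem iteration order, keeps the output identical across operating systems.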

Reviewed changes

Copilot reviewed 18 out of 53 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| .agent-plan.md | Updates Phase 5 plan to mark Kaggle packager + cover image as complete. |
| scripts/package_kaggle_release.py | New Kaggle packager: discover resources, build Kaggle metadata, generate cover image, assemble upload dir, and validate outputs. |
| tests/scripts/test_package_kaggle_release.py | New tests covering metadata validation, cover image generation/validation, resource schema ordering, packaging output shape, and regeneration checks. |
| release/kaggle/README.md | Kaggle package README content committed into the upload directory. |
| release/kaggle/LICENSE | MIT license file included in Kaggle upload directory. |
| release/kaggle/dataset-cover-image.png | Generated cover image committed for Kaggle upload. |
| release/kaggle/dataset-metadata.json | Generated Kaggle dataset-metadata.json committed for upload. |
| release/kaggle/intro/dataset_card.md | Intro tier dataset card included in Kaggle package. |
| release/kaggle/intro/feature_dictionary.csv | Intro tier feature dictionary included in Kaggle package. |
| release/kaggle/intro/lead_scoring.csv | Intro tier flat CSV included in Kaggle package. |
| release/kaggle/intro/manifest.json | Intro tier manifest included in Kaggle package. |
| release/kaggle/intro/tables/accounts.parquet | Intro tier accounts table parquet included in Kaggle package. |
| release/kaggle/intro/tables/contacts.parquet | Intro tier contacts table parquet included in Kaggle package. |
| release/kaggle/intro/tables/leads.parquet | Intro tier leads table parquet included in Kaggle package. |
| release/kaggle/intro/tables/opportunities.parquet | Intro tier opportunities table parquet included in Kaggle package. |
| release/kaggle/intro/tables/sales_activities.parquet | Intro tier sales_activities table parquet included in Kaggle package. |
| release/kaggle/intro/tables/sessions.parquet | Intro tier sessions table parquet included in Kaggle package. |
| release/kaggle/intro/tables/touches.parquet | Intro tier touches table parquet included in Kaggle package. |
| release/kaggle/intro/tasks/converted_within_90_days/task_manifest.json | Intro tier task manifest included in Kaggle package. |
| release/kaggle/intro/tasks/converted_within_90_days/train.parquet | Intro tier train split parquet included in Kaggle package. |
| release/kaggle/intro/tasks/converted_within_90_days/valid.parquet | Intro tier valid split parquet included in Kaggle package. |
| release/kaggle/intro/tasks/converted_within_90_days/test.parquet | Intro tier test split parquet included in Kaggle package. |
| release/kaggle/intermediate/dataset_card.md | Intermediate tier dataset card included in Kaggle package. |
| release/kaggle/intermediate/feature_dictionary.csv | Intermediate tier feature dictionary included in Kaggle package. |
| release/kaggle/intermediate/lead_scoring.csv | Intermediate tier flat CSV included in Kaggle package. |
| release/kaggle/intermediate/manifest.json | Intermediate tier manifest included in Kaggle package. |
| release/kaggle/intermediate/tables/accounts.parquet | Intermediate tier accounts table parquet included in Kaggle package. |
| release/kaggle/intermediate/tables/contacts.parquet | Intermediate tier contacts table parquet included in Kaggle package. |
| release/kaggle/intermediate/tables/leads.parquet | Intermediate tier leads table parquet included in Kaggle package. |
| release/kaggle/intermediate/tables/opportunities.parquet | Intermediate tier opportunities table parquet included in Kaggle package. |
| release/kaggle/intermediate/tables/sales_activities.parquet | Intermediate tier sales_activities table parquet included in Kaggle package. |
| release/kaggle/intermediate/tables/sessions.parquet | Intermediate tier sessions table parquet included in Kaggle package. |
| release/kaggle/intermediate/tables/touches.parquet | Intermediate tier touches table parquet included in Kaggle package. |
| release/kaggle/intermediate/tasks/converted_within_90_days/task_manifest.json | Intermediate tier task manifest included in Kaggle package. |
| release/kaggle/intermediate/tasks/converted_within_90_days/train.parquet | Intermediate tier train split parquet included in Kaggle package. |
| release/kaggle/intermediate/tasks/converted_within_90_days/valid.parquet | Intermediate tier valid split parquet included in Kaggle package. |
| release/kaggle/intermediate/tasks/converted_within_90_days/test.parquet | Intermediate tier test split parquet included in Kaggle package. |
| release/kaggle/advanced/dataset_card.md | Advanced tier dataset card included in Kaggle package. |
| release/kaggle/advanced/feature_dictionary.csv | Advanced tier feature dictionary included in Kaggle package. |
| release/kaggle/advanced/lead_scoring.csv | Advanced tier flat CSV included in Kaggle package. |
| release/kaggle/advanced/manifest.json | Advanced tier manifest included in Kaggle package. |
| release/kaggle/advanced/tables/accounts.parquet | Advanced tier accounts table parquet included in Kaggle package. |
| release/kaggle/advanced/tables/contacts.parquet | Advanced tier contacts table parquet included in Kaggle package. |
| release/kaggle/advanced/tables/leads.parquet | Advanced tier leads table parquet included in Kaggle package. |
| release/kaggle/advanced/tables/opportunities.parquet | Advanced tier opportunities table parquet included in Kaggle package. |
| release/kaggle/advanced/tables/sales_activities.parquet | Advanced tier sales_activities table parquet included in Kaggle package. |
| release/kaggle/advanced/tables/sessions.parquet | Advanced tier sessions table parquet included in Kaggle package. |
| release/kaggle/advanced/tables/touches.parquet | Advanced tier touches table parquet included in Kaggle package. |
| release/kaggle/advanced/tasks/converted_within_90_days/task_manifest.json | Advanced tier task manifest included in Kaggle package. |
| release/kaggle/advanced/tasks/converted_within_90_days/train.parquet | Advanced tier train split parquet included in Kaggle package. |
| release/kaggle/advanced/tasks/converted_within_90_days/valid.parquet | Advanced tier valid split parquet included in Kaggle package. |
| release/kaggle/advanced/tasks/converted_within_90_days/test.parquet | Advanced tier test split parquet included in Kaggle package. |
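The per-file summary implies a fixed upload layout: four top-level files plus fifteen files per tier. A hypothetical layout check built from that list, useful for catching missing or stray files before upload; it is a sketch, not part of the PR, and the real packager may include files this summary omits.

```python
from pathlib import Path

# Expected layout, transcribed from the per-file summary (paths relative to release/kaggle/).
TIERS = ("intro", "intermediate", "advanced")
TABLES = ("accounts", "contacts", "leads", "opportunities", "sales_activities", "sessions", "touches")
TASK = "converted_within_90_days"
TOP_LEVEL = ("README.md", "LICENSE", "dataset-cover-image.png", "dataset-metadata.json")
TIER_FILES = ("dataset_card.md", "feature_dictionary.csv", "lead_scoring.csv", "manifest.json")
TASK_FILES = ("task_manifest.json", "train.parquet", "valid.parquet", "test.parquet")


def expected_files() -> set[str]:
    """Every relative path the upload directory is expected to contain."""
    files = set(TOP_LEVEL)
    for tier in TIERS:
        files.update(f"{tier}/{name}" for name in TIER_FILES)
        files.update(f"{tier}/tables/{table}.parquet" for table in TABLES)
        files.update(f"{tier}/tasks/{TASK}/{name}" for name in TASK_FILES)
    return files


def check_layout(upload_dir: Path) -> tuple[list[str], list[str]]:
    """Return (missing, unexpected) relative paths for a Kaggle upload directory."""
    actual = {p.relative_to(upload_dir).as_posix() for p in upload_dir.rglob("*") if p.is_file()}
    expected = expected_files()
    return sorted(expected - actual), sorted(actual - expected)
```

With three tiers this expects 4 + 3 × 15 = 49 files; an empty pair of lists means the directory matches the summary exactly.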

Review comment threads:
  • scripts/package_kaggle_release.py (3 threads)
  • tests/scripts/test_package_kaggle_release.py (1 thread)
  • release/kaggle/README.md (2 threads, 1 outdated)
  • release/kaggle/dataset-metadata.json (1 thread, outdated)
github-actions bot commented May 6, 2026

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #70 in repository https://github.com/leadforge-dev/leadforge. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25442782908 attempt 1
Comment timestamp: 2026-05-06T14:51:18.861692+00:00
PR head commit: f2e4f9a6d0687b981ac41344f6fb8eec0b37827e

shaypal5 commented May 6, 2026 (Contributor, Author)

Closing: this PR was opened from an accidental pasted task and should be discarded.

shaypal5 closed this May 6, 2026
shaypal5 deleted the feat/kaggle-release-packager branch May 6, 2026 15:15
Labels

layer: cli (cli/ command-line interface), type: feature (New capability)

Projects

None yet

2 participants