feat(scripts): Kaggle release packager + cover image #70
Closed
shaypal5 wants to merge 4 commits into
Pull request overview
Adds a deterministic “dry-run” Kaggle packaging surface for the v1 public dataset, including a reproducible cover image generator and a committed Kaggle-shaped upload directory under release/kaggle/.
Changes:
- Added `scripts/package_kaggle_release.py` to generate/validate `dataset-metadata.json`, assemble the Kaggle upload directory, and generate a deterministic cover image.
- Added `tests/scripts/test_package_kaggle_release.py` to validate metadata constraints, resource discovery/schema ordering, cover image constraints, and regeneration determinism.
- Committed a Kaggle upload directory (`release/kaggle/`) containing tier artifacts plus Kaggle metadata/README/license/cover image, and updated `.agent-plan.md` for Phase 5.1.
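One way to check the "regeneration determinism" mentioned above is to build the package twice and compare a digest of each output tree. The sketch below is illustrative only: `directory_digest`, `build_demo_package`, and `is_regeneration_deterministic` are hypothetical names, and `build_demo_package` is a stand-in that writes fixed files, not the real packager.

```python
import hashlib
import tempfile
from pathlib import Path


def directory_digest(root: Path) -> str:
    """Return one SHA-256 digest covering every file under root."""
    digest = hashlib.sha256()
    # Sort by POSIX-style relative path so traversal order never affects the result.
    files = sorted(
        (p.relative_to(root).as_posix(), p) for p in root.rglob("*") if p.is_file()
    )
    for rel, path in files:
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def build_demo_package(out_dir: Path) -> None:
    """Hypothetical stand-in for the packager: writes a fixed set of files."""
    (out_dir / "intro").mkdir(parents=True, exist_ok=True)
    (out_dir / "dataset-metadata.json").write_text('{"title": "demo"}\n', encoding="utf-8")
    (out_dir / "intro" / "manifest.json").write_text("{}\n", encoding="utf-8")


def is_regeneration_deterministic() -> bool:
    """Build the demo package twice and compare the two tree digests."""
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        build_demo_package(Path(first))
        build_demo_package(Path(second))
        return directory_digest(Path(first)) == directory_digest(Path(second))
```

Hashing relative paths together with file contents means that renamed, missing, or changed files all alter the digest, while the location of the output directory does not.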
Reviewed changes
Copilot reviewed 18 out of 53 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| .agent-plan.md | Updates Phase 5 plan to mark Kaggle packager + cover image as complete. |
| scripts/package_kaggle_release.py | New Kaggle packager: discover resources, build Kaggle metadata, generate cover image, assemble upload dir, and validate outputs. |
| tests/scripts/test_package_kaggle_release.py | New tests covering metadata validation, cover image generation/validation, resource schema ordering, packaging output shape, and regeneration checks. |
| release/kaggle/README.md | Kaggle package README content committed into the upload directory. |
| release/kaggle/LICENSE | MIT license file included in Kaggle upload directory. |
| release/kaggle/dataset-cover-image.png | Generated cover image committed for Kaggle upload. |
| release/kaggle/dataset-metadata.json | Generated Kaggle dataset-metadata.json committed for upload. |
| release/kaggle/intro/dataset_card.md | Intro tier dataset card included in Kaggle package. |
| release/kaggle/intro/feature_dictionary.csv | Intro tier feature dictionary included in Kaggle package. |
| release/kaggle/intro/lead_scoring.csv | Intro tier flat CSV included in Kaggle package. |
| release/kaggle/intro/manifest.json | Intro tier manifest included in Kaggle package. |
| release/kaggle/intro/tables/accounts.parquet | Intro tier accounts table parquet included in Kaggle package. |
| release/kaggle/intro/tables/contacts.parquet | Intro tier contacts table parquet included in Kaggle package. |
| release/kaggle/intro/tables/leads.parquet | Intro tier leads table parquet included in Kaggle package. |
| release/kaggle/intro/tables/opportunities.parquet | Intro tier opportunities table parquet included in Kaggle package. |
| release/kaggle/intro/tables/sales_activities.parquet | Intro tier sales_activities table parquet included in Kaggle package. |
| release/kaggle/intro/tables/sessions.parquet | Intro tier sessions table parquet included in Kaggle package. |
| release/kaggle/intro/tables/touches.parquet | Intro tier touches table parquet included in Kaggle package. |
| release/kaggle/intro/tasks/converted_within_90_days/task_manifest.json | Intro tier task manifest included in Kaggle package. |
| release/kaggle/intro/tasks/converted_within_90_days/train.parquet | Intro tier train split parquet included in Kaggle package. |
| release/kaggle/intro/tasks/converted_within_90_days/valid.parquet | Intro tier valid split parquet included in Kaggle package. |
| release/kaggle/intro/tasks/converted_within_90_days/test.parquet | Intro tier test split parquet included in Kaggle package. |
| release/kaggle/intermediate/dataset_card.md | Intermediate tier dataset card included in Kaggle package. |
| release/kaggle/intermediate/feature_dictionary.csv | Intermediate tier feature dictionary included in Kaggle package. |
| release/kaggle/intermediate/lead_scoring.csv | Intermediate tier flat CSV included in Kaggle package. |
| release/kaggle/intermediate/manifest.json | Intermediate tier manifest included in Kaggle package. |
| release/kaggle/intermediate/tables/accounts.parquet | Intermediate tier accounts table parquet included in Kaggle package. |
| release/kaggle/intermediate/tables/contacts.parquet | Intermediate tier contacts table parquet included in Kaggle package. |
| release/kaggle/intermediate/tables/leads.parquet | Intermediate tier leads table parquet included in Kaggle package. |
| release/kaggle/intermediate/tables/opportunities.parquet | Intermediate tier opportunities table parquet included in Kaggle package. |
| release/kaggle/intermediate/tables/sales_activities.parquet | Intermediate tier sales_activities table parquet included in Kaggle package. |
| release/kaggle/intermediate/tables/sessions.parquet | Intermediate tier sessions table parquet included in Kaggle package. |
| release/kaggle/intermediate/tables/touches.parquet | Intermediate tier touches table parquet included in Kaggle package. |
| release/kaggle/intermediate/tasks/converted_within_90_days/task_manifest.json | Intermediate tier task manifest included in Kaggle package. |
| release/kaggle/intermediate/tasks/converted_within_90_days/train.parquet | Intermediate tier train split parquet included in Kaggle package. |
| release/kaggle/intermediate/tasks/converted_within_90_days/valid.parquet | Intermediate tier valid split parquet included in Kaggle package. |
| release/kaggle/intermediate/tasks/converted_within_90_days/test.parquet | Intermediate tier test split parquet included in Kaggle package. |
| release/kaggle/advanced/dataset_card.md | Advanced tier dataset card included in Kaggle package. |
| release/kaggle/advanced/feature_dictionary.csv | Advanced tier feature dictionary included in Kaggle package. |
| release/kaggle/advanced/lead_scoring.csv | Advanced tier flat CSV included in Kaggle package. |
| release/kaggle/advanced/manifest.json | Advanced tier manifest included in Kaggle package. |
| release/kaggle/advanced/tables/accounts.parquet | Advanced tier accounts table parquet included in Kaggle package. |
| release/kaggle/advanced/tables/contacts.parquet | Advanced tier contacts table parquet included in Kaggle package. |
| release/kaggle/advanced/tables/leads.parquet | Advanced tier leads table parquet included in Kaggle package. |
| release/kaggle/advanced/tables/opportunities.parquet | Advanced tier opportunities table parquet included in Kaggle package. |
| release/kaggle/advanced/tables/sales_activities.parquet | Advanced tier sales_activities table parquet included in Kaggle package. |
| release/kaggle/advanced/tables/sessions.parquet | Advanced tier sessions table parquet included in Kaggle package. |
| release/kaggle/advanced/tables/touches.parquet | Advanced tier touches table parquet included in Kaggle package. |
| release/kaggle/advanced/tasks/converted_within_90_days/task_manifest.json | Advanced tier task manifest included in Kaggle package. |
| release/kaggle/advanced/tasks/converted_within_90_days/train.parquet | Advanced tier train split parquet included in Kaggle package. |
| release/kaggle/advanced/tasks/converted_within_90_days/valid.parquet | Advanced tier valid split parquet included in Kaggle package. |
| release/kaggle/advanced/tasks/converted_within_90_days/test.parquet | Advanced tier test split parquet included in Kaggle package. |
pr-agent-context report: No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #70 in repository https://github.com/leadforge-dev/leadforge. Treat this PR as all clear unless new signals appear.
Closing: this PR was opened from an accidental pasted task and should be discarded.
Summary
- Add `scripts/package_kaggle_release.py`, a deterministic Kaggle dry-run packager for the public v1 tiers.
- Add `release/kaggle/dataset-metadata.json` plus a Kaggle-shaped upload directory with public tier files, README, license, and cover image.
- Add `release/dataset-cover-image.png` as a reproducible hand-designed funnel diagram generated by the packager.
- Update `.agent-plan.md` for PR 5.1 and leave the Hugging Face packager/load_dataset smoke test for PR 5.2.

Cover Image Decision
Source: reproducible hand-designed funnel diagram generated locally by the packager. Rationale: it gives clean procurement/AP automation branding, avoids stock licensing risk, and remains deterministic/testable.
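To illustrate what "deterministic/testable" can mean for a generated image, here is a minimal sketch that renders a toy funnel shape to PNG bytes using only the standard library, so the output can be compared byte for byte. It is not the packager's actual renderer: `render_funnel_png`, its dimensions, and its gray levels are hypothetical, and identical bytes are only expected when the same zlib build and compression level are used.

```python
import struct
import zlib


def _chunk(tag: bytes, data: bytes) -> bytes:
    """Encode one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def render_funnel_png(width: int = 64, height: int = 32) -> bytes:
    """Render a toy 8-bit grayscale funnel: a light band that narrows downward."""
    rows = bytearray()
    for y in range(height):
        # The inset grows with y, so the light band shrinks toward the bottom.
        inset = (y * width) // (4 * height)
        rows.append(0)  # filter type 0 (None) at the start of each scanline
        for x in range(width):
            rows.append(230 if inset <= x < width - inset else 20)
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(bytes(rows), 9))
        + _chunk(b"IEND", b"")
    )
```

Because the image is a pure function of its parameters, with no timestamps or random state, a test can assert that two renders are identical and that the header reports the expected dimensions.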
Validation
- `python scripts/package_kaggle_release.py --dry-run`
- `pytest tests/scripts/test_package_kaggle_release.py`
- `ruff format --check .`
- `ruff check .`
- `mypy leadforge/`
- `mypy leadforge/ scripts/package_kaggle_release.py`
- `python scripts/probe_relational_leakage.py release/intro --max-accuracy 0.65`
- `python scripts/probe_relational_leakage.py release/intermediate --max-accuracy 0.65`
- `python scripts/probe_relational_leakage.py release/advanced --max-accuracy 0.65`
- `python scripts/verify_hash_determinism.py`: PASS 67/67
- `python scripts/validate_release_candidate.py --no-rebuild`
- `pytest`: 1182 passed, 1 existing warning

Notes
- `scripts/probe_relational_leakage.py release/{intro,intermediate,advanced} --max-accuracy 0.65` is represented above as three per-bundle invocations because the current script accepts one bundle path per run.
- Timestamps in `release/validation/validation_report.{json,md}` changed during validation; that expected drift was reverted before commit.
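Since the probe script takes one bundle path per run, the three invocations can be generated rather than typed out. The following is a small sketch under that assumption; `probe_commands` and `BUNDLES` are hypothetical names, and only the script path and the `--max-accuracy` flag come from the commands listed above.

```python
import sys

# Bundle names as they appear in the validation commands.
BUNDLES = ("intro", "intermediate", "advanced")


def probe_commands(release_root: str = "release", max_accuracy: float = 0.65) -> list[list[str]]:
    """Build one probe invocation per bundle, as argument lists for subprocess."""
    return [
        [
            sys.executable,
            "scripts/probe_relational_leakage.py",
            f"{release_root}/{bundle}",
            "--max-accuracy",
            str(max_accuracy),
        ]
        for bundle in BUNDLES
    ]


# Example use from the repository root (not executed here):
#   import subprocess
#   for cmd in probe_commands():
#       subprocess.run(cmd, check=True)
```

Passing `check=True` to `subprocess.run` would stop at the first bundle whose probe exits non-zero, which matches running the three commands by hand one after another.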