
Add player_bio_download notebook for biographical enrichment #10

Open
colettace wants to merge 1 commit into main from codex/create-player-bio-download-notebook


@colettace

Owner

Motivation

  • Provide a standalone, re-runnable raw-enrichment workflow that extracts player biographical attributes separate from ranking logic.
  • Create a stable, versionable player-bio layer to power downstream joins and lookups.
  • Ensure a consistent schema and basic data quality checks so outputs are reliable for downstream pipelines.

Description

  • Add notebooks/player_bio_download.ipynb, a notebook that loads a player universe from pybaseball leaderboards and optionally the latest status roster snapshot.
  • Map FanGraphs ↔ MLBAM IDs using chadwick_register, then batch-fetch player bio fields from the MLB Stats API /people endpoint, with a configurable PEOPLE_BATCH_SIZE and basic HTTP/URL error handling.
  • Normalize to a stable schema with columns mlbam_id, fangraphs_id, full_name, birth_date, computed age, bats, throws, birth_country, birth_city, and debut_date, deduped by mlbam_id.
  • Add basic quality checks for per-column missing rates and handedness domain validation (bats ∈ {L,R,S}, throws ∈ {L,R}), and persist output to data/player_bio_<DATE_TAG>.parquet.
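As a rough illustration of the batching, age computation, dedup, and quality checks described above, here is a minimal standard-library sketch. It is not the notebook's code: the base URL, the `personIds` query parameter, the default batch size of 100, and every helper name below are assumptions made for illustration, and rows are modeled as plain dicts rather than a DataFrame.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed values; the notebook's actual URL and batch size are not shown in this PR.
PEOPLE_URL = "https://statsapi.mlb.com/api/v1/people"
PEOPLE_BATCH_SIZE = 100

# Allowed handedness codes from the PR description; None means "missing".
BATS_ALLOWED = {"L", "R", "S", None}
THROWS_ALLOWED = {"L", "R", None}


def batched(ids, size=PEOPLE_BATCH_SIZE):
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def people_url(mlbam_ids):
    """Build one request URL for a batch of MLBAM IDs (hypothetical helper)."""
    query = urlencode({"personIds": ",".join(map(str, mlbam_ids))}, safe=",")
    return f"{PEOPLE_URL}?{query}"


def compute_age(birth_date, today):
    """Age in whole years on `today`, given an ISO YYYY-MM-DD birth date."""
    born = date.fromisoformat(birth_date)
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


def dedupe_by_mlbam_id(rows):
    """Keep the first row seen for each mlbam_id."""
    seen = {}
    for row in rows:
        seen.setdefault(row["mlbam_id"], row)
    return list(seen.values())


def missing_rates(rows, columns):
    """Fraction of rows in which each column is None or an empty string."""
    total = len(rows) or 1
    return {c: sum(r.get(c) in (None, "") for r in rows) / total for c in columns}


def handedness_violations(rows):
    """Rows whose non-missing bats/throws values fall outside the allowed sets."""
    return [
        r for r in rows
        if r.get("bats") not in BATS_ALLOWED or r.get("throws") not in THROWS_ALLOWED
    ]
```

Treating missing handedness as a separate concern from invalid handedness keeps the two checks independent: `missing_rates` reports gaps, while `handedness_violations` flags only values outside the stated domains.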

Testing

  • Validated notebook JSON syntax with python3 -m json.tool notebooks/player_bio_download.ipynb, which completed successfully.

Codex Task


1 participant