
Engine-driven full-season bot evaluation #260

Merged: mitchwebster merged 2 commits into main from eval-engine-driven on Jun 7, 2026.



@mitchwebster


Summary

Adds a way to evaluate a bot through the real Go engine instead of a parallel Python harness, so eval logic can't drift from what runs week to week.

  • make evaluate-bot BOT=bots/nfl2025/<bot>.py YEAR=2025 RUNS=n builds the py_grpc_server image, then drafts and replays a full historical season (draft → weekly waivers + scoring → playoffs) against a baseline field of containerized standard-bot opponents. It reports where the bot finished (average finish, championships, playoff appearances), aggregated over RUNS.
  • Reuses the engine's own handlers (runDraft, performWeeklyFantasyActions, updateWeeklyScores, and the playoff and standings handlers), so no game logic is reimplemented. The only new code is the season-replay loop plus standings readout (pkg/engine/SeasonReplayHandler.go) and the orchestration command (pkg/cmd/evaluate/main.go).
  • Each run copies season.db to a scratch year (2999, gitignored), so the tracked DB is never mutated. evaluate-bot resolves DOCKER_HOST from the active Docker context, so it works on Docker Desktop as well as Linux/CI.
  • fetchLeagueSettings now delegates to a shared engine.BuildDefaultLeagueSettings.
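As a rough illustration of the reported metrics, the per-run aggregation could look like the Go sketch below. RunResult, Summary, and Aggregate are hypothetical names, not the types in pkg/cmd/evaluate/main.go; the sketch only assumes that each run yields a final finish position plus playoff and championship flags.

```go
package main

import "fmt"

// RunResult is a hypothetical summary of one full-season run.
type RunResult struct {
	Finish          int  // final standing (1 = first place)
	MadePlayoffs    bool // the bot reached the playoffs
	WonChampionship bool // the bot won the title
}

// Summary holds the aggregate metrics reported across RUNS.
type Summary struct {
	Runs          int
	AvgFinish     float64
	Championships int
	PlayoffApps   int
}

// Aggregate folds per-run results into a single Summary.
func Aggregate(results []RunResult) Summary {
	s := Summary{Runs: len(results)}
	if len(results) == 0 {
		return s // avoid dividing by zero when no runs completed
	}
	totalFinish := 0
	for _, r := range results {
		totalFinish += r.Finish
		if r.MadePlayoffs {
			s.PlayoffApps++
		}
		if r.WonChampionship {
			s.Championships++
		}
	}
	s.AvgFinish = float64(totalFinish) / float64(len(results))
	return s
}

func main() {
	s := Aggregate([]RunResult{
		{Finish: 1, MadePlayoffs: true, WonChampionship: true},
		{Finish: 4, MadePlayoffs: true},
		{Finish: 7},
	})
	// Prints: runs=3 avg_finish=4.00 championships=1 playoff_apps=2
	fmt.Printf("runs=%d avg_finish=%.2f championships=%d playoff_apps=%d\n",
		s.Runs, s.AvgFinish, s.Championships, s.PlayoffApps)
}
```

Counting championships and playoff appearances separately from the average finish keeps a consistently solid bot distinguishable from a boom-or-bust one with the same mean.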

Engine bug fix (surfaced by the eval)

Running a real season exposed a latent bug: NewGameStateHandlerForDraft ran a blanket AutoMigrate that, through the Bot.Players relationship, also reconciled the Python-built players table. GORM's SQLite migrator first choked on the table's dangling FOREIGN KEY clause; once the FK was removed, it rebuilt the table to convert VARCHAR columns to text and silently NULLed every column's data (e.g. rank), so bots read a rankless player pool and crashed.

Fix: drop the dangling FK and the unused relationships from blitz_env/models.py (the bots table doesn't exist when build-season materializes players), rebuild season.db, and create only the engine-owned league-state tables (when absent) instead of migrating the prebuilt players table. Row writes (draft assignments) still go through the Player model. A regression test against a real season.db copy fails without the fix (rank is NULLed) and passes with it.
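The "create only the engine-owned tables" approach can be sketched as follows. GORM's Migrator interface does provide HasTable and CreateTable; this sketch declares a local interface with just those two methods (which the value returned by a *gorm.DB's Migrator() satisfies) and uses an in-memory fake so it runs without a database. League and Team are hypothetical stand-ins for the engine's league-state models, and ensureEngineTables is not the name used in pkg/gamestate/handler.go.

```go
package main

import (
	"fmt"
	"reflect"
)

// tableMigrator is the subset of gorm.Migrator this sketch relies on.
type tableMigrator interface {
	HasTable(dst interface{}) bool
	CreateTable(dst ...interface{}) error
}

// Hypothetical engine-owned league-state models.
type League struct{ ID uint }
type Team struct {
	ID       uint
	LeagueID uint
}

// ensureEngineTables creates each engine-owned table only if it is missing.
// Unlike AutoMigrate it never alters an existing table, and tables whose
// models are not passed in (such as the prebuilt players table) are never touched.
func ensureEngineTables(m tableMigrator, models ...interface{}) error {
	for _, model := range models {
		if m.HasTable(model) {
			continue
		}
		if err := m.CreateTable(model); err != nil {
			return fmt.Errorf("create table for %T: %w", model, err)
		}
	}
	return nil
}

// fakeMigrator tracks tables by Go type name so the sketch runs without a DB.
type fakeMigrator struct {
	tables  map[string]bool
	created []string
}

var _ tableMigrator = (*fakeMigrator)(nil) // compile-time interface check

func typeName(dst interface{}) string {
	t := reflect.TypeOf(dst)
	if t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	return t.Name()
}

func (f *fakeMigrator) HasTable(dst interface{}) bool { return f.tables[typeName(dst)] }

func (f *fakeMigrator) CreateTable(dst ...interface{}) error {
	for _, d := range dst {
		name := typeName(d)
		f.tables[name] = true
		f.created = append(f.created, name)
	}
	return nil
}

func main() {
	// "Player" already exists (built by Python) and is not in the model list.
	fm := &fakeMigrator{tables: map[string]bool{"Player": true, "League": true}}
	if err := ensureEngineTables(fm, &League{}, &Team{}); err != nil {
		panic(err)
	}
	fmt.Println("created:", fm.created) // created: [Team]
}
```

The trade-off is that an engine table that already exists is left as-is rather than reconciled, so a later schema change to one of those tables would need an explicit migration step.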

Test plan

  • go test ./pkg/engine/..., including a Docker-free 17-week season replay (with playoffs) and the players-corruption regression test.
  • go build ./pkg/cmd/... ./pkg/engine/...
  • python3 -m pytest tests/ (13 passed, including the shipped-season.db check on the rebuilt FK-free DB).
  • An end-to-end make evaluate-bot run completed a full containerized season on 2025 data.

🤖 Generated with Claude Code

Mitch Webster and others added 2 commits June 6, 2026 21:14
NewGameStateHandlerForDraft ran a blanket AutoMigrate that reconciled the Python-built
players table (via Bot.Players) — GORM's SQLite migrator first choked on its dangling
FOREIGN KEY clause, then (FK removed) rebuilt VARCHAR->text and silently NULLed every
column's data (e.g. rank), so bots read a rankless pool and crashed.

- blitz_env/models.py: drop the dangling FK on Player.current_bot_id (the bots table
  doesn't exist when build-season materializes players) + the unused Player.bot /
  Bot.players relationships; rebuild data/game_states/2025/season.db (FK-free).
- pkg/gamestate/handler.go: create only the engine-owned league-state tables (CreateTable
  if absent) instead of AutoMigrate, so the prebuilt players table is never reconciled.
  Row reads/writes (draft assignments) still go through the Player model.
- Adds BuildDefaultLeagueSettings + a regression test (real season.db copy) that fails
  without the fix (rank nulled) and passes with it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Evaluate any bot through the REAL engine instead of a parallel Python harness: a single
`make evaluate-bot BOT=... YEAR=2025 RUNS=n` builds the image, then drafts and replays a
full historical season (draft + weekly waivers + scoring + playoffs) against a baseline
field of containerized standard-bot opponents, and reports where the bot finished.

- pkg/engine/SeasonReplayHandler.go: ReplaySeason loop (reuses performWeeklyFantasyActions
  + updateWeeklyScores) and FinalStandings, with a Docker-free integration test that
  replays a full 17-week season incl. playoffs through the engine's own scoring.
- pkg/cmd/evaluate/main.go: baseline-field league build + scratch-year isolation (copies
  season.db so the tracked DB is never mutated) + N-run aggregation.
- pkg/cmd/engine_bootstrap.go: fetchLeagueSettings delegates to BuildDefaultLeagueSettings.
- Makefile: evaluate-bot target (builds image, resolves DOCKER_HOST from the active docker
  context); CLAUDE.md documents it as the authoritative evaluator; .gitignore the scratch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mitchwebster merged commit dc91a36 into main on Jun 7, 2026.
1 of 2 checks passed.
mitchwebster deleted the eval-engine-driven branch on June 7, 2026 04:16.