Engine-driven full-season bot evaluation by mitchwebster · Pull Request #260 · mitchwebster/botblitz

mitchwebster · 2026-06-07T04:14:54Z

Summary

Adds a way to evaluate a bot through the real Go engine instead of a parallel Python harness, so eval logic can't drift from what runs week to week.

- `make evaluate-bot BOT=bots/nfl2025/<bot>.py YEAR=2025 RUNS=n` builds the py_grpc_server image, then drafts and replays a full historical season (draft → weekly waivers + scoring → playoffs) against a baseline field of containerized standard-bot opponents, and reports where the bot finished (average finish, championships, playoff appearances), aggregated over RUNS.
- Reuses the engine's own handlers (runDraft, performWeeklyFantasyActions, updateWeeklyScores, playoff/standings), with no re-implemented game logic. New code is just the season-replay loop + standings readout (pkg/engine/SeasonReplayHandler.go) and the orchestration command (pkg/cmd/evaluate/main.go).
- Each run copies season.db to a scratch year (2999, gitignored) so the tracked DB is never mutated. evaluate-bot resolves DOCKER_HOST from the active docker context (works on Docker Desktop and Linux/CI).
- fetchLeagueSettings now delegates to a shared engine.BuildDefaultLeagueSettings.
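The per-run readout described above (average finish, championships, playoff appearances, aggregated over RUNS) can be sketched as follows. This is only an illustration in Python, not the Go code in pkg/cmd/evaluate/main.go; `RunResult` and `summarize` are hypothetical names, and the shape of a run's result is an assumption.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    """Outcome of one replayed season for the bot under test (hypothetical shape)."""
    finish: int             # 1 = first place in the final standings
    made_playoffs: bool
    won_championship: bool


def summarize(runs):
    """Aggregate per-run outcomes into the three reported numbers."""
    if not runs:
        raise ValueError("need at least one run")
    return {
        "runs": len(runs),
        "avg_finish": mean(r.finish for r in runs),
        # Booleans sum as 0/1, so these are simple counts.
        "championships": sum(r.won_championship for r in runs),
        "playoff_apps": sum(r.made_playoffs for r in runs),
    }


if __name__ == "__main__":
    sample = [
        RunResult(finish=1, made_playoffs=True, won_championship=True),
        RunResult(finish=4, made_playoffs=True, won_championship=False),
        RunResult(finish=7, made_playoffs=False, won_championship=False),
    ]
    print(summarize(sample))
```

Averaging finish position over several runs smooths out variance from any single replayed season.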

Engine bug fix (surfaced by the eval)

Running a real season exposed a latent bug: NewGameStateHandlerForDraft ran a blanket AutoMigrate that reconciled the Python-built players table (via Bot.Players). GORM's SQLite migrator first choked on the table's dangling FOREIGN KEY clause; then, with the FK removed, it rebuilt the table to change VARCHAR columns to text and silently NULLed every column's data (e.g. rank), so bots read a rankless pool and crashed.

Fix: drop the dangling FK + unused relationships from blitz_env/models.py (the bots table doesn't exist when build-season materializes players), rebuild season.db, and create only the engine-owned league-state tables instead of migrating the prebuilt one. Row writes (draft assignments) still go through the Player model. A regression test against a real season.db copy fails without the fix (rank nulled) and passes with it.
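The "create only the engine-owned tables" idea can be illustrated with a small sketch. It uses Python's sqlite3 rather than the Go/GORM code in pkg/gamestate/handler.go, and the table names in `ENGINE_TABLES`, the players columns, and the helper names are all assumptions made for the example. It also mirrors the regression check: the players schema stays byte-for-byte identical and no rank is nulled.

```python
import sqlite3

# Hypothetical engine-owned tables; the real schema lives in the Go engine.
ENGINE_TABLES = [
    "CREATE TABLE IF NOT EXISTS league_settings (id INTEGER PRIMARY KEY, num_teams INTEGER)",
    "CREATE TABLE IF NOT EXISTS game_statuses (id INTEGER PRIMARY KEY, current_week INTEGER)",
]


def build_sample_db():
    """Stand-in for a prebuilt season.db with a populated players table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE players (id VARCHAR PRIMARY KEY, full_name VARCHAR, "rank" INTEGER)'
    )
    conn.executemany(
        "INSERT INTO players VALUES (?, ?, ?)",
        [("p1", "Player One", 1), ("p2", "Player Two", 2)],
    )
    conn.commit()
    return conn


def ensure_engine_tables(conn):
    """Create engine-owned tables if absent; never alter the prebuilt players table."""
    for ddl in ENGINE_TABLES:
        conn.execute(ddl)
    conn.commit()


def players_schema(conn):
    """Return the stored CREATE statement for players, or None if the table is missing."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'players'"
    ).fetchone()
    return row[0] if row else None


def null_rank_count(conn):
    """Count players whose rank is NULL (the symptom of the corruption bug)."""
    return conn.execute('SELECT COUNT(*) FROM players WHERE "rank" IS NULL').fetchone()[0]


if __name__ == "__main__":
    conn = build_sample_db()
    before = players_schema(conn)
    ensure_engine_tables(conn)
    assert players_schema(conn) == before
    assert null_rank_count(conn) == 0
    print("players table untouched")
```

Because `CREATE TABLE IF NOT EXISTS` only ever adds missing tables, calling `ensure_engine_tables` repeatedly is safe and never rewrites existing data.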

Test plan

- `go test ./pkg/engine/...`, incl. a Docker-free 17-week season replay (with playoffs) and the players-corruption regression test.
- `go build ./pkg/cmd/... ./pkg/engine/...`
- `python3 -m pytest tests/` (13 passed, incl. the shipped-season.db check on the rebuilt FK-free DB).
- End-to-end `make evaluate-bot` run completed a full containerized season on 2025 data.
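A check that a shipped database is FK-free might look like the sketch below. It is not the test in tests/; `dangling_foreign_keys` is a hypothetical helper, and the column layout used for the demo table is an assumption. It relies on SQLite's `PRAGMA foreign_key_list`, which reports each foreign key's referenced table and source column.

```python
import sqlite3


def dangling_foreign_keys(conn, table):
    """Return (column, referenced_table) pairs whose referenced table does not exist.

    `table` is interpolated into a PRAGMA, so pass only trusted table names.
    """
    existing = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    dangling = []
    # PRAGMA foreign_key_list columns: id, seq, table, from, to, on_update, on_delete, match
    for row in conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall():
        ref_table, from_col = row[2], row[3]
        if ref_table not in existing:
            dangling.append((from_col, ref_table))
    return dangling


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # SQLite accepts a FK to a table that does not exist yet; nothing is checked at CREATE time.
    conn.execute(
        "CREATE TABLE players (id TEXT PRIMARY KEY, current_bot_id TEXT REFERENCES bots(id))"
    )
    print(dangling_foreign_keys(conn, "players"))
```

An empty list for the players table would correspond to the FK-free state the rebuilt season.db is expected to be in.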

🤖 Generated with Claude Code

NewGameStateHandlerForDraft ran a blanket AutoMigrate that reconciled the Python-built players table (via Bot.Players). GORM's SQLite migrator first choked on its dangling FOREIGN KEY clause, then (FK removed) rebuilt VARCHAR->text and silently NULLed every column's data (e.g. rank), so bots read a rankless pool and crashed.

- blitz_env/models.py: drop the dangling FK on Player.current_bot_id (the bots table doesn't exist when build-season materializes players) + the unused Player.bot / Bot.players relationships; rebuild data/game_states/2025/season.db (FK-free).
- pkg/gamestate/handler.go: create only the engine-owned league-state tables (CreateTable if absent) instead of AutoMigrate, so the prebuilt players table is never reconciled. Row reads/writes (draft assignments) still go through the Player model.
- Adds BuildDefaultLeagueSettings + a regression test (real season.db copy) that fails without the fix (rank nulled) and passes with it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Evaluate any bot through the REAL engine instead of a parallel Python harness: a single `make evaluate-bot BOT=... YEAR=2025 RUNS=n` builds the image, then drafts and replays a full historical season (draft + weekly waivers + scoring + playoffs) against a baseline field of containerized standard-bot opponents, and reports where the bot finished.

- pkg/engine/SeasonReplayHandler.go: ReplaySeason loop (reuses performWeeklyFantasyActions + updateWeeklyScores) and FinalStandings, with a Docker-free integration test that replays a full 17-week season incl. playoffs through the engine's own scoring.
- pkg/cmd/evaluate/main.go: baseline-field league build + scratch-year isolation (copies season.db so the tracked DB is never mutated) + N-run aggregation.
- pkg/cmd/engine_bootstrap.go: fetchLeagueSettings delegates to BuildDefaultLeagueSettings.
- Makefile: evaluate-bot target (builds image, resolves DOCKER_HOST from the active docker context); CLAUDE.md documents it as the authoritative evaluator; .gitignore the scratch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
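The scratch-year isolation can be pictured with a short sketch: copy the tracked season.db into a throwaway year directory and run against the copy. This is Python for illustration only, not the Go code in pkg/cmd/evaluate/main.go. `copy_to_scratch` is a hypothetical helper, and placing the scratch copy at `<game_states>/2999/season.db` is an assumption based on the `data/game_states/2025/season.db` path mentioned above.

```python
import shutil
from pathlib import Path

SCRATCH_YEAR = 2999  # scratch year named in the PR description


def copy_to_scratch(game_states_dir, year, scratch_year=SCRATCH_YEAR):
    """Copy <dir>/<year>/season.db to <dir>/<scratch_year>/season.db and return the copy's path.

    The directory layout is assumed; the tracked source file is only ever read.
    """
    game_states_dir = Path(game_states_dir)
    src = game_states_dir / str(year) / "season.db"
    if not src.is_file():
        raise FileNotFoundError(src)
    dst_dir = game_states_dir / str(scratch_year)
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / "season.db"
    shutil.copyfile(src, dst)  # overwrites any leftover copy from a previous run
    return dst


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        tracked = Path(tmp) / "2025" / "season.db"
        tracked.parent.mkdir(parents=True)
        tracked.write_bytes(b"tracked contents")
        scratch = copy_to_scratch(tmp, 2025)
        scratch.write_bytes(b"mutated by a run")
        print(tracked.read_bytes())  # the tracked file is unchanged
```

Starting every run from a fresh copy also means each of the N runs sees the same initial database state.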

Mitch Webster and others added 2 commits June 6, 2026 21:14

mitchwebster merged commit dc91a36 into main Jun 7, 2026
1 of 2 checks passed

mitchwebster deleted the eval-engine-driven branch June 7, 2026 04:16
