Engine-driven full-season bot evaluation #260
Merged
NewGameStateHandlerForDraft ran a blanket AutoMigrate that reconciled the Python-built players table (via Bot.Players). GORM's SQLite migrator first choked on its dangling FOREIGN KEY clause, then (FK removed) rebuilt VARCHAR->text and silently NULLed every column's data (e.g. rank), so bots read a rankless pool and crashed.
- blitz_env/models.py: drop the dangling FK on Player.current_bot_id (the bots table doesn't exist when build-season materializes players) + the unused Player.bot / Bot.players relationships; rebuild data/game_states/2025/season.db (FK-free).
- pkg/gamestate/handler.go: create only the engine-owned league-state tables (CreateTable if absent) instead of AutoMigrate, so the prebuilt players table is never reconciled. Row reads/writes (draft assignments) still go through the Player model.
- Adds BuildDefaultLeagueSettings + a regression test (real season.db copy) that fails without the fix (rank nulled) and passes with it.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
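For reviewers, the "CreateTable if absent" idea from the commit above can be sketched roughly as follows. This is not the code in pkg/gamestate/handler.go: `tableMigrator`, `ensureTables`, `fakeMigrator`, `newFakeMigrator`, and the empty `LeagueState` / `Player` types are hypothetical names used only for illustration. The two interface methods are written to match the `HasTable` and `CreateTable` signatures on GORM's `Migrator`, on the assumption that a real `db.Migrator()` could be passed in their place.

```go
package main

import "fmt"

// tableMigrator is a hypothetical interface covering the two schema calls this
// sketch needs. The signatures mirror HasTable and CreateTable on GORM's
// Migrator (assumption: a real db.Migrator() would be passed here).
type tableMigrator interface {
	HasTable(dst interface{}) bool
	CreateTable(dst ...interface{}) error
}

// ensureTables creates a table for each model only when it is missing.
// Tables that already exist are never altered or rebuilt.
func ensureTables(m tableMigrator, models ...interface{}) error {
	for _, model := range models {
		if m.HasTable(model) {
			continue // leave prebuilt tables untouched
		}
		if err := m.CreateTable(model); err != nil {
			return fmt.Errorf("create table for %T: %w", model, err)
		}
	}
	return nil
}

// fakeMigrator is an in-memory stand-in for a database, keyed by Go type name.
type fakeMigrator struct {
	existing map[string]bool
	created  []string
}

// newFakeMigrator returns a fake in which the given models already have tables.
func newFakeMigrator(existing ...interface{}) *fakeMigrator {
	f := &fakeMigrator{existing: map[string]bool{}}
	for _, e := range existing {
		f.existing[fmt.Sprintf("%T", e)] = true
	}
	return f
}

func (f *fakeMigrator) HasTable(dst interface{}) bool {
	return f.existing[fmt.Sprintf("%T", dst)]
}

func (f *fakeMigrator) CreateTable(dst ...interface{}) error {
	for _, d := range dst {
		name := fmt.Sprintf("%T", d)
		f.existing[name] = true
		f.created = append(f.created, name)
	}
	return nil
}

// Placeholder models; the real engine models have fields and different names.
type LeagueState struct{}
type Player struct{}

func main() {
	// Player stands in for the prebuilt, Python-owned table.
	m := newFakeMigrator(&Player{})
	if err := ensureTables(m, &Player{}, &LeagueState{}); err != nil {
		panic(err)
	}
	fmt.Println("tables created:", len(m.created)) // prints "tables created: 1"
}
```

The point of the design is that a prebuilt table is only ever checked for existence, so its column types and data are never subject to reconciliation.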
Evaluate any bot through the REAL engine instead of a parallel Python harness: a single `make evaluate-bot BOT=... YEAR=2025 RUNS=n` builds the image, then drafts and replays a full historical season (draft + weekly waivers + scoring + playoffs) against a baseline field of containerized standard-bot opponents, and reports where the bot finished.
- pkg/engine/SeasonReplayHandler.go: ReplaySeason loop (reuses performWeeklyFantasyActions + updateWeeklyScores) and FinalStandings, with a Docker-free integration test that replays a full 17-week season incl. playoffs through the engine's own scoring.
- pkg/cmd/evaluate/main.go: baseline-field league build + scratch-year isolation (copies season.db so the tracked DB is never mutated) + N-run aggregation.
- pkg/cmd/engine_bootstrap.go: fetchLeagueSettings delegates to BuildDefaultLeagueSettings.
- Makefile: evaluate-bot target (builds image, resolves DOCKER_HOST from the active docker context); CLAUDE.md documents it as the authoritative evaluator; .gitignore the scratch.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
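As a rough illustration of the N-run aggregation mentioned above, the reported numbers (average finish, championships, playoff appearances) could be computed along these lines. This is a minimal sketch, not the code in pkg/cmd/evaluate/main.go: `seasonResult`, `summary`, and `aggregate` are hypothetical names, and it assumes each run yields a final standing (1 = champion) plus a flag for whether the bot made the playoffs.

```go
package main

import "fmt"

// seasonResult is a hypothetical per-run outcome for the evaluated bot.
type seasonResult struct {
	Finish       int  // final standing, 1 = champion (assumed convention)
	MadePlayoffs bool // whether the bot reached the playoffs in this run
}

// summary holds the aggregate figures reported across all runs.
type summary struct {
	Runs          int
	AvgFinish     float64
	Championships int
	PlayoffApps   int
}

// aggregate folds per-run results into a summary. With no runs it returns a
// zero-valued summary rather than dividing by zero.
func aggregate(results []seasonResult) summary {
	s := summary{Runs: len(results)}
	if len(results) == 0 {
		return s
	}
	total := 0
	for _, r := range results {
		total += r.Finish
		if r.Finish == 1 {
			s.Championships++
		}
		if r.MadePlayoffs {
			s.PlayoffApps++
		}
	}
	s.AvgFinish = float64(total) / float64(len(results))
	return s
}

func main() {
	// Made-up results for three runs, for illustration only.
	runs := []seasonResult{
		{Finish: 1, MadePlayoffs: true},
		{Finish: 4, MadePlayoffs: true},
		{Finish: 7, MadePlayoffs: false},
	}
	s := aggregate(runs)
	fmt.Printf("runs=%d avg finish=%.2f championships=%d playoff apps=%d\n",
		s.Runs, s.AvgFinish, s.Championships, s.PlayoffApps)
	// prints "runs=3 avg finish=4.00 championships=1 playoff apps=2"
}
```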
Summary
Adds a way to evaluate a bot through the real Go engine instead of a parallel Python harness, so eval logic can't drift from what runs week to week.
- `make evaluate-bot BOT=bots/nfl2025/<bot>.py YEAR=2025 RUNS=n` builds the `py_grpc_server` image, then drafts and replays a full historical season (draft → weekly waivers + scoring → playoffs) against a baseline field of containerized `standard-bot` opponents, and reports where the bot finished (avg finish, championships, playoff apps), aggregated over `RUNS`.
- Reuses the engine's own code paths (`runDraft`, `performWeeklyFantasyActions`, `updateWeeklyScores`, playoff/standings), with no re-implemented game logic. New code is just the season-replay loop + standings readout (`pkg/engine/SeasonReplayHandler.go`) and the orchestration command (`pkg/cmd/evaluate/main.go`).
- Copies `season.db` to a scratch year (2999, gitignored) so the tracked DB is never mutated.
- `evaluate-bot` resolves `DOCKER_HOST` from the active docker context (works on Docker Desktop and Linux/CI).
- `fetchLeagueSettings` now delegates to a shared `engine.BuildDefaultLeagueSettings`.

Engine bug fix (surfaced by the eval)
Running a real season exposed a latent bug:
`NewGameStateHandlerForDraft` ran a blanket `AutoMigrate` that reconciled the Python-built `players` table (via `Bot.Players`). GORM's SQLite migrator first choked on its dangling `FOREIGN KEY` clause, then (FK removed) rebuilt `VARCHAR` → `text` and silently NULLed every column's data (e.g. `rank`), so bots read a rankless pool and crashed.
Fix: drop the dangling FK + unused relationships from `blitz_env/models.py` (the `bots` table doesn't exist when `build-season` materializes `players`), rebuild `season.db`, and create only the engine-owned league-state tables instead of migrating the prebuilt one. Row writes (draft assignments) still go through the Player model. A regression test against a real `season.db` copy fails without the fix (rank nulled) and passes with it.

Test plan
- `go test ./pkg/engine/...` (incl. a Docker-free 17-week season replay with playoffs, and the players-corruption regression test).
- `go build ./pkg/cmd/... ./pkg/engine/...`
- `python3 -m pytest tests/` (13 passed, incl. the shipped-`season.db` check on the rebuilt FK-free DB).
- `make evaluate-bot` run completed a full containerized season on 2025 data.

🤖 Generated with Claude Code