feat: add SQLite persistent storage for benchmark results (#26)#34
Conversation
Code Review — PR #34 (SQLite persistent storage)Thanks for the contribution! The overall design is clean and the backward-compatibility approach (JSON/CSV unchanged, SQLite as opt-in) is exactly right. I found 3 bugs that will cause crashes in production and 2 lower-priority issues that need addressing before merge. 🔴 Bug 1 — Duplicate
|
| # | Severity | File | Issue |
|---|---|---|---|
| 1 | 🔴 High | utils/storage.py |
Duplicate results rows corrupt get_run() on repeated run_id |
| 2 | 🔴 High | evaluation/logger.py |
IndexError when rows is empty in _append_csv_summary |
| 3 | 🔴 High | evaluation/logger.py |
Storage() connections never closed → file handle leak |
| 4 | 🟡 Medium | utils/storage.py |
None values in metric arrays break arithmetic downstream |
| 5 | 🟡 Low | tests/test_pipeline.py |
Integration tests leak connections and use raw SQL for cleanup |
Once the three 🔴 issues are fixed I'm happy to approve. The design and test coverage are solid — these are all mechanical fixes.
- Bug 1: DELETE stale results before INSERT in save_run() to prevent duplicate rows on repeated run_id (fixes get_run() corruption) - Bug 2: Add empty-rows guard in _append_csv_summary() to prevent IndexError - Bug 3: Close Storage() connections in log_run() and list_runs() via try/finally to prevent file handle leaks on Windows - Issue 4: Document None-handling contract in get_run() docstring; filter None from compare_runs() recall arrays - Issue 5: Close Storage() in test cleanup to match the Storage unit tests Adds test_storage_save_run_idempotent to verify fix for Bug 1.
022423e to
6fcc8db
Compare
|
[@Neal006] Thanks for the thorough review! All 5 issues have been addressed: 🔴 Bug 1 — Added 🟡 Issue 4 — Added docstring note in All 30 tests pass (23 existing + 7 new). |
Code Review — Approved with Minor NotesHey @Sugaria0427, great work on this PR! The implementation is solid and correct — the schema matches the spec in issue #26 exactly, it's backward compatible, the idempotency handling is done right, and the pre-existing I verified that nothing in the codebase reads the Minor issues to fix before this can land
None of these are blocking except the merge conflict. Once you resolve that and push, this is ready to go. Thanks again for the contribution — this is exactly what the project needed! |
Replaces flat JSON/CSV log storage with a queryable SQLite database while maintaining full backward compatibility. New files: - utils/storage.py — Storage class wrapping Python stdlib sqlite3. Schema: runs (run_id, timestamp, config_json) + results (run_id FK, backend, turn, metric, value). API: save_run, get_run, list_runs, compare_runs. - utils/migrate_legacy_logs.py — one-shot idempotent migration script that imports existing experiment_logs/*.json into SQLite. Modified files: - evaluation/logger.py — log_run() now calls Storage().save_run() alongside existing JSON+CSV writes. list_runs() queries SQLite first, falls back to filesystem scan. Fixes pre-existing has_llm_eval bug in _append_csv_summary. - tests/test_pipeline.py — 6 new tests (4 Storage CRUD + 2 logger integration). - CHANGELOG.md — documented the new feature. - .gitignore — added experiment_logs/memorylens.db. Closes Neal006#26
- Bug 1: DELETE stale results before INSERT in save_run() to prevent duplicate rows on repeated run_id (fixes get_run() corruption) - Bug 2: Add empty-rows guard in _append_csv_summary() to prevent IndexError - Bug 3: Close Storage() connections in log_run() and list_runs() via try/finally to prevent file handle leaks on Windows - Issue 4: Document None-handling contract in get_run() docstring; filter None from compare_runs() recall arrays - Issue 5: Close Storage() in test cleanup to match the Storage unit tests Adds test_storage_save_run_idempotent to verify fix for Bug 1.
…, comment - Resolve merge conflict in CHANGELOG.md (rebase on upstream/main) - Remove dead sys.exit branch in migrate_legacy_logs.py (unreachable) - Move import warnings to top of evaluation/logger.py (consistent style) - Add clarifying comment in list_runs() about migration bypass
6fcc8db to
61f98d7
Compare
|
All 4 minor issues are now fixed and the rebase is complete:
All 30 tests still pass. |
Merged! Great work @Sugaria0427You addressed every issue from the review — This is a genuinely well-crafted contribution:
Really appreciated the care you put into this. Welcome to the contributor list! |
|
Thanks @Neal006 ! It was a pleasure working on this — the codebase is well-structured and your review feedback was very thorough. |
Summary
Replace flat JSON/CSV log storage with a queryable SQLite database for benchmark results, while maintaining full backward compatibility.
Related issue
Closes #26
Type of change
How was this tested?
Checklist