
chore(nlu): track benchmark_solver PNGs and CSV for review#9

Open
arushi-jain-27 wants to merge 5 commits into pr1/nlu-interfacing-minimal-mazegen from pr/benchmark-solver-artifacts

Conversation

@arushi-jain-27
Collaborator

@helenlu66 @pranavguru: This branch contains the mazes validated with my BFS solver.
It looks like 212/214 mazes pass validation. The 2 failing mazes are the corridor mazes in M6, where walls block the path to the goal.
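
For reviewers who want a mental model of what "passes validation" means here, below is a minimal sketch of a BFS reachability check. It is not the solver from this branch: the function name `is_solvable`, the list-of-strings grid encoding, and the `#` wall character are all assumptions made for illustration.

```python
from collections import deque


def is_solvable(grid, start, goal):
    """Return True if `goal` is reachable from `start` through open cells.

    Hypothetical encoding: `grid` is a list of equal-length strings,
    '#' marks a wall, and any other character is walkable.
    `start` and `goal` are (row, col) tuples.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    seen = {start}
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return True
        # Explore the four orthogonal neighbours.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and grid[nr][nc] != "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


# A one-row corridor with a clear path, and one where a wall cuts it off.
open_corridor = ["S..G"]
blocked_corridor = ["S.#G"]
print(is_solvable(open_corridor, (0, 0), (0, 3)))
print(is_solvable(blocked_corridor, (0, 0), (0, 3)))
```

Under this sketch, a corridor maze whose wall sits across the only route to the goal fails the check, which matches the failure mode described for the two M6 mazes.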

arushi-jain-27 force-pushed the pr/benchmark-solver-artifacts branch from 69b652b to 9ef7aa3 on May 5, 2026 05:44
arushi-jain-27 and others added 2 commits May 6, 2026 04:47
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
arushi-jain-27 force-pushed the pr/benchmark-solver-artifacts branch from b7fb2fb to 337e043 on May 6, 2026 04:47
Drop benchmark_mazes_metadata.csv from version control; smoke still writes it
locally. Ignore *.csv under benchmark_solver/.

Co-authored-by: Cursor <cursoragent@cursor.com>
arushi-jain-27 force-pushed the pr/benchmark-solver-artifacts branch from 337e043 to 30fc5b7 on May 6, 2026 04:50
arushi-jain-27 and others added 2 commits May 6, 2026 04:51
Version benchmark_mazes_metadata.csv under smoke_tests/results/benchmark_solver/;
stop ignoring *.csv there so the whole benchmark_solver output set is committed.

Co-authored-by: Cursor <cursoragent@cursor.com>
…derer

Re-run smoke_benchmark_mazes.py on pr1-aligned code so committed PNGs and CSV
match local renders after rendering/coordinate fixes (be38a58 and stack).

Co-authored-by: Cursor <cursoragent@cursor.com>
@seanrivera seanrivera assigned seanrivera and unassigned seanrivera May 9, 2026