fix(cli): 0.6.1 dogfood papercuts batch 2 (eval view, hub footer, check hint) #756
…ck hint)

Three contained dogfood papercuts:

- `bench eval view <job-dir>` rendered a blank "No trajectory files found" when given a job directory (the natural value from `eval create`'s "Artifacts:" line) because `render_rollout` never looked below the top level. It now indexes the rollout subdirectories with a "drill into one" pointer.
- `bench hub env list` now prints a footer (`Showing N of M…`) with how to refine (`--search`/`--owner`/`--limit`/`--json`). A 20-row page of a 1270-environment catalog previously read as "the provider only has 20".
- `bench tasks check --level publication-grade` verifier-missing errors now carry a remediation hint (author `verifier/verifier.md`; note `tasks migrate` does not generate it), so the migrate→publication-grade path is no longer a dead-end.

Adds a regression test for the eval-view job-dir index. Full suite: 4070 passed.

Deferred (needs a verifier-model design decision, not a quick patch): the scaffold no-op verifier. `verifier.md` wires the `./test.sh` script strategy while a separate `test_outputs.py` is orphaned; connecting them means either running pytest in the task's own sandbox (needs pytest in its Dockerfile) or switching `verifier.md` to a pytest strategy.
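For readers following the eval-view change, here is a minimal sketch of indexing a job directory's rollout subdirectories. It is not the PR's actual `render_rollout` code: the helper names `find_rollout_dirs` and `render_job_index` are hypothetical, the HTML wording is invented for illustration, and only the `turn*.txt` pattern named in the PR overview is checked (ACP trajectory files are left out because their naming is not shown here).

```python
import html
from pathlib import Path


def find_rollout_dirs(job_dir: Path) -> list[Path]:
    """Return child directories holding at least one turn*.txt file (hypothetical helper)."""
    return sorted(
        child
        for child in job_dir.iterdir()
        if child.is_dir() and any(child.glob("turn*.txt"))
    )


def render_job_index(job_dir: Path) -> str:
    """Render an HTML list of rollouts, falling back to the empty-state message."""
    rollouts = find_rollout_dirs(job_dir)
    if not rollouts:
        return "<p>No trajectory files found</p>"
    # Escape directory names before embedding them in HTML.
    items = "".join(f"<li>{html.escape(r.name)}</li>" for r in rollouts)
    return f"<p>{len(rollouts)} rollout(s) found:</p><ul>{items}</ul>"
```

Looking exactly one level down is the point of the fix as described: a job directory has no trajectory files of its own, but its children do.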
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6295c8ef64
    f"View one with <code>bench eval view {html.escape(rollout_dir.name)}/"
    f"<rollout></code>:</p><ul>{items}</ul>"
Preserve the parent job path in the drill-in hint
When the user passes the job directory printed by `eval create` (for example `jobs/<job>`), this hint interpolates only `rollout_dir.name`, so it suggests `bench eval view <job>/<rollout>` instead of `bench eval view jobs/<job>/<rollout>`. Run from the same CWD as the original command, the suggested command points at a nonexistent path, so the new index leads users to a broken next step. Construct the hint from the original `rollout_dir` path rather than just its basename.
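A minimal sketch of the suggested fix, assuming `rollout_dir` is a `pathlib.Path` holding the path as the user typed it. The function name `drill_in_hint` is hypothetical, and escaping the `<rollout>` placeholder as `&lt;rollout&gt;` is a choice made for this sketch, not something taken from the diff.

```python
import html
from pathlib import Path


def drill_in_hint(rollout_dir: Path) -> str:
    """Build the drill-in hint from the full path the user passed, not its basename."""
    # as_posix() keeps parent components such as "jobs/" and uses forward slashes.
    shown = html.escape(rollout_dir.as_posix())
    return f"View one with <code>bench eval view {shown}/&lt;rollout&gt;</code>"


# With only rollout_dir.name the hint would read "my-job/<rollout>";
# keeping the full path preserves the "jobs/" prefix.
print(drill_in_hint(Path("jobs/my-job")))
```

Because the path is kept relative, the hint stays copy-pasteable from the CWD in which the user ran the original command.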
Bugbot couldn't run: usage limit reached. Bugbot is counted against Cursor usage for this user or team, and this run hit a usage or spend limit. A user or team admin can review and increase usage limits in the Cursor dashboard. (requestId: serverGenReqId_e0fc55a7-d812-4fab-80f6-2c616e2316dd)
Second batch of 0.6.0 dogfood fast-follow papercuts (targets `main`).

- `eval view <job-dir>` no longer shows a blank "No trajectory files found": `render_rollout` now indexes the job dir's rollout subdirectories so you know to drill in. (Regression test added.)
- `hub env list` prints a `Showing N of M…` footer with how to refine (`--search`/`--owner`/`--limit`/`--json`). A 20-row page of a 1270-env catalog previously read as "the provider only has 20".
- `tasks check --level publication-grade` verifier-missing errors now include a remediation hint (author `verifier/verifier.md`; `tasks migrate` doesn't generate it) instead of a dead-end.

Full suite 4070 passed; ruff/format/ty clean.
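The footer behaviour can be sketched as follows. This is an illustration only: `list_footer` is a hypothetical helper, the exact footer wording is assumed, and the total is treated as optional because the PR overview says it is shown only when the API returns it. The four flags are the ones named in this PR.

```python
from typing import Optional


def list_footer(shown: int, total: Optional[int] = None) -> str:
    """Format a 'Showing N of M' footer; omit the total when it is unknown."""
    count = f"Showing {shown} of {total}" if total is not None else f"Showing {shown}"
    return f"{count}. Refine with --search, --owner, --limit, or --json."


print(list_footer(20, 1270))
print(list_footer(20))
```

Stating the total next to the page size is what keeps a 20-row page from reading as the whole catalog.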
Deferred — the scaffold no-op verifier (needs your design call)
The highest-value papercut isn't a quick patch:
`verifier.md` wires `command: ./test.sh` (script strategy), `test.sh` writes `0.0`, and a separate `test_outputs.py` is orphaned. Connecting them is a verifier-model decision:

- `test.sh` runs `pytest test_outputs.py`: requires pytest in the task's own Dockerfile (a real constraint on every scaffolded task).
- Switch `verifier.md` to a pytest strategy on `test_outputs.py`: benchflow's harness runs pytest, no task-Dockerfile dependency (cleaner, if a pytest strategy exists).
- Remove `test_outputs.py`: author edits `test.sh` only (smallest change, removes the pytest option).

Which model do you want? I'll implement it once you pick.
Note
Low Risk
User-facing CLI messaging and HTML viewer behavior only; no changes to eval execution, auth, or data handling.
Overview
Three small CLI/UX fixes from 0.6.0 dogfood follow-ups.
- `bench eval view` on a job directory (typical path from `eval create` artifacts) no longer shows an empty "No trajectory files found" page. `render_rollout` scans child dirs for `turn*.txt` or ACP trajectory files and returns HTML that lists rollout names and how to open `bench eval view <job>/<rollout>`.
- `bench hub env list` adds a dim footer: Showing N (optionally of total when the API returns it) plus pointers to `--search`, `--owner`, `--limit`, and `--json`, so a short page is not mistaken for the full catalog.
- `bench tasks check --level publication-grade` expands missing `verifier/` and `verifier/verifier.md` errors with remediation text (author per docs; `bench tasks migrate` does not create the verifier package).

CHANGELOG documents these; a regression test covers job-dir indexing.

Reviewed by Cursor Bugbot for commit 6295c8e.