test(cli): fix CI-fragile help-parse blocking the 0.6.0 release gate #750

Merged
xdotli merged 1 commit into release/v0.6.0 from fix/ci-help-parse-ansi
Jun 13, 2026

xdotli commented Jun 13, 2026

What

Fixes the sole failure on #665's test gate (1 failed / 4112 passed) that's blocking the 0.6.0 release→main merge.

test_environment_group_is_hidden_but_still_resolves (in test_cli_hub_env.py) matched │-anchored command rows in bench --help raw output without stripping ANSI. On CI, Rich emits color codes (FORCE_COLOR), so the ^\s*│ anchor matched nothing → rows == set() → assert 'sandbox' in set() failed. It passed locally only because local runs emitted no ANSI codes.
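The failure mode can be reproduced without Rich. The sketch below assumes a row regex of the form described above and a made-up help table; the sample rows, the color codes, and the `ROW` name are illustrative and not taken from test_cli_hub_env.py.

```python
import re

# Hypothetical row regex in the spirit of the old test: a line that starts
# with optional whitespace, then the box-drawing bar, then the command name.
ROW = re.compile(r"^\s*│\s*(\S+)", re.MULTILINE)

# Made-up help rows as they might look without color.
plain_help = "│ sandbox   Manage sandboxes │\n│ hub       Browse the hub   │"

# The same rows with SGR color codes wrapped around the bar and the name
# (assumed styling; the exact codes Rich emits may differ).
colored_help = (
    "\x1b[2m│\x1b[0m \x1b[1;36msandbox\x1b[0m   Manage sandboxes \x1b[2m│\x1b[0m\n"
    "\x1b[2m│\x1b[0m \x1b[1;36mhub\x1b[0m       Browse the hub   \x1b[2m│\x1b[0m"
)

plain_rows = set(ROW.findall(plain_help))
colored_rows = set(ROW.findall(colored_help))

# Without color, both command names are found.
print(sorted(plain_rows))
# With color, each line starts with an ESC byte, which \s does not match,
# so the anchored pattern finds nothing and the set is empty.
print(colored_rows)  # set()
```

The empty set in the colored case is what turns `assert 'sandbox' in rows` into a failure even though the command is registered correctly.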

Fix

Assert against the Click command registry (typer.main.get_command(app) + .hidden) instead of a regex over rendered help — authoritative and immune to ANSI/locale rendering differences. It still checks the exact contract: sandbox visible, environment hidden-but-registered, and bench environment create resolves.
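A minimal sketch of the registry-based check, using a toy Typer app rather than the real bench CLI. The app layout, the command bodies, and the `visible` name are assumptions for illustration; only `typer.main.get_command` and the `hidden` attribute come from the description above.

```python
import typer
from typer.main import get_command
from typer.testing import CliRunner

# Toy stand-in for the real CLI: one visible group and one hidden group.
app = typer.Typer()
sandbox_app = typer.Typer()
environment_app = typer.Typer()


@sandbox_app.command("create")
def sandbox_create() -> None:
    typer.echo("sandbox created")


@environment_app.command("create")
def environment_create() -> None:
    typer.echo("environment created")


app.add_typer(sandbox_app, name="sandbox")
# hidden=True keeps the group out of --help but leaves it registered.
app.add_typer(environment_app, name="environment", hidden=True)

# Convert the Typer app into its underlying Click group and read the
# registry directly instead of parsing rendered help text.
root = get_command(app)
visible = {name for name, cmd in root.commands.items() if not cmd.hidden}

assert "sandbox" in visible
assert "environment" not in visible
assert "environment" in root.commands  # hidden but still registered

# The hidden group must still resolve when invoked explicitly.
result = CliRunner().invoke(app, ["environment", "create"])
assert result.exit_code == 0
```

Because the check never looks at rendered output, terminal width, color settings, and box-drawing characters cannot affect it.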

Proof

  • Fixed test passes under FORCE_COLOR=1 (reproduces CI's rendering); the old regex-on-raw-output yields EMPTY SET under FORCE_COLOR=1 (exactly the CI failure).
  • The two other help-row parsers (test_cli_docs_drift, test_cli_adopt_aliases) already strip ANSI, so they were unaffected — this was the only fragile one.
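For comparison, stripping ANSI before parsing, as the second bullet says the other two tests do, could look like the sketch below. The `strip_ansi` and `visible_rows` helpers, both regexes, and the sample row are assumptions for illustration; the actual helpers in test_cli_docs_drift and test_cli_adopt_aliases are not shown in this PR.

```python
import re

# Matches CSI escape sequences such as color codes:
# ESC [, parameter bytes, intermediate bytes, then one final byte.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")

# Hypothetical row pattern: box-drawing bar followed by the command name.
ROW = re.compile(r"^\s*│\s*(\S+)", re.MULTILINE)


def strip_ansi(text: str) -> str:
    """Remove CSI escape sequences so only printable text remains."""
    return ANSI_CSI.sub("", text)


def visible_rows(help_text: str) -> set[str]:
    """Collect command names from help rows after removing color codes."""
    return set(ROW.findall(strip_ansi(help_text)))


# Made-up colored row; the exact codes Rich emits may differ.
colored = "\x1b[2m│\x1b[0m \x1b[1;36msandbox\x1b[0m   Manage sandboxes \x1b[2m│\x1b[0m"
print(visible_rows(colored))  # {'sandbox'}
```

This keeps a help-text parser working under FORCE_COLOR, but it still depends on the table layout, which is why the fix in this PR reads the command registry instead.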

Once merged, #665's gate re-runs and should pass.


Note

Low Risk
Test-only change with no production CLI behavior; aligns with an existing pattern in test_cli_docs_drift.py.

Overview
Fixes a CI-only failure in test_environment_group_is_hidden_but_still_resolves that was blocking the release gate.

The test used a │-anchored regex on raw bench --help text to decide which top-level commands are visible. On CI, Rich/ANSI color codes break that match, so the parsed command set was empty and sandbox assertions failed despite correct CLI behavior.

The change inspects the Typer/Click command tree via typer.main.get_command(app) and each subcommand’s hidden flag instead of parsing rendered help. The same contract is checked: sandbox is visible, environment is hidden but still registered, and bench environment create still resolves.

Reviewed by Cursor Bugbot for commit c81f926.

`test_environment_group_is_hidden_but_still_resolves` matched `│`-anchored rows
in `bench --help` raw output WITHOUT stripping ANSI. On CI, Rich emits color
codes (FORCE_COLOR), so the `^\s*│` anchor matched nothing → `rows == set()` →
`assert 'sandbox' in set()` failed. It passed locally only because there were no
ANSI codes. This was the sole failure on #665's `test` gate (1 failed / 4112
passed), blocking the 0.6.0 release.

Fix: assert against the Click command registry (`typer.main.get_command(app)` +
`.hidden`) instead of a regex over rendered help — authoritative and immune to
ANSI/locale rendering differences. Verified the fix passes under FORCE_COLOR=1
(which reproduces the CI failure with the old code).

The two other help-row parsers (test_cli_docs_drift, test_cli_adopt_aliases)
already strip ANSI, so they were unaffected.
xdotli merged commit 306cf29 into release/v0.6.0 Jun 13, 2026
1 of 2 checks passed
xdotli deleted the fix/ci-help-parse-ansi branch June 13, 2026 23:20

cursor Bot commented Jun 13, 2026

Bugbot couldn't run - usage limit reached

Bugbot is counted against Cursor usage for this user or team, and this run hit a usage or spend limit.

A user or team admin can review and increase usage limits in the Cursor dashboard.

(requestId: serverGenReqId_0303b32d-86f3-47c2-b4b8-88580716aafc)

bingran-you pushed a commit to bingran-you/benchflow that referenced this pull request Jun 14, 2026
benchflow-ai#750's test rewrite passed `ruff check` (lint) but I didn't run
`ruff format --check` — the CI `test` job runs format-check as a gating step and
failed on `Would reformat: tests/test_cli_hub_env.py`. This applies ruff format
(collapses the set comprehension to ruff's canonical form). No logic change.

Verified: ruff format --check + ruff check + ty + the test file all pass.