fix(task): port guard fixes from #651 — schema_version major gate, judge empty-inputs, network-mode contradiction #714

Merged: xdotli merged 1 commit into release/v0.6.0 from fix/v06-guard-ports on Jun 13, 2026
@Yiminnn (Collaborator) commented Jun 12, 2026
Ports four validation guards from PR #651 (main) onto release/v0.6.0. A code audit of 0.6.0-rc.6 on 2026-06-12 verified all four guards are absent on this branch; each was a real hole, not a refactor artifact.

What each guard closes

1. schema_version major-version gate (src/benchflow/task/config.py)

v0.6 TaskConfig.schema_version accepted any string ("banana", "99.0") and carried it through silently. Ported validate_schema_version + _SUPPORTED_SCHEMA_MAJORS = frozenset({1}): non-numeric versions and majors outside {1} are now a hard ValueError. Minor versions stay permissive; the "1.3" default and existing "1.0"/"1.x" tasks are unaffected.
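A minimal sketch of how such a major-version gate could look. Only the names `validate_schema_version` and `_SUPPORTED_SCHEMA_MAJORS` come from this PR; the body, the error wording, and the parsing approach are assumptions for illustration, not the merged code.

```python
# Sketch only: the body and messages are assumed, not copied from the PR.
_SUPPORTED_SCHEMA_MAJORS = frozenset({1})


def validate_schema_version(value: str) -> str:
    """Reject non-numeric majors and majors outside the supported set."""
    # Everything before the first dot is treated as the major version.
    major_text, _, _minor = value.partition(".")
    if not (major_text.isascii() and major_text.isdigit()):
        raise ValueError(f"schema_version {value!r} has a non-numeric major version")
    major = int(major_text)
    if major not in _SUPPORTED_SCHEMA_MAJORS:
        raise ValueError(
            f"schema_version {value!r} has unsupported major {major}; "
            f"supported majors: {sorted(_SUPPORTED_SCHEMA_MAJORS)}"
        )
    # The minor part is deliberately not inspected, so minors stay permissive.
    return value


def is_accepted(value: str) -> bool:
    """Helper for the demo below: True if the gate lets the value through."""
    try:
        validate_schema_version(value)
    except ValueError:
        return False
    return True


for candidate in ("1.0", "1.3", "99.0", "banana"):
    print(candidate, is_accepted(candidate))
```

Under these assumptions, `"1.0"` and `"1.3"` pass while `"99.0"` and `"banana"` are rejected, matching the behavior described above.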

2. Agent-judge empty-inputs runtime backstop (src/benchflow/task/verifier_core.py)

_collect_agent_judge_inputs had no emptiness check, unlike its _collect_ors_episode_inputs sibling. This is the silent-zero-evidence judge hole: an agent-judge strategy whose inputs resolved empty would invoke the judge with no evidence at all and still produce a graded reward — a score backed by nothing, indistinguishable from a real one. Now raises AgentJudgeInputError ("agent-judge strategy ... must declare inputs") before any judge call.
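The backstop can be illustrated with a small stand-alone sketch. `AgentJudgeInputError` is redefined locally as a stand-in, and `collect_agent_judge_inputs`, `grade`, and `fake_judge` are hypothetical helpers; the real `_collect_agent_judge_inputs` signature is not shown in this PR description, so treat the shapes below as assumptions.

```python
import asyncio


class AgentJudgeInputError(Exception):
    """Local stand-in for the error class exported from verifier_errors."""


async def collect_agent_judge_inputs(strategy_name: str, inputs: list[str]) -> list[str]:
    # Hypothetical collector: refuse to continue when there is no evidence.
    if not inputs:
        raise AgentJudgeInputError(
            f"agent-judge strategy {strategy_name!r} must declare inputs"
        )
    return list(inputs)


judge_calls: list[list[str]] = []


async def fake_judge(evidence: list[str]) -> float:
    # Hypothetical judge that records every invocation.
    judge_calls.append(evidence)
    return 1.0


async def grade(strategy_name: str, inputs: list[str]) -> float:
    # The emptiness check runs before the judge is ever invoked.
    evidence = await collect_agent_judge_inputs(strategy_name, inputs)
    return await fake_judge(evidence)


def run_empty_case() -> str:
    """Return the error message raised for an empty inputs list."""
    try:
        asyncio.run(grade("quality", []))
    except AgentJudgeInputError as exc:
        return str(exc)
    return ""
```

In this sketch, calling `run_empty_case()` yields the error message and leaves `judge_calls` empty, which is the point of the guard: no reward can be produced from zero evidence.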

3. verifier_document empty-list fix (src/benchflow/task/verifier_document.py)

The parse-time checks for agent-judge and ors-episode inputs used not isinstance(inputs, list) or not all(...) — but all([]) is True, so an empty list passed a check whose error message says "must be a non-empty list of strings". Added or not inputs to both branches, making the document-level validation match its own contract (and backstopping guard 2 at parse time).
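The vacuous-truth behavior of `all` is easy to reproduce in isolation. The two predicates below are hypothetical reductions of the check described above (the element test `isinstance(item, str)` is an assumption based on the "list of strings" error message), not the actual code in `verifier_document.py`.

```python
def is_invalid_before(inputs: object) -> bool:
    # Original shape of the check: an empty list slips through,
    # because all() over an empty iterable returns True.
    return not isinstance(inputs, list) or not all(
        isinstance(item, str) for item in inputs
    )


def is_invalid_after(inputs: object) -> bool:
    # Fixed shape: `not inputs` rejects the empty list explicitly.
    return (
        not isinstance(inputs, list)
        or not inputs
        or not all(isinstance(item, str) for item in inputs)
    )


print(all([]))                 # True
print(is_invalid_before([]))   # False: the empty list was accepted
print(is_invalid_after([]))    # True: the empty list is now rejected
```

Both predicates still agree on non-empty inputs, so the change only affects the empty-list case.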

4. Network-mode contradiction hard-error (src/benchflow/task/config.py)

SandboxConfig silently coerced an explicitly declared network_mode (e.g. allowlist + allowed_hosts) down to no-network whenever the deprecated allow_internet = false was also present — a silent, surprising network downgrade. Ported the model_fields_set-based reconciliation: an explicit contradiction is now a hard ValueError directing authors to drop the deprecated flag, while the legacy path (allow_internet = false with no explicit network_mode) still coerces to no-network as before. Reconciliation also re-runs _validate_network_policy_fields at the end so it can never emit a self-contradictory config.
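The reconciliation logic can be sketched without Pydantic by using a plain dict whose keys stand in for `model_fields_set` (the set of fields the author explicitly wrote). Everything here is an assumption for illustration: `reconcile_network_policy` is a hypothetical name, and the invariant inside this `_validate_network_policy_fields` stand-in (that `allowed_hosts` requires `allowlist` mode) is guessed, not taken from the PR.

```python
NO_NETWORK = "no-network"
ALLOWLIST = "allowlist"


def _validate_network_policy_fields(config: dict) -> None:
    # Assumed invariant: allowed_hosts only makes sense in allowlist mode.
    if config.get("allowed_hosts") and config.get("network_mode") != ALLOWLIST:
        raise ValueError("allowed_hosts requires network_mode = 'allowlist'")


def reconcile_network_policy(declared: dict) -> dict:
    """`declared` holds only the fields the author wrote explicitly."""
    config = dict(declared)
    fields_set = set(declared)  # stand-in for Pydantic's model_fields_set

    if declared.get("allow_internet") is False:
        explicit_mode = "network_mode" in fields_set
        if explicit_mode and declared["network_mode"] != NO_NETWORK:
            # Explicit contradiction: fail loudly instead of downgrading.
            raise ValueError(
                "allow_internet = false contradicts network_mode = "
                f"{declared['network_mode']!r}; drop the deprecated allow_internet flag"
            )
        # Legacy path: no explicit network_mode, so coerce as before.
        config["network_mode"] = NO_NETWORK

    # Re-validate so reconciliation never emits a contradictory config.
    _validate_network_policy_fields(config)
    return config


legacy = reconcile_network_policy({"allow_internet": False})
print(legacy["network_mode"])
```

The key design choice is distinguishing "the author wrote this field" from "the field has a default value"; only the former can contradict the deprecated flag.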

Source

Reference implementation: PR #651 (src/benchflow/task/config.py, src/benchflow/task/verifier.py, src/benchflow/task/verifier_document.py), adapted to the v0.6 module layout (verifier_core behind the benchflow.task.verifier façade; AgentJudgeInputError from verifier_errors).

Tests

  • New tests/test_v06_guard_ports.py (15 tests) mirroring the PR #651 tests (test_network_policy_reconcile.py, test_agent_judge_inputs.py): contradiction raises, legacy back-compat coercion preserved, post-reconcile invariant, schema major gate (reject 99.0/banana, accept 1.0/default), document-level empty-inputs rejection for both strategy types, and the async runtime backstop on _collect_agent_judge_inputs.
  • Directly-touched modules green: test_task_config.py, test_verifier_document.py, test_internet_policy.py, plus the verifier strategy/output/judge suites (126 passed).
  • Full unit suite (pytest tests/ --ignore=tests/integration): 3834 passed, 49 skipped, 0 failures — no existing fixture trips any guard, so no compatibility accommodations were needed.
  • ruff format / ruff check clean on all changed files.

🤖 Generated with Claude Code

@xdotli xdotli merged commit a8337c5 into release/v0.6.0 Jun 13, 2026
1 check passed
@xdotli xdotli deleted the fix/v06-guard-ports branch June 13, 2026 06:14