fix(task): port guard fixes from #651 — schema_version major gate, judge empty-inputs, network-mode contradiction by Yiminnn · Pull Request #714 · benchflow-ai/benchflow

Yiminnn · 2026-06-12T23:58:07Z

Ports four validation guards from PR #651 (main) onto release/v0.6.0. A code audit of 0.6.0-rc.6 on 2026-06-12 verified all four guards are absent on this branch; each was a real hole, not a refactor artifact.

What each guard closes

1. `schema_version` major-version gate (`src/benchflow/task/config.py`)

v0.6 TaskConfig.schema_version accepted any string ("banana", "99.0") and carried it through silently. Ported validate_schema_version + _SUPPORTED_SCHEMA_MAJORS = frozenset({1}): non-numeric versions and majors outside {1} are now a hard ValueError. Minor versions stay permissive; the "1.3" default and existing "1.0"/"1.x" tasks are unaffected.
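A minimal sketch of what a major-version gate of this kind could look like, assuming the version is a dotted string whose first component is the major. Only the names `validate_schema_version` and `_SUPPORTED_SCHEMA_MAJORS` come from the description above; the body and the error messages are illustrative assumptions, not the ported implementation.

```python
# Illustrative sketch only; the real validator lives in src/benchflow/task/config.py.
_SUPPORTED_SCHEMA_MAJORS = frozenset({1})


def validate_schema_version(value: str) -> str:
    """Reject non-numeric or unsupported majors; leave the minor part alone."""
    major_text, _, _minor = value.partition(".")
    # isascii() guards against Unicode digits that isdigit() accepts but int() rejects.
    if not (major_text.isascii() and major_text.isdigit()):
        raise ValueError(f"schema_version {value!r} has a non-numeric major version")
    major = int(major_text)
    if major not in _SUPPORTED_SCHEMA_MAJORS:
        raise ValueError(
            f"schema_version {value!r} has unsupported major {major}; "
            f"supported majors: {sorted(_SUPPORTED_SCHEMA_MAJORS)}"
        )
    # Minor versions stay permissive, so the value is returned unchanged.
    return value
```

Under these assumptions `"1.0"` and `"1.3"` pass through unchanged, while `"banana"` and `"99.0"` raise `ValueError`.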

2. Agent-judge empty-inputs runtime backstop (`src/benchflow/task/verifier_core.py`)

_collect_agent_judge_inputs had no emptiness check, unlike its _collect_ors_episode_inputs sibling. This is the silent-zero-evidence judge hole: an agent-judge strategy whose inputs resolved empty would invoke the judge with no evidence at all and still produce a graded reward — a score backed by nothing, indistinguishable from a real one. Now raises AgentJudgeInputError ("agent-judge strategy ... must declare inputs") before any judge call.
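The shape of such a backstop can be sketched as follows. The function name, its parameters, and the exception's base class are hypothetical stand-ins; the real `_collect_agent_judge_inputs` signature is not shown in this PR description. The point is only that the emptiness check runs before any judge call.

```python
import asyncio


class AgentJudgeInputError(ValueError):
    """Stand-in for the error type exported from verifier_errors (base class assumed)."""


async def collect_agent_judge_inputs(strategy_name: str, inputs: list[str]) -> list[str]:
    """Hypothetical collector: refuse to continue when no evidence was resolved."""
    if not inputs:
        # Fail before the judge is invoked, so no reward is produced from nothing.
        raise AgentJudgeInputError(
            f"agent-judge strategy {strategy_name!r} must declare inputs"
        )
    return list(inputs)
```

Calling `asyncio.run(collect_agent_judge_inputs("judge", []))` in this sketch raises `AgentJudgeInputError` instead of returning an empty evidence list.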

3. `verifier_document` empty-list fix (`src/benchflow/task/verifier_document.py`)

The parse-time checks for agent-judge and ors-episode inputs used not isinstance(inputs, list) or not all(...) — but all([]) is True, so an empty list passed a check whose error message says "must be a non-empty list of strings". Added or not inputs to both branches, making the document-level validation match its own contract (and backstopping guard 2 at parse time).
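The vacuous-truth behaviour of `all([])` is easy to reproduce in isolation. In the sketch below the helper names are hypothetical, and the per-element `isinstance(item, str)` test is an assumption inferred from the "list of strings" error message, since the description elides the body of `all(...)`.

```python
def is_rejected_before(inputs: object) -> bool:
    """Old check: an empty list slips through because all([]) is True."""
    return not isinstance(inputs, list) or not all(
        isinstance(item, str) for item in inputs
    )


def is_rejected_after(inputs: object) -> bool:
    """Fixed check: the added `not inputs` clause rejects the empty list."""
    return (
        not isinstance(inputs, list)
        or not inputs
        or not all(isinstance(item, str) for item in inputs)
    )
```

With these helpers, `is_rejected_before([])` is `False` (the hole) and `is_rejected_after([])` is `True`, while a non-empty list of strings is accepted by both.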

4. Network-mode contradiction hard-error (`src/benchflow/task/config.py`)

SandboxConfig silently coerced an explicitly declared network_mode (e.g. allowlist + allowed_hosts) down to no-network whenever the deprecated allow_internet = false was also present — a silent, surprising network downgrade. Ported the model_fields_set-based reconciliation: an explicit contradiction is now a hard ValueError directing authors to drop the deprecated flag, while the legacy path (allow_internet = false with no explicit network_mode) still coerces to no-network as before. Reconciliation also re-runs _validate_network_policy_fields at the end so it can never emit a self-contradictory config.
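The reconciliation rule can be sketched without Pydantic by treating the keys present in a plain dict as the explicitly set fields, which is the role `model_fields_set` plays on the real model. Everything below is a hypothetical illustration: the function bodies, the `"unrestricted"` default mode, and the specific policy checks are assumptions, not the code in `src/benchflow/task/config.py`.

```python
NO_NETWORK = "no-network"
DEFAULT_MODE = "unrestricted"  # hypothetical placeholder for the model's default


def _validate_network_policy_fields(mode: str, allowed_hosts: list[str]) -> None:
    """Assumed invariants: hosts are required for allowlist and forbidden otherwise."""
    if mode == "allowlist" and not allowed_hosts:
        raise ValueError("network_mode 'allowlist' requires allowed_hosts")
    if mode != "allowlist" and allowed_hosts:
        raise ValueError(f"allowed_hosts is not valid with network_mode {mode!r}")


def reconcile_network_policy(declared: dict) -> dict:
    """Keys present in `declared` stand in for model_fields_set."""
    mode = declared.get("network_mode", DEFAULT_MODE)
    hosts = list(declared.get("allowed_hosts", []))
    if declared.get("allow_internet") is False:
        if "network_mode" in declared and mode != NO_NETWORK:
            # Explicit contradiction: fail loudly instead of downgrading silently.
            raise ValueError(
                f"network_mode {mode!r} contradicts deprecated allow_internet = false; "
                "drop allow_internet"
            )
        mode = NO_NETWORK  # legacy path: coerce as before
    # Re-validate last so the reconciled result is never self-contradictory.
    _validate_network_policy_fields(mode, hosts)
    return {"network_mode": mode, "allowed_hosts": hosts}
```

In this sketch `{"allow_internet": False}` alone still reconciles to `no-network`, while adding an explicit `network_mode = "allowlist"` alongside it raises `ValueError`.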

Source

Reference implementation: PR #651 (src/benchflow/task/config.py, src/benchflow/task/verifier.py, src/benchflow/task/verifier_document.py), adapted to the v0.6 module layout (verifier_core behind the benchflow.task.verifier façade; AgentJudgeInputError from verifier_errors).

Tests

New tests/test_v06_guard_ports.py (15 tests) mirroring the #651 tests (test_network_policy_reconcile.py, test_agent_judge_inputs.py): contradiction raises, legacy back-compat coercion preserved, post-reconcile invariant, schema major gate (reject 99.0/banana, accept 1.0/default), document-level empty-inputs rejection for both strategy types, and the async runtime backstop on _collect_agent_judge_inputs.
Directly-touched modules green: test_task_config.py, test_verifier_document.py, test_internet_policy.py, plus the verifier strategy/output/judge suites (126 passed).
Full unit suite (pytest tests/ --ignore=tests/integration): 3834 passed, 49 skipped, 0 failures — no existing fixture trips any guard, so no compatibility accommodations were needed.
ruff format / ruff check clean on all changed files.

🤖 Generated with Claude Code

fix(task): port #651 guard fixes — schema_version major gate, judge e…

289258a

This was referenced Jun 13, 2026

feat(tasks)!: convert SkillsBench to native task.md packages benchflow-ai/skillsbench#929

Merged

Superseded by #760: task.md v0.6.2 cutover #651

Closed

xdotli merged commit a8337c5 into release/v0.6.0 Jun 13, 2026
1 check passed

xdotli deleted the fix/v06-guard-ports branch June 13, 2026 06:14

xdotli merged 1 commit into release/v0.6.0 from fix/v06-guard-ports
