fix: hard guards against experiment fabrication cascade by r3n3x · Pull Request #263 · aiming-lab/AutoResearchClaw

r3n3x · 2026-05-18T08:53:14Z

Problem

The pipeline can proceed from Stage 12 (EXPERIMENT_RUN) through Stage 20 (QUALITY_GATE) and export a fabricated paper even when the experiment produced zero real metrics. This is the exact failure mode documented in #165.

In the reported case, Stage 12 completed in 3.46 seconds with no real data, but:

Stage 12 unconditionally returned StageStatus.DONE
QUALITY_GATE was in NONCRITICAL_STAGES (failures are warnings, not blocks)
The quality gate scored the fabricated paper based solely on LLM judgment, without checking whether any real experiment data existed

Changes

1. Stage 12: Experiment output validation (`_execution.py`)

Added three hard guards that return StageStatus.FAILED instead of unconditional DONE:

Failed + zero metrics: If the experiment returned a non-zero exit code and produced no real float metrics → FAILED
Stdout failure signals: If stdout contains "FAIL:", "NaN/divergence", or "Traceback" and the run produced zero metrics → FAILED
Suspiciously fast completion: If the experiment "completed" in <30s with zero metrics (the budget is typically 7200s) → FAILED, treated as a misclassified crash
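For reviewers, the three guards can be sketched roughly as follows. This is an illustrative outline, not the code in `_execution.py`: the `ExperimentResult` container, the `validate_experiment` and `count_real_metrics` helpers, and the rule that a "real" metric is a finite non-boolean number are all assumptions made for the example. Only `StageStatus`, the three signal strings, and the 30s threshold come from the description above.

```python
import math
from dataclasses import dataclass, field
from enum import Enum


class StageStatus(Enum):
    DONE = "done"
    FAILED = "failed"


@dataclass
class ExperimentResult:
    # Hypothetical container; the real result object may look different.
    exit_code: int
    stdout: str
    duration_sec: float
    metrics: dict = field(default_factory=dict)


FAILURE_SIGNALS = ("FAIL:", "NaN/divergence", "Traceback")
MIN_PLAUSIBLE_DURATION_SEC = 30.0  # threshold quoted in the PR description


def count_real_metrics(metrics):
    """Count finite numeric values (assumed definition of a 'real' metric)."""
    return sum(
        1
        for value in metrics.values()
        if isinstance(value, (int, float))
        and not isinstance(value, bool)
        and math.isfinite(value)
    )


def validate_experiment(result):
    """Apply the three zero-metric guards in order."""
    if count_real_metrics(result.metrics) > 0:
        return StageStatus.DONE  # every guard requires zero real metrics
    if result.exit_code != 0:
        return StageStatus.FAILED  # guard 1: failed run, no metrics
    if any(signal in result.stdout for signal in FAILURE_SIGNALS):
        return StageStatus.FAILED  # guard 2: failure signal in stdout
    if result.duration_sec < MIN_PLAUSIBLE_DURATION_SEC:
        return StageStatus.FAILED  # guard 3: too fast to be a real run
    return StageStatus.DONE
```

Under these assumptions, the run from #165 (3.46 seconds, clean exit, no metrics) is caught by the third guard, while a run that reports at least one finite metric passes through unchanged.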

2. Stage 20: Quality gate enforcement (`_review_publish.py`)

Added a hard block when VerifiedRegistry has zero experiment values AND the experiment summary reports failure. Even if the LLM assigns a passing quality score, a paper with no grounded data must not be exported.
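The intended precedence (the hard block wins over the LLM score) might look like the sketch below. The function name, its parameters, and the `pass_threshold` default are hypothetical and chosen only for illustration; the actual check in `_review_publish.py` reads from `VerifiedRegistry` and the experiment summary.

```python
def quality_gate_allows_export(verified_value_count, experiment_failed,
                               llm_score, pass_threshold=7.0):
    """Return True only if the paper may be exported.

    verified_value_count: number of experiment values in the registry (assumed input)
    experiment_failed:    whether the experiment summary reports failure
    llm_score:            quality score assigned by the LLM reviewer
    pass_threshold:       hypothetical passing score, not taken from the PR
    """
    # Hard block: no grounded data and a failed experiment can never pass,
    # regardless of how high the LLM score is.
    if verified_value_count == 0 and experiment_failed:
        return False
    # Otherwise fall back to the score-based decision.
    return llm_score >= pass_threshold
```

Note that the block requires both conditions, matching the "AND" in the description: a failed experiment that still produced verified values is left to the score-based path.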

3. NONCRITICAL_STAGES correction (`stages.py`)

Removed QUALITY_GATE from NONCRITICAL_STAGES. A skipped quality gate allows fabricated results to be exported — it must be treated as a gate, not a suggestion.
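As a rough illustration of why membership in this set matters, consider a runner that only halts on failures of critical stages. The `Stage` enum values and the `should_halt` helper are assumptions for the example and may not match `stages.py`; only the stage names and the contents of the set after this change come from the PR.

```python
from enum import Enum, auto


class Stage(Enum):
    # Member values are placeholders, not the real stage numbers.
    EXPERIMENT_RUN = auto()
    QUALITY_GATE = auto()
    KNOWLEDGE_ARCHIVE = auto()


# After this PR, QUALITY_GATE is no longer listed here.
NONCRITICAL_STAGES = frozenset({Stage.KNOWLEDGE_ARCHIVE})


def should_halt(stage, failed):
    """A failure halts the pipeline unless the stage is noncritical."""
    return failed and stage not in NONCRITICAL_STAGES
```

With QUALITY_GATE in the set, `should_halt(Stage.QUALITY_GATE, True)` would have been `False`, which is the behavior that let the fabricated paper through.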

4. Anomaly detection (integrated into Stage 12)

The existing P1 warning for <5s completion is preserved. The new <30s guard with zero metrics catches the broader class of misclassified crashes.
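One way to picture how the two thresholds relate is the sketch below. The `classify_duration` helper, its return labels, and the logger name are hypothetical; it also assumes the <5s warning fires regardless of metric count, which the description does not state.

```python
import logging

logger = logging.getLogger("stage12_sketch")  # hypothetical logger name

P1_WARN_SEC = 5.0       # existing warning threshold
CRASH_GUARD_SEC = 30.0  # new hard-guard threshold


def classify_duration(duration_sec, real_metric_count):
    """Return 'failed', 'warn', or 'ok' for a completed run."""
    # New guard: fast completion with no metrics is a misclassified crash.
    if real_metric_count == 0 and duration_sec < CRASH_GUARD_SEC:
        return "failed"
    # Existing behavior (assumed): very fast runs are flagged but not blocked.
    if duration_sec < P1_WARN_SEC:
        logger.warning("experiment finished in %.2fs", duration_sec)
        return "warn"
    return "ok"
```

In this reading, a 12-second run with no metrics is not covered by the old warning but is caught by the new guard, which is the "broader class" referred to above.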

Testing

8 new tests in tests/test_fabrication_guards.py:

TestNoncriticalStages (2): Verify QUALITY_GATE is critical, KNOWLEDGE_ARCHIVE still noncritical
TestStage12HardGuards (4): Failed+zero metrics, suspicious speed, success with metrics, stdout failure signals
TestStage20HardGuard (1): Quality gate blocks zero verified values
TestStage12DurationAnomaly (1): Code structure verification

All 754 existing tests continue to pass, aside from 1 pre-existing async test failure that is unrelated to this PR.

Related Issues

Closes #165 (Cascading Pipeline Failure)
Related to #238 (Hallucinated references)

Breaking Changes

None for correct pipelines. Pipelines that previously "succeeded" with fabricated experiment data will now correctly fail at Stage 12 or Stage 20, which is the intended behavior.

Four targeted fixes to prevent the pipeline from proceeding when Stage 12 completes without real experiment data (issue aiming-lab#165):

1. Stage 12 validation (_execution.py):
- Return FAILED when experiment has zero real metrics and status=failed
- Return FAILED on stdout failure signals with zero metrics
- Return FAILED on suspiciously fast completion (<30s) with zero metrics
- These replace the unconditional DONE return that let fabricated papers through the quality gate

2. Stage 20 quality gate enforcement (_review_publish.py):
- Block pipeline when VerifiedRegistry has zero values AND experiment failed, regardless of LLM-assigned quality score
- Prevents the exact failure mode: 3.46s experiment -> fabricated paper -> quality gate passes

3. NONCRITICAL_STAGES correction (stages.py):
- Remove QUALITY_GATE from noncritical stages
- A skipped quality gate allows fabricated results to export; it must be treated as a gate, not a suggestion

4. Anomaly detection (integrated into Stage 12 guard):
- Experiments completing in <30s with zero metrics are classified as misclassified crashes rather than successful runs

Tests: 8 new tests covering all guard paths.

r3n3x wants to merge 1 commit into aiming-lab:main from r3n3x:fix/hard-guards-against-fabrication
