Description
Run the same benchmark twice, once with [memory] enabled=true and once with [memory] enabled=false, and produce a delta comparison showing how much Zeph's memory contributes to the benchmark score.
Part of epic #2827. See spec: .local/specs/zeph-bench/spec.md FR-006, US-004.
Scope
- --baseline flag on zeph bench run
- Runs the full scenario set twice: first pass with memory enabled, second pass with memory disabled (config override)
- Writes baseline/memory-on/ and baseline/memory-off/ result directories
- Top-level summary.md includes a delta table: per-scenario delta score, aggregate delta, interpretation note
- BaselineComparison struct serialized to top-level comparison.json
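To make the scope items above concrete, here is a minimal sketch of what the comparison data and the delta table could look like. It is not taken from the spec. The field names, the ScenarioDelta helper type, the use of a plain mean for the aggregate delta, and the wording of the interpretation note are all assumptions for illustration. Serialization to comparison.json is left out to keep the sketch std-only.

```rust
/// Hypothetical per-scenario row. Field names are assumptions, not from the spec.
#[derive(Debug, Clone, PartialEq)]
struct ScenarioDelta {
    scenario: String,
    score_memory_on: f64,
    score_memory_off: f64,
}

impl ScenarioDelta {
    /// Positive delta means the memory-enabled pass scored higher.
    fn delta(&self) -> f64 {
        self.score_memory_on - self.score_memory_off
    }
}

/// Hypothetical shape of the comparison; the real struct may differ.
#[derive(Debug, Clone, PartialEq)]
struct BaselineComparison {
    scenarios: Vec<ScenarioDelta>,
}

impl BaselineComparison {
    /// Aggregate delta, assumed here to be the mean of per-scenario deltas.
    /// Returns 0.0 for an empty scenario set.
    fn aggregate_delta(&self) -> f64 {
        if self.scenarios.is_empty() {
            return 0.0;
        }
        let sum: f64 = self.scenarios.iter().map(ScenarioDelta::delta).sum();
        sum / self.scenarios.len() as f64
    }

    /// One-line interpretation note based on the sign of the aggregate delta.
    fn interpretation(&self) -> &'static str {
        let d = self.aggregate_delta();
        if d > 0.0 {
            "memory improved the aggregate score"
        } else if d < 0.0 {
            "memory reduced the aggregate score"
        } else {
            "memory made no aggregate difference"
        }
    }

    /// Renders the delta table as a Markdown table for summary.md.
    fn delta_table(&self) -> String {
        let mut out = String::from(
            "| Scenario | Memory on | Memory off | Delta |\n|---|---|---|---|\n",
        );
        for s in &self.scenarios {
            out.push_str(&format!(
                "| {} | {:.2} | {:.2} | {:+.2} |\n",
                s.scenario,
                s.score_memory_on,
                s.score_memory_off,
                s.delta()
            ));
        }
        out.push_str(&format!("| aggregate | | | {:+.2} |\n", self.aggregate_delta()));
        out
    }
}
```

Whether the aggregate should be a mean, a sum, or a weighted score is a question for the spec; the sketch uses a mean only because it stays comparable across scenario sets of different sizes.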
Acceptance Criteria
summary.md