
feat(bench): implement baseline comparison (--baseline flag) #2834

@bug-ops

Description

Run the same benchmark twice, once with [memory] enabled=true and once with [memory] enabled=false, and produce a delta comparison that quantifies the value Zeph's memory adds.
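
A minimal sketch of the two-pass flow, assuming hypothetical names (BenchConfig, Pass, run_baseline) rather than the actual zeph-bench API. The scenario runner and the isolation reset are passed in as closures, so the sketch shows only the ordering: the memory-on pass runs first and the memory-off pass second, the same reset runs before every scenario in both passes, and scores are grouped under the baseline/memory-on and baseline/memory-off subdirectory names.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the bench config; only the
/// [memory] enabled flag matters for this sketch.
#[derive(Clone, Debug)]
struct BenchConfig {
    memory_enabled: bool,
}

/// The two passes of a baseline run, in execution order.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Pass {
    MemoryOn,
    MemoryOff,
}

impl Pass {
    /// Result subdirectory for this pass, per the layout in the issue.
    fn subdir(self) -> &'static str {
        match self {
            Pass::MemoryOn => "baseline/memory-on",
            Pass::MemoryOff => "baseline/memory-off",
        }
    }
}

/// Hypothetical driver: runs every scenario once per pass
/// (memory-on first, then memory-off) and returns scores keyed
/// by result subdirectory, then by scenario id.
fn run_baseline<R, S>(
    base: &BenchConfig,
    scenarios: &[&str],
    mut reset: R,
    mut run: S,
) -> BTreeMap<&'static str, BTreeMap<String, f64>>
where
    R: FnMut(),
    S: FnMut(&BenchConfig, &str) -> f64,
{
    let mut results = BTreeMap::new();
    for pass in [Pass::MemoryOn, Pass::MemoryOff] {
        // Config override: copy the user's config, then force the memory flag.
        let mut cfg = base.clone();
        cfg.memory_enabled = pass == Pass::MemoryOn;
        let mut scores = BTreeMap::new();
        for &id in scenarios {
            // Identical isolation reset before every scenario in both passes.
            reset();
            scores.insert(id.to_string(), run(&cfg, id));
        }
        results.insert(pass.subdir(), scores);
    }
    results
}
```

Taking the runner as a closure keeps the pass ordering and the reset discipline testable without a real agent or SQLite store behind it.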

Part of epic #2827. See spec .local/specs/zeph-bench/spec.md (FR-006, US-004).

Scope

  • --baseline flag on zeph bench run
  • Runs the full scenario set twice: first pass with memory enabled, second pass with memory disabled (config override)
  • Writes baseline/memory-on/ and baseline/memory-off/ result directories
  • Top-level summary.md includes a delta table: per-scenario delta score, aggregate delta, interpretation note
  • BaselineComparison struct serialized to top-level comparison.json
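
The comparison step could look roughly like the following sketch. BaselineComparison is named in the issue, but its fields, the ScenarioDelta row type, and the table layout are assumptions for illustration. A real implementation would presumably derive serde's Serialize to write comparison.json; that is left out so the sketch builds with the standard library alone. The aggregate follows the acceptance criteria literally (difference of the two means), while per-scenario rows are emitted only for scenarios scored in both passes.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Hypothetical per-scenario row of the delta table.
#[derive(Debug, Clone, PartialEq)]
struct ScenarioDelta {
    scenario: String,
    memory_on: f64,
    memory_off: f64,
    delta: f64,
}

/// Hypothetical shape of the data written to comparison.json.
#[derive(Debug, Clone, PartialEq)]
struct BaselineComparison {
    scenarios: Vec<ScenarioDelta>,
    aggregate_delta: f64,
}

/// Arithmetic mean; returns 0.0 for an empty input.
fn mean(xs: impl Iterator<Item = f64>) -> f64 {
    let (sum, n) = xs.fold((0.0_f64, 0usize), |(s, n), x| (s + x, n + 1));
    if n == 0 { 0.0 } else { sum / n as f64 }
}

impl BaselineComparison {
    /// Builds the comparison from the scores of the two passes.
    /// Aggregate = mean(memory-on scores) - mean(memory-off scores).
    fn new(on: &BTreeMap<String, f64>, off: &BTreeMap<String, f64>) -> Self {
        let scenarios = on
            .iter()
            .filter_map(|(id, &on_score)| {
                // Skip scenarios missing from the memory-off pass.
                off.get(id).map(|&off_score| ScenarioDelta {
                    scenario: id.clone(),
                    memory_on: on_score,
                    memory_off: off_score,
                    delta: on_score - off_score,
                })
            })
            .collect();
        let aggregate_delta = mean(on.values().copied()) - mean(off.values().copied());
        Self { scenarios, aggregate_delta }
    }

    /// Renders a Markdown delta table for the top-level summary.md.
    fn delta_table(&self) -> String {
        let mut out = String::from("| Scenario | Memory on | Memory off | Delta |\n|---|---|---|---|\n");
        for row in &self.scenarios {
            // Writing into a String cannot fail, so unwrap is safe.
            writeln!(out, "| {} | {:.3} | {:.3} | {:+.3} |", row.scenario, row.memory_on, row.memory_off, row.delta).unwrap();
        }
        writeln!(out, "| **Aggregate** | | | {:+.3} |", self.aggregate_delta).unwrap();
        out
    }
}
```

The explicit sign in the delta column ({:+.3}) makes it easy to see at a glance whether memory helped (+) or hurt (-) on each scenario.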

Acceptance Criteria

  • Both memory-on and memory-off result files written to correct subdirectories
  • Delta table present in top-level summary.md
  • Aggregate delta = mean(memory-on scores) - mean(memory-off scores)
  • Each pass uses the same isolation reset between scenarios
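
Written out, the aggregate-delta criterion below uses s_i^on and s_i^off for scenario i's score in each pass. Because both passes run the full scenario set, it also equals the mean of the per-scenario deltas Δ_i, so the aggregate row and the per-scenario column of the delta table are consistent. If a scenario is missing from one pass, the two means are taken over different counts and this identity no longer holds.

```latex
\Delta_{\text{agg}}
  = \frac{1}{n}\sum_{i=1}^{n} s_i^{\text{on}} - \frac{1}{n}\sum_{i=1}^{n} s_i^{\text{off}}
  = \frac{1}{n}\sum_{i=1}^{n}\left(s_i^{\text{on}} - s_i^{\text{off}}\right)
  = \frac{1}{n}\sum_{i=1}^{n}\Delta_i
```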

Metadata

Labels

  • P2: High value, medium complexity
  • enhancement: New feature or request
  • memory: zeph-memory crate (SQLite)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests