feat: v0.1.70 — AutoResearch optimization + eval infrastructure#19

Merged
kargarisaac merged 3 commits into main from feat/autoresearch-optimization on Mar 28, 2026

@kargarisaac
Contributor

Summary

  • +41% composite quality score via Layer 1 AutoResearch optimization (14 experiments, 7 kept)
  • 4 new eval runners (dedup, maintain, search, tool_selection) with LerimBench 7-dimension scoring
  • Dashboard transition: lerim dashboard shows "coming soon" message, web UI moves to lerim.dev
  • Stale Codex tool references cleaned up from ask prompt

Key changes

Quality (AutoResearch Layer 1)

  • dspy.Predict → dspy.ChainOfThought for extraction (single biggest win)
  • Explicit dedup thresholds (0.7/0.4) in sync prompt + batch_dedup tool description
  • MemoryCandidate schema field descriptions improved (title/body format guidance)
  • Post-extraction body filter raised from 30 to 50 chars
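For reviewers, here is a minimal sketch of how the two numeric knobs above could behave. The PR only states the threshold values (0.7/0.4) and the filter change (30 to 50 chars). Reading 0.7 as "treat as duplicate" and 0.4 as "related, update the existing memory" is an assumption, as are the names `DedupAction`, `classify_candidate`, and `passes_body_filter`, and the choice to strip whitespace before measuring length.

```python
from enum import Enum


class DedupAction(Enum):
    """Hypothetical outcomes for a memory candidate (names are illustrative)."""
    SKIP_DUPLICATE = "skip_duplicate"
    UPDATE_EXISTING = "update_existing"
    ADD_NEW = "add_new"


# Threshold values come from the PR; their interpretation here is assumed.
DUPLICATE_THRESHOLD = 0.7
RELATED_THRESHOLD = 0.4

# The PR raises the post-extraction body filter from 30 to 50 characters.
MIN_BODY_CHARS = 50


def classify_candidate(similarity: float) -> DedupAction:
    """Map a similarity score in [0, 1] to an action using the two thresholds."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError(f"similarity must be in [0, 1], got {similarity}")
    if similarity >= DUPLICATE_THRESHOLD:
        return DedupAction.SKIP_DUPLICATE
    if similarity >= RELATED_THRESHOLD:
        return DedupAction.UPDATE_EXISTING
    return DedupAction.ADD_NEW


def passes_body_filter(body: str, min_chars: int = MIN_BODY_CHARS) -> bool:
    """Keep a candidate only if its stripped body has at least min_chars characters."""
    return len(body.strip()) >= min_chars
```

Under these assumptions, a candidate with similarity 0.55 would be routed to an update rather than a new memory, and a 40-character body that passed the old filter would now be dropped.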

Eval infrastructure

  • run_dedup.py, run_maintain.py, run_search.py, run_tool_selection.py
  • LerimBenchScore 7-dimension composite with configurable weights
  • Fuzzy title matching for dedup accuracy (_fuzzy_title_match)
  • Golden dataset support via --golden-dir flag
  • 59 scoring tests (all passing)
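The commit message for this PR describes `_fuzzy_title_match` as using Jaccard similarity. The block below is a rough sketch of that idea, not the code in `scores.py`: the tokenization (lowercased alphanumeric runs), the 0.5 default threshold, and the function names are all assumptions.

```python
import re


def _tokens(title: str) -> set[str]:
    """Lowercase the title and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity of two titles: |intersection| / |union| of token sets."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0  # two empty titles are treated as identical
    return len(ta & tb) / len(ta | tb)


def fuzzy_title_match(a: str, b: str, threshold: float = 0.5) -> bool:
    """Return True when the titles overlap at or above the (assumed) threshold."""
    return jaccard_similarity(a, b) >= threshold


if __name__ == "__main__":
    # 5 shared tokens out of 6 distinct tokens, so similarity is 5/6.
    print(fuzzy_title_match(
        "Prefer pytest fixtures over setup methods",
        "prefer pytest fixtures over setUp",
    ))
```

Token-set matching tolerates casing and punctuation differences between predicted and golden titles, at the cost of ignoring word order.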

Cleanup

  • lerim dashboard → transition message (no more local dashboard)
  • Ask prompt: codex tool → memory_search + read_file
  • ResponsesProxy reference removed from internal docs

Test plan

  • 659 unit tests passing
  • Component-level eval: composite 0.608 → 0.855 (+41%)
  • E2E lifecycle eval: 0.845 → 0.909 (+7.6%)
  • lerim dashboard shows transition message
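The percentages in the test plan are relative changes, not absolute point differences. A small check, with `relative_change_pct` as an illustrative helper name:

```python
def relative_change_pct(before: float, after: float) -> float:
    """Relative change from before to after, expressed as a percentage."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Component-level composite: 0.608 -> 0.855
    print(round(relative_change_pct(0.608, 0.855)))     # 41
    # E2E lifecycle eval: 0.845 -> 0.909
    print(round(relative_change_pct(0.845, 0.909), 1))  # 7.6
```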

🤖 Generated with Claude Code

kargarisaac and others added 3 commits March 27, 2026 19:29
…ipts

- Removed the "explorer" section from the required configurations in `configure_dspy_from_eval` and related lifecycle functions.
- Introduced new evaluation scripts: `run_dedup.py`, `run_maintain.py`, `run_search.py`, and `run_tool_selection.py` for assessing deduplication accuracy, maintenance quality, search relevance, and tool selection accuracy against golden datasets.
- Enhanced the README documentation to include detailed descriptions of the new evaluation pipelines and their usage.
- Updated scoring functions in `scores.py` to support new evaluation metrics and composite scoring for the LerimBench.
- Added judge prompts for deduplication, maintenance, search, and tool selection evaluations to standardize quality assessments.

AutoResearch-style optimization loop (14 experiments, 7 kept):
- ChainOfThought for DSPy extraction (Predict → CoT, biggest single win)
- Explicit dedup thresholds (0.7/0.4) in sync prompt
- Behavioral guidance in batch_dedup_candidates tool description
- Improved MemoryCandidate schema field descriptions (title/body format)
- Title format guidance in extraction signature
- Raised post-extraction body filter from 30 to 50 chars

Eval infrastructure extensions:
- check_extraction_assertions() for deterministic golden-case scoring
- check_summarization_assertions() for summary validation
- _fuzzy_title_match() for dedup accuracy with Jaccard similarity
- LerimBenchScore 7-dimension composite scoring
- 59 scoring tests (all passing)

Composite: 0.608 → 0.855 (+41%)
Extraction: 0.693 → 0.877 (+27%)
Dedup: 0.278 → 0.722 (+160%)
Search: 0.905 (unchanged)
Maintain: 1.000 (unchanged)
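As an illustration of a weighted composite with configurable weights, here is a sketch assuming a normalized weighted mean. This page does not list the seven LerimBench dimensions or their weights, so `composite_score`, the equal weights, and the four-dimension input are assumptions, and the result is not expected to reproduce the reported 0.855.

```python
from typing import Mapping


def composite_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted mean of per-dimension scores, normalized by the total weight."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing scores for dimensions: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


if __name__ == "__main__":
    # Per-component scores reported above; equal weights are a placeholder.
    scores = {"extraction": 0.877, "dedup": 0.722, "search": 0.905, "maintain": 1.000}
    weights = {name: 1.0 for name in scores}
    print(round(composite_score(scores, weights), 3))  # 0.876
```

Normalizing by the total weight means the weights do not have to sum to 1, which keeps reweighting a single dimension simple.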

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Dashboard: `lerim dashboard` shows transition message to lerim.dev
- Ask prompt: replace stale Codex tool refs with memory_search/read_file
- Clean up ResponsesProxy reference in src/lerim/README.md
- Update test assertion for ask prompt (codex → memory_search)
- Add v0.1.70 section to CHANGELOG.md
- Update README dashboard section
- Bump version 0.1.69 → 0.1.70

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@kargarisaac kargarisaac merged commit 398d08e into main Mar 28, 2026
1 check passed
@kargarisaac kargarisaac deleted the feat/autoresearch-optimization branch March 28, 2026 07:23