feat: autoresearch memory quality improvements #21
Merged
kargarisaac merged 12 commits into main on Mar 31, 2026
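The lerim-cloud .pth file in the venv makes lerim-cloud's tests/ package shadow lerim-cli's tests/ directory. Without an __init__.py, lerim-cli's tests/ is only a namespace-package portion, and Python's import system prefers a regular package found anywhere on sys.path over namespace portions found earlier. Adding tests/__init__.py turns lerim-cli's tests/ into a regular package, so Python resolves lerim-cli's tests first. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>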
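The shadowing follows from how Python's path-based finder treats namespace packages: it keeps scanning sys.path past a directory without __init__.py and returns the first regular package it finds. The sketch below reproduces this with two temporary directories standing in for the two repos (the directory roles and labels are illustrative, not the real project layout). It queries importlib.machinery.PathFinder.find_spec with an explicit search path, so it does not modify sys.path or sys.modules.

```python
import importlib
import importlib.machinery
import os
import tempfile
from typing import List


def make_tests_dir(root: str, with_init: bool) -> str:
    """Create <root>/tests/, optionally as a regular package with __init__.py."""
    pkg = os.path.join(root, "tests")
    os.makedirs(pkg, exist_ok=True)
    if with_init:
        open(os.path.join(pkg, "__init__.py"), "w").close()
    return pkg


def resolve_root(name: str, search_path: List[str]) -> str:
    """Return the search_path entry that `import name` would load from."""
    importlib.invalidate_caches()
    spec = importlib.machinery.PathFinder.find_spec(name, search_path)
    if spec is None:
        raise ImportError(name)
    if spec.origin is None:
        # Namespace package: only directories without __init__.py were found.
        return "<namespace>"
    # Regular package: origin is <root>/<name>/__init__.py
    return os.path.realpath(os.path.dirname(os.path.dirname(spec.origin)))


def demo() -> List[str]:
    with tempfile.TemporaryDirectory() as cli_root, tempfile.TemporaryDirectory() as cloud_root:
        make_tests_dir(cli_root, with_init=False)   # stand-in for lerim-cli before the fix
        make_tests_dir(cloud_root, with_init=True)  # stand-in for lerim-cloud via the .pth entry
        search = [cli_root, cloud_root]             # lerim-cli comes first on the path
        label = {os.path.realpath(cli_root): "lerim-cli",
                 os.path.realpath(cloud_root): "lerim-cloud"}
        before = label[resolve_root("tests", search)]
        make_tests_dir(cli_root, with_init=True)    # the fix: add tests/__init__.py
        after = label[resolve_root("tests", search)]
        return [before, after]
```

Calling demo() should report that tests resolves to the lerim-cloud stand-in before the fix, even though lerim-cli is earlier on the path, and to the lerim-cli stand-in once tests/__init__.py exists.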
Added a QUALITY BAR section to MemoryExtractSignature with five criteria: atomic, actionable, context-independent, structured body, and durable. The extraction score improved by +0.022 on 100 cases; the dedup score's -0.056 change is within 3-case noise. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Updated MemoryCandidate body field description: "lead with rule/fact, then WHY, then HOW TO APPLY". Aligned with Claude Code memory body structure. Extraction +0.007 (cumulative +0.029 from baseline). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added a positive ✓ example demonstrating body structure: "WHY: mocked tests passed but prod migration failed. HOW TO APPLY: integration tests must hit real database." Reinforces exp021+exp022 quality criteria by demonstration. Extraction 0.845 (within noise of best 0.848). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
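The body structure described in the last two commits (rule or fact first, then WHY, then HOW TO APPLY) can be captured in a small data type. This is a hypothetical helper, not code from schemas.py: the names MemoryBody and parse_body are assumptions, and in the PR the structure is encouraged through the field description and a prompt example rather than enforced by a parser. The WHY and HOW TO APPLY text comes from the commit above; the leading rule sentence is invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryBody:
    rule: str          # the rule or fact, stated first
    why: str           # the reason or incident behind it
    how_to_apply: str  # the concrete action to take next time

    def render(self) -> str:
        return f"{self.rule}\nWHY: {self.why}\nHOW TO APPLY: {self.how_to_apply}"


# Lazy groups so each section stops at the next marker.
_BODY_RE = re.compile(
    r"^(?P<rule>.+?)\s*WHY:\s*(?P<why>.+?)\s*HOW TO APPLY:\s*(?P<how>.+)$",
    re.DOTALL,
)


def parse_body(text: str) -> Optional[MemoryBody]:
    """Split a body into its three sections; None if a section is missing."""
    m = _BODY_RE.match(text.strip())
    if m is None:
        return None
    return MemoryBody(m["rule"].strip(), m["why"].strip(), m["how"].strip())


example = MemoryBody(
    rule="Run migration tests against a real database, not mocks.",  # illustrative
    why="mocked tests passed but prod migration failed.",
    how_to_apply="integration tests must hit real database.",
)
```

A body that omits WHY or HOW TO APPLY fails to parse, which is one cheap way to flag candidates that do not follow the structure.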
… short sessions
Updated the iter_sessions function to skip sidechain transcripts and sessions with fewer than 6 conversation turns. This prevents content from being double-counted and ensures only meaningful interactions are processed. Adjusted unit tests to reflect the new minimum-turn requirement for session filtering.
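A minimal sketch of the filter described above, assuming a hypothetical Session record with an is_sidechain flag and a list of turns; the real iter_sessions signature and session representation are not shown in the PR. Only the two rules (skip sidechains, require at least 6 turns) come from the commit message.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List

MIN_TURNS = 6  # threshold stated in the commit message


@dataclass
class Session:
    session_id: str
    turns: List[str] = field(default_factory=list)
    is_sidechain: bool = False  # hypothetical flag marking sidechain transcripts


def iter_sessions(sessions: Iterable[Session], min_turns: int = MIN_TURNS) -> Iterator[Session]:
    """Yield only sessions worth extracting from."""
    for session in sessions:
        if session.is_sidechain:
            continue  # skipped to avoid double-counting content
        if len(session.turns) < min_turns:
            continue  # too short to contain a meaningful interaction
        yield session
```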
…eletion
Updated the memory reset command's help text to state that it now wipes cache data along with memory, workspace, and index data. Updated the reset_memory_root function to delete the cache directory, and added a note about clearing the adapter cache for improved session management. Users can now see exactly what a reset deletes before running it.
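A sketch of what a reset covering all four data kinds might look like. The directory layout (one subdirectory per kind under a root) and this reset_memory_root signature are hypothetical; the PR only states that cache is now deleted alongside memory, workspace, and index data.

```python
import shutil
from pathlib import Path
from typing import List

# Hypothetical layout: one subdirectory per kind of data under the memory root.
RESET_SUBDIRS = ("memory", "workspace", "index", "cache")


def reset_memory_root(root: Path) -> List[str]:
    """Delete each known data directory under `root`; return the names removed."""
    removed = []
    for name in RESET_SUBDIRS:
        target = root / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed
```

Returning the removed names lets the CLI echo exactly what was wiped, which matches the intent of the updated help text; directories outside the list are left alone.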
…fixes
- memory_record: persist source_speaker and durability in frontmatter (this data was previously silently lost)
- memory_index: normalize find_similar output with fused_score, similarity, and lexical_similarity
- oai_tools: fix batch_dedup score bug (it was returning 0 for everything); add source_speaker/durability/outcome params with validation to write_memory
- oai_sync: update dedup thresholds (0.7→0.75, 0.4→0.45) and instruct the agent to pass rich metadata
- tests: update for new frontmatter keys and similarity fields
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
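To illustrate persisting the new metadata, here is a dependency-free frontmatter round trip that refuses to write a memory without source_speaker and durability. It is a sketch under assumptions: the key order, the JSON encoding of each value, and the names render_memory and parse_frontmatter are hypothetical, and the real memory_record format and write_memory validation rules are not shown in the PR.

```python
import json
from typing import Any, Dict

# Hypothetical key order; includes the two fields that were previously dropped.
FRONTMATTER_KEYS = ("title", "tags", "confidence", "source_speaker", "durability")
REQUIRED_KEYS = ("source_speaker", "durability")


def render_memory(meta: Dict[str, Any], body: str) -> str:
    """Write '---'-delimited frontmatter followed by the body."""
    missing = [k for k in REQUIRED_KEYS if not meta.get(k)]
    if missing:
        raise ValueError(f"missing frontmatter fields: {missing}")
    lines = ["---"]
    for key in FRONTMATTER_KEYS:
        if key in meta:
            # JSON-encode values so lists, numbers, and strings round-trip exactly.
            lines.append(f"{key}: {json.dumps(meta[key])}")
    lines.append("---")
    return "\n".join(lines) + "\n" + body + "\n"


def parse_frontmatter(text: str) -> Dict[str, Any]:
    """Read the frontmatter block back into a dict."""
    lines = text.splitlines()
    if not lines or lines[0] != "---":
        return {}
    meta: Dict[str, Any] = {}
    for line in lines[1:]:
        if line == "---":
            break
        key, _, value = line.partition(": ")
        meta[key] = json.loads(value)
    return meta
```

Raising instead of silently omitting a field makes a missing value visible at write time.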
Replace ~130 lines of regex-based Jaccard word-matching with a DSPy module containing three optimizable signatures:
- MemoryExtractSignature (existing, per-window extraction)
- ConsolidateCandidatesSignature (LLM merges semantic duplicates across windows)
- QualityGateSignature (LLM drops low-value candidates)
Also fixes format detection to handle "type":"human" traces (previously all user messages were silently dropped, so extraction returned 0 candidates). Every judgment call is now an LLM call that DSPy can optimize via autoresearch, replacing magic thresholds with model understanding. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
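Two ideas from this commit can be sketched without DSPy: accepting "human" as well as "user" as the user-side message type, and the extract, consolidate, quality-gate ordering. In this sketch the three stages are plain callables standing in for the DSPy signatures, and the JSONL shape (a "type" field plus a "content" field) is an assumption about the trace format, not taken from the PR.

```python
import json
from typing import Callable, Dict, Iterable, List

# Assumption: each trace line is a JSON object whose "type" marks the speaker.
# Accepting only "user" would drop every user message in "human"-style traces.
USER_TYPES = {"user", "human"}

Candidate = Dict[str, str]


def user_turns(trace_lines: Iterable[str]) -> List[str]:
    """Collect user-side message contents from JSONL trace lines."""
    turns = []
    for line in trace_lines:
        record = json.loads(line)
        if record.get("type") in USER_TYPES:
            turns.append(str(record.get("content", "")))
    return turns


def extract_pipeline(
    windows: Iterable[str],
    extract: Callable[[str], List[Candidate]],
    consolidate: Callable[[List[Candidate]], List[Candidate]],
    quality_gate: Callable[[List[Candidate]], List[Candidate]],
) -> List[Candidate]:
    """Per-window extraction, one cross-window consolidation, then a quality gate."""
    candidates = [c for window in windows for c in extract(window)]
    if not candidates:
        return []  # nothing to merge or filter
    return quality_gate(consolidate(candidates))
```

Consolidating across all windows before gating means duplicates from different windows are merged once instead of being judged separately.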
Replace the sequential refine/fold pattern (73 chunks × 45 s ≈ 55 minutes of strictly sequential LLM calls) with:
- Parallel map: extract lightweight facets per chunk (~80 words each)
- Tree reduce: merge facets hierarchically when they exceed the context budget
- Single synthesis: produce the final TraceSummaryCandidate from all facets
Also adds transcript formatting before windowing (13 MB raw → 1.1 MB formatted), which reduces 73 windows to 6 and cuts total time from about an hour to ~33 seconds. Key signatures: ChunkFacetSignature (map), MergeFacetsSignature (reduce), and SynthesizeSummarySignature (final), all DSPy-optimizable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
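The map, tree-reduce, synthesize flow can be sketched with a thread pool and plain callables in place of ChunkFacetSignature, MergeFacetsSignature, and SynthesizeSummarySignature. The word-count budget, the fanout, and the function names here are assumptions for illustration; the PR does not state how its context budget is measured.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def word_count(texts: Sequence[str]) -> int:
    return sum(len(t.split()) for t in texts)


def summarize(
    chunks: Sequence[str],
    extract_facet: Callable[[str], str],
    merge_facets: Callable[[List[str]], str],
    synthesize: Callable[[List[str]], T],
    budget_words: int = 400,  # hypothetical context budget
    fanout: int = 4,          # facets merged per reduce call
    max_workers: int = 8,
) -> T:
    if fanout < 2:
        raise ValueError("fanout must be >= 2 so each reduce round shrinks the list")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map: chunks are independent, so facets are extracted concurrently.
        facets = list(pool.map(extract_facet, chunks))
        # Tree reduce: merge groups of `fanout` facets until they fit the budget.
        while len(facets) > 1 and word_count(facets) > budget_words:
            groups = [facets[i : i + fanout] for i in range(0, len(facets), fanout)]
            facets = list(pool.map(merge_facets, groups))
    # Single synthesis over whatever facets remain.
    return synthesize(facets)
```

Each reduce round divides the facet count by roughly the fanout, so the number of sequential rounds grows logarithmically with the chunk count instead of linearly as in the old fold.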
Add memory_actions to OperationResult and details_json so the activity feed can show per-session memory lists with titles, body, tags, confidence, source_speaker, and durability. Each memory action includes session_run_id for per-session grouping. The daemon reads frontmatter from written memory files to extract full metadata. Fixes the "0 memories" bug in the activity feed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
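A sketch of the payload shape and the per-session grouping the activity feed needs. The field names are the ones listed in the commit message, but the exact details_json schema and the helper names build_details_json and group_by_session are assumptions.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Fields listed in the commit message; the exact schema is an assumption.
ACTION_FIELDS = ("title", "body", "tags", "confidence",
                 "source_speaker", "durability", "session_run_id")


def build_details_json(memory_actions: List[dict]) -> str:
    """Serialize memory actions, keeping only the documented fields."""
    actions = [{k: a.get(k) for k in ACTION_FIELDS} for a in memory_actions]
    return json.dumps({"memory_actions": actions})


def group_by_session(details_json: str) -> Dict[str, List[dict]]:
    """Consumer side: group memory actions for per-session display."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for action in json.loads(details_json).get("memory_actions", []):
        groups[action["session_run_id"]].append(action)
    return dict(groups)
```

Carrying session_run_id on every action lets the consumer group without a second lookup, and an empty list (rather than a missing key) makes "0 memories" a real value instead of a parsing failure.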
…ling
- Updated MemoryExtractSignature to clarify extraction criteria, emphasizing actionable insights and structured body content.
- Improved similarity handling in MemoryIndex and OAI tools by merging similarity signals and normalizing outputs for better candidate ranking.
- Adjusted examples in the documentation to reflect the new extraction rules and quality criteria.
… saturation
- Add bug-report, directive/TODO, and generic-knowledge exclusion rules to MemoryExtractSignature
- Add a decision-vs-learning test and allow at most one memory with confidence ≥ 0.9 per session
- Require HOW TO APPLY to describe a different action than the title (no restating)
- Lower dedup similarity thresholds (0.75→0.65 for no_op, 0.45→0.40 for update)
- Add a topic saturation rule: 2+ existing memories on the same topic default to no_op
- Tighten "update" classification to require at least one concrete fact that the existing memory lacks
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
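In the PR these rules are instructions to the extraction and dedup agent, not code, but written as plain functions they are easier to check against each other. This is a sketch under assumptions: the "add" label for a new memory, treating topic saturation as a hard rule, mapping an update without a new fact to no_op, and demoting extra high-confidence memories to 0.89 are all illustrative choices, not taken from the PR.

```python
from typing import List

NO_OP_THRESHOLD = 0.65   # lowered from 0.75 in this commit
UPDATE_THRESHOLD = 0.40  # lowered from 0.45 in this commit
SATURATION_COUNT = 2     # 2+ existing memories on the same topic
HIGH_CONFIDENCE = 0.9


def classify_candidate(best_similarity: float, same_topic_count: int,
                       has_new_concrete_fact: bool) -> str:
    """Return 'no_op', 'update', or 'add' (the 'add' label is hypothetical)."""
    if same_topic_count >= SATURATION_COUNT:
        return "no_op"  # topic saturation
    if best_similarity >= NO_OP_THRESHOLD:
        return "no_op"  # near-duplicate of an existing memory
    if best_similarity >= UPDATE_THRESHOLD:
        # An update must contribute at least one concrete fact the memory lacks.
        return "update" if has_new_concrete_fact else "no_op"
    return "add"


def cap_high_confidence(confidences: List[float], demote_to: float = 0.89) -> List[float]:
    """Keep at most one confidence >= 0.9 per session; demote the others."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    capped = list(confidences)
    seen_high = False
    for i in order:
        if capped[i] >= HIGH_CONFIDENCE:
            if seen_high:
                capped[i] = demote_to
            else:
                seen_high = True  # the highest one keeps its score
    return capped
```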
Summary
Commits
- refactor(extract): tighten quality gates, dedup thresholds, and topic saturation
- refactor(extract): enhance MemoryExtractSignature and similarity handling
- feat(activity): ship memory actions with full metadata to cloud
- perf(summarize): replace sequential fold with parallel MapReduce tree
- refactor(extract): replace Jaccard dedup with 3-signature DSPy module
- fix(memory): add similarity normalization, rich metadata, and schema fixes
- opt: add positive WHY+HOW TO APPLY example | extraction 0.845
- opt: body structure WHY + HOW TO APPLY in schemas.py | extraction 0.848
- opt: quality criteria in extraction signature | extraction 0.841
Test plan
test_extract_pipeline_quality.py, test_oai_tools.py)
🤖 Generated with Claude Code