
Commit 41bf6e3

kargarisaac and claude authored
feat: autoresearch memory quality improvements (#21)
* chore: add tests/__init__.py for lerim-cloud .pth compatibility

  The lerim-cloud .pth file in the venv makes lerim-cloud's tests/ package shadow lerim-cli's tests/ directory. Adding __init__.py ensures Python resolves lerim-cli's tests first.

* opt: quality criteria in extraction signature | extraction 0.841

  Added a QUALITY BAR section to MemoryExtractSignature: atomic, actionable, context-independent, structured body, durable. Extraction improved +0.022 on 100 cases. The dedup change of -0.056 is within 3-case noise.

* opt: body structure WHY + HOW TO APPLY in schemas.py | extraction 0.848

  Updated the MemoryCandidate body field description: "lead with rule/fact, then WHY, then HOW TO APPLY". Aligned with the Claude Code memory body structure. Extraction +0.007 (cumulative +0.029 from baseline).

* opt: add positive WHY + HOW TO APPLY example | extraction 0.845

  Added a positive ✓ example demonstrating body structure: "WHY: mocked tests passed but prod migration failed. HOW TO APPLY: integration tests must hit real database." Reinforces the exp021+exp022 quality criteria by demonstration. Extraction 0.845 (within noise of the best score, 0.848).

* fix(adapter): enhance iter_sessions to skip subagent transcripts and short sessions

  Updated iter_sessions to skip sidechain transcripts and sessions with fewer than 6 conversation turns. This prevents double-counting of content and ensures only meaningful interactions are considered. Adjusted unit tests to reflect the new minimum-turn requirement for session filtering.
* refactor(cli, memory): update memory reset command to include cache deletion

  Updated the memory reset help text to clarify that it now wipes cache data along with memory, workspace, and index data. reset_memory_root now deletes the cache directory, and a note was added about clearing the adapter cache for improved session management. Users are fully informed about the implications of the reset operation.

* fix(memory): add similarity normalization, rich metadata, and schema fixes

  - memory_record: persist source_speaker and durability in frontmatter (this data was silently lost)
  - memory_index: normalize find_similar output with fused_score, similarity, and lexical_similarity
  - oai_tools: fix the batch_dedup score bug (it returned 0 for everything); add write_memory source_speaker/durability/outcome params with validation
  - oai_sync: update dedup thresholds (0.7→0.75, 0.4→0.45); instruct the agent to pass rich metadata
  - tests: update for the new frontmatter keys and similarity fields

* refactor(extract): replace Jaccard dedup with a 3-signature DSPy module

  Replaced ~130 lines of regex-based Jaccard word-matching with a DSPy module containing three optimizable signatures:

  - MemoryExtractSignature (existing, per-window extraction)
  - ConsolidateCandidatesSignature (LLM merges semantic duplicates across windows)
  - QualityGateSignature (LLM drops low-value candidates)

  Also fixed format detection to handle "type":"human" traces (all user messages were being silently dropped, causing extraction to return 0 candidates). Every judgment call is now an LLM call that DSPy can optimize via autoresearch, replacing magic thresholds with model understanding.
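The extract → consolidate → gate flow of the 3-signature module can be sketched in plain Python. This is a minimal sketch: in the actual change each stage is a DSPy signature optimized via autoresearch, while the callables here are hypothetical stand-ins for those LLM calls, and `Candidate` is an illustrative type, not the project's schema.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Candidate:
    title: str
    body: str

def dedup_pipeline(
    windows: list[str],
    extract: Callable[[str], list[Candidate]],                  # MemoryExtractSignature stand-in
    consolidate: Callable[[list[Candidate]], list[Candidate]],  # ConsolidateCandidatesSignature stand-in
    quality_gate: Callable[[Candidate], bool],                  # QualityGateSignature stand-in
) -> list[Candidate]:
    # Extract: gather candidates per window.
    candidates = [c for w in windows for c in extract(w)]
    # Consolidate: merge semantic duplicates across windows (an LLM judgment,
    # replacing the old regex-based Jaccard word-matching).
    merged = consolidate(candidates)
    # Gate: drop low-value candidates.
    return [c for c in merged if quality_gate(c)]
```

Because every stage is a callable, each judgment point can be swapped for an optimizable LLM call rather than a hard-coded threshold.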
* perf(summarize): replace sequential fold with a parallel MapReduce tree

  Replaced the sequential refine/fold pattern (73 chunks × 45 s = hours) with:

  - Parallel map: extract lightweight facets per chunk (~80 words each)
  - Tree reduce: merge facets hierarchically when they exceed the context budget
  - Single synthesis: produce the final TraceSummaryCandidate from all facets

  Also formats the transcript before windowing (13 MB raw → 1.1 MB formatted), reducing 73 windows to 6 and total time from hours to ~33 seconds. Key signatures: ChunkFacetSignature (map), MergeFacetsSignature (reduce), SynthesizeSummarySignature (final). All are DSPy-optimizable.

* feat(activity): ship memory actions with full metadata to cloud

  Added memory_actions to OperationResult and details_json so the activity feed can show per-session memory lists with titles, body, tags, confidence, source_speaker, and durability. Each memory action includes session_run_id for per-session grouping. The daemon reads frontmatter from written memory files to extract full metadata. Fixes the "0 memories" bug in the activity feed.

* refactor(extract): enhance MemoryExtractSignature and similarity handling

  - Updated MemoryExtractSignature to clarify extraction criteria, emphasizing actionable insights and structured body content.
  - Improved similarity handling in MemoryIndex and the OAI tools by merging similarity signals and normalizing outputs for better candidate ranking.
  - Adjusted documentation examples to reflect the new extraction rules and quality criteria.
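The MapReduce tree from the perf(summarize) commit above can be sketched as follows. This is a minimal sketch: the real module runs the map step in parallel and uses DSPy calls (ChunkFacetSignature, MergeFacetsSignature, SynthesizeSummarySignature) where plain callables stand in here, and the word-count budget is a hypothetical simplification of the context-budget check.

```python
from typing import Callable

def summarize_tree(
    chunks: list[str],
    facet: Callable[[str], str],             # map step (ChunkFacetSignature stand-in)
    merge: Callable[[list[str]], str],       # reduce step (MergeFacetsSignature stand-in)
    synthesize: Callable[[list[str]], str],  # final step (SynthesizeSummarySignature stand-in)
    budget_words: int = 2000,
) -> str:
    # Map: extract a lightweight facet per chunk (parallel in the real code).
    facets = [facet(c) for c in chunks]
    # Tree reduce: pairwise-merge facets while they exceed the context budget.
    while sum(len(f.split()) for f in facets) > budget_words and len(facets) > 1:
        facets = [merge(facets[i:i + 2]) for i in range(0, len(facets), 2)]
    # Single synthesis over the surviving facets.
    return synthesize(facets)
```

The tree shape is what turns a sequential fold (each step waiting on the last) into independent merges that can run concurrently.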
* refactor(extract): tighten quality gates, dedup thresholds, and topic saturation

  - Add bug-report, directive/TODO, and generic-knowledge exclusion rules to MemoryExtractSignature
  - Add a decision-vs-learning test and cap 0.9+ confidence candidates at one per session
  - Require HOW TO APPLY to describe a different action than the title (no restating)
  - Lower dedup similarity thresholds (0.75→0.65 for no_op, 0.45→0.40 for update)
  - Add a topic-saturation rule: 2+ existing memories on the same topic defaults to no_op
  - Tighten the "update" classification to require at least one concrete absent fact

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
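The threshold and saturation rules above can be sketched as a small decision function. The thresholds (0.65 for no_op, 0.40 for update) and the 2-memory saturation rule come from the commit message; the function name and the exact ordering of checks are illustrative assumptions.

```python
def classify_dedup(similarity: float, same_topic_count: int) -> str:
    """Decide what to do with a new candidate given its best existing match."""
    # Topic saturation: 2+ existing memories on the same topic defaults to no_op.
    if same_topic_count >= 2:
        return "no_op"
    # High similarity: the candidate duplicates an existing memory.
    if similarity >= 0.65:
        return "no_op"
    # Moderate similarity: the candidate may update an existing memory,
    # provided it adds at least one concrete absent fact.
    if similarity >= 0.40:
        return "update"
    return "add"
```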
1 parent dfd3106 · commit 41bf6e3

19 files changed

Lines changed: 973 additions & 142 deletions

src/lerim/adapters/claude.py

Lines changed: 15 additions & 0 deletions

@@ -234,6 +234,21 @@ def iter_sessions(
         entries = load_jsonl_dict_lines(path)
         if not entries:
             continue
+
+        # Skip subagent/sidechain transcripts — their content flows back to
+        # the parent session via tool results, so extracting from both would
+        # double-count. Also skip tiny sessions (< 6 conversation turns) which
+        # are typically eval judge calls or trivial interactions.
+        is_sidechain = any(e.get("isSidechain") for e in entries[:5])
+        if is_sidechain:
+            continue
+        conv_turns = sum(
+            1 for e in entries
+            if e.get("type") in ("user", "assistant")
+        )
+        if conv_turns < 6:
+            continue
+
         started_at: datetime | None = None
         repo_name: str | None = None
         cwd: str | None = None
src/lerim/app/cli.py

Lines changed: 7 additions & 7 deletions

@@ -1445,20 +1445,20 @@ def build_parser() -> argparse.ArgumentParser:
     memory_reset = memory_sub.add_parser(
         "reset",
         formatter_class=_F,
-        help="DESTRUCTIVE: wipe memory, workspace, and index data",
+        help="DESTRUCTIVE: wipe memory, workspace, cache, and index data",
         description=(
-            "Irreversibly delete memory/, workspace/, and index/ under the selected\n"
-            "scope, then recreate canonical empty folders.\n\n"
+            "Irreversibly delete memory/, workspace/, index/, and cache/ under the\n"
+            "selected scope, then recreate canonical empty folders.\n\n"
             "Scopes:\n"
             "  project -- reset <repo>/.lerim/ only\n"
-            "  global  -- reset ~/.lerim/ only (includes sessions DB)\n"
+            "  global  -- reset ~/.lerim/ only (includes sessions DB + cache)\n"
             "  both    -- reset both project and global roots (default)\n\n"
-            "The sessions DB lives in global index/, so --scope project alone\n"
-            "does NOT reset the session queue. Use 'global' or 'both' for a full wipe.\n\n"
+            "The sessions DB lives in global index/, and compacted session traces\n"
+            "live in global cache/. Use 'global' or 'both' for a full wipe.\n\n"
             "Examples:\n"
             "  lerim memory reset --yes                  # wipe everything\n"
             "  lerim memory reset --scope project --yes  # project data only\n"
-            "  lerim memory reset --yes && lerim sync --max-sessions 5  # fresh start"
+            "  lerim memory reset --yes && lerim up --build  # fresh start"
         ),
     )
     memory_reset.add_argument(

src/lerim/app/daemon.py

Lines changed: 38 additions & 3 deletions

@@ -340,44 +340,75 @@ def _process_one_job(job: dict[str, Any]) -> dict[str, Any]:
         )
         return {"status": "failed"}
     counts = result.get("counts") or {}
+    # Extract memory actions with full metadata for activity feed
+    memory_actions: list[dict] = []
+    for path in result.get("written_memory_paths") or []:
+        p = Path(path)
+        fname = p.stem if path else ""
+        title = fname[9:].replace("-", " ") if len(fname) > 9 and fname[8] == "-" else fname.replace("-", " ")
+        primitive = "decision" if "/decisions/" in str(path) else "learning"
+        ma: dict = {
+            "action": "add",
+            "title": title,
+            "primitive": primitive,
+            "session_run_id": rid,
+        }
+        # Read frontmatter for full metadata
+        if p.exists():
+            try:
+                import frontmatter
+                post = frontmatter.load(str(p))
+                ma["title"] = post.metadata.get("title", title)
+                ma["body"] = post.content.strip()
+                ma["confidence"] = float(post.metadata.get("confidence", 0))
+                ma["tags"] = post.metadata.get("tags", [])
+                ma["source_speaker"] = post.metadata.get("source_speaker", "")
+                ma["durability"] = post.metadata.get("durability", "")
+                ma["kind"] = post.metadata.get("kind", "")
+            except Exception:
+                pass
+        memory_actions.append(ma)
     complete_session_job(rid)
     return {
         "status": "extracted",
         "learnings_new": int(counts.get("add") or 0),
         "learnings_updated": int(counts.get("update") or 0),
+        "memory_actions": memory_actions,
         "cost_usd": float(result.get("cost_usd") or 0),
     }


 def _process_claimed_jobs(
     claimed: list[dict[str, Any]],
-) -> tuple[int, int, int, int, int, float]:
+) -> tuple[int, int, int, int, int, list[dict[str, str]], float]:
     """Process claimed jobs sequentially in chronological order.

     Jobs are already sorted oldest-first by ``claim_session_jobs``.
     Sequential processing ensures that later sessions can correctly
     update or supersede memories created by earlier ones.

-    Returns (extracted, failed, skipped, new, updated, cost_usd).
+    Returns (extracted, failed, skipped, new, updated, memory_actions, cost_usd).
     """
     extracted = 0
     failed = 0
     skipped = 0
     learnings_new = 0
     learnings_updated = 0
+    memory_actions: list[dict[str, str]] = []
     cost_usd = 0.0
     for job in claimed:
         result = _process_one_job(job)
         if result["status"] == "extracted":
             extracted += 1
             learnings_new += result.get("learnings_new", 0)
             learnings_updated += result.get("learnings_updated", 0)
+            memory_actions.extend(result.get("memory_actions", []))
             cost_usd += result.get("cost_usd", 0.0)
         elif result["status"] == "failed":
             failed += 1
         elif result["status"] == "skipped":
             skipped += 1
-    return extracted, failed, skipped, learnings_new, learnings_updated, cost_usd
+    return extracted, failed, skipped, learnings_new, learnings_updated, memory_actions, cost_usd


 def run_sync_once(

@@ -495,6 +526,7 @@ def run_sync_once(
     failed = 0
     learnings_new = 0
     learnings_updated = 0
+    all_memory_actions: list[dict[str, str]] = []
     cost_usd = 0.0
     projects: set[str] = set()
     claim_limit = max(max_sessions, 1)

@@ -526,13 +558,15 @@ def run_sync_once(
             batch_skipped,
             batch_new,
             batch_updated,
+            batch_actions,
             batch_cost,
         ) = _process_claimed_jobs(claimed)
         extracted += batch_extracted
         failed += batch_failed
         skipped += batch_skipped
         learnings_new += batch_new
         learnings_updated += batch_updated
+        all_memory_actions.extend(batch_actions)
         cost_usd += batch_cost
         total_processed += len(claimed)

@@ -568,6 +602,7 @@ def run_sync_once(
         failed_sessions=failed,
         learnings_new=learnings_new,
         learnings_updated=learnings_updated,
+        memory_actions=all_memory_actions,
         run_ids=target_run_ids,
         window_start=window_start.isoformat() if window_start else None,
         window_end=window_end.isoformat() if window_end else None,
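The daemon reads metadata back with the python-frontmatter package (`frontmatter.load`), as shown in the hunk above. For illustration, here is a stdlib-only sketch of the same read; the `---`-delimited format matches what the memory files use, but the naive `key: value` parsing is an assumption (it ignores YAML lists and nesting, which the real library handles).

```python
def read_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a '---'-delimited frontmatter block from the body (naive parse)."""
    meta: dict[str, str] = {}
    if text.startswith("---\n"):
        # Everything between the opening and closing '---' lines is metadata.
        head, _, body = text[4:].partition("\n---\n")
        for line in head.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
        return meta, body.strip()
    # No frontmatter: the whole file is body.
    return meta, text.strip()
```

Missing keys simply stay absent from the dict, which is why the daemon falls back to defaults (`post.metadata.get("title", title)`) for each field.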

src/lerim/app/operation_result.py

Lines changed: 1 addition & 0 deletions

@@ -26,6 +26,7 @@ class OperationResult:
     failed_sessions: int = 0
     learnings_new: int = 0
     learnings_updated: int = 0
+    memory_actions: list[dict[str, str]] = field(default_factory=list)
     run_ids: list[str] = field(default_factory=list)
     window_start: str | None = None
     window_end: str | None = None
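The new field uses `field(default_factory=list)` rather than a bare `= []` default, matching the existing `run_ids` field. A minimal sketch of why (dataclasses reject mutable defaults, and a factory gives each instance its own list; `OpResult` here is a hypothetical stand-in for `OperationResult`):

```python
from dataclasses import dataclass, field

@dataclass
class OpResult:  # hypothetical stand-in for OperationResult
    memory_actions: list[dict[str, str]] = field(default_factory=list)

a = OpResult()
b = OpResult()
a.memory_actions.append({"action": "add"})
# b is unaffected: each instance gets a fresh list from the factory,
# whereas a shared `= []` default would leak state between instances.
```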
