Summary
Overhaul the memory consolidation/dedup pipeline. The current write path uses exact-subject + substring matching and a "longer wins" update heuristic, which misses semantic duplicates, never resolves contradictions, and lets stale long-term facts accumulate. This proposes moving to the now-standard unified update pipeline (retrieve similar → LLM decides ADD/UPDATE/DELETE/NOOP), staged so Tier 1 ships with no new dependencies.
Current implementation
Two-tier SQLite memory (schema/memory.sql: long_term, short_term), two LLM passes:
Extraction — MemoryStore.extract_memories (core/memory.py:266), runs after each turn with a 120s cooldown; classifies facts into LONG_TERM (written immediately) or SHORT_TERM (TTL-tagged).
Consolidation — MemoryStore.consolidate_and_cleanup (core/memory.py:436), scheduled via core/scheduler.py:131; promotes surviving short-term → long-term and deletes expired rows.
Dedup/merge — _store_long_term (core/memory.py:355): SELECT ... WHERE subject = ?, then Python substring containment; UPDATE only when the new content is longer than the existing.
Injection — format_for_prompt (core/memory.py:240) dumps all long-term (top 50 by updated_at DESC) + active short-term into every system prompt.
Problems
| # | Current behaviour | Problem |
|---|---|---|
| 1 | Dedup = exact subject + substring | Misses semantic duplicates ("Allergic to shellfish" vs "can't eat prawns"); subject-string variants ("matteo" / "the user") don't match |
| 2 | No contradiction handling | "uses a standing desk" then "switched back to sitting" → two conflicting long-term rows, both injected indefinitely |
| 3 | UPDATE only if new content is longer | Length ≠ correctness/recency; a corrected shorter fact never replaces a stale longer one |
| 4 | Long-term rows are written by both extraction and consolidation | Two divergent prompts → inconsistent long-term quality (extraction skips the "strip temporal / compact aggressively" rules consolidation applies) |
| 5 | Injects all top-50 long-term every turn | Doesn't scale; with >50 rows, important older facts are silently truncated by updated_at DESC; wastes context + cache |
| 6 | Consolidation only promotes | Never merges existing long-term duplicates or prunes stale ones; long-term only grows |
| 7 | Hard 120s extraction cooldown | Two salient turns <120s apart → the second turn's facts are dropped entirely |
| 8 | Consolidation prompt has no timestamps | The model can't reason about staleness/recency when resolving conflicts |
| 9 | No importance / decay / reinforcement | Re-mentioned facts aren't strengthened; trivia and key facts are treated identically |
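To make problems #1 to #3 concrete, here is a minimal sketch of the write rule as this issue describes it (substring containment, then "longer wins"). It is a reconstruction from the description, not the code in core/memory.py; the function name classify_write and the case-sensitive comparison are assumptions.

```python
def classify_write(existing: str, new: str) -> str:
    """Mimic the described heuristic for two rows that share the same subject.

    Returns "insert" (treated as a new fact), "update" (new replaces existing)
    or "skip" (new is discarded). Hypothetical helper, for illustration only.
    """
    # Step 1: substring containment in either direction counts as a duplicate.
    if existing in new or new in existing:
        # Step 2: "longer wins", so the row is replaced only if the new text is longer.
        return "update" if len(new) > len(existing) else "skip"
    # No containment: the candidate is stored as an additional row.
    return "insert"


# Problem 1: a semantic duplicate is not detected and becomes a second row.
semantic_dup = classify_write("Allergic to shellfish", "can't eat prawns")

# Problem 2: a contradiction is stored next to the fact it contradicts.
contradiction = classify_write("uses a standing desk", "switched back to sitting")

# Problem 3: a shorter restatement never replaces a longer existing row.
shorter = classify_write("Prefers green tea in the morning", "Prefers green tea")
```

Under this rule the first two candidates are inserted as extra rows and the third is skipped, which is why length and substring overlap are poor proxies for correctness or recency.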
Research — state of the art
Modern agent-memory systems (Mem0, A-MEM, Letta/MemGPT, Zep/Graphiti) converge on a single unified update pipeline: for each candidate fact, retrieve the top-k semantically similar existing memories (vector store), then a single LLM call decides ADD / UPDATE / DELETE / NOOP — explicitly handling refinement and contradiction. Conflict resolution classifies memory pairs as compatible / contradictory / subsumes / subsumed. Read-time retrieval is relevance-ranked (Generative Agents: recency + importance + relevance), not "inject everything". Forgetting is a first-class operation (importance/decay; biological forgetting à la FadeMem). Memory is treated as a managed lifecycle rather than an append-only log.
Implementation plan
Tier 1 — Unified ADD/UPDATE/DELETE/NOOP update pipeline (no new dependencies)
Goal: fix the write-path correctness gaps (#1–#4, #8) without adding an embedding model.
Add long_term to FTS5 (or a content/subject LIKE/trigram pre-filter if FTS5 is undesirable) for cheap lexical candidate retrieval. FTS5 ships with SQLite — no new dependency.
New method MemoryStore.update_memory(llm, model, candidate):
Retrieve top-k existing long-term memories lexically similar to the candidate (subject + content), including id, content, category, created_at/updated_at.
One LLM call returns one operation: ADD (new fact) / UPDATE {id} (refine or correct) / DELETE {id} (contradicted, no replacement) / NOOP (duplicate or trivial).
Apply the operation to long_term.
Route both extraction's LONG_TERM writes and consolidation's promotions through update_memory, deleting the divergent substring logic in _store_long_term and the bespoke promotion loop in _run_consolidation_llm. Single source of truth for long-term writes → consistent compaction quality (fixes #4).
Include created_at/updated_at in the decision prompt so the LLM can prefer recent facts on conflict (fixes #8); drop the len()-based update heuristic (fixes #3).
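The Tier 1 flow (lexical retrieval, one decision, apply) could look roughly like the sketch below. It assumes an SQLite build with FTS5 enabled and uses a simplified long_term schema; Operation, open_db, retrieve_similar, apply_operation and the decide callback are hypothetical names. decide stands in for the single LLM call, so this free function does not match the proposed MemoryStore.update_memory(llm, model, candidate) signature.

```python
import re
import sqlite3
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Operation:
    kind: str                        # "ADD" | "UPDATE" | "DELETE" | "NOOP"
    target_id: Optional[int] = None  # existing row id, for UPDATE / DELETE
    content: Optional[str] = None    # new text, for ADD / UPDATE


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE long_term (
            id INTEGER PRIMARY KEY,
            subject TEXT NOT NULL,
            content TEXT NOT NULL,
            updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        -- Plain FTS5 index; its rowid mirrors long_term.id (kept in sync by hand).
        CREATE VIRTUAL TABLE long_term_fts USING fts5(subject, content);
    """)
    return db


def retrieve_similar(db: sqlite3.Connection, text: str, k: int = 5) -> list:
    """Return up to k rows sharing at least one token with `text`."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return []
    query = " OR ".join(f'"{t}"' for t in tokens)  # quoted terms, OR-combined
    return db.execute(
        """SELECT id, subject, content, updated_at FROM long_term
           WHERE id IN (SELECT rowid FROM long_term_fts
                        WHERE long_term_fts MATCH ? ORDER BY rank LIMIT ?)""",
        (query, k),
    ).fetchall()


def apply_operation(db: sqlite3.Connection, subject: str, op: Operation) -> None:
    if op.kind == "ADD":
        cur = db.execute(
            "INSERT INTO long_term(subject, content) VALUES (?, ?)",
            (subject, op.content))
        db.execute(
            "INSERT INTO long_term_fts(rowid, subject, content) VALUES (?, ?, ?)",
            (cur.lastrowid, subject, op.content))
    elif op.kind == "UPDATE":
        db.execute(
            "UPDATE long_term SET content = ?, updated_at = CURRENT_TIMESTAMP "
            "WHERE id = ?", (op.content, op.target_id))
        db.execute("UPDATE long_term_fts SET content = ? WHERE rowid = ?",
                   (op.content, op.target_id))
    elif op.kind == "DELETE":
        db.execute("DELETE FROM long_term WHERE id = ?", (op.target_id,))
        db.execute("DELETE FROM long_term_fts WHERE rowid = ?", (op.target_id,))
    elif op.kind != "NOOP":
        raise ValueError(f"unknown operation: {op.kind}")
    db.commit()


def update_memory(db: sqlite3.Connection, subject: str, content: str,
                  decide: Callable[[str, list], Operation], k: int = 5) -> Operation:
    """Retrieve lexically similar rows, ask `decide` for one operation, apply it."""
    similar = retrieve_similar(db, f"{subject} {content}", k)
    op = decide(content, similar)  # stand-in for the LLM decision call
    apply_operation(db, subject, op)
    return op
```

In real use, decide would format the candidate and the retrieved rows (including their timestamps) into the decision prompt and parse the model's reply into an Operation. Keeping it as a callback also makes the write path testable without a model.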
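Tier 2 — Embeddings (semantic similarity + relevance-ranked injection)
Add an embedding backend: local model (fastembed/MiniLM, consistent with the self-hosted, container-bundled approach) or a provider embeddings API. Make it configurable like the other background models.
Store vectors alongside long_term (sqlite-vec, or a blob column + brute-force cosine in Python, which is trivial at <1k rows).
Use embeddings for candidate retrieval in the Tier-1 pipeline (replaces/augments FTS5).
format_for_prompt: inject the top-k long-term memories relevant to the current message instead of all 50 (fixes #5). Requires passing the inbound message into prompt building for the memory section.
Tier 3 — Forgetting / importance / reinforcement
Add importance (LLM- or rule-assigned), last_accessed, access_count to long_term.
Reinforce on use (update last_accessed/access_count); compute a Generative-Agents-style retrieval score = recency + importance + relevance.
Forget via an archived flag rather than hard delete.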
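As a rough illustration of the blob-column + brute-force cosine option, combined with a recency + importance + relevance score in the spirit of Generative Agents, consider the sketch below. It uses only the standard library and hand-written toy vectors in place of a real embedding model. The Memory fields, the equal weighting and the 0.995-per-hour decay are assumptions, not part of this proposal.

```python
import math
from array import array
from dataclasses import dataclass
from typing import List, Sequence


def to_blob(vec: Sequence[float]) -> bytes:
    """Pack a vector as float32 bytes, suitable for an SQLite BLOB column."""
    return array("f", vec).tobytes()


def from_blob(blob: bytes) -> List[float]:
    vec = array("f")
    vec.frombytes(blob)
    return list(vec)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Memory:
    content: str
    embedding: bytes           # float32 blob, as it would be stored
    importance: float          # assumed to be normalised to 0..1
    hours_since_access: float  # derived from a last_accessed column


def retrieval_score(mem: Memory, query_vec: Sequence[float],
                    decay: float = 0.995) -> float:
    recency = decay ** mem.hours_since_access  # exponential decay per hour
    relevance = cosine(from_blob(mem.embedding), query_vec)
    return recency + mem.importance + relevance  # equal weights (assumption)


def top_k(memories: List[Memory], query_vec: Sequence[float],
          k: int = 3) -> List[Memory]:
    """Brute-force ranking: score every memory, keep the k best."""
    return sorted(memories, key=lambda m: retrieval_score(m, query_vec),
                  reverse=True)[:k]
```

With a few hundred rows, a linear scan like this is likely cheap enough that a vector index can wait. The ranking is what format_for_prompt would consume in place of the updated_at DESC cut-off.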
Tier 4 — Long-term hygiene pass
Periodic pass (in consolidate_and_cleanup) that clusters near-duplicate long-term rows, merges them, and drops contradictions keeping the most recent (fixes #6). Reuse the conflict classification from Tier 1.
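One possible shape for the hygiene pass is a greedy, newest-first clustering that keeps the most recent row of each cluster. In the sketch below, difflib's string ratio is only a stand-in for the semantic similarity and conflict classification that the real pass would use. The row layout, the 0.8 threshold and the function names are assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Row = Tuple[int, str, str]  # (id, content, updated_at as an ISO-8601 string)


def near_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Lexical stand-in for a semantic near-duplicate check."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def hygiene_pass(rows: List[Row]) -> Tuple[List[int], List[int]]:
    """Return (ids to keep, ids to drop), keeping the newest row per cluster."""
    clusters: List[List[Row]] = []
    # Newest first, so the first row of every cluster is its most recent member.
    for row in sorted(rows, key=lambda r: r[2], reverse=True):
        for cluster in clusters:
            if near_duplicate(cluster[0][1], row[1]):
                cluster.append(row)
                break
        else:
            clusters.append([row])
    keep = [cluster[0][0] for cluster in clusters]
    drop = [row[0] for cluster in clusters for row in cluster[1:]]
    return keep, drop
```

A real pass would probably merge cluster members through the LLM rather than simply dropping the older rows, and could archive instead of delete.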
Smaller fixes (fold into Tier 1)
Normalise subject (lowercase/canonicalise) in code, not only via prompt instruction.
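A minimal sketch of in-code subject normalisation follows. The alias table is hypothetical and only mirrors the "matteo" / "the user" example from the problem list; the real canonical forms would depend on how subjects are used elsewhere in the store.

```python
import re
import unicodedata

# Hypothetical alias table mapping subject variants to one canonical key.
SUBJECT_ALIASES = {
    "the user": "user",
    "matteo": "user",
}


def normalise_subject(raw: str) -> str:
    """Canonicalise a subject string before it is used as a lookup key."""
    text = unicodedata.normalize("NFKC", raw).casefold().strip()
    text = re.sub(r"\s+", " ", text)  # collapse internal whitespace runs
    return SUBJECT_ALIASES.get(text, text)
```

Applying this on both the write and the lookup side means the exact-subject comparison no longer depends on the model emitting a consistent spelling.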
Suggested sequencing
Tier 1 first (highest correctness-per-effort, no new deps), then Tier 2 once memory volume justifies semantic retrieval, then Tiers 3–4 as separate PRs.