feat(memory): unified update pipeline + embeddings + forgetting + hygiene (closes #5)#7
Merged
Merged
Conversation
Replace exact-subject + substring + longer-wins dedup with a single unified update path: retrieve top-k lexically similar long-term memories, then one LLM call decides ADD/UPDATE/DELETE/NOOP. Handles semantic duplicates, refinements, and contradictions, and includes timestamps so the model can prefer recent facts on conflict. Both extraction's LONG_TERM writes and consolidation's promotions now route through update_memory (single source of truth). Lexical retrieval is dependency-free Python token overlap (no FTS5/embeddings), portable across local and container SQLite builds. Subjects are normalised in code. Cooldown-skipped turns are buffered and replayed into the next extraction instead of being dropped.
Add test_memory_update_pipeline.py: ADD/UPDATE/DELETE/NOOP decisions, malformed/exception safe no-op, invalid-id rejection, subject normalisation, timestamp injection, and lexical retrieval ranking. Update extraction tests for the new write path and add cooldown turn-buffering tests. Document the unified write pipeline in pa.md.
…Tiers 2-4) Tier 2 — optional semantic retrieval. New core/embeddings.py wraps an OpenAI-compatible /embeddings endpoint; vectors are stored as packed float32 blobs and compared with brute-force cosine in Python (no native extension, identical local/container). When enabled, update_memory retrieves candidates by cosine (lexical fallback per row), and prompt injection becomes relevance-ranked (relevance + importance + recency) over only injection_top_k memories instead of dumping the recent N. The inbound message is threaded through prompt building as the query. Off by default — the pipeline still runs on Tier-1 lexical retrieval. Tier 3 — forgetting/importance/reinforcement. long_term gains importance, last_accessed, access_count, archived. Recalled memories are reinforced; re-mentions raise importance; consolidation archives cold low-importance idle memories via a soft-delete flag so long-term stops growing unbounded. Tier 4 — hygiene pass. Consolidation clusters near-duplicate long-term rows and resolves each cluster with one LLM call that merges duplicates and drops contradictions, keeping the most recent fact. New columns are added to existing DBs via in-place ALTER TABLE migration in _ensure_schema; no manual upgrade step. All knobs configurable under the memory config section.
… pipeline Add test_memory_tiers.py: vector helpers, embedding-backed retrieval and relevance-ranked injection with reinforcement, cold-memory archiving, importance reinforcement on update, hygiene clustering/merge (incl. malformed-plan safety), consolidation summary keys, and in-place migration of a legacy long_term table. Update the session-prompt test stub for the new query kwarg. Document Tiers 2-4 in pa.md and add the embedding/forgetting/hygiene knobs to config.yml.example.
…ocker Add a local embedding backend so semantic memory needs no API key and keeps data on the box. core/embeddings.py gains LocalEmbeddingClient (fastembed, ONNX/CPU, BAAI/bge-small-en-v1.5) that loads the model lazily in a worker thread, plus prefetch_local_model() and a 'python -m core.embeddings prefetch' CLI. Embeddings now default ON with provider=local. The Dockerfile prefetches the model into /app/models (outside the data volume, chowned to the runtime user) so the image is self-contained and works offline — verified: build-prefetch then HF_HUB_OFFLINE load + embed succeeds, related>unrelated cosine. _build_embedder branches local vs OpenAI-compatible API; remote keys still fall back to the matching agent provider key. fastembed added to deps/lock (onnxruntime 1.26 has cp314 wheels).
…lainers Add a Semantic memory (embeddings) card and a Memory lifecycle (forgetting & hygiene) card to the admin Memory tab: enable toggle, backend (local/openai/ google), model, top-k, importance/archive/hygiene knobs, plus collapsible 'how it works (the science)' sections. Wire status/prefetch/test endpoints (/memory/embedding/*) and a Download-model button; changes apply live to the running agent (embedder rebuilt lazily, lifecycle knobs refreshed in patch_config). Refactor the memory partial render into a shared helper so the post-delete refresh keeps the full config. Tests: local backend lazy-load + prefetch (fastembed stubbed), embedding status/prefetch/test endpoints, memory partial render, and a config_store roundtrip proving UI-saved memory.embedding.* keys rebuild EmbeddingConfig. Docs updated (pa.md). Full suite: 304 passed.
Replace the pure-Python cosine loop with numpy. unpack_vector returns an ndarray (np.frombuffer), and a new cosine_to_matrix() scores a query against all candidate vectors in one normalised matmul. get_relevant_long_term and _retrieve_similar_long_term now batch via _batch_relevance (rows whose stored vector matches the query dim use cosine; the rest fall back to lexical). Measured at 384-dim: per-message retrieval over 1k memories 65ms -> 2.4ms (~28x), 5k 330ms -> 17ms. Hygiene pairwise similarly faster. numpy is already present (fastembed/onnxruntime) and is now a declared dependency. Also fixes ndarray truthiness in _pair_similarity (is not None + shape check).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #5 — full implementation across all four tiers.
Tier 1 — Unified ADD/UPDATE/DELETE/NOOP write pipeline
MemoryStore.update_memory()replaces the exact-subject + substring + "longer wins" dedup. For each candidate it retrieves the most similar existing long-term memories, then one LLM call decides ADD / UPDATE / DELETE / NOOP, withcreated_at/updated_atin the prompt so recent facts win on conflict. Both extraction's LONG_TERM writes and consolidation's promotions route through it (single source of truth). Cooldown-skipped turns are buffered + replayed; subjects normalised in code. Fixes #1 #2 #3 #4 #7 #8.Tier 2 — Embeddings (optional) + relevance-ranked injection
New
core/embeddings.pywraps an OpenAI-compatible/embeddingsendpoint; vectors stored as packed float32 blobs, compared with brute-force cosine in Python (no native extension — identical local/container, trivial at <1k rows). When enabled,update_memoryretrieves candidates by cosine (per-row lexical fallback), and prompt injection becomes relevance-ranked (relevance + importance + recency) over onlyinjection_top_kmemories instead of dumping the recent N — the inbound message is threaded through as the query. Off by default; the system runs on Tier-1 lexical retrieval with no new dependency. Fixes #5.Tier 3 — Forgetting / importance / reinforcement
long_termgainsimportance,last_accessed,access_count,archived. Recalled memories are reinforced; re-mentions raise importance; consolidation archives cold, low-importance, idle memories via a soft-delete flag so long-term stops growing unbounded. Fixes #9.Tier 4 — Long-term hygiene pass
Consolidation clusters near-duplicate long-term rows (embedding or lexical) and resolves each cluster with one LLM call — merging duplicates, dropping contradictions, keeping the most recent. Fixes #6.
Portability & migration
Pure stdlib +
aiosqlite+ standard SQL;openaiimported lazily only when embeddings are enabled. New columns are added to existingdata/memory.dbvia an in-placeALTER TABLEmigration in_ensure_schema— no manual upgrade. Verified identical on local env and the container SQLite build (no FTS5 / sqlite-vec / native extensions).Config
All knobs under the
memorysection (seeconfig.yml.example):embedding.{enabled,provider,model,injection_top_k,…},default_importance,archive_*,hygiene_*.Tests
test_memory_update_pipeline.py(Tier 1) andtest_memory_tiers.py(Tiers 2–4: vector helpers, embedding retrieval, relevance injection + reinforcement, archiving, hygiene merge incl. malformed-plan safety, legacy migration) plus updated extraction/consolidation tests. Full suite: 291 passed.Sequencing note: the issue suggested shipping tiers as separate PRs; this PR delivers all four together but keeps them cleanly separated by commit.