feat(memory): unified update pipeline + embeddings + forgetting + hygiene (closes #5) by mattmezza · Pull Request #7 · mattmezza/mpa

mattmezza · 2026-06-07T20:17:01Z

Closes #5 — full implementation across all four tiers.

Tier 1 — Unified ADD/UPDATE/DELETE/NOOP write pipeline

MemoryStore.update_memory() replaces the exact-subject + substring + "longer wins" dedup. For each candidate it retrieves the most similar existing long-term memories, then one LLM call decides ADD / UPDATE / DELETE / NOOP, with created_at/updated_at in the prompt so recent facts win on conflict. Both extraction's LONG_TERM writes and consolidation's promotions route through it (single source of truth). Cooldown-skipped turns are buffered + replayed; subjects normalised in code. Fixes #1 #2 #3 #4 #7 #8.
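A minimal sketch of how the model's reply could be validated before it touches the store, assuming the LLM returns a small JSON object. The `Decision` type, the `parse_decision` helper, and the `action`/`id`/`content` keys are hypothetical names for illustration, not the PR's actual schema. The point is that anything malformed, or any id outside the retrieved candidates, degrades to NOOP.

```python
import json
from dataclasses import dataclass
from typing import Optional

VALID_ACTIONS = {"ADD", "UPDATE", "DELETE", "NOOP"}


@dataclass(frozen=True)
class Decision:
    action: str
    target_id: Optional[int] = None
    content: Optional[str] = None


NOOP = Decision("NOOP")


def parse_decision(raw: str, candidate_ids: set) -> Decision:
    """Parse the model's reply; anything malformed degrades to NOOP."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return NOOP
    if not isinstance(data, dict):
        return NOOP

    action = str(data.get("action", "")).upper()
    if action not in VALID_ACTIONS:
        return NOOP

    target = data.get("id")
    content = data.get("content")

    # UPDATE/DELETE may only touch rows that were shown to the model.
    if action in {"UPDATE", "DELETE"}:
        if not isinstance(target, int) or target not in candidate_ids:
            return NOOP
    # ADD/UPDATE need non-empty text to write.
    if action in {"ADD", "UPDATE"}:
        if not isinstance(content, str) or not content.strip():
            return NOOP

    return Decision(
        action,
        target if action in {"UPDATE", "DELETE"} else None,
        content if action in {"ADD", "UPDATE"} else None,
    )
```

Restricting ids to the retrieved candidate set means a hallucinated id cannot delete an unrelated memory.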

Tier 2 — Embeddings (optional) + relevance-ranked injection

New core/embeddings.py wraps an OpenAI-compatible /embeddings endpoint; vectors stored as packed float32 blobs, compared with brute-force cosine in Python (no native extension — identical local/container, trivial at <1k rows). When enabled, update_memory retrieves candidates by cosine (per-row lexical fallback), and prompt injection becomes relevance-ranked (relevance + importance + recency) over only injection_top_k memories instead of dumping the recent N — the inbound message is threaded through as the query. Off by default; the system runs on Tier-1 lexical retrieval with no new dependency. Fixes #5.
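For illustration, packed float32 storage and brute-force cosine can be done with the standard library alone. This is a sketch under assumptions: the function names and the little-endian layout are chosen here and are not confirmed to match core/embeddings.py.

```python
import math
import struct


def pack_vector(vec):
    """Serialise a sequence of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_vector(blob):
    """Inverse of pack_vector: 4 bytes per component."""
    n = len(blob) // 4
    return list(struct.unpack(f"<{n}f", blob))


def cosine(a, b):
    """Cosine similarity; 0.0 for mismatched dimensions or zero vectors."""
    if len(a) != len(b):
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A 384-dimensional vector packs to 1536 bytes, which fits comfortably in a plain SQLite BLOB column.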

Tier 3 — Forgetting / importance / reinforcement

long_term gains importance, last_accessed, access_count, archived. Recalled memories are reinforced; re-mentions raise importance; consolidation archives cold, low-importance, idle memories via a soft-delete flag so long-term stops growing unbounded. Fixes #9.
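The lifecycle rules above could look roughly like the sketch below. The field names follow the new columns. The thresholds, the 0..1 importance scale, the split into recall versus re-mention helpers, and the helper names themselves are assumptions for illustration; the real values live under the archive_* config knobs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Memory:
    id: int
    importance: float          # assumed to be on a 0..1 scale
    last_accessed: datetime
    access_count: int = 0
    archived: bool = False


def reinforce_on_recall(mem: Memory, now: datetime) -> None:
    """A recalled memory counts as recently used."""
    mem.access_count += 1
    mem.last_accessed = now


def reinforce_on_mention(mem: Memory, boost: float = 0.1) -> None:
    """A re-mentioned fact becomes more important, capped at 1.0."""
    mem.importance = min(1.0, mem.importance + boost)


def should_archive(
    mem: Memory,
    now: datetime,
    max_importance: float = 0.3,   # hypothetical threshold
    idle_days: int = 90,           # hypothetical threshold
    max_access_count: int = 2,     # hypothetical threshold
) -> bool:
    """Archive only memories that are cold, low-importance, and idle."""
    idle = now - mem.last_accessed
    return (
        not mem.archived
        and mem.importance <= max_importance
        and mem.access_count <= max_access_count
        and idle >= timedelta(days=idle_days)
    )
```

Because archiving is a flag rather than a DELETE, an archived row can be restored by clearing `archived`.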

Tier 4 — Long-term hygiene pass

Consolidation clusters near-duplicate long-term rows (embedding or lexical) and resolves each cluster with one LLM call — merging duplicates, dropping contradictions, keeping the most recent. Fixes #6.
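One way to form the clusters is to link every pair above a similarity threshold and take connected components. The sketch below uses a small union-find for that. The function name, the threshold, and the pluggable `similarity` callable (embedding cosine or lexical overlap) are assumptions, and the PR's clustering may differ.

```python
from typing import Callable, Sequence


def cluster_near_duplicates(
    items: Sequence[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.8,  # hypothetical default
) -> list:
    """Return groups of indices whose items are transitively similar."""
    parent = list(range(len(items)))

    def find(i: int) -> int:
        # Path halving keeps the trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if similarity(items[i], items[j]) >= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups = {}
    for i in range(len(items)):
        groups.setdefault(find(i), []).append(i)
    # Singletons need no LLM call, so only real clusters are returned.
    return [group for group in groups.values() if len(group) > 1]
```

Each returned group would then be handed to a single LLM call that decides which rows to merge or drop.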

Portability & migration

Pure stdlib + aiosqlite + standard SQL; openai imported lazily only when embeddings are enabled. New columns are added to existing data/memory.db via an in-place ALTER TABLE migration in _ensure_schema — no manual upgrade. Verified identical on local env and the container SQLite build (no FTS5 / sqlite-vec / native extensions).
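An idempotent in-place migration of this kind can be sketched as follows. It uses the synchronous stdlib sqlite3 module rather than aiosqlite to keep the example self-contained. The column types and defaults are assumptions, and `ensure_columns` is a hypothetical stand-in for the relevant part of `_ensure_schema`.

```python
import sqlite3

# Column declarations are illustrative; the real types/defaults may differ.
NEW_COLUMNS = {
    "importance": "REAL NOT NULL DEFAULT 0.5",
    "last_accessed": "TEXT",
    "access_count": "INTEGER NOT NULL DEFAULT 0",
    "archived": "INTEGER NOT NULL DEFAULT 0",
}


def ensure_columns(conn: sqlite3.Connection, table: str = "long_term") -> list:
    """Add any missing columns; safe to run on every startup."""
    # PRAGMA table_info yields one row per column; index 1 is the name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, declaration in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {declaration}")
            added.append(name)
    conn.commit()
    return added
```

SQLite only accepts a NOT NULL column in ADD COLUMN when a non-NULL default is supplied, which is why each such column carries one; existing rows pick up the default automatically.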

Config

All knobs under the memory section (see config.yml.example): embedding.{enabled,provider,model,injection_top_k,…}, default_importance, archive_*, hygiene_*.

Tests

test_memory_update_pipeline.py (Tier 1) and test_memory_tiers.py (Tiers 2–4: vector helpers, embedding retrieval, relevance injection + reinforcement, archiving, hygiene merge incl. malformed-plan safety, legacy migration) plus updated extraction/consolidation tests. Full suite: 291 passed.

Sequencing note: the issue suggested shipping tiers as separate PRs; this PR delivers all four together but keeps them cleanly separated by commit.

Replace exact-subject + substring + longer-wins dedup with a single unified update path: retrieve top-k lexically similar long-term memories, then one LLM call decides ADD/UPDATE/DELETE/NOOP. Handles semantic duplicates, refinements, and contradictions, and includes timestamps so the model can prefer recent facts on conflict. Both extraction's LONG_TERM writes and consolidation's promotions now route through update_memory (single source of truth). Lexical retrieval is dependency-free Python token overlap (no FTS5/embeddings), portable across local and container SQLite builds. Subjects are normalised in code. Cooldown-skipped turns are buffered and replayed into the next extraction instead of being dropped.
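Dependency-free token-overlap retrieval might look like the sketch below. The tokeniser, the Jaccard scoring, and the function names are assumptions; the commit only states that retrieval is Python token overlap, not which overlap measure it uses.

```python
import re

_TOKEN = re.compile(r"[a-z0-9]+")


def tokens(text: str) -> set:
    """Lower-case alphanumeric tokens as a set."""
    return set(_TOKEN.findall(text.lower()))


def overlap(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def top_k_similar(query: str, memories: list, k: int = 5) -> list:
    """Return up to k (score, memory) pairs, best first, skipping zero scores."""
    scored = [(overlap(query, memory), memory) for memory in memories]
    scored = [pair for pair in scored if pair[0] > 0.0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Only the top-k survivors are placed in the prompt for the ADD/UPDATE/DELETE/NOOP call, so prompt size stays bounded as the store grows.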

Add test_memory_update_pipeline.py: ADD/UPDATE/DELETE/NOOP decisions, malformed/exception safe no-op, invalid-id rejection, subject normalisation, timestamp injection, and lexical retrieval ranking. Update extraction tests for the new write path and add cooldown turn-buffering tests. Document the unified write pipeline in pa.md.

…Tiers 2-4)

Tier 2: optional semantic retrieval. New core/embeddings.py wraps an OpenAI-compatible /embeddings endpoint; vectors are stored as packed float32 blobs and compared with brute-force cosine in Python (no native extension, identical local/container). When enabled, update_memory retrieves candidates by cosine (lexical fallback per row), and prompt injection becomes relevance-ranked (relevance + importance + recency) over only injection_top_k memories instead of dumping the recent N. The inbound message is threaded through prompt building as the query. Off by default; the pipeline still runs on Tier-1 lexical retrieval.

Tier 3: forgetting/importance/reinforcement. long_term gains importance, last_accessed, access_count, archived. Recalled memories are reinforced; re-mentions raise importance; consolidation archives cold low-importance idle memories via a soft-delete flag so long-term stops growing unbounded.

Tier 4: hygiene pass. Consolidation clusters near-duplicate long-term rows and resolves each cluster with one LLM call that merges duplicates and drops contradictions, keeping the most recent fact.

New columns are added to existing DBs via in-place ALTER TABLE migration in _ensure_schema; no manual upgrade step. All knobs configurable under the memory config section.
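The relevance-ranked injection from Tier 2 combines three signals into one score. A rough sketch, assuming a weighted sum and an exponential recency decay: the weights, the half-life, the row layout, and the function names are all hypothetical, since the commit only names the three signals.

```python
from datetime import datetime


def recency(updated_at: datetime, now: datetime, half_life_days: float = 30.0) -> float:
    """Exponential decay: 1.0 for a fresh row, 0.5 after one half-life."""
    age_days = max(0.0, (now - updated_at).total_seconds() / 86400)
    return 0.5 ** (age_days / half_life_days)


def rank_for_injection(
    rows: list,
    now: datetime,
    top_k: int = 3,
    w_relevance: float = 0.6,    # hypothetical weight
    w_importance: float = 0.25,  # hypothetical weight
    w_recency: float = 0.15,     # hypothetical weight
) -> list:
    """Keep the top_k rows by combined score instead of the most recent N."""

    def score(row: dict) -> float:
        return (
            w_relevance * row["relevance"]
            + w_importance * row["importance"]
            + w_recency * recency(row["updated_at"], now)
        )

    return sorted(rows, key=score, reverse=True)[:top_k]
```

Here `relevance` would be the cosine (or lexical) similarity between the inbound message and the memory, computed before ranking.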

… pipeline Add test_memory_tiers.py: vector helpers, embedding-backed retrieval and relevance-ranked injection with reinforcement, cold-memory archiving, importance reinforcement on update, hygiene clustering/merge (incl. malformed-plan safety), consolidation summary keys, and in-place migration of a legacy long_term table. Update the session-prompt test stub for the new query kwarg. Document Tiers 2-4 in pa.md and add the embedding/forgetting/hygiene knobs to config.yml.example.

…ocker Add a local embedding backend so semantic memory needs no API key and keeps data on the box. core/embeddings.py gains LocalEmbeddingClient (fastembed, ONNX/CPU, BAAI/bge-small-en-v1.5) that loads the model lazily in a worker thread, plus prefetch_local_model() and a 'python -m core.embeddings prefetch' CLI. Embeddings now default ON with provider=local. The Dockerfile prefetches the model into /app/models (outside the data volume, chowned to the runtime user) so the image is self-contained and works offline — verified: build-prefetch then HF_HUB_OFFLINE load + embed succeeds, related>unrelated cosine. _build_embedder branches local vs OpenAI-compatible API; remote keys still fall back to the matching agent provider key. fastembed added to deps/lock (onnxruntime 1.26 has cp314 wheels).
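The lazy, off-event-loop model load can be illustrated generically. The sketch below deliberately avoids calling fastembed: `loader` is a hypothetical stand-in for whatever constructs the real model, and `LazyModel` is not the PR's `LocalEmbeddingClient`. It only shows the pattern of deferring an expensive load to first use and running it in a worker thread.

```python
import asyncio
import threading


class LazyModel:
    """Defer an expensive model load until the first embed() call."""

    def __init__(self, loader):
        self._loader = loader      # callable returning a texts -> vectors function
        self._model = None
        self._lock = threading.Lock()

    def _get(self):
        # The lock ensures concurrent first calls load the model only once.
        with self._lock:
            if self._model is None:
                self._model = self._loader()
            return self._model

    async def embed(self, texts):
        def work():
            return self._get()(texts)

        # Load and inference both run in a worker thread, so the event
        # loop stays responsive while the model initialises.
        return await asyncio.to_thread(work)
```

Constructing `LazyModel` is cheap, so the agent can start immediately and pay the load cost only when embeddings are first needed.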

…lainers Add a Semantic memory (embeddings) card and a Memory lifecycle (forgetting & hygiene) card to the admin Memory tab: enable toggle, backend (local/openai/google), model, top-k, importance/archive/hygiene knobs, plus collapsible 'how it works (the science)' sections. Wire status/prefetch/test endpoints (/memory/embedding/*) and a Download-model button; changes apply live to the running agent (embedder rebuilt lazily, lifecycle knobs refreshed in patch_config). Refactor the memory partial render into a shared helper so the post-delete refresh keeps the full config. Tests: local backend lazy-load + prefetch (fastembed stubbed), embedding status/prefetch/test endpoints, memory partial render, and a config_store roundtrip proving UI-saved memory.embedding.* keys rebuild EmbeddingConfig. Docs updated (pa.md). Full suite: 304 passed.

Replace the pure-Python cosine loop with numpy. unpack_vector returns an ndarray (np.frombuffer), and a new cosine_to_matrix() scores a query against all candidate vectors in one normalised matmul. get_relevant_long_term and _retrieve_similar_long_term now batch via _batch_relevance (rows whose stored vector matches the query dim use cosine; the rest fall back to lexical). Measured at 384-dim: per-message retrieval over 1k memories 65ms -> 2.4ms (~28x), 5k 330ms -> 17ms. Hygiene pairwise similarly faster. numpy is already present (fastembed/onnxruntime) and is now a declared dependency. Also fixes ndarray truthiness in _pair_similarity (is not None + shape check).
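A sketch of the batched scoring described above, assuming candidate vectors are stacked row-wise into a 2-D array. The names `unpack_vector` and `cosine_to_matrix` come from the commit message, but the bodies here are an illustration and the real signatures may differ.

```python
import numpy as np


def unpack_vector(blob: bytes) -> np.ndarray:
    """View packed float32 bytes as an ndarray without copying."""
    return np.frombuffer(blob, dtype=np.float32)


def cosine_to_matrix(query, matrix) -> np.ndarray:
    """Cosine similarity of one query against every row of matrix."""
    q = np.asarray(query, dtype=np.float32)
    m = np.asarray(matrix, dtype=np.float32)   # shape (n_rows, dim)
    denom = np.linalg.norm(q) * np.linalg.norm(m, axis=1)
    # Zero vectors have a zero dot product, so dividing by 1.0 leaves
    # their score at 0.0 and avoids a division-by-zero warning.
    safe = np.where(denom > 0, denom, 1.0)
    return (m @ q) / safe
```

On the truthiness fix: an ndarray with more than one element raises ValueError when used in a boolean context such as `if vec:`, so presence has to be tested with `vec is not None` plus a shape check.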

mattmezza added 4 commits June 7, 2026 22:13

mattmezza changed the title ~~feat(memory): unified ADD/UPDATE/DELETE/NOOP write pipeline (#5 Tier 1)~~ feat(memory): unified update pipeline + embeddings + forgetting + hygiene (closes #5) Jun 7, 2026

mattmezza added 3 commits June 7, 2026 23:04

mattmezza merged commit a10000d into main Jun 7, 2026
1 check passed

mattmezza deleted the feat/memory-unified-update-pipeline branch June 7, 2026 21:13

mattmezza mentioned this pull request Jun 7, 2026

Improve memory consolidation: unified ADD/UPDATE/DELETE/NOOP pipeline + semantic dedup #5

Closed

mattmezza merged 7 commits into main from feat/memory-unified-update-pipeline
