
feat(memory): unified update pipeline + embeddings + forgetting + hygiene (closes #5)#7

Merged: mattmezza merged 7 commits into main from feat/memory-unified-update-pipeline on Jun 7, 2026

Conversation

@mattmezza mattmezza commented Jun 7, 2026

Closes #5 — full implementation across all four tiers.

Tier 1 — Unified ADD/UPDATE/DELETE/NOOP write pipeline

MemoryStore.update_memory() replaces the previous dedup logic (exact-subject match, substring match, and "longer wins"). For each candidate it retrieves the most similar existing long-term memories, then a single LLM call decides ADD / UPDATE / DELETE / NOOP. The prompt includes created_at/updated_at so that recent facts win on conflict. Both extraction's LONG_TERM writes and consolidation's promotions route through it (single source of truth). Turns skipped by the cooldown are buffered and replayed into the next extraction instead of being dropped, and subjects are normalised in code. Fixes #1 #2 #3 #4 #7 #8.
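To illustrate the "malformed reply is a safe no-op" and "invalid id is rejected" behaviour that the Tier 1 tests cover, here is a minimal sketch of validating the LLM's decision before applying it. The JSON shape (`action`, `id`, `content`) and the names `Decision` and `parse_decision` are assumptions made for this example, not the actual code in this PR.

```python
import json
from dataclasses import dataclass
from typing import Optional

VALID_ACTIONS = {"ADD", "UPDATE", "DELETE", "NOOP"}


@dataclass
class Decision:
    action: str
    target_id: Optional[int] = None
    content: Optional[str] = None


def parse_decision(raw: str, candidate_ids: set[int]) -> Decision:
    """Parse the model's reply; anything malformed degrades to NOOP."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return Decision("NOOP")
    if not isinstance(data, dict):
        return Decision("NOOP")

    action = str(data.get("action", "")).upper()
    if action not in VALID_ACTIONS:
        return Decision("NOOP")

    target = data.get("id")
    content = data.get("content")

    # UPDATE/DELETE may only touch a memory that was actually retrieved
    # and shown to the model (hypothetical rule for this sketch).
    if action in {"UPDATE", "DELETE"}:
        if not isinstance(target, int) or target not in candidate_ids:
            return Decision("NOOP")

    # ADD/UPDATE need non-empty text to write.
    if action in {"ADD", "UPDATE"}:
        if not (isinstance(content, str) and content.strip()):
            return Decision("NOOP")

    if action == "NOOP":
        return Decision("NOOP")
    return Decision(action, target if action != "ADD" else None, content)
```

Restricting `id` to the retrieved candidates means a hallucinated id cannot delete an unrelated memory.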

Tier 2 — Embeddings (optional) + relevance-ranked injection

New core/embeddings.py wraps an OpenAI-compatible /embeddings endpoint; vectors stored as packed float32 blobs, compared with brute-force cosine in Python (no native extension — identical local/container, trivial at <1k rows). When enabled, update_memory retrieves candidates by cosine (per-row lexical fallback), and prompt injection becomes relevance-ranked (relevance + importance + recency) over only injection_top_k memories instead of dumping the recent N — the inbound message is threaded through as the query. Off by default; the system runs on Tier-1 lexical retrieval with no new dependency. Fixes #5.
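A rough sketch of the storage and ranking idea described above: vectors packed as float32 blobs, brute-force cosine in pure Python, and a combined relevance + importance + recency score. The weights, the half-life decay, the row layout, and the function names other than `unpack_vector` are assumptions for illustration only.

```python
import math
import struct


def pack_vector(vec):
    """Pack a list of floats into a little-endian float32 blob."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_vector(blob):
    """Inverse of pack_vector: 4 bytes per float32 component."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine(a, b):
    """Cosine similarity; 0.0 on dimension mismatch or a zero vector."""
    if len(a) != len(b):
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_for_injection(query_vec, rows, now, top_k=3, half_life_days=30.0):
    """Return the ids of the top_k rows by a combined score.

    Each row is a dict with id, embedding (blob), importance (0..1) and
    updated_at (epoch seconds). The weights below are made up.
    """
    scored = []
    for row in rows:
        relevance = cosine(query_vec, unpack_vector(row["embedding"]))
        age_days = max(0.0, (now - row["updated_at"]) / 86400.0)
        recency = 0.5 ** (age_days / half_life_days)
        score = relevance + 0.3 * row["importance"] + 0.2 * recency
        scored.append((score, row["id"]))
    scored.sort(reverse=True)
    return [mem_id for _, mem_id in scored[:top_k]]
```

Only the `top_k` highest-scoring memories would be injected into the prompt, with the inbound message's embedding used as `query_vec`.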

Tier 3 — Forgetting / importance / reinforcement

long_term gains importance, last_accessed, access_count, archived. Recalled memories are reinforced; re-mentions raise importance; consolidation archives cold, low-importance, idle memories via a soft-delete flag so long-term stops growing unbounded. Fixes #9.
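The lifecycle above could look roughly like the following sketch. The thresholds, the boost size, and the reading of "cold" as a low access count are assumptions; the real knobs live under archive_* and default_importance in the config.

```python
from dataclasses import dataclass

DAY = 86400.0


@dataclass
class Memory:
    id: int
    importance: float
    last_accessed: float  # epoch seconds
    access_count: int = 0
    archived: bool = False


def on_recall(mem: Memory, now: float) -> None:
    """Reinforce a memory that was injected into a prompt."""
    mem.access_count += 1
    mem.last_accessed = now


def on_rementioned(mem: Memory, boost: float = 0.1) -> None:
    """Raise importance when the same fact comes up again (capped at 1.0)."""
    mem.importance = min(1.0, mem.importance + boost)


def archive_cold(memories, now, max_importance=0.3, idle_days=60,
                 max_access_count=2):
    """Soft-delete memories that are cold, low-importance and idle.

    Returns the ids that were archived. Rows are flagged, never removed.
    """
    archived_ids = []
    for mem in memories:
        if mem.archived:
            continue
        idle = (now - mem.last_accessed) / DAY
        if (mem.importance <= max_importance
                and idle >= idle_days
                and mem.access_count <= max_access_count):
            mem.archived = True
            archived_ids.append(mem.id)
    return archived_ids
```

Because archiving is a flag rather than a DELETE, an archived row can be restored or inspected later.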

Tier 4 — Long-term hygiene pass

Consolidation clusters near-duplicate long-term rows (embedding or lexical) and resolves each cluster with one LLM call — merging duplicates, dropping contradictions, keeping the most recent. Fixes #6.
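As an illustration of the clustering step on the lexical path, the sketch below groups rows whose token overlap exceeds a threshold using union-find. The Jaccard measure, the 0.6 threshold, and the function names are assumptions; the PR's own similarity function may differ. Each returned cluster would then be sent to the LLM in one call.

```python
import re
from itertools import combinations


def _tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Jaccard overlap of lowercase word tokens."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster_near_duplicates(rows, threshold=0.6):
    """Group (id, text) rows into clusters of near-duplicates.

    Two rows end up in the same cluster if they are linked by a chain of
    pairs whose similarity is >= threshold. Singletons are dropped.
    """
    parent = {mem_id: mem_id for mem_id, _ in rows}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (id_a, text_a), (id_b, text_b) in combinations(rows, 2):
        if similarity(text_a, text_b) >= threshold:
            parent[find(id_a)] = find(id_b)

    groups = {}
    for mem_id, _ in rows:
        groups.setdefault(find(mem_id), []).append(mem_id)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

The pairwise loop is quadratic in the number of rows, which is acceptable at the small table sizes this PR targets.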

Portability & migration

Pure stdlib + aiosqlite + standard SQL; openai imported lazily only when embeddings are enabled. New columns are added to existing data/memory.db via an in-place ALTER TABLE migration in _ensure_schema — no manual upgrade. Verified identical on local env and the container SQLite build (no FTS5 / sqlite-vec / native extensions).
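For readers unfamiliar with this style of migration, here is a minimal sketch of an idempotent "add missing columns" step. It uses the synchronous stdlib sqlite3 module instead of aiosqlite to stay self-contained, and the column types and defaults are assumptions, not the actual schema in _ensure_schema.

```python
import sqlite3

# Hypothetical declarations; the real types/defaults may differ.
NEW_COLUMNS = {
    "importance": "REAL NOT NULL DEFAULT 0.5",
    "last_accessed": "TEXT",
    "access_count": "INTEGER NOT NULL DEFAULT 0",
    "archived": "INTEGER NOT NULL DEFAULT 0",
}


def ensure_columns(conn: sqlite3.Connection, table: str = "long_term"):
    """Add any missing columns in place; safe to call on every startup."""
    # PRAGMA table_info returns one row per column; index 1 is the name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, decl in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {decl}")
            added.append(name)
    conn.commit()
    return added
```

Existing rows pick up the declared defaults, so a legacy database needs no manual upgrade step.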

Config

All knobs under the memory section (see config.yml.example): embedding.{enabled,provider,model,injection_top_k,…}, default_importance, archive_*, hygiene_*.

Tests

test_memory_update_pipeline.py (Tier 1) and test_memory_tiers.py (Tiers 2–4: vector helpers, embedding retrieval, relevance injection + reinforcement, archiving, hygiene merge incl. malformed-plan safety, legacy migration) plus updated extraction/consolidation tests. Full suite: 291 passed.

Sequencing note: the issue suggested shipping tiers as separate PRs; this PR delivers all four together but keeps them cleanly separated by commit.

mattmezza added 4 commits June 7, 2026 22:13
Replace exact-subject + substring + longer-wins dedup with a single
unified update path: retrieve top-k lexically similar long-term memories,
then one LLM call decides ADD/UPDATE/DELETE/NOOP. Handles semantic
duplicates, refinements, and contradictions, and includes timestamps so
the model can prefer recent facts on conflict.

Both extraction's LONG_TERM writes and consolidation's promotions now
route through update_memory (single source of truth). Lexical retrieval
is dependency-free Python token overlap (no FTS5/embeddings), portable
across local and container SQLite builds. Subjects are normalised in
code. Cooldown-skipped turns are buffered and replayed into the next
extraction instead of being dropped.
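A dependency-free token-overlap retrieval like the one described in this commit might look like the sketch below. Scoring by Jaccard overlap and the helper names are assumptions for illustration; the commit only states that retrieval is pure-Python token overlap.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(a: str, b: str) -> float:
    """Jaccard overlap between token sets; 0.0 if either side is empty."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def top_k_similar(candidate: str, memories, k: int = 5):
    """Return ids of the k memories most similar to the candidate text.

    memories is a list of (id, text) pairs. Rows with zero overlap are
    skipped; ties are broken by the lower id.
    """
    scored = [(overlap(candidate, text), mem_id) for mem_id, text in memories]
    scored = [(score, mem_id) for score, mem_id in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [mem_id for _, mem_id in scored[:k]]
```

The returned ids are the candidates that would be shown to the LLM alongside the new fact.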

Add test_memory_update_pipeline.py: ADD/UPDATE/DELETE/NOOP decisions,
malformed/exception safe no-op, invalid-id rejection, subject
normalisation, timestamp injection, and lexical retrieval ranking.
Update extraction tests for the new write path and add cooldown
turn-buffering tests. Document the unified write pipeline in pa.md.
…Tiers 2-4)

Tier 2 — optional semantic retrieval. New core/embeddings.py wraps an
OpenAI-compatible /embeddings endpoint; vectors are stored as packed
float32 blobs and compared with brute-force cosine in Python (no native
extension, identical local/container). When enabled, update_memory
retrieves candidates by cosine (lexical fallback per row), and prompt
injection becomes relevance-ranked (relevance + importance + recency)
over only injection_top_k memories instead of dumping the recent N. The
inbound message is threaded through prompt building as the query. Off by
default — the pipeline still runs on Tier-1 lexical retrieval.

Tier 3 — forgetting/importance/reinforcement. long_term gains importance,
last_accessed, access_count, archived. Recalled memories are reinforced;
re-mentions raise importance; consolidation archives cold low-importance
idle memories via a soft-delete flag so long-term stops growing unbounded.

Tier 4 — hygiene pass. Consolidation clusters near-duplicate long-term
rows and resolves each cluster with one LLM call that merges duplicates
and drops contradictions, keeping the most recent fact.

New columns are added to existing DBs via in-place ALTER TABLE migration
in _ensure_schema; no manual upgrade step. All knobs configurable under
the memory config section.
… pipeline

Add test_memory_tiers.py: vector helpers, embedding-backed retrieval and
relevance-ranked injection with reinforcement, cold-memory archiving,
importance reinforcement on update, hygiene clustering/merge (incl.
malformed-plan safety), consolidation summary keys, and in-place
migration of a legacy long_term table. Update the session-prompt test
stub for the new query kwarg. Document Tiers 2-4 in pa.md and add the
embedding/forgetting/hygiene knobs to config.yml.example.
@mattmezza mattmezza changed the title from "feat(memory): unified ADD/UPDATE/DELETE/NOOP write pipeline (#5 Tier 1)" to "feat(memory): unified update pipeline + embeddings + forgetting + hygiene (closes #5)" on Jun 7, 2026
mattmezza added 3 commits June 7, 2026 23:04
…ocker

Add a local embedding backend so semantic memory needs no API key and keeps
data on the box. core/embeddings.py gains LocalEmbeddingClient (fastembed,
ONNX/CPU, BAAI/bge-small-en-v1.5) that loads the model lazily in a worker
thread, plus prefetch_local_model() and a 'python -m core.embeddings
prefetch' CLI.

Embeddings now default ON with provider=local. The Dockerfile prefetches the
model into /app/models (outside the data volume, chowned to the runtime user)
so the image is self-contained and works offline. Verified: after the
build-time prefetch, loading and embedding succeed with HF_HUB_OFFLINE set,
and related texts score a higher cosine than unrelated ones.

_build_embedder branches local vs OpenAI-compatible API; remote keys still
fall back to the matching agent provider key. fastembed added to deps/lock
(onnxruntime 1.26 has cp314 wheels).
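The "load lazily in a worker thread" pattern from this commit can be sketched as follows. The class name `LazyLocalEmbedder` and the injected `loader` callable are hypothetical; in the real LocalEmbeddingClient the loader would presumably construct the fastembed model, which is stubbed out here so the example runs without that dependency.

```python
import asyncio
import threading


class LazyLocalEmbedder:
    """Defers model loading until first use and keeps it off the event loop."""

    def __init__(self, loader):
        # loader: zero-arg callable returning an object with .embed(texts).
        self._loader = loader
        self._model = None
        self._lock = threading.Lock()

    def _get_model(self):
        # The lock prevents two worker threads from loading the model twice.
        with self._lock:
            if self._model is None:
                self._model = self._loader()
            return self._model

    def _embed_sync(self, texts):
        model = self._get_model()
        return [list(vec) for vec in model.embed(texts)]

    async def embed(self, texts):
        # Both the (slow) first load and the CPU-bound inference run in a
        # worker thread, so the event loop stays responsive.
        return await asyncio.to_thread(self._embed_sync, texts)
```

Constructing the client is therefore cheap; the model download or load cost is paid on the first `embed` call unless the model was prefetched.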
…lainers

Add a Semantic memory (embeddings) card and a Memory lifecycle (forgetting &
hygiene) card to the admin Memory tab: enable toggle, backend (local/openai/
google), model, top-k, importance/archive/hygiene knobs, plus collapsible
'how it works (the science)' sections. Wire status/prefetch/test endpoints
(/memory/embedding/*) and a Download-model button; changes apply live to the
running agent (embedder rebuilt lazily, lifecycle knobs refreshed in
patch_config). Refactor the memory partial render into a shared helper so the
post-delete refresh keeps the full config.

Tests: local backend lazy-load + prefetch (fastembed stubbed), embedding
status/prefetch/test endpoints, memory partial render, and a config_store
roundtrip proving UI-saved memory.embedding.* keys rebuild EmbeddingConfig.
Docs updated (pa.md). Full suite: 304 passed.

Replace the pure-Python cosine loop with numpy. unpack_vector returns an
ndarray (np.frombuffer), and a new cosine_to_matrix() scores a query against
all candidate vectors in one normalised matmul. get_relevant_long_term and
_retrieve_similar_long_term now batch via _batch_relevance (rows whose stored
vector matches the query dim use cosine; the rest fall back to lexical).

Measured at 384-dim: per-message retrieval over 1k memories 65ms -> 2.4ms
(~28x), 5k 330ms -> 17ms. Hygiene pairwise similarly faster. numpy is
already present (fastembed/onnxruntime) and is now a declared dependency.

Also fixes ndarray truthiness in _pair_similarity (is not None + shape check).
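A sketch of what a batched scorer in the spirit of cosine_to_matrix() could look like, scoring one query against all candidates in a single matrix-vector product. The zero-vector handling and exact signature are assumptions; only the function names and the use of np.frombuffer come from the commit message.

```python
import numpy as np


def unpack_vector(blob: bytes) -> np.ndarray:
    """View a packed float32 blob as a 1-D array (no copy)."""
    return np.frombuffer(blob, dtype=np.float32)


def cosine_to_matrix(query, matrix) -> np.ndarray:
    """Cosine similarity of query (d,) against each row of matrix (n, d).

    Rows (or a query) with zero norm score 0.0 instead of dividing by zero.
    """
    q = np.asarray(query, dtype=np.float32)
    m = np.asarray(matrix, dtype=np.float32)
    denom = np.linalg.norm(m, axis=1) * np.linalg.norm(q)
    scores = np.zeros(m.shape[0], dtype=np.float32)
    nonzero = denom > 0
    scores[nonzero] = (m[nonzero] @ q) / denom[nonzero]
    return scores


def has_vector(vec, dim: int) -> bool:
    """Explicit check; `if vec:` raises ValueError for multi-element arrays."""
    return vec is not None and vec.shape == (dim,)
```

The `has_vector` helper shows the kind of `is not None` plus shape check the truthiness fix refers to, since an ndarray with more than one element cannot be used directly in a boolean context.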
@mattmezza mattmezza merged commit a10000d into main Jun 7, 2026
1 check passed
@mattmezza mattmezza deleted the feat/memory-unified-update-pipeline branch June 7, 2026 21:13
Development

Successfully merging this pull request may close these issues.

Improve memory consolidation: unified ADD/UPDATE/DELETE/NOOP pipeline + semantic dedup

1 participant