12 changes: 0 additions & 12 deletions .env.template
@@ -18,15 +18,6 @@ COSMOS_DB_SUMMARIES_CONTAINER="memories_summaries"
COSMOS_DB_TURNS_CONTAINER="memories_turns"
COSMOS_DB_COUNTERS_CONTAINER=counter
COSMOS_DB_LEASE_CONTAINER=leases
# Throughput mode for all required Cosmos DB containers created by the toolkit
# (memories, counter, and lease).
# - serverless: default. The toolkit does not send container RU/s settings.
# Use this only with a Cosmos DB account configured for serverless.
# - autoscale: the toolkit provisions all required containers with autoscale
# throughput using COSMOS_DB_AUTOSCALE_MAX_RU as the max RU/s cap.
# Default max RU/s is 1000.
COSMOS_DB_THROUGHPUT_MODE=serverless
COSMOS_DB_AUTOSCALE_MAX_RU=1000

# ---- Processing thresholds (set to 0 to disable) ----
THREAD_SUMMARY_EVERY_N=10
@@ -53,9 +44,6 @@ AI_FOUNDRY_ENDPOINT=https://<your-account>.openai.azure.com/
AI_FOUNDRY_API_KEY=
AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME=text-embedding-3-large
AI_FOUNDRY_EMBEDDING_DIMENSIONS=1536
AI_FOUNDRY_EMBEDDING_DATA_TYPE=float32
AI_FOUNDRY_EMBEDDING_DISTANCE_FUNCTION=cosine
COSMOS_DB_FULL_TEXT_LANGUAGE=en-US

AI_FOUNDRY_CHAT_DEPLOYMENT_NAME=<your-model-deployment>
# Optional. Pin the Azure OpenAI REST API version used by chat and embeddings
40 changes: 33 additions & 7 deletions Docs/concepts.md
@@ -116,26 +116,52 @@ Prompts for summarization and fact extraction live in `azure_functions/prompts/`

## Memory Reconciliation

The `reconcile_memories(user_id, n=50)` pipeline step reads up to N most-recent active facts for a user and asks the LLM to identify two orthogonal outcomes in one pass:
Reconciliation runs in **two complementary tiers**: a cheap, LLM-free **vector-floor dedup ladder** applied to freshly-extracted memories before they persist, and a periodic **LLM reconcile** that runs in a **dual mode** (cheap candidate clusters most sweeps, a full-pool backstop occasionally).

- **Duplicates** — two or more facts that restate the same claim in different words. Resolution: collapse into one merged fact; the originals are soft-deleted with `supersede_reason="duplicate"` and `superseded_by` set to the merged fact's id.
- **Contradictions** — two facts that assert opposing claims about the same subject. Resolution: keep the winner (more recent first, higher confidence as tiebreaker), soft-delete the loser with `supersede_reason="contradict"` and `superseded_by` set to the winner.
### Vector-floor dedup ladder (write path, LLM-free)

### Why one pass
Between extraction and persist, `dedup_extracted_memories` compares each new fact/episodic memory against the user's existing active memories of the same type using Cosmos `VectorDistance` (pure vector, no hybrid). Each new memory takes one rung of a similarity ladder:

Detecting contradictions semantically requires the LLM to see the candidate pool as a whole — paraphrased ("user prefers aisle seats") and contradictory ("user is vegetarian" vs "user loves steak") facts often have very different embedding vectors and would never co-occur in any cosine cluster. Putting all N candidates into one prompt lets the LLM do the semantic reasoning across both axes simultaneously. The pipeline returns `{"kept": int, "merged": int, "contradicted": int}`.
| band | condition (cosine) | action |
|------|--------------------|--------|
| exact | `content_hash` hit | skip (Stage 0, free) |
| near-exact | `s ≥ DEDUP_SIM_HIGH` (0.97) | **auto-skip** the new memory (no LLM); logged for audit |
| borderline | `DEDUP_SIM_LOW ≤ s < DEDUP_SIM_HIGH` (0.80–0.97) | persist, tag `sys:dup-candidate` + stash `dup_of`/`dup_score` for the LLM reconcile |
| novel | `s < DEDUP_SIM_LOW` | persist clean |

The thresholds are calibrated for **cosine/dotproduct** on normalized embeddings. On a container whose `distanceFunction` is **euclidean**, the destructive near-exact auto-skip is **disabled** (one-shot warning) and those memories fall through to borderline tagging so the LLM adjudicates — euclidean distances aren't a bounded [0,1] similarity and would mis-fire the cosine-tuned drop.
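
A minimal sketch of the ladder above, assuming the similarity score `s` has already been computed. The function name `classify_band` and its parameters are hypothetical, not the toolkit's API; only the threshold names and values come from the table. How the real code maps a euclidean score is not shown in this document, so the euclidean branch only models the documented fall-through to borderline tagging.

```python
# Threshold values taken from the ladder table above.
DEDUP_SIM_HIGH = 0.97
DEDUP_SIM_LOW = 0.80


def classify_band(similarity, *, hash_hit=False, distance_function="cosine"):
    """Return the ladder band for one freshly extracted memory.

    Hypothetical helper: `similarity` is the best score against the user's
    existing active memories of the same type.
    """
    if hash_hit:
        # Stage 0: content_hash match, skipped before any vector work.
        return "exact"
    if similarity >= DEDUP_SIM_HIGH:
        if distance_function == "euclidean":
            # Auto-skip is disabled on euclidean containers; the memory is
            # tagged instead so the LLM reconcile adjudicates it.
            return "borderline"
        return "near-exact"
    if similarity >= DEDUP_SIM_LOW:
        return "borderline"
    return "novel"
```

Under these assumptions, `classify_band(0.85)` lands in the borderline band, so the memory would be persisted with the `sys:dup-candidate` tag rather than dropped.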

### Dual-mode LLM reconcile

`reconcile_memories(user_id, n=50, *, memory_type="fact", full_rebuild=False)` identifies two orthogonal outcomes:

- **Duplicates** — facts restating the same claim. Resolution: collapse into one merged fact; originals soft-deleted with `supersede_reason="duplicate"` and `superseded_by` set to the merged fact.
- **Contradictions** — facts asserting opposing claims about the same subject. Resolution: keep the winner (more recent first, higher confidence as tiebreaker), soft-delete the loser with `supersede_reason="contradict"`.

It runs in one of two modes:

- **Candidate mode** (default auto sweeps) — builds connected-component clusters from the `sys:dup-candidate` seeds + their vector neighbors (edge threshold `DEDUP_CLUSTER_SIM`, 0.60) and sends **only those clusters** to the LLM. Cheap, but keyed on near-duplicate similarity. Tagged seeds that never join a cluster have their stale tag cleared so they aren't re-scanned forever.
- **Full-pool backstop** — every `DEDUP_FULL_RECLUSTER_EVERY_N`-th sweep (default 12), and on any explicit `reconcile(full_rebuild=True)`, the **entire** active pool goes into one LLM pass. This is the only path that catches **dissimilar contradictions** — paraphrased ("prefers aisle seats") and contradictory ("vegetarian" vs "loves steak") facts have very different embedding vectors and would never co-occur in a cosine cluster, so candidate mode alone can't link them.

Both modes return `{"kept": int, "merged": int, "contradicted": int}`. In-process and durable backends reconcile **both** facts and episodic memories so episodic duplicates don't accrue forever.
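
The candidate-mode clustering described above can be sketched with a small union-find. This is an illustration under stated assumptions, not the toolkit's implementation: `cluster_candidates` is a hypothetical name, and the input shape (a list of `(id_a, id_b, similarity)` edges between tagged seeds and their vector neighbors) is assumed. Only the edge threshold `DEDUP_CLUSTER_SIM` (0.60) comes from the text.

```python
DEDUP_CLUSTER_SIM = 0.60  # edge threshold quoted in the text above


def cluster_candidates(edges, seeds, threshold=DEDUP_CLUSTER_SIM):
    """Group memory ids into connected components (hypothetical helper).

    Returns (clusters, stale_seeds): clusters with at least two members,
    and the seeds that joined no cluster and can have their tag cleared.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for seed in seeds:
        find(seed)  # register every seed, even if it ends up isolated
    for a, b, sim in edges:
        if sim >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)

    clusters = [members for members in groups.values() if len(members) > 1]
    clustered = set().union(*clusters)
    stale_seeds = set(seeds) - clustered
    return clusters, stale_seeds
```

Only the returned clusters would be sent to the LLM; a seed left in `stale_seeds` corresponds to the "stale tag cleared" case.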

### Loser preservation

Soft-deleted facts stay in the container with their `supersede_reason`, `superseded_at`, and `superseded_by` fields populated. Default reads (`get_memories`, `search_cosmos`) filter them out via `superseded_by IS NULL`. To inspect the audit trail (e.g. "show everything that ever applied to this user"), opt out of the filter at the query level.
Soft-deleted facts stay in the container with their `supersede_reason`, `superseded_at`, and `superseded_by` fields populated. Default reads (`get_memories`, `search_cosmos`) filter them out via `superseded_by IS NULL`. To inspect the audit trail, opt out of the filter at the query level.

### Write-time exact dedup

Each fact written by `extract_memories` carries a `content_hash` (SHA-256 of normalized content, truncated to 32 hex chars; lowercase, whitespace-collapsed). Before upserting a freshly-extracted fact, the pipeline checks the hash against existing active facts and short-circuits if a match exists, incrementing the `exact_dedup_skipped` metric. This catches identical re-extractions cheaply without an LLM call.
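
A minimal sketch of such a hash, assuming normalization means trimming, lowercasing, and collapsing whitespace runs to a single space. The exact normalization steps and the helper name `content_hash_of` are assumptions; the text above only specifies SHA-256 truncated to 32 hex characters.

```python
import hashlib
import re


def content_hash_of(content: str) -> str:
    """Hypothetical helper: SHA-256 of normalized content, first 32 hex chars."""
    # Assumed normalization: trim, lowercase, collapse whitespace runs.
    normalized = re.sub(r"\s+", " ", content.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:32]
```

With this normalization, `"User  likes\tTea "` and `"user likes tea"` hash to the same value, which is the case the exact-dedup check is meant to catch.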

### Extraction watermark (`recent_k`)

The auto-trigger paths size `recent_k` (how many recent turns extraction reads) from a per-thread **watermark** (`last_extract_count` on the counter doc): `recent_k = current_count − last_extract_count`, with `last_extract_count` treated as `0` before the first successful extract. The newest `recent_k` turns are exactly the turns added since the last successful extract, and the watermark advances **only after a successful extract**. Under normal operation, therefore, no turns are skipped when extraction lags or transiently fails: a failed run leaves the watermark unchanged, and the full backlog is retried on the next sweep. The window is deliberately **not** capped by `DEDUP_POOL_SIZE` (that knob governs the reconcile prompt, not the extraction window), because capping would extract only the newest N turns and silently strand the oldest backlog turns.
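
The watermark arithmetic can be sketched as follows. This is an illustration only: the counter is modeled as a plain dict, the `count` field name and the `run_extraction` helper are hypothetical, and `last_extract_count` is the only field name taken from the text.

```python
from typing import Callable, Optional


def recent_k(current_count: int, last_extract_count: Optional[int]) -> int:
    """Turns added since the last successful extract (missing watermark = 0)."""
    return max(current_count - (last_extract_count or 0), 0)


def run_extraction(counter: dict, extract: Callable[[int], None]) -> int:
    """Hypothetical driver: extract the backlog, then advance the watermark."""
    snapshot = counter["count"]  # assumed field holding the turn count
    k = recent_k(snapshot, counter.get("last_extract_count"))
    if k == 0:
        return 0
    extract(k)  # if this raises, the watermark below is never written
    counter["last_extract_count"] = snapshot
    return k
```

Because the watermark is written only after `extract` returns, a failed run leaves the backlog intact and the next sweep sees the same (or a larger) `recent_k`.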

> **Caveat (rare):** the SDK's inline counter increment is best-effort — under sustained optimistic-concurrency contention it can drop an increment rather than block the user's write path (see `increment_counter_sync`). A dropped increment leaves `current_count` lagging the true turn count, which can in turn under-cover a later extraction window. This is the one case where the "no turns skipped" property does not hold; the Function App backend avoids it by raising to force change-feed redelivery.

### Tunable

`DEDUP_EVERY_N` (default 5) controls how often `reconcile_memories` runs in the auto-trigger path. Set to `0` to disable. The candidate cap `n` (default 50) is tunable per call; larger values give the LLM a wider view at higher token cost.
`DEDUP_EVERY_N` (default 5) controls how often reconcile runs in the auto-trigger path. Set to `0` to disable. The candidate cap `n` (default `DEDUP_POOL_SIZE`, 50) is tunable per call; larger values give the LLM a wider view at higher token cost. `DEDUP_FULL_RECLUSTER_EVERY_N` (default 12) sets how often the full-pool backstop fires.
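
How the two cadence knobs might interact can be sketched as a mode selector. The function `choose_reconcile_mode`, the 1-based sweep counter, and the treatment of `0` as "never run the backstop" are assumptions made for illustration; only the default of 12 and the `full_rebuild` override come from the text.

```python
DEDUP_FULL_RECLUSTER_EVERY_N = 12  # default quoted above


def choose_reconcile_mode(sweep_number, *, full_rebuild=False,
                          every_n=DEDUP_FULL_RECLUSTER_EVERY_N):
    """Hypothetical selector: 'full' for the backstop, else 'candidate'.

    `sweep_number` is assumed to be a 1-based count of reconcile sweeps.
    """
    if full_rebuild:
        return "full"  # an explicit rebuild always takes the full-pool path
    # Assumption: every_n <= 0 disables the periodic backstop.
    if every_n > 0 and sweep_number % every_n == 0:
        return "full"
    return "candidate"
```

With the defaults, sweeps 1 through 11 would run in candidate mode and sweep 12 would send the whole active pool to the LLM.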

> **Indexing note.** The reconcile pool query orders by `created_at` (matching the prompt's "more recent first" tiebreaker). Cosmos's default indexing policy includes every property, so this works out of the box. If you customize the indexing policy to reduce write RU, ensure `/created_at/?` remains indexed or the query will fail with a 400 (`Order-by over a non-indexed path`).

1 change: 0 additions & 1 deletion Docs/design_patterns.md
@@ -170,7 +170,6 @@ facts = await mem.search_cosmos(
results = await mem.search_cosmos(
search_terms="PostgreSQL to Cosmos DB",
user_id="user-1",
hybrid_search=True,
top_k=5,
)
```
4 changes: 2 additions & 2 deletions Docs/public_api.md
@@ -37,7 +37,7 @@

### Retrieval

- `search_cosmos(search_terms, memory_id=None, user_id=None, role=None, memory_types=None, thread_id=None, hybrid_search=False, top_k=5, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — vector or hybrid search memories.
- `search_cosmos(search_terms, memory_id=None, user_id=None, role=None, memory_types=None, thread_id=None, top_k=5, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — hybrid vector/full-text search memories, falling back to vector-only for all-stopword queries.
- `get_procedural_prompt(user_id) -> Optional[str]` — read the active procedural prompt.
- `get_procedural_history(user_id, limit=10) -> list[dict]` — read procedural prompt history.
- `get_procedural_memories(user_id, priority=None, category=None, min_salience=None, include_superseded=False) -> list[dict]` — retrieve procedural memory documents.
@@ -90,7 +90,7 @@ Local-buffer methods remain synchronous in-memory operations; Cosmos, retrieval,

### Retrieval

- `async search_cosmos(search_terms, memory_id=None, user_id=None, role=None, memory_types=None, thread_id=None, hybrid_search=False, top_k=5, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — vector or hybrid search memories.
- `async search_cosmos(search_terms, memory_id=None, user_id=None, role=None, memory_types=None, thread_id=None, top_k=5, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — hybrid vector/full-text search memories, falling back to vector-only for all-stopword queries.
- `async get_procedural_prompt(user_id) -> Optional[str]` — read the active procedural prompt.
- `async get_procedural_history(user_id, limit=10) -> list[dict]` — read procedural prompt history.
- `async get_procedural_memories(user_id, priority=None, category=None, min_salience=None, include_superseded=False) -> list[dict]` — retrieve procedural memory documents.
10 changes: 2 additions & 8 deletions Docs/troubleshooting.md
@@ -50,15 +50,11 @@ COSMOS_DB_DATABASE=ai_memory
COSMOS_DB_MEMORIES_CONTAINER=memories
COSMOS_DB_COUNTERS_CONTAINER=counter
COSMOS_DB_LEASE_CONTAINER=leases
COSMOS_DB_THROUGHPUT_MODE=serverless
COSMOS_DB_AUTOSCALE_MAX_RU=1000

AI_FOUNDRY_ENDPOINT=https://<account>.openai.azure.com/
AI_FOUNDRY_API_KEY=
AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME=text-embedding-3-large
AI_FOUNDRY_EMBEDDING_DIMENSIONS=1536
AI_FOUNDRY_EMBEDDING_DATA_TYPE=float32
AI_FOUNDRY_EMBEDDING_DISTANCE_FUNCTION=cosine
AI_FOUNDRY_CHAT_DEPLOYMENT_NAME=<chat-deployment-name>
```

@@ -77,8 +73,6 @@ The notebooks and samples pass these values into the client like this:
| `AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME` | `embedding_deployment_name` |
| `AI_FOUNDRY_CHAT_DEPLOYMENT_NAME` | `chat_deployment_name` |

`AI_FOUNDRY_EMBEDDING_DIMENSIONS`, `AI_FOUNDRY_EMBEDDING_DATA_TYPE`, and `AI_FOUNDRY_EMBEDDING_DISTANCE_FUNCTION` are read by the toolkit when creating the Cosmos DB vector policy. The Function App also reads `COSMOS_DB__accountEndpoint` for its identity-based Cosmos DB trigger binding; set it to the same value as `COSMOS_DB_ENDPOINT`.

Run `az login` before using `DefaultAzureCredential`.

Required roles:
@@ -104,7 +98,7 @@ The memories container is created with:

If vector or full-text search fails after changing dimensions or indexing settings, create a fresh container with the desired configuration. Cosmos container vector policies are creation-time infrastructure choices.

Use `COSMOS_DB_THROUGHPUT_MODE=serverless` for the default setup. Use `autoscale` with `COSMOS_DB_AUTOSCALE_MAX_RU` when you need provisioned autoscale throughput.
Pass `cosmos_throughput_mode="serverless"` (the default) when creating the client. Use `cosmos_throughput_mode="autoscale"` with `cosmos_autoscale_max_ru` when you need provisioned autoscale throughput.

---

@@ -117,7 +111,7 @@ Embedding failures usually mean one of these is wrong:
- `AI_FOUNDRY_EMBEDDING_DIMENSIONS`
- Azure OpenAI / AI Services RBAC

For hybrid search, `search_terms` is required when `hybrid_search=True`.
Search always uses hybrid vector/full-text ranking when keyword extraction finds terms; all-stopword queries fall back to vector-only ranking.

If search returns documents but scores look poor, check that records have an `embedding` field and that the query uses similar language to the stored memory content.

13 changes: 7 additions & 6 deletions Samples/Advanced/advanced_search_patterns.py
@@ -89,10 +89,11 @@ def seed_memories(mem: CosmosMemoryClient, user_id: str, thread_id: str) -> None
# ---------------------------------------------------------------------------

def vector_search(mem: CosmosMemoryClient, user_id: str) -> None:
"""Pattern 1 — Pure vector (semantic similarity) search."""
print_header("1. Vector Search (semantic similarity)")
"""Pattern 1 — Semantic-style query (natural language, low keyword overlap)."""
print_header("1. Semantic Search (natural-language query)")
print(" Query: 'outdoor activities'")
print(" Finds semantically related memories even without exact keyword matches.\n")
print(" Hybrid ranking leans on embedding similarity when there are few exact")
print(" keyword matches, so semantically related memories still surface.\n")

results = mem.search_cosmos(
search_terms="outdoor activities",
@@ -103,15 +104,15 @@ def vector_search(mem: CosmosMemoryClient, user_id: str) -> None:


def hybrid_search(mem: CosmosMemoryClient, user_id: str) -> None:
"""Pattern 2 — Hybrid search (vector + full-text)."""
"""Pattern 2 — Hybrid search (vector + full-text) is the default."""
print_header("2. Hybrid Search (vector + full-text)")
print(" Query: 'hiking trails Pacific Northwest'")
print(" Combines embedding similarity with BM25 keyword matching.\n")
print(" Every search_cosmos call fuses embedding similarity with BM25 keyword")
print(" matching automatically — no flag required.\n")

results = mem.search_cosmos(
search_terms="hiking trails Pacific Northwest",
user_id=user_id,
hybrid_search=True,
top_k=5,
)
print_results(results)
2 changes: 1 addition & 1 deletion Samples/Notebooks/Demo_async.ipynb
@@ -872,7 +872,7 @@
"results_search_async = await memory.search_cosmos(\n",
" search_terms=\"What did the user ask about the weather?\",\n",
" user_id=USER_ID,\n",
" top_k=3, hybrid_search= True\n",
" top_k=3\n",
")"
]
},