docs: update README and docs to reflect memory system implementation
Update all documentation to cover the persistent memory system added in
Step 5 — SQLite storage, hybrid retrieval (vector + FTS with RRF),
OpenAI embeddings, rolling summarization, memory tools, and session store.
@@ -151,6 +162,25 @@ The runtime is the core agent loop that ties providers and tools together:
151
162
152
163
The turn timeout covers both provider streaming and tool execution as a single budget. Ctrl-C (SIGINT/SIGTERM) propagates cleanly into the runtime via context cancellation.
153
164
165
+
When the conversation approaches the context limit, the runtime triggers rolling summarization — it compacts older messages into a summary and keeps recent turns, so the agent can maintain context across long sessions.
166
+
167
+
## Memory
168
+
169
+
Persistent memory backed by SQLite via `modernc.org/sqlite` (pure Go, no CGO).
170
+
171
+
**Hybrid retrieval** combines two search strategies:
172
+
- **Vector search**: cosine similarity over OpenAI embeddings (semantic meaning)
- **Full-text search**: SQLite FTS5 with BM25 ranking (exact keywords)
- Results are merged using **Reciprocal Rank Fusion** with configurable weights (default 0.7 vector / 0.3 FTS)
175
+
176
+
**Graceful degradation** — if no `OPENAI_API_KEY` is set, the system falls back to FTS-only search. Memory is optional; the agent works without it.
177
+
178
+
**What gets persisted:**
179
+
- Memory chunks (knowledge stored by the agent or user)
180
+
- Conversation events (every message in the turn loop)
181
+
- Rolling summaries (context compaction across sessions)
182
+
- Session scratchpad (key-value state per session)
- Reads a file with 6-digit line numbers: ` 1\tpackage main`
@@ -459,9 +461,17 @@ The runtime classifies errors:
459
461
- Max turns reached → `ErrMaxTurns`
460
462
- Tool execution errors → placed in message content (the LLM sees them and can recover)
461
463
462
-
### Context budget
464
+
### Context budget and summarization
465
+
466
+
After each tool dispatch, the runtime estimates token usage (`totalChars / 4`) and checks if the session is approaching the context limit (`TriggerRatio * MaxContextTokens`). When triggered:
467
+
468
+
1. A `MinTurns` guard (default 6) prevents summarizing too-short conversations
469
+
2. The runtime builds a summarization prompt including the existing summary (if any) and the messages to compact
470
+
3. The LLM generates a rolling summary via a dedicated system prompt
471
+
4. The summary is stored in the `session_summaries` table with an incrementing epoch
472
+
5. `session.CompactWithSummary()` replaces older messages with a `[Conversation Summary]` pseudo-message, keeping the most recent turns
463
473
464
-
After each tool dispatch, the runtime estimates token usage (chars/4) and logs a warning if the session is approaching the context limit (`TriggerRatio * MaxContextTokens`). Actual summarization is deferred to Step 5 (Memory).
474
+
On session startup, if a prior summary exists, it's injected as the first messages so the agent has context from previous runs.
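
The trigger check described above can be sketched roughly as follows. This is a minimal illustration, not the runtime's actual code: the `Message` and `Budget` types are hypothetical, `len` counts bytes rather than characters, and treating `MinTurns` as a message count is an assumption.

```go
package main

import "fmt"

// Message is a hypothetical, minimal stand-in for a session message.
type Message struct {
	Role    string
	Content string
}

// Budget groups the settings named in the docs; the struct itself is an assumption.
type Budget struct {
	MaxContextTokens int
	TriggerRatio     float64
	MinTurns         int
}

// estimateTokens applies the totalChars / 4 heuristic (byte length used here).
func estimateTokens(msgs []Message) int {
	total := 0
	for _, m := range msgs {
		total += len(m.Content)
	}
	return total / 4
}

// shouldSummarize reports whether the estimate reaches
// TriggerRatio * MaxContextTokens and the MinTurns guard is satisfied.
func shouldSummarize(msgs []Message, b Budget) bool {
	if len(msgs) < b.MinTurns {
		return false // conversation too short to be worth compacting
	}
	threshold := b.TriggerRatio * float64(b.MaxContextTokens)
	return float64(estimateTokens(msgs)) >= threshold
}

func main() {
	b := Budget{MaxContextTokens: 100, TriggerRatio: 0.8, MinTurns: 6}
	msgs := make([]Message, 8)
	for i := range msgs {
		msgs[i] = Message{Role: "user", Content: string(make([]byte, 50))}
	}
	fmt.Println(estimateTokens(msgs), shouldSummarize(msgs, b))
}
```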
465
475
466
476
## How the pieces connect
467
477
@@ -482,10 +492,109 @@ yantra run "add error handling to server.go"
482
492
└── loop until text-only response or MaxTurns
483
493
```
484
494
495
+
## Layer 5: Memory (`internal/memory/`)
496
+
497
+
Memory gives the agent persistence — it can store knowledge, recall it later, and maintain context across sessions.
498
+
499
+
### Storage: SQLite (no CGO)
500
+
501
+
The memory system uses `modernc.org/sqlite`, a pure-Go SQLite implementation. No CGO means the binary cross-compiles trivially. The database opens with **WAL mode** and a 5-second busy timeout for concurrent access safety.
502
+
503
+
Schema (6 tables):
504
+
505
+
```
506
+
chunks — memory fragments with optional embedding BLOBs
session_summaries — rolling summary per session (with epoch counter)
511
+
scratchpads — key-value state per session
512
+
```
513
+
514
+
### Hybrid retrieval
515
+
516
+
Memory search combines two strategies and merges them with Reciprocal Rank Fusion (RRF):
517
+
518
+
```
Query: "how does authentication work?"
  │
  ├── Vector Search (weight: 0.7)
  │     Compute embedding → cosine similarity against all chunks
  │     Returns: semantically similar results
  │
  ├── FTS Search (weight: 0.3)
  │     SQLite FTS5 with BM25 ranking
  │     Returns: keyword-matching results
  │
  └── Reciprocal Rank Fusion (k=60)
        Merge + deduplicate by chunk ID
        Score: weight / (60 + rank) per source
        Return top K results
```
534
+
535
+
The system fetches `topK * 3` candidates from each source before fusion, ensuring good coverage. Weights are configurable — higher `VectorWeight` favors semantic matches, higher `FTSWeight` favors exact keyword matches.
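
As a rough illustration of the fusion step, the sketch below merges two ranked lists of chunk IDs with weighted RRF. The `fuse` function and `Scored` type are hypothetical, ranks are assumed to start at 1, and each input list is assumed to contain unique IDs.

```go
package main

import (
	"fmt"
	"sort"
)

// Scored pairs a chunk ID with its fused score.
type Scored struct {
	ID    string
	Score float64
}

const rrfK = 60

// fuse merges two ranked ID lists (best first). Each source contributes
// weight / (k + rank) to a chunk's score; chunks found by both sources
// accumulate both contributions, which also deduplicates by ID.
func fuse(vector, fts []string, vectorWeight, ftsWeight float64, topK int) []Scored {
	scores := map[string]float64{}
	add := func(ids []string, w float64) {
		for i, id := range ids {
			scores[id] += w / float64(rrfK+i+1)
		}
	}
	add(vector, vectorWeight)
	add(fts, ftsWeight)

	out := make([]Scored, 0, len(scores))
	for id, s := range scores {
		out = append(out, Scored{ID: id, Score: s})
	}
	// Highest score first; ties broken by ID for a stable order.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].ID < out[j].ID
	})
	if len(out) > topK {
		out = out[:topK]
	}
	return out
}

func main() {
	results := fuse([]string{"a", "b", "c"}, []string{"c", "a"}, 0.7, 0.3, 2)
	for _, r := range results {
		fmt.Printf("%s %.5f\n", r.ID, r.Score)
	}
}
```

In this example chunk `c` is ranked last by vector search but first by FTS, which lifts it above `b`, a chunk that only vector search returned.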
536
+
537
+
**Graceful degradation:**
538
+
- No `OPENAI_API_KEY` → FTS-only search (no embeddings)
- No memory DB → agent runs without memory, logs a warning
541
+
542
+
### Embeddings
543
+
544
+
Embeddings are computed via the OpenAI API and stored as compact little-endian binary BLOBs (4 bytes per float32 dimension), saving ~75% compared to JSON-encoded arrays.
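
The BLOB layout can be illustrated with a small round-trip. The function names below are hypothetical; only the encoding (little-endian, 4 bytes per float32) comes from the description above.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeEmbedding packs each float32 into 4 little-endian bytes.
func encodeEmbedding(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(buf[i*4:], math.Float32bits(f))
	}
	return buf
}

// decodeEmbedding reverses encodeEmbedding and rejects truncated blobs.
func decodeEmbedding(b []byte) ([]float32, error) {
	if len(b)%4 != 0 {
		return nil, fmt.Errorf("embedding blob length %d is not a multiple of 4", len(b))
	}
	v := make([]float32, len(b)/4)
	for i := range v {
		v[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[i*4:]))
	}
	return v, nil
}

func main() {
	blob := encodeEmbedding(make([]float32, 1536))
	fmt.Println(len(blob)) // 1536 dimensions * 4 bytes = 6144
}
```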
545
+
546
+
Supported models:
547
+
548
+
| Model | Dimensions |
|-------|-----------|
| `text-embedding-3-small` (default) | 1536 |
| `text-embedding-3-large` | 3072 |
| `text-embedding-ada-002` | 1536 |
553
+
554
+
The factory returns `nil` (not an error) when the API key is missing, so the system can always boot.
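
A sketch of that pattern, assuming a hypothetical `Embedder` interface and `newEmbedder` factory (the real names and signatures are not shown in this diff): the factory returns a nil interface when the key is empty, and callers choose the search mode by checking for nil.

```go
package main

import (
	"fmt"
	"os"
)

// Embedder is a hypothetical interface for anything that can embed text.
type Embedder interface {
	Embed(text string) ([]float32, error)
}

// openAIEmbedder is a placeholder; the API call is omitted in this sketch.
type openAIEmbedder struct{ apiKey, model string }

func (e *openAIEmbedder) Embed(text string) ([]float32, error) {
	return nil, fmt.Errorf("not implemented in this sketch")
}

// newEmbedder returns nil, not an error, when no API key is configured,
// so startup never fails because of a missing key.
func newEmbedder(apiKey, model string) Embedder {
	if apiKey == "" {
		return nil
	}
	return &openAIEmbedder{apiKey: apiKey, model: model}
}

// searchMode shows how a caller might pick a strategy from the result.
func searchMode(e Embedder) string {
	if e == nil {
		return "fts-only"
	}
	return "hybrid"
}

func main() {
	e := newEmbedder(os.Getenv("OPENAI_API_KEY"), "text-embedding-3-small")
	fmt.Println(searchMode(e))
}
```

Returning a literal `nil` from a function whose return type is the interface avoids the typed-nil pitfall, where a nil `*openAIEmbedder` wrapped in an interface would compare as non-nil.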
555
+
556
+
### Conversation persistence
557
+
558
+
Every message in the turn loop (user, assistant, tool results) is persisted via `StoreConversationEvent()`. Persistence runs under the turn context, including its deadline, so it respects the turn timeout. A failed write is logged as a warning and does not halt execution (best-effort, fire-and-forget).
559
+
560
+
### Memory tools
561
+
562
+
Two tools expose memory to the agent:
563
+
564
+
- **`memory_save`** (SideEffecting, 15s): stores knowledge with optional tags
- **`memory_search`** (ReadOnly, 15s): hybrid search with ranked results
566
+
567
+
These are conditionally registered only when a `MemoryRetrieval` instance is available.
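
Conditional registration might look like the sketch below. The `Registry` type, the `registerBuiltins` signature, and the `read_file` tool name are simplified assumptions for illustration; only the two memory tool names come from the docs.

```go
package main

import "fmt"

// MemoryRetrieval is a simplified stand-in for the interface in the types package.
type MemoryRetrieval interface {
	Search(query string, topK int) ([]string, error)
}

// fakeMemory is a trivial implementation used only for this demonstration.
type fakeMemory struct{}

func (fakeMemory) Search(query string, topK int) ([]string, error) { return nil, nil }

// Registry is a hypothetical tool registry that records tool names.
type Registry struct{ names []string }

func (r *Registry) Register(name string) { r.names = append(r.names, name) }

// registerBuiltins always registers the core tools and adds the memory
// tools only when a MemoryRetrieval instance was provided.
func registerBuiltins(r *Registry, mem MemoryRetrieval) {
	r.Register("read_file") // hypothetical stand-in for the always-on tools
	if mem == nil {
		return
	}
	r.Register("memory_save")
	r.Register("memory_search")
}

func main() {
	without, with := &Registry{}, &Registry{}
	registerBuiltins(without, nil)
	registerBuiltins(with, fakeMemory{})
	fmt.Println(without.names, with.names)
}
```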
568
+
569
+
### Session store
570
+
571
+
`SQLiteSessionStore` manages session lifecycle:
572
+
573
+
| Operation | Details |
|-----------|---------|
| Create | Generates `ses_<32 hex chars>` ID |
| Get | Single session by ID |
| List | All sessions, ordered by `updated_at DESC`, optionally including archived |
| Update | Name, message count, timestamps |
| Archive | Soft-delete (sets `archived = 1`) |
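
An ID in the `ses_<32 hex chars>` format can be produced from 16 random bytes. The sketch below assumes `crypto/rand` as the source; the docs specify only the format, not how the ID is generated.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newSessionID returns "ses_" followed by 32 hex characters.
func newSessionID() (string, error) {
	b := make([]byte, 16) // 16 random bytes encode to 32 hex characters
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return "ses_" + hex.EncodeToString(b), nil
}

func main() {
	id, err := newSessionID()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id)
}
```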
580
+
581
+
### Key patterns
582
+
583
+
**Interface-driven design** — the runtime and tools depend on `types.MemoryRetrieval` and `types.SessionStore` interfaces, not concrete types. Compile-time checks enforce this:
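
The usual Go idiom for such a check is a blank-identifier assignment. The sketch below uses hypothetical stand-in types, not the project's actual definitions.

```go
package main

import "fmt"

// MemoryRetrieval is a simplified stand-in for types.MemoryRetrieval.
type MemoryRetrieval interface {
	Search(query string, topK int) ([]string, error)
}

// sqliteMemory is a hypothetical concrete implementation.
type sqliteMemory struct{}

func (m *sqliteMemory) Search(query string, topK int) ([]string, error) {
	return nil, nil
}

// Compile-time assertion: the build fails if *sqliteMemory ever stops
// satisfying MemoryRetrieval. No value is allocated at runtime.
var _ MemoryRetrieval = (*sqliteMemory)(nil)

func main() {
	var m MemoryRetrieval = &sqliteMemory{}
	res, err := m.Search("auth", 5)
	fmt.Println(len(res), err)
}
```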
docs/config.md (23 additions, 9 deletions)
@@ -131,36 +131,50 @@ When the conversation history approaches the context limit (trigger_ratio × max
131
131
132
132
### runtime.summarization
133
133
134
-
Controls rolling summarization behavior.
134
+
Controls rolling summarization behavior. When the context budget is exceeded, the runtime calls the LLM to generate a summary of older messages, stores it in the database, and compacts the session.
135
135
136
136
```toml
[runtime.summarization]
target_ratio = 0.5  # Aim to reduce context to 50% after summarization
min_turns = 6       # Don't summarize conversations shorter than 6 turns
```
141
141
142
+
**How it works:** When triggered, the runtime builds a summarization prompt with the existing summary (if any) and messages to compact. The LLM generates a rolling summary, which is stored in `session_summaries` with an incrementing epoch. Older messages are replaced with a `[Conversation Summary]` pseudo-message. On session restart, the prior summary is injected as the opening context.
143
+
142
144
### memory
143
145
144
-
Persistent memory backed by a vector database.
146
+
Persistent memory backed by SQLite via `modernc.org/sqlite` (pure Go, no CGO).
# ollama_url = "http://localhost:11434" # For ollama backend (reserved, not yet implemented)
161
+
# ollama_model = "nomic-embed-text" # For ollama backend (reserved, not yet implemented)
156
162
157
163
[memory.retrieval]
158
-
top_k = 8  # Number of results to retrieve
164
+
top_k = 8  # Number of results to retrieve per search
159
165
vector_weight = 0.7  # Weight for vector similarity (0-1)
160
166
fts_weight = 0.3  # Weight for full-text search (0-1)
161
167
```
162
168
163
-
Memory uses hybrid retrieval — combining vector similarity search (semantic meaning) with full-text search (exact keyword matching). The weights control the balance. Higher vector_weight favors semantic matches; higher fts_weight favors exact matches.
169
+
Memory uses **hybrid retrieval** — combining vector similarity search (semantic meaning) with full-text search (exact keyword matching via SQLite FTS5). Results from both sources are merged using **Reciprocal Rank Fusion** (RRF) with configurable weights. Higher `vector_weight` favors semantic matches; higher `fts_weight` favors exact keyword matches.
170
+
171
+
**Graceful degradation:** If `OPENAI_API_KEY` is not set, the system falls back to FTS-only search (no vector embeddings). If the database fails to open, the agent continues without memory. Memory tools (`memory_save`, `memory_search`) are only registered when memory is available.
172
+
173
+
**Storage details:**
174
+
- Embeddings are stored as compact little-endian binary BLOBs (4 bytes per dimension)
175
+
- Conversation events are persisted per-session for history recall
176
+
- Rolling summaries are maintained with an epoch counter for context compaction
177
+
- Session scratchpads provide per-session key-value storage
| query | string | yes | — | Search query (used for both semantic and keyword matching) |
403
+
| top_k | integer | no | 5 | Maximum number of results to return |
404
+
405
+
**Output format:**
406
+
```
1. [score: 0.85] Content of the memory chunk
Tags: tag1, tag2

2. [score: 0.72] Another memory chunk
Tags: general
```
413
+
414
+
Returns `"No matching memories found."` when no results match.
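
A formatter producing this output might look like the sketch below. The `Result` type and `formatResults` function are hypothetical, and the exact whitespace of the real tool's output is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Result is a hypothetical search hit: fused score, chunk text, and tags.
type Result struct {
	Score   float64
	Content string
	Tags    []string
}

// formatResults renders ranked results as numbered entries separated by
// blank lines, or the fixed "no results" message when the list is empty.
func formatResults(results []Result) string {
	if len(results) == 0 {
		return "No matching memories found."
	}
	var sb strings.Builder
	for i, r := range results {
		if i > 0 {
			sb.WriteString("\n")
		}
		fmt.Fprintf(&sb, "%d. [score: %.2f] %s\n", i+1, r.Score, r.Content)
		if len(r.Tags) > 0 {
			fmt.Fprintf(&sb, "Tags: %s\n", strings.Join(r.Tags, ", "))
		}
	}
	return sb.String()
}

func main() {
	fmt.Print(formatResults([]Result{
		{Score: 0.85, Content: "Content of the memory chunk", Tags: []string{"tag1", "tag2"}},
		{Score: 0.72, Content: "Another memory chunk", Tags: []string{"general"}},
	}))
}
```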
415
+
416
+
**Safety tier:** ReadOnly (15s timeout)
417
+
418
+
**Note:** Both memory tools are conditionally registered — they only appear in the tool list when a `MemoryRetrieval` instance is provided to `RegisterBuiltins`. If memory is disabled or the database fails to open, the agent simply doesn't have these tools.
419
+
378
420
## Writing a custom tool
379
421
380
422
To add a new tool, implement the `Tool` interface: