Commit 8a8b0d2

docs: update README and docs to reflect memory system implementation
Update all documentation to cover the persistent memory system added in Step 5 — SQLite storage, hybrid retrieval (vector + FTS with RRF), OpenAI embeddings, rolling summarization, memory tools, and session store.
1 parent fb6a7d3 commit 8a8b0d2

4 files changed

Lines changed: 226 additions & 31 deletions

README.md

Lines changed: 46 additions & 16 deletions
Original file line number | Diff line number | Diff line change
@@ -20,14 +20,14 @@ Think of it as building your own Claude Code / Cursor agent from scratch.
2020
│ agent turn loop + session │
2121
│ stream → think → act → observe → loop │
2222
├──────────────┬──────────────┬───────────────┤
23-
│ Provider │ Tools │ Memory │
24-
│ Layer │ System │ (Step 5)
25-
│ │ │ (planned)
26-
│ OpenAI │ read_file │
27-
│ Anthropic │ write_file │
28-
│ Gemini │ list_files │
29-
│ │ shell_exec │
30-
│ │ web_fetch │
23+
│ Provider │ Tools │ Memory
24+
│ Layer │ System │ SQLite +
25+
│ │ │ Hybrid Search
26+
│ OpenAI │ read_file │ store/recall
27+
│ Anthropic │ write_file │ vector + FTS
28+
│ Gemini │ list_files │ embeddings
29+
│ │ shell_exec │ summaries
30+
│ │ web_fetch │ sessions
3131
├──────────────┴──────────────┴───────────────┤
3232
│ Types │
3333
│ interfaces, contracts, config │
@@ -81,7 +81,7 @@ internal/
8181
gemini.go Google Gemini GenerateContent
8282
reliable.go Retry wrapper with exponential backoff
8383
runtime/ Agent turn loop
84-
session.go In-memory conversation buffer
84+
session.go In-memory conversation buffer + compaction
8585
runtime.go AgentRuntime, Run(), stream accumulation, tool dispatch
8686
tool/ Tool system
8787
schema.go JSON Schema builder helpers
@@ -92,7 +92,16 @@ internal/
9292
list_files.go list_files tool
9393
shell_exec.go shell_exec tool
9494
web_fetch.go web_fetch tool
95+
memory_save.go memory_save tool
96+
memory_search.go memory_search tool
9597
builtin.go RegisterBuiltins() convenience
98+
memory/ Persistent memory system
99+
sqlite.go SQLite database wrapper + schema migrations
100+
store.go Memory store (CRUD + hybrid retrieval)
101+
retrieval.go Vector search, FTS search, RRF fusion
102+
session_store.go Session lifecycle management
103+
embedding.go Embedding backend factory
104+
embedding_openai.go OpenAI embedding integration
96105
```
97106

98107
## Configuration
@@ -117,13 +126,15 @@ All providers implement the same `Provider` interface. Swap between them with on
117126

118127
## Tools
119128

120-
| Tool | Safety tier | Timeout | What it does |
121-
|--------------|---------------|---------|---------------------------------------------|
122-
| `read_file` | ReadOnly | 10s | Read file with line numbers, offset, limit |
123-
| `write_file` | SideEffecting | 10s | Write/append to file, auto-creates dirs |
124-
| `list_files` | ReadOnly | 10s | List directory, optional recursive |
125-
| `shell_exec` | Privileged | 60s | Run shell command, capture stdout/stderr |
126-
| `web_fetch` | SideEffecting | 30s | HTTP GET/POST, return status + body |
129+
| Tool | Safety tier | Timeout | What it does |
130+
|-----------------|---------------|---------|---------------------------------------------|
131+
| `read_file` | ReadOnly | 10s | Read file with line numbers, offset, limit |
132+
| `write_file` | SideEffecting | 10s | Write/append to file, auto-creates dirs |
133+
| `list_files` | ReadOnly | 10s | List directory, optional recursive |
134+
| `shell_exec` | Privileged | 60s | Run shell command, capture stdout/stderr |
135+
| `web_fetch` | SideEffecting | 30s | HTTP GET/POST, return status + body |
136+
| `memory_save` | SideEffecting | 15s | Save knowledge to persistent memory |
137+
| `memory_search` | ReadOnly | 15s | Search memory with hybrid retrieval |
127138

128139
### Security
129140

@@ -151,6 +162,25 @@ The runtime is the core agent loop that ties providers and tools together:
151162

152163
The turn timeout covers both provider streaming and tool execution as a single budget. Ctrl-C (SIGINT/SIGTERM) propagates cleanly into the runtime via context cancellation.
153164

165+
When the conversation approaches the context limit, the runtime triggers rolling summarization — it compacts older messages into a summary and keeps recent turns, so the agent can maintain context across long sessions.
166+
167+
## Memory
168+
169+
Persistent memory backed by SQLite (pure Go, no CGO via `modernc.org/sqlite`).
170+
171+
**Hybrid retrieval** combines two search strategies:
172+
- **Vector search** — cosine similarity over OpenAI embeddings (semantic meaning)
173+
- **Full-text search** — SQLite FTS5 with BM25 ranking (exact keywords)
174+
- Results are merged using **Reciprocal Rank Fusion** with configurable weights (default 0.7 vector / 0.3 FTS)
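The vector half of the retrieval described above ranks chunks by cosine similarity between the query embedding and each stored embedding. A minimal sketch of that measure follows; the function name `cosineSimilarity` and its zero-return convention for mismatched or empty vectors are assumptions for illustration, not code taken from the repository.

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns the cosine of the angle between a and b.
// Assumption: mismatched lengths or zero-magnitude vectors score 0
// instead of returning an error.
func cosineSimilarity(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	query := []float32{3, 4}
	same := []float32{3, 4}
	orthogonal := []float32{-4, 3}
	fmt.Printf("%.2f %.2f\n", cosineSimilarity(query, same), cosineSimilarity(query, orthogonal))
}
```

Accumulating in `float64` keeps rounding error small even for embeddings with thousands of dimensions.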
175+
176+
**Graceful degradation** — if no `OPENAI_API_KEY` is set, the system falls back to FTS-only search. Memory is optional; the agent works without it.
177+
178+
**What gets persisted:**
179+
- Memory chunks (knowledge stored by the agent or user)
180+
- Conversation events (every message in the turn loop)
181+
- Rolling summaries (context compaction across sessions)
182+
- Session scratchpad (key-value state per session)
183+
154184
## Tests
155185

156186
```bash

docs/architecture.md

Lines changed: 115 additions & 6 deletions
@@ -39,7 +39,7 @@ Everything in Yantra exists to make this loop work well:
3939
- **Security** prevents the LLM from doing damage
4040
- **Config** makes it all customizable
4141
- **Runtime** runs the think → act → observe loop
42-
- **Memory** (planned) lets the agent remember across sessions
42+
- **Memory** lets the agent remember across sessions
4343
- **Gateway** (planned) lets you control it remotely
4444

4545
## Layer 1: Types (`internal/types/`)
@@ -341,7 +341,9 @@ Schema(
341341

342342
This is a small quality-of-life builder. It outputs valid JSON Schema as `json.RawMessage`, which slots directly into `FunctionDecl.Parameters`.
343343

344-
### The five built-in tools
344+
### The built-in tools
345+
346+
Yantra ships with 7 built-in tools (5 core + 2 memory):
345347

346348
**read_file** (ReadOnly, 10s timeout)
347349
- Reads a file with 6-digit line numbers: ` 1\tpackage main`
@@ -459,9 +461,17 @@ The runtime classifies errors:
459461
- Max turns reached → `ErrMaxTurns`
460462
- Tool execution errors → placed in message content (the LLM sees them and can recover)
461463

462-
### Context budget
464+
### Context budget and summarization
465+
466+
After each tool dispatch, the runtime estimates token usage (`totalChars / 4`) and checks if the session is approaching the context limit (`TriggerRatio * MaxContextTokens`). When triggered:
467+
468+
1. A `MinTurns` guard (default 6) prevents summarizing too-short conversations
469+
2. The runtime builds a summarization prompt including the existing summary (if any) and the messages to compact
470+
3. The LLM generates a rolling summary via a dedicated system prompt
471+
4. The summary is stored in the `session_summaries` table with an incrementing epoch
472+
5. `session.CompactWithSummary()` replaces older messages with a `[Conversation Summary]` pseudo-message, keeping the most recent turns
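The trigger check that precedes these steps can be sketched as follows. This is an illustration under stated assumptions: the `Message` and `SummarizationConfig` types and the `shouldSummarize` helper are hypothetical, byte length stands in for character count, and treating the threshold as inclusive (`>=`) is a guess rather than confirmed behavior.

```go
package main

import "fmt"

// Message is a hypothetical stand-in for the session's message type.
type Message struct {
	Role    string
	Content string
}

// SummarizationConfig groups the documented knobs; the struct itself is an assumption.
type SummarizationConfig struct {
	MaxContextTokens int
	TriggerRatio     float64
	MinTurns         int
}

// estimateTokens applies the totalChars / 4 heuristic.
// len() counts bytes, which overestimates for multi-byte UTF-8 text.
func estimateTokens(msgs []Message) int {
	total := 0
	for _, m := range msgs {
		total += len(m.Content)
	}
	return total / 4
}

// shouldSummarize applies the MinTurns guard first, then compares the
// estimate against TriggerRatio * MaxContextTokens.
func shouldSummarize(msgs []Message, turns int, cfg SummarizationConfig) bool {
	if turns < cfg.MinTurns {
		return false
	}
	threshold := cfg.TriggerRatio * float64(cfg.MaxContextTokens)
	return float64(estimateTokens(msgs)) >= threshold
}

func main() {
	cfg := SummarizationConfig{MaxContextTokens: 100, TriggerRatio: 0.8, MinTurns: 6}
	msgs := []Message{{Role: "user", Content: string(make([]byte, 400))}}
	fmt.Println(shouldSummarize(msgs, 6, cfg), shouldSummarize(msgs, 5, cfg))
}
```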
463473

464-
After each tool dispatch, the runtime estimates token usage (chars/4) and logs a warning if the session is approaching the context limit (`TriggerRatio * MaxContextTokens`). Actual summarization is deferred to Step 5 (Memory).
474+
On session startup, if a prior summary exists, it is injected at the start of the message list so the agent keeps context from previous runs.
465475

466476
## How the pieces connect
467477

@@ -482,10 +492,109 @@ yantra run "add error handling to server.go"
482492
└── loop until text-only response or MaxTurns
483493
```
484494

495+
## Layer 5: Memory (`internal/memory/`)
496+
497+
Memory gives the agent persistence — it can store knowledge, recall it later, and maintain context across sessions.
498+
499+
### Storage: SQLite (no CGO)
500+
501+
The memory system uses `modernc.org/sqlite`, a pure-Go SQLite implementation. No CGO means the binary cross-compiles trivially. The database opens with **WAL mode** and a 5-second busy timeout for concurrent access safety.
502+
503+
Schema (6 tables):
504+
505+
```
506+
chunks — memory fragments with optional embedding BLOBs
507+
chunks_fts — FTS5 virtual table (porter stemmer + unicode61)
508+
sessions — session lifecycle tracking
509+
conversation_events — per-session conversation log
510+
session_summaries — rolling summary per session (with epoch counter)
511+
scratchpads — key-value state per session
512+
```
513+
514+
### Hybrid retrieval
515+
516+
Memory search combines two strategies and merges them with Reciprocal Rank Fusion (RRF):
517+
518+
```
519+
Query: "how does authentication work?"
520+
521+
├── Vector Search (weight: 0.7)
522+
│ Compute embedding → cosine similarity against all chunks
523+
│ Returns: semantically similar results
524+
525+
├── FTS Search (weight: 0.3)
526+
│ SQLite FTS5 with BM25 ranking
527+
│ Returns: keyword-matching results
528+
529+
└── Reciprocal Rank Fusion (k=60)
530+
Merge + deduplicate by chunk ID
531+
Score: weight / (60 + rank) per source
532+
Return top K results
533+
```
534+
535+
The system fetches `topK * 3` candidates from each source before fusion, so a chunk that ranks low in one source but high in the other can still reach the final top K. Weights are configurable: a higher `VectorWeight` favors semantic matches, and a higher `FTSWeight` favors exact keyword matches.
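The fusion step can be sketched in a few lines of Go. This is a minimal illustration of weighted Reciprocal Rank Fusion with k=60, not the repository's `retrieval.go`: the names `fuseRRF` and `Scored` are hypothetical, ranks are assumed to be 1-based, and each input list is assumed to contain unique chunk IDs.

```go
package main

import (
	"fmt"
	"sort"
)

const rrfK = 60 // RRF constant from the diagram above

// Scored pairs a chunk ID with its fused score (hypothetical type).
type Scored struct {
	ID    string
	Score float64
}

// fuseRRF merges two ranked ID lists. Each source contributes
// weight / (rrfK + rank), with rank starting at 1 (an assumption).
func fuseRRF(vectorIDs, ftsIDs []string, vectorWeight, ftsWeight float64, topK int) []Scored {
	scores := make(map[string]float64)
	for i, id := range vectorIDs {
		scores[id] += vectorWeight / float64(rrfK+i+1)
	}
	for i, id := range ftsIDs {
		scores[id] += ftsWeight / float64(rrfK+i+1)
	}
	fused := make([]Scored, 0, len(scores))
	for id, s := range scores {
		fused = append(fused, Scored{ID: id, Score: s})
	}
	// Sort by score descending; break ties by ID so output is deterministic.
	sort.Slice(fused, func(i, j int) bool {
		if fused[i].Score != fused[j].Score {
			return fused[i].Score > fused[j].Score
		}
		return fused[i].ID < fused[j].ID
	})
	if topK >= 0 && len(fused) > topK {
		fused = fused[:topK]
	}
	return fused
}

func main() {
	vector := []string{"a", "b", "c"}
	fts := []string{"b", "d"}
	for _, r := range fuseRRF(vector, fts, 0.7, 0.3, 3) {
		fmt.Printf("%s %.4f\n", r.ID, r.Score)
	}
}
```

In this example chunk `b` ranks first because it appears in both lists, even though `a` leads the vector results. Keying the map by chunk ID is what deduplicates the two sources.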
536+
537+
**Graceful degradation:**
538+
- No `OPENAI_API_KEY` → FTS-only search (no embeddings)
539+
- FTS query fails (malformed syntax) → vector-only search
540+
- No memory DB → agent runs without memory, logs a warning
541+
542+
### Embeddings
543+
544+
Embeddings are computed via the OpenAI API and stored as compact little-endian binary BLOBs (4 bytes per float32 dimension), saving ~75% compared to JSON-encoded arrays.
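A sketch of that encoding, assuming helper signatures along the lines of the `encodeFloat32s`/`decodeFloat32s` helpers named in this document (the exact signatures and the error on malformed input are assumptions):

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// encodeFloat32s packs each float32 as 4 little-endian bytes.
func encodeFloat32s(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(buf[i*4:], math.Float32bits(f))
	}
	return buf
}

// decodeFloat32s reverses encodeFloat32s. Rejecting blobs whose length
// is not a multiple of 4 is an assumption of this sketch.
func decodeFloat32s(b []byte) ([]float32, error) {
	if len(b)%4 != 0 {
		return nil, errors.New("embedding blob length is not a multiple of 4")
	}
	out := make([]float32, len(b)/4)
	for i := range out {
		out[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[i*4:]))
	}
	return out, nil
}

func main() {
	blob := encodeFloat32s([]float32{0.25, -1.5, 3})
	vec, err := decodeFloat32s(blob)
	fmt.Println(len(blob), vec, err)
}
```

With this layout a 1536-dimension embedding occupies 6144 bytes regardless of the values it holds.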
545+
546+
Supported models:
547+
548+
| Model | Dimensions |
549+
|-------|-----------|
550+
| `text-embedding-3-small` (default) | 1536 |
551+
| `text-embedding-3-large` | 3072 |
552+
| `text-embedding-ada-002` | 1536 |
553+
554+
The factory returns `nil` (not an error) when the API key is missing, so the system can always boot.
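The nil-without-error convention might look like the sketch below. The `Embedder` interface, the `openAIEmbedder` type, and the `newEmbedder` signature are hypothetical, and rejecting unknown backends with an error is an assumption. Only the behavior for a missing key comes from the text above.

```go
package main

import (
	"fmt"
	"os"
)

// Embedder is a hypothetical interface for an embedding backend.
type Embedder interface {
	Model() string
}

type openAIEmbedder struct {
	apiKey string
	model  string
}

func (e *openAIEmbedder) Model() string { return e.model }

// newEmbedder returns (nil, nil) when no API key is available, so callers
// can fall back to FTS-only search instead of failing at startup.
func newEmbedder(backend, model, apiKey string) (Embedder, error) {
	switch backend {
	case "", "openai":
		if apiKey == "" {
			return nil, nil // missing key is not an error
		}
		return &openAIEmbedder{apiKey: apiKey, model: model}, nil
	default:
		return nil, fmt.Errorf("unsupported embedding backend %q", backend)
	}
}

func main() {
	emb, err := newEmbedder("openai", "text-embedding-3-small", os.Getenv("OPENAI_API_KEY"))
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	if emb == nil {
		fmt.Println("no API key: using FTS-only search")
		return
	}
	fmt.Println("embedding model:", emb.Model())
}
```

Returning an untyped `nil` (rather than a nil `*openAIEmbedder`) matters here: it keeps the `emb == nil` check at the call site reliable.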
555+
556+
### Conversation persistence
557+
558+
Every message in the turn loop (user, assistant, tool results) is persisted via `StoreConversationEvent()`. Persistence uses the turn context and its deadline, so writes respect the turn timeout. A failed write is logged as a warning but does not halt execution (fire-and-forget).
559+
560+
### Memory tools
561+
562+
Two tools expose memory to the agent:
563+
564+
- **`memory_save`** (SideEffecting, 15s) — stores knowledge with optional tags
565+
- **`memory_search`** (ReadOnly, 15s) — hybrid search with ranked results
566+
567+
These are conditionally registered only when a `MemoryRetrieval` instance is available.
568+
569+
### Session store
570+
571+
`SQLiteSessionStore` manages session lifecycle:
572+
573+
| Operation | Details |
574+
|-----------|---------|
575+
| Create | Generates `ses_<32 hex chars>` ID |
576+
| Get | Single session by ID |
577+
| List | All sessions, ordered by `updated_at DESC`, optionally including archived |
578+
| Update | Name, message count, timestamps |
579+
| Archive | Soft-delete (sets `archived = 1`) |
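An ID in the `ses_<32 hex chars>` shape can be produced from 16 random bytes. The sketch below assumes `crypto/rand` as the entropy source and a hypothetical `newSessionID` helper; the source does not say how the real store generates its IDs.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newSessionID returns "ses_" followed by 32 lowercase hex characters
// (16 random bytes). Hypothetical helper for illustration.
func newSessionID() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return "ses_" + hex.EncodeToString(buf), nil
}

func main() {
	id, err := newSessionID()
	if err != nil {
		fmt.Println("could not generate session ID:", err)
		return
	}
	fmt.Println(id, len(id))
}
```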
580+
581+
### Key patterns
582+
583+
**Interface-driven design** — the runtime and tools depend on `types.MemoryRetrieval` and `types.SessionStore` interfaces, not concrete types. Compile-time checks enforce this:
584+
```go
585+
var _ types.MemoryRetrieval = (*Store)(nil)
586+
var _ types.SessionStore = (*SQLiteSessionStore)(nil)
587+
```
588+
589+
**Transaction safety** — multi-table operations (Store, Forget, StoreConversationEvent) use explicit transactions with `defer tx.Rollback()`.
590+
591+
**Binary embedding storage** — float32 slices are serialized as little-endian BLOBs via custom `encodeFloat32s`/`decodeFloat32s` helpers.
592+
485593
## What's next
486594

487595
| Step | What | Purpose |
488596
|------|------|---------|
489-
| 5 | Memory | Persistent vector DB for cross-session recall + rolling summarization |
490597
| 6 | Gateway | WebSocket server for remote control |
491-
| 7 | Multi-agent | Specialist subagents with delegation |
598+
| 7 | MCP | Model Context Protocol client for external tools |
599+
| 8 | TUI | Terminal UI with Bubble Tea |
600+
| 9 | Polish | Config scaffolding, cross-platform build, docs |

docs/config.md

Lines changed: 23 additions & 9 deletions
@@ -131,36 +131,50 @@ When the conversation history approaches the context limit (trigger_ratio × max
131131

132132
### runtime.summarization
133133

134-
Controls rolling summarization behavior.
134+
Controls rolling summarization behavior. When the context budget is exceeded, the runtime calls the LLM to generate a summary of older messages, stores it in the database, and compacts the session.
135135

136136
```toml
137137
[runtime.summarization]
138138
target_ratio = 0.5 # Aim to reduce context to 50% after summarization
139139
min_turns = 6 # Don't summarize conversations shorter than 6 turns
140140
```
141141

142+
**How it works:** When triggered, the runtime builds a summarization prompt with the existing summary (if any) and messages to compact. The LLM generates a rolling summary, which is stored in `session_summaries` with an incrementing epoch. Older messages are replaced with a `[Conversation Summary]` pseudo-message. On session restart, the prior summary is injected as the opening context.
143+
142144
### memory
143145

144-
Persistent memory backed by a vector database.
146+
Persistent memory backed by SQLite (pure Go, no CGO via `modernc.org/sqlite`).
145147

146148
```toml
147149
[memory]
148-
enabled = true
149-
db_path = ".yantra/memory.db" # SQLite/libSQL database path
150-
embedding_backend = "openai" # "openai" or "ollama"
150+
enabled = true # Set false to disable memory entirely
151+
db_path = ".yantra/memory.db" # SQLite database path (auto-created)
152+
embedding_backend = "openai" # "openai" or "" (empty = openai default)
151153

152154
[memory.embedding]
153155
model = "text-embedding-3-small" # OpenAI embedding model
154-
# ollama_url = "http://localhost:11434" # For ollama backend
155-
# ollama_model = "nomic-embed-text" # For ollama backend
156+
# Supported models:
157+
# text-embedding-3-small (1536 dimensions, default)
158+
# text-embedding-3-large (3072 dimensions)
159+
# text-embedding-ada-002 (1536 dimensions)
160+
# ollama_url = "http://localhost:11434" # For ollama backend (reserved, not yet implemented)
161+
# ollama_model = "nomic-embed-text" # For ollama backend (reserved, not yet implemented)
156162

157163
[memory.retrieval]
158-
top_k = 8 # Number of results to retrieve
164+
top_k = 8 # Number of results to retrieve per search
159165
vector_weight = 0.7 # Weight for vector similarity (0-1)
160166
fts_weight = 0.3 # Weight for full-text search (0-1)
161167
```
162168

163-
Memory uses hybrid retrieval — combining vector similarity search (semantic meaning) with full-text search (exact keyword matching). The weights control the balance. Higher vector_weight favors semantic matches; higher fts_weight favors exact matches.
169+
Memory uses **hybrid retrieval** — combining vector similarity search (semantic meaning) with full-text search (exact keyword matching via SQLite FTS5). Results from both sources are merged using **Reciprocal Rank Fusion** (RRF) with configurable weights. Higher `vector_weight` favors semantic matches; higher `fts_weight` favors exact keyword matches.
170+
171+
**Graceful degradation:** If `OPENAI_API_KEY` is not set, the system falls back to FTS-only search (no vector embeddings). If the database fails to open, the agent continues without memory. Memory tools (`memory_save`, `memory_search`) are only registered when memory is available.
172+
173+
**Storage details:**
174+
- Embeddings are stored as compact little-endian binary BLOBs (4 bytes per dimension)
175+
- Conversation events are persisted per-session for history recall
176+
- Rolling summaries are maintained with an epoch counter for context compaction
177+
- Session scratchpads provide per-session key-value storage
164178

165179
### tools
166180

docs/tools.md

Lines changed: 42 additions & 0 deletions
@@ -375,6 +375,48 @@ status: 200
375375
- No cookie handling
376376
- No redirect following configuration (uses Go's default: follows up to 10 redirects)
377377

378+
### memory_save
379+
380+
**Purpose:** Persist knowledge for future recall across sessions.
381+
382+
**Parameters:**
383+
| Name | Type | Required | Description |
384+
|---------|----------------|----------|-------------|
385+
| content | string | yes | The knowledge to store |
386+
| tags | array (string) | no | Optional tags for categorization |
387+
388+
**Behavior:**
389+
- Calls `mem.Store()` with source `"user_saved"`
390+
- If an embedding backend is configured, computes and stores the embedding alongside the content
391+
- Returns: `"Saved to memory (id: <hex>)"`
392+
393+
**Safety tier:** SideEffecting (15s timeout)
394+
395+
### memory_search
396+
397+
**Purpose:** Search persistent memory using hybrid retrieval (vector + full-text).
398+
399+
**Parameters:**
400+
| Name | Type | Required | Default | Description |
401+
|-------|---------|----------|---------|-------------|
402+
| query | string | yes | | Search query (used for both semantic and keyword matching) |
403+
| top_k | integer | no | 5 | Maximum number of results to return |
404+
405+
**Output format:**
406+
```
407+
1. [score: 0.85] Content of the memory chunk
408+
Tags: tag1, tag2
409+
410+
2. [score: 0.72] Another memory chunk
411+
Tags: general
412+
```
413+
414+
Returns `"No matching memories found."` when no results match.
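A formatter that produces this output shape could look like the following sketch. The `Result` type and `formatResults` function are hypothetical, and the three-space indent before `Tags:` is an assumption, since whitespace in the example above may not have survived.

```go
package main

import (
	"fmt"
	"strings"
)

// Result is a hypothetical search hit.
type Result struct {
	Score   float64
	Content string
	Tags    []string
}

// formatResults renders hits as a numbered list, with a blank line between
// entries and an optional Tags line under each one.
func formatResults(results []Result) string {
	if len(results) == 0 {
		return "No matching memories found."
	}
	var sb strings.Builder
	for i, r := range results {
		if i > 0 {
			sb.WriteString("\n")
		}
		fmt.Fprintf(&sb, "%d. [score: %.2f] %s\n", i+1, r.Score, r.Content)
		if len(r.Tags) > 0 {
			fmt.Fprintf(&sb, "   Tags: %s\n", strings.Join(r.Tags, ", "))
		}
	}
	return sb.String()
}

func main() {
	fmt.Print(formatResults([]Result{
		{Score: 0.85, Content: "Content of the memory chunk", Tags: []string{"tag1", "tag2"}},
		{Score: 0.72, Content: "Another memory chunk", Tags: []string{"general"}},
	}))
}
```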
415+
416+
**Safety tier:** ReadOnly (15s timeout)
417+
418+
**Note:** Both memory tools are conditionally registered — they only appear in the tool list when a `MemoryRetrieval` instance is provided to `RegisterBuiltins`. If memory is disabled or the database fails to open, the agent simply doesn't have these tools.
419+
378420
## Writing a custom tool
379421

380422
To add a new tool, implement the `Tool` interface:
