**API.md**: +61 -22 (61 additions and 22 deletions)
```diff
@@ -5,6 +5,7 @@ A SQLite extension that provides semantic memory capabilities with hybrid search
 ## Table of Contents
 
 - [Overview](#overview)
+- [Sync Behavior](#sync-behavior)
 - [Loading the Extension](#loading-the-extension)
 - [SQL Functions](#sql-functions)
 - [General Functions](#general-functions)
```
```diff
@@ -29,6 +30,31 @@ sqlite-memory enables semantic search over text content stored in SQLite. It:
 
 ---
 
+## Sync Behavior
+
+All `memory_sync_*` functions use **content-hash change detection** to avoid redundant embedding computation. Each piece of content is hashed before processing — if the hash already exists in the database, the content is skipped.
+
+### Change Detection
+
+| Scenario | Behavior |
+|----------|----------|
+| New content | Chunked, embedded, and indexed |
+| Unchanged content | Skipped (hash match) |
+| Modified file | Old entry atomically deleted, new content reindexed |
+| Deleted file | Entry removed during directory sync |
+
+### Transactional Safety
+
+Every sync operation is wrapped in a SQLite **SAVEPOINT** transaction. If any step fails (embedding error, disk issue, constraint violation), the entire operation rolls back. This guarantees:
+
+- **No partially-indexed files** — content is either fully indexed or not at all
+- **No orphaned chunks** — embeddings and FTS entries are always consistent with `dbmem_content`
+- **Safe to retry** — a failed sync leaves the database in its previous valid state
+
+This makes all sync functions idempotent and safe to call repeatedly (e.g., on a schedule or at application startup).
```
```diff
-#### `memory_add_text(content TEXT [, context TEXT])`
+#### `memory_sync_text(content TEXT [, context TEXT])`
 
-Adds text content to memory.
+Syncs text content to memory. Duplicate content (same hash) is skipped automatically.
 
 **Parameters:**
 | Parameter | Type | Required | Description |
```
````diff
@@ -189,23 +215,24 @@ Adds text content to memory.
 **Notes:**
 - Content is chunked based on `max_tokens` and `overlay_tokens` settings
 - Each chunk is embedded and stored in `dbmem_vault`
-- Content hash prevents duplicate storage
+- Content hash prevents duplicate storage — calling with the same content is a no-op
+- Runs inside a SAVEPOINT transaction (see [Sync Behavior](#sync-behavior))
 - Sets `created_at` timestamp automatically
 
 **Example:**
 ```sql
 -- Add text without context
-SELECT memory_add_text('SQLite is a C-language library that implements a small, fast, self-contained SQL database engine.');
+SELECT memory_sync_text('SQLite is a C-language library that implements a small, fast, self-contained SQL database engine.');
 
 -- Add text with context
-SELECT memory_add_text('Important meeting notes from 2024-01-15...', 'meetings');
+SELECT memory_sync_text('Important meeting notes from 2024-01-15...', 'meetings');
 ```
 
 ---
 
-#### `memory_add_file(path TEXT [, context TEXT])`
+#### `memory_sync_file(path TEXT [, context TEXT])`
 
-Adds a file to memory.
+Syncs a file to memory. Unchanged files are skipped; modified files are atomically replaced.
 
 **Parameters:**
 | Parameter | Type | Required | Description |
````
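The notes for `memory_sync_text` say it runs inside a SAVEPOINT transaction, so a failed sync leaves no partially-indexed content and is safe to retry. That guarantee can be illustrated with plain SQLite. This sketch shows the general SAVEPOINT / ROLLBACK TO / RELEASE pattern, not the extension's internal code. `index_chunks`, `fake_embed`, `EmbeddingError`, and the `vault` table are hypothetical. The embedding step is faked so that it fails on demand.

```python
import sqlite3


class EmbeddingError(Exception):
    """Hypothetical error raised by the stand-in embedding step below."""


def fake_embed(chunk: str) -> bytes:
    # Stand-in for a real embedding engine: fails on a marker word to simulate an error.
    if "FAIL" in chunk:
        raise EmbeddingError(f"cannot embed {chunk!r}")
    return chunk.encode("utf-8")


def index_chunks(conn: sqlite3.Connection, chunks: list[str]) -> None:
    """Store every chunk or none of them, using a SAVEPOINT as the boundary."""
    conn.execute("SAVEPOINT sync_op")
    try:
        for chunk in chunks:
            conn.execute(
                "INSERT INTO vault (chunk, embedding) VALUES (?, ?)",
                (chunk, fake_embed(chunk)),
            )
    except Exception:
        conn.execute("ROLLBACK TO sync_op")  # undo every insert made since the SAVEPOINT
        conn.execute("RELEASE sync_op")      # then pop the savepoint off the stack
        raise
    conn.execute("RELEASE sync_op")  # outermost savepoint: RELEASE commits


# isolation_level=None disables the sqlite3 module's implicit BEGIN,
# so the SAVEPOINT statements alone control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE vault (chunk TEXT NOT NULL, embedding BLOB NOT NULL)")

index_chunks(conn, ["alpha", "beta"])
try:
    index_chunks(conn, ["gamma", "FAIL here", "delta"])  # fails on the second chunk
except EmbeddingError:
    pass
after_failure = [row[0] for row in conn.execute("SELECT chunk FROM vault ORDER BY rowid")]

index_chunks(conn, ["gamma", "delta"])  # retrying after the failure is safe
after_retry = [row[0] for row in conn.execute("SELECT chunk FROM vault ORDER BY rowid")]
print(after_failure)  # ['alpha', 'beta']
print(after_retry)    # ['alpha', 'beta', 'gamma', 'delta']
```

This pattern pairs `ROLLBACK TO` with `RELEASE` instead of using a bare `ROLLBACK`. That choice stays correct when the caller has already opened an outer transaction: only the work done since the savepoint is undone.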
```diff
@@ -218,39 +245,51 @@ Adds a file to memory.
 **Notes:**
 - Only processes files matching configured extensions (default: `md,mdx`)
 - File path is stored in `dbmem_content.path`
+- If the file was previously indexed with different content, the old entry (chunks, embeddings, FTS) is deleted and new content is reindexed — all within a single SAVEPOINT transaction (see [Sync Behavior](#sync-behavior))
 - Not available when compiled with `DBMEM_OMIT_IO`
```
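For `memory_sync_file`, the notes describe what happens when a file's content changes: the old entry is deleted and the file is reindexed, all inside a single SAVEPOINT. Below is a minimal sketch of that replace-on-change logic. It rests on these assumptions:

- SHA-256 is computed over the raw file bytes.
- A hypothetical `content`/`chunks` schema is keyed by path.
- One chunk per line stands in for real chunking and embedding.
- `sync_file` is an illustrative helper only.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def sync_file(conn: sqlite3.Connection, path: Path) -> str:
    """Return 'added', 'skipped', or 'replaced' for one file (hypothetical helper)."""
    data = path.read_bytes()
    digest = hashlib.sha256(data).hexdigest()  # assumption: SHA-256 of the raw bytes
    row = conn.execute("SELECT id, hash FROM content WHERE path = ?", (str(path),)).fetchone()
    if row is not None and row[1] == digest:
        return "skipped"  # unchanged file: nothing to do
    conn.execute("SAVEPOINT sync_file")
    try:
        if row is not None:
            # Modified file: delete the old entry and all of its chunks first.
            conn.execute("DELETE FROM chunks WHERE content_id = ?", (row[0],))
            conn.execute("DELETE FROM content WHERE id = ?", (row[0],))
        cur = conn.execute(
            "INSERT INTO content (path, hash) VALUES (?, ?)", (str(path), digest)
        )
        # One chunk per line stands in for real chunking and embedding.
        for line in data.decode("utf-8").splitlines():
            conn.execute(
                "INSERT INTO chunks (content_id, chunk) VALUES (?, ?)", (cur.lastrowid, line)
            )
    except Exception:
        conn.execute("ROLLBACK TO sync_file")  # the old entry survives any failure
        conn.execute("RELEASE sync_file")
        raise
    conn.execute("RELEASE sync_file")
    return "added" if row is None else "replaced"


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript(
    """
    CREATE TABLE content (id INTEGER PRIMARY KEY, path TEXT UNIQUE, hash TEXT NOT NULL);
    CREATE TABLE chunks (content_id INTEGER NOT NULL, chunk TEXT NOT NULL);
    """
)
with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp, "notes.md")
    note.write_text("first line\nsecond line\n", encoding="utf-8")
    results = [sync_file(conn, note), sync_file(conn, note)]
    note.write_text("rewritten\n", encoding="utf-8")
    results.append(sync_file(conn, note))
chunks = [row[0] for row in conn.execute("SELECT chunk FROM chunks")]
print(results)  # ['added', 'skipped', 'replaced']
print(chunks)   # ['rewritten']
```

Deleting before inserting inside one savepoint means a reader never sees both versions or neither version after a committed sync.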
```diff
+- **Intelligent Sync**: Content-hash change detection — unchanged files are skipped, modified files are atomically replaced, deleted files are cleaned up
+- **Transactional Safety**: Every sync operation runs inside a SAVEPOINT transaction — either fully succeeds or fully rolls back, no partially-indexed content
 - **Efficient Storage**: Binary embeddings with configurable dimensions
 - **Flexible Embedding**: Use local models (llama.cpp) or [vectors.space](https://vectors.space) remote API
```
````diff
@@ -142,6 +144,20 @@ memories = recall("what's the project timeline")
 # Returns relevant context about March 15th deadline
 ```
 
+## Intelligent Sync
+
+All `memory_sync_*` functions use content-hash change detection to avoid redundant work:
+
+- **`memory_sync_text`** — Computes a hash of the content. If the same content was already indexed, it is skipped entirely. No duplicate embeddings are ever created.
+- **`memory_sync_file`** — Reads the file and hashes its content. If the file was previously indexed with different content, the old entry (chunks, embeddings, FTS) is atomically replaced. Unchanged files are skipped.
+- **`memory_sync_directory`** — Performs a full two-phase sync:
+   1. **Cleanup**: Removes database entries for files that no longer exist on disk
+   2. **Scan**: Recursively processes all matching files — adding new ones, replacing modified ones, and skipping unchanged ones
+
+Every sync operation is wrapped in a SQLite SAVEPOINT transaction. If anything fails mid-sync (embedding error, disk issue, etc.), the entire operation rolls back cleanly. There is no risk of partially-indexed files or orphaned entries.
+
+This makes all sync functions safe to call repeatedly — for example, on a cron schedule or at agent startup — with minimal overhead.
+
 ## Use Cases
 
 - **AI Assistants**: Maintain conversation history and user preferences
````
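The two-phase `memory_sync_directory` behavior described above (cleanup, then a recursive scan that adds, replaces, or skips files) might look roughly like this. The text does not say whether the extension uses one savepoint per directory sync or one per file. The sketch rests on these assumptions:

- File contents are compared by SHA-256 hash.
- A hypothetical single-table `content` schema stores one row per file.
- An `UPDATE` stands in for delete-and-reindex.
- One SAVEPOINT wraps the whole pass.
- `sync_directory` and `EXTENSIONS` are hypothetical names; the extension set mirrors the documented `md,mdx` default.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path

EXTENSIONS = {".md", ".mdx"}  # mirrors the documented default extensions `md,mdx`


def sync_directory(conn: sqlite3.Connection, root: Path) -> dict[str, int]:
    """Hypothetical two-phase sync: clean up vanished files, then scan the tree."""
    stats = {"removed": 0, "added": 0, "replaced": 0, "skipped": 0}
    conn.execute("SAVEPOINT sync_dir")
    try:
        # Phase 1 (cleanup): drop entries whose file no longer exists on disk.
        for (stored,) in conn.execute("SELECT path FROM content").fetchall():
            if not Path(stored).exists():
                conn.execute("DELETE FROM content WHERE path = ?", (stored,))
                stats["removed"] += 1
        # Phase 2 (scan): recurse into every matching file and compare hashes.
        for file in sorted(root.rglob("*")):
            if not file.is_file() or file.suffix not in EXTENSIONS:
                continue
            digest = hashlib.sha256(file.read_bytes()).hexdigest()
            row = conn.execute(
                "SELECT hash FROM content WHERE path = ?", (str(file),)
            ).fetchone()
            if row is None:
                conn.execute("INSERT INTO content (path, hash) VALUES (?, ?)", (str(file), digest))
                stats["added"] += 1
            elif row[0] != digest:
                # Stands in for "delete old entry, reindex new content".
                conn.execute("UPDATE content SET hash = ? WHERE path = ?", (digest, str(file)))
                stats["replaced"] += 1
            else:
                stats["skipped"] += 1
    except Exception:
        conn.execute("ROLLBACK TO sync_dir")
        conn.execute("RELEASE sync_dir")
        raise
    conn.execute("RELEASE sync_dir")
    return stats


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE content (path TEXT PRIMARY KEY, hash TEXT NOT NULL)")
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.md").write_text("alpha", encoding="utf-8")
    (root / "sub" / "b.mdx").write_text("beta", encoding="utf-8")
    (root / "notes.txt").write_text("ignored: wrong extension", encoding="utf-8")
    first = sync_directory(conn, root)
    (root / "a.md").write_text("alpha v2", encoding="utf-8")  # modify one file
    (root / "sub" / "b.mdx").unlink()                         # delete another
    second = sync_directory(conn, root)
    third = sync_directory(conn, root)                        # nothing changed
print(first)   # {'removed': 0, 'added': 2, 'replaced': 0, 'skipped': 0}
print(second)  # {'removed': 1, 'added': 0, 'replaced': 1, 'skipped': 0}
print(third)   # {'removed': 0, 'added': 0, 'replaced': 0, 'skipped': 1}
```

Running cleanup first means a file that was deleted from disk is already gone from the index before the scan runs. The scan therefore only has to consider files that currently exist.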
```diff
@@ -217,7 +233,7 @@ make test
 
 - **Local Engine**: Built-in llama.cpp for on-device embeddings (requires GGUF model)
 - **Remote Engine**: [vectors.space](https://vectors.space) API for cloud embeddings (requires free API key)
-- **File I/O**: `memory_add_file` and `memory_add_directory` functions
+- **File I/O**: `memory_sync_file` and `memory_sync_directory` functions
```