Commit 6ab17f1

Bump to v0.4.0, document context graph feature
- Bump version 0.3.3 → 0.4.0 (new feature, backward compatible)
- Document context_query, graph_annotate, session_summary tools in README
- Add graph configuration env vars to README
- Update project structure to include graph/ and tools/
- Update architecture.md with graph component details
- Add code-graph, context-graph keywords to package.json
1 parent 69d4c91 commit 6ab17f1

3 files changed

Lines changed: 138 additions & 6 deletions

README.md

Lines changed: 91 additions & 4 deletions
@@ -26,6 +26,8 @@ Using local embeddings and vector search, it bridges the gap between text search
 ## Features
 
 - **Semantic search** - Find code by meaning, not just keywords
+- **Context graph** - Understand how code connects (calls, imports, inheritance) with graph-enhanced search
+- **Session memory** - Track agent exploration state across multi-turn conversations
 - **AST-aware chunking** - Tree-sitter WASM for cross-platform parsing, no native compilation required
 - **Local embeddings** - ONNX Runtime with nomic-embed-code (768 dims, 8K context)
 - **Hybrid search** - Vector similarity + BM25 keyword matching
@@ -127,14 +129,80 @@ semantic_search({
 ## Architecture
 
 ```
-semantic_search tool (MCP Server)
+MCP Server
 ├── Chunker (web-tree-sitter) → AST-aware code splitting (WASM, cross-platform)
 ├── Embedder (ONNX local) → nomic-embed-code, 768 dims
 ├── Vector DB (LanceDB) → Serverless, hybrid search
+├── Context Graph (SQLite) → Structural relationships + session memory
 ├── File Watcher (chokidar) → Incremental updates
 └── Hybrid Search → BM25 + vector + reranking
 ```
 
+## Context Graph (Opt-in)
+
+The context graph adds structural awareness on top of semantic search. Enable it with:
+
+```bash
+SEMANTIC_CODE_GRAPH_ENABLED=true
+```
+
+When enabled, the server extracts structural relationships (calls, imports, extends, implements) from the AST during indexing and stores them in a SQLite graph. This powers three additional tools.
+
+### Tool: context_query
+
+Semantic search + graph neighborhood expansion. Returns search results enriched with structural context.
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `query` | string | Yes | Natural language query |
+| `path` | string | No | Directory to scope the search |
+| `limit` | number | No | Maximum results (default: 10) |
+| `file_pattern` | string | No | Glob pattern to filter files |
+| `depth` | number | No | Graph traversal depth (1-3, default: 1) |
+| `edge_kinds` | string[] | No | Edge types to follow: calls, imports, extends, implements, exports, agent_linked |
+| `session_id` | string | No | Session ID for exploration tracking |
+
+```
+context_query({
+  query: "payment processing",
+  depth: 2,
+  session_id: "debug-checkout"
+})
+```
+
+Returns each search result plus its graph neighbors — callers, callees, imports, and inheritance — without reading additional files.
+
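Reviewer sketch (not part of the commit): the `context_query` parameter table above can be read as a typed input with defaults. The TypeScript below is a minimal illustration under assumptions. The names `ContextQueryParams` and `normalizeContextQuery` are hypothetical. Rejecting an out-of-range `depth` (as opposed to clamping it) and defaulting `edge_kinds` to every edge type are guesses, not documented behavior.

```typescript
// Hypothetical types: field names mirror the README parameter table.
type EdgeKind =
  | "calls" | "imports" | "extends" | "implements" | "exports" | "agent_linked";

interface ContextQueryParams {
  query: string;
  path?: string;
  limit?: number;
  file_pattern?: string;
  depth?: number;
  edge_kinds?: EdgeKind[];
  session_id?: string;
}

type NormalizedContextQuery = ContextQueryParams & {
  limit: number;
  depth: number;
  edge_kinds: EdgeKind[];
};

const ALL_EDGE_KINDS: EdgeKind[] = [
  "calls", "imports", "extends", "implements", "exports", "agent_linked",
];

// Apply the documented defaults (limit 10, depth 1) and check the 1-3 depth range.
function normalizeContextQuery(params: ContextQueryParams): NormalizedContextQuery {
  if (params.query.trim() === "") {
    throw new Error("query must not be empty");
  }
  const depth = params.depth ?? 1;
  if (!Number.isInteger(depth) || depth < 1 || depth > 3) {
    // Assumption: invalid depth is rejected; the real tool may clamp instead.
    throw new RangeError("depth must be an integer from 1 to 3");
  }
  return {
    ...params,
    limit: params.limit ?? 10,
    depth,
    // Assumption: omitting edge_kinds means "follow every edge type".
    edge_kinds: params.edge_kinds ?? ALL_EDGE_KINDS.slice(),
  };
}
```

Calling `normalizeContextQuery({ query: "payment processing" })` fills in `limit: 10` and `depth: 1`, matching the defaults in the table.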
+### Tool: graph_annotate
+
+Leave notes on code nodes and create links between related chunks.
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `session_id` | string | Yes | Session ID |
+| `node_id` | string | Yes | Chunk ID to annotate |
+| `note` | string | No | Note to attach |
+| `link_to` | string[] | No | Chunk IDs to create agent_linked edges to |
+| `reasoning` | string | No | Reasoning log entry |
+
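Reviewer sketch (not part of the commit): one way to picture how `link_to` becomes `agent_linked` edges. The edge field names (`source_id`, `target_id`, `edge_type`) follow the `graph_edges` schema documented in architecture.md. The function name `buildAgentLinks`, the de-duplication, and the removal of self-links are assumptions made for illustration.

```typescript
// Hypothetical input shape: mirrors the graph_annotate parameter table.
interface GraphAnnotateParams {
  session_id: string;
  node_id: string;
  note?: string;
  link_to?: string[];
  reasoning?: string;
}

// Field names taken from the documented graph_edges columns.
interface AgentEdge {
  source_id: string;
  target_id: string;
  edge_type: "agent_linked";
}

// Turn one annotate call into the edges it would create.
function buildAgentLinks(params: GraphAnnotateParams): AgentEdge[] {
  // A Set drops duplicate targets while keeping first-seen order.
  const targets = new Set<string>(params.link_to ?? []);
  // Assumption: a chunk linking to itself carries no information, so skip it.
  targets.delete(params.node_id);
  return Array.from(targets).map((target_id) => ({
    source_id: params.node_id,
    target_id,
    edge_type: "agent_linked" as const,
  }));
}
```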
+### Tool: session_summary
+
+View exploration state: visited nodes, frontier, annotations, reasoning log, and graph stats.
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `session_id` | string | Yes | Session ID to summarize |
+
+### Graph Configuration
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `SEMANTIC_CODE_GRAPH_ENABLED` | Enable the context graph | `false` |
+| `SEMANTIC_CODE_GRAPH_DEPTH` | Default BFS traversal depth (1-5) | `2` |
+| `SEMANTIC_CODE_SESSION_TTL` | Session TTL in seconds | `3600` |
+| `SEMANTIC_CODE_EDGE_KINDS` | Comma-separated edge types to follow | all types |
+
+The graph degrades gracefully — if SQLite initialization fails, semantic search continues to work without graph features.
+
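Reviewer sketch (not part of the commit): the four variables in the Graph Configuration table could be parsed roughly as below. This is an illustration, not the contents of `src/graph/config.ts`. The names `GraphConfig`, `loadGraphConfig`, and `parseIntInRange` are hypothetical. Clamping out-of-range numbers, ignoring unknown edge kinds, and treating only the literal string `true` as enabled are assumptions. The environment is passed in as a plain object so the snippet runs without Node typings.

```typescript
type Env = Record<string, string | undefined>;

const ALL_EDGE_KINDS = [
  "calls", "imports", "extends", "implements", "exports", "agent_linked",
] as const;
type EdgeKind = typeof ALL_EDGE_KINDS[number];

interface GraphConfig {
  enabled: boolean;
  depth: number;
  sessionTtlSeconds: number;
  edgeKinds: EdgeKind[];
}

// Parse an integer, fall back when missing or malformed, clamp into [min, max].
function parseIntInRange(raw: string | undefined, fallback: number, min: number, max: number): number {
  const n = Number.parseInt(raw ?? "", 10);
  if (Number.isNaN(n)) return fallback;
  return Math.min(max, Math.max(min, n));
}

function loadGraphConfig(env: Env): GraphConfig {
  const requested = (env["SEMANTIC_CODE_EDGE_KINDS"] ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  // Assumption: unknown edge kinds are silently dropped.
  const known = requested.filter(
    (k): k is EdgeKind => (ALL_EDGE_KINDS as readonly string[]).indexOf(k) !== -1,
  );
  return {
    enabled: env["SEMANTIC_CODE_GRAPH_ENABLED"] === "true",
    depth: parseIntInRange(env["SEMANTIC_CODE_GRAPH_DEPTH"], 2, 1, 5),
    sessionTtlSeconds: parseIntInRange(env["SEMANTIC_CODE_SESSION_TTL"], 3600, 1, Number.MAX_SAFE_INTEGER),
    // Documented default: all edge types.
    edgeKinds: known.length > 0 ? known : ALL_EDGE_KINDS.slice(),
  };
}
```

In a Node process this would be called as `loadGraphConfig(process.env)`.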
 ## Supported Languages
 
 - TypeScript / JavaScript (including TSX/JSX)
@@ -155,6 +223,10 @@ Other languages fall back to line-based chunking.
 |----------|-------------|---------|
 | `SEMANTIC_CODE_ROOT` | Root directory to index | Current working directory |
 | `SEMANTIC_CODE_INDEX` | Custom index storage location | `.semantic-code/index/` |
+| `SEMANTIC_CODE_GRAPH_ENABLED` | Enable context graph | `false` |
+| `SEMANTIC_CODE_GRAPH_DEPTH` | Default graph traversal depth (1-5) | `2` |
+| `SEMANTIC_CODE_SESSION_TTL` | Session TTL in seconds | `3600` |
+| `SEMANTIC_CODE_EDGE_KINDS` | Edge types to follow (comma-separated) | all types |
 
 ### Default Ignore Patterns
 
@@ -194,6 +266,7 @@ Invalid inputs throw typed errors (`InvalidFilterError`, `PathTraversalError`, `
 ## Storage
 
 - Index location: `.semantic-code/index/` (add to `.gitignore`)
+- Graph database: `.semantic-code/index/graph.db` (SQLite, created when graph is enabled)
 - Model cache: `~/.cache/semantic-code-mcp/`
 - Estimated size: 3GB codebase → ~1.5GB index (with float16)
 
@@ -231,21 +304,35 @@ semantic-code-mcp/
 ├── src/
 │   ├── index.ts  # MCP server entry point
 │   ├── chunker/
-│   │   ├── index.ts  # Main chunker logic
+│   │   ├── index.ts  # AST-aware chunker + edge extraction
 │   │   ├── languages.ts  # Language configs with WASM paths
 │   │   └── wasm-loader.ts  # WASM grammar loader with caching
 │   ├── embedder/
 │   │   ├── index.ts  # ONNX embedding generation
 │   │   └── model.ts  # Model download & loading
+│   ├── graph/
+│   │   ├── index.ts  # SQLite graph store (nodes, edges, BFS)
+│   │   ├── config.ts  # Graph configuration from env vars
+│   │   ├── extractor.ts  # Edge resolution (raw edges → graph edges)
+│   │   ├── schema.ts  # SQLite DDL for graph tables
+│   │   ├── session.ts  # In-memory session manager
+│   │   └── types.ts  # GraphNode, GraphEdge, RawEdge types
 │   ├── store/
 │   │   └── index.ts  # LanceDB integration
 │   ├── search/
 │   │   ├── index.ts  # Hybrid search orchestration
 │   │   └── reranker.ts  # Cross-encoder reranking
 │   ├── watcher/
 │   │   └── index.ts  # File watcher + incremental indexing
-│   └── tools/
-│       └── semantic-search.ts  # MCP tool definition
+│   ├── tools/
+│   │   ├── semantic-search.ts  # semantic_search tool
+│   │   ├── context-query.ts  # context_query tool (search + graph)
+│   │   ├── graph-annotate.ts  # graph_annotate tool
+│   │   └── session-summary.ts  # session_summary tool
+│   └── utils/
+│       ├── logger.ts  # Structured logging
+│       ├── validation.ts  # Shared ID validation
+│       └── ...
 ├── grammars/  # Pre-built WASM parsers
 ├── scripts/
 │   └── copy-grammars.js  # Build script for WASM files

docs/architecture.md

Lines changed: 44 additions & 1 deletion
@@ -56,8 +56,19 @@ This document describes the internal architecture of semantic-code-mcp.
 │ │
 │ - Tree-sitter parsing │
 │ - Semantic splitting │
+│ - Edge extraction │
 │ - Fallback chunking │
 └───────────────────────┘
+
+┌───────────────────────┐
+│ Context Graph │
+│ (graph/) │
+│ │
+│ - SQLite graph store │
+│ - BFS traversal │
+│ - Session memory │
+│ - Edge resolution │
+└───────────────────────┘
 ```
 
 ## Component Details
@@ -66,9 +77,10 @@ This document describes the internal architecture of semantic-code-mcp.
 
 The entry point that implements the Model Context Protocol:
 
-- Registers the `semantic_search` tool
+- Registers `semantic_search`, `context_query`, `graph_annotate`, and `session_summary` tools
 - Handles JSON-RPC communication over stdio
 - Manages server lifecycle
+- Conditionally initializes the context graph when `SEMANTIC_CODE_GRAPH_ENABLED=true`
 
 ### 2. SemanticSearchTool (`src/tools/semantic-search.ts`)
 
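Reviewer sketch (not part of the commit): the hunk above says the server conditionally initializes the context graph, and the README says a failed SQLite initialization must not break semantic search. A minimal way to express both is a guarded factory that returns `null` when the graph is disabled or cannot be opened. The name `initGraphIfEnabled` and the `open`/`warn` callbacks are hypothetical. The real entry point may be structured differently.

```typescript
// Returns the graph handle, or null when the graph is disabled or failed to open.
// `open` stands in for whatever constructs the SQLite-backed store (assumption).
function initGraphIfEnabled<T>(
  enabled: boolean,
  open: () => T,
  warn: (message: string) => void,
): T | null {
  if (!enabled) {
    return null; // opt-in feature: nothing is opened unless requested
  }
  try {
    return open();
  } catch (err) {
    // Graceful degradation: report the failure and carry on without a graph.
    const reason = err instanceof Error ? err.message : String(err);
    warn(`context graph disabled: ${reason}`);
    return null;
  }
}
```

Callers then branch on `null`, for example registering the graph tools only when a handle exists, while `semantic_search` is registered unconditionally.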

@@ -185,6 +197,37 @@ File Change → Debounce (1s) → Read Content → Check Hash → Chunk → Embe
 - Debouncing to avoid excessive re-indexing
 - Graceful shutdown with pending operation tracking
 
+### 8. Context Graph (`src/graph/`)
+
+Opt-in structural awareness layer using SQLite (better-sqlite3):
+
+**Components:**
+- **GraphStore** (`index.ts`): SQLite-backed store for nodes and edges with BFS traversal
+- **Extractor** (`extractor.ts`): Resolves raw edges (symbol names) to concrete graph edges (chunk IDs)
+- **SessionManager** (`session.ts`): In-memory session state with TTL-based cleanup
+- **Config** (`config.ts`): Environment variable parsing for graph settings
+
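Reviewer sketch (not part of the commit): the Extractor bullet describes turning raw edges, which name a target symbol, into graph edges between chunk IDs. The code below shows one plausible shape for that step. The type names `RawEdge` and `GraphEdge` appear in the commit, but their fields here are assumptions, as is the policy of dropping unresolved symbols and fanning out to every chunk that shares a symbol name.

```typescript
// Assumed shapes; the real definitions live in src/graph/types.ts.
interface RawEdge {
  sourceChunkId: string; // chunk in which the reference was found
  targetSymbol: string;  // symbol name as written in the source
  kind: string;          // e.g. "calls", "imports"
}

interface NodeRef {
  id: string;          // chunk ID
  symbol_name: string; // column name from the documented graph_nodes schema
}

interface GraphEdge {
  source_id: string;
  target_id: string;
  edge_type: string;
}

function resolveEdges(rawEdges: RawEdge[], nodes: NodeRef[]): GraphEdge[] {
  // Index chunk IDs by symbol name so each raw edge resolves with one lookup.
  const bySymbol = new Map<string, string[]>();
  for (const node of nodes) {
    const ids = bySymbol.get(node.symbol_name);
    if (ids) ids.push(node.id);
    else bySymbol.set(node.symbol_name, [node.id]);
  }

  const resolved: GraphEdge[] = [];
  for (const raw of rawEdges) {
    // Assumption: symbols with no known chunk (e.g. external libraries) are dropped.
    const targets = bySymbol.get(raw.targetSymbol) ?? [];
    for (const targetId of targets) {
      resolved.push({ source_id: raw.sourceChunkId, target_id: targetId, edge_type: raw.kind });
    }
  }
  return resolved;
}
```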
+**Edge Types:**
+- `calls` — function/method call relationships
+- `imports` — import/require dependencies
+- `extends` — class inheritance
+- `implements` — interface implementation
+- `exports` — module exports
+- `agent_linked` — agent-created links via `graph_annotate`
+
+**Schema (SQLite):**
+```
+graph_nodes: id, file_path, symbol_name, kind, start_line, end_line, updated_at, stale
+graph_edges: source_id, target_id, edge_type, weight, metadata
+graph_meta: key, value
+```
+
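Reviewer sketch (not part of the commit): a depth-limited breadth-first traversal over rows shaped like `graph_edges`, filtered by edge type. The project keeps edges in SQLite. This version works on an in-memory array purely to show the traversal. The function name `neighborhood` is hypothetical, and following edges in both directions is an assumption based on the README's statement that results include both callers and callees.

```typescript
// Row shape limited to the graph_edges columns the traversal needs.
interface EdgeRow {
  source_id: string;
  target_id: string;
  edge_type: string;
}

// Returns every node reachable within maxDepth hops, mapped to its hop count.
function neighborhood(
  edges: EdgeRow[],
  startId: string,
  maxDepth: number,
  edgeKinds?: string[],
): Map<string, number> {
  const adjacency = new Map<string, string[]>();
  const link = (from: string, to: string): void => {
    const list = adjacency.get(from);
    if (list) list.push(to);
    else adjacency.set(from, [to]);
  };
  for (const edge of edges) {
    if (edgeKinds && edgeKinds.indexOf(edge.edge_type) === -1) continue;
    // Assumption: traverse both directions so callers and callees both appear.
    link(edge.source_id, edge.target_id);
    link(edge.target_id, edge.source_id);
  }

  const depthOf = new Map<string, number>();
  depthOf.set(startId, 0);
  let frontier: string[] = [startId];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbor of adjacency.get(id) ?? []) {
        if (!depthOf.has(neighbor)) {
          depthOf.set(neighbor, depth); // first visit is the shortest path in BFS
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return depthOf;
}
```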
+**Design Decisions:**
+- SQLite for graph traversal (BFS < 1ms), separate from LanceDB for vectors
+- In-memory sessions (ephemeral by design, tied to agent tasks not codebase)
+- Graceful degradation: graph failure never breaks semantic search
+- ID validation via shared `utils/validation.ts` for defense-in-depth
+
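Reviewer sketch (not part of the commit): the design decisions describe ephemeral in-memory sessions, and the Components list mentions TTL-based cleanup. The class below illustrates that idea with an injectable clock so expiry can be exercised deterministically. It is not the contents of `src/graph/session.ts`. The class and method names, the choice to refresh the TTL on every `touch`, and lazy eviction on access are all assumptions.

```typescript
interface SessionState {
  visited: Set<string>; // chunk IDs seen in this session
  lastSeenMs: number;   // last activity, in milliseconds
}

class SessionSketch {
  private sessions = new Map<string, SessionState>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlSeconds: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now;
  }

  // Record activity for a session, creating it on first use.
  touch(sessionId: string, nodeId?: string): SessionState {
    this.evictExpired();
    let state = this.sessions.get(sessionId);
    if (!state) {
      state = { visited: new Set<string>(), lastSeenMs: this.now() };
      this.sessions.set(sessionId, state);
    }
    state.lastSeenMs = this.now(); // assumption: any activity refreshes the TTL
    if (nodeId !== undefined) state.visited.add(nodeId);
    return state;
  }

  has(sessionId: string): boolean {
    this.evictExpired();
    return this.sessions.has(sessionId);
  }

  // Drop sessions idle for longer than the TTL; returns how many were removed.
  evictExpired(): number {
    const cutoff = this.now() - this.ttlMs;
    let removed = 0;
    this.sessions.forEach((state, id) => {
      if (state.lastSeenMs < cutoff) {
        this.sessions.delete(id);
        removed++;
      }
    });
    return removed;
  }
}
```

Because state lives only in the `Map`, restarting the process discards every session, which matches the "ephemeral by design" note above.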
 ## Data Flow
 
 ### Indexing Flow

package.json

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "@smallthinkingmachines/semantic-code-mcp",
-  "version": "0.3.3",
+  "version": "0.4.0",
   "description": "MCP server for semantic code search using AST-aware chunking and vector embeddings",
   "type": "module",
   "main": "dist/index.js",
@@ -35,6 +35,8 @@
     "semantic-search",
     "mcp-server",
     "code-understanding",
+    "code-graph",
+    "context-graph",
     "vector-search",
     "ai-coding",
     "claude-code",