
Commit 0e17331

Merge pull request #978 from krissetto/semantic-embeddings

RAG: Semantic embeddings strategy

2 parents: 1cc965e + 0b5f357

8 files changed: 1018 additions & 6 deletions


README.md
Lines changed: 2 additions & 2 deletions

@@ -146,7 +146,7 @@ See [MCP Mode documentation](./docs/MCP-MODE.md) for detailed instructions on ex
 - **💭 Advanced reasoning** - Built-in "think", "todo" and "memory" tools for
   complex problem-solving.
 - **🔍 RAG (Retrieval-Augmented Generation)** - Pluggable retrieval strategies
-  (chunked_embeddings, BM25, more to come..) with hybrid retrieval, fusion, and result reranking support.
+  (BM25, chunked-embeddings, semantic-embeddings) with hybrid retrieval, result fusion and reranking support.
 - **🌐 Multiple AI providers** - Support for OpenAI, Anthropic, Gemini, xAI,
   Mistral, Nebius and [Docker Model
   Runner](https://docs.docker.com/ai/model-runner/).
@@ -369,7 +369,7 @@ agents:
 ```
 
 **Features:**
-- **Multiple strategies**: Vector (semantic), BM25 (keyword), or both
+- **Multiple strategies**: Vector embeddings, semantic embeddings, BM25 (keyword), or combinations
 - **Parallel execution**: Strategies run concurrently for fast results
 - **Pluggable fusion**: RRF, weighted, or max score combining
 - **Result reranking**: Re-score results with specialized models for improved relevance
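The hybrid retrieval and reranking called out in these README features come together in a single `rag` block. The following is only a rough sketch assembled from the configuration keys documented in docs/USAGE.md further down; the tool name, paths, and models are illustrative placeholders, and fusion settings are left at their defaults rather than guessed:

```yaml
# Hybrid retrieval sketch: two strategies run in parallel, their results
# are fused, then reranked. Keys follow docs/USAGE.md; the names and paths
# below are placeholders, not part of this commit.
rag:
  docs_search:
    docs: [./docs]
    strategies:
      - type: chunked-embeddings          # vector similarity over raw chunks
        embedding_model: openai/text-embedding-3-small
        vector_dimensions: 1536
      - type: bm25                        # keyword matching
        k1: 1.5
    results:
      reranking:
        model: openai/gpt-4o-mini         # re-scores the fused candidates
      limit: 5
```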

cagent-schema.json
Lines changed: 38 additions & 2 deletions

@@ -725,12 +725,13 @@
         "description": "Retrieval strategy type",
         "enum": [
           "bm25",
-          "chunked-embeddings"
+          "chunked-embeddings",
+          "semantic-embeddings"
         ]
       },
       "embedding_model": {
         "type": "string",
-        "description": "Embedding model reference for chunked-embeddings strategies (looked up in models map, or 'auto' for automatic selection)",
+        "description": "Embedding model reference for chunked-embeddings and semantic-embeddings strategies (looked up in models map, or 'auto' for automatic selection)",
         "examples": [
           "openai/text-embedding-3-small",
           "dmr/embeddinggemma",
@@ -811,6 +812,41 @@
         }
       },
       "additionalProperties": false
+      },
+      "embedding_batch_size": {
+        "type": "integer",
+        "description": "Number of text chunks to send to the embedding API in a single request (chunked-embeddings/semantic-embeddings only)",
+        "minimum": 1,
+        "default": 50
+      },
+      "max_embedding_concurrency": {
+        "type": "integer",
+        "description": "Maximum concurrent embedding batch API requests. For semantic-embeddings, also controls parallel LLM calls for generating chunk summaries.",
+        "minimum": 1,
+        "default": 3
+      },
+      "max_indexing_concurrency": {
+        "type": "integer",
+        "description": "Maximum number of files to index in parallel during initialization",
+        "minimum": 1,
+        "default": 3
+      },
+      "chat_model": {
+        "type": "string",
+        "description": "Chat model used to generate semantic representations for each chunk (semantic-embeddings only, required)",
+        "examples": [
+          "anthropic/claude-sonnet-4-5",
+          "openai/gpt-4o-mini"
+        ]
+      },
+      "semantic_prompt": {
+        "type": "string",
+        "description": "Custom prompt template for the semantic LLM. Uses JavaScript template literal syntax with the following placeholders: ${path} (full source file path), ${basename} (base name of file), ${chunk_index} (numeric chunk index), ${content} (raw chunk content), ${ast_context} (AST metadata when ast_context is enabled). Only applicable to the semantic-embeddings strategy."
+      },
+      "ast_context": {
+        "type": "boolean",
+        "description": "Include TreeSitter-derived AST metadata in the semantic prompt (semantic-embeddings only, requires chunking.code_aware for best results)",
+        "default": false
       }
     },
     "additionalProperties": true

docs/USAGE.md
Lines changed: 52 additions & 1 deletion

@@ -680,6 +680,39 @@ rag:
 
 **Best for:** Understanding intent, synonyms, paraphrasing, multilingual queries
 
+#### Semantic-Embeddings Strategy (LLM-Enhanced Semantic Search)
+
+Uses an LLM to generate semantic summaries of each chunk before embedding, capturing meaning and intent rather than raw text:
+
+```yaml
+rag:
+  code_search:
+    docs: [./src, ./pkg]
+    strategies:
+      - type: semantic-embeddings
+        embedding_model: openai/text-embedding-3-small
+        vector_dimensions: 1536
+        chat_model: openai/gpt-4o-mini  # LLM for generating summaries
+        database: ./semantic.db
+        threshold: 0.3
+        limit: 10
+        ast_context: true  # include AST metadata in prompts
+        chunking:
+          size: 1000
+          code_aware: true  # AST-aware chunking for best results
+```
+
+**Best for:** Code search, understanding intent, finding implementations by what they do rather than by their exact names
+
+**Trade-offs:** Higher-quality retrieval, but slower indexing (one LLM call per chunk) and additional API costs
+
+**Parameters:**
+- `embedding_model` (required): Embedding model for vector similarity
+- `chat_model` (required): Chat model to generate semantic summaries
+- `vector_dimensions` (required): Embedding vector dimensions
+- `semantic_prompt`: Custom prompt template (uses `${path}`, `${content}`, `${ast_context}` placeholders)
+- `ast_context`: Include TreeSitter AST metadata in prompts (default: `false`)
+
 #### BM25 Strategy (Keyword Search)
 
 Uses traditional keyword matching:
@@ -945,7 +978,7 @@ models:
 | `results` | object | Post-processing configuration |
 
 **Strategy Configuration:**
-- `type`: Strategy type (`chunked-embeddings`, `bm25`)
+- `type`: Strategy type (`chunked-embeddings`, `semantic-embeddings`, `bm25`)
 - `docs`: Strategy-specific documents (optional, augments shared docs)
 - `database`: Database configuration (path to local sqlite db)
 - `chunking`: Chunking configuration
@@ -963,6 +996,23 @@ models:
 - `chunking.size`: Chunk size in characters (default: `1000`)
 - `chunking.overlap`: Overlap between chunks (default: `75`)
 
+**Semantic-Embeddings Strategy:**
+- `embedding_model` (required): Embedding model reference (e.g., `openai/text-embedding-3-small`)
+- `chat_model` (required): Chat model for generating semantic summaries (e.g., `openai/gpt-4o-mini`)
+- `vector_dimensions` (required): Embedding vector dimensions (e.g., `1536` for text-embedding-3-small)
+- `database`: Database configuration (same formats as chunked-embeddings)
+- `semantic_prompt`: Custom prompt template using JS template literals (`${path}`, `${basename}`, `${chunk_index}`, `${content}`, `${ast_context}`)
+- `ast_context`: Include TreeSitter AST metadata in semantic prompts; useful for code (default: `false`, works best with `code_aware` chunking)
+- `similarity_metric`: Similarity metric (default: `cosine_similarity`)
+- `threshold`: Minimum similarity (0–1, default: `0.5`)
+- `limit`: Max candidates this strategy contributes as fusion input (default: `5`)
+- `embedding_batch_size`: Chunks per embedding request (default: `50`)
+- `max_embedding_concurrency`: Concurrent embedding/LLM requests (default: `3`)
+- `max_indexing_concurrency`: Concurrent file indexing (default: `3`)
+- `chunking.size`: Chunk size in characters (default: `1000`)
+- `chunking.overlap`: Overlap between chunks (default: `75`)
+- `chunking.code_aware`: Use AST-based chunking (default: `false`; if `true`, `chunking.overlap` is ignored)
+
 **BM25 Strategy:**
 - `database`: Database configuration (same formats as chunked-embeddings)
 - `k1`: Term frequency saturation (recommended range: `1.2–2.0`, default: `1.5`)
@@ -1029,6 +1079,7 @@ Look for logs tagged with:
 See `examples/rag/` directory:
 - `examples/rag/bm25.yaml` - BM25 strategy only
 - `examples/rag/hybrid.yaml` - Hybrid retrieval (chunked-embeddings + BM25)
+- `examples/rag/semantic_embeddings.yaml` - Semantic-embeddings strategy with LLM summaries
 - `examples/rag/reranking.yaml` - Reranking with various providers
 - `examples/rag/reranking_full_example.yaml` - Complete reranking configuration reference
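Beyond the code-summary prompt shipped in the example file below, the documented `semantic_prompt` placeholders also suit prose-heavy corpora. A hedged sketch of a documentation-oriented prompt, assuming the same placeholder expansion described above, with the model names as placeholders:

```yaml
strategies:
  - type: semantic-embeddings
    embedding_model: openai/text-embedding-3-small
    vector_dimensions: 1536
    chat_model: openai/gpt-4o-mini
    # Prompt tuned for prose documentation rather than source code;
    # placeholders are expanded per chunk at indexing time.
    semantic_prompt: |
      Summarize the following excerpt for retrieval.

      Source: ${path} (chunk ${chunk_index})

      ${content}

      In 2-3 sentences, state the topic, the key claims or instructions,
      and any terms a reader might search for.
```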

examples/rag/semantic_embeddings.yaml
Lines changed: 97 additions & 0 deletions

@@ -0,0 +1,97 @@
+# This example demonstrates the semantic-embeddings RAG strategy.
+#
+# Unlike chunked-embeddings, which embeds raw text chunks directly,
+# semantic-embeddings uses an LLM to generate semantic summaries of each
+# chunk before embedding. This captures the meaning/purpose of code,
+# making retrieval more semantic than direct chunk embedding.
+#
+# Trade-offs:
+# - Higher quality retrieval for code and structured content
+# - Slower indexing (requires an LLM call per chunk)
+# - Additional cost from semantic model API calls
+
+agents:
+  root:
+    model: openai/gpt-4o
+    description: assistant with semantic code search
+    instruction: |
+      You are a helpful coding assistant with access to semantic code search.
+      Use the search tool to find relevant code based on meaning, not just keywords.
+    rag:
+      - codebase
+
+rag:
+  codebase:
+    tool:
+      description: Search the codebase for relevant code snippets by semantic meaning
+    docs:
+      - ../../pkg/**/*.go
+      - ../../cmd/**/*.go
+    strategies:
+      - type: semantic-embeddings
+        # Required: embedding model for vector similarity
+        embedding_model: openai/text-embedding-3-small
+        vector_dimensions: 1536
+
+        # Required: chat model to generate semantic summaries of each chunk
+        chat_model: openai/gpt-4o-mini
+
+        # Custom prompt template for generating semantic summaries during indexing.
+        # Uses JS template literal syntax with these placeholders:
+        #   ${path}        - full source file path
+        #   ${basename}    - base name of the source file
+        #   ${chunk_index} - numeric index of the chunk
+        #   ${content}     - raw chunk content
+        #   ${ast_context} - formatted AST metadata (when ast_context: true)
+        semantic_prompt: |
+          You are summarizing source code for semantic search.
+
+          File: ${basename}
+          ${ast_context}
+
+          ```
+          ${content}
+          ```
+
+          In 2-4 sentences, explain what this code does. Be specific:
+          - Name exact functions, types, and methods
+          - Mention key dependencies or libraries used
+          - Describe inputs, outputs, and notable behavior
+
+        # Optional: database path (defaults to an auto-generated name)
+        database: ./semantic_embeddings.db
+
+        # Optional: similarity settings
+        similarity_metric: cosine_similarity
+        threshold: 0.3
+        limit: 10
+
+        # Optional: performance tuning
+        embedding_batch_size: 50        # chunks per embedding API call
+        max_embedding_concurrency: 3    # parallel embedding/LLM requests
+        max_indexing_concurrency: 3     # parallel file indexing
+
+        # Optional: include AST metadata in the semantic prompt (best with code_aware chunking)
+        ast_context: true
+
+        # Optional: chunking configuration
+        chunking:
+          size: 1000
+          respect_word_boundaries: true
+          code_aware: true   # use tree-sitter for AST-aware chunking
+
+    results:
+      # Optional: rerank results using an LLM for better relevance
+      reranking:
+        model: openai/gpt-4o-mini
+        threshold: 0.3
+        # Custom criteria to guide the reranking model's relevance scoring
+        criteria: |
+          When scoring relevance, prioritize:
+          - Code that directly implements the queried functionality
+          - Functions and methods over comments or documentation
+          - Complete implementations over partial snippets
+      deduplicate: true
+      return_full_content: false   # when true, return full document content instead of only the matched chunks
+      limit: 5