
Commit 9c0158e: Merge pull request #897 from krissetto/rag-reranking

RAG re-ranking

2 parents: bcdf6ea + 6584ed7

24 files changed: 2245 additions & 108 deletions

README.md (30 additions & 2 deletions)

```diff
@@ -146,7 +146,7 @@ See [MCP Mode documentation](./docs/MCP-MODE.md) for detailed instructions on ex
 - **💭 Advanced reasoning** - Built-in "think", "todo" and "memory" tools for
   complex problem-solving.
 - **🔍 RAG (Retrieval-Augmented Generation)** - Pluggable retrieval strategies
-  (chunked_embeddings, BM25, more to come..) with hybrid retrieval and fusion support.
+  (chunked_embeddings, BM25, more to come..) with hybrid retrieval, fusion, and result reranking support.
 - **🌐 Multiple AI providers** - Support for OpenAI, Anthropic, Gemini, xAI,
   Mistral, Nebius and [Docker Model
   Runner](https://docs.docker.com/ai/model-runner/).
```
```diff
@@ -372,10 +372,38 @@ agents:
 - **Multiple strategies**: Vector (semantic), BM25 (keyword), or both
 - **Parallel execution**: Strategies run concurrently for fast results
 - **Pluggable fusion**: RRF, weighted, or max score combining
+- **Result reranking**: Re-score results with specialized models for improved relevance
 - **Per-strategy configuration**: Different thresholds, limits, and documents
 - **Auto file watching**: Reindex automatically on file changes
 
-See the [RAG documentation](docs/RAG.md) for complete details, examples, and debugging guides.
+### Result Reranking
+
+Improve search quality by re-scoring retrieved results with a reranking model:
+
+```yaml
+rag:
+  knowledge_base:
+    docs: [./documents]
+    strategies:
+      - type: chunked-embeddings
+        model: openai/text-embedding-3-small
+        limit: 20 # Retrieve more candidates for reranking
+
+    results:
+      reranking:
+        model: openai/gpt-4.1-mini # Any chat model or DMR reranker
+        top_k: 10 # Only rerank top 10 (optional)
+        threshold: 0.3 # Filter low-scoring results (optional)
+        criteria: | # Domain-specific relevance guidance (optional; not used with native DMR reranking models)
+          Prioritize recent documentation and practical examples.
+          Documents from official sources are more relevant.
+      limit: 5 # Final top results after reranking
+```
+
+**Supported providers:** DMR (native `/rerank` endpoint), OpenAI, Anthropic, Gemini (via structured outputs)
+**Note:** Temperature defaults to 0.0 for more deterministic scoring when not explicitly set.
+
+See the [RAG documentation in USAGE.md](docs/USAGE.md#rag-configuration) for complete details, examples, and debugging guides.
 
 ## Quickly generate agents and agent teams with `cagent new`
 
```
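The README example above retrieves 20 candidates, reranks the top 10, filters at 0.3, and keeps the best 5. A minimal sketch of that flow, with hypothetical helper names rather than cagent's actual internals:

```python
# Hypothetical sketch of the retrieve -> rerank -> threshold -> limit flow from the
# YAML above; names are illustrative, not cagent's actual code.

def rerank_pipeline(candidates, score_fn, top_k, threshold, limit):
    """candidates: (doc, retrieval_score) pairs; score_fn: doc -> relevance in [0, 1]."""
    # Re-score only the top_k candidates by retrieval score (cost control);
    # whether the remainder is dropped or passed through is an implementation detail.
    head = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]
    rescored = [(doc, score_fn(doc)) for doc, _ in head]
    # Filter low-scoring results, then keep the final `limit` best.
    kept = sorted((c for c in rescored if c[1] >= threshold),
                  key=lambda c: c[1], reverse=True)
    return kept[:limit]
```

With the README's settings this would run with `top_k=10`, `threshold=0.3`, `limit=5`.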
cagent-schema.json (40 additions & 1 deletion)

```diff
@@ -277,7 +277,7 @@
     },
     "provider_opts": {
       "type": "object",
-      "description": "Provider-specific options. dmr: runtime_flags. anthropic: interleaved_thinking (boolean, default false)",
+      "description": "Provider-specific options. dmr: runtime_flags. anthropic: interleaved_thinking (boolean, default false). openai/anthropic/google: rerank_prompt (string) to fully override the system prompt used for RAG reranking (advanced - prefer using results.reranking.criteria for domain-specific guidance).",
       "additionalProperties": true
     },
     "track_usage": {
```
```diff
@@ -820,6 +820,45 @@
       },
       "additionalProperties": false
     },
+    "reranking": {
+      "type": "object",
+      "description": "Configuration for reranking results using a specialized reranking model. Reranking re-scores the retrieved results to improve relevance accuracy.",
+      "properties": {
+        "model": {
+          "type": "string",
+          "description": "Model reference for reranking (can be inline like 'dmr/model-name' or a reference to a defined model)",
+          "examples": [
+            "dmr/hf.co/ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF",
+            "reranker_model"
+          ]
+        },
+        "top_k": {
+          "type": "integer",
+          "description": "Optional: only rerank top K results for efficiency. When unset or 0, defaults to the global results.limit (which itself defaults to 15).",
+          "minimum": 0,
+          "default": 0
+        },
+        "threshold": {
+          "type": "number",
+          "description": "Optional: minimum score threshold after reranking (filter results below this score)",
+          "minimum": 0,
+          "maximum": 1,
+          "default": 0.5
+        },
+        "criteria": {
+          "type": "string",
+          "description": "Optional: domain-specific relevance criteria to guide scoring. This text is appended to the base reranking prompt to customize what 'relevance' means for your use case. Supported by OpenAI, Anthropic, and Gemini providers (not DMR native reranking).",
+          "examples": [
+            "Prioritize recent information and practical examples over historical context",
+            "When scoring relevance, focus on code examples and implementation details"
+          ]
+        }
+      },
+      "required": [
+        "model"
+      ],
+      "additionalProperties": false
+    },
     "deduplicate": {
       "type": "boolean",
       "description": "Remove duplicate documents across strategies",
```
docs/USAGE.md (161 additions & 6 deletions)

```diff
@@ -714,7 +714,7 @@ rag:
     strategies:
       - type: chunked-embeddings
         model: embedder
-        docs: [./pdfs] # Additional chunked-embeddings-specific docs
+        docs: [./docs] # Additional chunked-embeddings-specific docs
         database: ./vector.db
         threshold: 0.5
         limit: 20 # Retrieve 20 candidates
```
```diff
@@ -786,6 +786,155 @@ results:
 
 **Best for:** Strategies using the same scoring scale. Takes maximum score.
 
+### Result Reranking
+
+Reranking re-scores retrieved documents using a specialized model to improve relevance. This is applied **after** retrieval and fusion, but **before** the final limit.
+
+#### Why Rerank?
+
+Initial retrieval strategies (embeddings, BM25) are fast but approximate. Reranking uses a more sophisticated model to:
+- Improve relevance scoring accuracy
+- Apply domain-specific criteria
+- Consider document metadata (source, recency, type)
+- Filter low-quality results
+
+#### Provider Support
+
+| Provider  | Implementation                       | Recommended Use Case         |
+|-----------|--------------------------------------|------------------------------|
+| DMR       | Native `/rerank` endpoint            | Production (fast, efficient) |
+| OpenAI    | Chat completion + structured outputs | Flexible, criteria-based     |
+| Anthropic | Beta API + structured outputs        | Complex relevance rules      |
+| Gemini    | Structured outputs                   | Cost-effective scoring       |
+
```
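For the chat-based providers in the table, reranking is a chat completion constrained to a list of scores via structured outputs. A hedged sketch of what such a request might look like, using OpenAI-style `response_format` fields; the prompt wording and schema are illustrative assumptions, not cagent's actual payload:

```python
import json

def build_rerank_request(query, documents, criteria=""):
    """Build a chat-completion payload that scores each document 0.0-1.0."""
    system = "Score each document's relevance to the query from 0.0 to 1.0."
    if criteria:
        system += "\n" + criteria  # appended domain-specific guidance
    return {
        "model": "gpt-4o-mini",
        "temperature": 0.0,  # reranking defaults to deterministic scoring
        "messages": [
            {"role": "system", "content": system},
            {"role": "user",
             "content": json.dumps({"query": query, "documents": documents})},
        ],
        # OpenAI structured outputs: force a JSON object with a numeric score array
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "rerank_scores",
                "schema": {
                    "type": "object",
                    "properties": {
                        "scores": {"type": "array", "items": {"type": "number"}}
                    },
                    "required": ["scores"],
                },
            },
        },
    }
```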
```diff
+#### Basic Reranking Configuration
+
+```yaml
+rag:
+  docs_with_reranking:
+    docs: [./knowledge_base]
+    strategies:
+      - type: chunked-embeddings
+        model: openai/text-embedding-3-small
+        limit: 20 # Retrieve more candidates for reranking
+
+    results:
+      reranking:
+        model: openai/gpt-4o-mini
+      limit: 5 # Final results after reranking
+```
+
+#### Advanced Reranking Configuration
+
+```yaml
+rag:
+  advanced_reranking:
+    docs: [./documents]
+    strategies:
+      - type: chunked-embeddings
+        model: embedder
+        limit: 20
+
+    results:
+      reranking:
+        model: openai/gpt-4o-mini
+
+        # top_k: Only rerank top K results (optional)
+        # Useful for cost optimization when retrieving many documents
+        # Set to 0 or omit to rerank all results
+        top_k: 10
+
+        # threshold: Minimum relevance score (0.0-1.0) after reranking (default: 0.5)
+        # Results below threshold are filtered out
+        # Applied before final limit
+        threshold: 0.3
+
+        # criteria: Domain-specific relevance guidance (optional)
+        # The model receives metadata: source path, chunk index, created_at
+        # Use this to guide scoring based on source, recency, or content type
+        criteria: |
+          When scoring relevance, prioritize:
+          - Content from official documentation over blog posts
+          - Recent information (check created_at dates)
+          - Practical examples and implementation details
+          - Documents from docs/ directory when available
+
+      deduplicate: true
+      limit: 5
+```
+
+#### DMR Native Reranking
+
+DMR offers a native reranking endpoint:
+
+```yaml
+models:
+  dmr-reranker:
+    provider: dmr
+    model: hf.co/ggml-org/qwen3-reranker-0.6b-q8_0-gguf # reranking-specific model
+    # Note: native reranking doesn't support the criteria parameter
+
+rag:
+  knowledge_base:
+    docs: [./documents]
+    strategies:
+      - type: chunked-embeddings
+        model: embedder
+        limit: 20
+
+    results:
+      reranking:
+        model: dmr-reranker
+        threshold: 0.5
+      limit: 5
+```
+
```
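Native rerank endpoints typically take a query plus candidate documents and return index/score pairs. The exact DMR `/rerank` payload is not documented here, so the field names below (Cohere/Jina-style) are assumptions rather than DMR's actual contract:

```python
# Hypothetical request/response shape for a native rerank endpoint; treat every
# field name as an assumption about DMR's /rerank API, not its documented contract.
request = {
    "model": "hf.co/ggml-org/qwen3-reranker-0.6b-q8_0-gguf",
    "query": "how do I enable hybrid retrieval?",
    "documents": ["chunk about fusion", "chunk about hybrid retrieval"],
}
response = {  # scores come back per document index, sorted by relevance
    "results": [
        {"index": 1, "relevance_score": 0.91},
        {"index": 0, "relevance_score": 0.34},
    ],
}
# Map scores back onto the retrieved chunks:
reranked = [(request["documents"][r["index"]], r["relevance_score"])
            for r in response["results"]]
```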
```diff
+#### Reranking Model Configuration
+
+Configure sampling parameters for deterministic or creative scoring. Note that temperature defaults to 0.0 for reranking when not explicitly set:
+
+```yaml
+models:
+  # Deterministic reranking (default behavior)
+  openai-rerank:
+    provider: openai
+    model: gpt-4o-mini
+    # temperature: 0.0 # Default for reranking (explicit setting optional)
+    max_tokens: 16384
+
+  # Anthropic with structured outputs
+  claude-rerank:
+    provider: anthropic
+    model: claude-sonnet-4-5 # model needs to support structured outputs
+    # temperature: 0.0 # Default for reranking
+    max_tokens: 16384
+
+  # Gemini reranking
+  gemini-rerank:
+    provider: google
+    model: gemini-2.5-flash
+    # temperature: 0.0 # Default for reranking
+    max_tokens: 16384
+```
+
+#### Reranking Configuration Reference
+
+| Field       | Type   | Description                                                      | Default |
+|-------------|--------|------------------------------------------------------------------|---------|
+| `model`     | string | Model reference for reranking                                    | -       |
+| `top_k`     | int    | Only rerank top K results (0 = rerank all)                       | 0       |
+| `threshold` | float  | Minimum score (0.0-1.0) after reranking                          | 0.5     |
+| `criteria`  | string | Domain-specific relevance guidance (not supported by DMR native) | ""      |
+
+**Notes:**
+- Reranking adds latency but significantly improves result quality
+- Use `top_k` to trade quality for speed and cost
+- Temperature defaults to 0.0 for deterministic scoring when not explicitly set (OpenAI, Anthropic, Gemini)
+- Default threshold of 0.5 filters documents with negative logits (sigmoid < 0.5 = not relevant)
+- DMR native reranking is fastest but doesn't support custom criteria
+- Criteria works with OpenAI, Anthropic, and Gemini (chat-based reranking using structured outputs)
+- Fallback: if reranking fails, original post-fusion retrieval scores are used
+
```
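The note about negative logits follows from how the 0.5 default maps to sigmoid space: a raw logit of 0 is exactly 0.5, so any negative logit falls below the default threshold. A small sketch:

```python
import math

def sigmoid(logit):
    """Map a raw reranker logit to a (0, 1) relevance score."""
    return 1.0 / (1.0 + math.exp(-logit))

def apply_threshold(scored, threshold=0.5):
    """scored: (doc, raw_logit) pairs; keep docs whose sigmoid score passes the threshold."""
    return [(doc, sigmoid(logit)) for doc, logit in scored if sigmoid(logit) >= threshold]
```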
```diff
 ### RAG Configuration Reference
 
 | Field | Type | Description |
@@ -829,6 +978,11 @@ results:
 - `fusion.strategy`: rrf, weighted, or max
 - `fusion.k`: RRF parameter
 - `fusion.weights`: Weights for weighted fusion
+- `reranking`: Optional reranking configuration (see [Result Reranking](#result-reranking) section)
+- `reranking.model`: Model reference for reranking
+- `reranking.top_k`: Only rerank top K results (default: `0` = rerank all)
+- `reranking.threshold`: Minimum score after reranking (default: `0.5`)
+- `reranking.criteria`: Domain-specific relevance guidance (optional, not supported by DMR native)
 - `return_full_content`: When `true`, return full document contents instead of just matched chunks (default: `false`)
 
 ### Debugging RAG
```
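For reference, the `rrf` strategy listed above (with its `fusion.k` parameter) is Reciprocal Rank Fusion. A minimal sketch, using k=60 as a common default rather than cagent's documented one:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each strategy's ranking contributes 1/(k + rank)
    per document, and contributions are summed.
    rankings: one ranked list of doc ids per retrieval strategy."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked by both strategies outranks one ranked highly by only one, which is why RRF suits hybrid chunked-embeddings + BM25 retrieval.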
```diff
@@ -844,14 +998,15 @@ Look for logs tagged with:
 - `[Chunked-Embeddings Strategy]` - Chunked-embeddings retrieval
 - `[BM25 Strategy]` - BM25 retrieval
 - `[RRF Fusion]` / `[Weighted Fusion]` / `[Max Fusion]` - Result fusion
+- `[Reranker]` - Reranking operations and score adjustments
 
 ### RAG Examples
 
-See `examples/` directory:
-- `examples/rag_vector.yaml` - Chunked-embeddings strategy only
-- `examples/rag_bm25.yaml` - BM25 strategy only
-- `examples/rag_hybrid.yaml` - Hybrid retrieval
-- `examples/rag_advanced.yaml` - Advanced configuration
+See `examples/rag/` directory:
+- `examples/rag/bm25.yaml` - BM25 strategy only
+- `examples/rag/hybrid.yaml` - Hybrid retrieval (chunked-embeddings + BM25)
+- `examples/rag/reranking.yaml` - Reranking with various providers
+- `examples/rag/reranking_full_example.yaml` - Complete reranking configuration reference
 
 ## Examples
 
```