cagent-schema.json (+5 -1: 5 additions, 1 deletion)
```diff
@@ -728,7 +728,7 @@
 "chunked-embeddings"
 ]
 },
-"model": {
+"embedding_model": {
 "type": "string",
 "description": "Embedding model reference for chunked-embeddings strategies (looked up in models map, or 'auto' for automatic selection)",
 "examples": [
@@ -804,6 +804,10 @@
 "respect_word_boundaries": {
 "type": "boolean",
 "description": "When true, chunks will split on the nearest whitespace boundary instead of at the exact character limit, preventing words from being truncated."
+},
+"code_aware": {
+"type": "boolean",
+"description": "Enable code-aware chunking for source files. When true, the chunking strategy will prefer AST-based or language-aware processors when available (tree-sitter based), and fall back to plain text chunking for unsupported languages."
```
docs/USAGE.md (+24 -0: 24 additions, 0 deletions)
````diff
@@ -972,6 +972,30 @@ models:
 - `chunking.size`: Chunk size in characters (default: `1000`)
 - `chunking.overlap`: Overlap between chunks (default: `75`)
 
+**Code-Aware Chunking:**
+
+When indexing source code, you can enable code-aware chunking to produce semantically aligned chunks based on the code's AST (Abstract Syntax Tree). This keeps functions and methods intact rather than splitting them arbitrarily:
+
+```yaml
+rag:
+  codebase:
+    docs: [./src]
+    strategies:
+      - type: bm25
+        database: ./code.db
+        chunking:
+          size: 2000
+          code_aware: true # Enable AST-based chunking
+```
+
+- `chunking.code_aware`: When `true`, uses tree-sitter for AST-based chunking (default: `false`), and `size` becomes indicative
+
+**Notes:**
+- Currently supports **Go** source files (`.go`). More languages will be added incrementally.
+- Falls back to plain text chunking for unsupported file types.
+- Produces chunks that align with code structure (functions, methods, type declarations).
+- Particularly useful for code search and retrieval tasks.
+
 **Results:**
 - `limit`: Final number of results (default: `15`)
````
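The Notes above describe AST-aligned chunks for Go source with a plain-text fallback for other files. The Go sketch below approximates that behavior with the standard library's `go/parser` and `go/ast` instead of tree-sitter, which the docs name as the actual mechanism; `chunkSource`, `chunkGo` and `chunkText` are hypothetical helpers. It treats `size` as a soft target (one possible reading of "indicative"): whole top-level declarations, with their doc comments, are grouped until the next one would push the chunk past `size` bytes, and a declaration larger than `size` is kept whole rather than split.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// chunkSource (hypothetical) returns AST-aligned chunks for .go files and
// falls back to plain text windows for other files or unparsable Go.
func chunkSource(filename, src string, size, overlap int) []string {
	if strings.HasSuffix(filename, ".go") {
		if chunks, ok := chunkGo(filename, src, size); ok {
			return chunks
		}
	}
	return chunkText(src, size, overlap)
}

// chunkGo groups whole top-level declarations (with their doc comments) into
// chunks of roughly size bytes. size is a soft target: a declaration is never
// split, so one larger than size becomes a chunk of its own.
func chunkGo(filename, src string, size int) ([]string, bool) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.ParseComments)
	if err != nil {
		return nil, false // let the caller fall back to plain text
	}
	var chunks []string
	var cur strings.Builder
	for _, decl := range file.Decls {
		start := decl.Pos()
		switch d := decl.(type) {
		case *ast.FuncDecl:
			if d.Doc != nil {
				start = d.Doc.Pos()
			}
		case *ast.GenDecl:
			if d.Doc != nil {
				start = d.Doc.Pos()
			}
		}
		text := src[fset.Position(start).Offset:fset.Position(decl.End()).Offset]
		// Flush the current chunk if adding this declaration would overshoot size.
		if cur.Len() > 0 && cur.Len()+len(text) > size {
			chunks = append(chunks, cur.String())
			cur.Reset()
		}
		if cur.Len() > 0 {
			cur.WriteString("\n\n")
		}
		cur.WriteString(text)
	}
	if cur.Len() > 0 {
		chunks = append(chunks, cur.String())
	}
	return chunks, true
}

// chunkText splits src into windows of size runes that overlap by overlap runes.
func chunkText(src string, size, overlap int) []string {
	r := []rune(src)
	if size <= 0 || len(r) <= size {
		return []string{src}
	}
	step := size - overlap
	if step <= 0 {
		step = size
	}
	var chunks []string
	for start := 0; start < len(r); start += step {
		end := start + size
		if end > len(r) {
			end = len(r)
		}
		chunks = append(chunks, string(r[start:end]))
		if end == len(r) {
			break
		}
	}
	return chunks
}

func main() {
	src := "package demo\n\n// Add returns a + b.\nfunc Add(a, b int) int { return a + b }\n\ntype Point struct{ X, Y int }\n"
	for i, c := range chunkSource("demo.go", src, 2000, 75) {
		fmt.Printf("chunk %d:\n%s\n", i, c)
	}
}
```

This sketch drops the package clause and any comments not attached to a declaration, and its text fallback cuts fixed rune windows without honoring `respect_word_boundaries`.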