Get to know a codebase with KnowCode. Ask questions about the code in natural language and get natural-language answers back. Give your AI coding agent accurate, relevant context so that its token usage limits last 10x longer.
KnowCode analyzes your codebase and builds a semantic graph of entities (functions, classes, modules) and their relationships (calls, imports, dependencies). This structured knowledge enables:
- Accurate context synthesis for AI assistants
- Token-efficient context generation (only what's needed)
- Local-first querying without LLM dependency
- Traceability back to source code
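To make the idea of a semantic graph concrete, here is a minimal sketch of entities and typed relationships with caller/callee lookups. It is an illustration only: the `CodeGraph` class and its methods are hypothetical names, not KnowCode's internal API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CodeGraph:
    """Toy graph (hypothetical): entity names plus typed, directed relationships."""

    kinds: dict = field(default_factory=dict)  # entity name -> kind
    edges: dict = field(default_factory=lambda: defaultdict(set))  # relation -> {(source, target)}

    def add_entity(self, name: str, kind: str) -> None:
        self.kinds[name] = kind

    def relate(self, source: str, relation: str, target: str) -> None:
        self.edges[relation].add((source, target))

    def callers(self, name: str) -> list:
        # Every entity with a "calls" edge pointing at `name`.
        return sorted(s for s, t in self.edges["calls"] if t == name)

    def callees(self, name: str) -> list:
        # Every entity that `name` has a "calls" edge to.
        return sorted(t for s, t in self.edges["calls"] if s == name)


graph = CodeGraph()
graph.add_entity("main", "function")
graph.add_entity("GraphBuilder.build_from_directory", "method")
graph.add_entity("Scanner.scan_all", "method")
graph.relate("main", "calls", "GraphBuilder.build_from_directory")
graph.relate("GraphBuilder.build_from_directory", "calls", "Scanner.scan_all")

print(graph.callers("GraphBuilder.build_from_directory"))  # ['main']
print(graph.callees("GraphBuilder.build_from_directory"))  # ['Scanner.scan_all']
```

Because the relationships are stored as plain edges, lookups like these need no LLM call, which is what makes local-first querying possible.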
# Create and activate virtual environment
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install KnowCode for development (batteries included)
uv sync --dev --extra all --extra mcp --extra voyageai
# Set API keys (only needed for the features you use; see aimodels.yaml)
export VOYAGE_API_KEY_1="..." # embeddings + reranking (semantic search)
export OPENAI_API_KEY="..." # embeddings (alternative to VoyageAI)
export GOOGLE_API_KEY_1="..." # LLM (Gemini) for `knowcode ask`
KnowCode now ships with a lightweight core install plus feature extras:
- `knowcode[server]` → `knowcode server`
- `knowcode[search]` → `knowcode index`, `knowcode semantic-search`
- `knowcode[llm]` → `knowcode ask`
- `knowcode[watch]` → `knowcode server --watch`
- `knowcode[all]` → union of `server`, `search`, `llm`, `watch`
- `knowcode[mcp]` and `knowcode[voyageai]` remain available as before
Commands fail fast with actionable hints, e.g.:
Install knowcode[server] to use 'knowcode server'.
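A fail-fast check of this kind can be sketched as follows. This is an assumption about the general pattern, not KnowCode's actual code: `missing_extra_hint` and `require_extra` are hypothetical helpers, and the module name passed in stands for whatever package a given extra installs.

```python
import importlib.util
from typing import Optional


def missing_extra_hint(command: str, extra: str, module: str) -> Optional[str]:
    """Return an install hint if top-level `module` is not importable, else None."""
    if importlib.util.find_spec(module) is not None:
        return None
    return f"Install knowcode[{extra}] to use 'knowcode {command}'."


def require_extra(command: str, extra: str, module: str) -> None:
    """Exit before doing any work when the optional dependency is missing."""
    hint = missing_extra_hint(command, extra, module)
    if hint is not None:
        raise SystemExit(hint)


# "json" ships with Python, so no hint is produced for it.
print(missing_extra_hint("server", "server", "json"))
# A module that is not installed yields the actionable message.
print(missing_extra_hint("server", "server", "module_that_is_not_installed_xyz"))
```

Checking with `find_spec` avoids importing the heavy dependency just to discover that it is absent.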
# 1. Analyze your codebase
knowcode analyze src/
# 2. Query the knowledge store
knowcode query search "MyClass"
knowcode query callers "my_function"
knowcode query callees "MyClass.method"
# 3. Generate context for an entity
knowcode context "MyClass.important_method"
# 4. Export documentation
knowcode export -o docs/
# 5. (Optional) Build semantic search index explicitly
# `analyze` also attempts indexing; this command is useful for rebuilds.
knowcode index src/
# 6. Perform semantic search
knowcode semantic-search "How does parsing work?"
# 7. Start the intelligence server with watch mode
knowcode server --port 8080 --watch
# 8. Start MCP server for IDE integration
knowcode mcp-server --store .
# 9. View statistics
knowcode stats
Scan and parse a directory to build the knowledge store.
knowcode analyze <directory> [--output <path>] [--ignore <pattern>]
Example:
knowcode analyze src/ --ignore "tests/*" --ignore "*.pyc"
Query the knowledge store for relationships.
knowcode query <type> <target> [--store <path>] [--json]
Query types:
- `search <pattern>` - Search entities by name
- `callers <entity>` - Find what calls this entity
- `callees <entity>` - Find what this entity calls
- `deps <entity>` - Get all dependencies
Example:
knowcode query search "Parser"
knowcode query callers "GraphBuilder.build_from_directory"
knowcode query deps "PythonParser" --json
Generate a context bundle for an entity (ready for AI consumption).
knowcode context <entity> [--store <path>] [--max-tokens <n>]
Example:
knowcode context "GraphBuilder.build_from_directory" --max-tokens 4000
Export the knowledge store as Markdown documentation.
knowcode export [--store <path>] [--output <dir>]
Example:
knowcode export -o docs/
Show statistics about the knowledge store.
knowcode stats [--store <path>]
Check whether the local KnowCode setup is ready for daily use.
knowcode doctor [--store <path>] [--index <path>] [--config <path>] [--mcp] [--json]
Checks include strict config loading, required model API keys, knowledge store schema, semantic index schema/embedding dimensions, artifact disk footprint, and optionally an MCP stdio handshake.
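As an illustration of one of these checks, the sketch below reports which API key variables named in a configuration are unset. It assumes the configuration has already been parsed into a dictionary shaped like aimodels.yaml; `missing_api_keys` is a hypothetical helper, and the real `knowcode doctor` logic may differ.

```python
import os
from typing import Mapping, Optional

# Mirrors the structure of aimodels.yaml, written as a plain dict for this sketch.
CONFIG = {
    "natural_language_models": [
        {"name": "gemini-2.0-flash-lite", "provider": "google", "api_key_env": "GOOGLE_API_KEY_1"},
    ],
    "embedding_models": [
        {"name": "voyage-3-lite", "provider": "voyageai", "api_key_env": "VOYAGE_API_KEY_1"},
    ],
    "reranking_models": [
        {"name": "rerank-2.5", "provider": "voyageai", "api_key_env": "VOYAGE_API_KEY_1"},
    ],
}

MODEL_SECTIONS = ("natural_language_models", "embedding_models", "reranking_models")


def missing_api_keys(config: dict, environ: Optional[Mapping[str, str]] = None) -> list:
    """Return the env var names referenced by the config that are unset or empty."""
    env = os.environ if environ is None else environ
    missing = []
    for section in MODEL_SECTIONS:
        for model in config.get(section, []):
            var = model.get("api_key_env")
            if var and not env.get(var) and var not in missing:
                missing.append(var)
    return missing


# With only the Google key set, the VoyageAI key is reported once.
print(missing_api_keys(CONFIG, {"GOOGLE_API_KEY_1": "example"}))  # ['VOYAGE_API_KEY_1']
```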
Build a semantic search index for your codebase.
knowcode index <directory> [--output <path>] [--config <path>]
Perform a natural language search against the semantic index.
knowcode semantic-search <query> [--index <path>] [--store <path>] [--config <path>] [--limit <n>]
Example:
knowcode semantic-search "Where is the graph built?"
Start the FastAPI intelligence server. This is the preferred way for locally hosted AI agents (IDEs) to interact with KnowCode.
knowcode server [--host <host>] [--port <port>] [--store <path>] [--watch]
Example:
knowcode server --port 8080
Once running, you can access endpoints like:
- `GET /api/v1/context?target=MyClass&task_type=debug`
- `GET /api/v1/search?q=parser` (lexical search)
- `POST /api/v1/context/query` (semantic search)
- `GET /api/v1/trace_calls/{entity_id}?direction=callers&depth=3` (multi-hop call graph)
- `GET /api/v1/impact/{entity_id}` (deletion impact analysis)
- `POST /api/v1/reload` (to refresh data after a new `analyze` run)
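A client can build these request URLs with the standard library alone. The sketch below only constructs URLs and sends nothing. The base address is an assumption (a server started with `--port 8080` and reachable on localhost), the helper names are hypothetical, and the response format and the exact form of `entity_id` are not specified here.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://127.0.0.1:8080"  # assumption: server reachable on localhost, port 8080


def context_url(target: str, task_type: str = "debug") -> str:
    """URL for GET /api/v1/context."""
    return f"{BASE_URL}/api/v1/context?{urlencode({'target': target, 'task_type': task_type})}"


def search_url(pattern: str) -> str:
    """URL for GET /api/v1/search (lexical search)."""
    return f"{BASE_URL}/api/v1/search?{urlencode({'q': pattern})}"


def trace_calls_url(entity_id: str, direction: str = "callers", depth: int = 3) -> str:
    """URL for GET /api/v1/trace_calls/{entity_id}."""
    query = urlencode({"direction": direction, "depth": depth})
    # Percent-encode the id so characters such as "/" cannot alter the path.
    return f"{BASE_URL}/api/v1/trace_calls/{quote(entity_id, safe='')}?{query}"


print(context_url("MyClass"))
print(search_url("parser"))
print(trace_calls_url("GraphBuilder.build_from_directory"))
```

Each URL can then be fetched with `urllib.request.urlopen` or any HTTP client once the server is running.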
Show git history for the codebase or specific entities. Requires analysis with --temporal.
knowcode history [target] [--limit <n>]
Example:
# Show recent project history
knowcode history --limit 5
# Show history for a specific class
knowcode history "KnowledgeStore"
Ask questions about the codebase using an LLM agent. Requires an API key for at least one configured model in aimodels.yaml.
Prerequisites:
- Knowledge store exists (`knowcode analyze <dir>`)
- Semantic index exists (`knowcode index <dir>`)
- LLM dependencies installed (`knowcode[llm]`)
knowcode ask <question> [--config <path>]
Configuration: KnowCode looks for a configuration file in the following order:
- `--config` argument
- `aimodels.yaml` in current directory
- `~/.aimodels.yaml`
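The lookup order above can be sketched as a small resolver. `resolve_config` is a hypothetical helper written for illustration. In particular, this sketch falls through to the next candidate when a path passed via `--config` does not exist, whereas the real CLI may report an error instead.

```python
from pathlib import Path
from typing import Optional


def resolve_config(
    cli_path: Optional[str] = None,
    cwd: Optional[Path] = None,
    home: Optional[Path] = None,
) -> Optional[Path]:
    """Return the first existing config file in the documented order, or None."""
    cwd = cwd or Path.cwd()
    home = home or Path.home()
    candidates = []
    if cli_path:
        candidates.append(Path(cli_path))       # 1. --config argument
    candidates.append(cwd / "aimodels.yaml")    # 2. current directory
    candidates.append(home / ".aimodels.yaml")  # 3. home directory
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


# Prints the resolved path for the current environment, or None if nothing is found.
print(resolve_config())
```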
Example aimodels.yaml:
natural_language_models:
  - name: gemini-2.5-flash
    provider: google
    api_key_env: GOOGLE_API_KEY_1
Example:
knowcode ask "How does the graph builder work?"
Start an MCP (Model Context Protocol) server for IDE agent integration.
knowcode mcp-server [--store <path>] [--config <path>]
Prerequisite: knowledge store must already exist (`knowcode analyze <dir>`).
MCP read tools are deterministic and do not auto-run analysis.
Tools Exposed:
- `search_codebase` - Search for code entities by name
- `get_entity_context` - Get detailed context for an entity
- `trace_calls` - Trace call graph (callers/callees) with depth
- `retrieve_context_for_query` - Unified query→retrieval→context bundle (same pipeline as `knowcode ask`)
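To show what a depth-limited call trace involves, here is a minimal breadth-first sketch over a caller-to-callees mapping. It illustrates the general technique only. This standalone `trace_calls` function, its signature, and its return shape are assumptions, not the MCP tool's actual contract.

```python
from collections import deque


def trace_calls(calls: dict, start: str, direction: str = "callees", depth: int = 3) -> dict:
    """Breadth-first trace. `calls` maps caller -> list of callees.

    Returns {entity: hop distance} for entities within `depth` hops of `start`.
    """
    if direction == "callees":
        adjacency = calls
    else:
        # Invert the edges so that we walk from a callee back to its callers.
        adjacency = {}
        for caller, callees in calls.items():
            for callee in callees:
                adjacency.setdefault(callee, []).append(caller)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # do not expand beyond the requested depth
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:  # also guards against call cycles
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    del distance[start]
    return distance


CALLS = {
    "main": ["GraphBuilder.build_from_directory"],
    "analyze_command": ["GraphBuilder.build_from_directory"],
    "GraphBuilder.build_from_directory": ["Scanner.__init__", "Scanner.scan_all"],
}

print(trace_calls(CALLS, "Scanner.scan_all", direction="callers", depth=1))
print(trace_calls(CALLS, "Scanner.scan_all", direction="callers", depth=2))
```

With `depth=1` only the direct caller is returned. Raising the depth to 2 also pulls in `main` and `analyze_command`.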
MCP Client Configuration (Claude Desktop, VS Code, etc.):
{
  "knowcode": {
    "command": "uv",
    "args": ["run", "knowcode", "mcp-server", "--store", "/path/to/project"]
  }
}
Installation with MCP support:
pip install "knowcode[mcp]"
KnowCode enables token-efficient IDE agent workflows. When your IDE agent needs context, it invokes KnowCode's MCP tools to retrieve relevant code context locally before calling expensive external LLMs.
The canonical retrieval policy lives in docs/mcp-contract.md.
Keep agent rules pointed there instead of hard-coding separate thresholds or
token budgets in each client.
How It Works:
- IDE agent receives user query
- Agent invokes `retrieve_context_for_query` with `verbosity="minimal"`
- KnowCode returns compact context + `sufficiency_score` (0.0-1.0)
- If the score meets `config.sufficiency_threshold`, answer locally
- If context is insufficient, escalate verbosity or budget before falling back
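The escalation loop described in these steps can be sketched as follows. The retrieval call is injected as a plain function so the example runs without a server. The verbosity levels other than `"minimal"`, the result dictionary shape, and the helper name `retrieve_until_sufficient` are assumptions made for illustration, and docs/mcp-contract.md remains the authoritative policy.

```python
from typing import Callable, Sequence

# Only "minimal" appears in this README; the other level names are hypothetical.
VERBOSITY_LEVELS = ("minimal", "standard", "detailed")


def retrieve_until_sufficient(
    query: str,
    retrieve: Callable[[str, str], dict],
    threshold: float = 0.8,
    levels: Sequence[str] = VERBOSITY_LEVELS,
) -> dict:
    """Escalate verbosity until the sufficiency score meets the threshold."""
    if not levels:
        raise ValueError("at least one verbosity level is required")
    result = {}
    for verbosity in levels:
        result = retrieve(query, verbosity)
        if result["sufficiency_score"] >= threshold:
            return {**result, "verbosity": verbosity, "answer_locally": True}
    # Every level was tried without reaching the threshold: fall back.
    return {**result, "verbosity": levels[-1], "answer_locally": False}


def fake_retrieve(query: str, verbosity: str) -> dict:
    """Stand-in for the MCP tool call, with made-up scores per level."""
    scores = {"minimal": 0.55, "standard": 0.85, "detailed": 0.95}
    return {"context": f"[{verbosity}] context for {query!r}", "sufficiency_score": scores[verbosity]}


outcome = retrieve_until_sufficient("Where is the graph built?", fake_retrieve)
print(outcome["verbosity"], outcome["answer_locally"])  # standard True
```

Starting at the cheapest level means the larger context is only paid for when the compact one is not enough.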
Antigravity Configuration (.gemini/mcp_servers.json):
{
  "mcpServers": {
    "knowcode": {
      "command": "uv",
      "args": ["run", "knowcode", "mcp-server", "--store", "/path/to/your/project"]
    }
  }
}
For OpenAPI-to-tool orchestration through LiteLLM, this repository now includes a self-contained gateway app at apps/agent-gateway/.
- Source code: `apps/agent-gateway/src/agent_gateway/`
- Setup + local run: `apps/agent-gateway/README.md`
- Clean repo split playbook: `apps/agent-gateway/EXTRACTION.md`
The gateway intentionally integrates with KnowCode only over HTTP (/openapi.json and /api/v1/*) so it can be moved to a separate repository without code changes.
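As a rough illustration of what deriving tools from an OpenAPI document can look like, the sketch below lists one tool descriptor per operation. It is not the gateway's code: `openapi_to_tools`, the descriptor fields, and the sample `operationId` are hypothetical, and only the two paths come from this README.

```python
import re

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def openapi_to_tools(spec: dict) -> list:
    """Turn each operation in an OpenAPI `paths` object into a simple tool descriptor."""
    tools = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            # Fall back to a name derived from method and path when operationId is absent.
            fallback = re.sub(r"[^a-zA-Z0-9]+", "_", f"{method}_{path}").strip("_")
            tools.append(
                {
                    "name": operation.get("operationId") or fallback,
                    "method": method.upper(),
                    "path": path,
                    "description": operation.get("summary", ""),
                }
            )
    return sorted(tools, key=lambda tool: (tool["path"], tool["method"]))


# Hand-written sample resembling what /openapi.json might contain.
SAMPLE_SPEC = {
    "paths": {
        "/api/v1/search": {"get": {"summary": "Lexical search"}},
        "/api/v1/reload": {"post": {"operationId": "reload", "summary": "Reload the store"}},
    }
}

for tool in openapi_to_tools(SAMPLE_SPEC):
    print(tool["name"], tool["method"], tool["path"])
```

Because the descriptors are derived purely from the served document, a consumer built this way needs no import from the KnowCode package.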
Token Savings:
- Simple "locate" queries → 100% savings (answered locally)
- Code explanations → 60-80% savings (precise context only)
- Python (.py) - Full AST parsing (Supports Python 3.10 - 3.12)
- JavaScript / TypeScript (.js, .ts) - Classes, functions, imports (via tree-sitter)
- Java (.java) - Classes, methods, imports, inheritance (via tree-sitter)
- Markdown (.md) - Document structure with heading hierarchy
- YAML (.yaml, .yml) - Configuration keys with nested structure
KnowCode follows a layered architecture:
- Scanner - Discovers files with gitignore support
- Parsers - Language-specific parsing (Python AST, Tree-sitter for others)
- Graph Builder - Constructs semantic graph with entities and relationships
- Knowledge Store - In-memory graph with JSON persistence
- Indexer - Vector embedding and hybrid retrieval engine (FAISS + BM25)
- Context Synthesizer - Generates token-efficient context bundles with priority ranking
- CLI - User interface for all operations
See KnowCode.md for the complete reference architecture.
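The last synthesis step, fitting ranked sections into a token budget, can be sketched as below. This is a simplified illustration under stated assumptions: the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer, and `build_bundle` and the priority numbers are hypothetical, not the Context Synthesizer's actual algorithm.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate (about four characters per token)."""
    return max(1, len(text) // 4)


def build_bundle(sections: list, max_tokens: int) -> tuple:
    """Greedily pack sections by priority (lower number = more important).

    `sections` holds (priority, title, text) tuples.
    Returns (chosen, used) where chosen is a list of (title, text).
    """
    chosen = []
    used = 0
    for _priority, title, text in sorted(sections, key=lambda section: section[0]):
        cost = estimate_tokens(title) + estimate_tokens(text)
        if used + cost > max_tokens:
            continue  # too large for the remaining budget; a later section may still fit
        chosen.append((title, text))
        used += cost
    return chosen, used


SECTIONS = [
    (2, "Source Code", "x" * 4000),  # stand-in for a long function body
    (0, "Signature", "def build_from_directory(self, root_dir): ..."),
    (1, "Description", "Build graph by scanning and parsing a directory."),
    (3, "Called By", "main, analyze_command"),
]

chosen, used = build_bundle(SECTIONS, max_tokens=100)
print([title for title, _ in chosen])  # ['Signature', 'Description', 'Called By']
```

With a budget of 100 estimated tokens, the long source section is dropped while the signature, description, and callers are kept. A larger budget would include all four.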
aimodels.yaml supports:
# LLM models for 'ask' command
natural_language_models:
  - name: gemini-2.0-flash-lite
    provider: google
    api_key_env: GOOGLE_API_KEY_1
# Embedding models
embedding_models:
  - name: voyage-3-lite
    provider: voyageai
    api_key_env: VOYAGE_API_KEY_1
# Reranking models (cross-encoder)
reranking_models:
  - name: rerank-2.5
    provider: voyageai
    api_key_env: VOYAGE_API_KEY_1
# Config
config:
  sufficiency_threshold: 0.8 # For local-first answering
Optional dependencies:
pip install "knowcode[mcp]" # MCP server support
pip install "knowcode[voyageai]" # VoyageAI embeddings + reranking
Stats:
Total Entities: 98
  class: 15
  function: 6
  method: 66
  module: 11
Total Relationships: 616
  calls: 478
  contains: 87
  imports: 47
  inherits: 4
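A summary like the one above amounts to counting entities by kind and relationships by type. The sketch below shows that counting on a tiny hand-made graph. `format_stats` and the tuple shapes are assumptions for illustration, and the exact output layout of `knowcode stats` may differ.

```python
from collections import Counter


def format_stats(entities: list, relationships: list) -> str:
    """entities: (name, kind) pairs; relationships: (source, relation, target) triples."""
    lines = [f"Total Entities: {len(entities)}"]
    for kind, count in sorted(Counter(kind for _, kind in entities).items()):
        lines.append(f"  {kind}: {count}")
    lines.append(f"Total Relationships: {len(relationships)}")
    for relation, count in sorted(Counter(rel for _, rel, _ in relationships).items()):
        lines.append(f"  {relation}: {count}")
    return "\n".join(lines)


ENTITIES = [("a", "module"), ("A", "class"), ("A.f", "method"), ("A.g", "method")]
RELATIONSHIPS = [("a", "contains", "A"), ("A.f", "calls", "A.g")]

print(format_stats(ENTITIES, RELATIONSHIPS))
```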
Context Bundle:
# Method: `GraphBuilder.build_from_directory`
**File**: `/path/to/graph_builder.py`
**Lines**: 24-45
## Description
Build graph by scanning and parsing a directory.
## Signature
def build_from_directory(self, root_dir: str | Path, ...) -> 'GraphBuilder'
## Source Code
[full source code]
## Called By
- `main`
- `analyze_command`
## Calls
- `Scanner.__init__`
- `Scanner.scan_all`
# Run tests
pytest
# Type checking
mypy src/
# Linting
ruff check src/
# Format
ruff format src/
See KnowCode.md for the full vision and detailed architectural debt register.
MVP (completed):
- ✅ Single monorepo support
- ✅ Python, Markdown, YAML parsing
- ✅ Snapshot-only analysis (no temporal tracking)
- ✅ Local CLI tool
Released:
- ✅ v1.1: Additional languages (JavaScript, TypeScript, Java)
- ✅ v1.2: Git history integration, temporal tracking
- ✅ v1.3: Token budget optimization, priority ranking
- ✅ v1.4: Runtime signal integration
- ✅ v2.0: Intelligence Server mode (local API for local IDE agents)
- ✅ v2.1: Semantic search with embeddings, hybrid retrieval, and watch mode
- ✅ v2.2: Developer Q&A & IDE Agent Integration:
  - Query classification and task-specific templates
  - Multi-hop `trace_calls()` and impact analysis
  - Local-first `smart_answer()` with sufficiency scoring
  - MCP server for IDE integration
  - VoyageAI cross-encoder reranking
Next: v2.3 — Architectural Hardening:
- Modularise dependencies into optional extras (core install stays lightweight)
- Remove hidden side effects from query paths (fail fast, not auto-build)
- Add schema versioning to knowledge store and index artifacts
- Fix `metadata` type restriction (`dict[str, str]` → `dict[str, Any]`)
- Harden configuration loading (logging, validation, strict server mode)
- Decompose `KnowCodeService` and introduce `Protocol` interfaces
- Add layer contract tests (parser, store roundtrip, retrieval golden queries)
Future releases:
- v2.4: Multi-level documentation synthesis
- v3.0: Deep analysis (data flow, intent extraction, confidence scoring)
- v4.0: Enterprise features (RBAC, scalability, team sharing)
MIT