All notable changes to Cortex will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Major Feature: Automatic sync on every git commit!
- Git Pre-Commit Hook - Changed files automatically sync to the vector database on commit
  - New script: cortex/scripts/git-sync.ts (162 lines)
  - Git hook: .git/hooks/pre-commit (24 lines)
  - Syncs only staged files (filtered via .cortexignore)
  - Hash-based change detection (skips unchanged files)
  - Blocks the commit if sync fails (prevents outdated embeddings)
  - Shows real-time progress during the commit
- Git-Like Sync System - Simplified configuration inspired by .gitignore
  - .cortexignore file (auto-created from defaults on first sync)
  - .cortexconfig.json (optional advanced settings)
  - Syncs EVERYTHING except ignore patterns (like Git)
  - No whitelists, no folder validation prompts
  - Predictable behavior (familiar to Git users)
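The hook's per-file decision can be sketched as follows. This is a minimal sketch, not the contents of cortex/scripts/git-sync.ts: the function names (`getStagedFiles`, `parseStagedFiles`, `contentHash`, `filesToSync`) and the injected `readFile`/`isIgnored` callbacks are hypothetical. It lists staged paths with `git diff --cached --name-only --diff-filter=ACMR -z`, drops ignored paths, and keeps only files whose SHA-256 differs from the hash recorded at the last sync.

```typescript
import { execFileSync } from "node:child_process";
import { createHash } from "node:crypto";

// Split NUL-separated `git diff ... -z` output into paths.
function parseStagedFiles(gitOutput: string): string[] {
  return gitOutput.split("\0").filter((path) => path.length > 0);
}

// Hypothetical: list added/copied/modified/renamed staged files.
function getStagedFiles(): string[] {
  const out = execFileSync(
    "git",
    ["diff", "--cached", "--name-only", "--diff-filter=ACMR", "-z"],
    { encoding: "utf8" },
  );
  return parseStagedFiles(out);
}

// Hex-encoded SHA-256 of a file's contents.
function contentHash(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Keep staged files that are not ignored and whose content hash
// differs from the one stored at the previous sync (hypothetical store).
function filesToSync(
  staged: string[],
  readFile: (path: string) => string,
  storedHashes: Map<string, string>,
  isIgnored: (path: string) => boolean,
): string[] {
  return staged.filter((path) => {
    if (isIgnored(path)) return false;
    return storedHashes.get(path) !== contentHash(readFile(path));
  });
}
```

Git aborts a commit when the pre-commit hook exits with a non-zero status, so a hook built on this would exit with code 1 if syncing any returned file fails.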
Optional: Customize .cortexignore
```
# Add your patterns (like editing .gitignore)
**/experiments/**
**/scratch/**
```

| Before (Manual) | After (Auto-Sync) |
|---|---|
| Forget to sync new code | Automatically synced on commit ✅ |
| Outdated search results | Always up-to-date ✅ |
| Manual cortex_sync() calls | Happens automatically ✅ |
| Risk of missing files | Git tracks everything ✅ |
| Folder validation prompts | No prompts, just .cortexignore ✅ |
| Complex configuration | One file (.cortexignore) ✅ |
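Matching paths against .cortexignore patterns such as `**/experiments/**` can be sketched with a small glob-to-regex converter. This is a hypothetical illustration, not Cortex's actual matcher: it supports `**`, `*`, and `?`, skips blank lines and `#` comments, and ignores Git features such as `!` negation, trailing-slash directory patterns, and matching slash-free patterns at any depth.

```typescript
// Convert one gitignore-style glob into an anchored RegExp.
function globToRegExp(pattern: string): RegExp {
  let re = "";
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "*") {
      if (pattern[i + 1] === "*") {
        if (pattern[i + 2] === "/") {
          re += "(?:.*/)?"; // "**/" matches zero or more directories
          i += 2;
        } else {
          re += ".*"; // bare "**" matches anything, including "/"
          i += 1;
        }
      } else {
        re += "[^/]*"; // "*" stays within one path segment
      }
    } else if (ch === "?") {
      re += "[^/]"; // one character, not a separator
    } else {
      re += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex syntax
    }
  }
  return new RegExp(`^${re}$`);
}

// Parse ignore-file content, skipping blanks and comments.
function parseIgnoreFile(content: string): RegExp[] {
  return content
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith("#"))
    .map(globToRegExp);
}

function isIgnored(path: string, patterns: RegExp[]): boolean {
  return patterns.some((re) => re.test(path));
}
```

With the two example patterns, `src/experiments/try.ts` and `scratch/notes.md` are ignored, while `src/experiments.ts` is still synced.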
Files Modified: 3
- cortex/src/constants.ts
- cortex/scripts/bulk-sync.ts
- cortex/src/tools/sync.ts

Files Created: 9
- cortex/scripts/git-sync.ts
- .git/hooks/pre-commit
- cortex/docs/.cortexignore.default
- cortex/docs/.cortexconfig.example.json
- 4 documentation files

Lines Changed: ~800 (added/modified/removed)
Code Reduction: -200 lines (complexity removed)
Cortex is a free and open-source AI memory system for preventing code duplication and ensuring pattern consistency. It provides semantic search over your codebase via Model Context Protocol (MCP) integration with Claude Code CLI.
MCP Integration
- 6 MCP tools for Claude Code CLI integration
- Interactive setup wizard with database state detection
- Semantic search over codebase
- File embedding with folder pattern support
- Database statistics
- Clean up embeddings with confirmation
Vector Database
- PostgreSQL 16 with pgvector extension
- HNSW index for approximate nearest-neighbor search (roughly O(log n) per query)
- Cosine similarity search (<100ms query time)
- Single initialization migration (schema.sql)
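For intuition, the ranking the HNSW index approximates is exact cosine-similarity search. The sketch below is a brute-force, in-memory version with hypothetical names (`Chunk`, `topK`); it is not Cortex's query path. For reference, pgvector's `<=>` operator returns cosine distance, which equals 1 minus the cosine similarity computed here.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Chunk {
  id: string;
  embedding: number[];
}

// Exact top-k: score every chunk, sort descending, keep k.
// An HNSW index avoids this full scan at the cost of exactness.
function topK(query: number[], chunks: Chunk[], k: number): { id: string; similarity: number }[] {
  return chunks
    .map((c) => ({ id: c.id, similarity: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}
```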
Embedding Service
- Ollama integration (100% local, no API costs)
- nomic-embed-text model (768 dimensions)
- Auto-downloads model on first startup
- No manual model management required
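A single embedding request to a local Ollama server can be sketched as below. It assumes Ollama's default port (11434) and its `/api/embed` endpoint, which takes `{ model, input }` and returns `{ embeddings: number[][] }`; the function name `embed`, the `FetchLike` type, and the injectable fetch are hypothetical and may not match how Cortex calls Ollama.

```typescript
// The subset of fetch this sketch needs; Node.js 18+ global fetch fits it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const EMBEDDING_DIMENSIONS = 768; // nomic-embed-text output size

// Embed one text with nomic-embed-text and check the vector size.
async function embed(
  text: string,
  fetchImpl: FetchLike = fetch,
  baseUrl = "http://localhost:11434", // assumed local Ollama address
): Promise<number[]> {
  const res = await fetchImpl(`${baseUrl}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", input: text }),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = (await res.json()) as { embeddings?: number[][] };
  const vector = data.embeddings?.[0];
  if (!vector || vector.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(`expected a ${EMBEDDING_DIMENSIONS}-dimensional embedding`);
  }
  return vector;
}
```

Injecting the fetch function keeps the request logic testable without a running Ollama instance.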
Smart Document Chunking
- Markdown: Chunks by headings (preserves context)
- TypeScript/Code: Plain text chunking with overlap
- Max chunk: ~256 tokens (1024 chars)
- Overlap: ~25 tokens (100 chars)
- SHA-256 deduplication (skips unchanged files)
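The chunking rules above can be sketched as follows, using the stated limits (1024-character chunks, 100-character overlap, which match the ~256/~25 token figures at roughly 4 characters per token). The function names are hypothetical, and this heading splitter is a simplification: it would also split on `#` lines inside fenced code blocks.

```typescript
const MAX_CHARS = 1024; // ~256 tokens
const OVERLAP_CHARS = 100; // ~25 tokens

// Plain-text chunking: fixed-size windows; each window starts
// (maxChars - overlap) characters after the previous one.
function chunkPlainText(text: string, maxChars = MAX_CHARS, overlap = OVERLAP_CHARS): string[] {
  if (overlap >= maxChars) throw new Error("overlap must be smaller than maxChars");
  const chunks: string[] = [];
  const step = maxChars - overlap;
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + maxChars, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break; // last window reached the end
  }
  return chunks;
}

// Markdown chunking: begin a new section at each ATX heading so every
// chunk carries its heading; oversized sections fall back to windows.
function chunkMarkdown(markdown: string, maxChars = MAX_CHARS, overlap = OVERLAP_CHARS): string[] {
  const sections: string[] = [];
  let current: string[] = [];
  for (const line of markdown.split("\n")) {
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      sections.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) sections.push(current.join("\n"));
  return sections
    .filter((s) => s.trim().length > 0)
    .flatMap((s) => (s.length <= maxChars ? [s] : chunkPlainText(s, maxChars, overlap)));
}
```

For a 2000-character file, this yields windows starting at 0, 924, and 1848, so consecutive chunks share 100 characters and the last chunk holds the remaining 152.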
Memory Optimization
- Isolated connections per file (prevents leaks)
- Sequential processing (stable memory usage)
- Forced garbage collection between files
- ~200MB memory usage (vs 2GB+ without optimization)
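Sequential processing with a forced collection between files might look like the sketch below. The name `processSequentially` is hypothetical; `processFile` is where a per-file database connection would be opened and closed. `globalThis.gc` exists only when Node.js is started with `--expose-gc`, so the call is a no-op otherwise.

```typescript
// Process files one at a time so only one file's chunks and
// embeddings are held in memory at once.
async function processSequentially<T>(
  files: string[],
  processFile: (path: string) => Promise<T>,
): Promise<T[]> {
  const results: T[] = [];
  for (const file of files) {
    // Awaiting inside the loop prevents files from overlapping.
    results.push(await processFile(file));
    // Force a collection if exposed (node --expose-gc); otherwise skip.
    const gc = (globalThis as unknown as { gc?: () => void }).gc;
    gc?.();
  }
  return results;
}
```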
Docker Orchestration
- Simple 2-service setup (PostgreSQL + Ollama)
- Automatic health checks
- Persistent volumes for data
- MCP server runs locally via tsx (no Docker needed)
| Component | Details |
|---|---|
| Embedding Model | nomic-embed-text (768 dimensions) |
| Database | PostgreSQL 16 + pgvector |
| Vector Index | HNSW (m=16, ef_construction=64) |
| MCP Transport | STDIO (standard input/output) |
| MCP SDK | @modelcontextprotocol/sdk 1.9.0 |
| Node.js | 20+ |
| TypeScript | 5.7.3 |
| Database Client | postgres (Postgres.js, not pg) |
| Metric | Value |
|---|---|
| Query Speed | <100ms (HNSW index) |
| Embedding Speed | ~5 files/second |
| Initial Sync | ~30s per 100 files |
| Memory Usage | ~200-300MB stable |
| Database Size | ~50MB per 1000 chunks |
Community contributions welcome! Here are some ideas:
- Automatic file watching and sync
- Streaming sync for large codebases
- Parallel embedding generation
- Query result caching
- Multiple project support
- Custom embedding models (beyond nomic-embed-text)
- Support for more file types (Java, Ruby, PHP, etc.)
- Configurable chunking strategies
- Advanced filtering options
- CLI tool for sync/query outside MCP
- Web UI for browsing embeddings
- Better error messages
- Progress indicators for long operations
- Automatic backup/restore
Cortex uses Semantic Versioning:
- MAJOR version for incompatible API changes
- MINOR version for new functionality (backwards-compatible)
- PATCH version for backwards-compatible bug fixes
See CONTRIBUTING.md for how to contribute to Cortex.
Support the project: ☕ Buy Me a Coffee