Vectorizer Sync implements comprehensive file filtering to exclude unnecessary files from synchronization and workspace export. This document describes the filtering rules and how they are applied.
Files are filtered at multiple stages:
- Initial Scan: Files are filtered during project directory scanning
- Workspace Export: Files are filtered when generating workspace.yml
- Cloud Sync: Files are filtered before uploading to HiveHub Cloud
The following directories are ALWAYS excluded:
- node_modules/ - Node.js dependencies
- vendor/ - PHP dependencies
- packages/ - Package directories
- .venv/ - Python virtual environment
- venv/ - Python virtual environment
- env/ - Python virtual environment
- __pycache__/ - Python cache
- target/ - Rust build output
- .cargo/ - Rust cache (except config)
Pattern: **/node_modules/**, **/vendor/**, etc.
The following build directories are ALWAYS excluded:
- dist/ - Distribution/build output
- build/ - Build output
- out/ - Output directory
- .next/ - Next.js build
- .nuxt/ - Nuxt.js build
- bin/ - Binary output (if build artifact)
- obj/ - Object files (.NET)
Pattern: **/dist/**, **/build/**, etc.
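Both directory lists translate mechanically into the **/<dir>/** globs shown above. The sketch below derives those globs and also shows an equivalent check that needs no glob library. The constant and function names are hypothetical, bin/ is treated as a build artifact unconditionally, and the .cargo "except config" rule is assumed to mean .cargo/config and .cargo/config.toml.

```typescript
// Hypothetical constants mirroring the two "ALWAYS excluded" lists above.
const DEPENDENCY_DIRS = [
  'node_modules', 'vendor', 'packages', '.venv', 'venv', 'env',
  '__pycache__', 'target', '.cargo',
];
// Assumption: bin/ is always treated as a build artifact in this sketch.
const BUILD_DIRS = ['dist', 'build', 'out', '.next', '.nuxt', 'bin', 'obj'];
const ALL_EXCLUDED_DIRS = new Set([...DEPENDENCY_DIRS, ...BUILD_DIRS]);

// Glob form: match the directory at any depth, e.g. "**/node_modules/**".
function excludedDirectoryGlobs(): string[] {
  return Array.from(ALL_EXCLUDED_DIRS).map(dir => `**/${dir}/**`);
}

// Library-free equivalent: a path is excluded if any of its directory
// segments (the file name itself is ignored) names an excluded directory.
function isInExcludedDirectory(relPath: string): boolean {
  // Assumed reading of "except config": Cargo's config files stay included.
  if (/(^|[\\/])\.cargo[\\/]config(\.toml)?$/.test(relPath)) return false;
  const dirs = relPath.split(/[\\/]/).slice(0, -1);
  return dirs.some(dir => ALL_EXCLUDED_DIRS.has(dir));
}
```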
Files larger than the configured maximum size are excluded:
- Default: 100KB (102,400 bytes)
- Configurable: users can change the limit in settings
- Maximum Configurable: 10MB
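These three rules reduce to a small step that turns the user's setting into an effective limit. A minimal sketch, assuming out-of-range values are clamped rather than rejected and that 10MB means 10 × 1,024 × 1,024 bytes (consistent with 100KB = 102,400 bytes); resolveMaxFileSize is a hypothetical name.

```typescript
const DEFAULT_MAX_FILE_SIZE = 100 * 1024;             // 100KB = 102,400 bytes
const MAX_CONFIGURABLE_FILE_SIZE = 10 * 1024 * 1024;  // 10MB (assumed binary units)

// Hypothetical helper: missing, non-numeric, or non-positive settings fall
// back to the default; anything above the ceiling is clamped to 10MB.
function resolveMaxFileSize(userValue?: number): number {
  if (userValue === undefined || !Number.isFinite(userValue) || userValue <= 0) {
    return DEFAULT_MAX_FILE_SIZE;
  }
  return Math.min(Math.floor(userValue), MAX_CONFIGURABLE_FILE_SIZE);
}
```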
Implementation:
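```typescript
import * as fs from 'fs';

// Compare the file's size on disk (in bytes) against the configured limit.
function isFileTooLarge(filePath: string, maxSize: number): boolean {
  const stats = fs.statSync(filePath);
  return stats.size > maxSize;
}
```

4. Hidden Files and Directories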
Files and directories whose names start with a dot (.) are excluded, EXCEPT:
Included Hidden Files:
- .gitignore
- .env.example
- .eslintrc.*
- .prettierrc.*
- .editorconfig
- .npmrc
- .yarnrc
Excluded Hidden Files:
- .git/ (entire directory)
- .DS_Store
- .vscode/ (IDE settings)
- .idea/ (IDE settings)
- .*.swp (editor swap files)
- .*.swo (editor swap files)
- .cache/
Pattern: **/.* (with exceptions)
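The hidden-file rule with its allow-list can be sketched without a glob library. This assumes the general rule applies to every hidden directory; other parts of this document (.cargo config, .github/ as an optional user exclusion) suggest the real rule may carve out more exceptions. The function names are hypothetical.

```typescript
// Hidden names that stay included, per the list above. Entries ending in
// ".*" (e.g. ".eslintrc.*") accept any suffix after that prefix.
const INCLUDED_HIDDEN = [
  '.gitignore', '.env.example', '.eslintrc.*', '.prettierrc.*',
  '.editorconfig', '.npmrc', '.yarnrc',
];

function isAllowedHiddenName(name: string): boolean {
  return INCLUDED_HIDDEN.some(entry =>
    entry.endsWith('.*')
      ? name.startsWith(entry.slice(0, -1)) // ".eslintrc.*" -> prefix ".eslintrc."
      : name === entry,
  );
}

// Excluded if the path passes through any hidden directory (.git/, .vscode/,
// .idea/, .cache/, ...) or if the file itself is hidden and not allow-listed.
function isExcludedHidden(relPath: string): boolean {
  const segments = relPath.split(/[\\/]/);
  const fileName = segments.pop() ?? '';
  const inHiddenDir = segments.some(d => d.startsWith('.') && d !== '.' && d !== '..');
  if (inHiddenDir) return true;
  return fileName.startsWith('.') && !isAllowedHiddenName(fileName);
}
```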
Common binary file extensions are excluded:
Excluded Extensions:
- .exe - Windows executables
- .dll - Windows libraries
- .so - Linux shared libraries
- .dylib - macOS dynamic libraries
- .bin - Binary files
- .o - Object files
- .a - Archive files
- .pyc - Python bytecode
- .pyo - Python optimized bytecode
- .class - Java bytecode
- .jar - Java archives
- .war - Web archives
- .ear - Enterprise archives
Note: This list is configurable and can be extended.
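Because the list is extendable, one way to model it is a fixed default set merged with user-supplied extensions. A minimal sketch, assuming user entries extend (never replace) the defaults and that matching is case-insensitive; isBinaryFile is a hypothetical name. Versioned shared libraries such as libfoo.so.1 are not caught by a plain extension check.

```typescript
import * as path from 'path';

// Default binary extensions from the list above.
const DEFAULT_BINARY_EXTENSIONS = [
  '.exe', '.dll', '.so', '.dylib', '.bin', '.o', '.a',
  '.pyc', '.pyo', '.class', '.jar', '.war', '.ear',
];

// `extra` holds user-supplied extensions, with or without the leading dot.
function isBinaryFile(filePath: string, extra: string[] = []): boolean {
  const ext = path.extname(filePath).toLowerCase();
  if (ext === '') return false; // no extension, e.g. "Makefile"
  const all = new Set(
    [...DEFAULT_BINARY_EXTENSIONS, ...extra].map(e =>
      (e.startsWith('.') ? e : `.${e}`).toLowerCase(),
    ),
  );
  return all.has(ext);
}
```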
Temporary and cache files are excluded:
- *.tmp
- *.temp
- *.cache
- *.log (unless in a logs directory used for documentation)
- *.swp
- *.swo
- *~ (backup files)
Pattern: **/*.tmp, **/*.temp, etc.
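The *.log exception is the only conditional rule in this group. A minimal sketch, assuming a "logs directory" means any directory segment named logs; isTemporaryFile is a hypothetical name.

```typescript
// Suffixes from the temporary/cache list above (matched case-insensitively).
const TEMP_SUFFIXES = ['.tmp', '.temp', '.cache', '.swp', '.swo'];

// Assumption: .log files under a directory named "logs" are kept as
// documentation; every other .log file is excluded.
function isTemporaryFile(relPath: string): boolean {
  const segments = relPath.split(/[\\/]/);
  const name = (segments.pop() ?? '').toLowerCase();
  if (name.endsWith('~')) return true; // editor backup files, e.g. "notes.txt~"
  if (TEMP_SUFFIXES.some(suffix => name.endsWith(suffix))) return true;
  if (name.endsWith('.log')) return !segments.includes('logs');
  return false;
}
```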
Database files are excluded:
- *.db
- *.sqlite
- *.sqlite3
- *.db-shm
- *.db-wal
Exception: If database files are part of the project (e.g., test fixtures), they can be explicitly included.
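The explicit-inclusion exception can be checked before the suffix rule. A minimal sketch, assuming explicit inclusions are stored as exact project-relative paths (the document does not say whether globs are allowed); isExcludedDatabase is a hypothetical name.

```typescript
const DATABASE_SUFFIXES = ['.db', '.sqlite', '.sqlite3', '.db-shm', '.db-wal'];

// `explicitIncludes` holds forward-slash relative paths the user opted back
// in, e.g. test fixtures; those win over the suffix rule.
function isExcludedDatabase(relPath: string, explicitIncludes: Set<string>): boolean {
  const normalized = relPath.replace(/\\/g, '/');
  if (explicitIncludes.has(normalized)) return false;
  const lower = normalized.toLowerCase();
  return DATABASE_SUFFIXES.some(suffix => lower.endsWith(suffix));
}
```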
Large media files can be excluded (configurable):
- *.mp4, *.avi, *.mov - Video files
- *.mp3, *.wav, *.flac - Audio files
- *.jpg, *.jpeg, *.png, *.gif - Images (if large)
Note: Small images and media files in documentation may be included.
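Since images are excluded only "if large", the rule needs a size threshold. A minimal sketch; the MediaSettings shape, the isExcludedMedia name, and any concrete threshold value are assumptions, not documented settings.

```typescript
// Hypothetical settings: media exclusion can be switched off entirely, and
// images at or below `smallImageLimit` bytes are kept.
interface MediaSettings {
  excludeMedia: boolean;
  smallImageLimit: number;
}

const VIDEO_AUDIO_EXTENSIONS = ['.mp4', '.avi', '.mov', '.mp3', '.wav', '.flac'];
const IMAGE_EXTENSIONS = ['.jpg', '.jpeg', '.png', '.gif'];

function isExcludedMedia(relPath: string, size: number, settings: MediaSettings): boolean {
  if (!settings.excludeMedia) return false;
  const lower = relPath.toLowerCase();
  // Video and audio are excluded regardless of size.
  if (VIDEO_AUDIO_EXTENSIONS.some(ext => lower.endsWith(ext))) return true;
  // Images are excluded only above the size threshold.
  if (IMAGE_EXTENSIONS.some(ext => lower.endsWith(ext))) return size > settings.smallImageLimit;
  return false;
}
```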
Users can configure additional exclusions by adding custom glob patterns:

```typescript
interface UserExclusions {
  patterns: string[]; // e.g., ["**/test/**", "**/*.spec.ts"]
}
```

Examples:
- "**/test/**" - Exclude all test directories
- "**/*.spec.ts" - Exclude test files
- "**/coverage/**" - Exclude coverage reports
- "**/.github/**" - Exclude GitHub workflows (if desired)
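User patterns have to be combined with the built-in ones before matching. A minimal sketch, assuming patterns arrive one per line from a text area, so blank lines and lines starting with "#" are treated as non-patterns; mergeExclusionPatterns is a hypothetical name.

```typescript
interface UserExclusions {
  patterns: string[];
}

// Combine built-in and user patterns: trim whitespace, drop blank lines and
// "#" comment lines (assumed input convention), and remove duplicates while
// keeping first-seen order.
function mergeExclusionPatterns(defaults: string[], user: UserExclusions): string[] {
  const cleaned = user.patterns
    .map(p => p.trim())
    .filter(p => p.length > 0 && !p.startsWith('#'));
  return Array.from(new Set([...defaults, ...cleaned]));
}
```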
The following files are ALWAYS included (even if they match exclusion patterns):
- README.md - Project readme
- LICENSE - License file
- package.json - Node.js package file
- Cargo.toml - Rust package file
- pyproject.toml - Python project file
- go.mod - Go module file
- workspace.yml - Vectorizer workspace file (if it exists)
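One way to make this list override every other rule is to wrap the exclusion check. A minimal sketch, assuming only files at the project root qualify (so a package.json inside node_modules/ is still excluded); withAlwaysIncluded and Decision are hypothetical names.

```typescript
// Root-level files from the list above that bypass every exclusion rule.
const ALWAYS_INCLUDED = new Set([
  'README.md', 'LICENSE', 'package.json', 'Cargo.toml',
  'pyproject.toml', 'go.mod', 'workspace.yml',
]);

interface Decision {
  excluded: boolean;
  reason?: string;
}

// Wrap an existing exclusion check so the always-included list wins.
function withAlwaysIncluded(
  check: (relPath: string) => Decision,
): (relPath: string) => Decision {
  return relPath => {
    // Normalize separators and a leading "./" before the root-level lookup.
    const normalized = relPath.replace(/\\/g, '/').replace(/^\.\//, '');
    if (ALWAYS_INCLUDED.has(normalized)) return { excluded: false };
    return check(relPath);
  };
}
```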
All source code files are included (unless excluded by size):
- *.ts, *.tsx - TypeScript
- *.js, *.jsx - JavaScript
- *.rs - Rust
- *.py - Python
- *.go - Go
- *.java - Java
- *.cpp, *.c, *.h - C/C++
- *.rb - Ruby
- *.php - PHP
- And other common source file extensions
Documentation files are included:
- *.md - Markdown
- *.txt - Text files
- *.rst - reStructuredText
- *.adoc - AsciiDoc
Configuration files are included:
- *.json - JSON configs
- *.yaml, *.yml - YAML configs
- *.toml - TOML configs
- *.ini - INI configs
- *.conf - Config files
- *.config - Config files
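The three included groups can be expressed as an extension-to-category map, which is handy when assigning files to collections. A minimal sketch; the category names and the categorize function are hypothetical, and because the source list is open-ended ("and other common source file extensions"), only the named extensions appear.

```typescript
type FileCategory = 'source' | 'documentation' | 'configuration';

// Extensions named in the three lists above.
const CATEGORY_BY_EXTENSION: Record<string, FileCategory> = {
  '.ts': 'source', '.tsx': 'source', '.js': 'source', '.jsx': 'source',
  '.rs': 'source', '.py': 'source', '.go': 'source', '.java': 'source',
  '.cpp': 'source', '.c': 'source', '.h': 'source', '.rb': 'source', '.php': 'source',
  '.md': 'documentation', '.txt': 'documentation', '.rst': 'documentation',
  '.adoc': 'documentation',
  '.json': 'configuration', '.yaml': 'configuration', '.yml': 'configuration',
  '.toml': 'configuration', '.ini': 'configuration', '.conf': 'configuration',
  '.config': 'configuration',
};

function categorize(relPath: string): FileCategory | undefined {
  const name = relPath.split(/[\\/]/).pop() ?? '';
  const dot = name.lastIndexOf('.');
  if (dot <= 0) return undefined; // no extension, or a dotfile such as ".npmrc"
  return CATEGORY_BY_EXTENSION[name.slice(dot).toLowerCase()];
}
```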
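```typescript
import * as fs from 'fs';
import * as path from 'path';
import { minimatch } from 'minimatch';

class FileFilter {
  constructor(
    private maxFileSize: number,           // bytes
    private excludedPatterns: string[],    // globs, e.g. '**/*.tmp'
    private excludedExtensions: string[],  // lowercase with dot, e.g. '.exe'
    private excludedDirectories: string[], // directory names, e.g. 'node_modules'
  ) {}

  // Only `size` is read, so plain objects such as { size: 200000 } also work.
  isExcluded(filePath: string, stats: Pick<fs.Stats, 'size'>): {
    excluded: boolean;
    reason?: string;
  } {
    // Check size
    if (stats.size > this.maxFileSize) {
      return { excluded: true, reason: 'File too large' };
    }
    // Check patterns
    if (this.matchesPattern(filePath, this.excludedPatterns)) {
      return { excluded: true, reason: 'Matches exclusion pattern' };
    }
    // Check extensions
    if (this.hasExcludedExtension(filePath)) {
      return { excluded: true, reason: 'Excluded file extension' };
    }
    // Check directories
    if (this.isInExcludedDirectory(filePath)) {
      return { excluded: true, reason: 'In excluded directory' };
    }
    return { excluded: false };
  }

  private matchesPattern(filePath: string, patterns: string[]): boolean {
    return patterns.some(pattern => minimatch(filePath, pattern, { dot: true }));
  }

  private hasExcludedExtension(filePath: string): boolean {
    return this.excludedExtensions.includes(path.extname(filePath).toLowerCase());
  }

  // True if any directory segment (the file name is ignored) is on the list.
  private isInExcludedDirectory(filePath: string): boolean {
    const dirs = filePath.split(/[\\/]/).slice(0, -1);
    return dirs.some(dir => this.excludedDirectories.includes(dir));
  }
}
```

Patterns are matched using glob syntax: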
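```typescript
import { minimatch } from 'minimatch';

// True if the path matches any glob. `dot: true` lets `*` and `**` also match
// names starting with "." (minimatch skips them by default).
function matchesPattern(filePath: string, patterns: string[]): boolean {
  return patterns.some(pattern => minimatch(filePath, pattern, { dot: true }));
}
```

Excluded files are reflected in the exclude_patterns of workspace.yml: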
```yaml
collections:
  - name: source-code
    include_patterns:
      - "src/**/*.ts"
    exclude_patterns:
      - "node_modules/**"
      - "dist/**"
      - "**/*.log"
      - "**/*.tmp"
```

Performance optimizations:
- Early Exit: Check most common exclusions first
- Caching: Cache exclusion decisions for unchanged files
- Batch Processing: Filter files in batches
- Parallel Processing: Filter multiple files in parallel
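Batch and parallel processing combine naturally: stat one batch of files concurrently, then move on to the next, which bounds the number of simultaneously open file handles. A sketch under assumptions: the filter is passed in as a plain (path, size) predicate, the default batch size of 64 is arbitrary, and filterInBatches is a hypothetical name.

```typescript
import { promises as fsp } from 'fs';

// Hypothetical predicate shape: any filter needing a path and its size.
type ExclusionCheck = (filePath: string, size: number) => boolean;

// Returns the paths that are NOT excluded, in their original order.
async function filterInBatches(
  filePaths: string[],
  isExcluded: ExclusionCheck,
  batchSize = 64,
): Promise<string[]> {
  const included: string[] = [];
  for (let i = 0; i < filePaths.length; i += batchSize) {
    const batch = filePaths.slice(i, i + batchSize);
    // Stat calls within one batch run concurrently.
    const sizes = await Promise.all(batch.map(async p => (await fsp.stat(p)).size));
    batch.forEach((p, j) => {
      if (!isExcluded(p, sizes[j])) included.push(p);
    });
  }
  return included;
}
```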
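```typescript
import * as fs from 'fs';

// Cache exclusion decisions per path. A cached decision is reused only while
// the file's size and modification time are unchanged, so edited files are
// re-evaluated.
const exclusionCache = new Map<string, { size: number; mtimeMs: number; excluded: boolean }>();

function isExcludedCached(filePath: string, fileFilter: FileFilter): boolean {
  const stats = fs.statSync(filePath);
  const cached = exclusionCache.get(filePath);
  if (cached && cached.size === stats.size && cached.mtimeMs === stats.mtimeMs) {
    return cached.excluded;
  }
  const { excluded } = fileFilter.isExcluded(filePath, stats);
  exclusionCache.set(filePath, { size: stats.size, mtimeMs: stats.mtimeMs, excluded });
  return excluded;
}
```

Test cases:
- Module Directories: Verify node_modules/ is excluded
- Build Directories: Verify dist/ is excluded
- File Size: Verify files > 100KB are excluded
- Hidden Files: Verify .git/ is excluded but .gitignore is included
- Binary Files: Verify .exe files are excluded
- Custom Patterns: Verify user patterns work correctly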
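```typescript
describe('FileFilter', () => {
  // Assumes `filter` is a FileFilter configured with the defaults above: a
  // 100KB size limit, with node_modules listed as an excluded directory (not
  // as a glob pattern, which would yield 'Matches exclusion pattern' instead).
  const stats = { size: 1024 }; // a small file, well under the limit

  it('should exclude node_modules', () => {
    const result = filter.isExcluded('project/node_modules/file.js', stats);
    expect(result.excluded).toBe(true);
    expect(result.reason).toBe('In excluded directory');
  });

  it('should exclude large files', () => {
    const largeStats = { size: 200000 }; // ~200KB, over the 100KB default
    const result = filter.isExcluded('project/large-file.txt', largeStats);
    expect(result.excluded).toBe(true);
    expect(result.reason).toBe('File too large');
  });

  it('should include source files', () => {
    const result = filter.isExcluded('project/src/index.ts', stats);
    expect(result.excluded).toBe(false);
  });
});
```

Users can configure the following in settings: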
- Maximum File Size: Slider or input field
- Custom Exclusion Patterns: Text area with pattern list
- Excluded Extensions: Checkbox list or text input
- Excluded Directories: Checkbox list
The settings show a preview of what will be excluded:
- List of excluded files (with reasons)
- Count of excluded vs included files
- Total size of excluded files
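The preview amounts to a summary over the filter's per-file decisions. A minimal sketch; the FilterResult and ExclusionPreview shapes and the buildPreview name are hypothetical.

```typescript
// One entry per scanned file, carrying the filter's decision.
interface FilterResult {
  path: string;
  size: number; // bytes
  excluded: boolean;
  reason?: string;
}

interface ExclusionPreview {
  excludedFiles: { path: string; reason: string }[];
  excludedCount: number;
  includedCount: number;
  excludedBytes: number;
}

// Produce the three preview items listed above from the raw decisions.
function buildPreview(results: FilterResult[]): ExclusionPreview {
  const excluded = results.filter(r => r.excluded);
  return {
    excludedFiles: excluded.map(r => ({ path: r.path, reason: r.reason ?? 'Unspecified' })),
    excludedCount: excluded.length,
    includedCount: results.length - excluded.length,
    excludedBytes: excluded.reduce((total, r) => total + r.size, 0),
  };
}
```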
Future enhancements:
- Smart Exclusions: Learn from user behavior
- Project Templates: Pre-configured exclusions for project types
- Exclusion Analytics: Show statistics on excluded files
- Selective Inclusion: Allow including specific files even if excluded