Skip to content

Security: JashT14/symtrace

Security

SECURITY.md

symtrace Security Policy

Last audited: 2026-03-14 Audit scope: All 16 source files (src/*.rs) + Cargo.toml — full manual review


Executive Summary

symtrace is a local-only, offline, privacy-respecting CLI tool. It has no network capabilities, collects no telemetry, uses no unsafe Rust code, and processes data only as explicitly requested by the user.

Property Status
Network access None — no HTTP, TCP, or DNS crates
Telemetry / analytics None — zero tracking or phoning home
Data exfiltration None — all output goes to stdout/stderr only
Environment variable access Limited — reads only XDG_CACHE_HOME, LOCALAPPDATA, HOME, USERPROFILE for cache directory resolution
External process spawning None — no std::process::Command
Unsafe Rust code Deniedunsafe_code = "deny" enforced in Cargo.toml lints
File writes outside cache None — writes only to $XDG_CACHE_HOME/symtrace/<repo_hash>/
Cache directory permissions Restricted — Unix: 0o700 (owner-only); prevents shared-system leakage
Cache deserialization Bounded — 20 MiB limit, versioned envelope with integrity checks
Parser resources Guarded — configurable limits on file size, node count, depth, timeout
Incremental parsing Isolated — tree-sitter Trees cached in-memory only; no disk serialisation
Repository path validation Enforced — canonicalized + directory check before git2 access

1. Ethical Data Processing

What data does symtrace access?

symtrace reads only local git repository data that the user explicitly points it to via CLI arguments:

  • Git commit trees and blob objects (via libgit2)
  • Source code file contents (to build ASTs via tree-sitter)

What does symtrace do with the data?

  1. Parses source code into Abstract Syntax Trees (ASTs)
  2. Computes structural hashes (BLAKE3) for diff matching
  3. Outputs a semantic diff report to stdout

What does symtrace NOT do?

  • Does NOT transmit any data over the network — there are zero networking dependencies (reqwest, hyper, http, curl, etc. are all absent from Cargo.toml)
  • Does NOT collect telemetry — no analytics, usage metrics, crash reports, or any form of tracking
  • Does NOT spawn sub-processes — no std::process::Command usage
  • Does NOT read files outside the specified git repository
  • Does NOT modify the git repository in any way (read-only access)
  • Does NOT log sensitive information — no credentials, tokens, or personal data appear in any output

Cache data

The on-disk AST cache is stored outside the repository tree: $XDG_CACHE_HOME/symtrace/<blake3(canonical_repo_path)>/ (Windows: %LOCALAPPDATA%/symtrace/<hash>/)

Cache files contain serialised AST representations wrapped in a versioned envelope. Cache data is:

  • Stored externally — not inside the repository directory
  • Never transmitted anywhere
  • Versioned with a schema tag — mismatched versions are rejected
  • Integrity-checked against the git blob OID
  • Bounded to 20 MiB maximum during deserialization
  • Safe to delete at any time without affecting functionality

2. Security Findings

2.1 Cache Deserialization Without Integrity Verification — MITIGATED

Field Value
Severity LowResolved
Location src/ast_cache.rsget() method
Description Cache files were deserialized via bincode::deserialize without checksum or version tag verification.
Mitigation applied Cache files are now wrapped in a CacheEnvelope with a version: u8 tag and blob_oid: String integrity field. Deserialization uses bincode::options().with_limit(20_971_520). On version mismatch or OID mismatch, the stale file is removed and the entry is treated as a cache miss.

2.2 Bincode v1 Allocation Limits — MITIGATED

Field Value
Severity LowResolved
Location src/ast_cache.rs — bounded bincode::options() calls
Description bincode v1's deserialize could allocate large amounts of memory if a crafted input specified large Vec or String lengths.
Mitigation applied All deserialization uses bincode::options().with_limit(20_971_520) (20 MiB cap). Payloads exceeding this limit cause a deserialization error, which is treated as a cache miss and the corrupt file is removed.

2.3 Mutex unwrap() on Lock — RESOLVED

Field Value
Severity LowResolved
Location src/ast_cache.rs — 4 calls; src/incremental_parse.rs — 3 calls
Description If a thread panics while holding a cache mutex (AstCache or TreeCache), the mutex becomes poisoned and subsequent unwrap() calls will panic.
Mitigation applied All mutex acquisitions in both AstCache and TreeCache now use .lock().unwrap_or_else(|e| e.into_inner()), recovering gracefully from poisoned mutexes instead of panicking.

2.4 Repository Path — MITIGATED

Field Value
Severity InformationalResolved
Location src/main.rs — path canonicalization
Description repo_path was accepted as a raw CLI string without canonicalization.
Mitigation applied The path is now canonicalized via std::fs::canonicalize() before use. On Windows, the \\?\ UNC prefix is stripped. Cache is stored externally under a blake3(canonical_path) hash, preventing path spoofing and repository contamination.

2.5 Silent Parse Failure — Informational

Field Value
Severity Informational
Location src/main.rsparse_or_cached_with_tree()
Description When ast_builder::parse_content returns Err, the error is silently discarded and the file is skipped.
Impact No security impact. May confuse users if files are silently skipped.

2.6 Incremental Parsing TreeCache — Informational

Field Value
Severity Informational
Location src/incremental_parse.rsTreeCache
Description The TreeCache stores tree-sitter Tree objects in a Mutex<LruCache<String, Tree>> (capacity 128). These trees are kept only in process memory and are never serialised to disk.
Security properties (1) No disk persistence — trees exist only during program execution; (2) No deserialization surface — trees are computed via tree-sitter::Parser, not loaded from external storage; (3) Mutex-protected — concurrent access is serialised; (4) Bounded capacity — LRU eviction at 128 entries prevents unbounded memory growth; (5) Edit computation is pure arithmetic — compute_edit() performs common-prefix/suffix byte comparison with no I/O or allocations beyond a single InputEdit struct.
Impact No security impact. The TreeCache introduces no new file I/O, no new environment variable access, and no new attack surface.

2.7 Cache Directory Permissions — RESOLVED

Field Value
Severity LowResolved
Location src/ast_cache.rsnew() method
Description The cache directory was created with default OS permissions, which on shared systems could allow other users to read cached AST data (containing source code).
Mitigation applied On Unix systems, the cache directory permissions are now explicitly set to 0o700 (owner-only read/write/execute) immediately after creation. This prevents other users from listing or reading cached files.

2.8 Repository Path Directory Validation — RESOLVED

Field Value
Severity InformationalResolved
Location src/main.rs — after canonicalize()
Description The repository path was passed to git2::Repository::open() without verifying it was a directory. Passing a regular file path could produce confusing error messages.
Mitigation applied After canonicalization, an explicit is_dir() check rejects non-directory paths with a clear error message before any git operations commence.

2.9 CLI Version Hardcoding — RESOLVED

Field Value
Severity InformationalResolved
Location src/cli.rs#[command(version)]
Description The CLI version was hardcoded as "0.1.0" instead of being derived from Cargo.toml, causing version output to be stale.
Mitigation applied Replaced with env!("CARGO_PKG_VERSION") to always reflect the actual crate version.

3. Dependency Security Assessment

Crate Version Risk Notes
clap =4.5.60 Low Widely audited CLI parser; version pinned
git2 =0.19.0 Low Rust bindings for libgit2 (C library); mature; pinned
tree-sitter =0.25.10 Low Parsing framework with C runtime; well-tested; pinned
tree-sitter-* pinned Low Language grammars (generated C parsers); pinned
blake3 =1.8.3 Low Audited cryptographic hash, SIMD-optimised; pinned
serde =1.0.228 Low De facto standard serialization; pinned
serde_json =1.0.149 Low Standard JSON; pinned
bincode =1.3.3 Low Bounded deserialization with with_limit(); pinned
anyhow =1.0.102 Low Error handling; pinned
colored =2.2.0 Low Terminal coloring, no security surface; pinned
rayon =1.11.0 Low Well-maintained parallelism; pinned
lru =0.12.5 Low Simple LRU cache; pinned
bumpalo =3.20.2 Low Arena allocator, safe API only; pinned

Supply chain hardening:

  • All dependency versions are exactly pinned (=x.y.z) in Cargo.toml
  • unsafe_code = "deny" enforced via [lints.rust] in Cargo.toml
  • cargo-deny configuration (deny.toml) checks advisories, licenses, bans, and sources
  • Run cargo deny check and cargo audit locally before publishing

Key observations:

  • Zero networking crates — no reqwest, hyper, http, curl, openssl, rustls, or any TLS dependency
  • Native code surface comes only from git2 (libgit2), tree-sitter (C runtime), and grammar crates (generated C parsers)
  • All crate versions are current with no known CVEs at time of audit

4. File System Access Map

Operation File Scope
Repository::open() src/git_layer.rs Canonicalized repo path (read only)
fs::canonicalize() src/main.rs User-specified repo path → canonical
fs::create_dir_all() src/ast_cache.rs $XDG_CACHE_HOME/symtrace/<hash>/ only
fs::read() src/ast_cache.rs Cache .bin files only (bounded to 20 MiB)
fs::write() src/ast_cache.rs Cache .bin files only
fs::read_dir() src/ast_cache.rs Cache directory listing (for stats)
fs::remove_file() src/ast_cache.rs Stale/corrupt cache files only
std::env::var() src/main.rs XDG_CACHE_HOME, LOCALAPPDATA, HOME, USERPROFILE

No directory traversal vulnerabilities — all cache paths are constructed from git2::Oid hex strings (40-char [0-9a-f] only). Cache is stored outside the repository tree, preventing accidental commits.


5. Information Leakage Assessment

Channel Content Classification
stdout Diff report (file paths, function names, line numbers, similarity scores) Expected output
stderr Cache stats, performance metrics Metadata only
Disk cache Serialised ASTs (includes leaf source text) Stored in user cache dir
JSON output Repository path, commit hashes Expected output

No unexpected information leakage. The tool does not output credentials, environment variables, or user PII.


6. Reporting Vulnerabilities

If you discover a security issue, please report it by opening an issue or contacting the maintainer directly. Please include:

  1. Description of the vulnerability
  2. Steps to reproduce
  3. Potential impact assessment

7. Secure Usage Recommendations

  1. Keep dependencies updated — run cargo update periodically; exact-pinned versions require deliberate updates
  2. Run security checks — use cargo audit and cargo deny check (configuration in deny.toml)
  3. Review cache permissions — on shared systems, verify that the cache directory is not world-readable
  4. Delete cache when needed — the cache is stored outside the repo under $XDG_CACHE_HOME/symtrace/; delete it freely
  5. Use on trusted repositories only — tree-sitter parses source code via C grammars; parser resource guardrails (file size, node count, recursion depth, and timeout limits) mitigate pathological inputs
  6. Configure resource limits — use --max-file-size, --max-ast-nodes, --max-recursion-depth, and --parse-timeout-ms CLI flags to adjust guardrails for your workload
  7. Disable incremental parsing if needed — use --no-incremental to force full re-parsing of every file; useful for auditing hash correctness or when memory is constrained

There aren't any published security advisories