perf: persistent VCS history cache keyed by HEAD SHA #334

@dekobon

Follow-up to #328.

Add a persistent VCS history cache keyed by HEAD SHA and repo identity, so re-running bca vcs on an unchanged tree reuses prior work.

Why

The v1 implementation walks the full history on every invocation. For large repos (Linux kernel: ~1.2M commits, ~80k files) the cost of that walk is non-trivial. CI integration in particular benefits from caching, because the only difference between consecutive runs is the set of commits added since the last cache entry.

Scope

  • Cache location: ~/.cache/big-code-analysis/vcs/<repo-hash>/<head-sha>.json (or --cache-dir).
  • Cache key: (repo_canonical_path, head_sha, options_hash).
  • Cache content: serialized HistoryIndex.
  • Incremental update: if HEAD has moved forward by N commits and the prior cache entry is still present, walk only the new commits and merge.
  • Invalidation: if vcs_schema_version or risk_score_version differs, ignore the cache and recompute.
  • Flags: --no-cache to disable the cache; --clear-cache to wipe it.
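
To make the key, location, and invalidation bullets concrete, here is a minimal Python sketch (not the project's implementation). Everything named here is an assumption for illustration: the helper names, the SHA-256 truncation, the version constants, and the choice to store options_hash and both version numbers inside the JSON entry so they can be checked on load. The cache path proposed above does not encode options_hash, so the sketch treats an options mismatch as a miss.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional

# Hypothetical version constants; the real values would live in the tool.
VCS_SCHEMA_VERSION = 1
RISK_SCORE_VERSION = 1


def options_hash(options: dict) -> str:
    """Hash the analysis options in a key-order-independent way."""
    canonical = json.dumps(options, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def repo_hash(repo_canonical_path: str) -> str:
    """Derive the <repo-hash> directory name from the canonical repo path."""
    return hashlib.sha256(repo_canonical_path.encode("utf-8")).hexdigest()[:16]


def cache_path(cache_root: Path, repo_canonical_path: str, head_sha: str) -> Path:
    """Build <cache_root>/vcs/<repo-hash>/<head-sha>.json."""
    return cache_root / "vcs" / repo_hash(repo_canonical_path) / f"{head_sha}.json"


def load_entry(path: Path, expected_options_hash: str) -> Optional[dict]:
    """Return the cached entry, or None if it is missing, corrupt, or stale."""
    try:
        entry = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None  # missing or unreadable file: treat as a cache miss
    if not isinstance(entry, dict):
        return None
    if entry.get("vcs_schema_version") != VCS_SCHEMA_VERSION:
        return None  # schema changed: recompute
    if entry.get("risk_score_version") != RISK_SCORE_VERSION:
        return None  # scoring changed: recompute
    if entry.get("options_hash") != expected_options_hash:
        return None  # different windows or options: recompute
    return entry
```

Treating a corrupt or unreadable file as a plain miss keeps the cache strictly an optimization: the worst case is a full recompute.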

Edge cases

  • Force-pushed history: detect via parent SHA mismatch; invalidate.
  • Different --long-window / --recent-window: both values are included in options_hash, so changing either window forces a fresh walk.
  • Concurrent runs: cache writes are atomic via tempfile + rename.
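
The tempfile + rename approach for concurrent runs could look roughly like the following Python sketch. The function name is hypothetical. The key points are that the temporary file is created in the same directory as the target (so the rename stays on one filesystem) and that os.replace swaps it into place in a single step, so a reader sees either the old entry or the new one, never a partial write.

```python
import json
import os
import tempfile
from pathlib import Path


def write_entry_atomically(path: Path, entry: dict) -> None:
    """Write `entry` as JSON to `path` without exposing a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as the target so os.replace never crosses filesystems.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(entry, tmp, sort_keys=True)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_name, path)  # atomic swap into the final name
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If two runs race on the same HEAD, both write identical content and the last rename wins, which is harmless under the bit-identical output requirement.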

Acceptance criteria

  • Cache hit on second invocation produces bit-identical output to the first run.
  • Force-push test: rewriting history invalidates the cache.
  • Performance regression test: incremental update of a 100-commit delta on a 100k-commit fixture is under 5 s.
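
The incremental-update and force-push behaviour that these criteria exercise can be sketched on a toy commit graph. This is an illustrative Python model, not the tool's code: the graph is a plain dict from commit SHA to parent SHAs, the function names are hypothetical, and "parent SHA mismatch" is interpreted here as "the cached HEAD is no longer an ancestor of the new HEAD". Against a real repository the same two questions are answered by `git merge-base --is-ancestor` and `git rev-list <old>..<new>`.

```python
from typing import Dict, List, Optional, Set


def ancestors(parents: Dict[str, List[str]], head: str) -> Set[str]:
    """Return every commit reachable from `head`, including `head` itself."""
    seen: Set[str] = set()
    stack = [head]
    while stack:
        sha = stack.pop()
        if sha in seen or sha not in parents:
            continue  # already visited, or unknown to this graph
        seen.add(sha)
        stack.extend(parents[sha])
    return seen


def plan_update(
    parents: Dict[str, List[str]], cached_head: str, new_head: str
) -> Optional[Set[str]]:
    """Return the commits to walk incrementally, or None to force a full walk.

    None means the cached HEAD is not an ancestor of the new HEAD, i.e.
    history was rewritten (for example by a force-push) and the cache entry
    must be invalidated.
    """
    reachable = ancestors(parents, new_head)
    if cached_head not in reachable:
        return None
    # Only commits not already covered by the cached entry need walking.
    return reachable - ancestors(parents, cached_head)


# Usage: linear history a <- b <- c, then a rewrite that replaces b with b2.
graph = {"a": [], "b": ["a"], "c": ["b"], "b2": ["a"]}
fast_forward = plan_update(graph, "a", "c")    # walk only b and c
rewritten = plan_update(graph, "b", "b2")      # cached head is gone: full walk
```

A real implementation would avoid materializing the full ancestor set of the cached HEAD, since that repeats the work the cache is meant to save. The set difference is used here only to keep the sketch correct in the presence of merges.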

Labels

enhancement
