feat(metrics): per-function VCS metrics via git blame + AST line spans #329

@dekobon

Description

Follow-up to #328.

Extend the change-history (VCS) metrics from file-level to per-function granularity by attributing commits to function line ranges, using git blame together with the line spans of the existing AST FuncSpace nodes.

Why

Per-function attribution is the preferred granularity for triage: it isolates the risky function inside a risky file instead of flagging the whole file. The vulnerability and defect-prediction literature explicitly identifies function-level analysis as more precise than file-level analysis when blame data is available (see the FFmpeg/Wireshark attack-surface work cited in vulnerability-correlation.md).

Scope

  • Use gix-blame (single-pass incremental blame, reportedly 500-1000× faster than naive Python implementations) to compute, for each line of a file, the commit that last touched it.
  • Map each line to its enclosing FuncSpace using the existing per-language AST walk and node spans.
  • Aggregate per-function: commits, churn, authors, ownership_top_share, age_days, last_modified_days, and risk_score.
  • Surface results as a nested vcs field on each FuncSpace, not just on the top-level file space.
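The line-to-function mapping and the blame-derived aggregates can be sketched as below. This is a minimal Rust sketch, assuming blame output has already been flattened into one record per line of the file at HEAD; FuncSpan, BlameLine, FuncVcs, innermost_span and aggregate are hypothetical names, not existing project or gix-blame types. It assigns each line to the innermost enclosing span, which is an assumption about how nested functions should be handled. Because blame at HEAD records only the last commit to touch each surviving line, commits here counts distinct last-touching commits and age_days is measured from the oldest surviving line; churn needs diff history and risk_score needs a formula, so both are left out.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a FuncSpace line span (1-based, inclusive).
pub struct FuncSpan {
    pub name: String,
    pub start_line: usize,
    pub end_line: usize,
}

/// Hypothetical stand-in for one line of blame output at HEAD; a real
/// implementation would populate this from gix-blame results.
pub struct BlameLine {
    pub line: usize,
    pub commit: String,
    pub author: String,
    pub commit_time: i64, // seconds since the Unix epoch
}

/// Blame-derived metrics for one function (churn and risk_score omitted).
pub struct FuncVcs {
    pub commits: usize,
    pub authors: usize,
    pub ownership_top_share: f64,
    pub age_days: i64,
    pub last_modified_days: i64,
}

/// Index of the innermost span containing `line`: the narrowest range wins,
/// so lines of a nested function are not also counted for its parent.
fn innermost_span(spans: &[FuncSpan], line: usize) -> Option<usize> {
    spans
        .iter()
        .enumerate()
        .filter(|(_, s)| s.start_line <= line && line <= s.end_line)
        .min_by_key(|(_, s)| s.end_line - s.start_line)
        .map(|(i, _)| i)
}

/// One entry per span, in the same order; `None` if no blamed line falls
/// inside that span. Lines outside every span stay file-level only.
pub fn aggregate(spans: &[FuncSpan], blame: &[BlameLine], now: i64) -> Vec<Option<FuncVcs>> {
    const DAY: i64 = 86_400;
    // Group blamed lines by the function that (innermost-)encloses them.
    let mut groups: Vec<Vec<&BlameLine>> = vec![Vec::new(); spans.len()];
    for b in blame {
        if let Some(i) = innermost_span(spans, b.line) {
            groups[i].push(b);
        }
    }
    groups
        .into_iter()
        .map(|lines| {
            if lines.is_empty() {
                return None;
            }
            let commits: HashSet<&str> = lines.iter().map(|b| b.commit.as_str()).collect();
            // Count surviving lines per author to derive ownership.
            let mut per_author: HashMap<&str, usize> = HashMap::new();
            for b in &lines {
                *per_author.entry(b.author.as_str()).or_default() += 1;
            }
            let top = per_author.values().copied().max().unwrap_or(0);
            let oldest = lines.iter().map(|b| b.commit_time).min().unwrap_or(now);
            let newest = lines.iter().map(|b| b.commit_time).max().unwrap_or(now);
            Some(FuncVcs {
                commits: commits.len(),
                authors: per_author.len(),
                ownership_top_share: top as f64 / lines.len() as f64,
                age_days: (now - oldest) / DAY,
                last_modified_days: (now - newest) / DAY,
            })
        })
        .collect()
}
```

Returning a vector parallel to the spans, rather than a map keyed by function name, sidesteps collisions between functions that share a name within one file.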

Hard problems to solve

  • Line drift across history. A function's line range at HEAD generally does not match its range at older commits. Blame reports per-line provenance correctly, but mapping a historical commit back to "this function" requires either (a) re-parsing the AST at each historical revision (expensive) or (b) treating the function as a line set that moves with each commit. Pick one and document it.
  • Function renames / refactors. When a function is renamed or split, its blame trail diverges. Best-effort heuristics only; document the limit.
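  • Function deletion + recreation. Same line range, different identity. Conservative approach: treat both as one function for blame purposes and flag the case in the output.
  • Performance. Blame is considerably more expensive than the file-level walk, since it runs per file and traces each line back through that file's history. Cache per-file blame results for the duration of a single invocation.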
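Caching blame per file for one invocation could look roughly like the following. It is a sketch assuming blame is requested by repository-relative path; BlameCache and get_or_compute are hypothetical names, and T stands in for whatever per-line blame data the real code produces (it is not a gix-blame type).

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical per-invocation cache of blame results, keyed by path.
pub struct BlameCache<T> {
    entries: HashMap<PathBuf, T>,
}

impl<T> BlameCache<T> {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Return the cached blame for `path`, running `compute` only on a miss.
    /// Errors are propagated and not cached, so a later request retries.
    pub fn get_or_compute<E>(
        &mut self,
        path: &Path,
        compute: impl FnOnce(&Path) -> Result<T, E>,
    ) -> Result<&T, E> {
        if !self.entries.contains_key(path) {
            let value = compute(path)?;
            self.entries.insert(path.to_path_buf(), value);
        }
        Ok(&self.entries[path])
    }
}
```

Keeping the cache scoped to one invocation avoids any on-disk invalidation logic: every run starts from the current HEAD.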

Out of scope

  • Cross-function call-graph weighting.
  • Function-level SZZ bug-inducing commit identification.

Acceptance criteria

  • vcs field appears on inner FuncSpace entries when --metrics vcs:per-function is selected (new sub-metric selector).
  • Performance regression test: per-function VCS on a 10k-line, 200-commit fixture completes in under 30 s on CI.
  • Documented limitations for renames, splits, and deletion+recreation.
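One hypothetical way to parse the new sub-metric selector is sketched below. It assumes each --metrics entry arrives as a separate string and that a bare vcs keeps the file-level behavior from #328; VcsGranularity and parse_vcs_selector are illustrative names, not existing CLI code.

```rust
/// Hypothetical granularity for the VCS metric family.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VcsGranularity {
    File,
    PerFunction,
}

/// Parse one `--metrics` entry. Non-VCS names yield `Ok(None)` and are left
/// to the existing selector logic; unknown VCS sub-selectors are errors.
pub fn parse_vcs_selector(entry: &str) -> Result<Option<VcsGranularity>, String> {
    let (name, sub) = match entry.split_once(':') {
        Some((n, s)) => (n, Some(s)),
        None => (entry, None),
    };
    if name != "vcs" {
        return Ok(None);
    }
    match sub {
        None => Ok(Some(VcsGranularity::File)),
        Some("per-function") => Ok(Some(VcsGranularity::PerFunction)),
        Some(other) => Err(format!("unknown vcs sub-metric: {other}")),
    }
}
```

Rejecting unknown sub-selectors, rather than silently falling back to file level, makes typos such as vcs:per-fn visible to the user.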

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
