Follow-up to #328.
Extend the change-history (VCS) metrics from file-level to per-function granularity by attributing commits to function line ranges using git blame + the existing AST FuncSpace line spans.
Why
Per-function attribution is the gold standard for triage: it isolates the risky function inside a risky file rather than flagging the whole file. The vulnerability/defect-prediction literature explicitly calls out function-level analysis as higher-precision than file-level when blame data is available (see FFmpeg/Wireshark attack-surface work cited in vulnerability-correlation.md).
Scope
- Use gix-blame (1-pass incremental blame, reportedly 500-1000× faster than naive Python implementations) to compute, per file, the set of commits that last touched each line.
- Map each line to its enclosing FuncSpace using the existing per-language AST walk and node spans.
- Aggregate per-function: commits, churn, authors, ownership_top_share, age_days, last_modified_days, and risk_score.
- Surface results as a nested vcs field on each FuncSpace, not just on the top-level file space.
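The mapping and aggregation steps above could look roughly like the following. This is a minimal sketch using only the standard library, not the actual implementation: `BlameLine`, `FuncSpan`, `FuncVcs`, and `aggregate` are hypothetical names, and `BlameLine` is a stand-in for whatever gix-blame yields, not its real API. It also assumes that nested functions attribute a line to the innermost enclosing span, that ownership_top_share is the top author's share of surviving lines, and that age_days and last_modified_days are measured from the oldest and newest surviving line. churn and risk_score are left out because blame at HEAD alone does not define them.

```rust
use std::collections::{HashMap, HashSet};

/// One blamed line at HEAD (hypothetical shape, not the gix-blame API).
pub struct BlameLine {
    pub line: u32,        // 1-based line number at HEAD
    pub commit: String,   // id of the commit that last touched the line
    pub author: String,
    pub commit_time: i64, // seconds since the Unix epoch
}

/// Hypothetical stand-in for the line span carried by a FuncSpace.
pub struct FuncSpan {
    pub name: String,
    pub start_line: u32, // inclusive
    pub end_line: u32,   // inclusive
}

/// Per-function metrics derived from surviving lines only.
#[derive(Debug, PartialEq)]
pub struct FuncVcs {
    pub commits: usize,
    pub authors: usize,
    pub ownership_top_share: f64,
    pub age_days: i64,
    pub last_modified_days: i64,
}

/// Index of the innermost (shortest) span that contains `line`.
fn enclosing(spans: &[FuncSpan], line: u32) -> Option<usize> {
    spans
        .iter()
        .enumerate()
        .filter(|(_, s)| s.start_line <= line && line <= s.end_line)
        .min_by_key(|(_, s)| s.end_line - s.start_line)
        .map(|(i, _)| i)
}

/// Group blamed lines by enclosing function and aggregate per function.
/// The result is keyed by the function's index in `spans`.
pub fn aggregate(spans: &[FuncSpan], blame: &[BlameLine], now: i64) -> HashMap<usize, FuncVcs> {
    let mut by_func: HashMap<usize, Vec<&BlameLine>> = HashMap::new();
    for b in blame {
        if let Some(i) = enclosing(spans, b.line) {
            by_func.entry(i).or_default().push(b);
        }
    }

    by_func
        .into_iter()
        .map(|(i, lines)| {
            let commits: HashSet<&str> = lines.iter().map(|b| b.commit.as_str()).collect();
            let mut per_author: HashMap<&str, usize> = HashMap::new();
            for b in &lines {
                *per_author.entry(b.author.as_str()).or_insert(0) += 1;
            }
            let top = per_author.values().copied().max().unwrap_or(0);
            let oldest = lines.iter().map(|b| b.commit_time).min().unwrap_or(now);
            let newest = lines.iter().map(|b| b.commit_time).max().unwrap_or(now);
            let vcs = FuncVcs {
                commits: commits.len(),
                authors: per_author.len(),
                // `lines` is never empty here: an entry only exists after a push.
                ownership_top_share: top as f64 / lines.len() as f64,
                age_days: (now - oldest) / 86_400,
                last_modified_days: (now - newest) / 86_400,
            };
            (i, vcs)
        })
        .collect()
}
```

Because this only sees lines that survive at HEAD, the commit count is a lower bound: commits whose changes were later overwritten do not show up, which is part of what the line-drift problem below has to address.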
Hard problems to solve
- Line drift across history. A function's line range at HEAD does not correspond to the same range at older commits. Blame reports per-line provenance correctly, but mapping a historical commit back to "this function" requires either (a) traversing the AST at the historical revision (expensive) or (b) treating the function as a moving line set whose range is re-mapped at each commit. Pick one and document the choice.
- Function renames / refactors. When a function is renamed or split, its blame trail diverges. Best-effort heuristics only; document the limit.
- Function deletion + recreation. Same line range, different identity. Conservative approach: treat both as one function for blame purposes and flag the case in the output.
- Performance. Blame is N× more expensive than the file-level walk. Cache per-file blame results during a single invocation.
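To make option (b) from the line-drift item concrete, here is one possible sketch of stepping a function's line range from a commit back to its parent using that commit's diff hunks. It is illustrative only: `Hunk` and `map_range_to_parent` are hypothetical names, line ranges are assumed to be 0-based and half-open, and hunks are assumed to be sorted and non-overlapping. It does not use any gix API.

```rust
use std::ops::Range;

/// One diff hunk between a parent ("old") and its child commit ("new").
/// Hypothetical type: 0-based half-open line ranges; an empty range marks
/// a pure insertion (old empty) or a pure deletion (new empty).
#[derive(Debug, Clone)]
pub struct Hunk {
    pub old: Range<u32>,
    pub new: Range<u32>,
}

/// Map a function's range in the child back to the parent.
/// Returns the range in the parent and whether the commit touched the function.
pub fn map_range_to_parent(func: &Range<u32>, hunks: &[Hunk]) -> (Range<u32>, bool) {
    // A hunk touches the function if its new side overlaps the range,
    // or if it is a deletion strictly inside the range.
    let touched = hunks
        .iter()
        .any(|h| h.new.start < func.end && h.new.end > func.start);
    let start = map_start(func.start, hunks);
    let end = map_end(func.end, hunks);
    (start..end.max(start), touched)
}

/// Map the first line of the function to the parent revision.
fn map_start(pos: u32, hunks: &[Hunk]) -> u32 {
    let mut shift: i64 = 0;
    for h in hunks {
        if h.new.end <= pos {
            // Hunk lies entirely before the function: accumulate the offset.
            shift = i64::from(h.old.end) - i64::from(h.new.end);
        } else if h.new.start <= pos {
            // The function starts inside a changed region: snap to its old start.
            return h.old.start;
        } else {
            break;
        }
    }
    (i64::from(pos) + shift) as u32
}

/// Map the exclusive end of the function to the parent revision.
fn map_end(pos: u32, hunks: &[Hunk]) -> u32 {
    let mut shift: i64 = 0;
    for h in hunks {
        if h.new.start >= pos {
            // Hunk lies entirely after the function.
            break;
        } else if h.new.end <= pos {
            // Hunk ends before or at the function's end: accumulate the offset.
            shift = i64::from(h.old.end) - i64::from(h.new.end);
        } else {
            // The function ends inside a changed region: snap to its old end.
            return h.old.end;
        }
    }
    (i64::from(pos) + shift) as u32
}
```

Applied commit by commit while walking history, the returned flag says whether the commit should be attributed to the function, and the returned range becomes the input for the next (older) step. An empty range means the function did not exist in the parent, which is one place where the rename and recreation cases would need their own handling.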
Out of scope
- Cross-function call-graph weighting.
- Function-level SZZ bug-inducing commit identification.
Acceptance criteria
- vcs field appears on inner FuncSpace entries when --metrics vcs:per-function is selected (new sub-metric selector).