feat(metrics): directory- and repo-level bus factor (Avelino DoA) #332

@dekobon

Description

Follow-up to #328.

Add directory- and repo-level bus / truck factor as a vcs aggregate, complementing the v1 file-level ownership_top_share.

Why

Bus factor (a.k.a. truck factor) is the minimum number of developers whose departure would leave the project (or a subsystem) without sufficient knowledge to continue. Avelino et al. propose a Degree-of-Authorship (DoA) heuristic that iteratively removes the developer who authors the most files until more than 50% of files are abandoned, i.e. left with no remaining author (see https://www.sciencedirect.com/science/article/pii/S0020025526002847).
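
To make the greedy removal loop concrete, here is a minimal sketch. It assumes authorship has already been resolved into a file -> authors map; the function name `greedy_bus_factor`, its input shape, and the `max_iterations` cap are hypothetical and not part of any existing API in this repo.

```rust
use std::collections::{HashMap, HashSet};

/// Greedy bus-factor estimate (sketch): repeatedly remove the developer who
/// authors the most files until more than `threshold` of all files have no
/// remaining author. Returns the number of developers removed.
/// If `max_iterations` is hit first, the result is only a lower bound.
pub fn greedy_bus_factor(
    authors_by_file: &HashMap<String, HashSet<String>>,
    threshold: f64,
    max_iterations: usize,
) -> usize {
    let total = authors_by_file.len();
    if total == 0 {
        return 0;
    }
    let mut remaining: Vec<HashSet<String>> = authors_by_file.values().cloned().collect();
    let mut removed = 0;
    while removed < max_iterations {
        // Stop once the share of author-less files exceeds the threshold.
        let orphaned = remaining.iter().filter(|a| a.is_empty()).count();
        if orphaned as f64 / total as f64 > threshold {
            break;
        }
        // Count how many files each remaining developer authors.
        let mut counts: HashMap<&str, usize> = HashMap::new();
        for authors in &remaining {
            for a in authors {
                *counts.entry(a.as_str()).or_insert(0) += 1;
            }
        }
        // Pick the top author; ties go to the lexicographically smallest
        // name so the result is deterministic.
        let top = match counts
            .into_iter()
            .max_by(|a, b| a.1.cmp(&b.1).then(b.0.cmp(a.0)))
        {
            Some((name, _)) => name.to_string(),
            None => break, // no authors left anywhere
        };
        for authors in &mut remaining {
            authors.remove(&top);
        }
        removed += 1;
    }
    removed
}
```

With four files where alice authors three and bob authors two (one shared), removing alice orphans exactly half the files, which does not exceed 0.5, so bob must also be removed and the sketch returns 2.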

The v1 file-level ownership_top_share captures concentration per file. Bus factor captures concentration across a set of files, which is the appropriate unit for component- or repo-level risk planning.

Scope

  • New vcs::BusFactor computation:
    • Per-directory bus factor (each top-level directory and each src/ subdirectory)
    • Repo-level bus factor
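  • Avelino DoA formula: DoA(d, f) = N₀ + N₁ * FirstAuthorship + N₂ * Deliveries + N₃ * ln(1 + AcceptedChanges), where AcceptedChanges counts changes to f made by developers other than d (so N₃ is negative), with coefficients and authorship thresholds from the paper.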
  • Surface in a new top-level vcs_aggregate object alongside the per-file vcs data.
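
As a sketch of the DoA step, the snippet below uses the linear form as I understand it from Avelino et al.: an intercept, a first-authorship bonus, a linear term in the developer's own deliveries, and a negative logarithmic term in changes made by other developers. The default coefficients (3.293, 1.098, 0.164, -0.321) and the authorship thresholds (normalized DoA > 0.75, absolute DoA >= 3.293) are the values commonly cited from that work and should be verified against the paper. The type and function names are hypothetical.

```rust
/// Coefficients of the linear DoA model (defaults are an assumption to be
/// checked against the paper).
#[derive(Debug, Clone, Copy)]
pub struct DoaCoefficients {
    pub intercept: f64,
    pub first_authorship: f64,
    pub deliveries: f64,
    pub accepted_changes: f64,
}

impl Default for DoaCoefficients {
    fn default() -> Self {
        Self {
            intercept: 3.293,
            first_authorship: 1.098,
            deliveries: 0.164,
            accepted_changes: -0.321,
        }
    }
}

/// DoA of one developer for one file.
/// `deliveries`: changes to the file made by this developer.
/// `accepted_changes`: changes to the file made by everyone else.
pub fn degree_of_authorship(
    c: &DoaCoefficients,
    is_first_author: bool,
    deliveries: u32,
    accepted_changes: u32,
) -> f64 {
    let fa = if is_first_author { 1.0 } else { 0.0 };
    c.intercept
        + c.first_authorship * fa
        + c.deliveries * f64::from(deliveries)
        + c.accepted_changes * (1.0 + f64::from(accepted_changes)).ln()
}

/// Developers counted as authors of a file: DoA normalized by the file's
/// maximum DoA must exceed `min_normalized`, and the absolute DoA must be
/// at least `min_absolute`.
pub fn file_authors(
    doa_by_dev: &[(String, f64)],
    min_normalized: f64,
    min_absolute: f64,
) -> Vec<String> {
    let max = doa_by_dev
        .iter()
        .map(|(_, d)| *d)
        .fold(f64::NEG_INFINITY, f64::max);
    if max <= 0.0 {
        return Vec::new(); // empty input or no positive DoA
    }
    doa_by_dev
        .iter()
        .filter(|(_, d)| *d / max > min_normalized && *d >= min_absolute)
        .map(|(name, _)| name.clone())
        .collect()
}
```

The per-file author sets produced by `file_authors` are the natural input to the greedy removal loop; keeping the coefficients in a struct leaves room to tune them without touching the formula.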

Edge cases

  • Single-author files skew the bus factor downward; document this behavior.
  • Bots (already filtered at file level) must also be filtered here.
  • Very large directories: cap iteration count.

Acceptance criteria

  • vcs_aggregate.bus_factor.{repo, by_directory} emitted alongside per-file vcs data.
  • Configurable threshold (default 0.5, per Avelino).
  • Unit tests on synthetic repos with known authorship distributions.
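
One detail the by_directory output depends on is how a file path maps to directory keys. A minimal sketch, assuming repo-relative paths with `/` separators and the bucketing rule from Scope (each top-level directory, plus each immediate subdirectory of src/). The function name `directory_buckets` is hypothetical, and root-level files are assumed to count toward the repo-level figure only.

```rust
/// Directory buckets a repo-relative file path contributes to (sketch).
/// - Files directly at the repo root belong to no directory bucket.
/// - Every other file belongs to its top-level directory.
/// - Files under `src/<sub>/...` additionally belong to `src/<sub>`.
pub fn directory_buckets(path: &str) -> Vec<String> {
    let parts: Vec<&str> = path.split('/').filter(|p| !p.is_empty()).collect();
    let mut buckets = Vec::new();
    if parts.len() >= 2 {
        buckets.push(parts[0].to_string());
    }
    if parts.len() >= 3 && parts[0] == "src" {
        buckets.push(format!("src/{}", parts[1]));
    }
    buckets
}
```

Under this rule, `src/vcs/bus_factor.rs` lands in both `src` and `src/vcs`, while `docs/guide.md` lands only in `docs`. This makes it easy to build synthetic repos whose expected per-directory values are known in advance.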

Metadata

Labels: enhancement (New feature or request)