
[dev-cache] Parallel scanning for large directory trees #64

Description

@markcallen

Overview

Speed up scans on large directory trees by processing multiple directories concurrently using a worker pool.

Implementation Plan

  1. Worker pool

    • Use a goroutine pool (e.g. runtime.NumCPU() or configurable --workers)
    • Walk remains single-threaded; dispatch inspectPath and pattern matching to workers
    • Synchronize with channels or errgroup (golang.org/x/sync/errgroup)
  2. Thread-safe aggregation

    • Collect findings in a slice protected by a mutex, or aggregate them through a channel
    • Preserve deterministic output order (sort by path) for stable table/JSON
  3. Batching

    • Batch directories to inspect (e.g. when a cache pattern match is found, send the matched directory to a worker)
    • Alternative: the walk emits paths, workers run inspectPath, and the main goroutine collects the results
  4. Flag

    • --workers N (default: 0 = sequential, for backward compatibility; the alternative is a NumCPU default)
    • Document: parallel mode may increase memory use on very large scans
  5. Testing

    • Verify results match sequential run (same findings, same sizes)
    • Benchmark on large tree (e.g. 1000+ projects)
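
For item 4, a sketch of the flag using Go's standard flag package, which accepts both -workers and --workers. It assumes the first reading of the default (0 = sequential); dev-cache may use a different CLI library, and resolveWorkers is a hypothetical helper. Switching to a NumCPU default would only change the n == 0 branch.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"runtime"
)

// resolveWorkers maps the --workers value to a goroutine count under the
// "0 = sequential" reading of the plan. Negative values are rejected.
func resolveWorkers(n int) (int, error) {
	switch {
	case n < 0:
		return 0, fmt.Errorf("--workers must be >= 0, got %d", n)
	case n == 0:
		return 1, nil // sequential: a single worker keeps the current behavior
	default:
		return n, nil
	}
}

func main() {
	flags := flag.NewFlagSet("dev-cache", flag.ExitOnError)
	workers := flags.Int("workers", 0, fmt.Sprintf(
		"directories to inspect concurrently (0 = sequential; this machine has %d CPUs). "+
			"Parallel mode may increase memory use on very large scans.", runtime.NumCPU()))
	flags.Parse(os.Args[1:]) // ExitOnError: exits on a parse error
	n, err := resolveWorkers(*workers)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("workers:", n)
}
```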

Acceptance Criteria

  • --workers 4 (or default) parallelizes scanning
  • Output identical to a sequential run (same findings, sizes, and order)
  • No data races (run under -race)

Metadata

Labels: enhancement (New feature or request)