perf: replay cached dry-run diffs for unchanged files, 9-150x faster warm dry-runs #8033

Open
SanderMuller wants to merge 3 commits into rectorphp:main from SanderMuller:perf/dry-run-diff-cache

SanderMuller commented Jun 11, 2026

Contributor

Built on #8028 (its commit is first; review the last two here). Opening now so the numbers discussion has code attached — happy to rebase once #8028 lands.

Problem

Files with a pending diff are never marked clean in dry-run mode (correctly, since the diff must keep being reported), so every warm dry-run reprocesses them from scratch: parse, full PHPStan scope resolution, every rule. On real projects most warm time is exactly this. laravel/framework src/Illuminate with the prepared sets has 1,526 pending diffs: a warm dry-run costs nearly as much as a cold one (220s vs 240s single process).

Change

Cache the produced FileDiff keyed on the file's own content hash, the parameter hash and one content hash per captured dependency (the capture from #8028). When everything still matches on the next run, replay the cached diff instead of reprocessing the file — skipping scope resolution entirely. Gist:

// ApplicationFileProcessor::processFile()
if ($useDiffCache) {
    $cachedFileProcessResult = $this->dryRunDiffCache->load($file, $configuration);
    if ($cachedFileProcessResult instanceof FileProcessResult) {
        return $cachedFileProcessResult;
    }
}
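
To make the key check described above concrete, here is a minimal sketch of how such an entry could be validated before replay. `DiffCacheEntry`, `isEntryFresh()` and the `$hashFile` callable are hypothetical names for illustration only; the PR's actual `dryRunDiffCache` implementation is not shown here and may be structured differently.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a stored cache entry; not the PR's actual class.
final class DiffCacheEntry
{
    /**
     * @param array<string, string> $dependencyHashes dependency path => content hash
     */
    public function __construct(
        public readonly string $fileHash,
        public readonly string $parameterHash,
        public readonly array $dependencyHashes,
        public readonly string $diff,
        public readonly bool $hasChanged,
    ) {
    }
}

/**
 * Returns true only when the parameters, the file itself and every captured
 * dependency still hash to the values stored in the entry.
 *
 * @param callable(string): ?string $hashFile returns null when the file is missing
 */
function isEntryFresh(
    DiffCacheEntry $entry,
    string $filePath,
    string $parameterHash,
    callable $hashFile
): bool {
    // A config change invalidates every entry at once.
    if ($entry->parameterHash !== $parameterHash) {
        return false;
    }

    // The file's own content must be unchanged.
    if ($hashFile($filePath) !== $entry->fileHash) {
        return false;
    }

    // One stale or missing dependency is enough to force a fresh run.
    foreach ($entry->dependencyHashes as $dependencyPath => $expectedHash) {
        if ($hashFile($dependencyPath) !== $expectedHash) {
            return false;
        }
    }

    return true;
}
```

Only when this kind of check passes would the stored diff and the stored `hasChanged` flag be returned as is; any mismatch falls through to normal processing.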

Dry-run only: write mode always computes fresh. Selective runs (--only, --only-suffix) bypass the cache entirely. --no-diffs results never cross into normal entries. The original hasChanged flag is replayed, since a rule can report line changes while printing identical content. A failed dependency capture means the file is never cached. The parameter hash is memoized per process, as computing it serializes the whole parameter bag and the cache key needs it per file.
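
The per-process memoization of the parameter hash can be sketched roughly as follows. `MemoizedParameterHash` is a hypothetical class written for this illustration, and the sketch assumes the parameter bag does not change after the first hash is taken; the real provider may handle that differently.

```php
<?php

declare(strict_types=1);

// Hypothetical illustration; not the project's actual parameter provider.
final class MemoizedParameterHash
{
    private ?string $hash = null;

    private int $computations = 0;

    /**
     * @param array<string, mixed> $parameters assumed immutable for the process lifetime
     */
    public function __construct(
        private readonly array $parameters
    ) {
    }

    public function hash(): string
    {
        // Serialize the whole bag only on the first call; later calls reuse the result.
        if ($this->hash === null) {
            ++$this->computations;
            $this->hash = sha1(serialize($this->parameters));
        }

        return $this->hash;
    }

    // Exposed only so the sketch can show how often serialization ran.
    public function computations(): int
    {
        return $this->computations;
    }
}
```

With this shape, building the cache key for thousands of files in one process serializes the bag once rather than once per file.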

Numbers

| corpus | pending diffs | warm dry-run on main | with replay | speedup |
| --- | --- | --- | --- | --- |
| laravel/framework src/Illuminate, single process | 1,526 | 220-236s | 1.5s | ~150x |
| laravel/framework, parallel (14 cores) | 1,526 | 31s | 2.0s | ~15x |
| laravel-queue-insights (public, 85 files) | 38 | 4.6s | 0.34s | ~13x |
| hihaho/rector-rules (public, 25 files) | 2 | 1.1s | 0.28s | ~4x |
| private 138-file corpus | 62 | 2.7s | 0.30s | ~9x |

Output is byte-identical to a fresh run in every cache state, verified per measurement; numbers were re-measured on the current minimal #8028 base. Cold cost is the #8028 capture (~7-8% in interleaved A/B runs, in its minimal form); the replay itself adds nothing measurable on top. The warm gain scales with how many pending diffs the project has; a fully clean project sees no change.

Verification

Invalidation is covered end-to-end in tests: own-content change, dependency change (fresh-process simulation), --no-diffs cross-replay, the hasChanged flag round-trip. Replay works in parallel mode (workers save, workers replay).

return false;
}

$this->internalFunctionNames ??= array_flip(get_defined_functions()['internal']);

Member

Again, please don't do this.

Correctness is more important than speed. Doing this may cause inconsistent results for developers, because the runtime check depends on the extensions available in their local environment.

Use PHPStan reflection instead; that is the way to make it reliable.

Contributor Author

Applied — NativeFunctionCallAnalyzer is removed from the stack entirely (here and in #8028). Every node now goes through PHPStan's DependencyResolver, so native functions are resolved by PHPStan reflection like everything else. The skip-layer trade-off (~4% vs ~7-8% cold) can be a separate discussion if it ever comes back as its own PR.

Comment on lines +753 to +755
$resolvedName = $node->name->getAttribute('resolvedName');
$nameForMemoKey = $resolvedName instanceof Name ? $resolvedName : $node->name;
$functionMemoKey = $mutatingScope->getNamespace() . '|' . strtolower($nameForMemoKey->toCodeString());

Member

Use $this->nodeNameResolver->getName($node) instead.

Contributor Author

Applied — the memo key now uses $this->nodeNameResolver->getName($node) (lowercased, as function names are case-insensitive). Measured cost: none (interleaved cold A/B within noise), and verified the namespace-fallback case end to end: two namespaces calling the same unqualified helper() with different resolutions record distinct dependency edges, and editing one helper re-reports only its caller, byte-identical to a fresh run.
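
For readers following the memo-key discussion, a minimal sketch of per-name memoization with a case-insensitive key. `FunctionDependencyMemo` and its `$resolve` callable are hypothetical; the sketch keeps the namespace prefix from the snippet under review as an assumption, since the thread does not state whether the final key still includes it.

```php
<?php

declare(strict_types=1);

// Hypothetical illustration of memoizing dependency files per resolved function name.
final class FunctionDependencyMemo
{
    /** @var array<string, list<string>> memo key => dependency file paths */
    private array $dependencyFilesByKey = [];

    /**
     * @param callable(): list<string> $resolve runs the expensive dependency lookup
     * @return list<string>
     */
    public function get(?string $namespace, string $functionName, callable $resolve): array
    {
        // Function names are case-insensitive in PHP, so the name is lowercased.
        // Keeping the namespace in the key separates unqualified calls that
        // fall back to different functions in different namespaces.
        $key = ($namespace ?? '') . '|' . strtolower($functionName);

        // The resolver only runs when the key has not been seen before.
        return $this->dependencyFilesByKey[$key] ??= $resolve();
    }
}
```

Two call sites in the same namespace that differ only in casing share one lookup, while the same unqualified name in two namespaces gets two separate entries.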

SanderMuller and others added 3 commits June 11, 2026 11:18
The cache only checked each file's own content, so a clean file stayed
skipped on warm runs even when one of its dependencies changed, e.g. a
parent class method gaining a return type that lets a child file infer
its own. A fresh run reports the new change, a warm run misses it.

PHPStanNodeScopeResolver now records each file's dependencies during
scope resolution using PHPStan's own DependencyResolver, the same engine
behind PHPStan's result cache. Cache entries store the file's own hash
plus one hash per dependency, all re-validated on load; legacy string
entries self-upgrade on the next write. A failed capture skips caching
entirely rather than caching a partial set.

Function calls memoize their dependency files per resolved name, as
signature dependencies are identical at every call site.

Selective runs (--only, --only-suffix) bypass the cache write, same
guard as rectorphp#8029.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Files with a pending diff are never marked clean in dry-run mode, since
the diff must keep being reported, so every warm dry-run reprocessed them
from scratch. On a 4,400-file project with 37 pending diffs that was
~11s per run.

Cache the FileDiff with the file's own hash plus one hash per captured
dependency; when all still match, replay the cached diff instead of
reprocessing, skipping scope resolution entirely. Dry-run only: write
mode always computes fresh. --no-diffs results never cross into normal
entries, and the original hasChanged flag is replayed, as a rule can
report line changes while printing identical content.

Warm dry-run on the same project: ~9x faster single process, ~3.5x
parallel. Output stays byte-identical in every cache state.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

SimpleParameterProvider::hash() serializes the whole parameter bag and
contentHash() runs per file, so a warm run paid the serialization once
per file (~46ms per 3,200 calls with a 300-entry skip list).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@SanderMuller SanderMuller force-pushed the perf/dry-run-diff-cache branch from 7a94fea to bd25a20 Compare June 11, 2026 09:18
@SanderMuller SanderMuller changed the title from "perf: replay cached dry-run diffs for unchanged files, 9-170x faster warm dry-runs" to "perf: replay cached dry-run diffs for unchanged files, 9-150x faster warm dry-runs" Jun 11, 2026