Skip to content

Fix unchanged-files cache missing changes induced by changed dependencies#8028

Closed
SanderMuller wants to merge 1 commit into
rectorphp:mainfrom
SanderMuller:perf/autoresearch
Closed

Fix unchanged-files cache missing changes induced by changed dependencies#8028
SanderMuller wants to merge 1 commit into
rectorphp:mainfrom
SanderMuller:perf/autoresearch

Conversation

@SanderMuller

@SanderMuller SanderMuller commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Problem

The unchanged-files cache only checks each file's own content. A clean file stays skipped on warm runs even when one of its dependencies changed.

Repro on main: a parent class method gains : int, which lets a child file infer its own return type. A fresh run reports the new change in the child file, a warm run reports nothing until --clear-cache.

// Dad.php — the only file that changes on disk
 class Dad
 {
-    public function getValue()
+    public function getValue(): int
     {
         return 1;
     }
 }

// Kid.php — unchanged on disk, but typeDeclarations can now add `: int` to value()
class Kid extends Dad
{
    public function value()
    {
        return $this->getValue();
    }
}

Reproducible with vanilla rector in ~30s: https://github.com/SanderMuller/rector-stale-cache-poc

Fix

PHPStanNodeScopeResolver records which files each file depends on during scope resolution, using PHPStan's own DependencyResolver — the same engine behind PHPStan's result cache. No own logic about which nodes matter: every node goes through PHPStan's resolver and its branch chain decides.

// PHPStanNodeScopeResolver, inside the node callback:
$nodeDependencies = $this->dependencyResolver->resolveDependencies($node, $mutatingScope);
foreach ($nodeDependencies->getFileDependencies() as $dependencyFile) {
    $this->fileDependencyCollector->record($filePath, $dependencyFile);
}

Cache entries store the file's own hash plus one hash per dependency, all re-validated on load:

// before
'abc123'
// after
['hash' => 'abc123', 'deps' => ['/src/Dad.php' => 'def456']]

Old string entries still load and upgrade themselves on the next write. If capture fails for a file, the file is not cached at all rather than cached with a partial set. Function calls memoize their dependency files per resolved name. Selective runs (--only, --only-suffix) skip the cache write, same guard as #8029.

Cost and scope

~7-8% on a cold run (interleaved A/B, 138-file corpus); warm runs unchanged within noise. Output byte-identical to a fresh run.

Per review feedback this is now the minimal single-logic version: 306 lines, 6 files, one commit, no node-type guards, no get_defined_functions(), no hardcoded node names — native functions are handled by PHPStan's reflection like everything else. The per-node guards that cut the cold cost to ~4% are split off and can come later as their own perf PR, where that trade-off can be discussed on its own.

Follow-ups: #8029 (open), #8033 (diff replay, the 9-170x warm win, stacked on this).

Comment thread src/NodeAnalyzer/NativeFunctionCallAnalyzer.php Outdated
return false;
}

$this->internalFunctionNames ??= array_flip(get_defined_functions()['internal']);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samsonasik Do you mean this line then? Would static reflection be faster?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this AI generated get_defined_functions() caching usage on StructArmed, it slow, and use of namespaced_name is more than enough there.

If function check exists needed, there is already phpstan reflection function NativeFunctionReflection for that, so way more reliable per static analysis process instead of runtime php.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table is built once per process and memoized, not per call - measured in rector's container: ~0.05ms to build, 10k checks ≈ 1.4ms. Calling get_defined_functions() per node would indeed be slow; this does not.

The namespaced-name comparison answers a different question: "does this resolve to the global namespace", while this guard needs "can this call have a file dependency". Userland functions in the global namespace - app()/config() in Laravel, any composer files-autoloaded helpers - pass that comparison but live in files the cache must track; skipping capture for them would reintroduce the stale-cache bug this PR fixes. The matrix test pins that global userland functions are not treated as native.

NativeFunctionReflection is only known after resolving through the ReflectionProvider (first getFunction() + variants ≈ 24ms, signature map) - the work this guard sits in front of. And the runtime list errs only in the safe direction: a name not in it simply is not skipped, capture proceeds and PHPStan stays authoritative.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That means the laravelapp() is not properly loaded by phpstan, to make it properly loaded, you may probably needs to load via withBootstrapFiles(), eg:

<?php

use Rector\Config\RectorConfig;

return RectorConfig::configure()
    ->withBootstrapFiles([__DIR__ . '/bootstrap/app.php'];

The reason to use exactly PHPStan reflection as possible is to avoid runtime which inverse with what static analysis purpose.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class this thread is on is gone from the PR now — every node goes through PHPStan's DependencyResolver, so function resolution is pure PHPStan reflection.

Your bigger point holds in an important way: the quality of dependency tracking follows the quality of function resolution. I verified the app()-style case with the current branch — a files-autoloaded helper outside the analyzed paths, like vendor helpers: with the project autoloader loaded, PHPStan reflects it with file info, the cache entry records helpers.php as a dependency, and editing only the helper makes the warm run re-report the caller, byte-identical to a fresh run. So composer projects work without withBootstrapFiles(); for functions composer does not load, withBootstrapFiles() is exactly the way to make tracking see them.

When a function is not resolvable at all, PHPStan records no dependency — but it cannot infer anything through it either, so the cached result still matches a fresh run (also verified). Same contract as PHPStan's result cache: edges mirror exactly what PHPStan can see.

Chasing this also surfaced a pre-existing gap: a changed --autoload-file never invalidates the cache (unlike withBootstrapFiles(), which the config hash covers). Fixed separately in #8034.

@SanderMuller SanderMuller changed the title perf: dependency-aware caching, fixes stale cache results and speeds up warm dry-runs 3.5-9x Fix unchanged-files cache missing changes induced by changed dependencies Jun 10, 2026
@SanderMuller

Copy link
Copy Markdown
Contributor Author

Slimmed this down as discussed: it now contains only the dependency-aware unchanged-files cache fix — 626 lines, half of which is tests. The --only cache bug is split out as #8029 (standalone, ~30 lines of src + an e2e). The two perf pieces (dry-run diff replay, long-lived dependency-set store) are ready as follow-up PRs and will open one at a time after this merges.

@samsonasik your NativeFunctionCallAnalyzer comment still applies here — answered in the thread above.

@SanderMuller

Copy link
Copy Markdown
Contributor Author

Numbers on a real monorepo, as requested — laravel/framework src/Illuminate (1,628 files), deadCode + codeQuality + typeDeclarations + earlyReturn + codingStyle + import names. That config leaves 1,526 pending diffs, which is exactly the case the current cache can't help with: files with a pending diff are never cached, so a warm dry-run reprocesses all of them.

cold warm dry-run
main, single process 240s 220–236s
this PR, single process 210s 205s
this PR + diff replay (follow-up), single process 209s 1.3s
main, parallel (14 cores) 33s 31s
+ diff replay (follow-up), parallel 35s 2.0s

Output is byte-identical across all cells, verified per run. Warm ≈ cold on main because the 1,526 dirty files redo full scope resolution every run; the follow-up replays their cached diffs once dependencies are validated. The warm gain scales with pending-diff count: a 62-diff project measures ~9x, an 85-file public repo with 38 diffs ~12x (laravel-queue-insights), this corpus 170x single-process / 15x parallel.

This PR alone is the correctness half. Its cold cost is +4.5% on a precisely interleaved A/B (138-file corpus, 5 rounds), decomposed: the node-type guards and dependency hashing are free, the cost is PHPStan's resolveDependencies() itself on scope-dependent nodes. A class-name memo experiment measured zero gain, so this is the floor for this design; the dependency-set store follow-up cuts repeat colds to ~2-3%.

Proof of concept repo: https://github.com/SanderMuller/rector-stale-cache-poc — reproduces the stale-cache bug in ~30s with vanilla rector 2.4.5 (composer install && ./reproduce.sh); the trigger is just applying rector's own suggestion to a parent class. The gist of the fix is in the README and the PR description above.

One caveat for anyone rerunning this on a rector-src dev checkout: --autoload-file pointing at a project's vendor/autoload.php prepends that project's autoloader, which can shadow rector's own runtime dependencies (symfony/console, illuminate/container) with incompatible versions and hang the run — the released scoped build is immune. I'll file that separately; the benchmark used a classes-only autoload shim to keep all sides clean.

@SanderMuller

Copy link
Copy Markdown
Contributor Author

Pushed f2009eb: codex review caught a gap — PHPStan's DependencyResolver does resolve dynamic callees ($callable()) via the scope type, so skipping non-Name FuncCalls lost those dependency edges. The skip is removed and a second tripwire now pins that PHPStan behavior.

@SanderMuller

Copy link
Copy Markdown
Contributor Author

Rewrote this per the feedback: single logic change, one commit, 306 lines. The node-type guards and NativeFunctionCallAnalyzer are gone — every node now goes through PHPStan's DependencyResolver and its own branch chain decides, so native functions are handled by PHPStan reflection (@samsonasik's point) and there are no hardcoded node-name lists (@TomasVotruba's point). Cost moves from ~4% to ~7-8% cold; the guard layer that wins that back is split off for a possible later perf PR where the trade-off can be reviewed in isolation. #8033 is rebased accordingly.

The cache only checked each file's own content, so a clean file stayed
skipped on warm runs even when one of its dependencies changed, e.g. a
parent class method gaining a return type that lets a child file infer
its own. A fresh run reports the new change, a warm run misses it.

PHPStanNodeScopeResolver now records each file's dependencies during
scope resolution using PHPStan's own DependencyResolver, the same engine
behind PHPStan's result cache. Cache entries store the file's own hash
plus one hash per dependency, all re-validated on load; legacy string
entries self-upgrade on the next write. A failed capture skips caching
entirely rather than caching a partial set.

Function calls memoize their dependency files per resolved name, as
signature dependencies are identical at every call site.

Selective runs (--only, --only-suffix) bypass the cache write, same
guard as rectorphp#8029.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@samsonasik

Copy link
Copy Markdown
Member

I am closing this in favor of extracted PRs so we can focus on single purpose on single PR:

@samsonasik samsonasik closed this Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants