Store file cache entries with serialize() instead of var_export/include #5845

Closed
SanderMuller wants to merge 1 commit into phpstan:2.2.x from SanderMuller:perf/serialize-file-cache
@SanderMuller

Contributor

What & why

FileCacheStorage stores entries as var_export'd PHP files loaded via include. On CLI,
where opcache is typically off, every cache hit pays full PHP parse plus AST-to-value
construction. A cold run loads thousands of per-file reflection and PHPDoc entries this
way. Storing the CacheItem with serialize() and reading it back with unserialize()
is measurably cheaper: about 2% less total CPU on a cold src/Type self-analysis run
(up to 4.7% in an ablation on a larger change set), and the entries shrink on disk.
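
The two storage formats compared above can be sketched side by side. This is a minimal illustration, not PHPStan's code: `DemoCacheItem` is a hypothetical stand-in for `CacheItem`, and the file names and temp directory are assumptions made for the example.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for PHPStan's CacheItem; the real class has its own fields.
final class DemoCacheItem
{
    public function __construct(
        public readonly string $variableKey,
        public readonly mixed $data,
    ) {
    }

    // var_export() emits a call to this method when it exports an object.
    public static function __set_state(array $properties): self
    {
        return new self($properties['variableKey'], $properties['data']);
    }
}

$item = new DemoCacheItem('v1', ['classes' => ['Foo', 'Bar'], 'mtime' => 1700000000]);
$dir = sys_get_temp_dir() . '/cache-demo-' . bin2hex(random_bytes(4));
mkdir($dir);

// Old path: write PHP source, then load it by parsing and executing that source.
file_put_contents($dir . '/entry.php', '<?php return ' . var_export($item, true) . ';');
$fromInclude = include $dir . '/entry.php';

// New path: write a serialized string, then load it with unserialize().
file_put_contents($dir . '/entry.dat', serialize($item));
$fromSerialized = unserialize(
    file_get_contents($dir . '/entry.dat'),
    ['allowed_classes' => [DemoCacheItem::class]],
);
```

Both paths hand back an equivalent object; the difference lies only in how much work PHP does to get there.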

Migration details, since a format switch deserves scrutiny:

  • Serialized entries are written as .dat files. An older PHPStan version sharing the
    same tmpDir looks for .php files, finds none, and treats it as a plain cache miss.
    Writing the new format into .php files would be worse than a miss: include of a
    file without a <?php tag echoes its raw bytes to stdout.
  • Entries written by older versions fail the CacheItem instance check on load and
    count as a one-time miss; the cache rebuilds from there.
  • clearUnusedFiles() now keeps current-format files (the predicate is built from
    CacheItem::class) and CACHED_CLEARED_VERSION is bumped, so leftover var_export
    entries from before the switch get purged once. Previously the keep-predicate matched
    only the old format, which meant a missing or stale marker file would have deleted
    every current entry as legacy garbage.
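
The load-side behaviour in the list above can be sketched as follows. `DemoCacheItem` and `loadEntry()` are hypothetical names for this example, not PHPStan's API; the sketch only assumes that a load either yields an item or counts as a miss, and it also shows why serialized data must not be written into a `.php` file.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for PHPStan's CacheItem.
final class DemoCacheItem
{
    public function __construct(public readonly mixed $data)
    {
    }
}

// Hypothetical loader: returns the item, or null for any kind of cache miss.
function loadEntry(string $path): ?DemoCacheItem
{
    if (!is_file($path)) {
        return null; // no file: plain miss
    }
    $contents = file_get_contents($path);
    if ($contents === false) {
        return null;
    }
    // @ silences the diagnostic unserialize() raises on malformed input;
    // allowed_classes keeps any other class from being instantiated.
    $value = @unserialize($contents, ['allowed_classes' => [DemoCacheItem::class]]);

    // Anything that is not a DemoCacheItem counts as a miss.
    return $value instanceof DemoCacheItem ? $value : null;
}

$dir = sys_get_temp_dir() . '/cache-demo-' . bin2hex(random_bytes(4));
mkdir($dir);

file_put_contents($dir . '/current.dat', serialize(new DemoCacheItem(['a' => 1])));
file_put_contents($dir . '/foreign.dat', "<?php return array('a' => 1);");

$hit = loadEntry($dir . '/current.dat');
$foreignMiss = loadEntry($dir . '/foreign.dat');
$absentMiss = loadEntry($dir . '/absent.dat');

// include treats bytes outside PHP open tags as literal output, so a
// serialized payload stored in a .php file would be echoed, not loaded.
file_put_contents($dir . '/wrong.php', serialize(['a' => 1]));
ob_start();
include $dir . '/wrong.php';
$echoed = ob_get_clean();
```

Here `$echoed` holds the raw serialized bytes that an older reader would otherwise have printed to stdout.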

FileCacheStorageTest covers the round-trip, that cleanup keeps current-format entries
when the marker file is missing, and that legacy .php entries get removed.
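
A keep-predicate of the kind described above might look like the sketch below. The PR text only says the predicate is built from `CacheItem::class`; matching on the serialized-object prefix is an assumption made here for illustration, and `DemoCacheItem`, `currentFormatPrefix()` and `purgeLegacyEntries()` are hypothetical names.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for PHPStan's CacheItem.
final class DemoCacheItem
{
    public function __construct(public readonly mixed $data)
    {
    }
}

// serialize() output for a plain object starts with O:<name length>:"<class name>".
function currentFormatPrefix(string $class): string
{
    return sprintf('O:%d:"%s"', strlen($class), $class);
}

// Hypothetical cleanup: delete every file that is not a current-format entry.
// Returns the sorted base names of the removed files.
function purgeLegacyEntries(string $dir, string $class): array
{
    $prefix = currentFormatPrefix($class);
    $removed = [];
    foreach (scandir($dir) as $name) {
        $path = $dir . '/' . $name;
        if (!is_file($path)) {
            continue; // skips "." and ".." as well as subdirectories
        }
        // Read only as many bytes as the prefix needs.
        $head = (string) file_get_contents($path, false, null, 0, strlen($prefix));
        if ($head === $prefix) {
            continue; // current format: keep
        }
        unlink($path);
        $removed[] = $name;
    }
    sort($removed);

    return $removed;
}

$dir = sys_get_temp_dir() . '/cache-demo-' . bin2hex(random_bytes(4));
mkdir($dir);
file_put_contents($dir . '/current.dat', serialize(new DemoCacheItem(['a' => 1])));
file_put_contents($dir . '/legacy.php', "<?php return array('a' => 1);");

// No marker file is consulted, which mirrors the "marker missing" case.
$removed = purgeLegacyEntries($dir, DemoCacheItem::class);
```

Because the predicate is derived from the class name, a cleanup pass without any marker file still leaves current-format entries alone.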

Tests

make tests (12,714, green), make phpstan, make cs, make lint and
make composer-dependency-analyser all pass. Analysis output is byte-identical to the
baseline.

🤖 Generated with Claude Code

Loading a cache entry no longer pays PHP parse + AST-to-value cost for
every hit (opcache is typically off on CLI); unserialize() of the same
data is measurably cheaper across the thousands of per-file reflection
and PHPDoc cache entries a cold run touches (-2% cold CPU, -4.7% in the
ablation). Entries written in the old format fail to unserialize and
count as a one-time cache miss.

clearUnusedFiles() now recognises the serialized format - its keep
predicate is built from CacheItem::class - and CACHED_CLEARED_VERSION
is bumped so legacy var_export entries are purged once after upgrade.
Without that, a missing or stale cache-cleared marker would have made
cleanup delete every current-format entry as legacy garbage.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@ondrejmirtes

Member

I don't think it will be faster. Also with opcache enabled, reading the files is essentially free.

@ondrejmirtes

Member

One thing that passing the size in from the outside will allow us to do later is to compute the true LoC number to be analysed.

You can have a small file that uses 5 huge traits; it would be nice if it were striped among the largest files to be analysed. But this PR doesn't have to do it now.

@SanderMuller

Contributor Author

Thanks for the feedback, I will look into it more carefully and get back to you!

SanderMuller marked this pull request as draft on June 11, 2026, 08:25
@SanderMuller

Contributor Author

Measured it on the full make phpstan (two worktrees, primed file caches, hyperfine --warmup 1 --runs 6, M4 Pro / PHP 8.5.7):

| Configuration | opcache off | opcache.enable_cli=1 |
| --- | --- | --- |
| base (var_export + include) | 18.95 ± 0.04 s | 18.40 ± 0.16 s |
| this PR (serialize) | 18.93 ± 0.12 s | 18.07 ± 0.03 s |

So you're right — wall time is a wash. User CPU comes out 2–4% lower with serialize (one-shot worker processes still pay the compile on every include; opcache SHM doesn't outlive them), but that's not enough to justify changing the cache format. Closing — thanks for pushing me to measure the real workload.
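
For anyone who wants to look at the per-entry load cost in isolation, a micro-benchmark could be sketched as below. It is only an illustration: the payload shape, entry count and run count are made up for the example, and a result from it says nothing about the end-to-end run, which is what the table above measures.

```php
<?php

declare(strict_types=1);

// Made-up payload, loosely shaped like per-class metadata.
$data = [];
for ($i = 0; $i < 2000; $i++) {
    $data['Class' . $i] = ['file' => "src/Class{$i}.php", 'methods' => ['a', 'b', 'c'], 'line' => $i];
}

$base = sys_get_temp_dir() . '/cache-bench-' . bin2hex(random_bytes(4));
file_put_contents($base . '.php', '<?php return ' . var_export($data, true) . ';');
file_put_contents($base . '.dat', serialize($data));

// Runs $load $runs times; returns [average milliseconds per load, last loaded value].
function timeIt(callable $load, int $runs): array
{
    $value = null;
    $start = hrtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $value = $load();
    }

    return [(hrtime(true) - $start) / 1e6 / $runs, $value];
}

// Without opcache, every include re-parses and re-compiles the file.
[$includeMs, $fromInclude] = timeIt(static fn () => include $base . '.php', 50);
[$unserializeMs, $fromDat] = timeIt(
    static fn () => unserialize(file_get_contents($base . '.dat'), ['allowed_classes' => false]),
    50,
);

printf("include: %.3f ms, unserialize: %.3f ms per load\n", $includeMs, $unserializeMs);

unlink($base . '.php');
unlink($base . '.dat');
```

Run it once with `php -d opcache.enable_cli=0` and once with `php -d opcache.enable_cli=1` to see both columns of the comparison at the single-entry level.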

The LoC-weighted striping idea is a good one; that gets easy once the size callback from #5844 is in.
