Skip to content

Stripe size-sorted files across parallel jobs#5844

Merged
ondrejmirtes merged 1 commit into
phpstan:2.2.xfrom
SanderMuller:perf/striped-job-scheduling
Jun 11, 2026
Merged

Stripe size-sorted files across parallel jobs#5844
ondrejmirtes merged 1 commit into
phpstan:2.2.xfrom
SanderMuller:perf/striped-job-scheduling

Conversation

@SanderMuller

@SanderMuller SanderMuller commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

What & why

The scheduler chunks files into jobs in path order. Directories tend to cluster files of
similar weight, so one job can collect a directory's heaviest files and become the
straggler the whole run waits for while other workers sit idle.

This change sorts the files by size and deals them round-robin across the jobs, so every
job gets the same size profile. Job count and the per-job ceiling (jobSize) stay the
same; only the composition changes. File sizes are supplied from the outside
(callable(string): int), so the Scheduler itself stays unit-testable without touching
the filesystem.

Benchmarks

Complete make phpstan on phpstan-src (clears the result cache itself, so every run is
a full analysis of 2,364 files), two clean worktrees with primed file caches,
hyperfine --warmup 1 --runs 6, Apple M4 Pro / PHP 8.5.7:

Command Mean [s] Min [s] Max [s] Relative
base ff2647amake phpstan 19.047 ± 0.100 18.921 19.215 1.08 ± 0.01
PR — make phpstan 17.573 ± 0.068 17.454 17.635 1.00

−1.47 s, 1.08× faster; user CPU unchanged within noise (111.3 s vs 113.3 s) — the win is
load balance, not less work. A reversed-order pass reproduces it (17.77 ± 0.24 s vs
19.04 ± 0.23 s), so it is not an ordering or thermal artifact.

One detail worth knowing: sorting without striping makes things considerably worse
(+46% wall on a smaller corpus), because plain chunking of a sorted list concentrates all
heavy files into the first job. The striping is what makes the sort pay off, and
testHeaviestFilesAreSpreadAcrossJobs pins exactly that property.

File size is a proxy for analysis cost, not a perfect predictor, but across repeated runs
it consistently beat path-order chunking and never lost to it.

Tests

make tests (12,713, green), make phpstan, make cs, make lint and
make composer-dependency-analyser all pass. Analysis output is byte-identical to the
baseline; only the distribution of files over worker jobs changes. SchedulerTest covers
the striped layout exactly (six distinctly-sized files: the three heaviest land in three
different jobs) and that every file is scheduled exactly once with jobs ≤ jobSize.

🤖 Generated with Claude Code

@ondrejmirtes ondrejmirtes left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change, I wondered about it several times myself.

124,
2,
[30, 30, 30, 30, 4],
[25, 25, 25, 25, 24],

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the tests could be more expressive here. Could we supply differently sized files here, or maybe change the Scheduler signature so that we provide file sizes from the outside and the class is still unit testable?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — scheduleWork() now takes a callable(string): int for the file sizes, so the class is unit-testable without touching the filesystem; both production callers pass fn ($file) => (int) @filesize($file). Added two expressive tests with synthetic sizes: one asserts the exact striped layout for six distinctly-sized files (the three heaviest files end up in three different jobs, each paired with a light file), the other checks every file is scheduled exactly once and no job exceeds jobSize for a larger irregular set. The existing count-based dataset tests now pass a constant-size callback.

Comment on lines +59 to +65
$numberOfJobs = (int) ceil(count($files) / $this->jobSize);
$stripedJobs = [];
foreach ($files as $i => $file) {
$stripedJobs[$i % $numberOfJobs][] = $file;
}

$jobs = array_values($stripedJobs);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this last block be simplified to $jobs = array_chunk($files, $this->jobSize); again?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't, unfortunately — that's the one shape I measured as actively harmful. With the list sorted by size, array_chunk() puts the ~20 heaviest files together in the first job, and whichever worker picks it up becomes a straggler the whole run waits for: sort + chunk measured +46% cold wall time vs. unsorted chunking on the src/Type self-analysis benchmark, while sort + round-robin striping is what produces the −21..23%. The striping is the point, not an artifact; the new testHeaviestFilesAreSpreadAcrossJobs test pins exactly this property.

Files were chunked into jobs in path order, so a job could end up with
all of a directory's heaviest files and become the straggler the whole
run waits for. Sorting by file size and dealing files round-robin across
jobs gives every job the same size profile: -21..23% cold wall time on a
14-core machine, CPU unchanged.

Sorting without striping makes it worse (+46% wall - the heavy files all
land in the first job), which is why the striping is essential rather
than cosmetic.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@SanderMuller SanderMuller force-pushed the perf/striped-job-scheduling branch from eede67f to b8e6fde Compare June 11, 2026 09:41
@ondrejmirtes

Copy link
Copy Markdown
Member

So what are hyperfine numbers for this one when running complete make phpstan on phpstan-src?

@SanderMuller

Copy link
Copy Markdown
Contributor Author

Complete make phpstan on phpstan-src (which clears the result cache itself, so every run is a full analysis of the 2,364 files), two clean worktrees with primed file caches, Apple M4 Pro / PHP 8.5.7:

Command Mean [s] Min [s] Max [s] Relative
base ff2647amake phpstan 19.047 ± 0.100 18.921 19.215 1.08 ± 0.01
PR — make phpstan 17.573 ± 0.068 17.454 17.635 1.00

−1.47 s, 1.08× faster (hyperfine --warmup 1 --runs 6). User CPU is unchanged within noise (111.3 s vs 113.3 s) — the win is purely better load balance across the workers, not less work. A reversed-order pass (PR first, then base, 3 runs each) reproduces it: 17.77 ± 0.24 s vs 19.04 ± 0.23 s, so it's not an ordering or thermal artifact.

@ondrejmirtes ondrejmirtes merged commit cb28849 into phpstan:2.2.x Jun 11, 2026
665 of 672 checks passed
@ondrejmirtes

Copy link
Copy Markdown
Member

Thank you!

@staabm

staabm commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

looks like this PR regressed a e2e test:
https://github.com/phpstan/phpstan-src/actions/runs/27340923857/job/80777219668

@SanderMuller

Copy link
Copy Markdown
Contributor Author

Good catch, you're right — and I now understand why I misdiagnosed it earlier as environmental: on my Mac the e2e fails on the base branch too (APFS resolves the wrong-case PSR-4 path Platforms/SqlitePlatform.php, so class.nameCase fires regardless of this PR), which made me think it wasn't related. On Linux that lookup correctly misses, and what actually decides the outcome there is the analysis order inside the job: MemoizingReflector keys are lowercased, so once src/Persistence/Sql/DbalDriverMiddleware.php resolves SQLitePlatform (the DBAL 4 name), a later lookup of SqlitePlatform from bootstrap-types.php hits that memo and reports class.nameCase instead of the expected class.notFound. The size sort put the 560-byte middleware before the 327-byte bootstrap file; FileFinder order has them the other way around.

Fixed by restoring the original input order within each job after striping — the size sort now only decides which job a file lands in, never the order files are analysed in. That's also the safer contract in general, since results can apparently be order-sensitive. testFilesWithinAJobKeepTheirInputOrder pins exactly the bug-14036 shape, and make phpstan numbers are unchanged (17.73 ± 0.04 s vs 19.00 ± 0.06 s base, 1.07×).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants