Commit c7b5f7c
[Support] Use atomic counter in parallelFor instead of per-task spawning (llvm#187989)
This function is primarily used by lld and debug info tools.
Instead of pre-splitting work into up to MaxTasksPerGroup (1024) tasks
and spawning each through the Executor's mutex+condvar, use an atomic
counter for work distribution. Only ThreadCount workers are spawned;
each grabs the next chunk via atomic fetch_add.
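
The distribution scheme described above can be sketched as follows. This is a minimal illustration, not the code from this commit: `parallelForSketch`, its parameters, and the use of plain `std::thread` in place of LLVM's Executor are all assumptions made to keep the example self-contained.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for parallelFor (not the LLVM implementation).
// Runs Fn(I) for every I in [Begin, End) using ThreadCount workers.
void parallelForSketch(size_t Begin, size_t End, size_t ThreadCount,
                       size_t ChunkSize,
                       const std::function<void(size_t)> &Fn) {
  assert(ThreadCount > 0 && ChunkSize > 0);
  if (Begin >= End)
    return;

  // Shared cursor: the start index of the next unclaimed chunk.
  std::atomic<size_t> Next{Begin};

  auto Worker = [&] {
    for (;;) {
      // Claim a chunk with one atomic add; no mutex or condvar involved.
      size_t Start = Next.fetch_add(ChunkSize, std::memory_order_relaxed);
      if (Start >= End)
        return; // All work has been claimed.
      size_t Stop = Start + std::min(ChunkSize, End - Start);
      for (size_t I = Start; I < Stop; ++I)
        Fn(I);
    }
  };

  // Spawn only ThreadCount workers, regardless of how many chunks exist.
  std::vector<std::thread> Threads;
  Threads.reserve(ThreadCount);
  for (size_t T = 0; T < ThreadCount; ++T)
    Threads.emplace_back(Worker);
  for (std::thread &T : Threads)
    T.join(); // join() makes each worker's writes visible to the caller.
}
```

With this shape, the number of synchronized hand-offs scales with the worker count rather than the task count: a call such as `parallelForSketch(0, N, 8, 64, Fn)` starts eight workers no matter how large `N` is, and each worker keeps pulling chunks until the cursor passes `End`.
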
When linking clang-14 (191MB PIE with --export-dynamic) with
`ld.lld --threads=8`, this reduces futex calls from ~31K to ~1.4K
(glibc, release+assertions build). Previously, each parallelFor spawned
up to 1024 tasks, each requiring a mutex lock and a condvar signal.
```
                            Wall    System  futex
glibc (assertions) before:  927ms   897ms   31K
glibc (assertions) after:   879ms   765ms   1.4K
mimalloc before:            872ms   694ms   25K
mimalloc after:             830ms   661ms   1K
```
1 parent e73d8f8 commit c7b5f7c
1 file changed
Lines changed: 23 additions & 18 deletions
| Original file lines | New file lines | Change |
|---|---|---|
| 259-264 | 259-279 | 6 lines removed, 21 lines added |
| 267-278 | 282-283 | 12 lines removed, 2 lines added |