gix-pack: dynamic work queue for parallel delta resolution #2592

Draft
Vinaya Mandke (vmandke) wants to merge 2 commits into GitoxideLabs:main from vmandke:feature/perf-gix-pack-work-stealing-queue
@vmandke

Fixes #2424

Replaces: resolve::deltas
Modifies: resolve::deltas_mt

Previously each thread was assigned a fixed subtree via in_parallel_with_slice, causing load imbalance
when one root had a deep chain of deltas. Thread synchronization also used a busy wait of poll_interval.
This PR rewrites deltas_mt to use a Workqueue with Condvar blocking, so all threads pull work evenly
regardless of tree shape.
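
To make the description above concrete, here is a minimal sketch of a shared work queue that blocks on a Condvar instead of polling. It is not the code from this PR: `WorkQueue`, `State`, `push`, and `process_tree` are hypothetical names chosen for illustration, and only `pop()`, `finish_item()`, and the `in_progress` counter echo names that appear in the review below. The work item is just a "remaining depth" number, where each item spawns two children, standing in for a delta whose resolution makes its child deltas resolvable.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};
use std::thread;

/// Hypothetical queue state: pending items plus the number of items being processed.
struct State<T> {
    items: VecDeque<T>,
    in_progress: usize,
}

/// Hypothetical shared queue; every worker pulls from the same place.
struct WorkQueue<T> {
    state: Mutex<State<T>>,
    available: Condvar,
}

impl<T> WorkQueue<T> {
    fn new(items: impl IntoIterator<Item = T>) -> Self {
        WorkQueue {
            state: Mutex::new(State { items: items.into_iter().collect(), in_progress: 0 }),
            available: Condvar::new(),
        }
    }

    /// Add follow-up work and wake one sleeping worker.
    fn push(&self, item: T) {
        self.state.lock().unwrap().items.push_back(item);
        self.available.notify_one();
    }

    /// Block until an item is available. Returns `None` once the queue is
    /// empty and no worker is still processing an item that could add more.
    fn pop(&self) -> Option<T> {
        let mut state = self.state.lock().unwrap();
        loop {
            if let Some(item) = state.items.pop_front() {
                state.in_progress += 1;
                return Some(item);
            }
            if state.in_progress == 0 {
                return None;
            }
            // Sleep without spinning; the lock is released while waiting.
            state = self.available.wait(state).unwrap();
        }
    }

    /// Mark the current item as done; wake everyone if that was the last one.
    fn finish_item(&self) {
        let mut state = self.state.lock().unwrap();
        state.in_progress -= 1;
        if state.in_progress == 0 && state.items.is_empty() {
            drop(state);
            self.available.notify_all();
        }
    }
}

/// Walk a binary tree of the given depth with `threads` workers and
/// return how many nodes were processed.
fn process_tree(depth: u32, threads: usize) -> usize {
    let queue = WorkQueue::new([depth]);
    let processed = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                while let Some(remaining) = queue.pop() {
                    processed.fetch_add(1, Ordering::Relaxed);
                    if remaining > 0 {
                        // Children only become available once their parent is done.
                        queue.push(remaining - 1);
                        queue.push(remaining - 1);
                    }
                    queue.finish_item();
                }
            });
        }
    });
    processed.load(Ordering::Relaxed)
}
```

Calling `process_tree(3, 4)` visits all 15 nodes of a depth-3 binary tree no matter which worker picks up which node, which is the property the description is after: the tree shape no longer decides which thread is busy.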


  • Feedback on Approach
  • Minimal Test for the example in linked issue. (deeply chained deltas)

Timings

--- gix main (in_parallel_with_slice) ---
/tmp/gix-main free pack index create --pack-path /Users/vmandke/phpstan.pack /tmp/bench2-main
index: bf450685c28fb13ad44fcafe90a5401b7b54141a
pack: 73555a206b95a6590d30e8671eacb5bd4ff55a70

real 2m36.613s
user 6m13.277s
sys 0m15.543s

--- gix fix (work-stealing queue) ---
/tmp/gix-fix free pack index create --pack-path /Users/vmandke/phpstan.pack /tmp/bench2-fix
index: bf450685c28fb13ad44fcafe90a5401b7b54141a
pack: 73555a206b95a6590d30e8671eacb5bd4ff55a70

real 1m14.710s
user 6m43.032s
sys 0m10.444s


Profiling

[Profile screenshot: Gix on Main]

[Profile screenshot: Gix on Fix]

[Profile screenshot: Git on same pack]

AI Usage disclaimer

Used Claude Code for benchmark scripting, fixes and cleanup.
Also used for understanding both Gitoxide and Git codebases.

Contributor

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 325b86c31c

size.fetch_add(base_bytes.len(), Ordering::Relaxed);
}
}
queue.finish_item();
P2: Make queue accounting panic-safe

When a worker panics after queue.pop() has incremented in_progress—for example from the existing delta-size assert_eq! on malformed input or from a panicking inspector—this explicit finish_item() is skipped. Any other workers waiting in pop() then see an empty queue with in_progress > 0 and block forever, so the scoped join cannot unwind; use a guard or otherwise ensure the counter is decremented on panic paths.
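
One way to follow the guard suggestion is an RAII type whose `Drop` implementation does the decrement, so it also runs while a panic unwinds. The sketch below only models the accounting: `Queue`, `start_item`, `InProgressGuard`, and `run_item` are hypothetical names, not the PR's API, and the panic is simulated rather than coming from the real delta-size assertion.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::{Condvar, Mutex, MutexGuard};

/// Hypothetical stand-in for the PR's queue; only the counter is modeled.
struct Queue {
    in_progress: Mutex<usize>,
    changed: Condvar,
}

impl Queue {
    fn new() -> Self {
        Queue { in_progress: Mutex::new(0), changed: Condvar::new() }
    }

    /// Lock the counter, continuing even if another thread poisoned the mutex.
    fn counter(&self) -> MutexGuard<'_, usize> {
        self.in_progress.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
    }

    /// Count one item as started and hand back a guard that un-counts it.
    fn start_item(&self) -> InProgressGuard<'_> {
        *self.counter() += 1;
        InProgressGuard { queue: self }
    }

    fn in_progress(&self) -> usize {
        *self.counter()
    }
}

/// Decrements `in_progress` when dropped, on normal return and on unwind alike.
struct InProgressGuard<'a> {
    queue: &'a Queue,
}

impl Drop for InProgressGuard<'_> {
    fn drop(&mut self) {
        *self.queue.counter() -= 1;
        // Wake any workers blocked on the condition variable so they re-check.
        self.queue.changed.notify_all();
    }
}

/// Process one item; returns `false` if the work panicked.
fn run_item(queue: &Queue, should_panic: bool) -> bool {
    catch_unwind(AssertUnwindSafe(|| {
        let _guard = queue.start_item();
        if should_panic {
            panic!("simulated failure while resolving a delta");
        }
    }))
    .is_ok()
}
```

With this shape there is no explicit `finish_item()` call to skip: after `run_item(&queue, true)` the counter is back at zero, so waiting workers would observe an idle queue instead of blocking forever. Whether the real queue should also record that a panic happened, so the remaining workers stop early, is a separate design question.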

@vmandke Vinaya Mandke (vmandke) changed the title from "gix-pack: work-stealing queue for parallel delta resolution" to "gix-pack: dynamic work queue for parallel delta resolution" on May 9, 2026
…a resolution

Replaces: resolve::deltas
Modifies: resolve::delta_mt

Previously each thread was assigned a fixed subtree via in_parallel_with_slice, causing load imbalance
when one root had a deep chain of deltas. Thread synchronization also used a busy wait of poll_interval.
This PR rewrites deltas_mt to use a Workqueue with Condvar blocking, so all threads pull work evenly
regardless of tree shape.

Co-authored-by: Claude Code
@vmandke Vinaya Mandke (vmandke) force-pushed the feature/perf-gix-pack-work-stealing-queue branch from 325b86c to c308380 on May 9, 2026 20:13
@vmandke
Author

Edit: updated the commit message, as this is not a work-stealing queue but rather a global work queue.

@Byron
Member

Sebastian Thiel (Byron) commented May 10, 2026

Thanks a lot!

Without looking at the code in depth, I just had it run benchmarks to see what the algorithm change does given different and somewhat extreme packs.

My takeaway is that the user time of the new algorithm is probably more similar to Git because it will use threads more evenly, but these threads duplicate more work as well.

The idea behind the existing implementation is that the work is organized so that no duplicate work exists.
In the case of PHPStan this obviously costs much more memory, which is worth considering as a trade-off.

For now, however, I would be OK with using more memory if it's in the service of ultimately doing less work. To do that, the previous implementation already implemented something like a work-stealing model, but apparently the heuristics didn't make it kick in for PHPStan. Maybe this can be fixed, or maybe the algorithm behind that can be improved to incorporate threads not only per root, but to let idle threads work together with other threads that still have work to do.
After all, we already have the delta tree in memory, and it's no problem to split that up into smaller subtasks without letting threads overlap.
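
As a rough illustration of splitting an in-memory tree into non-overlapping subtasks, the sketch below expands the tree breadth-first from its roots until there are enough independent subtrees to hand out. Everything here is an assumption made for the example: the tree is a plain child-index list rather than gitoxide's delta tree, and `split_into_subtrees` is a hypothetical helper, not an existing function.

```rust
use std::collections::VecDeque;

/// Split a forest into a sequential `prefix` and a list of disjoint subtree roots.
///
/// `children[n]` lists the children of node `n`. Nodes in `prefix` must be
/// handled first, in order, because the returned subtree roots depend on them.
/// Each returned root owns its whole subtree, so no two tasks share a node.
fn split_into_subtrees(
    children: &[Vec<usize>],
    roots: &[usize],
    min_tasks: usize,
) -> (Vec<usize>, Vec<usize>) {
    let mut frontier: VecDeque<usize> = roots.iter().copied().collect();
    let mut prefix = Vec::new();
    let mut tasks = Vec::new();

    while frontier.len() + tasks.len() < min_tasks {
        // Stop when nothing is left to expand, e.g. the tree is too small.
        let Some(node) = frontier.pop_front() else { break };
        if children[node].is_empty() {
            // A leaf cannot be split further; it is a (tiny) task on its own.
            tasks.push(node);
        } else {
            // Expand this node: it moves to the prefix, its children become candidates.
            prefix.push(node);
            frontier.extend(children[node].iter().copied());
        }
    }

    // Whatever is still on the frontier becomes the root of one task each.
    tasks.extend(frontier);
    (prefix, tasks)
}
```

For a tree where node 0 has children 1 and 2, and node 1 has children 3 and 4, asking for three tasks yields the prefix `[0, 1]` and the task roots `[2, 3, 4]`. A single long chain cannot be split this way, since each node has only one child, so this kind of splitting helps with wide subtrees under one root rather than with strictly linear delta chains.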

If this works, then a follow-up could deal with reducing the memory footprint. There should be a lot of potential, given that right now it keeps all the bases in the delta tree in memory, even though many of these will at some point not be used anymore. And from what we know, apparently this could halve the memory usage in the case of PHPStan.

Please note that I put this PR back into draft while the details of the algorithm are still not figured out.

Codex Benchmark Results

Pack Verification Performance Report

Compared this branch’s target/release/gix against mainline gix and Git.

Binaries:

  • Mainline gix: gix v0.53.0-100-g1322a36599
  • This branch: gix v0.53.0-66-g60e081abe2, HEAD 60e081abe2
  • Git: git version 2.50.1 (Apple Git-155)

Methodology:

  • Built this branch with cargo build --release --bin gix.
  • gix command: gix free pack verify <idx>.
  • Git command: git verify-pack -v <idx> >/dev/null.
  • Used hyperfine --warmup 1 --min-runs 5.
  • For gix-vs-gix, ran benchmarks twice with command order reversed.
  • Measured memory with /usr/bin/time -lp, using maximum resident set size.
  • Machine was not fully isolated, so some runtime fluctuation is expected.

Linux Fixture Pack

Pack index:

/Users/byron/dev/github.com/GitoxideLabs/gitoxide/tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx

Pack size: about 1.4 GB; index size: about 203 MB.

Runtime

| Tool | Wall Time | User Time | System Time | Notes |
|---|---|---|---|---|
| Mainline gix | 14.41 s | 180.64 s | 8.89 s | average of two 5-run hyperfine passes |
| Branch gix | 18.23 s | 182.56 s | 41.66 s | average of two 5-run hyperfine passes |
| Git | 51.040 s ± 0.393 | 183.831 s | 51.318 s | one 5-run hyperfine pass |

Detailed gix passes:

| Pass | Order | Mainline Wall/User/Sys | Branch Wall/User/Sys | Result |
|---|---|---|---|---|
| 1 | mainline then branch | 14.540 s ± 0.159 / 182.700 s / 8.928 s | 18.766 s ± 1.430 / 185.033 s / 41.068 s | mainline 1.29x ± 0.10 faster |
| 2 | branch then mainline | 14.286 s ± 0.275 / 178.579 s / 8.850 s | 17.698 s ± 1.811 / 180.095 s / 42.249 s | mainline 1.24x ± 0.13 faster |

On this pack, the branch is about 26% slower than mainline gix, but still about 2.80x faster than Git.

Peak RSS

| Tool | Run 1 RSS | Run 2 RSS | Avg RSS | Avg Real/User/Sys |
|---|---|---|---|---|
| Mainline gix | 3,541,352,448 B | 3,540,860,928 B | 3.30 GiB | 13.84 s / 173.79 s / 8.90 s |
| Branch gix | 3,220,176,896 B | 3,224,027,136 B | 3.00 GiB | 16.23 s / 172.89 s / 44.43 s |
| Git | 1,091,026,944 B | 1,109,950,464 B | 1.02 GiB | 51.11 s / 183.85 s / 52.14 s |

The branch uses about 304 MiB less peak RSS than mainline gix (averaging the two runs in the table above), a reduction of about 9.0%. Git uses substantially less RSS on this pack, but is much slower.

PHPStan Pack

Pack index:

/Users/byron/dev/github.com/phpstan/phpstan/.git/objects/pack/pack-fc5a93b6903e792e5c65ce2f8f02054775c35c94.idx

Pack size: about 6.4 GB; index size: about 3.1 MB.

Runtime

| Tool | Wall Time | User Time | System Time | Notes |
|---|---|---|---|---|
| Mainline gix | 103.92 s | 321.54 s | 4.87 s | average of two 5-run hyperfine passes |
| Branch gix | 25.44 s | 361.45 s | 4.21 s | average of two 5-run hyperfine passes |
| Git | 73.553 s ± 0.440 | 338.737 s | 2.688 s | one 5-run hyperfine pass |

Detailed gix passes:

| Pass | Order | Mainline Wall/User/Sys | Branch Wall/User/Sys | Result |
|---|---|---|---|---|
| 1 | mainline then branch | 104.648 s ± 2.888 / 325.635 s / 4.899 s | 25.478 s ± 0.270 / 363.916 s / 4.326 s | branch 4.11x ± 0.12 faster |
| 2 | branch then mainline | 103.198 s ± 0.957 / 317.440 s / 4.848 s | 25.408 s ± 0.926 / 358.992 s / 4.092 s | branch 4.06x ± 0.15 faster |

On this pack, the branch is about 4.08x faster than mainline gix and about 2.89x faster than Git. Git is about 1.41x faster than mainline gix here.

Peak RSS

| Tool | Run 1 RSS | Run 2 RSS | Avg RSS | Avg Real/User/Sys |
|---|---|---|---|---|
| Mainline gix | 19,708,248,064 B | 19,887,423,488 B | 18.44 GiB | 102.48 s / 315.71 s / 4.25 s |
| Branch gix | 8,940,797,952 B | 9,664,970,752 B | 8.66 GiB | 25.54 s / 369.68 s / 4.19 s |
| Git | 1,776,631,808 B | 1,925,038,080 B | 1.72 GiB | 74.11 s / 338.57 s / 2.90 s |

The branch uses about 9.77 GiB less peak RSS than mainline gix, a reduction of about 53.0%. Git uses much less RSS than both gix variants, but is about 2.89x slower than the branch.
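
For readers who want to check how the PHPStan figures follow from the raw measurements, this small sketch recomputes them from the byte counts and wall times reported above. The function name `phpstan_summary` is made up for the example; the numbers are copied from the report.

```rust
const GIB: f64 = 1024.0 * 1024.0 * 1024.0;

fn mean(a: f64, b: f64) -> f64 {
    (a + b) / 2.0
}

/// Returns (RSS saved in GiB, RSS reduction in percent, wall-time speedup)
/// for branch gix versus mainline gix on the PHPStan pack.
fn phpstan_summary() -> (f64, f64, f64) {
    // Peak RSS of the two runs per binary, in bytes.
    let mainline_rss = mean(19_708_248_064.0, 19_887_423_488.0);
    let branch_rss = mean(8_940_797_952.0, 9_664_970_752.0);

    let saved_gib = (mainline_rss - branch_rss) / GIB;
    let reduction_pct = (mainline_rss - branch_rss) / mainline_rss * 100.0;

    // Averaged hyperfine wall times in seconds.
    let speedup = 103.92 / 25.44;

    (saved_gib, reduction_pct, speedup)
}
```

The three values round to 9.77 GiB, 53.0%, and 4.08x, matching the report.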

Summary

| Pack | Runtime Result | Peak RSS Result | User/System Notes |
|---|---|---|---|
| Linux fixture | Branch gix is about 26% slower than mainline gix, but 2.80x faster than Git | Branch gix uses about 9% less RSS than mainline gix; Git uses the least RSS | Branch gix has much higher system time than mainline gix |
| PHPStan | Branch gix is about 4.08x faster than mainline gix and 2.89x faster than Git | Branch gix uses about 53% less RSS than mainline gix; Git uses the least RSS | Branch gix uses more user CPU than mainline gix, but finishes much sooner |

Overall, this branch is strongly beneficial for the PHPStan pack, both in wall-clock time and peak RSS. On the Linux fixture, it reduces peak RSS modestly but regresses runtime compared to mainline gix, with notably higher system CPU time.

@Byron Sebastian Thiel (Byron) marked this pull request as draft May 10, 2026 08:36
@vmandke
Author

Yes, I tried to work out something similar to how Git was parallelising. And I was only looking at the phpstan pack :) Will benchmark this more, and look into how the previous implementation can be fixed.

Thanks for the review. Will get back with something by next week.

@Byron
Member

Thanks. In the meantime, I also added Git as a baseline.

Overall, Gitoxide is much faster in mainline and in this branch, but it uses significantly more memory.
From that point of view, this branch already is better. Compared to Git, it really isn't worse.

However, my feeling is that it would be better to optimize the current algorithm to also deal with PHPStan and similar packs. It seems more like a bug in the current implementation than something that needs a complete change to be more like Git.

And in the second step, clearly, Gitoxide needs an algorithm that is just like the one in Git to reduce memory usage. Despite Git being much slower, it's very respectable how small the memory footprint is that it needs to handle these large packs.

There's clearly a trade-off, and I personally would always want it to be fast, but we'd want to offer a choice here, and with it, a more balanced algorithm.

With all that said, it also seems that this branch already contains an implementation that is similar to Git, given its memory footprint. All that speaks against it, with a proper review still outstanding, is that it is slower on the Linux pack. So maybe what we really want is to implement this as an alternative and further reduce its memory footprint instead.

I will leave it to you.

Development

Successfully merging this pull request may close these issues.

phpstan/phpstan pack resolves with a single thread only