Improve performance of status on windows#2547

Draft
Special Bread (special-bread) wants to merge 4 commits into
GitoxideLabs:mainfrom
special-bread:windows-status-performance

@special-bread Special Bread (special-bread) commented Apr 27, 2026

This builds a cache of file metadata that is prefilled via Windows API calls, so the worktree can be walked per directory instead of per file. As a result, status is much faster.

The cache is designed to minimise the surface area of the change. It is Windows-only, so other targets should be unaffected.

Testing status on the Linux repo improves the time from ~1000ms to ~300ms, which puts it roughly on par with libgit2. Faster speeds are possible but would require larger changes, so this is an initial pass that avoids doing too much.
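
To illustrate the per-directory idea in portable terms, here is a minimal sketch. It is not the code from this PR: it uses `std::fs::read_dir` as a stand-in for the Windows API calls, and the name `prefill` is hypothetical. The Rust standard library documents that `DirEntry::metadata` needs no extra system call on Windows, which is the property a per-directory walk relies on. The sketch ignores gitignore rules and does not skip `.git`.

```rust
use std::collections::HashMap;
use std::fs::{self, Metadata};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: walk `root` one directory at a time and record the
/// metadata of every entry, keyed by its path relative to `root`.
/// Simplified on purpose: no gitignore handling, no special-casing of `.git`.
fn prefill(root: &Path) -> io::Result<HashMap<PathBuf, Metadata>> {
    let mut cache = HashMap::new();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            // Does not follow symlinks. On Windows the data comes from the
            // directory listing itself; on Unix this is an lstat per entry.
            let meta = entry.metadata()?;
            let path = entry.path();
            if meta.is_dir() {
                // Real directories only; symlinks are never descended into.
                pending.push(path.clone());
            }
            let rela = path
                .strip_prefix(root)
                .expect("every entry lives below root")
                .to_path_buf();
            cache.insert(rela, meta);
        }
    }
    Ok(cache)
}
```

A caller would build the map once per status run and then look up tracked paths in it instead of issuing one metadata query per file.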


Additional things to consider and discuss perhaps:

  1. This does have a little drift, I feel. The cache works, but perhaps it should not be considered a cache since it's thrown away after every git status, and invalidating it is often equivalent to rebuilding it. Using it as an actual cache across multiple git status runs is therefore up to the caller, and it's pretty complex, so the caller would need to know a lot to use it, for dubious benefit.
  2. I did leave some room open for Linux speedups later, but I believe a different implementation would be needed there: lstat on Linux is fast, so a metadata cache wouldn't really speed things up. The only option is a directory-keyed cache that can check for untracked files, which means fewer lstats overall, but that would be perhaps a 10-20% speedup, not a 300% (with 1000% possible) speedup like on Windows.
  3. For reference, check out this custom implementation of git status: https://github.com/special-bread/tests-git-status. It can do a git status of linux (the above test case) in ~70ms, but it is redone entirely and has some slightly different behaviour, which is fine for my purposes but not identical to git, e.g. how it considers case sensitivity, how it treats some states as clean if git index entries cancel out, and some other details. I think it's possible to reach and beat that time, but it would require more invasive changes, which I thought would be fairly rough for a PR that touches a piece of core functionality.
  4. See the related issue here: "gix status" is slow on Windows #2296

Given that this is a common piece of functionality, I would love for someone else to test this too. I have been embarrassingly busy recently, so this PR cooked for a while, and I may have missed some things while working on it on and off.

Contributor

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c27c3aebe0

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "Codex (@codex) address that feedback".

Comment thread gix-status/src/metadata_cache.rs Outdated
Comment on lines +94 to +96
pub fn normalize_path(path: &[u8]) -> BString {
    use bstr::ByteSlice;
    path.to_str_lossy().to_lowercase().into()

P2 Badge Respect case-sensitive worktrees in metadata keys

Do not lowercase every cache key unconditionally here: on Windows repositories living in case-sensitive directories (e.g. per-directory case sensitivity enabled, typically with core.ignoreCase=false), distinct tracked paths like Foo.txt and foo.txt collapse to the same key and one entry overwrites the other. index_as_worktree then reads the wrong cached stat for at least one file, which can misreport tracked-file status (clean/modified/removed) instead of merely causing a cache miss.
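
The collision can be reproduced with a plain `HashMap`. This is a minimal sketch with a hypothetical helper, `build_keys`, and string keys instead of the `BString` keys used in the PR; it only demonstrates the overwrite described above.

```rust
use std::collections::HashMap;

/// Hypothetical helper: map each path to its position in `paths`,
/// optionally lowercasing the key first.
fn build_keys(paths: &[&str], lowercase: bool) -> HashMap<String, usize> {
    let mut map = HashMap::new();
    for (idx, path) in paths.iter().enumerate() {
        let key = if lowercase {
            path.to_lowercase()
        } else {
            path.to_string()
        };
        // A later insert with the same key silently replaces the earlier one.
        map.insert(key, idx);
    }
    map
}
```

With `["Foo.txt", "foo.txt"]` as input, the lowercased map keeps a single entry belonging to the second path, while the exact-case map keeps both.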

Author

Shameful display on my part; I believe this is left over from a different solution. A folder on Windows may or may not be case sensitive, and the entries in the git tree may or may not collide, so there are various combinations. Either there is a mapping of what to do for each of the cases, or the cache is case sensitive and any mismatch simply results in a cache miss, which slows status down a tiny bit but maintains correctness. I believe the worst case is every path having different casing between tree and disk, which would be strange; in practice I found only a handful of such files across multiple repos, with the typical case being zero. Even in the worst case this should fall back to the original performance, so it is never slower than it used to be.

I'll make a commit that addresses this here.

@Byron
Member

My apologies for the late response. I guess that I thought I would wait until CI is green. But maybe it makes sense to align and see what your plans are with the PR.

Meantime, let me put it back to draft and you can bring it out of draft once you think it's ready for review.
Merging this is very valuable to me, so I can certainly also do last-mile work if you say so.

@Byron Sebastian Thiel (Byron) marked this pull request as draft May 6, 2026 04:07
@special-bread
Author

Special Bread (special-bread) commented May 7, 2026

No worries at all, I too am very busy!

This is me trying to squeeze it out since there were simply too many things and it was a shame to let the work sit without making it upstream. If anything I should be apologizing for taking so long!

(Edit: I fixed a doc link that didn't resolve on non-Windows machines and reran cargo fmt. Otherwise it seemed like the previous tests were mostly passing, but a few failed to load and so failed overall; rerunning them without changing any of the code now makes them all pass.)


I think some general guidance would be helpful on whether we want to land a minimal scope first (i.e. this), which matches libgit2 performance, or look into more invasive changes that will get more speed. For the minimal implementation, it is also worth asking whether it's a bit much even in its current state: perhaps the cache should be cleaned up into a Windows-only preprocessing step for status. That is mostly the same thing, but the idea of maybe-later caching seems like more work than it's worth, since invalidating the cache is as expensive as recreating it in many cases.

@special-bread Special Bread (special-bread) force-pushed the windows-status-performance branch 2 times, most recently from 71f7b63 to 64cdbf4 Compare May 7, 2026 19:43
@Byron
Member

Sebastian Thiel (Byron) commented May 7, 2026

I'd definitely think that as a first pass, this PR doesn't need to get larger, so if you are happy with the Windows speedups, we should work on cashing those in.

  1. This does have a little drift, I feel. The cache works, but perhaps it should not be considered a cache since it's thrown away after every git status, and invalidating it is often equivalent to rebuilding it. Using it as an actual cache across multiple git status runs is therefore up to the caller, and it's pretty complex, so the caller would need to know a lot to use it, for dubious benefit.

Then let's not expose it, and if you want, even keep it Windows only. I am running some performance tests right now, so let's see.

  1. I did leave some room open for Linux speedups later, but I believe a different implementation would be needed there: lstat on Linux is fast, so a metadata cache wouldn't really speed things up. The only option is a directory-keyed cache that can check for untracked files, which means fewer lstats overall, but that would be perhaps a 10-20% speedup, not a 300% (with 1000% possible) speedup like on Windows.

There is an untracked cache in the Git index which I believe is at least related, and #2503 might be worth a look. If you think it's related, I could prioritize its review. It's a biggie and of unknown quality (i.e. it might be faster to re-prompt it), but maybe it helps get additional work off the ground more quickly or prevents reinventing something.

Codex says it's related, but I don't know how easy it is for you to evaluate beyond trying the other PR in isolation.

Codex

How is the cache used? Does it affect Windows? Also, is there any overlap with https://github.com/
/pull/2503 ?

The cache is used only in the tracked-file status path.

Call path:

  1. On Windows, Repository::status() defaults metadata_cache to Auto.
  2. Platform::into_iter() sees Auto and calls build_metadata_cache(repo, None).
  3. build_metadata_cache() does a gitignore-aware worktree walk and calls gix_status::metadata_cache::prepare().
  4. prepare() uses Windows GetFileInformationByHandleEx(FileIdBothDirectoryInfo) to batch-enumerate metadata into a HashMap<relative_path, CachedMetadata>.
  5. That cache is passed into index_worktree_status(), then into gix-status’s index_as_worktree context.
  6. In compute_status(), each tracked index entry first checks metadata_cache.get(rela_path). On hit, it uses cached metadata; on miss, it falls back to live Metadata::from_path_no_follow().

So yes, it affects Windows directly, and only Windows. The module, fields, API methods, and cache plumbing are behind #[cfg(windows)]. On macOS/Linux it is not compiled and cannot affect behavior. On Windows it is default-on for normal Repository::status(), but best-effort: if cache construction fails, it silently falls back to live stat calls.
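
Step 6 of the call path, lookup first and live stat on a miss, can be sketched as follows. This is an illustration only: `metadata_for` and `Source` are hypothetical names, and `std::fs::Metadata` with `std::fs::symlink_metadata` stand in for the `CachedMetadata` and `Metadata::from_path_no_follow()` mentioned above.

```rust
use std::collections::HashMap;
use std::fs::{self, Metadata};
use std::io;
use std::path::{Path, PathBuf};

/// Where the metadata for a tracked entry came from.
#[derive(Debug, PartialEq)]
enum Source {
    Cache,
    LiveStat,
}

/// Hypothetical lookup: prefer the prefilled cache, fall back to a live,
/// non-following stat when there is no cache or the path is not in it.
fn metadata_for(
    cache: Option<&HashMap<PathBuf, Metadata>>,
    worktree: &Path,
    rela_path: &Path,
) -> io::Result<(Metadata, Source)> {
    if let Some(meta) = cache.and_then(|c| c.get(rela_path)) {
        return Ok((meta.clone(), Source::Cache));
    }
    // Cache miss or no cache at all: behave exactly like the uncached code.
    let meta = fs::symlink_metadata(worktree.join(rela_path))?;
    Ok((meta, Source::LiveStat))
}
```

Because a miss takes the same path as having no cache, a cache that fails to build or lacks an entry only costs speed, not correctness, as long as the cached values match what a live stat would return.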

PR #2503 overlaps conceptually but not at the same layer:

They are complementary in status: PR #2503 speeds the dirwalk/untracked side; this branch speeds tracked-file metadata checks, especially on Windows.

There is likely merge overlap, though. Both touch status plumbing and outcomes/tests, and both are sensitive to stat semantics. PR #2503 also fixes index ctime/mtime decoding, which is adjacent to this branch’s Windows cached-stat comparison. I would treat them as interacting PRs: separate goals, compatible design, but expect conflicts or review coordination around status outcomes, stat comparison correctness, and cache invalidation semantics.

  1. For reference, check out this custom implementation of git status: https://github.com/special-bread/tests-git-status. It can do a git status of linux (the above test case) in ~70ms, but it is redone entirely and has some slightly different behaviour, which is fine for my purposes but not identical to git, e.g. how it considers case sensitivity, how it treats some states as clean if git index entries cancel out, and some other details. I think it's possible to reach and beat that time, but it would require more invasive changes, which I thought would be fairly rough for a PR that touches a piece of core functionality.

Yes, let's not rush this. git status shows how it's supposed to be done and gix must be compatible (but not bug for bug), with the same results. Ideally there is a complete baseline with Git so it won't be too easy to break. But then… everything is allowed I suppose, even though I'd already be happy with Git-equivalent performance which right now it also didn't always have.

  1. See the related issue here: "gix status" is slow on Windows #2296

Given that this is a common piece of functionality, I would love for someone else to test this too. I have been embarrassingly busy recently, so this PR cooked for a while, and I may have missed some things while working on it on and off.

It can't hurt to let people try the PR!

I hope my answers help somewhat, and my plan is to wait until it comes out of draft for a proper review. Meantime, here is some generated info at your discretion (I just skimmed it).

Codex

Review outcome: changes requested. I confirmed the P2 correctness issue: Windows directory reparse points can be cached with both is_dir=true and is_symlink=true (gix-status/src/metadata_cache.rs:167-168), and compute_status() handles metadata.is_dir() before symlink/type handling (gix-status/src/index_as_worktree/function.rs:478-488). For a tracked symlink to a directory on Windows, this can report Removed from the cached path while the live from_path_no_follow() path would preserve lstat semantics. Reparse points should either carry more precise metadata or be skipped from the cache so the live stat fallback handles them.
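
The two remedies named above can be sketched in isolation. The struct below is a hypothetical, simplified stand-in for a cached entry; only the `is_dir` and `is_symlink` field names come from the review text, everything else is assumed for illustration.

```rust
/// Hypothetical, simplified cached entry.
#[derive(Debug, Clone, Copy)]
struct CachedEntry {
    is_dir: bool,
    is_symlink: bool,
}

#[derive(Debug, PartialEq)]
enum Kind {
    Symlink,
    Directory,
    File,
}

/// Option 1: classify with lstat-like semantics by testing the symlink flag
/// before the directory flag, so a link to a directory stays a link.
fn classify(entry: CachedEntry) -> Kind {
    if entry.is_symlink {
        Kind::Symlink
    } else if entry.is_dir {
        Kind::Directory
    } else {
        Kind::File
    }
}

/// Option 2: keep ambiguous entries out of the cache entirely, so the live
/// stat fallback decides how to treat them.
fn is_cacheable(entry: CachedEntry) -> bool {
    !(entry.is_dir && entry.is_symlink)
}
```

Either approach avoids the directory branch being taken for a tracked symlink; skipping the entry is the smaller change, at the cost of one live stat per reparse point.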

Performance note: these benchmarks ran on macOS, so they do not exercise the Windows-only metadata cache path gated by #[cfg(windows)]. They are useful as a non-Windows regression check only.

Environment:

  • macOS 26.4.1, Apple M4 Max, 16 logical CPUs, 64 GiB RAM
  • rustc 1.95.0 (59807616e 2026-04-14)
  • hyperfine 1.20.0
  • git 2.50.1 (Apple Git-155)
  • ein v0.51.0-23-g6183fd092d

Compared binaries:

  • main: origin/main at 8af2691270, gix v0.53.0-89-g8af2691270
  • head: PR branch at 64cdbf47e1, gix v0.53.0-91-g64cdbf47e1

Repos selected from ein t find ~/dev:

| Repo | Path | Repo HEAD | Tracked files | Worktree size | Dirty entries (git status --porcelain -uno) |
| --- | --- | --- | --- | --- | --- |
| small | /Users/byron/dev/github.com/Byron/small | 363849d | 8 | 20M | 1 |
| gitoxide | /Users/byron/dev/github.com/GitoxideLabs/gitoxide.windows-status-performance | 64cdbf47e1 | 2,857 | 2.6G | 0 |
| linux | /Users/byron/dev/git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux | 707df3375124 | 88,834 | 7.3G | 13 |
| rust | /Users/byron/dev/github.com/rust-lang/rust | ea346509168 | 54,192 | 9.2G | 10 |
| webkit | /Users/byron/dev/github.com/WebKit/WebKit | 6322a8384e74 | 417,002 | 16G | 9 |

Command shape:

hyperfine --style basic --warmup 2 --min-runs 10 \
  --command-name "main <repo>" "/tmp/gitoxide-bench-bins/gix-main-8af2691270 --no-verbose -r '<repo-path>' status --no-write > /dev/null" \
  --command-name "head <repo>" "/tmp/gitoxide-bench-bins/gix-head-64cdbf47e1 --no-verbose -r '<repo-path>' status --no-write > /dev/null"

Results:

| Repo | main mean | head mean | Outcome |
| --- | --- | --- | --- |
| small | 7.3 ms ± 0.5 ms | 6.9 ms ± 0.4 ms | head 1.05x faster, high relative noise |
| gitoxide | 21.0 ms ± 0.8 ms | 20.4 ms ± 0.4 ms | head 1.03x faster |
| linux | 183.8 ms ± 1.4 ms | 183.1 ms ± 1.6 ms | effectively neutral |
| rust | 502.7 ms ± 3.9 ms | 499.2 ms ± 5.5 ms | effectively neutral |
| webkit | 1.523 s ± 0.022 s | 1.537 s ± 0.011 s | main 1.01x faster, effectively neutral |

Conclusion: no meaningful non-Windows performance regression showed up across the selected repo sizes. A Windows benchmark is still needed to validate the intended metadata-cache speedup and to verify the corrected symlink/reparse-point semantics once fixed.
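
For readers who want to relate the "x faster" figures to the reported means, the ratio is simply baseline time divided by candidate time. The helper name `speedup` below is hypothetical. Ratios recomputed from the rounded means in the table can differ slightly from the tool's own output, which presumably works from unrounded measurements.

```rust
/// Hypothetical helper: how many times faster the candidate is than the
/// baseline, given two mean run times in the same unit.
/// A value above 1.0 means the candidate is faster.
fn speedup(baseline: f64, candidate: f64) -> f64 {
    baseline / candidate
}
```

By the same arithmetic, the Windows figures quoted in the PR description (~1000ms down to ~300ms) correspond to a factor of roughly 3.3.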


Copilot AI left a comment

Pull request overview

This PR introduces a Windows-only metadata cache for status to avoid per-file lstat calls (slow on Windows) by pre-enumerating directories and reusing the collected metadata during tracked-file checks, aiming to bring gix status performance closer to libgit2/Git for Windows.

Changes:

  • Add a Windows-only gix_status::metadata_cache module that walks the worktree via Windows APIs and builds a path→metadata map.
  • Thread the optional metadata cache through gix status configuration (MetadataCacheConfig) and into the gix-status index-as-worktree pipeline.
  • Extend mode-change logic to accept pre-extracted file-type/permission bits (to support cached metadata).

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| gix/src/status/platform.rs | Adds Windows-only builder methods to provide/disable/prepare the metadata cache from the status() platform. |
| gix/src/status/mod.rs | Introduces MetadataCacheConfig and a Windows-only helper to build the cache (gitignore-aware). |
| gix/src/status/iter/mod.rs | Wires MetadataCacheConfig into iterator construction and passes cache refs into index-worktree status. |
| gix/src/status/index_worktree.rs | Extends index_worktree_status() signature/context to accept an optional Windows metadata cache. |
| gix-status/src/lib.rs | Exposes the Windows-only metadata_cache module and re-exports its types. |
| gix-status/src/metadata_cache.rs | Implements the cache builder and Windows directory enumeration walker. |
| gix-status/src/index_as_worktree/types.rs | Adds an optional Windows-only metadata_cache field to the tracked-modifications context. |
| gix-status/src/index_as_worktree/function.rs | Consults the cache on Windows before falling back to live syscalls; adapts mode/stat extraction to cached metadata. |
| gix-status/src/index_as_worktree_with_renames/types.rs | Threads the optional cache through the “with renames” context on Windows. |
| gix-status/src/index_as_worktree_with_renames/mod.rs | Passes the cache into the underlying tracked-modifications status call on Windows. |
| gix-status/tests/status/index_as_worktree.rs | Updates test contexts to include the new Windows-only metadata_cache field. |
| gix-status/tests/status/index_as_worktree_with_renames.rs | Updates test contexts to include the new Windows-only metadata_cache field. |
| gix-status/Cargo.toml | Adds hashbrown and Windows-only windows-sys dependency for the cache implementation. |
| gix-index/src/entry/mode.rs | Adds change_to_match_fs_with_values() to reuse mode-change logic with cached metadata bits. |
| Cargo.lock | Records new dependency resolutions for added crates/features. |

Comment thread gix-status/src/metadata_cache.rs Outdated
Comment thread gix-status/src/metadata_cache.rs Outdated
Comment thread gix-status/src/worktree_stats.rs
Comment thread gix/src/status/mod.rs Outdated
Comment on lines +259 to +266
    thread_limit: Option<usize>,
) -> Result<gix_status::MetadataCache, crate::status::index_worktree::Error> {
    let workdir = repo
        .workdir()
        .ok_or(crate::status::index_worktree::Error::MissingWorkDir)?;
    let sync_repo = repo.clone().into_sync();
    let index = repo.index_or_empty()?;
    let index_state: &gix_index::State = &index;
Comment thread gix-status/Cargo.toml Outdated
Follow-up to the git status performance improvement. This fixes an edge case where a case-sensitive entry in the cache gets lowercased and matches a second case-sensitive entry in the tree, potentially resulting in incorrect git status entries. Skipping lowercasing entirely turns those cases into cache misses instead, which makes the behaviour more transparent.
@special-bread
Author

I cleaned up the code to be more minimal, and instead of styling it as a cache it's now styled as a Windows-only preprocessing step. There is still a parameter to run status without the cache, but that is intended mostly for regression tests and such.

In theory there is a tiny amount of overhead from starting the walk first and doing git status afterwards. In practice, any repo with more than a few dozen files should be faster, since the directory-wide walk captures the metadata in one go as opposed to multiple single-file queries.
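
As a rough way to see why the trade-off favours the walk, the sketch below counts queries under a deliberately simplified model: one query per tracked file without the preprocessing step, versus one enumeration per directory that contains tracked files (including intermediate directories) with it. The function name `query_counts` is hypothetical, and the model ignores directories that hold only untracked files, which a real walk would also visit.

```rust
use std::collections::HashSet;
use std::path::Path;

/// Hypothetical cost model: returns
/// (per-file queries, per-directory enumerations) for tracked relative paths.
fn query_counts(tracked: &[&str]) -> (usize, usize) {
    let mut dirs: HashSet<&Path> = HashSet::new();
    for path in tracked {
        // `ancestors()` yields the path itself first, then every parent up
        // to the empty path, which stands for the worktree root here.
        for dir in Path::new(*path).ancestors().skip(1) {
            dirs.insert(dir);
        }
    }
    (tracked.len(), dirs.len())
}
```

In real repositories the file count usually exceeds the directory count by a wide margin, which is where the saving comes from; for a handful of files the two numbers are close and the fixed cost of the walk can dominate.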


Re untracked caches: I admit I didn't comb through the other PR, so these are just my thoughts:

In my experience on Windows, the main slowdown is the lstat approach to querying files. An untracked cache doesn't fix that, so it can only provide a modest speedup, and it never helped that much when I tried it. I believe you can get a really fast status without it on Windows.

The Linux story is nicer. On Linux you don't get any speedup from directory-wide queries, but I believe you can use the untracked cache to skip some comparisons if you have a bunch of untracked files and directories. You can also swap some queries for cheaper ones if you rely on it, so that's nice. But I'm not on a Linux machine, so this is mostly theory, not practice.

An untracked cache is additive on top of a good status implementation. With Windows in particular there is a bit of tension: the fastest way to walk is by directory anyway, so an untracked cache does little good, since it mostly only helps where entire directories are untracked and you can return early. For that reason I didn't add support for it in my custom implementation; it was more complexity and didn't really help.


I haven't looked super deeply into why the current implementation takes 350ms and not, for example, 100ms, but I imagine those speedups would carry over to Linux too, since they would touch the core status implementation rather than just add a Windows preprocessing step.

So to speed things up more I would focus on that next, i.e. make sure status is fast without an untracked cache first, and once that is fast, look at what the cache can do. In other words: make the current code strong first, then add complexity after.

Also, to manage expectations a bit: I am sadly amazingly busy, so I won't be able to devote my time to a full rewrite of status (which is also terrifying, since this is public code that must be completely correct and there are lots of edge cases to status!)

Hope this helps :>
