
gix,gix-odb,gix-pack: stream blob contents from the ODB#2526

Open
Malcom Gilbert (mjgil) wants to merge 6 commits into GitoxideLabs:main from mjgil:pr-1595

Conversation

@mjgil

Add streaming blob lookup support for loose and packed objects, avoid eagerly materializing packed results into memory, and expose repository helpers with regression coverage for loose, packed, empty, and wrong-type cases.

Addresses #1595.

Summary

This adds blob-only streaming reads from the object database so callers can obtain blob contents without first materializing the full object in memory.

At the repository level, this introduces blob streaming helpers that:

  • stream loose blobs directly
  • stream packed blobs through a tempfile-backed reader
  • reject non-blob object IDs
  • preserve the existing empty-blob fast path
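The contract these helpers aim for can be sketched as a chunked copy: the caller pulls blob bytes from a reader instead of receiving a fully materialized buffer, so peak memory stays bounded by the chunk size. This is an illustrative std-only sketch; `copy_blob_stream` and its signature are invented here and are not the PR's actual API.

```rust
use std::io::{self, Read, Write};

/// Copy a blob stream into `out` in fixed-size chunks. Peak memory for the
/// copy is the 8 KiB buffer, regardless of how large the blob is — the
/// property the streaming helpers above are meant to provide.
pub fn copy_blob_stream<R: Read, W: Write>(mut src: R, mut out: W) -> io::Result<u64> {
    let mut buf = [0u8; 8 * 1024];
    let mut total = 0u64;
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            return Ok(total);
        }
        out.write_all(&buf[..n])?;
        total += n as u64;
    }
}
```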

Implementation

To support this, the change adds a streaming object path through the ODB stack.

Loose objects are exposed as a reader over incremental decompression.

Packed objects are decoded into a temporary file and then returned as a readable stream. This avoids holding the fully resolved blob in RAM, including for delta-resolved packed blobs, while still fitting the existing pack decoding model.
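The tempfile-spill pattern described above can be sketched with the standard library alone: fully decode into a temporary file, then hand back a reader positioned at the start, so the decoded bytes never need to sit in RAM at once. The `spill_to_tempfile` helper and the fixed temp-file name are illustrative only; the real implementation plugs into gix-pack's decoding and would use properly unique temp files.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// Spill a decoded stream into a temporary file and return a reader over it.
/// Peak memory is bounded by io::copy's internal buffer, not the blob size.
pub fn spill_to_tempfile(mut decoded: impl Read) -> io::Result<File> {
    // Illustrative fixed name; a real implementation needs a unique tempfile.
    let path = std::env::temp_dir().join("gix-blob-spill-sketch.tmp");
    let mut file = File::options()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(&path)?;
    io::copy(&mut decoded, &mut file)?;
    // Rewind so the caller reads the decoded bytes from the beginning.
    file.seek(SeekFrom::Start(0))?;
    Ok(file)
}
```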

The repository-facing API is intentionally blob-only for now, which keeps the surface area aligned with the use case described in #1595.

Tests

Added coverage for:

  • loose blob streaming
  • packed blob streaming
  • empty blob streaming
  • rejection of non-blob IDs

Also validated with:

  • cargo test -p gix
  • cargo test -p gix-odb
  • cargo test -p gix-pack

Notes

Co-authored-by: Claude
Co-authored-by: Codex

Contributor

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f918257b18

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "Codex (@codex) address that feedback".

Comment thread on gix-odb/src/store_impls/loose/find.rs (Outdated)
Comment on lines +188 to +190
zlib::Status::Ok | zlib::Status::BufError if eof => {
self.done = true;
return Ok(total_written);

P1 Badge Reject truncated loose streams instead of returning EOF

The streaming loose-object reader currently treats eof with Status::Ok|BufError as a normal end-of-stream and returns Ok(...), but it never verifies that the deflate stream actually reached StreamEnd or that the advertised loose-object size was fully produced. For a truncated/corrupt loose object this can silently return fewer bytes than Stream::size() instead of surfacing an error (the non-streaming path reports size/decompression mismatches), which can propagate corrupted blob contents to callers.

Useful? React with 👍 / 👎.
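The fix this review asks for amounts to tracking how many bytes the stream has produced and refusing to report a clean EOF early. A hedged sketch, with a plain byte reader standing in for the zlib-backed stream (the real reader would additionally require the inflate state to reach StreamEnd):

```rust
use std::io::{self, Read};

/// Wraps a decompressed stream and fails at EOF if fewer bytes were
/// produced than the size advertised in the loose-object header,
/// instead of silently returning a short read.
pub struct CheckedStream<R> {
    inner: R,
    produced: u64,
    expected: u64,
}

impl<R: Read> CheckedStream<R> {
    pub fn new(inner: R, expected: u64) -> Self {
        Self { inner, produced: 0, expected }
    }
}

impl<R: Read> Read for CheckedStream<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        if n == 0 && self.produced != self.expected {
            // Truncated or corrupt loose object: surface an error rather
            // than propagating partial blob contents to the caller.
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "loose object stream ended before the advertised size was produced",
            ));
        }
        self.produced += n as u64;
        Ok(n)
    }
}
```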

}
assert_eq!(i, data.len());
assert_eq!(target.len(), 0);
debug_assert_eq!(i, data.len());

P1 Badge Preserve delta output-length validation in apply()

After generalizing delta::apply() to write into any Write, the function no longer checks that the expected output length was fully produced. In decode paths that pass a fixed-size slice target, malformed delta instructions that produce fewer bytes than declared result_size now return Ok(()) and leave trailing zeroed bytes, silently corrupting decoded objects instead of failing fast.

Useful? React with 👍 / 👎.
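The restored check can be sketched as a wrapper around the instruction loop: count the bytes actually written and fail fast when they fall short of the delta's declared result size. `apply_checked` and its closure parameter are illustrative stand-ins for the real `delta::apply()` internals:

```rust
use std::io::{self, Write};

/// Run the delta instruction loop (represented by `apply_instructions`,
/// which reports bytes written) and verify the declared result size was
/// fully produced, instead of leaving a silently zero-padded tail.
pub fn apply_checked<W: Write>(
    result_size: u64,
    out: &mut W,
    apply_instructions: impl FnOnce(&mut W) -> io::Result<u64>,
) -> io::Result<()> {
    let written = apply_instructions(out)?;
    if written != result_size {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "delta application did not produce the declared result size",
        ));
    }
    Ok(())
}
```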

@Byron
Member

Thanks for taking a stab at this. If this works, it would be a huge step forward for Gitoxide.

For the PR to have a chance of being merged, I'd need proof of "it's working". For that, you could create a test with a custom 'measuring' allocator that shows how an optimally compressed and thus huge base object and deltified object are decompressed with less memory usage compared to the current eager decompression that stores the whole object in memory.

I may also add that it will take considerable time to get to it, but if you stay on top of it and provide the requested information, it should happen eventually.

Sebastian Thiel (@Byron) marked this pull request as draft on April 20, 2026 at 02:57
Member

Sebastian Thiel (@Byron) left a comment


Let's also add benchmark comparisons to see if this affects the performance of packed object decompression in the non-streaming case. The gix CLI has subcommands for that.

Reject truncated loose-object streams, restore delta output-length validation, and add allocator-backed proof that packed delta blob streaming lowers peak memory usage compared to eager blob lookup.

Co-authored-by: Codex
@mjgil
Author

Addressed the review items in 84931e7c3.

Changes:

  • reject truncated loose-object streams instead of returning short reads
  • restore delta output-size validation after the Write-based delta::apply() refactor
  • add an allocator-backed proof test for packed delta blob streaming memory behavior

Validation:

  • cargo test -p gix-pack
  • cargo test -p gix-odb
  • cargo test -p gix

Memory proof:

  • added cargo test -p gix --test blob_stream_memory
  • this creates a packed delta blob and compares peak allocations between find_blob() and find_blob_stream() on fresh repo instances
  • the streaming path shows a lower peak allocation than the eager path
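The measuring-allocator idea behind this proof is typically a `#[global_allocator]` wrapper that tracks live and peak bytes, so two code paths can be compared by their peak allocation. This is a generic sketch of that technique; the PR's actual blob_stream_memory test may be structured differently.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and records the peak number of live bytes,
/// letting a test compare eager vs. streaming lookups by memory high-water mark.
struct Counting;

static LIVE: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let live = LIVE.fetch_add(layout.size(), Ordering::SeqCst) + layout.size();
        PEAK.fetch_max(live, Ordering::SeqCst);
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        LIVE.fetch_sub(layout.size(), Ordering::SeqCst);
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static COUNTING: Counting = Counting;

/// Highest number of bytes that were live at any point so far.
pub fn peak_bytes() -> usize {
    PEAK.load(Ordering::SeqCst)
}
```

A test would then run each lookup path, read `peak_bytes()` after each, and assert the streaming peak stays below the eager one.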

Non-streaming benchmark:

  • command: gix free pack verify -s <idx>
  • main: 111.2 ms ± 1.0 ms
  • this branch: 113.6 ms ± 2.1 ms

So on this fixture the non-streaming packed-object path is about 2% slower, a small regression.

Bring the gix_object::Write trait into scope for write_buf() and remove the unused Read import so workspace clippy and test builds pass again.

Co-authored-by: Codex
Accept both UnexpectedEof and InvalidData for truncated loose-object stream corruption, as the stream reader can surface either depending on where zlib detects the truncated deflate payload.

Co-authored-by: Codex
Malcom Gilbert (mjgil) marked this pull request as ready for review on April 20, 2026 at 04:55
Contributor

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4b7bf16549


Comment on lines +180 to +182
pack_id: resolved_pack_id,
pack_offset,
entry_size: outcome.compressed_size + header_size,

P2 Badge Recompute pack location after delta-base retry

When decode_entry_to_write() falls into the DeltaBaseUnresolved retry path, the object is looked up and decoded again after recursive base resolution, and that recursive lookup can reorder or refresh snapshot.indices. In that case, the successful retry may come from a different pack entry than the initial lookup, but the returned location still uses the originally captured pack_id/header_size, so callers can receive a mismatched Location (pack_id, pack_offset, entry_size) for the bytes they actually streamed.

Useful? React with 👍 / 👎.

@Byron
Member

Thanks! Please note that I am a bit allergic to PRs that appear fully automated. They translate to a lot of time for me and possibly very little for the author.

What's the story related to gix-filter? Right now that also isn't streaming, and to check out or check in/track a file one will have to run filters. That is, any worktree interaction, both from and to Git objects, may require a filter invocation.

@willstott101
Contributor

In issue #1595, Sebastian Thiel (@Byron), you mentioned that you'd prefer streaming to fail if it wasn't possible rather than fall back to something like Cursor<Vec<u8>>, but it seems like that's a code path in this PR.

I suppose the context is different. If the goal is for gitoxide to incrementally improve its memory usage by reducing how often entire blobs must be loaded into memory, then I prefer this invisible Vec approach. But if the goal is for users of the library to request a file stream for server streaming, like in my old issue, then knowing whether the stream was possible would be helpful, and actually just being given the Vec is more useful.

I would be happy to help review and extend this PR, it's still been in the back of my mind since I made that issue.

My inclination would be to make this API surface available in something like its current state, and then smaller PRs/improvements can be made to consume the streaming API surface within gix (for working-directory stuff) as well as to extend the underlying streaming mechanics, to stream more efficiently in more scenarios.

Does that sound reasonable? Will offer an actual review later.
