chunked: use reflinks for chunk deduplication #892

Open
giuseppe wants to merge 2 commits into podman-container-tools:main from giuseppe:chunked-preopen-files

Conversation

@giuseppe

@giuseppe giuseppe commented Jun 5, 2026

Contributor

When reusing chunks from other layers, reflink the source file into a temporary directory under the staging dir. The reflinked copy shares data blocks (CoW, O(1)) but is a separate inode, so it survives concurrent deletion of the source layer.

If the filesystem does not support reflinks, chunk deduplication is skipped and all chunks are fetched from the network.
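A minimal sketch of the idea described above, not the code from this PR: it clones a file with the Linux `FICLONE` ioctl, never falls back to copying bytes, and then checks that the clone is still readable after the source is removed. The names `reflinkInto` and `demoReflinkSurvivesDeletion` are hypothetical, and the raw ioctl stands in for whatever helper the real code uses. Linux only.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// ficlone is the Linux FICLONE ioctl request, _IOW(0x94, 9, int).
const ficlone = 0x40049409

// reflinkInto (hypothetical name) clones src into a new file under dir.
// The clone is a separate inode that shares data blocks with src.
// There is no byte-copy fallback: on any error the caller is expected
// to fetch the chunk from the network instead.
func reflinkInto(src, dir string) (string, error) {
	in, err := os.Open(src)
	if err != nil {
		return "", err
	}
	defer in.Close()

	out, err := os.CreateTemp(dir, "ref-")
	if err != nil {
		return "", err
	}
	defer out.Close()

	_, _, errno := syscall.Syscall(syscall.SYS_IOCTL, out.Fd(), ficlone, in.Fd())
	if errno != 0 {
		os.Remove(out.Name())
		return "", fmt.Errorf("reflink %s: %w", src, errno)
	}
	return out.Name(), nil
}

// demoReflinkSurvivesDeletion reports whether a clone could be created.
// It returns an error only if an existing clone loses its data once the
// source file is deleted.
func demoReflinkSurvivesDeletion() (bool, error) {
	tmp, err := os.MkdirTemp("", "reflink-demo-")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(tmp)

	src := filepath.Join(tmp, "layer-file")
	data := []byte("chunk data")
	if err := os.WriteFile(src, data, 0o600); err != nil {
		return false, err
	}
	ref, err := reflinkInto(src, tmp)
	if err != nil {
		// No reflink here: deduplication would be skipped.
		return false, nil
	}
	if err := os.Remove(src); err != nil {
		return true, err
	}
	got, err := os.ReadFile(ref)
	if err != nil {
		return true, err
	}
	if !bytes.Equal(got, data) {
		return true, fmt.Errorf("clone holds %q, want %q", got, data)
	}
	return true, nil
}
```

On a filesystem without reflink support (for example tmpfs or ext4) the demo reports `false`, which corresponds to the "fetch all chunks from the network" path.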

Follow-up for #890

@github-actions (bot) added the storage label (Related to "storage" package) Jun 5, 2026

@mtrmac mtrmac left a comment

Contributor

I think this would work individually and cleanly resolve the race, but I don’t think it scales: a layer can be thousands or millions of files (i.e. possibly even an order of magnitude more roll-sum chunks), and the process has a file descriptor limit.
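To make the scaling concern concrete, here is a rough sketch that compares the process's soft `RLIMIT_NOFILE` with the number of source files, assuming one open descriptor per source file (which is what the concern implies). The helper names and the `reserved` headroom parameter are illustrative only.

```go
package main

import "syscall"

// openFileLimit returns the soft limit on open file descriptors
// for the current process.
func openFileLimit() (uint64, error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, err
	}
	return lim.Cur, nil
}

// fitsInFDBudget reports whether holding one descriptor per source file
// stays within limit, keeping `reserved` descriptors free for everything
// else the process needs (sockets, the destination files, and so on).
func fitsInFDBudget(sourceFiles, limit, reserved uint64) bool {
	if limit <= reserved {
		return false
	}
	return sourceFiles <= limit-reserved
}
```

With a soft limit of 1024, a layer with a million files is far outside the budget, while a hundred files fit comfortably.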

(I didn’t review the details of the implementation.)

Comment thread storage/pkg/chunked/storage_linux.go Outdated
srcDirfd, err := unix.Open(source, unix.O_RDONLY|unix.O_CLOEXEC, 0)
if err != nil {
// The source layer may have been deleted concurrently.
if errors.Is(err, unix.ENOENT) || errors.Is(err, unix.ENOTDIR) {

Contributor

Why ENOTDIR?

Contributor Author

That was a wrong assumption on my part: I thought it could happen when a parent directory is deleted, but that still maps to ENOENT.
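The distinction can be checked with a small standalone experiment. The sketch below uses the standard `syscall` errno values, which are the same numbers as the `unix.ENOENT` and `unix.ENOTDIR` constants in the snippet above; `demoOpenErrnos` is a hypothetical helper name.

```go
package main

import (
	"errors"
	"os"
	"path/filepath"
	"syscall"
)

// demoOpenErrnos opens two paths that cannot exist and reports
//   - whether a path below a deleted directory tree fails with ENOENT, and
//   - whether a path that walks through a regular file fails with ENOTDIR.
func demoOpenErrnos() (bool, bool, error) {
	tmp, err := os.MkdirTemp("", "errno-demo-")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(tmp)

	// Case 1: the whole "layer" directory, parents included, is gone.
	layer := filepath.Join(tmp, "layer")
	if err := os.MkdirAll(filepath.Join(layer, "sub"), 0o700); err != nil {
		return false, false, err
	}
	if err := os.RemoveAll(layer); err != nil {
		return false, false, err
	}
	_, errGone := os.Open(filepath.Join(layer, "sub", "file"))

	// Case 2: a path component exists but is a regular file.
	regular := filepath.Join(tmp, "regular")
	if err := os.WriteFile(regular, nil, 0o600); err != nil {
		return false, false, err
	}
	_, errNotDir := os.Open(filepath.Join(regular, "file"))

	return errors.Is(errGone, syscall.ENOENT),
		errors.Is(errNotDir, syscall.ENOTDIR), nil
}
```

Both results are `true` on Linux: a deleted parent yields ENOENT, and ENOTDIR only shows up when a non-directory sits in the middle of the path.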

@giuseppe giuseppe force-pushed the chunked-preopen-files branch from 89cac1b to a09e9f6 Compare June 5, 2026 16:52
@giuseppe giuseppe marked this pull request as ready for review June 5, 2026 17:24
@giuseppe giuseppe changed the title from "[WIP] chunked: handle concurrent layer deletion during dedup" to "chunked: follow-up for https://github.com/podman-container-tools/container-libs/pull/890" Jun 5, 2026

@mtrmac mtrmac left a comment

Contributor

This is a literally one-minute skim, but looks reasonable — and actually much smaller than I expected. Nice!

Comment thread storage/pkg/chunked/storage_linux.go Outdated
@giuseppe giuseppe changed the title from "chunked: follow-up for https://github.com/podman-container-tools/container-libs/pull/890" to "chunked: use reflinks for chunk deduplication" Jun 5, 2026
@giuseppe

giuseppe commented Jun 5, 2026

Contributor Author

I've tested this manually and it seems to work fine.

@cgwalters

Contributor

Code changes offhand look sane to me.

One thing I don't understand - why can't we lock the source layer?


Side note: this is very different in composefs-rs, because we don't have unpacked files per layer at all, just an object store.

@giuseppe

giuseppe commented Jun 6, 2026

Contributor Author

One thing I don't understand - why can't we lock the source layer?

The network code calls into containers/image, which in turn can call back into c/storage, so we might end up with deadlocks.

On top of that, the network calls could be slow, and they would block any write access to the store in the meantime (possibly this could be abused).

Comment thread storage/pkg/chunked/storage_linux.go Outdated

wg.Wait()

chunkRefsDir := filepath.Join(filepath.Dir(dest), "chunk-refs")

Contributor

This would really benefit from a comment documenting why we bother with all of this. (Yes, it is now documented in the commit message, but finding that is more work.)

defer os.RemoveAll(chunkRefsDir)
}
type reflinkKey struct{ root, path string }
reflinkMap := make(map[reflinkKey]string)

Contributor

(Documenting the key/value semantics would be nice.)
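One way such documentation could read is sketched below. The semantics are an assumption, not taken from the PR: the key is taken to identify a source file (layer root plus path relative to it), and the value is taken to be the path of its reflinked copy, so that each source file is cloned at most once. `reflinkCache`, `get`, and the injected `clone` function are hypothetical.

```go
package main

// reflinkKey identifies one source file that chunks can be reused from.
// Assumed semantics: root is the directory of the source layer, path is
// the file's location relative to root.
type reflinkKey struct{ root, path string }

// reflinkCache remembers, per source file, where its reflinked copy lives.
// Assumed semantics: the value is the path of the clone inside the scratch
// directory, so many chunks from one file share a single clone.
type reflinkCache struct {
	refs  map[reflinkKey]string
	clone func(root, path string) (string, error) // performs the reflink
}

// get returns the clone for (root, path), creating it on first use.
func (c *reflinkCache) get(root, path string) (string, error) {
	key := reflinkKey{root, path}
	if ref, ok := c.refs[key]; ok {
		return ref, nil
	}
	ref, err := c.clone(root, path)
	if err != nil {
		return "", err
	}
	c.refs[key] = ref
	return ref, nil
}
```

Keying on both `root` and `path` keeps files with the same relative path in different layers apart.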

Comment thread storage/pkg/chunked/storage_linux.go Outdated

wg.Wait()

chunkRefsDir := filepath.Join(filepath.Dir(dest), "chunk-refs")

Contributor

[1] … Given the signature of the function, a caller would not expect anything to happen outside of dest.

driver.Differ is explicitly an unstable interface, so let’s make the design explicit, please.

Because overlay already hard-codes the structure of the staging directory (overlay.Driver.ApplyDiffWithDiffer sets out.Target, and later overlay.Driver.CleanupStagingDirectory uses filepath.Dir(stagingDirectory)), chunkedDiffer.ApplyDiff can’t just be given the top-level staging directory to create the Target wherever it wants.

So, ApplyDiffWithDiffer could explicitly provide a “destination directory” and an “empty scratch directory” as siblings to chunkedDiffer.ApplyDiff (OTOH, conceptually ApplyDiffWithDiffer does not know whether a scratch directory will always be necessary); or chunkedDiffer.ApplyDiff could get a “destination directory” and a “staging area”, with the obligation to use MkdirTemp to allocate the reflink directory within the staging area.

Comment thread storage/pkg/chunked/storage_linux.go Outdated
if isReflinkNotSupported(err) {
reflinkSupported = false
}
break

Contributor

Should this explicitly check for ENOENT and fail on unexpected errors? I’m not really sure it’s necessary, when we have the option to get the data from the network.

At least a comment highlighting that ENOENT can legitimately happen here might be useful.
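For comparison, the stricter variant raised in this comment could look roughly like the sketch below: ENOENT means the source layer vanished and only this chunk goes to the network, "not supported" turns deduplication off, and everything else is reported. The set of errno values treated as "not supported" is an assumption, since the body of `isReflinkNotSupported` is not shown here; all names in the sketch are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

type reflinkOutcome int

const (
	sourceGone         reflinkOutcome = iota // fetch this chunk from the network
	reflinkUnsupported                       // stop trying to deduplicate
	unexpectedFailure                        // surface the error to the caller
)

// classifyReflinkError maps a reflink error to what the caller should do.
func classifyReflinkError(err error) reflinkOutcome {
	switch {
	case errors.Is(err, syscall.ENOENT):
		// Legitimate: the source layer can be deleted concurrently.
		return sourceGone
	case errors.Is(err, syscall.EOPNOTSUPP),
		errors.Is(err, syscall.EXDEV),
		errors.Is(err, syscall.ENOTTY),
		errors.Is(err, syscall.ENOSYS):
		// Assumed list of "this filesystem cannot reflink" errors.
		return reflinkUnsupported
	default:
		return unexpectedFailure
	}
}

// exampleOutcomes classifies a wrapped ENOENT, a bare EOPNOTSUPP and EIO.
func exampleOutcomes() []reflinkOutcome {
	gone := fmt.Errorf("open layer/file: %w", syscall.ENOENT)
	return []reflinkOutcome{
		classifyReflinkError(gone),
		classifyReflinkError(syscall.EOPNOTSUPP),
		classifyReflinkError(syscall.EIO),
	}
}
```

Whether `unexpectedFailure` should abort the pull or also fall back to the network is exactly the open question in the comment; the sketch only separates the cases.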

Add a Reflink() function that attempts a CoW file clone without
falling back to io.Copy.  Callers that need to know whether the
filesystem supports reflinks can use this instead of ReflinkOrCopy.

ReflinkOrCopy is refactored to call Reflink internally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
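The layering described in this commit message can be sketched as follows. The signatures are assumptions (the real `Reflink` and `ReflinkOrCopy` are not shown in this conversation), so the sketch uses lowercase, hypothetical names and the raw Linux `FICLONE` ioctl.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"syscall"
)

// ficlone is the Linux FICLONE ioctl request, _IOW(0x94, 9, int).
const ficlone = 0x40049409

// reflink attempts a CoW clone of src into dst and reports failure
// instead of copying, so callers can tell whether reflinks work.
func reflink(src, dst *os.File) error {
	_, _, errno := syscall.Syscall(syscall.SYS_IOCTL, dst.Fd(), ficlone, src.Fd())
	if errno != 0 {
		return errno
	}
	return nil
}

// reflinkOrCopy tries reflink first and, in this sketch, falls back to
// a plain byte copy on any reflink error.
func reflinkOrCopy(src, dst *os.File) error {
	if err := reflink(src, dst); err == nil {
		return nil
	}
	_, err := io.Copy(dst, src)
	return err
}

// demoReflinkOrCopy checks that dst ends up with src's content whether
// or not the filesystem supports reflinks.
func demoReflinkOrCopy() error {
	tmp, err := os.MkdirTemp("", "reflink-or-copy-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(tmp)

	data := []byte("layer content")
	srcPath := filepath.Join(tmp, "src")
	if err := os.WriteFile(srcPath, data, 0o600); err != nil {
		return err
	}
	src, err := os.Open(srcPath)
	if err != nil {
		return err
	}
	defer src.Close()
	dst, err := os.Create(filepath.Join(tmp, "dst"))
	if err != nil {
		return err
	}
	defer dst.Close()

	if err := reflinkOrCopy(src, dst); err != nil {
		return err
	}
	got, err := os.ReadFile(dst.Name())
	if err != nil {
		return err
	}
	if !bytes.Equal(got, data) {
		return fmt.Errorf("dst holds %q, want %q", got, data)
	}
	return nil
}
```

Splitting the two lets the chunk-deduplication path call the strict variant and skip deduplication on failure, while other callers keep the copy fallback.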
@giuseppe giuseppe force-pushed the chunked-preopen-files branch 2 times, most recently from 48e0860 to 23867e1 Compare June 10, 2026 05:24
@giuseppe

Contributor Author

@mtrmac thanks, I've addressed the comments

@Honny1 Honny1 left a comment

Contributor

LGTM, just one non-blocking comment.

// entirely and fetch everything from the network.
var chunkRefsDir string
if differOpts != nil && differOpts.StagingDirectory != "" {
d, err := os.MkdirTemp(differOpts.StagingDirectory, "chunk-refs-")

Contributor

Non-blocking: I would log the error, at least at debug level.

Contributor Author

Thanks.

Thinking more about it, I think it should be a hard error. If we fail to create a temporary directory, we need to report that.
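A sketch of what the hard-error behavior could look like, under the assumption (taken from the snippet above) that an empty staging directory simply means "no deduplication". `newChunkRefsDir` and `demoChunkRefsDir` are hypothetical names; only `os.MkdirTemp` with the `chunk-refs-` pattern comes from the diff.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// newChunkRefsDir creates the scratch directory for reflinked copies.
// An empty stagingDir is not an error: it returns "" so the caller skips
// deduplication. A failing MkdirTemp is reported to the caller instead
// of being silently ignored.
func newChunkRefsDir(stagingDir string) (string, func(), error) {
	noop := func() {}
	if stagingDir == "" {
		return "", noop, nil
	}
	dir, err := os.MkdirTemp(stagingDir, "chunk-refs-")
	if err != nil {
		return "", noop, fmt.Errorf("creating chunk reflink directory in %q: %w", stagingDir, err)
	}
	return dir, func() { os.RemoveAll(dir) }, nil
}

// demoChunkRefsDir exercises the success, cleanup and failure paths.
func demoChunkRefsDir() error {
	staging, err := os.MkdirTemp("", "staging-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(staging)

	dir, cleanup, err := newChunkRefsDir(staging)
	if err != nil {
		return err
	}
	if _, err := os.Stat(dir); err != nil {
		return err
	}
	cleanup()
	if _, err := os.Stat(dir); !errors.Is(err, os.ErrNotExist) {
		return fmt.Errorf("scratch directory still present after cleanup: %v", err)
	}

	// A staging directory that does not exist must produce an error.
	if _, _, err := newChunkRefsDir(filepath.Join(staging, "missing")); err == nil {
		return errors.New("expected a hard error for an unusable staging directory")
	}
	return nil
}
```

A caller would typically `defer cleanup()` right after the error check, so the scratch directory never outlives the diff application.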

When reusing chunks from other layers, reflink the source file into
a temporary directory under the staging dir.  The reflinked copy
shares data blocks (CoW, O(1)) but is a separate inode, so it
survives concurrent deletion of the source layer.

If the filesystem does not support reflinks, chunk deduplication is
skipped and all chunks are fetched from the network.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe giuseppe force-pushed the chunked-preopen-files branch from 23867e1 to 838d9bf Compare June 10, 2026 08:44
Labels

storage Related to "storage" package

4 participants