chunked: use reflinks for chunk deduplication#892
Conversation
mtrmac
left a comment
There was a problem hiding this comment.
I think this would work individually and cleanly resolve the race, but I don’t think it scales: a layer can be thousands or millions of files (i.e. possibly even an order of magnitude more roll-sum chunks), and the process has a file descriptor limit.
(I didn’t review the details of the implementation.)
| srcDirfd, err := unix.Open(source, unix.O_RDONLY|unix.O_CLOEXEC, 0) | ||
| if err != nil { | ||
| // The source layer may have been deleted concurrently. | ||
| if errors.Is(err, unix.ENOENT) || errors.Is(err, unix.ENOTDIR) { |
There was a problem hiding this comment.
wrong assumption that it could happen when a parent directory is deleted. That still maps to ENOENT.
89cac1b to
a09e9f6
Compare
mtrmac
left a comment
There was a problem hiding this comment.
This is a literally one-minute skim, but looks reasonable — and actually much smaller than I expected. Nice!
|
I've tested this manually and seems to work fine |
|
Code changes offhand look sane to me. One thing I don't understand - why can't we lock the source layer? Side note: this is very different in a composefs-rs, because we don't have unpacked files per layer at all, just an object store. |
the network code calls into containers/image that in turn can call back into c/storage so we might end up with some deadlocks. On top of that, the network calls could be slow and they would block any write access to the store in the meanwhile (possibly this could be abused) |
|
|
||
| wg.Wait() | ||
|
|
||
| chunkRefsDir := filepath.Join(filepath.Dir(dest), "chunk-refs") |
There was a problem hiding this comment.
This would really benefit from a comment documenting why we bother with all of this. (Yes, it is now documented in the commit message, but finding that is more work.)
| defer os.RemoveAll(chunkRefsDir) | ||
| } | ||
| type reflinkKey struct{ root, path string } | ||
| reflinkMap := make(map[reflinkKey]string) |
There was a problem hiding this comment.
(Documenting the key/value semantics would be nice.)
|
|
||
| wg.Wait() | ||
|
|
||
| chunkRefsDir := filepath.Join(filepath.Dir(dest), "chunk-refs") |
There was a problem hiding this comment.
[1] … Given the signature of the function, a caller would not expect anything to happen outside of dest.
driver.Differ is explicitly an unstable interface, so let’s make the design explicit, please.
Because overlay already hard-codes the structure of the staging directory (overlay.Driver.ApplyDiffWithDiffer sets out.Target, and later overlay.Driver.CleanupStagingDirectory uses filepath.Dir(stagingDirectory), chunkedDiffer.ApplyDiff can’t just be given the top-level staging directory to create the Target wherever it wants.
So, ApplyDiffWithDiffer could explicitly provide a “destination directory” and an “empty scratch directory” siblings to chunkedDiffer.ApplyDiff (OTOH conceptually ApplyDiffWithDiffer does not know a scratch will always be necessary); or chunkedDiffer.ApplyDiff could get a “destination directory” and “staging area” and with the obligation to use MkdirTemp to allocate the reflink directory within the staging area.
| if isReflinkNotSupported(err) { | ||
| reflinkSupported = false | ||
| } | ||
| break |
There was a problem hiding this comment.
Should this explicitly check for ENOENT and fail on unexpected errors? I’m not really sure it’s necessary, when we have the option to get the data from the network.
At least a comment highlighting that ENOENT can legitimately happen here might be useful.
Add a Reflink() function that attempts a CoW file clone without falling back to io.Copy. Callers that need to know whether the filesystem supports reflinks can use this instead of ReflinkOrCopy. ReflinkOrCopy is refactored to call Reflink internally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
48e0860 to
23867e1
Compare
|
@mtrmac thanks, I've addressed the comments |
Honny1
left a comment
There was a problem hiding this comment.
LGTM, just one non-blocking comment.
| // entirely and fetch everything from the network. | ||
| var chunkRefsDir string | ||
| if differOpts != nil && differOpts.StagingDirectory != "" { | ||
| d, err := os.MkdirTemp(differOpts.StagingDirectory, "chunk-refs-") |
There was a problem hiding this comment.
Non-blocking: I would log an error at least as debug.
There was a problem hiding this comment.
thanks.
Thinking more about it, I think it should be a hard error. If we fail to create a temporary directory, we need to report that
When reusing chunks from other layers, reflink the source file into a temporary directory under the staging dir. The reflinked copy shares data blocks (CoW, O(1)) but is a separate inode, so it survives concurrent deletion of the source layer. If the filesystem does not support reflinks, chunk deduplication is skipped and all chunks are fetched from the network. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
23867e1 to
838d9bf
Compare
When reusing chunks from other layers, reflink the source file into a temporary directory under the staging dir. The reflinked copy shares data blocks (CoW, O(1)) but is a separate inode, so itsurvives concurrent deletion of the source layer.
If the filesystem does not support reflinks, chunk deduplication is skipped and all chunks are fetched from the network.
Follow-up for #890