feat(cache): advertise eager-CDC narinfos as Compression: none predictively#1380
Conversation
Advertise narinfo Compression: none predictively for eager-CDC NARs so clients always request .nar and never .nar.xz, eliminating the chunking-window compressed-request corruption at its source. This commit lands the OpenSpec planning artifacts (proposal, design, delta specs for cdc-chunking and inflight-nar-staging, and tasks). The pull path is brought into parity with PutNarInfo, which already normalizes CDC narinfos to none; scope is gated strictly to eager CDC (isCDCEnabled && !GetCDCLazyChunkingEnabled) since lazy keeps the whole xz file servable as .nar.xz. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tively Under eager CDC the durable form of a NAR is its uncompressed chunk sequence and ncps has no NAR compressor, so a .nar.xz request during the chunking window could not be served correctly (404->upstream fallback at best, decompressed-bytes-mislabeled-as-xz corruption at worst). The narinfo advertised the upstream xz compression until the NAR was genuinely chunked, so clients requested .nar.xz and hit this. Advertise Compression: none / nar/<hash>.nar for eager-CDC NARs consistently -- at narinfo store time on the pull path and at serve time -- so clients always request the uncompressed .nar and never .nar.xz, eliminating the corruption at its source. This brings the pull path into parity with PutNarInfo, which already normalizes CDC narinfos to none on the upload path. How: - isEagerCDC() gate: CDC enabled AND chunk store present AND lazy chunking disabled. Lazy CDC keeps the whole xz file servable as .nar.xz and is deliberately excluded. - Store-time: a new eager-CDC case in the pull-path narinfo switch rewrites URL/Compression and nulls FileHash/FileSize (after the opaque case, so cachix handling is untouched). - Serve-time: maybeCDCNormalizeNarInfoURL normalizes unconditionally under eager CDC (backstop for legacy xz rows); lazy/drain keep the HasNarInChunks gate. Safety: a .nar request for not-yet-materialized bytes re-downloads via lookupPreferredUpstreamURL (re-fetch upstream narinfo -> recover xz URL -> decompress), never a terminal 404 -- verified by the updated phantom-recovery test. This reverses the earlier "persist truthful xz until chunked" stance (the former Fix-B guard), now safe because of narServability re-download + in-flight staging. Tests: new isEagerCDC + store-time + serve-time unit tests; the Fix-B guard, phantom-recovery, and not-chunked tests updated to the new policy. Verified end-to-end: cdc-lifecycle e2e full pass (eager->none) and the contention e2e (cross-pod readers serve byte-correct .nar with narinfo none). Harness assertions added to both e2e drivers. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Apply the change's delta specs to the main specs and archive the completed change: - cdc-chunking: broaden the GetNarInfo normalization requirement to cover eager (predictive) vs lazy (HasNarInChunks-gated) modes, and add "Eager CDC MUST advertise narinfo Compression none predictively". - inflight-nar-staging: add "Predictive narinfo none steers eager-CDC cross-pod readers to the uncompressed staging serve". Tasks reconciled: all in-scope tasks done (production + unit tests + cdc-lifecycle/contention e2e verification); additive harness scenarios (stale-xz e2e variant, lazy cross-pod, k8s compression assert) moved to a Deferred section for a follow-up. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
This change is part of the following stack: Change managed by git-spice. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughSummary by CodeRabbit
WalkthroughUnder eager-CDC, narinfo is predictively normalized to advertise ChangesEager-CDC Predictive Narinfo Normalization
🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
Comment |
There was a problem hiding this comment.
Code Review
This pull request implements predictive Compression: none / .nar URL advertisement for eager-CDC NARs consistently on the pull path (at store time and serve time) before chunks exist. This ensures clients always request the uncompressed .nar instead of .nar.xz during the chunking window, preventing 404 fallbacks and silent corruption. The changes include the introduction of the isEagerCDC() helper, store-time and serve-time normalization logic in pkg/cache/cache.go, comprehensive unit and end-to-end tests, and detailed design specifications. There are no review comments provided, so I have no feedback to address.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/cache/cdc_url_desync_prevention_internal_test.go`:
- Around line 115-117: The test currently discards the error returned by
c.GetNarInfo which can hide failures; update both occurrences where you call
reqCtx := withNarPrefetchDisabled(newContext()) followed by _, _ =
c.GetNarInfo(reqCtx, testdata.Nar1.NarInfoHash) to capture the result and error
(e.g. _, err := c.GetNarInfo(...)) and call require.NoError(t, err) immediately
before asserting persisted DB rows; apply the same change for the second
occurrence (the call around the other test block).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fa706bd7-c4b8-4a3b-a718-6a528a241f06
📒 Files selected for processing (16)
CHANGELOG.mddev-scripts/test-cdc-lifecycle-e2e.pydev-scripts/test-inflight-staging-contention-e2e.pyopenspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/.openspec.yamlopenspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/design.mdopenspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/proposal.mdopenspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/specs/cdc-chunking/spec.mdopenspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/specs/inflight-nar-staging/spec.mdopenspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/tasks.mdopenspec/specs/cdc-chunking/spec.mdopenspec/specs/inflight-nar-staging/spec.mdpkg/cache/cache.gopkg/cache/cache_test.gopkg/cache/cdc_advertise_narinfo_none_internal_test.gopkg/cache/cdc_url_desync_prevention_internal_test.gopkg/cache/phantom_recovery_test.go
Address CodeRabbit review on #1380: the two CDC pull-path narinfo tests discarded the GetNarInfo error (_, _ =), which could hide a failure after partial DB writes. Verified empirically that GetNarInfo returns nil here (prefetch-disabled still fetches + persists the narinfo), so capture the error and require.NoError before asserting the persisted row. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Documentation Updates 1 document(s) were updated by changes in this PR: NAR File Download and StreamingView Changes@@ -109,17 +109,17 @@
2. **Finished-asset detection**: Checks whether the NAR is fully materialized—either a whole file in the store OR a fully-chunked nar_file (`total_chunks > 0`). This check uses `hasFinishedNar` (not `isServable`), which deliberately excludes actively-chunking NARs (`total_chunks = 0` with live chunker). If the asset is finished, returns a completed "served-by-peer" state so `GetNar` serves it directly. An actively-chunking NAR is NOT treated as finished and falls through to states (3) or (4) to avoid routing to chunk-based serving, which would 404 a compressed request.
-3. **Actively-chunking with staging**: When the holder is alive and actively chunking a NAR (`total_chunks = 0`, `nar_file` exists, chunker live), AND staging is enabled and parts are available (`stagingServeReady`), the waiter serves from staging parts instead of routing to chunk-based serving. For an eager-CDC NAR being chunked, the staged bytes are uncompressed (the temp is decompressed for chunking), so uncompressed (`none`) requests are served the complete NAR from staging, while compressed variant requests (e.g. `.nar.xz`) return not-found so the client falls back to an upstream that has the original compressed file.
+3. **Actively-chunking with staging**: When the holder is alive and actively chunking a NAR (`total_chunks = 0`, `nar_file` exists, chunker live), AND staging is enabled and parts are available (`stagingServeReady`), the waiter serves from staging parts instead of routing to chunk-based serving. For an eager-CDC NAR being chunked, the staged bytes are uncompressed (the temp is decompressed for chunking), so uncompressed (`none`) requests are served the complete NAR from staging. Under eager CDC, clients request `.nar` due to the predictive narinfo normalization (see "GetNarInfo Narinfo Normalization"), avoiding compressed-variant requests that would return not-found.
4. **Actively-chunking without staging**: When the holder is alive and actively chunking, but no staging parts are available, returns a completed "served-by-peer" state so `GetNar` routes to progressive chunk streaming (`streamProgressiveChunks`). This preserves the no-staging CDC path but only serves decompressed data (compressed requests would 404 in this scenario without staging).
**Streaming from staging parts**: When `GetNar` receives a staging directive, it calls `serveNarFromStaging` to stream the NAR from staging part-objects rather than storage. The implementation creates a `stagingPartReader` that tails part-objects in order, polling `staging_state.parts_available` every 200ms and blocking until each part is durably available.
-**Eager-CDC compressed request handling**: For an eager-CDC NAR being chunked, the staged bytes are uncompressed (the temp is decompressed for chunking). Uncompressed (`none`) requests are served the complete NAR from staging. Compressed variant requests (e.g. `.nar.xz`) cannot be reconstructed from uncompressed staged bytes (ncps has no NAR compressor, and a re-compressed file would not match the narinfo `FileHash`/`FileSize`), so the staging serve returns not-found and the client falls back to an upstream that has the original compressed file—never the uncompressed bytes mislabeled as the requested compression.
+**Eager-CDC cross-pod request handling**: Under eager CDC, the predictive narinfo normalization (documented in "GetNarInfo Narinfo Normalization") steers cross-pod readers to request the uncompressed `.nar` variant during the chunking window. The uncompressed request is satisfied from staging (or progressive chunks) per the existing in-flight staging mechanism. The compressed-variant upstream fallback (e.g., `.nar.xz` returning not-found) remains only as a defensive backstop for directly-constructed compressed requests, such as a client holding a stale `xz` narinfo. Under eager CDC, the common cross-pod path does not exercise the compressed fallback because clients request `.nar`.
**Precedence during the chunking window**: When `total_chunks == 0` (actively chunking state), `getNarFromChunks` first checks for in-flight staging parts if staging is enabled. If staging parts are available, it serves the complete NAR from them instead of entering the progressive chunk streaming path (`streamProgressiveChunks`). This precedence improves reliability: staged whole-NAR parts are a complete, ordered representation, so serving from them avoids the fragile progressive reassembly that can truncate. This precedence is strictly scoped to the chunking window—when `total_chunks > 0` (steady-state chunk serving), the new branch is not reached and behavior is unchanged. When serving from staging in this path, the `TransparentZstd` flag is cleared because `serveNarFromStaging` sets the compression based on what's actually in staging. If no staging parts are available, the system falls back to the progressive-chunk path as before.
-**Transcoding support**: The reader decompresses staged bytes when the client requests uncompressed data—for example, staged compressed bytes served as uncompressed—at parity with the same-pod streaming path. A request for a compressed variant backed only by uncompressed staged bytes (the eager-CDC chunking window) returns not-found so the client falls back to upstream. The returned `narURL.Compression` advertises the compression actually served.
+**Transcoding support**: The reader decompresses staged bytes when the client requests uncompressed data—for example, staged compressed bytes served as uncompressed—at parity with the same-pod streaming path. A request for a compressed variant backed only by uncompressed staged bytes (the eager-CDC chunking window) returns not-found so the client falls back to upstream; however, under eager CDC, clients request `.nar` due to the predictive narinfo normalization and do not request compressed variants. The returned `narURL.Compression` advertises the compression actually served.
**Clean EOF semantics**: The reader emits a clean `io.EOF` only when `status = complete` and all parts have been consumed. If the producer stalls (stops advancing `parts_available` without setting the complete marker), the reader surfaces `errStagingStall` after the per-part wait timeout. If the `staging_state` row disappears mid-stream (reset by a takeover), the reader surfaces `errStagingReset`. Context cancellation is propagated immediately. This ensures a truncated NAR is never delivered as a clean EOF.
@@ -380,19 +380,26 @@
Chunks are always stored uncompressed. If a client requests a compressed NAR (e.g., `.nar.xz`) but only chunks exist in storage (the whole-file has been deleted), `GetNar` returns `ErrNotFound` to allow the client to fall back to an upstream cache that still has the original compressed file. This prevents "input compression not recognized" errors that would occur if the system served raw uncompressed chunk data to a client expecting compressed data.
-#### GetNarInfo In-Memory Normalization
-
-When CDC is enabled, narinfos are persisted to the database with their truthful upstream compression (e.g., `Compression: xz`) and URL until CDC chunking actually completes. The `pullNarInfo` function does NOT normalize narinfos to `compression=none` at store time, regardless of whether lazy chunking is enabled or not. This prevents a narinfo/storage desync where the database advertises `nar/<hash>.nar` (compression=none) while the NAR is still stored under its original compression (e.g., xz), causing 404 errors and `nix copy` reference check failures.
-
-The normalization behavior:
-- **Store time**: The narinfo is persisted with its truthful upstream compression and URL. The `pullNarInfo` function only normalizes `Compression:none` upstreams (e.g., Harmonia's zstd-stored, none-served path); CDC narinfos are never normalized at store time.
-- **Serve time**: `GetNarInfo` normalizes the narinfo in-memory via `maybeCDCNormalizeNarInfoURL` before returning to clients, but only once the NAR is genuinely chunked (`HasNarInChunks` returns true). This rewrites `Compression`, `URL`, `FileHash`, and `FileSize` in memory only—the database record is not modified synchronously.
-- Normalizes the URL hash before the `HasNarInChunks` lookup to handle nix-serve-style prefixed hashes (e.g., `abc-hash` → `hash`) matching nar_file rows correctly
+#### GetNarInfo Narinfo Normalization
+
+When CDC is enabled, the normalization behavior differs between eager and lazy CDC modes:
+
+**Eager CDC** (default: lazy chunking disabled):
+- **Store time**: Narinfos fetched from upstream on the pull path are normalized to `Compression: none` before being persisted in the database, mirroring the existing upload-path behavior (`PutNarInfo`). The URL is rewritten to `nar/<hash>.nar`, and `FileHash` and `FileSize` are set to null. This predictive normalization ensures clients always request the uncompressed `.nar` variant, because the durable form under eager CDC is uncompressed chunks—ncps has no NAR compressor, and a `.nar.xz` request cannot be reliably served during the chunking window.
+- **Serve time**: `GetNarInfo` normalizes the narinfo in-memory via `maybeCDCNormalizeNarInfoURL` unconditionally (regardless of whether chunks exist yet), serving as a backstop for legacy database rows that predate store-time normalization. This rewrites `Compression`, `URL`, `FileHash`, and `FileSize` in memory only—the database record is not modified synchronously.
+
+**Lazy CDC** (lazy chunking enabled):
+- **Store time**: Narinfos are persisted with their truthful upstream compression (e.g., `Compression: xz`) and URL until CDC chunking actually completes. The `pullNarInfo` function does NOT normalize lazy-CDC narinfos at store time, preventing a narinfo/storage desync where the database advertises `nar/<hash>.nar` while the NAR is still stored under its original compression.
+- **Serve time**: `GetNarInfo` normalizes the narinfo in-memory only once the NAR is genuinely chunked (`HasNarInChunks` returns true). This rewrites `Compression`, `URL`, `FileHash`, and `FileSize` in memory only—the database record is not modified synchronously.
+
+The normalization mechanism:
+- Normalizes the URL hash before any `HasNarInChunks` lookup to handle nix-serve-style prefixed hashes (e.g., `abc-hash` → `hash`) matching nar_file rows correctly
- Skips normalization entirely when CDC is disabled
-
-This ensures that the persisted narinfo always matches actual storage state:
-- If chunking hasn't completed (or never completes due to a crash/restart), the narinfo truthfully advertises its original compression (e.g., xz) and URL
-- Once chunking completes, serve-time normalization presents `compression=none` to clients
+- Non-CDC narinfos (e.g., Harmonia's zstd-stored, none-served path) are normalized at store time regardless of CDC mode
+
+This ensures that the advertised narinfo always matches the servable NAR representation:
+- **Eager CDC**: Clients always request `.nar`, which is satisfied from staging or in-flight chunks during the chunking window, or re-downloaded if bytes are not yet materialized (never a terminal 404)
+- **Lazy CDC**: If chunking hasn't completed (or never completes due to a crash/restart), the narinfo truthfully advertises its original compression (e.g., xz) and URL; once chunking completes, serve-time normalization presents `compression=none` to clients
- The narinfo never advertises chunks that don't exist yet, preventing 404 errors during background chunking or after process restarts
**Delayed Deletion:** |
Summary
Under eager CDC the durable form of a NAR is its uncompressed chunk sequence, and ncps has no NAR compressor — so a
.nar.xzrequest during the chunking window could not be served correctly (404→upstream fallback at best, decompressed-bytes-mislabeled-as-xz corruption at worst). The narinfo advertised the upstreamxzcompression until the NAR was genuinely chunked, so clients requested.nar.xzand hit this.This change advertises
Compression: none/nar/<hash>.narfor eager-CDC NARs consistently and predictively — at narinfo store time on the pull path and at serve time — so clients always request the uncompressed.narand never.nar.xz, eliminating the corruption at its source. It brings the pull path into parity withPutNarInfo, which already normalizes CDC narinfos tononeon the upload path.How
isEagerCDC()gate — CDC enabled AND chunk store present AND lazy chunking disabled. Lazy CDC retains the wholexzfile and keeps serving.nar.xz, so it is deliberately excluded.FileHash/FileSize(placed after the opaque case, so cachix handling is untouched).maybeCDCNormalizeNarInfoURLnormalizes unconditionally under eager CDC (backstop for legacyxzrows); lazy/drain keep theHasNarInChunksgate.Safety
A
.narrequest for not-yet-materialized bytes re-downloads vialookupPreferredUpstreamURL(re-fetch the upstream narinfo → recover thexzURL → decompress), never a terminal 404 — confirmed by the updated phantom-recovery test. This reverses the earlier "persist truthful xz until chunked" stance (the former Fix B guard), now safe because ofnarServabilityre-download + in-flight staging. Scoped strictly to eager; lazy CDC is unchanged.Test plan
task fmt,task lint(0 issues),task test(full unit suite green)isEagerCDC, store-time eager→none + lazy→xz, serve-time legacy-xz backstop, cold-client, CDC-disabled guard; reconciled the Fix B guard, phantom-recovery, and not-chunked tests to the new policy✓ eager CDC narinfo advertises Compression: none, plus drain/restart/fsck (no regression)--window chunking, 2 replicas, redis): narinfo-noneasserts pass and all cross-pod readers serve byte-correct content (this window served corrupt bytes before this lineage)tasks.md§9): stale-xzdefensive e2e variant, cross-pod lazy coverage, k8s compression assertionNotes
openspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/); main specscdc-chunkingandinflight-nar-stagingupdated.