Skip to content

feat(cache): advertise eager-CDC narinfos as Compression: none predictively#1380

Merged
kalbasit merged 4 commits into
mainfrom
cdc-advertise-narinfo-none
Jun 9, 2026
Merged

feat(cache): advertise eager-CDC narinfos as Compression: none predictively#1380
kalbasit merged 4 commits into
mainfrom
cdc-advertise-narinfo-none

Conversation

@kalbasit

@kalbasit kalbasit commented Jun 9, 2026

Copy link
Copy Markdown
Owner

Summary

Under eager CDC the durable form of a NAR is its uncompressed chunk sequence, and ncps has no NAR compressor — so a .nar.xz request during the chunking window could not be served correctly (404→upstream fallback at best, decompressed-bytes-mislabeled-as-xz corruption at worst). The narinfo advertised the upstream xz compression until the NAR was genuinely chunked, so clients requested .nar.xz and hit this.

This change advertises Compression: none / nar/<hash>.nar for eager-CDC NARs consistently and predictively — at narinfo store time on the pull path and at serve time — so clients always request the uncompressed .nar and never .nar.xz, eliminating the corruption at its source. It brings the pull path into parity with PutNarInfo, which already normalizes CDC narinfos to none on the upload path.

How

  • isEagerCDC() gate — CDC enabled AND chunk store present AND lazy chunking disabled. Lazy CDC retains the whole xz file and keeps serving .nar.xz, so it is deliberately excluded.
  • Store-time — a new eager-CDC case in the pull-path narinfo switch rewrites URL/Compression and nulls FileHash/FileSize (placed after the opaque case, so cachix handling is untouched).
  • Serve-timemaybeCDCNormalizeNarInfoURL normalizes unconditionally under eager CDC (backstop for legacy xz rows); lazy/drain keep the HasNarInChunks gate.

Safety

A .nar request for not-yet-materialized bytes re-downloads via lookupPreferredUpstreamURL (re-fetch the upstream narinfo → recover the xz URL → decompress), never a terminal 404 — confirmed by the updated phantom-recovery test. This reverses the earlier "persist truthful xz until chunked" stance (the former Fix B guard), now safe because of narServability re-download + in-flight staging. Scoped strictly to eager; lazy CDC is unchanged.

Test plan

  • task fmt, task lint (0 issues), task test (full unit suite green)
  • New unit tests: isEagerCDC, store-time eager→none + lazy→xz, serve-time legacy-xz backstop, cold-client, CDC-disabled guard; reconciled the Fix B guard, phantom-recovery, and not-chunked tests to the new policy
  • cdc-lifecycle e2e (sqlite/local): full pass — ✓ eager CDC narinfo advertises Compression: none, plus drain/restart/fsck (no regression)
  • contention e2e (--window chunking, 2 replicas, redis): narinfo-none asserts pass and all cross-pod readers serve byte-correct content (this window served corrupt bytes before this lineage)
  • Deferred to a follow-up (see archived tasks.md §9): stale-xz defensive e2e variant, cross-pod lazy coverage, k8s compression assertion

Notes

  • No schema, flags, or migrations. Forward-only, HA-safe mixed-version deploy.
  • OpenSpec change archived in-PR (openspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/); main specs cdc-chunking and inflight-nar-staging updated.

kalbasit and others added 3 commits June 8, 2026 21:46
Advertise narinfo Compression: none predictively for eager-CDC NARs
so clients always request .nar and never .nar.xz, eliminating the
chunking-window compressed-request corruption at its source.

This commit lands the OpenSpec planning artifacts (proposal, design,
delta specs for cdc-chunking and inflight-nar-staging, and tasks).
The pull path is brought into parity with PutNarInfo, which already
normalizes CDC narinfos to none; scope is gated strictly to eager CDC
(isCDCEnabled && !GetCDCLazyChunkingEnabled) since lazy keeps the
whole xz file servable as .nar.xz.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tively

Under eager CDC the durable form of a NAR is its uncompressed chunk
sequence and ncps has no NAR compressor, so a .nar.xz request during the
chunking window could not be served correctly (404->upstream fallback at
best, decompressed-bytes-mislabeled-as-xz corruption at worst). The
narinfo advertised the upstream xz compression until the NAR was
genuinely chunked, so clients requested .nar.xz and hit this.

Advertise Compression: none / nar/<hash>.nar for eager-CDC NARs
consistently -- at narinfo store time on the pull path and at serve time
-- so clients always request the uncompressed .nar and never .nar.xz,
eliminating the corruption at its source. This brings the pull path into
parity with PutNarInfo, which already normalizes CDC narinfos to none on
the upload path.

How:
- isEagerCDC() gate: CDC enabled AND chunk store present AND lazy
  chunking disabled. Lazy CDC keeps the whole xz file servable as .nar.xz
  and is deliberately excluded.
- Store-time: a new eager-CDC case in the pull-path narinfo switch
  rewrites URL/Compression and nulls FileHash/FileSize (after the opaque
  case, so cachix handling is untouched).
- Serve-time: maybeCDCNormalizeNarInfoURL normalizes unconditionally
  under eager CDC (backstop for legacy xz rows); lazy/drain keep the
  HasNarInChunks gate.

Safety: a .nar request for not-yet-materialized bytes re-downloads via
lookupPreferredUpstreamURL (re-fetch upstream narinfo -> recover xz URL
-> decompress), never a terminal 404 -- verified by the updated
phantom-recovery test. This reverses the earlier "persist truthful xz
until chunked" stance (the former Fix-B guard), now safe because of
narServability re-download + in-flight staging.

Tests: new isEagerCDC + store-time + serve-time unit tests; the Fix-B
guard, phantom-recovery, and not-chunked tests updated to the new policy.
Verified end-to-end: cdc-lifecycle e2e full pass (eager->none) and the
contention e2e (cross-pod readers serve byte-correct .nar with narinfo
none). Harness assertions added to both e2e drivers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Apply the change's delta specs to the main specs and archive the
completed change:

- cdc-chunking: broaden the GetNarInfo normalization requirement to
  cover eager (predictive) vs lazy (HasNarInChunks-gated) modes, and add
  "Eager CDC MUST advertise narinfo Compression none predictively".
- inflight-nar-staging: add "Predictive narinfo none steers eager-CDC
  cross-pod readers to the uncompressed staging serve".

Tasks reconciled: all in-scope tasks done (production + unit tests +
cdc-lifecycle/contention e2e verification); additive harness scenarios
(stale-xz e2e variant, lazy cross-pod, k8s compression assert) moved to
a Deferred section for a follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@kalbasit

kalbasit commented Jun 9, 2026

Copy link
Copy Markdown
Owner Author

This change is part of the following stack:

Change managed by git-spice.

@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. enhancement New feature or request go Pull requests that update go code labels Jun 9, 2026
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cda2dd13-0e40-46d9-9eb6-d81fd9206828

📥 Commits

Reviewing files that changed from the base of the PR and between 89c9eca and 2ce0d27.

📒 Files selected for processing (1)
  • pkg/cache/cdc_url_desync_prevention_internal_test.go

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes
    • Fixed eager-CDC content delivery mismatch: clients now consistently request and receive uncompressed NARs during the eager chunking window, preventing 404s and mislabeled decompressed content. Lazy-CDC behavior unchanged.
  • Tests
    • Strengthened end-to-end and unit tests to assert advertised compression is "none" and that URLs point to uncompressed NARs under eager-CDC.
  • Documentation
    • Updated specs/design/docs describing predictive uncompressed advertisement and expected serve/recovery behavior.

Walkthrough

Under eager-CDC, narinfo is predictively normalized to advertise Compression: none with uncompressed .nar URLs at both pull-path store time and GetNarInfo serve time, preventing clients from requesting unavailable compressed variants during the chunking window. Lazy-CDC behavior remains unchanged.

Changes

Eager-CDC Predictive Narinfo Normalization

Layer / File(s) Summary
Design & Specification Documentation
openspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/design.md, openspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/proposal.md, openspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/specs/cdc-chunking/spec.md, openspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/specs/inflight-nar-staging/spec.md, openspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/tasks.md, openspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/.openspec.yaml, openspec/specs/cdc-chunking/spec.md, openspec/specs/inflight-nar-staging/spec.md
Design and formal specs define eager-CDC predictive normalization at store time (persist Compression: none, nar/<hash>.nar, clear FileHash/FileSize) and serve time (immediate in-memory normalization), gating to eager-CDC-only and specifying re-download routing for unmaterialized .nar requests and defensive fallback for stale xz narinfos.
Core Implementation: Predictive Normalization
pkg/cache/cache.go
Adds (*Cache).isEagerCDC() predicate and implements eager-CDC branches: pullNarInfo persists predictive Compression:none/.nar and maybeCDCNormalizeNarInfoURL skips the HasNarInChunks wait for eager CDC so Serve-time normalization occurs immediately.
Unit Tests: Eager vs Lazy Behavior
pkg/cache/cdc_advertise_narinfo_none_internal_test.go, pkg/cache/cache_test.go
New tests validate isEagerCDC() gating and eager vs lazy serve-time normalization behaviors (legacy xznone under eager, preserve xz under lazy, CDC-disabled no-op, cold-client none). cache_test.go replaces the not-chunked normalization test to assert eager normalization for placeholder nar_file.
Integration & Regression Tests
pkg/cache/cdc_url_desync_prevention_internal_test.go, pkg/cache/phantom_recovery_test.go, dev-scripts/test-cdc-lifecycle-e2e.py, dev-scripts/test-inflight-staging-contention-e2e.py
Pull-path tests refactor shared setup and assert persisted predictive .nar/Compression:none under eager CDC and .nar.xz/xz under lazy CDC. Phantom-recovery now verifies re-download+decompress serves full uncompressed content. Lifecycle and contention e2e scripts add deterministic assertions that eager-CDC serves Compression: none with .nar URL suffixes.
Changelog
CHANGELOG.md
Documents the fix: under eager-CDC, narinfo now advertises Compression: none and uncompressed .nar URLs at store and serve time, avoiding request/serve compression mismatches.

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • kalbasit/ncps#1332: Modifies CDC narinfo normalization logic addressing narinfo vs storage desynchronization and related normalization/migration approaches.
  • kalbasit/ncps#1376: Adjusts CDC chunking-window routing/serve behavior around compressed requests to avoid 404s mid-chunking; touches similar cache/read-path code.
  • kalbasit/ncps#1334: Implements narinfo normalization to Compression:none/nar/<hash>.nar in a related code path (de-chunk flipping), overlapping on normalization semantics.

Suggested labels

merge-stack

Poem

🐰 A rabbit hops through compression trees,
where .nar now begs the gentle breeze—
Eager CDC sings a clearer tune,
chunks and narinfo hum in tune.
Fetches find the uncompressed prize,
and bytes arrive without disguise.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title accurately describes the main change: predictively advertising eager-CDC narinfos with Compression: none instead of upstream compression.
Description check ✅ Passed The pull request description is comprehensive and directly related to the changeset, explaining the problem, solution, implementation, safety considerations, and test coverage.
Docstring Coverage ✅ Passed Docstring coverage is 86.67% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements predictive Compression: none / .nar URL advertisement for eager-CDC NARs consistently on the pull path (at store time and serve time) before chunks exist. This ensures clients always request the uncompressed .nar instead of .nar.xz during the chunking window, preventing 404 fallbacks and silent corruption. The changes include the introduction of the isEagerCDC() helper, store-time and serve-time normalization logic in pkg/cache/cache.go, comprehensive unit and end-to-end tests, and detailed design specifications. There are no review comments provided, so I have no feedback to address.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/cache/cdc_url_desync_prevention_internal_test.go`:
- Around line 115-117: The test currently discards the error returned by
c.GetNarInfo which can hide failures; update both occurrences where you call
reqCtx := withNarPrefetchDisabled(newContext()) followed by _, _ =
c.GetNarInfo(reqCtx, testdata.Nar1.NarInfoHash) to capture the result and error
(e.g. _, err := c.GetNarInfo(...)) and call require.NoError(t, err) immediately
before asserting persisted DB rows; apply the same change for the second
occurrence (the call around the other test block).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fa706bd7-c4b8-4a3b-a718-6a528a241f06

📥 Commits

Reviewing files that changed from the base of the PR and between 14e9cd2 and 89c9eca.

📒 Files selected for processing (16)
  • CHANGELOG.md
  • dev-scripts/test-cdc-lifecycle-e2e.py
  • dev-scripts/test-inflight-staging-contention-e2e.py
  • openspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/.openspec.yaml
  • openspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/design.md
  • openspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/proposal.md
  • openspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/specs/cdc-chunking/spec.md
  • openspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/specs/inflight-nar-staging/spec.md
  • openspec/changes/archive/2026-06-08-cdc-advertise-narinfo-none/tasks.md
  • openspec/specs/cdc-chunking/spec.md
  • openspec/specs/inflight-nar-staging/spec.md
  • pkg/cache/cache.go
  • pkg/cache/cache_test.go
  • pkg/cache/cdc_advertise_narinfo_none_internal_test.go
  • pkg/cache/cdc_url_desync_prevention_internal_test.go
  • pkg/cache/phantom_recovery_test.go

Comment thread pkg/cache/cdc_url_desync_prevention_internal_test.go
Address CodeRabbit review on #1380: the two CDC pull-path narinfo tests
discarded the GetNarInfo error (_, _ =), which could hide a failure after
partial DB writes. Verified empirically that GetNarInfo returns nil here
(prefetch-disabled still fetches + persists the narinfo), so capture the
error and require.NoError before asserting the persisted row.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@kalbasit kalbasit enabled auto-merge (squash) June 9, 2026 06:35
@kalbasit kalbasit merged commit 3c11a3b into main Jun 9, 2026
72 of 87 checks passed
@kalbasit kalbasit deleted the cdc-advertise-narinfo-none branch June 9, 2026 06:57
@dosubot

dosubot Bot commented Jun 9, 2026

Copy link
Copy Markdown

Documentation Updates

1 document(s) were updated by changes in this PR:

NAR File Download and Streaming
View Changes
@@ -109,17 +109,17 @@
 
 2. **Finished-asset detection**: Checks whether the NAR is fully materialized—either a whole file in the store OR a fully-chunked nar_file (`total_chunks > 0`). This check uses `hasFinishedNar` (not `isServable`), which deliberately excludes actively-chunking NARs (`total_chunks = 0` with live chunker). If the asset is finished, returns a completed "served-by-peer" state so `GetNar` serves it directly. An actively-chunking NAR is NOT treated as finished and falls through to states (3) or (4) to avoid routing to chunk-based serving, which would 404 a compressed request.
 
-3. **Actively-chunking with staging**: When the holder is alive and actively chunking a NAR (`total_chunks = 0`, `nar_file` exists, chunker live), AND staging is enabled and parts are available (`stagingServeReady`), the waiter serves from staging parts instead of routing to chunk-based serving. For an eager-CDC NAR being chunked, the staged bytes are uncompressed (the temp is decompressed for chunking), so uncompressed (`none`) requests are served the complete NAR from staging, while compressed variant requests (e.g. `.nar.xz`) return not-found so the client falls back to an upstream that has the original compressed file.
+3. **Actively-chunking with staging**: When the holder is alive and actively chunking a NAR (`total_chunks = 0`, `nar_file` exists, chunker live), AND staging is enabled and parts are available (`stagingServeReady`), the waiter serves from staging parts instead of routing to chunk-based serving. For an eager-CDC NAR being chunked, the staged bytes are uncompressed (the temp is decompressed for chunking), so uncompressed (`none`) requests are served the complete NAR from staging. Under eager CDC, clients request `.nar` due to the predictive narinfo normalization (see "GetNarInfo Narinfo Normalization"), avoiding compressed-variant requests that would return not-found.
 
 4. **Actively-chunking without staging**: When the holder is alive and actively chunking, but no staging parts are available, returns a completed "served-by-peer" state so `GetNar` routes to progressive chunk streaming (`streamProgressiveChunks`). This preserves the no-staging CDC path but only serves decompressed data (compressed requests would 404 in this scenario without staging).
 
 **Streaming from staging parts**: When `GetNar` receives a staging directive, it calls `serveNarFromStaging` to stream the NAR from staging part-objects rather than storage. The implementation creates a `stagingPartReader` that tails part-objects in order, polling `staging_state.parts_available` every 200ms and blocking until each part is durably available.
 
-**Eager-CDC compressed request handling**: For an eager-CDC NAR being chunked, the staged bytes are uncompressed (the temp is decompressed for chunking). Uncompressed (`none`) requests are served the complete NAR from staging. Compressed variant requests (e.g. `.nar.xz`) cannot be reconstructed from uncompressed staged bytes (ncps has no NAR compressor, and a re-compressed file would not match the narinfo `FileHash`/`FileSize`), so the staging serve returns not-found and the client falls back to an upstream that has the original compressed file—never the uncompressed bytes mislabeled as the requested compression.
+**Eager-CDC cross-pod request handling**: Under eager CDC, the predictive narinfo normalization (documented in "GetNarInfo Narinfo Normalization") steers cross-pod readers to request the uncompressed `.nar` variant during the chunking window. The uncompressed request is satisfied from staging (or progressive chunks) per the existing in-flight staging mechanism. The compressed-variant upstream fallback (e.g., `.nar.xz` returning not-found) remains only as a defensive backstop for directly-constructed compressed requests, such as a client holding a stale `xz` narinfo. Under eager CDC, the common cross-pod path does not exercise the compressed fallback because clients request `.nar`.
 
 **Precedence during the chunking window**: When `total_chunks == 0` (actively chunking state), `getNarFromChunks` first checks for in-flight staging parts if staging is enabled. If staging parts are available, it serves the complete NAR from them instead of entering the progressive chunk streaming path (`streamProgressiveChunks`). This precedence improves reliability: staged whole-NAR parts are a complete, ordered representation, so serving from them avoids the fragile progressive reassembly that can truncate. This precedence is strictly scoped to the chunking window—when `total_chunks > 0` (steady-state chunk serving), the new branch is not reached and behavior is unchanged. When serving from staging in this path, the `TransparentZstd` flag is cleared because `serveNarFromStaging` sets the compression based on what's actually in staging. If no staging parts are available, the system falls back to the progressive-chunk path as before.
 
-**Transcoding support**: The reader decompresses staged bytes when the client requests uncompressed data—for example, staged compressed bytes served as uncompressed—at parity with the same-pod streaming path. A request for a compressed variant backed only by uncompressed staged bytes (the eager-CDC chunking window) returns not-found so the client falls back to upstream. The returned `narURL.Compression` advertises the compression actually served.
+**Transcoding support**: The reader decompresses staged bytes when the client requests uncompressed data—for example, staged compressed bytes served as uncompressed—at parity with the same-pod streaming path. A request for a compressed variant backed only by uncompressed staged bytes (the eager-CDC chunking window) returns not-found so the client falls back to upstream; however, under eager CDC, clients request `.nar` due to the predictive narinfo normalization and do not request compressed variants. The returned `narURL.Compression` advertises the compression actually served.
 
 **Clean EOF semantics**: The reader emits a clean `io.EOF` only when `status = complete` and all parts have been consumed. If the producer stalls (stops advancing `parts_available` without setting the complete marker), the reader surfaces `errStagingStall` after the per-part wait timeout. If the `staging_state` row disappears mid-stream (reset by a takeover), the reader surfaces `errStagingReset`. Context cancellation is propagated immediately. This ensures a truncated NAR is never delivered as a clean EOF.
 
@@ -380,19 +380,26 @@
 
 Chunks are always stored uncompressed. If a client requests a compressed NAR (e.g., `.nar.xz`) but only chunks exist in storage (the whole-file has been deleted), `GetNar` returns `ErrNotFound` to allow the client to fall back to an upstream cache that still has the original compressed file. This prevents "input compression not recognized" errors that would occur if the system served raw uncompressed chunk data to a client expecting compressed data.
 
-#### GetNarInfo In-Memory Normalization
-
-When CDC is enabled, narinfos are persisted to the database with their truthful upstream compression (e.g., `Compression: xz`) and URL until CDC chunking actually completes. The `pullNarInfo` function does NOT normalize narinfos to `compression=none` at store time, regardless of whether lazy chunking is enabled or not. This prevents a narinfo/storage desync where the database advertises `nar/<hash>.nar` (compression=none) while the NAR is still stored under its original compression (e.g., xz), causing 404 errors and `nix copy` reference check failures.
-
-The normalization behavior:
-- **Store time**: The narinfo is persisted with its truthful upstream compression and URL. The `pullNarInfo` function only normalizes `Compression:none` upstreams (e.g., Harmonia's zstd-stored, none-served path); CDC narinfos are never normalized at store time.
-- **Serve time**: `GetNarInfo` normalizes the narinfo in-memory via `maybeCDCNormalizeNarInfoURL` before returning to clients, but only once the NAR is genuinely chunked (`HasNarInChunks` returns true). This rewrites `Compression`, `URL`, `FileHash`, and `FileSize` in memory only—the database record is not modified synchronously.
-- Normalizes the URL hash before the `HasNarInChunks` lookup to handle nix-serve-style prefixed hashes (e.g., `abc-hash` → `hash`) matching nar_file rows correctly
+#### GetNarInfo Narinfo Normalization
+
+When CDC is enabled, the normalization behavior differs between eager and lazy CDC modes:
+
+**Eager CDC** (default: lazy chunking disabled):
+- **Store time**: Narinfos fetched from upstream on the pull path are normalized to `Compression: none` before being persisted in the database, mirroring the existing upload-path behavior (`PutNarInfo`). The URL is rewritten to `nar/<hash>.nar`, and `FileHash` and `FileSize` are set to null. This predictive normalization ensures clients always request the uncompressed `.nar` variant, because the durable form under eager CDC is uncompressed chunks—ncps has no NAR compressor, and a `.nar.xz` request cannot be reliably served during the chunking window.
+- **Serve time**: `GetNarInfo` normalizes the narinfo in-memory via `maybeCDCNormalizeNarInfoURL` unconditionally (regardless of whether chunks exist yet), serving as a backstop for legacy database rows that predate store-time normalization. This rewrites `Compression`, `URL`, `FileHash`, and `FileSize` in memory only—the database record is not modified synchronously.
+
+**Lazy CDC** (lazy chunking enabled):
+- **Store time**: Narinfos are persisted with their truthful upstream compression (e.g., `Compression: xz`) and URL until CDC chunking actually completes. The `pullNarInfo` function does NOT normalize lazy-CDC narinfos at store time, preventing a narinfo/storage desync where the database advertises `nar/<hash>.nar` while the NAR is still stored under its original compression.
+- **Serve time**: `GetNarInfo` normalizes the narinfo in-memory only once the NAR is genuinely chunked (`HasNarInChunks` returns true). This rewrites `Compression`, `URL`, `FileHash`, and `FileSize` in memory only—the database record is not modified synchronously.
+
+The normalization mechanism:
+- Normalizes the URL hash before any `HasNarInChunks` lookup to handle nix-serve-style prefixed hashes (e.g., `abc-hash` → `hash`) matching nar_file rows correctly
 - Skips normalization entirely when CDC is disabled
-
-This ensures that the persisted narinfo always matches actual storage state:
-- If chunking hasn't completed (or never completes due to a crash/restart), the narinfo truthfully advertises its original compression (e.g., xz) and URL
-- Once chunking completes, serve-time normalization presents `compression=none` to clients
+- Non-CDC narinfos (e.g., Harmonia's zstd-stored, none-served path) are normalized at store time regardless of CDC mode
+
+This ensures that the advertised narinfo always matches the servable NAR representation:
+- **Eager CDC**: Clients always request `.nar`, which is satisfied from staging or in-flight chunks during the chunking window, or re-downloaded if bytes are not yet materialized (never a terminal 404)
+- **Lazy CDC**: If chunking hasn't completed (or never completes due to a crash/restart), the narinfo truthfully advertises its original compression (e.g., xz) and URL; once chunking completes, serve-time normalization presents `compression=none` to clients
 - The narinfo never advertises chunks that don't exist yet, preventing 404 errors during background chunking or after process restarts
 
 **Delayed Deletion:**

How did I do? Any feedback?  Join Discord

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request go Pull requests that update go code size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant