fix(sidecar): preserve duplicate blob paths #21

Merged
pedronauck merged 4 commits into main from codex/fix-duplicate-sidecar-blobs
May 12, 2026

Conversation

@marcioaltoe
Member

@marcioaltoe marcioaltoe commented May 11, 2026

Summary

  • Preserve every file path that points at the same sidecar object during snapshot loading.
  • Hydrate duplicate-content files with the same content, size, SHA-256 digest, and sidecar blob metadata.
  • Add a regression test covering two SPEC.md files with identical content.

Root Cause

sidecarSnapshot stored sidecar object paths in a map[string]string keyed by blob ID, so when multiple managed files had identical content and Git deduplicated them into a single blob object, the later path overwrote the earlier one. FSCK then hydrated metadata for only one path and could incorrectly report duplicate-content specs.

Validation

  • make verify passed locally.
  • The run reported 0 issues, with all 193 tests passing.

Summary by CodeRabbit

  • Bug Fixes

    • Ensure all tracked files with identical content in a namespace are populated with content and metadata (no longer only the last match).
  • Tests

    • Added a test verifying deduplication and consistency of persisted hydration/FSCK metadata for duplicate files.
  • Chores

    • Added an install target and corresponding build task to streamline building and installing the CLI binary.

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e314f064-9b48-4d51-8fad-953dbf86a145

📥 Commits

Reviewing files that changed from the base of the PR and between 96c9013 and 7de3cf2.

📒 Files selected for processing (1)
  • magefile.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • magefile.go

Walkthrough

The PR records multiple paths per git blob in sidecar snapshots and populates all matching snapshot entries when blobs are read; it adds a test verifying FSCK with duplicate blobs and introduces Makefile/mage install targets.

Changes

Blob Deduplication Support

  • Data Model Change: Multiple Paths per Blob (internal/sidecar/reconcile.go)
    objectPaths refactored from map[string]string to map[string][]string; git ls-tree parsing appends each matched relative path to the blob's path slice.
  • Blob Content Population: Populate All Paths (internal/sidecar/reconcile.go)
    When git cat-file --batch returns blob content, the code iterates over all paths for that blob and writes content, size, and SHA-256 to every corresponding files[rel] snapshot entry.
  • Test Validation: FSCK with Duplicate Blobs (internal/sidecar/service_test.go)
    New test creates two identical SPEC.md files, runs Sync and FSCK, and asserts the hydration journal records the same SHA-256 and SidecarBlob for both files.
  • Makefile: .PHONY Update (Makefile)
    Adds install to the .PHONY targets.
  • Makefile: install Target (Makefile)
    Introduces an install target that depends on build and invokes the Mage install task.
  • Mage: Install Target (magefile.go)
    Adds an Install() mage target that runs go install ./cmd/skeeper with the existing build LDFlags and logs progress.
  • Mage Helpers: goInstallPath, goEnv (magefile.go)
    Adds goInstallPath() to resolve the GOBIN/GOPATH install path and a goEnv(key) helper to read go env values.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'fix(sidecar): preserve duplicate blob paths' accurately describes the main change: preserving all file paths in the sidecar reconciliation logic instead of overwriting them.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


@marcioaltoe marcioaltoe changed the title [codex] fix duplicate sidecar blobs fix(sidecar): preserve duplicate blob paths May 11, 2026
@marcioaltoe marcioaltoe marked this pull request as ready for review May 11, 2026 11:23

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/sidecar/reconcile.go (1)

Lines 491-497: ⚡ Quick win

Deduplicate blob IDs before git cat-file --batch to avoid repeated payload reads.

Line 491 appends every object ID (including duplicates). With many identical files, this can multiply batch I/O and memory for no extra value. Request each object once, then fan out by objectPaths.

♻️ Proposed refactor
 	files := map[string]reconcile.SnapshotFile{}
 	objects := make([]string, 0)
+	seenObjects := map[string]struct{}{}
 	objectPaths := map[string][]string{}
@@
 		object := fields[2]
 		files[rel] = reconcile.SnapshotFile{Path: rel, Size: size, Blob: object}
-		objects = append(objects, object)
+		if _, seen := seenObjects[object]; !seen {
+			objects = append(objects, object)
+			seenObjects[object] = struct{}{}
+		}
 		objectPaths[object] = append(objectPaths[object], rel)
 	}
🤖 Prompt for AI agents (inline comments)

In internal/sidecar/service_test.go (around lines 222-264): convert
TestServiceFSCKHandlesDuplicateSidecarBlobs to the repository's subtest
pattern. Wrap the existing test body in a t.Run("Should handle duplicate
sidecar blobs", func(t *testing.T) { ... }) closure and move all current setup
and assertions (setGitIdentity, newMainRepo, newBareRepo, cfg/bootstrapRepo,
writes, service.Sync, service.FSCK, journal reads and assertions) inside that
closure, so the top-level function becomes a thin wrapper around the subtest.
Keep the same assertions and references to sidecar.New, service.Sync,
service.FSCK, and state.HydrationJournal.

ℹ️ Review info
⚙️ Run configuration

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 514ee61d-8b5f-4745-a82e-e941eb5ed0b5

📥 Commits

Reviewing files that changed from the base of the PR and between a10b859 and 4f387d0.

📒 Files selected for processing (2)
  • internal/sidecar/reconcile.go
  • internal/sidecar/service_test.go

@pedronauck pedronauck merged commit 6152887 into main May 12, 2026
7 checks passed
@pedronauck pedronauck deleted the codex/fix-duplicate-sidecar-blobs branch May 12, 2026 02:38