repoobj, repository: add chunk_id to header, introduce packs/ namespace #9692

Open
mr-raj12 wants to merge 7 commits into borgbackup:master from mr-raj12:pack-files-step3-pack-id

Conversation

@mr-raj12 (Contributor) commented May 31, 2026

Description

RepoObj blob header grows from 17 bytes (<8sBII) to 49 bytes (<8sB32sII): a 32-byte chunk_id field between the version byte and the size fields. REPOOBJ_HEADER_SIZE = 49 names the fixed part.
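The size arithmetic above can be checked with Python's `struct` module. This is a minimal sketch rather than the PR's code: the helper names and the `EXAMPLE_MAGIC` value are hypothetical placeholders, and only the two format strings and the constant name come from the description.

```python
import struct

# Format strings taken from the description; "<" means little-endian, no padding.
OLD_HEADER = struct.Struct("<8sBII")     # magic(8) + version(1) + meta_size(4) + data_size(4)
NEW_HEADER = struct.Struct("<8sB32sII")  # same, with chunk_id(32) after the version byte

REPOOBJ_HEADER_SIZE = NEW_HEADER.size

# Placeholder only: the real magic value is not given in this PR text.
EXAMPLE_MAGIC = b"EXAMPLE!"


def pack_header(magic: bytes, version: int, chunk_id: bytes,
                meta_size: int, data_size: int) -> bytes:
    """Serialize the fixed part of the header (hypothetical helper)."""
    if len(chunk_id) != 32:
        raise ValueError("chunk_id must be exactly 32 bytes")
    return NEW_HEADER.pack(magic, version, chunk_id, meta_size, data_size)


def unpack_header(blob: bytes) -> tuple:
    """Parse the fixed part of the header from the start of a blob."""
    return NEW_HEADER.unpack_from(blob, 0)
```

`OLD_HEADER.size` evaluates to 17 and `NEW_HEADER.size` to 49, matching the figures quoted in the description.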

chunk_id is the ID hash of the plaintext data. Putting it in the unencrypted header lets a forward scanner rebuild the chunk_id -> location index without touching the key. chunk_id is also the additional authenticated data (AAD) for AEAD, so it cannot be recovered from the ciphertext without the key; a key-free scanner therefore needs a plaintext copy in the header.
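To illustrate the key-free forward scan, here is a rough sketch that walks a buffer of back-to-back objects and records where each chunk_id lives. It assumes a simplified layout (header, then `meta_size` bytes, then `data_size` bytes, with no pack-level framing); `scan_objects` and its return shape are hypothetical, not the PR's API.

```python
import struct

HEADER = struct.Struct("<8sB32sII")  # magic, version, chunk_id, meta_size, data_size


def scan_objects(buf: bytes, magic: bytes) -> dict:
    """Map chunk_id -> (offset, total_length) using only the plaintext headers.

    Assumes objects are stored back to back; no decryption key is involved.
    """
    index = {}
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < HEADER.size:
            raise ValueError(f"truncated header at offset {offset}")
        hdr_magic, _version, chunk_id, meta_size, data_size = HEADER.unpack_from(buf, offset)
        if hdr_magic != magic:
            raise ValueError(f"bad magic at offset {offset}")
        total = HEADER.size + meta_size + data_size
        if offset + total > len(buf):
            raise ValueError(f"truncated object at offset {offset}")
        index[chunk_id] = (offset, total)
        offset += total
    return index
```

The scanner never looks inside the encrypted meta or data bytes; it only uses the two size fields to skip to the next header.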

Chunks move from data/ to packs/. The new namespace uses one directory level keyed on the first byte of the pack ID (256 subdirs 00/..ff/), versus two levels for data/. For the expected object count per pack, one level is enough.
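As a sketch of the difference between the two layouts, the hypothetical helper below builds a storage key with a configurable number of directory levels, each level taking the next byte (two hex digits) of the ID. The exact key format used by the storage backend is an assumption here.

```python
def sharded_key(namespace: str, object_id: bytes, levels: int) -> str:
    """Build 'namespace/<byte0>/.../<full hex id>' (hypothetical helper).

    levels=1 yields 256 possible subdirectories (00 .. ff);
    levels=2 yields 256 * 256.
    """
    hex_id = object_id.hex()
    if levels < 0 or 2 * levels > len(hex_id):
        raise ValueError("levels out of range for this id")
    parts = [hex_id[2 * i:2 * i + 2] for i in range(levels)]
    return "/".join([namespace, *parts, hex_id])
```

For an ID starting with bytes `ab cd`, `sharded_key("packs", id, 1)` gives `packs/ab/abcd...`, while the two-level form gives `data/ab/cd/abcd...`.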

For N=1 packs (one chunk per pack file), pack_id == chunk_id. The pack filename is the chunk ID (id_hash of the plaintext chunk content).

Changes:

  • repoobj.py: extend struct from <8sBII to <8sB32sII, add chunk_id to ObjHeader, add REPOOBJ_HEADER_SIZE = 49, pass id into ObjHeader in format().
  • repository.py: add packs/ to ns_config with {"levels": [1]}, add packs to permission maps, replace all data/ keys with packs/ in get()/put()/delete()/list()/check(), introduce pack_id = id (N=1 invariant), wrap stored blobs in a BORGPACK header (13 bytes: magic + version + blob_len), bump repo version to 4.
  • repository_test.py: add chunk_id parameter to fchunk(), fix pchunk() slice to [3:5], update test_read_data() to pass H(0) as chunk_id.
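The 13-byte pack wrapper mentioned in the repository.py item could look roughly like the following. The field widths (8-byte magic, 1-byte version, 4-byte little-endian length) are an assumption chosen to add up to 13 bytes, and `PACK_VERSION`, `wrap_blob` and `unwrap_blob` are hypothetical names.

```python
import struct

# Assumed layout: magic(8) + version(1) + blob_len(4) = 13 bytes.
PACK_HEADER = struct.Struct("<8sBI")
PACK_MAGIC = b"BORGPACK"
PACK_VERSION = 1  # assumed value, not stated in the PR text


def wrap_blob(blob: bytes) -> bytes:
    """Prefix a stored blob with the pack header (N=1: one blob per pack)."""
    return PACK_HEADER.pack(PACK_MAGIC, PACK_VERSION, len(blob)) + blob


def unwrap_blob(pack: bytes) -> bytes:
    """Validate the pack header and return the enclosed blob."""
    if len(pack) < PACK_HEADER.size:
        raise ValueError("pack shorter than its header")
    magic, _version, blob_len = PACK_HEADER.unpack_from(pack, 0)
    if magic != PACK_MAGIC:
        raise ValueError("not a pack file (bad magic)")
    blob = pack[PACK_HEADER.size:PACK_HEADER.size + blob_len]
    if len(blob) != blob_len:
        raise ValueError("pack is truncated")
    return blob
```

Because the length travels with each blob, a reader can step over one `[len][blob]` unit to reach the next, which is what would allow more than one blob per pack later.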

refs #8572

Checklist

  • PR is against master (or maintenance branch if only applicable there)
  • New code has tests and docs where appropriate
  • Tests pass (run tox or the relevant test subset)
  • Commit messages are clean and reference related issues

@codecov (Bot) commented May 31, 2026

Codecov Report

❌ Patch coverage is 71.05263% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.81%. Comparing base (9d2dd1d) to head (b6c075d).
⚠️ Report is 1 commit behind head on master.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/borg/repository.py | 68.75% | 5 Missing and 5 partials ⚠️ |
| src/borg/archiver/_common.py | 50.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9692      +/-   ##
==========================================
- Coverage   83.88%   82.81%   -1.07%     
==========================================
  Files          93       93              
  Lines       15653    15658       +5     
  Branches     2351     2351              
==========================================
- Hits        13130    12967     -163     
- Misses       1789     1974     +185     
+ Partials      734      717      -17     


@ThomasWaldmann (Member) commented:

"The pack filename is the chunk's SHA-256 hex ID."

That is never true.

For N=1 the pack filename (pack_id) is the chunk_id (id_hash(chunk content), usually not sha256).

For N>1 the pack filename (pack_id) shall be sha256(pack content).
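The naming rule stated in this comment can be sketched as a small function. The name `derive_pack_id` and its signature are hypothetical; the chunk IDs are assumed to have been computed elsewhere with the repository's id_hash.

```python
import hashlib


def derive_pack_id(chunk_ids: list, pack_content: bytes) -> bytes:
    """Return the pack_id according to the rule in the review comment.

    N=1: pack_id is the chunk_id (id_hash of the chunk content, usually not SHA-256).
    N>1: pack_id is sha256 over the whole pack content.
    """
    if not chunk_ids:
        raise ValueError("a pack must contain at least one chunk")
    if len(chunk_ids) == 1:
        return chunk_ids[0]
    return hashlib.sha256(pack_content).digest()
```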

@mr-raj12 force-pushed the pack-files-step3-pack-id branch from 75eea8d to f1833e8 on June 1, 2026 13:05
@mr-raj12 force-pushed the pack-files-step3-pack-id branch 2 times, most recently from 192a412 to 5e13bda on June 1, 2026 14:18
mr-raj12 added 7 commits June 1, 2026 19:49
…ckup#8572

Stores chunk_id unencrypted in the per-blob header so borg check can
rebuild the chunk_id -> pack location index without decryption. AEAD
uses chunk_id as additional data, making key-free recovery circular
without an explicit plaintext copy.

Header layout: OBJ_MAGIC(8) + version(1) + chunk_id(32) + meta_size(4)
+ data_size(4) = REPOOBJ_HEADER_SIZE = 49 bytes.
…orgbackup#8572

Introduces pack_id as the borgstore storage key (N=1: pack_id == chunk_id).
Chunks move from data/ to packs/ with single-level directory sharding (256
subdirs). check_object() validates the header chunk_id against the pack
filename. Adds packs/ to ns_config with levels=[1] and to the permissions
maps for no-delete and write-only modes.
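The header-versus-filename validation described in this commit message might be approximated as below for the N=1 case. This is only a sketch: `header_matches_pack_name` is a hypothetical name, the pack filename is assumed to be the hex form of the pack_id, and the blob is assumed to start directly with the 49-byte object header.

```python
import struct

HEADER = struct.Struct("<8sB32sII")  # magic, version, chunk_id, meta_size, data_size


def header_matches_pack_name(pack_name_hex: str, blob: bytes) -> bool:
    """For N=1 packs, check that the header's chunk_id equals the pack filename."""
    if len(blob) < HEADER.size:
        return False
    try:
        expected_id = bytes.fromhex(pack_name_hex)
    except ValueError:
        return False  # filename is not valid hex
    _magic, _version, chunk_id, _meta_size, _data_size = HEADER.unpack_from(blob, 0)
    return chunk_id == expected_id
```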
…rgbackup#8572

Wrap each pack file in a 13-byte header (magic + version + blob_len) so
packs are self-identifying and the [len][blob] unit extends to N>1 without
a format revision. Bump version 3->4: packs/ and 49-byte ObjHeader are
incompatible with version-3 readers. Fix test_extra_chunks chunk_id mismatch.
_common.py had a hard-coded version check that only allowed v3.
Now that repository.py creates v4 repos, every archiver command
failed to open the repo. Extend the guard to (3, 4).

The --other-repo check (v1 or v3 for borg transfer source) is
intentionally left unchanged.
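A minimal sketch of the two guards described above, assuming a simple tuple-membership check. The constant names, the exception class and `check_repo_version` are hypothetical and are not taken from _common.py.

```python
ACCEPTED_REPO_VERSIONS = (3, 4)        # main repository: v3 and the new v4
ACCEPTED_OTHER_REPO_VERSIONS = (1, 3)  # --other-repo (borg transfer source), unchanged


class UnsupportedRepoVersion(Exception):
    """Raised when a repository version is outside the accepted set (hypothetical)."""


def check_repo_version(version: int, *, other_repo: bool = False) -> int:
    """Return the version if accepted, otherwise raise."""
    accepted = ACCEPTED_OTHER_REPO_VERSIONS if other_repo else ACCEPTED_REPO_VERSIONS
    if version not in accepted:
        raise UnsupportedRepoVersion(
            f"repository version {version} not supported, expected one of {accepted}"
        )
    return version
```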
@mr-raj12 force-pushed the pack-files-step3-pack-id branch from 5e13bda to b6c075d on June 1, 2026 14:19