repository - going away from transactions, log, refcounts? #7377

@ThomasWaldmann

current state (current master branch, borg 1.x, borg 0.x, attic)

A borg repository is primarily a key/value store (with some aux functions).

The key is the chunk id (== MAC(plaintext)), the value is the compressed/encrypted/authenticated data.
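A minimal sketch of that key/value idea, assuming HMAC-SHA256 as the MAC and showing only the compression step (encryption and authentication of the value are left out). The names `chunk_id`, `put` and `get` and the plain dict used as store are hypothetical, not borg's API:

```python
import hashlib
import hmac
import zlib


def chunk_id(id_key: bytes, plaintext: bytes) -> bytes:
    """Key of the store: a MAC over the plaintext chunk (HMAC-SHA256 assumed)."""
    return hmac.new(id_key, plaintext, hashlib.sha256).digest()


def put(store: dict, id_key: bytes, plaintext: bytes) -> bytes:
    """Store a chunk under its id and return that id."""
    cid = chunk_id(id_key, plaintext)
    # Only compression is shown; a real value would also be encrypted
    # and authenticated before it is written.
    store[cid] = zlib.compress(plaintext)
    return cid


def get(store: dict, cid: bytes) -> bytes:
    """Fetch a chunk by id and undo the compression."""
    return zlib.decompress(store[cid])
```

Because the id depends only on the key and the plaintext, storing the same chunk twice ends up at the same key, which is what makes deduplication possible.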

borg uses transactions and a LOG when writing to the repo:

  • start of transaction (usually triggered by PUT/DEL)
  • writes more objects by appending PUT entries to the log
  • deletes objects by appending DEL entries to the log
  • commits (appends a COMMIT entry to the log)
  • end of transaction (server side: saves repo index and hints; client side: saves chunks index and files cache)
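The steps above can be sketched as a toy append-only log. This is an illustration under simplifying assumptions, not borg's code: the log is an in-memory list instead of segment files, and `Log` and `replay` are hypothetical names. It shows why a crash before COMMIT is harmless: replay ignores everything after the last COMMIT.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    # Stands in for the segment files; entries are only ever appended.
    entries: list = field(default_factory=list)

    def put(self, oid, data):
        self.entries.append(("PUT", oid, data))

    def delete(self, oid):
        self.entries.append(("DEL", oid, None))

    def commit(self):
        self.entries.append(("COMMIT", None, None))


def replay(entries):
    """Rebuild the committed state; entries after the last COMMIT are ignored."""
    last_commit = max(
        (i for i, entry in enumerate(entries) if entry[0] == "COMMIT"),
        default=-1,
    )
    state = {}
    for tag, oid, data in entries[: last_commit + 1]:
        if tag == "PUT":
            state[oid] = data
        elif tag == "DEL":
            state.pop(oid, None)
    return state
```

Usage: after `put("a")`, `put("b")`, `commit()`, `delete("a")`, a replay still returns both `a` and `b`, because the DEL is not committed yet.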

LOG means that new stuff is always appended at the end of the last/current segment file. In general, old segment files are never modified in place.

borg compact defrags non-compact segment files:

  • a segment file contains PUTs, DELs, COMMITs
  • if a PUT(id) is later deleted by a DEL(id), it creates a logical hole in a segment file (that object is not used any more), making it non-compact
  • compaction / defragging works by reading all still-needed objects from an old segment file and appending them to a new segment file. after that is finished, the old segment file is deleted (and that frees disk space because the new segment file is smaller).
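A rough sketch of that compaction step, assuming segments are held as an in-memory dict of entry lists and the index maps id -> (segment, offset). `compact_segment` and the data layout are hypothetical and much simpler than the real thing:

```python
def compact_segment(segments, index, old_no, new_no):
    """Copy still-needed PUTs from segment old_no to new_no, then drop old_no.

    segments: dict segment_no -> list of (tag, oid, data) entries
    index:    dict oid -> (segment_no, offset) for objects that are still in use
    """
    new_entries = segments.setdefault(new_no, [])
    for offset, (tag, oid, data) in enumerate(segments[old_no]):
        # A PUT is still needed only if the index points at exactly this entry.
        if tag == "PUT" and index.get(oid) == (old_no, offset):
            index[oid] = (new_no, len(new_entries))
            new_entries.append((tag, oid, data))
        # Simplification: DEL and COMMIT entries are dropped here. A real
        # implementation has to be more careful, e.g. with DELs that still
        # refer to PUTs stored in other segments.
    # Deleting the old segment is what frees the space.
    del segments[old_no]
```

Even in this toy form, every live object in a non-compact segment gets read and rewritten, which is where the I/O cost of compaction comes from.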

advantages of this approach

  • transactions and an append-only log are very safe approaches (even if stuff crashes, it usually can roll back to the previous state and be fine again)
  • segment files are medium size files: not too large, not too small, not too many
    • works well even with not very scalable filesystems
    • has little overhead due to fs block / cluster size
    • can be copied or deleted rather quickly (not many fs objects)

disadvantages of this approach

  • borg compact can cause lots of I/O when shuffling objects from old non-compact segments to new compact segments
  • borg compact needs some space on the fs to be able to work. bad if your fs is 100% full...
  • compaction code is rather complex, same for transaction management
  • to quickly access objects, the repository needs an index mapping id -> (segment, offset, flags)
  • borg currently loads the repo index (hashtable) into memory. RAM usage is about 44 bytes * object_count, plus free space in the hashtable. if you have a lot of files and/or a large data volume, the repo index can need GBs of RAM.
  • to implement this, some special borg code is needed with access to the repo filesystem
  • hard to work like this without locking the repository against concurrent access.
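To get a feeling for the repo index RAM usage mentioned above, here is a back-of-the-envelope estimate. The per-entry size is the figure from the list above; the `load_factor` (how full the hashtable is allowed to get) is an assumed value for illustration, and `index_ram_bytes` is a hypothetical helper:

```python
import math

ENTRY_SIZE = 44  # bytes per index entry, as quoted in the list above


def index_ram_bytes(object_count: int, load_factor: float = 0.75) -> int:
    """Estimate hashtable size: used entries plus the free buckets.

    load_factor is an assumption, not a value taken from borg.
    """
    buckets = math.ceil(object_count / load_factor)
    return buckets * ENTRY_SIZE
```

For 100 million objects, the entries alone (`load_factor=1.0`) come to 4.4 GB; any free space in the hashtable adds to that.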
