From 5d594ee85b31587f64d912c2ce0d665ba666e184 Mon Sep 17 00:00:00 2001 From: Justin Chung <20733699+justin13888@users.noreply.github.com> Date: Sun, 3 May 2026 17:46:49 +0800 Subject: [PATCH 1/4] doc: remove td from AGENTS.md --- AGENTS.md | 6 ------ 1 file changed, 6 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 679cd7a..5173c9c 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,7 +1 @@ # Pixles - -## MANDATORY: Use td for Task Management - -You must run td usage --new-session at conversation start (or after /clear) to see current work. -Use td usage -q for subsequent reads. - From eec0d6bb0d24038a3ef1ab1a65d6d61e6913be4b Mon Sep 17 00:00:00 2001 From: Justin Chung <20733699+justin13888@users.noreply.github.com> Date: Sun, 3 May 2026 17:54:02 +0800 Subject: [PATCH 2/4] doc: update README to clarify project purpose and target audience --- README.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 9edd99b..b15de1e 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # Capsule -Open Asset Management Scaled to Millions. +Open-source, federated, E2E encrypted photo management and sharing service built for professionals and prosumers. > Disclaimer: This project continues to be in active development. Star this repo to get the latest updates! @@ -18,6 +18,14 @@ Open Asset Management Scaled to Millions. +## Is Capsule for you? + +Capsule is highly refined for photographers and prosumers who want to store and share their photos nearly as seamlessly as do cloud photo services that do not necessarily work on all your devices equally as well and own your data. + +We implement strict security and privacy requirements with the assumption that any data stored can be viewed by unauthorized parties. As such, everything is end-to-end encrypted and processed locally. + +However, it is important to note that (at least currently) Capsule requires a **self-hosted** server which requires some technical knowledge. 
This is not a turn-key solution but rather a capable and actively-developed open-source project. It was created out of passion and so I (as the author) do not ask for any monetary compensation. The best form of compensation is technical contributions and feedback! + ## Screenshots From 1776c045079f98f2acf4e546df6ffdd1d9a78b15 Mon Sep 17 00:00:00 2001 From: Justin Chung <20733699+justin13888@users.noreply.github.com> Date: Sun, 24 May 2026 19:24:27 -0400 Subject: [PATCH 3/4] refactor: remove remaining references to Pixles and overhaul design docs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Switch the file-hashing algorithm from BLAKE3 to SHA-256 (sha2 + hex crates) across capsule-core, capsule-api entity/upload/migration, and all sidecar/DB column references (hash_blake3 → hash_sha256). A new DB migration is added; existing hash values are incompatible and will be invalidated on upgrade. Also correct residual "pixles-*" / "Pixles" references in GitHub issue templates, AGENTS.md, and SECURITY.md to their Capsule equivalents, and expand AGENTS.md with the full code-style guide. 
--- .github/ISSUE_TEMPLATE/bug_report.md | 18 +- .github/ISSUE_TEMPLATE/chore.md | 18 +- .github/ISSUE_TEMPLATE/feature_request.md | 18 +- .github/ISSUE_TEMPLATE/security_report.md | 18 +- AGENTS.md | 10 +- Cargo.lock | 1105 +++++++------ Cargo.toml | 3 +- SECURITY.md | 4 +- capsule-api/entity/src/asset.rs | 2 +- capsule-api/migration/src/lib.rs | 4 +- ...0322_000000_change_file_hash_to_sha256.rs} | 2 +- capsule-api/upload/src/models/requests.rs | 2 +- capsule-api/upload/src/models/session.rs | 2 +- capsule-core/Cargo.toml | 3 +- capsule-core/src/db/driver.rs | 22 +- capsule-core/src/db/rows.rs | 2 +- capsule-core/src/db/schema.rs | 4 +- capsule-core/src/import/executor.rs | 14 +- capsule-core/src/import/planner.rs | 15 +- capsule-core/src/import/upload.rs | 2 +- capsule-core/src/library/rebuild.rs | 4 +- capsule-core/src/library/trash.rs | 6 +- capsule-core/src/metadata/file.rs | 6 +- capsule-core/src/sidecar/asset_sidecar.rs | 10 +- capsule-core/src/sidecar/io.rs | 2 +- capsule-core/src/utils/hash.rs | 11 +- capsule-docs/src/content/docs/design/ai.md | 126 +- .../src/content/docs/design/asset-stacking.md | 31 - .../src/content/docs/design/authentication.md | 90 ++ .../src/content/docs/design/authorization.md | 63 + .../content/docs/design/backup-recovery.md | 77 + .../src/content/docs/design/clients.md | 49 + .../src/content/docs/design/cryptography.md | 619 ++++++++ .../src/content/docs/design/federation.md | 153 ++ .../src/content/docs/design/filesystem.md | 1409 ++++++----------- .../docs/design/import-prioritization.md | 15 - .../docs/design/import-synchronization.md | 270 ++++ .../src/content/docs/design/metadata.md | 156 ++ .../src/content/docs/design/ml-models.md | 106 ++ .../src/content/docs/design/organization.md | 56 + .../src/content/docs/design/peering.md | 175 ++ .../src/content/docs/design/principles.md | 62 + .../src/content/docs/design/search.md | 6 - .../src/content/docs/design/threat-model.md | 313 ++++ .../src/content/docs/design/thumbnails.md 
| 40 + .../src/content/docs/design/versioning.md | 73 + .../src/content/docs/development/upload.md | 6 - .../src/content/docs/guides/self-hosting.md | 7 +- 48 files changed, 3470 insertions(+), 1739 deletions(-) rename capsule-api/migration/src/{m20260322_000000_change_file_hash_to_blake3.rs => m20260322_000000_change_file_hash_to_sha256.rs} (98%) delete mode 100644 capsule-docs/src/content/docs/design/asset-stacking.md create mode 100644 capsule-docs/src/content/docs/design/authentication.md create mode 100644 capsule-docs/src/content/docs/design/authorization.md create mode 100644 capsule-docs/src/content/docs/design/backup-recovery.md create mode 100644 capsule-docs/src/content/docs/design/clients.md create mode 100644 capsule-docs/src/content/docs/design/cryptography.md create mode 100644 capsule-docs/src/content/docs/design/federation.md delete mode 100644 capsule-docs/src/content/docs/design/import-prioritization.md create mode 100644 capsule-docs/src/content/docs/design/import-synchronization.md create mode 100644 capsule-docs/src/content/docs/design/metadata.md create mode 100644 capsule-docs/src/content/docs/design/ml-models.md create mode 100644 capsule-docs/src/content/docs/design/organization.md create mode 100644 capsule-docs/src/content/docs/design/peering.md create mode 100644 capsule-docs/src/content/docs/design/principles.md delete mode 100644 capsule-docs/src/content/docs/design/search.md create mode 100644 capsule-docs/src/content/docs/design/threat-model.md create mode 100644 capsule-docs/src/content/docs/design/thumbnails.md create mode 100644 capsule-docs/src/content/docs/design/versioning.md delete mode 100644 capsule-docs/src/content/docs/development/upload.md diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md index 206c3f3..edaf489 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.md +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -10,15 +10,15 @@ assignees: '' ## Component -- [ ] pixles-android -- [ ] 
pixles-api -- [ ] pixles-cli -- [ ] pixles-core-rust -- [ ] pixles-desktop -- [ ] pixles-docs -- [ ] pixles-media -- [ ] pixles-swift -- [ ] pixles-web +- [ ] capsule-android +- [ ] capsule-api +- [ ] capsule-cli +- [ ] capsule-core-rust +- [ ] capsule-desktop +- [ ] capsule-docs +- [ ] capsule-media +- [ ] capsule-swift +- [ ] capsule-web - [ ] Other: _____________________ ## Steps to Reproduce diff --git a/.github/ISSUE_TEMPLATE/chore.md b/.github/ISSUE_TEMPLATE/chore.md index 7c4ef12..de43f3f 100644 --- a/.github/ISSUE_TEMPLATE/chore.md +++ b/.github/ISSUE_TEMPLATE/chore.md @@ -13,15 +13,15 @@ assignees: '' ## Component -- [ ] pixles-android -- [ ] pixles-api -- [ ] pixles-cli -- [ ] pixles-core-rust -- [ ] pixles-desktop -- [ ] pixles-docs -- [ ] pixles-media -- [ ] pixles-swift -- [ ] pixles-web +- [ ] capsule-android +- [ ] capsule-api +- [ ] capsule-cli +- [ ] capsule-core-rust +- [ ] capsule-desktop +- [ ] capsule-docs +- [ ] capsule-media +- [ ] capsule-swift +- [ ] capsule-web - [ ] Other: _____________________ ## Proposed Changes diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md index 5398480..5c56967 100644 --- a/.github/ISSUE_TEMPLATE/feature_request.md +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -10,15 +10,15 @@ assignees: '' ## Component -- [ ] pixles-android -- [ ] pixles-api -- [ ] pixles-cli -- [ ] pixles-core-rust -- [ ] pixles-desktop -- [ ] pixles-docs -- [ ] pixles-media -- [ ] pixles-swift -- [ ] pixles-web +- [ ] capsule-android +- [ ] capsule-api +- [ ] capsule-cli +- [ ] capsule-core-rust +- [ ] capsule-desktop +- [ ] capsule-docs +- [ ] capsule-media +- [ ] capsule-swift +- [ ] capsule-web - [ ] Other: _____________________ ## Use Case diff --git a/.github/ISSUE_TEMPLATE/security_report.md b/.github/ISSUE_TEMPLATE/security_report.md index 29f294c..60d6633 100644 --- a/.github/ISSUE_TEMPLATE/security_report.md +++ b/.github/ISSUE_TEMPLATE/security_report.md @@ -11,15 +11,15 @@ 
assignees: '' ## Component -- [ ] pixles-android -- [ ] pixles-api -- [ ] pixles-cli -- [ ] pixles-core-rust -- [ ] pixles-desktop -- [ ] pixles-docs -- [ ] pixles-media -- [ ] pixles-swift -- [ ] pixles-web +- [ ] capsule-android +- [ ] capsule-api +- [ ] capsule-cli +- [ ] capsule-core-rust +- [ ] capsule-desktop +- [ ] capsule-docs +- [ ] capsule-media +- [ ] capsule-swift +- [ ] capsule-web - [ ] Other: _____________________ ## Steps to Reproduce (if safe to share) diff --git a/AGENTS.md b/AGENTS.md index 5173c9c..ef63635 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1 +1,9 @@ -# Pixles +# Capsule + +## Code Style + +- Contract-driven development: Define the interfaces and data structures first, along with all test cases, before implementing the actual logic. +- Cohesion: All code should be split into cohesive modules that have a single responsibility and clear interfaces. Encapsulate unnecessary details. +- Minimalism: Choose to use a dependency if it reduces the scope of testing and quantity of code and as long as it does not compromise on performance and required capabilities. +- Traceability: all critical processes are verbosely logged so it is clear what happened after the fact and recovery can be feasible. Use INFO logs where necessary and DEBUG,TRACE aggressively for all critical processes. Logs should be structured and easily queryable. Instrument hot paths (e.g. major functions) for performance monitoring and debugging in production. +- Mocking: Use mocks for all external dependencies and critical internal processes. This allows us to have deterministic tests and easily simulate edge cases and failure scenarios that are hard to reproduce with real dependencies. Do not try to wire up two incomplete complex systems to mock each other. 
diff --git a/Cargo.lock b/Cargo.lock index d56638e..f329a81 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -213,12 +213,6 @@ dependencies = [ "password-hash", ] -[[package]] -name = "arrayref" -version = "0.3.9" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "76a2e8124351fda1ef8aaaa3bbd7ebbcb486bbcd4225aca0aa0d84bb2db8fecb" - [[package]] name = "arrayvec" version = "0.7.6" @@ -640,20 +634,6 @@ dependencies = [ "digest", ] -[[package]] -name = "blake3" -version = "1.8.3" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "2468ef7d57b3fb7e16b576e8377cdbde2320c60e1491e961d11da40fc4f02a2d" -dependencies = [ - "arrayref", - "arrayvec", - "cc", - "cfg-if", - "constant_time_eq 0.4.2", - "cpufeatures", -] - [[package]] name = "block-buffer" version = "0.10.4" @@ -857,278 +837,639 @@ source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "6b5271031022835ee8c7582fe67403bd6cb3d962095787af7921027234bab5bf" [[package]] -name = "castaway" -version = "0.2.4" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "dec551ab6e7578819132c713a93c022a05d60159dc86e7a7050223577484c55a" +name = "capsule-api" +version = "0.1.0" dependencies = [ - "rustversion", + "capsule-api-auth", + "capsule-api-environment", + "capsule-api-library", + "capsule-api-media", + "capsule-api-migration", + "capsule-api-sync", + "capsule-api-upload", + "clap", + "color-eyre", + "eyre", + "listenfd", + "salvo", + "sea-orm", + "serde", + "serde_json", + "tokio", + "tracing", + "tracing-subscriber", ] [[package]] -name = "cc" -version = "1.2.51" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "7a0aeaff4ff1a90589618835a598e545176939b97874f7abc7851caa0618f203" +name = "capsule-api-auth" +version = "0.1.0" dependencies = [ - "find-msvc-tools", - "jobserver", - "libc", - "shlex", + "argon2", + "async-trait", + "base64 0.22.1", + "bb8", + "bb8-redis", + "capsule-api-environment", + 
"capsule-api-migration", + "capsule-api-model", + "capsule-api-service", + "chrono", + "derive_more", + "eyre", + "jsonwebtoken", + "mime", + "nanoid", + "redis", + "reqwest", + "ring", + "salvo", + "sea-orm", + "sea-orm-migration", + "secrecy", + "serde", + "serde_json", + "strum 0.27.2", + "strum_macros 0.27.2", + "testcontainers", + "testcontainers-modules", + "thiserror 2.0.17", + "tokio", + "tonic-prost-build", + "totp-rs", + "tracing", + "tracing-subscriber", + "uuid", + "webauthn-rs", ] [[package]] -name = "cfg-if" -version = "1.0.4" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801" - -[[package]] -name = "cfg_aliases" -version = "0.2.1" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "613afe47fcd5fac7ccf1db93babcb082c5994d996f20b8b159f2ad1658eb5724" - -[[package]] -name = "chardetng" -version = "0.1.17" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "14b8f0b65b7b08ae3c8187e8d77174de20cb6777864c6b832d8ad365999cf1ea" +name = "capsule-api-entity" +version = "0.1.0" dependencies = [ - "cfg-if", - "encoding_rs", - "memchr", + "chrono", + "nanoid", + "sea-orm", + "serde_json", ] [[package]] -name = "chrono" -version = "0.4.42" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "145052bdd345b87320e369255277e3fb5152762ad123a901ef5c262dd38fe8d2" +name = "capsule-api-environment" +version = "0.1.0" dependencies = [ - "iana-time-zone", - "js-sys", - "num-traits", - "serde", - "wasm-bindgen", - "windows-link 0.2.1", + "base64 0.22.1", + "dotenvy", + "eyre", + "jsonwebtoken", + "ring", + "secrecy", + "thiserror 2.0.17", + "tracing", ] [[package]] -name = "ciborium" -version = "0.2.2" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "42e69ffd6f0917f5c029256a24d0161db17cea3997d185db0d35926308770f0e" +name = "capsule-api-library" +version = "0.1.0" dependencies = [ 
- "ciborium-io", - "ciborium-ll", + "argon2", + "async-graphql", + "base64 0.22.1", + "capsule-api-auth", + "capsule-api-entity", + "capsule-api-environment", + "capsule-api-model", + "capsule-api-service", + "chrono", + "eyre", + "futures-util", + "jsonwebtoken", + "nanoid", + "ring", + "salvo", + "sea-orm", + "secrecy", "serde", + "serde_json", + "thiserror 2.0.17", + "tokio", + "tracing", + "uuid", ] [[package]] -name = "ciborium-io" -version = "0.2.2" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "05afea1e0a06c9be33d539b876f1ce3692f4afea2cb41f740e7743225ed1c757" - -[[package]] -name = "ciborium-ll" -version = "0.2.2" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "57663b653d948a338bfb3eeba9bb2fd5fcfaecb9e199e87e1eda4d9e8b240fd9" +name = "capsule-api-media" +version = "0.1.0" dependencies = [ - "ciborium-io", - "half", + "capsule-api-auth", + "capsule-api-entity", + "capsule-api-environment", + "capsule-api-model", + "capsule-api-service", + "derive_more", + "eyre", + "jsonwebtoken", + "salvo", + "sea-orm", + "serde", + "thiserror 2.0.17", + "tokio", + "tracing", + "uuid", ] [[package]] -name = "cipher" -version = "0.4.4" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "773f3b9af64447d2ce9850330c473515014aa235e6a783b02db81ff39e4a3dad" +name = "capsule-api-migration" +version = "0.1.0" dependencies = [ - "crypto-common", - "inout", + "sea-orm-migration", + "tokio", ] [[package]] -name = "clap" -version = "4.5.53" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "c9e340e012a1bf4935f5282ed1436d1489548e8f72308207ea5df0e23d2d03f8" +name = "capsule-api-model" +version = "0.1.0" dependencies = [ - "clap_builder", - "clap_derive", + "argon2", + "capsule-api-entity", + "chrono", + "eyre", + "jsonwebtoken", + "redis", + "salvo", + "sea-orm", + "serde", + "serde_json", + "thiserror 2.0.17", + "tracing", + "uuid", ] [[package]] -name = "clap_builder" 
-version = "4.5.53" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "d76b5d13eaa18c901fd2f7fca939fefe3a0727a953561fefdf3b2922b8569d00" +name = "capsule-api-service" +version = "0.1.0" dependencies = [ - "anstream", - "anstyle", - "clap_lex", - "strsim", + "capsule-api-entity", + "capsule-api-model", + "capsule-core", + "chrono", + "data-encoding", + "nanoid", + "sea-orm", + "serde", + "serde_json", + "thiserror 2.0.17", + "tokio", + "uuid", ] [[package]] -name = "clap_derive" -version = "4.5.49" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "2a0b5487afeab2deb2ff4e03a807ad1a03ac532ff5a2cee5d86884440c7f7671" +name = "capsule-api-sync" +version = "0.1.0" dependencies = [ - "heck 0.5.0", - "proc-macro2", - "quote", - "syn 2.0.111", -] - -[[package]] -name = "clap_lex" -version = "0.7.6" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "a1d728cc89cf3aee9ff92b05e62b19ee65a02b5702cff7d5a377e32c6ae29d8d" + "capsule-api-auth", + "capsule-api-entity", + "capsule-api-environment", + "capsule-core", + "eyre", + "futures-util", + "http-body-util", + "jsonwebtoken", + "prost 0.14.3", + "prost-types 0.14.3", + "salvo", + "sea-orm", + "serde", + "sync_wrapper", + "thiserror 2.0.17", + "tokio", + "tokio-stream", + "tonic", + "tonic-health", + "tonic-prost", + "tonic-prost-build", + "tonic-types", + "tonic-web", + "tower", + "tracing", +] [[package]] -name = "cmake" -version = "0.1.57" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "75443c44cd6b379beb8c5b45d85d0773baf31cce901fe7bb252f4eff3008ef7d" +name = "capsule-api-testing" +version = "0.1.0" dependencies = [ - "cc", + "capsule-api-entity", + "capsule-api-migration", + "dotenvy", + "sea-orm", + "sea-orm-migration", + "thiserror 2.0.17", + "tokio", + "uuid", ] [[package]] -name = "color-eyre" -version = "0.6.5" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = 
"e5920befb47832a6d61ee3a3a846565cfa39b331331e68a3b1d1116630f2f26d" +name = "capsule-api-upload" +version = "0.1.0" dependencies = [ - "backtrace", - "color-spantrace", + "base64 0.22.1", + "bb8-redis", + "bytes", + "capsule-api-auth", + "capsule-api-entity", + "capsule-api-environment", + "capsule-api-model", + "capsule-api-service", + "capsule-core", + "capsule-media", + "chrono", "eyre", - "indenter", - "once_cell", - "owo-colors", - "tracing-error", + "futures-util", + "indexmap 2.12.1", + "jsonwebtoken", + "libc", + "nanoid", + "ring", + "salvo", + "sea-orm", + "secrecy", + "serde", + "serde_json", + "thiserror 2.0.17", + "tokio", + "tracing", + "uuid", ] [[package]] -name = "color-spantrace" -version = "0.3.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "b8b88ea9df13354b55bc7234ebcce36e6ef896aca2e42a15de9e10edce01b427" +name = "capsule-cli" +version = "0.1.0" dependencies = [ - "once_cell", - "owo-colors", - "tracing-core", - "tracing-error", + "base64 0.22.1", + "capitalize", + "capsule-cli-entity", + "capsule-cli-migration", + "capsule-core", + "chrono", + "clap", + "color-eyre", + "colored", + "dialoguer", + "directories", + "eyre", + "futures", + "humansize", + "indexmap 2.12.1", + "nanoid", + "sea-orm", + "serde", + "sysinfo", + "thiserror 2.0.17", + "tokio", + "tracing", + "tracing-subscriber", + "walkdir", ] [[package]] -name = "colorchoice" -version = "1.0.4" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "b05b61dc5112cbb17e4b6cd61790d9845d13888356391624cbe7e41efeac1e75" +name = "capsule-cli-entity" +version = "0.1.0" +dependencies = [ + "chrono", + "nanoid", + "sea-orm", + "serde", +] [[package]] -name = "colored" -version = "3.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "fde0e0ec90c9dfb3b4b1a0891a7dcd0e2bffde2f7efed5fe7c9bb00e5bfb915e" +name = "capsule-cli-migration" +version = "0.1.0" dependencies = [ - "windows-sys 0.59.0", + "sea-orm-migration", 
+ "tokio", ] [[package]] -name = "combine" -version = "4.6.7" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "ba5a308b75df32fe02788e748662718f03fde005016435c444eea572398219fd" +name = "capsule-core" +version = "0.1.0" dependencies = [ - "bytes", - "futures-core", - "memchr", - "pin-project-lite", - "tokio", - "tokio-util", + "chrono", + "ciborium", + "globset", + "hex", + "indexmap 2.12.1", + "kamadak-exif", + "log", + "rusqlite", + "serde", + "serde_json", + "sha2", + "tempfile", + "thiserror 2.0.17", + "tzf-rs", + "uuid", + "walkdir", ] [[package]] -name = "compact_str" -version = "0.9.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "3fdb1325a1cece981e8a296ab8f0f9b63ae357bd0784a9faaf548cc7b480707a" +name = "capsule-media" +version = "0.1.0" dependencies = [ - "castaway", - "cfg-if", - "itoa", - "rustversion", - "ryu", + "base64 0.22.1", + "chrono", + "file-format", + "indexmap 2.12.1", + "jpeg-encoder", + "memmap2", + "num-rational", "serde", - "static_assertions", + "thiserror 2.0.17", + "thumbhash", + "tokio", + "tracing", + "zune-core", + "zune-jpeg", ] [[package]] -name = "concurrent-queue" -version = "2.5.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "4ca0197aee26d1ae37445ee532fefce43251d24cc7c166799f4d46817f1d3973" +name = "capsule-sdk" +version = "0.1.0" dependencies = [ - "crossbeam-utils", + "chrono", + "log", + "progenitor", + "reqwest", + "serde", + "thiserror 2.0.17", + "tokio", + "uuid", ] [[package]] -name = "console" -version = "0.15.11" +name = "castaway" +version = "0.2.4" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "054ccb5b10f9f2cbf51eb355ca1d05c2d279ce1804688d0db74b4733a5aeafd8" +checksum = "dec551ab6e7578819132c713a93c022a05d60159dc86e7a7050223577484c55a" dependencies = [ - "encode_unicode", - "libc", - "once_cell", - "unicode-width", - "windows-sys 0.59.0", + "rustversion", ] [[package]] -name = "const-oid" 
-version = "0.9.6" +name = "cc" +version = "1.2.51" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "c2459377285ad874054d797f3ccebf984978aa39129f6eafde5cdc8315b612f8" +checksum = "7a0aeaff4ff1a90589618835a598e545176939b97874f7abc7851caa0618f203" +dependencies = [ + "find-msvc-tools", + "jobserver", + "libc", + "shlex", +] [[package]] -name = "constant_time_eq" -version = "0.3.1" +name = "cfg-if" +version = "1.0.4" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "7c74b8349d32d297c9134b8c88677813a227df8f779daa29bfc29c183fe3dca6" +checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801" [[package]] -name = "constant_time_eq" -version = "0.4.2" +name = "cfg_aliases" +version = "0.2.1" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "3d52eff69cd5e647efe296129160853a42795992097e8af39800e1060caeea9b" +checksum = "613afe47fcd5fac7ccf1db93babcb082c5994d996f20b8b159f2ad1658eb5724" [[package]] -name = "content_inspector" -version = "0.2.4" +name = "chardetng" +version = "0.1.17" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "b7bda66e858c683005a53a9a60c69a4aca7eeaa45d124526e389f7aec8e62f38" +checksum = "14b8f0b65b7b08ae3c8187e8d77174de20cb6777864c6b832d8ad365999cf1ea" dependencies = [ + "cfg-if", + "encoding_rs", "memchr", ] [[package]] -name = "cookie" -version = "0.18.1" +name = "chrono" +version = "0.4.42" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "4ddef33a339a91ea89fb53151bd0a4689cfce27055c291dfa69945475d22c747" +checksum = "145052bdd345b87320e369255277e3fb5152762ad123a901ef5c262dd38fe8d2" dependencies = [ - "aes-gcm", - "base64 0.22.1", - "hmac", + "iana-time-zone", + "js-sys", + "num-traits", + "serde", + "wasm-bindgen", + "windows-link 0.2.1", +] + +[[package]] +name = "ciborium" +version = "0.2.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"42e69ffd6f0917f5c029256a24d0161db17cea3997d185db0d35926308770f0e" +dependencies = [ + "ciborium-io", + "ciborium-ll", + "serde", +] + +[[package]] +name = "ciborium-io" +version = "0.2.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "05afea1e0a06c9be33d539b876f1ce3692f4afea2cb41f740e7743225ed1c757" + +[[package]] +name = "ciborium-ll" +version = "0.2.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "57663b653d948a338bfb3eeba9bb2fd5fcfaecb9e199e87e1eda4d9e8b240fd9" +dependencies = [ + "ciborium-io", + "half", +] + +[[package]] +name = "cipher" +version = "0.4.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "773f3b9af64447d2ce9850330c473515014aa235e6a783b02db81ff39e4a3dad" +dependencies = [ + "crypto-common", + "inout", +] + +[[package]] +name = "clap" +version = "4.5.53" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c9e340e012a1bf4935f5282ed1436d1489548e8f72308207ea5df0e23d2d03f8" +dependencies = [ + "clap_builder", + "clap_derive", +] + +[[package]] +name = "clap_builder" +version = "4.5.53" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d76b5d13eaa18c901fd2f7fca939fefe3a0727a953561fefdf3b2922b8569d00" +dependencies = [ + "anstream", + "anstyle", + "clap_lex", + "strsim", +] + +[[package]] +name = "clap_derive" +version = "4.5.49" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "2a0b5487afeab2deb2ff4e03a807ad1a03ac532ff5a2cee5d86884440c7f7671" +dependencies = [ + "heck 0.5.0", + "proc-macro2", + "quote", + "syn 2.0.111", +] + +[[package]] +name = "clap_lex" +version = "0.7.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a1d728cc89cf3aee9ff92b05e62b19ee65a02b5702cff7d5a377e32c6ae29d8d" + +[[package]] +name = "cmake" +version = "0.1.57" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"75443c44cd6b379beb8c5b45d85d0773baf31cce901fe7bb252f4eff3008ef7d" +dependencies = [ + "cc", +] + +[[package]] +name = "color-eyre" +version = "0.6.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e5920befb47832a6d61ee3a3a846565cfa39b331331e68a3b1d1116630f2f26d" +dependencies = [ + "backtrace", + "color-spantrace", + "eyre", + "indenter", + "once_cell", + "owo-colors", + "tracing-error", +] + +[[package]] +name = "color-spantrace" +version = "0.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b8b88ea9df13354b55bc7234ebcce36e6ef896aca2e42a15de9e10edce01b427" +dependencies = [ + "once_cell", + "owo-colors", + "tracing-core", + "tracing-error", +] + +[[package]] +name = "colorchoice" +version = "1.0.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b05b61dc5112cbb17e4b6cd61790d9845d13888356391624cbe7e41efeac1e75" + +[[package]] +name = "colored" +version = "3.0.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "fde0e0ec90c9dfb3b4b1a0891a7dcd0e2bffde2f7efed5fe7c9bb00e5bfb915e" +dependencies = [ + "windows-sys 0.59.0", +] + +[[package]] +name = "combine" +version = "4.6.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ba5a308b75df32fe02788e748662718f03fde005016435c444eea572398219fd" +dependencies = [ + "bytes", + "futures-core", + "memchr", + "pin-project-lite", + "tokio", + "tokio-util", +] + +[[package]] +name = "compact_str" +version = "0.9.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3fdb1325a1cece981e8a296ab8f0f9b63ae357bd0784a9faaf548cc7b480707a" +dependencies = [ + "castaway", + "cfg-if", + "itoa", + "rustversion", + "ryu", + "serde", + "static_assertions", +] + +[[package]] +name = "concurrent-queue" +version = "2.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4ca0197aee26d1ae37445ee532fefce43251d24cc7c166799f4d46817f1d3973" 
+dependencies = [ + "crossbeam-utils", +] + +[[package]] +name = "console" +version = "0.15.11" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "054ccb5b10f9f2cbf51eb355ca1d05c2d279ce1804688d0db74b4733a5aeafd8" +dependencies = [ + "encode_unicode", + "libc", + "once_cell", + "unicode-width", + "windows-sys 0.59.0", +] + +[[package]] +name = "const-oid" +version = "0.9.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c2459377285ad874054d797f3ccebf984978aa39129f6eafde5cdc8315b612f8" + +[[package]] +name = "constant_time_eq" +version = "0.3.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7c74b8349d32d297c9134b8c88677813a227df8f779daa29bfc29c183fe3dca6" + +[[package]] +name = "content_inspector" +version = "0.2.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b7bda66e858c683005a53a9a60c69a4aca7eeaa45d124526e389f7aec8e62f38" +dependencies = [ + "memchr", +] + +[[package]] +name = "cookie" +version = "0.18.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4ddef33a339a91ea89fb53151bd0a4689cfce27055c291dfa69945475d22c747" +dependencies = [ + "aes-gcm", + "base64 0.22.1", + "hmac", "percent-encoding", "rand 0.8.5", "sha2", @@ -3507,372 +3848,6 @@ dependencies = [ "futures-io", ] -[[package]] -name = "capsule-api" -version = "0.1.0" -dependencies = [ - "clap", - "color-eyre", - "eyre", - "listenfd", - "capsule-api-auth", - "capsule-api-environment", - "capsule-api-library", - "capsule-api-media", - "capsule-api-migration", - "capsule-api-sync", - "capsule-api-upload", - "salvo", - "sea-orm", - "serde", - "serde_json", - "tokio", - "tracing", - "tracing-subscriber", -] - -[[package]] -name = "capsule-api-auth" -version = "0.1.0" -dependencies = [ - "argon2", - "async-trait", - "base64 0.22.1", - "bb8", - "bb8-redis", - "chrono", - "derive_more", - "eyre", - "jsonwebtoken", - "mime", - "nanoid", - 
"capsule-api-environment", - "capsule-api-migration", - "capsule-api-model", - "capsule-api-service", - "redis", - "reqwest", - "ring", - "salvo", - "sea-orm", - "sea-orm-migration", - "secrecy", - "serde", - "serde_json", - "strum 0.27.2", - "strum_macros 0.27.2", - "testcontainers", - "testcontainers-modules", - "thiserror 2.0.17", - "tokio", - "tonic-prost-build", - "totp-rs", - "tracing", - "tracing-subscriber", - "uuid", - "webauthn-rs", -] - -[[package]] -name = "capsule-api-entity" -version = "0.1.0" -dependencies = [ - "chrono", - "nanoid", - "sea-orm", - "serde_json", -] - -[[package]] -name = "capsule-api-environment" -version = "0.1.0" -dependencies = [ - "base64 0.22.1", - "dotenvy", - "eyre", - "jsonwebtoken", - "ring", - "secrecy", - "thiserror 2.0.17", - "tracing", -] - -[[package]] -name = "capsule-api-library" -version = "0.1.0" -dependencies = [ - "argon2", - "async-graphql", - "base64 0.22.1", - "chrono", - "eyre", - "futures-util", - "jsonwebtoken", - "nanoid", - "capsule-api-auth", - "capsule-api-entity", - "capsule-api-environment", - "capsule-api-model", - "capsule-api-service", - "ring", - "salvo", - "sea-orm", - "secrecy", - "serde", - "serde_json", - "thiserror 2.0.17", - "tokio", - "tracing", - "uuid", -] - -[[package]] -name = "capsule-api-media" -version = "0.1.0" -dependencies = [ - "derive_more", - "eyre", - "jsonwebtoken", - "capsule-api-auth", - "capsule-api-entity", - "capsule-api-environment", - "capsule-api-model", - "capsule-api-service", - "salvo", - "sea-orm", - "serde", - "thiserror 2.0.17", - "tokio", - "tracing", - "uuid", -] - -[[package]] -name = "capsule-api-migration" -version = "0.1.0" -dependencies = [ - "sea-orm-migration", - "tokio", -] - -[[package]] -name = "capsule-api-model" -version = "0.1.0" -dependencies = [ - "argon2", - "chrono", - "eyre", - "jsonwebtoken", - "capsule-api-entity", - "redis", - "salvo", - "sea-orm", - "serde", - "serde_json", - "thiserror 2.0.17", - "tracing", - "uuid", -] - -[[package]] 
-name = "capsule-api-service" -version = "0.1.0" -dependencies = [ - "chrono", - "data-encoding", - "nanoid", - "capsule-api-entity", - "capsule-api-model", - "capsule-core", - "sea-orm", - "serde", - "serde_json", - "thiserror 2.0.17", - "tokio", - "uuid", -] - -[[package]] -name = "capsule-api-sync" -version = "0.1.0" -dependencies = [ - "eyre", - "futures-util", - "http-body-util", - "jsonwebtoken", - "capsule-api-auth", - "capsule-api-entity", - "capsule-api-environment", - "capsule-core", - "prost 0.14.3", - "prost-types 0.14.3", - "salvo", - "sea-orm", - "serde", - "sync_wrapper", - "thiserror 2.0.17", - "tokio", - "tokio-stream", - "tonic", - "tonic-health", - "tonic-prost", - "tonic-prost-build", - "tonic-types", - "tonic-web", - "tower", - "tracing", -] - -[[package]] -name = "capsule-api-testing" -version = "0.1.0" -dependencies = [ - "dotenvy", - "capsule-api-entity", - "capsule-api-migration", - "sea-orm", - "sea-orm-migration", - "thiserror 2.0.17", - "tokio", - "uuid", -] - -[[package]] -name = "capsule-api-upload" -version = "0.1.0" -dependencies = [ - "base64 0.22.1", - "bb8-redis", - "bytes", - "chrono", - "eyre", - "futures-util", - "indexmap 2.12.1", - "jsonwebtoken", - "libc", - "nanoid", - "capsule-api-auth", - "capsule-api-entity", - "capsule-api-environment", - "capsule-api-model", - "capsule-api-service", - "capsule-core", - "capsule-media", - "ring", - "salvo", - "sea-orm", - "secrecy", - "serde", - "serde_json", - "thiserror 2.0.17", - "tokio", - "tracing", - "uuid", -] - -[[package]] -name = "capsule-cli" -version = "0.1.0" -dependencies = [ - "base64 0.22.1", - "capitalize", - "chrono", - "clap", - "color-eyre", - "colored", - "dialoguer", - "directories", - "eyre", - "futures", - "humansize", - "indexmap 2.12.1", - "nanoid", - "capsule-cli-entity", - "capsule-cli-migration", - "capsule-core", - "sea-orm", - "serde", - "sysinfo", - "thiserror 2.0.17", - "tokio", - "tracing", - "tracing-subscriber", - "walkdir", -] - -[[package]] -name = 
"capsule-cli-entity" -version = "0.1.0" -dependencies = [ - "chrono", - "nanoid", - "sea-orm", - "serde", -] - -[[package]] -name = "capsule-cli-migration" -version = "0.1.0" -dependencies = [ - "sea-orm-migration", - "tokio", -] - -[[package]] -name = "capsule-core" -version = "0.1.0" -dependencies = [ - "blake3", - "chrono", - "ciborium", - "globset", - "indexmap 2.12.1", - "kamadak-exif", - "log", - "rusqlite", - "serde", - "serde_json", - "tempfile", - "thiserror 2.0.17", - "tzf-rs", - "uuid", - "walkdir", -] - -[[package]] -name = "capsule-media" -version = "0.1.0" -dependencies = [ - "base64 0.22.1", - "chrono", - "file-format", - "indexmap 2.12.1", - "jpeg-encoder", - "memmap2", - "num-rational", - "serde", - "thiserror 2.0.17", - "thumbhash", - "tokio", - "tracing", - "zune-core", - "zune-jpeg", -] - -[[package]] -name = "capsule-sdk" -version = "0.1.0" -dependencies = [ - "chrono", - "log", - "progenitor", - "reqwest", - "serde", - "thiserror 2.0.17", - "tokio", - "uuid", -] - [[package]] name = "pkcs1" version = "0.7.5" @@ -6391,7 +6366,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "f124352108f58ef88299e909f6e9470f1cdc8d2a1397963901b4a6366206bf72" dependencies = [ "base32", - "constant_time_eq 0.3.1", + "constant_time_eq", "hmac", "rand 0.9.2", "sha1", diff --git a/Cargo.toml b/Cargo.toml index eda343b..d4ce81a 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -98,7 +98,8 @@ uuid = { version = "1.19.0", features = ["v4"] } tower-http = { version = "0.6.8", features = ["cors"] } tracing = "0.1.43" tracing-subscriber = { version = "0.3.18", features = ["env-filter", "json"] } -blake3 = "1" +sha2 = "0.10" +hex = "0.4" [profile.release] lto = "thin" diff --git a/SECURITY.md b/SECURITY.md index 8c34697..12a2886 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -1,6 +1,6 @@ # Security Policy -Security is paramount when Pixles handles your data. 
Regardless of the number of active decisions made to protect you, we appreciate any support to do better. +Security is paramount when Capsule handles your data. Regardless of the number of active decisions made to protect you, we appreciate any support to do better. ## Supported Versions @@ -8,6 +8,6 @@ We will always ensure the latest major.minor version receives updates. Minor ver ## Reporting a Vulnerability -Click [here](https://github.com/justin13888/Pixles/security/advisories/new) to report a vulnerability. +Click [here](https://github.com/Capsulsaur/Capsule/security/advisories/new) to report a vulnerability. See [this](https://docs.github.com/en/code-security/security-advisories/guidance-on-reporting-and-writing-information-about-vulnerabilities/privately-reporting-a-security-vulnerability) for more info. diff --git a/capsule-api/entity/src/asset.rs b/capsule-api/entity/src/asset.rs index 01a02f4..bb52240 100644 --- a/capsule-api/entity/src/asset.rs +++ b/capsule-api/entity/src/asset.rs @@ -25,7 +25,7 @@ pub struct Model { pub original_filename: String, /// File size in bytes pub file_size: i64, - /// BLAKE3 hash of the file content (64-char lowercase hex) + /// SHA-256 hash of the file content (64-char lowercase hex) #[sea_orm(column_type = "String(StringLen::N(64))")] pub file_hash: String, /// MIME type diff --git a/capsule-api/migration/src/lib.rs b/capsule-api/migration/src/lib.rs index 60b1737..5e096a4 100644 --- a/capsule-api/migration/src/lib.rs +++ b/capsule-api/migration/src/lib.rs @@ -2,7 +2,7 @@ pub use sea_orm_migration::prelude::*; mod m20250210_000000_initial_schema; mod m20250302_000000_add_registered_via; -mod m20260322_000000_change_file_hash_to_blake3; +mod m20260322_000000_change_file_hash_to_sha256; pub struct Migrator; @@ -12,7 +12,7 @@ impl MigratorTrait for Migrator { vec![ Box::new(m20250210_000000_initial_schema::Migration), Box::new(m20250302_000000_add_registered_via::Migration), - 
Box::new(m20260322_000000_change_file_hash_to_blake3::Migration), + Box::new(m20260322_000000_change_file_hash_to_sha256::Migration), ] } } diff --git a/capsule-api/migration/src/m20260322_000000_change_file_hash_to_blake3.rs b/capsule-api/migration/src/m20260322_000000_change_file_hash_to_sha256.rs similarity index 98% rename from capsule-api/migration/src/m20260322_000000_change_file_hash_to_blake3.rs rename to capsule-api/migration/src/m20260322_000000_change_file_hash_to_sha256.rs index a247daa..6999d09 100644 --- a/capsule-api/migration/src/m20260322_000000_change_file_hash_to_blake3.rs +++ b/capsule-api/migration/src/m20260322_000000_change_file_hash_to_sha256.rs @@ -7,7 +7,7 @@ pub struct Migration; #[async_trait::async_trait] impl MigrationTrait for Migration { async fn up(&self, manager: &SchemaManager) -> Result<(), DbErr> { - // Drop the old BigInt column and add a VARCHAR(64) column for BLAKE3 hex hashes. + // Drop the old BigInt column and add a VARCHAR(64) column for SHA-256 hex hashes. // Existing hash values are incompatible (different algorithm), so data loss is expected. 
manager .alter_table( diff --git a/capsule-api/upload/src/models/requests.rs b/capsule-api/upload/src/models/requests.rs index f130b0d..9d24cb3 100644 --- a/capsule-api/upload/src/models/requests.rs +++ b/capsule-api/upload/src/models/requests.rs @@ -9,7 +9,7 @@ pub struct CreateUploadRequest { pub filename: String, /// File size in bytes pub size: u64, - /// BLAKE3 hash of the complete file (64-char lowercase hex) + /// SHA-256 hash of the complete file (64-char lowercase hex) pub hash: String, /// MIME type (e.g., "image/jpeg") pub content_type: String, diff --git a/capsule-api/upload/src/models/session.rs b/capsule-api/upload/src/models/session.rs index a0352f9..1e3c921 100644 --- a/capsule-api/upload/src/models/session.rs +++ b/capsule-api/upload/src/models/session.rs @@ -17,7 +17,7 @@ pub struct UploadSession { pub album_id: Option, /// Content type of the file being uploaded pub content_type: Option, - /// Expected BLAKE3 hash for verification on finalize (64-char lowercase hex) + /// Expected SHA-256 hash for verification on finalize (64-char lowercase hex) pub expected_hash: String, // Upload state diff --git a/capsule-core/Cargo.toml b/capsule-core/Cargo.toml index e939998..7de9050 100644 --- a/capsule-core/Cargo.toml +++ b/capsule-core/Cargo.toml @@ -18,7 +18,8 @@ thiserror = { workspace = true } tzf-rs = "0.4" uuid = { workspace = true, features = ["v7", "serde"] } walkdir = "2" -blake3 = { workspace = true } +sha2 = { workspace = true } +hex = { workspace = true } [dev-dependencies] tempfile = "3" diff --git a/capsule-core/src/db/driver.rs b/capsule-core/src/db/driver.rs index 8993102..9af410e 100644 --- a/capsule-core/src/db/driver.rs +++ b/capsule-core/src/db/driver.rs @@ -41,12 +41,12 @@ impl DatabaseDriver { pub fn insert_asset(&self, row: &AssetRow) -> Result<(), rusqlite::Error> { self.conn.execute( "INSERT INTO assets (uuid, asset_type, capture_timestamp, capture_utc, capture_tz_source, - import_timestamp, hash_blake3, width, height, duration_ms, 
stack_id, is_stack_hidden, + import_timestamp, hash_sha256, width, height, duration_ms, stack_id, is_stack_hidden, chromahash, dominant_color, album_id, rating, is_deleted, deleted_at) VALUES (?1,?2,?3,?4,?5,?6,?7,?8,?9,?10,?11,?12,?13,?14,?15,?16,?17,?18)", params![ row.uuid, row.asset_type, row.capture_timestamp, row.capture_utc, - row.capture_tz_source, row.import_timestamp, row.hash_blake3, + row.capture_tz_source, row.import_timestamp, row.hash_sha256, row.width, row.height, row.duration_ms, row.stack_id, row.is_stack_hidden as i64, row.chromahash, row.dominant_color, row.album_id, row.rating, row.is_deleted as i64, row.deleted_at, @@ -58,12 +58,12 @@ impl DatabaseDriver { pub fn upsert_asset(&self, row: &AssetRow) -> Result<(), rusqlite::Error> { self.conn.execute( "INSERT OR REPLACE INTO assets (uuid, asset_type, capture_timestamp, capture_utc, capture_tz_source, - import_timestamp, hash_blake3, width, height, duration_ms, stack_id, is_stack_hidden, + import_timestamp, hash_sha256, width, height, duration_ms, stack_id, is_stack_hidden, chromahash, dominant_color, album_id, rating, is_deleted, deleted_at) VALUES (?1,?2,?3,?4,?5,?6,?7,?8,?9,?10,?11,?12,?13,?14,?15,?16,?17,?18)", params![ row.uuid, row.asset_type, row.capture_timestamp, row.capture_utc, - row.capture_tz_source, row.import_timestamp, row.hash_blake3, + row.capture_tz_source, row.import_timestamp, row.hash_sha256, row.width, row.height, row.duration_ms, row.stack_id, row.is_stack_hidden as i64, row.chromahash, row.dominant_color, row.album_id, row.rating, row.is_deleted as i64, row.deleted_at, @@ -75,7 +75,7 @@ impl DatabaseDriver { pub fn find_by_uuid(&self, uuid: &str) -> Result, rusqlite::Error> { let mut stmt = self.conn.prepare( "SELECT uuid, asset_type, capture_timestamp, capture_utc, capture_tz_source, - import_timestamp, hash_blake3, width, height, duration_ms, stack_id, is_stack_hidden, + import_timestamp, hash_sha256, width, height, duration_ms, stack_id, is_stack_hidden, chromahash, 
dominant_color, album_id, rating, is_deleted, deleted_at FROM assets WHERE uuid = ?1 LIMIT 1", )?; @@ -89,9 +89,9 @@ impl DatabaseDriver { pub fn find_by_hash(&self, hash: &str) -> Result, rusqlite::Error> { let mut stmt = self.conn.prepare( "SELECT uuid, asset_type, capture_timestamp, capture_utc, capture_tz_source, - import_timestamp, hash_blake3, width, height, duration_ms, stack_id, is_stack_hidden, + import_timestamp, hash_sha256, width, height, duration_ms, stack_id, is_stack_hidden, chromahash, dominant_color, album_id, rating, is_deleted, deleted_at - FROM assets WHERE hash_blake3 = ?1 LIMIT 1", + FROM assets WHERE hash_sha256 = ?1 LIMIT 1", )?; let mut rows = stmt.query_map(params![hash], map_asset_row)?; match rows.next() { @@ -107,7 +107,7 @@ impl DatabaseDriver { ) -> Result, rusqlite::Error> { let mut stmt = self.conn.prepare( "SELECT uuid, asset_type, capture_timestamp, capture_utc, capture_tz_source, - import_timestamp, hash_blake3, width, height, duration_ms, stack_id, is_stack_hidden, + import_timestamp, hash_sha256, width, height, duration_ms, stack_id, is_stack_hidden, chromahash, dominant_color, album_id, rating, is_deleted, deleted_at FROM assets WHERE is_deleted = 0 AND is_stack_hidden = 0 @@ -229,7 +229,7 @@ impl DatabaseDriver { let threshold = now_secs() - older_than_secs; let mut stmt = self.conn.prepare( "SELECT uuid, asset_type, capture_timestamp, capture_utc, capture_tz_source, - import_timestamp, hash_blake3, width, height, duration_ms, stack_id, is_stack_hidden, + import_timestamp, hash_sha256, width, height, duration_ms, stack_id, is_stack_hidden, chromahash, dominant_color, album_id, rating, is_deleted, deleted_at FROM assets WHERE is_deleted = 1 AND deleted_at IS NOT NULL AND deleted_at < ?1", )?; @@ -253,7 +253,7 @@ fn map_asset_row(row: &rusqlite::Row<'_>) -> rusqlite::Result { capture_utc: row.get(3)?, capture_tz_source: row.get(4)?, import_timestamp: row.get(5)?, - hash_blake3: row.get(6)?, + hash_sha256: row.get(6)?, width: 
row.get(7)?, height: row.get(8)?, duration_ms: row.get(9)?, @@ -281,7 +281,7 @@ mod tests { capture_utc: Some(1719997200), capture_tz_source: Some("offset_exif".to_string()), import_timestamp: 1720000000, - hash_blake3: hash.to_string(), + hash_sha256: hash.to_string(), width: Some(4032), height: Some(3024), duration_ms: None, diff --git a/capsule-core/src/db/rows.rs b/capsule-core/src/db/rows.rs index 91e997b..d0f89d1 100644 --- a/capsule-core/src/db/rows.rs +++ b/capsule-core/src/db/rows.rs @@ -6,7 +6,7 @@ pub struct AssetRow { pub capture_utc: Option, pub capture_tz_source: Option, pub import_timestamp: i64, - pub hash_blake3: String, + pub hash_sha256: String, pub width: Option, pub height: Option, pub duration_ms: Option, diff --git a/capsule-core/src/db/schema.rs b/capsule-core/src/db/schema.rs index add6091..200aa69 100644 --- a/capsule-core/src/db/schema.rs +++ b/capsule-core/src/db/schema.rs @@ -10,7 +10,7 @@ CREATE TABLE IF NOT EXISTS assets ( capture_utc INTEGER, capture_tz_source TEXT, import_timestamp INTEGER NOT NULL, - hash_blake3 TEXT NOT NULL, + hash_sha256 TEXT NOT NULL, width INTEGER, height INTEGER, duration_ms INTEGER, @@ -51,7 +51,7 @@ CREATE TABLE IF NOT EXISTS asset_tags ( PRIMARY KEY (uuid, tag) ); -CREATE INDEX IF NOT EXISTS idx_assets_hash ON assets(hash_blake3); +CREATE INDEX IF NOT EXISTS idx_assets_hash ON assets(hash_sha256); CREATE INDEX IF NOT EXISTS idx_assets_utc ON assets(capture_utc, capture_timestamp); CREATE INDEX IF NOT EXISTS idx_assets_deleted ON assets(is_deleted); CREATE INDEX IF NOT EXISTS idx_assets_album ON assets(album_id); diff --git a/capsule-core/src/import/executor.rs b/capsule-core/src/import/executor.rs index 242df9a..cab941d 100644 --- a/capsule-core/src/import/executor.rs +++ b/capsule-core/src/import/executor.rs @@ -170,7 +170,7 @@ fn execute_candidate( capture_utc: commit.capture_utc, capture_tz_source: commit.capture_tz_source.clone(), import_timestamp: now, - hash_blake3: commit.hash.clone(), + 
hash_sha256: commit.hash.clone(), width: commit.width.map(|w| w as i64), height: commit.height.map(|h| h as i64), duration_ms: None, @@ -265,11 +265,11 @@ fn commit_member( } })?; - // Step 5: BLAKE3 verify + // Step 5: SHA-256 verify let source_bytes = fs::read(source).map_err(|e| format!("read failed: {e}"))?; - let source_hash = blake3::hash(&source_bytes).to_hex().to_string(); + let source_hash = crate::utils::hash::hash_bytes(&source_bytes); let tmp_bytes = fs::read(&tmp_media).map_err(|e| format!("read tmp failed: {e}"))?; - let tmp_hash = blake3::hash(&tmp_bytes).to_hex().to_string(); + let tmp_hash = crate::utils::hash::hash_bytes(&tmp_bytes); if source_hash != tmp_hash { let _ = fs::remove_file(&tmp_media); return Err("corrupt_transfer".to_string()); @@ -301,7 +301,7 @@ fn commit_member( original_filename, import_timestamp: now, modified_timestamp: now, - hash_blake3: source_hash.clone(), + hash_sha256: source_hash.clone(), file_size: source_bytes.len() as u64, is_deleted: false, rating: 0, @@ -472,8 +472,8 @@ mod tests { // This is hard to simulate with real fs::copy, so we test the hash comparison logic. let src_bytes = b"source content"; let tmp_bytes = b"different content"; // simulates corruption - let src_hash = blake3::hash(src_bytes).to_hex().to_string(); - let tmp_hash = blake3::hash(tmp_bytes).to_hex().to_string(); + let src_hash = crate::utils::hash::hash_bytes(src_bytes); + let tmp_hash = crate::utils::hash::hash_bytes(tmp_bytes); assert_ne!( src_hash, tmp_hash, "hashes should differ for corrupt transfer test" diff --git a/capsule-core/src/import/planner.rs b/capsule-core/src/import/planner.rs index 391514b..42b0ce2 100644 --- a/capsule-core/src/import/planner.rs +++ b/capsule-core/src/import/planner.rs @@ -9,7 +9,7 @@ use crate::import::scan::{ImportCandidate, ScanResult}; pub struct ImportConfig { pub import_mode: ImportMode, pub target_album_id: Option, - /// If true, import even if a file with the same BLAKE3 hash already exists. 
+ /// If true, import even if a file with the same SHA-256 hash already exists. pub force_reimport_duplicates: bool, } @@ -49,7 +49,7 @@ pub struct ImportActionPlan { /// Phase 2 — decide what to do with each candidate from the scan. /// -/// BLAKE3-hashes the primary member of each candidate and checks the DB for +/// SHA-256-hashes the primary member of each candidate and checks the DB for /// duplicates. Returns an `ImportActionPlan` with per-candidate decisions. pub fn plan( scan: &ScanResult, @@ -110,8 +110,7 @@ fn decide( fn hash_file(path: &Path) -> Result { let bytes = std::fs::read(path)?; - let hash = blake3::hash(&bytes); - Ok(hash.to_hex().to_string()) + Ok(crate::utils::hash::hash_bytes(&bytes)) } // ── Tests ──────────────────────────────────────────────────────────────────── @@ -155,7 +154,7 @@ mod tests { // Write a file and pre-insert its hash let content = b"unique_photo_content"; fs::write(tmp.path().join("photo.jpg"), content).unwrap(); - let hash = blake3::hash(content).to_hex().to_string(); + let hash = crate::utils::hash::hash_bytes(content); let row = crate::db::rows::AssetRow { uuid: "existing-uuid".to_string(), @@ -164,7 +163,7 @@ mod tests { capture_utc: None, capture_tz_source: None, import_timestamp: 1, - hash_blake3: hash, + hash_sha256: hash, width: None, height: None, duration_ms: None, @@ -197,7 +196,7 @@ mod tests { let content = b"reimport_me"; fs::write(tmp.path().join("photo.jpg"), content).unwrap(); - let hash = blake3::hash(content).to_hex().to_string(); + let hash = crate::utils::hash::hash_bytes(content); let row = crate::db::rows::AssetRow { uuid: "existing-uuid2".to_string(), @@ -206,7 +205,7 @@ mod tests { capture_utc: None, capture_tz_source: None, import_timestamp: 1, - hash_blake3: hash, + hash_sha256: hash, width: None, height: None, duration_ms: None, diff --git a/capsule-core/src/import/upload.rs b/capsule-core/src/import/upload.rs index c171742..99bd973 100644 --- a/capsule-core/src/import/upload.rs +++ 
b/capsule-core/src/import/upload.rs @@ -1,5 +1,5 @@ // Related documentations: -// - https://capsule.justinchung.net/design/import-prioritization/ +// - https://capsule.justinchung.net/design/upload/ use std::{ collections::{HashMap, HashSet}, diff --git a/capsule-core/src/library/rebuild.rs b/capsule-core/src/library/rebuild.rs index a2f7be0..3d9bc60 100644 --- a/capsule-core/src/library/rebuild.rs +++ b/capsule-core/src/library/rebuild.rs @@ -124,7 +124,7 @@ fn asset_row_from_sidecar(s: &crate::sidecar::AssetSidecar) -> AssetRow { capture_utc: s.capture_utc, capture_tz_source: s.capture_tz_source.map(|c| tz_source_str(c).to_string()), import_timestamp: s.import_timestamp, - hash_blake3: s.hash_blake3.clone(), + hash_sha256: s.hash_sha256.clone(), width: s.width.map(|w| w as i64), height: s.height.map(|h| h as i64), duration_ms: s.duration_ms.map(|d| d as i64), @@ -224,7 +224,7 @@ mod tests { original_filename: format!("{uuid}.jpg"), import_timestamp: 1720000000, modified_timestamp: 1720000000, - hash_blake3: hash.to_string(), + hash_sha256: hash.to_string(), file_size: 1024, is_deleted: false, rating: 0, diff --git a/capsule-core/src/library/trash.rs b/capsule-core/src/library/trash.rs index 19d9416..cdad53b 100644 --- a/capsule-core/src/library/trash.rs +++ b/capsule-core/src/library/trash.rs @@ -126,7 +126,7 @@ mod tests { original_filename: format!("{uuid}.jpg"), import_timestamp: 1720000000, modified_timestamp: 1720000000, - hash_blake3: hash.to_string(), + hash_sha256: hash.to_string(), file_size: 1024, is_deleted: false, rating: 0, @@ -181,7 +181,7 @@ mod tests { capture_utc: None, capture_tz_source: None, import_timestamp: 1720000000, - hash_blake3: hash.clone(), + hash_sha256: hash.clone(), width: None, height: None, duration_ms: None, @@ -231,7 +231,7 @@ mod tests { capture_utc: None, capture_tz_source: None, import_timestamp: 1000, - hash_blake3: "e".repeat(64), + hash_sha256: "e".repeat(64), width: None, height: None, duration_ms: None, diff --git 
a/capsule-core/src/metadata/file.rs b/capsule-core/src/metadata/file.rs index a283840..e184c2e 100644 --- a/capsule-core/src/metadata/file.rs +++ b/capsule-core/src/metadata/file.rs @@ -31,8 +31,8 @@ impl Deref for HashData { #[derive(Debug, Clone, Serialize, Deserialize)] pub struct FileMetadata { - /// BLAKE3 hash (64-char lowercase hex) - pub hash_blake3: HashData, + /// SHA-256 hash (64-char lowercase hex) + pub hash_sha256: HashData, /// File size in bytes pub size: u64, // /// Media type if available @@ -85,7 +85,7 @@ impl FileMetadata { // let media_type = ...; Ok(FileMetadata { - hash_blake3: hash.into(), + hash_sha256: hash.into(), size, // media_type, original_filename, diff --git a/capsule-core/src/sidecar/asset_sidecar.rs b/capsule-core/src/sidecar/asset_sidecar.rs index cc2d843..360024c 100644 --- a/capsule-core/src/sidecar/asset_sidecar.rs +++ b/capsule-core/src/sidecar/asset_sidecar.rs @@ -16,7 +16,7 @@ pub struct AssetSidecar { pub original_filename: String, pub import_timestamp: i64, pub modified_timestamp: i64, - pub hash_blake3: String, + pub hash_sha256: String, pub file_size: u64, pub is_deleted: bool, pub rating: u8, @@ -87,7 +87,7 @@ impl Serialize for AssetSidecar { insert!("original_filename", self.original_filename); insert!("import_timestamp", self.import_timestamp); insert!("modified_timestamp", self.modified_timestamp); - insert!("hash_blake3", self.hash_blake3); + insert!("hash_sha256", self.hash_sha256); insert!("file_size", self.file_size); insert!("is_deleted", self.is_deleted); insert!("rating", self.rating); @@ -162,7 +162,7 @@ impl<'de> Deserialize<'de> for AssetSidecar { let original_filename = req!("original_filename", String); let import_timestamp = req!("import_timestamp", i64); let modified_timestamp = req!("modified_timestamp", i64); - let hash_blake3 = req!("hash_blake3", String); + let hash_sha256 = req!("hash_sha256", String); let file_size = req!("file_size", u64); let is_deleted = req!("is_deleted", bool); let rating = 
req!("rating", u8); @@ -197,7 +197,7 @@ impl<'de> Deserialize<'de> for AssetSidecar { original_filename, import_timestamp, modified_timestamp, - hash_blake3, + hash_sha256, file_size, is_deleted, rating, @@ -241,7 +241,7 @@ mod tests { original_filename: "IMG_1234.jpg".to_string(), import_timestamp: 1720000000, modified_timestamp: 1720000000, - hash_blake3: "a".repeat(64), + hash_sha256: "a".repeat(64), file_size: 1024 * 1024, is_deleted: false, rating: 0, diff --git a/capsule-core/src/sidecar/io.rs b/capsule-core/src/sidecar/io.rs index f1fd058..bbe5a48 100644 --- a/capsule-core/src/sidecar/io.rs +++ b/capsule-core/src/sidecar/io.rs @@ -95,7 +95,7 @@ mod tests { original_filename: "IMG_1234.jpg".to_string(), import_timestamp: 1720000000, modified_timestamp: 1720000000, - hash_blake3: "a".repeat(64), + hash_sha256: "a".repeat(64), file_size: 1024, is_deleted: false, rating: 0, diff --git a/capsule-core/src/utils/hash.rs b/capsule-core/src/utils/hash.rs index 098f57c..b47e12f 100644 --- a/capsule-core/src/utils/hash.rs +++ b/capsule-core/src/utils/hash.rs @@ -1,8 +1,15 @@ use std::{fs, io, path::Path}; -/// Get BLAKE3 hash of a file as a 64-char lowercase hex string. +use sha2::{Digest, Sha256}; + +/// SHA-256 hash of a byte slice as a 64-char lowercase hex string. +pub fn hash_bytes(bytes: &[u8]) -> String { + hex::encode(Sha256::digest(bytes)) +} + +/// Get SHA-256 hash of a file as a 64-char lowercase hex string. 
// TODO: switch to streaming version for large files pub fn get_file_hash(path: &Path) -> io::Result<String> { let bytes = fs::read(path)?; - Ok(blake3::hash(&bytes).to_hex().to_string()) + Ok(hash_bytes(&bytes)) } diff --git a/capsule-docs/src/content/docs/design/ai.md b/capsule-docs/src/content/docs/design/ai.md index 23e8fd7..4adb697 100644 --- a/capsule-docs/src/content/docs/design/ai.md +++ b/capsule-docs/src/content/docs/design/ai.md @@ -1,119 +1,67 @@ --- title: AI/ML Integrations in Capsule -description: How do AI features fit into Capsule' architecture and design principles? +description: How do AI features fit into Capsule's architecture and design principles? --- - +> **Status:** Details below are **provisional** pending experimentation. The structure of categories, the namespace separation in [AI Output Containment](#ai-output-containment), and the canonical-model invariant from [ML Models — Embedding Provenance](/design/ml-models/#embedding-provenance) are stable; the specific feature list and per-feature behavior may evolve. -## System Architecture Overview +Capsule runs a hierarchy of ML models for various tasks. The E2E nature of Capsule's architecture requires careful consideration of device capabilities and latency requirements for different features. We broadly categorize the AI/ML processing into three functions: -The platform utilizes an asynchronous, event-driven microservice architecture designed to handle high-throughput ingestion of RAW photos and 4K+ video. +- **[Semantic Indexing](#semantic-indexing):** Generate a *global* embedding for each asset to enable natural language search and similarity search. +- **[Dense Tagging](#dense-tagging):** Generate *local* embeddings for objects, faces, and background elements to enable granular search and auto-album generation. +- **[Quality Assessment](#quality-assessment):** Generate quality scores for each asset to enable quality-based filtering and sorting.
-* **API Gateway & Core Logic:** Rust (Axum/Actix-web) for maximum throughput and memory safety. -* **Source of Truth:** PostgreSQL. -* **Vector Database:** PostgreSQL with the `pgvector` extension for storing and querying ML embeddings. -* **Message Broker & Caching:** Valkey (Stream data structures for event queuing). -* **Object Storage:** S3-compatible store (MinIO/AWS S3) for original files and generated proxies. -* **AI Inference Workers:** Python/C++ microservices running models optimized with TensorRT, ONNX Runtime, or vLLM. +Additional AI/ML categories may be added; the canonical inventory is [ML Models](/design/ml-models/). -## The Complete ML Pipeline +## AI Output Containment -The pipeline is split into a synchronous "Fast Path" for immediate user feedback and an asynchronous "AI Path" for deep indexing. +AI inference can be wrong, biased, or hallucinatory. A core design rule prevents AI output from corrupting user intent: **AI outputs land in a separate namespace from user-authored metadata, structurally, not by policy.** -### Phase 1: Ingestion & Fast Path +- AI-suggested tags live in `tags_ai` (a separate OR-set from `tags_user`) — see [Metadata — Tag Provenance and Namespacing](/design/metadata/#tag-provenance-and-namespacing). An AI tag can never overwrite a user tag because they are different fields. +- AI-derived face identities, scene labels, and quality scores live in distinct sidecar fields (e.g. `ai_face_labels`, `ai_scene`, `ai_quality_score`) that the user does not directly edit; user corrections write to *user* fields and AI re-runs leave the user fields alone. +- Every AI output entry carries `model_id` and `model_version` (see [ML Models — Embedding Provenance](/design/ml-models/#embedding-provenance)). When the canonical model for that slot changes, old AI outputs are flagged as stale and excluded from queries until regenerated. 
+- Promoting an AI tag to a user tag is an explicit, signed lifecycle operation — never automatic, never silent. See [Authorization — The Closed Action Set](/design/authorization/#the-closed-action-set). -1. **Upload:** Client pushes the media file to the Object Store and notifies the Rust API. -2. **Metadata Extraction:** Rust extracts EXIF/IPTC data (f-stop, shutter speed, camera model, GPS). -3. **Deterministic Deduplication:** Rust calculates a file hash (XXH3) and a Perceptual Hash (pHash) for photos. -4. **Proxy Generation:** Rust generates web-optimized proxies and thumbnails. -5. **Event Dispatch:** A message (e.g., `media_ready_for_ai: {media_id: uuid}`) is pushed to a Valkey Stream. +A hallucinating model can pollute its own namespace; it cannot pollute user intent. This is the structural defense against the "AI mistake silently overwrites user-authored data" damage class — see [Threat Model — Forbidden Client Behaviors](/design/threat-model/#forbidden-client-behaviors). -### Phase 2: AI Processing (Asynchronous) +## Semantic Indexing -Worker nodes consume the Valkey stream and process the media in parallel: +To do semantic search, you convert an image and a text query into arrays of numbers (vectors) and measure the distance between them. Every embedding model maps the universe differently, and Capsule is end-to-end encrypted, so every device must run the *same* embedding model — vectors are otherwise incomparable across devices. The canonical model for this slot is declared in [ML Models](/design/ml-models/) (see the **Semantic Search** row). -1. **Embedding Generation:** The image is passed through a vision encoder to create a global semantic vector. -2. **Dense Tagging & OCR:** The image is analyzed for granular objects, background elements, and text. -3. **Biometric Pipeline:** Faces are detected, aligned, cropped, and embedded. Bodies are detected and embedded for Person Re-Identification (Re-ID). -4. 
**Quality Assessment:** The image is scored for technical flaws (blur, noise, exposure). +### Image Categorization & Tagging -### Phase 3: Storage & Indexing +We reuse the semantic embeddings for zero-shot classification to generate tags. This enables faceted search and auto-album generation without a separate classifier model. -1. **Vector Storage:** Embeddings are written to `pgvector` columns. -2. **Graph Linking:** Re-ID embeddings are linked to specific face profiles via database relations. +## Dense Tagging -## Specific ML Tasks & Models +We have the following ordering of operations: - - +- Face Detection & Matching (Clustering): see the **Face Detection** and **Face Recognition** rows in [ML Models](/design/ml-models/). The chosen detector and embedder are SOTA-small models that run near-instantly on mobile devices. -| Task | Category | Model(s) | Dataset(s) | Function | Implementation Status | -| --------------------------------- | ---------------- | --------------------------------------- | --------------------------- | --------------------------------------------------------------------------------------- | --------------------- | -| **Semantic Search** | Natural Language | SigLIP (`siglip-so400m`) | | Generates global image embeddings for natural language search. | WIP (high priority) | -| **Dense Tagging & OCR** | Dense Tagging | Florence-2 | | Unified vision-language model for bounding boxes, dense captions, and reading text. | -| **VLM / Image Chat** | Natural Language | Qwen2.5-VL or LLaVA-1.6 | | Quantized models for on-demand conversational queries about an image. | -| **Image Captioning** | Natural Language | BLIP-2 | | Generates a natural language description of the image content. | -| **Face Detection** | People | SCRFD | | Highly efficient face bounding box and landmark detection. | WIP (high priority) | -| **Face Recognition** | People | InsightFace (AdaFace) | | Generates face embeddings. AdaFace excels at handling low-quality/dark images. 
| WIP (high priority) | -| **Person Detection** | People | YOLOv10 | | Object detection for identifying "person" bounding boxes. | -| **Person Re-ID** | People | OSNet or TorReID | | Generates embeddings based on clothing and body shape when faces are hidden. | -| **Expression Analysis** | People | EmotioNet | | Detects facial action units to infer emotions. | -| **Quality Scoring** | People | LIQE / TOPIQ | | Blind image quality assessment for noise, blur, and lighting without a reference image. | -| **Object Detection** | Scene | YOLOv10, Grounding DINO, RT-DETR | | Detects objects and background elements for dense tagging. | WIP (high priority) | -| **Scene Classification** | Scene | VIT-L, ConvNeXt-L | Places365, SUN397 | Classifies the overall scene (e.g., "beach", "wedding", "cityscape"). | -| **Landmark Detection** | Scene | DINOv2 + GeM pooling | Google Landmarks v2 | Detects key landmarks (e.g., Eiffel Tower, Golden Gate Bridge) for geotagging. | -| **Bird/plant Detection** | Scene | BioCLIP | iNaturalist 2021 | Identifies and classifies birds and plants within images. | -| **General Animal Detection** | Scene | YOLOv8 finetuned on Open Images Animals | Open Images Animals | Detects common animals (dogs, cats, horses) for tagging and search. | -| **OCR** | Text | TrOCR | SynthText, IIIT-5K | Extracts text from images, including handwriting and signage. | -| **Screenshot Detection** | Scene | Custom CNN classifier | | Identifies screenshots to help culling. | -| **Voice Transcription** | Audio | Whisper-large-v3 | | State-of-the-art speech recognition for generating transcripts from video audio tracks. | -| **Aesthetic Scoring** | Quality | NIMA (Efficientnet head) | AVA Dataset | Rates the aesthetic quality of images to help users find their best shots. | -| **Blur detection** | Quality | Laplacian variance + CNN regressor | DefocusNet, CUHK | Detect blurry images. 
| -| **Exposure Assessment** | Quality | Custom CNN regressor | Custom | Evaluates the exposure level of images to ensure optimal lighting conditions. | -| **Noise Estimation** | Quality | Custom CNN regressor | Custom | Estimates the noise level in images to help users identify and filter out noisy shots. | -| **Near-duplicate / burst** | Similarity | pHash/dHash + CNN | Custom | Same moment, slightly different | -| **Semantic new-duplicate** | Similarity | SigLIP, CLIP embeddings + ANN | Custom | Same subject, different angle/day | -| **Best-shot selection** | Similarity | Quality models combined? | Custom | Select sharpest/best-exposed from burst | -| **Shot/scene boundary detection** | Video | TransNet v2, PyScene Detect | BBC Planet Earth, ClipShots | Segment video for thumbnail/highlights | -| **Highlight extraction** | Video | Temporal attention + quality scroe | SumMe, TVSum | Extract best moments from videos for highlights and thumbnails. | -| **Action/activity recognition** | Video | VideoMAE, TimeSformer | Kinetics-700, ActivityNet | Sports, cooking, playing, travel | -| **NSFW Detection** | Categorization | OpenCLIP or custom CNN | NSFW datasets | Detects explicit content to help users filter and manage sensitive media. | -| **Violence / Graphic Content** | Categorization | ViT classifier | Custom | Detects and flags sensitive content (e.g. in shared albums) | + -## Extended Detail: Key Algorithmic Implementations +## Quality Assessment - +TODO -### Video-as-Sparse-Photos Algorithm +## Model Batching -Processing every frame of a video through heavy ML models is computationally prohibitive. This algorithm treats video as a sparse collection of keyframes. +Memory is at a premium in mobile devices. We want to be as power-efficient as possible while fulfilling the computational needs of the models. As such, we batch the execution of models in the following ways: -1. 
**Cut Detection:** Use PySceneDetect (Content-Aware routing) to chunk the video into visually distinct scenes. -2. **Temporal Sampling:** Extract frames at the 10%, 50%, and 90% timestamps of each scene. -3. **Blur Rejection:** Calculate the variance of the Laplacian for each extracted frame: +- Horizontal Batching (model-by-model): Run each model sequentially across all assets. This minimizes the number of models that need to be loaded in memory at once but it incurs lots of IO (since you are reading assets multiple times). +- Vertical Batching (end-to-end): Run all models at once for each asset. This minimizes IO but it is memory intensive since you need to load all models at once, and may result in OOM killing the application process (on mobile OSes). - $$V = \text{var}(\nabla^2 I)$$ +We pick the execution model with the following process: -. If $V$ is below a defined threshold, the frame is too blurry and is discarded. -4. **Audio Processing:** Run Whisper-large-v3 concurrently to generate a timestamped transcript. -5. **Integration:** The surviving keyframes are pushed into the standard image Valkey stream. Database records map the keyframe embeddings to the parent `video_id` and specific timestamp. +- Calculate RAM capacity upfront: Upon starting the task, check the device's available memory. Decide dynamically whether to use Horizontal or Vertical batching based on the device's resources. +- Enforce Micro-Batching: Never pass a massive batch to the inference engine. Break your "huge batch" down into micro-batches of 1, 4, or 8 images. This keeps the NPU cache hot and prevents battery-draining DRAM fetches. +- Quantize everything: Ensure your models are quantized to INT8 or FP16. This halves the memory bandwidth required, which directly translates to less battery consumed and less heat generated. +- Throttle based on thermals: Modern mobile APIs allow you to monitor device temperature. 
If the device hits 40°C, artificially pause the pipeline for a few seconds. A slightly slower job is better than the OS terminating your app or the hardware thermal-throttling your speeds to a crawl. -### The Re-ID & Pseudo-Labeling Loop +## Database Indexing and View Generation -This algorithm identifies individuals even when they turn away from the camera during an event. +Since each model (except for a few) generates embeddings in a common vector space, we store them locally in a database. We use SQLite + `sqlite-vec`. -1. **The Anchor Pass:** When an image contains a high-confidence frontal face, run InsightFace. If the embedding matches a known profile (e.g., "Bride"), record the bounding box. -2. **The Body Pass:** Run a standard object detector (YOLOv10) to find all "person" bounding boxes. Pass these crops through OSNet to get a 512-dimensional body embedding. -3. **The Linking Phase:** Calculate the Intersection over Union (IoU) of the Face bounding box and the Body bounding box. If $\text{IoU} > 0.7$, link the OSNet body embedding to the "Bride" profile for the duration of this specific album/event. -4. **Pseudo-Labeling:** When an image features a person facing away (no face detected), compare the OSNet body embedding against the temporary event-specific body embeddings using cosine similarity: +## Models and Algorithms - $$\text{sim}(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \|\mathbf{v}\|}$$ - -. If the similarity exceeds the threshold, tag the individual as the "Bride." - -### High-Dimensional Vector Search in Postgres - -To maintain high throughput in Postgres, exact K-Nearest Neighbors (KNN) is too slow for millions of rows. - -1. Implement **HNSW (Hierarchical Navigable Small World)** indexes on the `pgvector` columns. -2. Use the inner product operator (`<#>`) for normalized embeddings, as it is computationally cheaper than calculating $L_2$ distance (`<->`) or cosine distance (`<=>`) at scale. 
+The concrete model chosen for each task, and the key algorithms that combine them, are catalogued in [ML Models and Algorithms](/design/ml-models/). diff --git a/capsule-docs/src/content/docs/design/asset-stacking.md b/capsule-docs/src/content/docs/design/asset-stacking.md deleted file mode 100644 index 1dd3115..0000000 --- a/capsule-docs/src/content/docs/design/asset-stacking.md +++ /dev/null @@ -1,31 +0,0 @@ ---- -title: Asset Stacking -description: Details on how asset stacking works in Capsule. ---- - -## Asset Stacking in Capsule - -In large media collections, it’s common for related files to belong together. Instead of cluttering your library with dozens of nearly identical files, Capsule "stacks" them into a single unit. - -You’ve likely seen this in action before—think of how photo apps group RAW+JPG pairs or how video editors sync external audio with camera footage. Capsule uses a "best-effort" auto-detection system to identify these relationships and keep your workspace clean. - -### Photography & Mobile Stacks - -* **RAW + JPEG Pairs:** The classic "prosumer" stack. We treat the uncompressed RAW file and the processed JPEG as one asset to keep your grid tidy. -* **Burst Stacks:** A sequence of high-speed stills (e.g., 10–30 fps). The app identifies a "Best Photo" and tucks the rest behind it. -* **Live Photos:** A JPEG or HEIC paired with a 1.5–3 second video clip, managed as a single interactive unit. -* **Portrait/Depth Stacks:** An image paired with its depth map. This allows you to adjust the bokeh (background blur) after the shot is taken. -* **Smart Selection:** AI-driven grouping of visually similar images taken within seconds of each other to reduce "clutter." - -### Technical & Creative Stacks - -* **Exposure Bracketing (HDR):** Multiple shots of the same scene at different exposure levels (e.g., -2, 0, +2 EV) to be merged into a single High Dynamic Range image. -* **Focus Stacks:** A series of shots with shifting focus points. 
Often used in macro photography to create "infinite" depth of field. -* **Pixel Shift Stacks:** Found in high-end mirrorless cameras. The sensor moves slightly to capture multiple shots, which are stacked for ultra-high resolution and perfect color. -* **Panorama (Stitched):** A sequence of horizontal or vertical shots intended to be merged into a single wide-field image. - -### Video & Audio Stacks - -* **Proxy/Optimized Stacks:** Pairs a heavy "Master" file (like 8K RAW) with a lightweight "Proxy" (like 1080p ProRes) for smoother editing performance. -* **Chaptered Video:** Action cameras (like GoPro) often split long recordings into 4GB chunks. We stack files like `GOPR001.mp4` and `GOPR002.mp4` so they appear as one continuous video. -* **Dual-System Audio:** Groups video files with high-quality external audio (WAV/AIFF) using timecode or waveform matching. diff --git a/capsule-docs/src/content/docs/design/authentication.md b/capsule-docs/src/content/docs/design/authentication.md new file mode 100644 index 0000000..bb4f664 --- /dev/null +++ b/capsule-docs/src/content/docs/design/authentication.md @@ -0,0 +1,90 @@ +--- +title: Authentication +description: Authentication design +--- + +Authentication follows a few key principles: + +- Minimal surface: We implement the full OpenID Connect specification so identity is offloaded to an external provider. +- Cryptographic binding: We cryptographically bind the user's identity to their master key, which is the root of all encryption and decryption operations. This ensures that only authenticated users can access their encrypted assets, and the server never has access to the plaintext master key. + +## Authentication API + +The API has a few parts: + +- OpenID Connect (OIDC) endpoints: These facilitate authentication flows. +- Identity and discovery: We expose standardized endpoints for clients to discover the authentication capabilities and endpoints of the server. 
See [Identity and Discovery](#identity-and-discovery) for details. +- Session management: Clients are given a long-lived session secret (note this is not a JWT) which identifies the client to the server until it expires or is revoked. See [Session Management](#session-management) for details. + +## Account Types + +- **Registered accounts:** These accounts are associated with a unique identity and have their own master key. They can be authenticated using password+TOTP or passkeys, which cryptographically bind the user to their master key. +- **Delegated/Sponsored accounts:** These accounts are encrypted with keys derived from a registered account's master key. They do not have their own identity and rely on the registered account for authentication and key management. The owner of the sponsoring registered account has full access to the sponsored account. +- **Non-registered accounts:** These accounts do not have an associated identity or master key. They are typically used for share links, where the decryption keys are encapsulated around the secret stored. + +## Identity and Discovery + +We borrow from Matrix 2.0's patterns, with one critical departure: **`.well-known/` never enumerates the user list**. A federated setting where a peer can list every user on a server is unacceptable — both from an abuse-surface perspective (spam, harassment-target discovery, account-enumeration attacks) and a privacy perspective. + + + +- All users have a handle like `user@yourserver.tld` (this resembles Matrix's MXID pattern). +- `.well-known/capsule/server-info` is **public** and returns only server-scoped facts: the API base URL, the auth endpoints, the federation endpoint, the server's signing key, supported `protocol_version` range, and a list of `min_protocol_version` cutoffs for active deprecation windows. It **never** returns a user list. 
+- **User lookup is authenticated.** A client or a peer server must present credentials to resolve `user@server.tld`: + - **Local client lookup** (resolving another user on the same server, e.g. for sharing): authenticated by the looker's session token. + - **Federated peer lookup** (resolving a user across servers): authenticated by a federation capability token (see [Federation — Federation Capabilities](/design/federation/#federation-capabilities)) and rate-limited per peer. + - **Anonymous WebFinger**: returns only records the target user has explicitly opted into making public. Publication is opt-in: by default there is no anonymous record. This is deliberately stricter than Matrix's default and follows the [deny-by-default rule](/design/threat-model/#schema-evolution-and-field-grammar) from the threat model. + +## Account Portability + +A user must be able to move servers without losing their identity. Capsule does **not** need a separate DID system for this: the user identity key (User IK — see [Key Management](/design/cryptography/#user-identity-keys-user-iks)) is *already* a server-independent root of trust. Only the `user@server.tld` handle is host-bound. + +Migration therefore re-homes the handle while keeping the same IK: + +- The new server registers the account under the same IK; nothing in the [key hierarchy](/design/cryptography/#key-management) changes. +- The old server publishes an IK-signed **"moved" certificate** at its `.well-known/` path, naming the new handle. This is the one well-known record that names a specific user — it is also opt-in (the user actively migrates) and carries the user's own signature, so it does not constitute the kind of enumeration leak we forbid. +- Clients and [federated](/design/federation/) peers that resolve the old handle follow the certificate, verifying the IK signature, and update to the new location. 
+ +Because the IK signs the move and every device cross-signs to that IK, no server — old or new — can forge a migration or hijack the handle. + +## Session Management + +### Session ID + +All sessions are identified by a session ID with an associated [session token](#session-tokens). The session ID is a UUIDv7 that is generated by the server upon successful authentication and is used to track the session state and associated metadata. + +### Session Tokens + +Session tokens are a long-lived 128-bit secret that is generated by the server upon successful authentication and stored securely on the client. The session token is used to obtain an [access token](#access-tokens) for more frequent API requests. + +### Session Expiry and Revocation + +Sessions expire in two ways: **sliding inactivity expiry** (automatic) and **explicit revocation** (user-initiated). They coexist; either causes the session token to stop being honored. + +#### Sliding inactivity expiry + +A session that has not been used for **180 days** (default; deployment-configurable) expires automatically. "Used" means a successful [access-token](#access-tokens) issuance against the session token — each issuance refreshes the inactivity clock. This bounds the lifetime of a session on a device the user has forgotten about (a phone in a drawer, a laptop given to a relative) without forcing re-authentication on actively-used devices. + +#### Hard expiry + +In addition to the sliding inactivity expiry, every session token has a **hard expiry of 365 days** from issuance (default; deployment-configurable). The hard expiry **does not reset** on use — it is the upper bound on the lifetime of a token regardless of activity. + +The rationale is the malicious-keyholder class from [Threat Model — Client Class Taxonomy](/design/threat-model/#client-class-taxonomy): an attacker who silently exfiltrates a session token from a device the user actively uses would otherwise have an indefinite window of access. 
The hard expiry caps that window at one year; the user re-authenticates (passkey / password+TOTP) at most once a year per device, which is acceptable friction in exchange for a bounded leak-window. + +Both expiries are enforced server-side at access-token issuance; the session token itself is not invalidated for any other reason than these expiries or an explicit revoke. + +#### Explicit revocation + +Each user has a common session ledger that offers the following capabilities to any authenticated session: + +1. List all active sessions (with last-used timestamp, so an expiring session is visible). +2. **Revoke any single session** by invalidating its session token — authenticated by any active session token. +3. **Revoke all sessions at once** (e.g. "log out of all devices") — authenticated by **proof of master-key possession** (a signature with the user's IK over a server-issued challenge), not by an active session token. + +The asymmetric authentication on (3) addresses a damage scenario that pure session-token auth opens up: an attacker holding a stolen session token could otherwise invoke "log out of all devices" and lock the legitimate user out of every other device. Requiring master-key proof for the global revoke means an attacker with a session token can only revoke *that* session — they cannot escalate to denial-of-service. A user who has lost their master key is no worse off: they can still revoke individual sessions one at a time. The single-session revoke (2) is the everyday tool; the global revoke (3) is the nuclear option, gated accordingly. + +Note: The server could in principle terminate sessions unilaterally, because session tokens are stored server-side and the server holds the encrypted data. This must never be implemented, and any attempt to do so would be a bug: it bypasses the audit trail of a user-initiated revoke. + +### Access Tokens + +Access tokens are short-lived tokens derived from the session token that are used for authenticating API requests. 
They have a limited lifespan and can be refreshed using the session token without requiring the user to re-authenticate. Capsule uses **EdDSA JWTs** as access tokens, signed under the server's [Ed25519 signing key from the cryptographic primitives inventory](/design/cryptography/#signature-scheme) (classical half only — access tokens are short-lived enough that PQ hybridization is not worth the wire-size cost). diff --git a/capsule-docs/src/content/docs/design/authorization.md b/capsule-docs/src/content/docs/design/authorization.md new file mode 100644 index 0000000..1156297 --- /dev/null +++ b/capsule-docs/src/content/docs/design/authorization.md @@ -0,0 +1,63 @@ +--- +title: Authorization +description: Ensuring every access is performed by an authorized party +--- + +We want to pull out all authorization-related logic (validated by both server and client) into a centralized core, to minimize implementation risk and to isolate the sensitive code that enforces authorization end-to-end. Both server and client validate against the same core, so a client cannot be tricked into accepting an operation the server would reject, and vice versa. + +## Asset Lifecycle + +**Key Problem:** Clients may want to destructively delete or replace assets, which servers must execute remotely. We want robust, centralized control over the lifecycle of every asset. + +Capsule treats every lifecycle transition as an authorized, signed, auditable operation. The design reuses the cryptographic machinery already defined for asset writes rather than inventing a parallel mechanism. 
+ +### The Closed Action Set + +Every lifecycle operation is expressed as an [asset manifest](/design/cryptography/#provenance-and-signed-manifest) whose `action` field is one of the following **closed enum** (a value outside this set is a structural error, never a "future value to ignore" — see [Threat Model — Schema Evolution and Field Grammar](/design/threat-model/#schema-evolution-and-field-grammar)): + +| Action | Meaning | +| -------------------- | --------------------------------------------------------------------------------------------------------------------------------------- | +| `create` | First write of an asset; `prior_provenance_hash` is `null`. | +| `replace` | Replace the original bytes (e.g. a re-encryption under a new AMK epoch); identity preserved. | +| `delete` | Soft-delete; the asset enters trash with a [retention window](/design/organization/#recycling). | +| `metadata-update` | Edit to the encrypted metadata blob or sidecar fields. | +| `derivative-add` | Add a thumbnail, preview, LQIP, or embedding (see [Cryptography — Derivative Provenance](/design/cryptography/#derivative-provenance)). | +| `derivative-replace` | Replace an existing derivative — the only authorized path; a silent overwrite is rejected. | +| `trash-restore` | Recover a soft-deleted asset from trash within its retention window. | + +Adding a value to this enum bumps `protocol_version` and the old albums remain pinned to their original set — a faulty or new client cannot inject an unknown action into a v_k album. + +### Authorizing a lifecycle operation + +Authorization is established exactly as for a write: + +- The operation must carry a valid signature under the album's per-epoch **write-tier key** — only writers at that epoch hold it. +- It must also carry the device's hybrid `device_sig` for provenance. 
+- A client acknowledges the operation only after **both** signatures verify through the single [`verify_asset`](/design/cryptography/#write-authorization) chokepoint. +- The manifest's `prior_provenance_hash` must match the asset's current chain head — a stale or forked chain position is rejected (see [Cryptography — Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications)). This applies uniformly to every action except `create`. + +A `delete` or `replace` is therefore authorized by the same proof as the original `create`: there is no weaker path to destroy data than to add it. Similarly, a `derivative-replace` is authorized as strongly as the original `derivative-add` — a buggy client cannot quietly poison a thumbnail. + +### The server executes but never authorizes + +Per the principle of [trusting the server for storage, never for authorization](/design/cryptography/#implementation), the server **carries out** a remote delete or replace but is **never** the authority that permits it. A server-asserted lifecycle change with no valid write-tier signature is rejected by every client. This bounds the damage a compromised or buggy server can do: it can refuse to store data, but it cannot forge its destruction. + +That said, the server is not *passive*. Even without keys, it enforces the structural envelope of every manifest before persisting it — `action` is in the closed enum, `prior_provenance_hash` matches the stored chain head, `created_by_device` is in the user's published device directory, the device's hybrid signature is structurally well-formed (correct curve, correct key lengths), `crypto_suite_id` and `protocol_version` match the album's pin, and the timestamp is within the ±30-day window. The full checklist is owned by [Threat Model — Server-Side Validation Invariants](/design/threat-model/#server-side-validation-invariants). 
A rejection here means no row is written and no provenance record is appended; the rejection itself is logged. + +### Deletes are soft first + +Destructive operations are staged, not immediate: + +- A `delete` first soft-deletes the asset — it is flagged and moved to trash, recoverable for a retention window before any hard purge. +- The retention window is **signed into the delete manifest at delete time**, not server-configured, so a hostile server cannot accelerate or delay a user-configured window (see [Asset Organization — Recycling](/design/organization/#recycling)). +- Only after the window expires is the underlying blob hard-purged. A `trash-restore` action issued before expiry returns the asset to the live set and appends another provenance record — recovery is itself audited. + +This is the [trash soft-delete recovery path](/design/cryptography/#failure-modes-and-recovery) and gives a reversal window for both buggy and erroneous deletes. + +### Every transition is auditable + +Each lifecycle operation emits a [provenance record](/design/cryptography/#provenance-of-library-modifications) — timestamp, device, client version, and action — anchored by the signed manifest. The chain is **append-only** (see [Threat Model — Provenance Immutability Rules](/design/threat-model/#provenance-immutability-rules)): even an attacker holding every current key cannot rewrite a past record. This audit trail is what lets an operator distinguish a legitimate delete from a malicious or bug-induced one after the fact. + +### Federated peers + +A lifecycle operation arriving from a [federated](/design/federation/) peer is subject to the same `verify_asset` check plus the server's structural envelope check; peer-asserted ordering and timestamps are never trusted for authorization. 
Peer attempts at [stale revival](/design/import-synchronization/#stale-revival-detection) — submitting an old-but-validly-signed manifest to resurrect a deleted asset — are caught by the `prior_provenance_hash` chain check and quarantined. diff --git a/capsule-docs/src/content/docs/design/backup-recovery.md b/capsule-docs/src/content/docs/design/backup-recovery.md new file mode 100644 index 0000000..24451da --- /dev/null +++ b/capsule-docs/src/content/docs/design/backup-recovery.md @@ -0,0 +1,77 @@ +--- +title: Backup and Recovery +description: How Capsule backs up libraries and recovers them after device or key loss +--- + +Capsule treats loss of data — and loss of the keys that decrypt it — as a first-class failure mode to design against. Recovery rests on a single rule: +holding the recovery secret must restore every asset, even after every device is lost. This document consolidates the artifacts and mechanisms that uphold it. + +Two distinct things are called a "backup" here, and they are kept separate on purpose: + +- The **encrypted backup artifact** — a portable, encrypted export of a library's assets. +- The **master-key escrow** — a small server-side blob that lets a passphrase reconstruct the key hierarchy. + +## Backup Artifact + +A backup is a single self-describing, versioned, **streamable** archive containing everything needed to restore a library's assets. It is itself encrypted and kept independent of the device key hierarchy, so recovery does not depend on reconstructing MLS ratchet state (see [Cryptography](/design/cryptography/#failure-modes-and-recovery)). + +A backup is an export artifact — not part of the live library or the server blob store — and may be stored locally or on external storage such as hard drives or cloud storage. It is used to restore assets after data loss or when setting up a new device. The format is versioned to allow future improvements and changes without breaking older backups. 
+ +### Container Format + +The container is an **uncompressed POSIX tar** with deterministic entry ordering and a top-level signed integrity manifest: + +- **Uncompressed.** Asset ciphertext is incompressible (it's the output of [AES-256-GCM-STREAM](/design/cryptography/#bulk-aead)); compressing it buys nothing and adds CPU cost. Metadata blobs are likewise encrypted before they hit the archive, so the same applies. +- **Streamable.** Tar is append-friendly and has no central directory, so a backup of arbitrary size can be written and read end-to-end without seeking — important when exporting a terabyte-scale library to spinning rust or an external drive. +- **Deterministic ordering.** Entries are written in sorted order by `(album_id, asset_id, blob_role)`, so two exports of the same logical content produce byte-identical archives. This lets the integrity manifest's signature verify across re-exports. +- **Top-level integrity manifest.** The first entry is `MANIFEST.cbor` — a CBOR document listing every entry's path, [content hash](/design/cryptography/#primitives-inventory), declared size, and the exporting device's identity. The manifest is authenticated **two ways**: + - An **HMAC** keyed by the backup's wrap key (derived from the user passphrase via the [password-based KDF](/design/cryptography/#password-based-kdf)) catches truncation, reordering, and corruption *before* any decrypt is attempted. + - A **hybrid Ed25519 + ML-DSA-65 signature** from the exporting device's [DSK](/design/cryptography/#device-keys) — the same [signature scheme](/design/cryptography/#signature-scheme) used for asset manifests. The signature defeats a symmetric-key attacker who could otherwise re-HMAC after tampering: an attacker who steals the wrap key can re-HMAC but cannot forge the device signature. + Both checks must pass before restore proceeds. 
The signing device must be present in the user's [device directory](/design/cryptography/#per-user-device-coordination) at restore time; an exporter device that was later revoked is rejected. +- **Versioned.** A `VERSION` entry pins the artifact format version, `crypto_suite_id`, and `min_protocol_version` per [Versioning](/design/versioning/) and [Cryptography — Versioning Identifiers](/design/cryptography/#versioning-identifiers). Older backup artifacts remain restorable by newer Capsule versions; an artifact whose `crypto_suite_id` is not in the current inventory is rejected at restore (per [Threat Model — Schema Evolution](/design/threat-model/#schema-evolution-and-field-grammar)). + +ZIP was considered and rejected: its central-directory-at-end makes streaming writes awkward at terabyte scale, ZIP64 tooling support is inconsistent, and there is no compression benefit to gain from ZIP-internal deflate. + +## Master-Key Escrow + +The account master key is the single backed-up root of the key hierarchy (see [Cryptography](/design/cryptography/#key-management)). It is escrowed server-side so a user holding only their recovery secret can reconstruct it: + +- Wrap the account master key with a user-chosen high-entropy passphrase or a randomly generated 48+ bit recovery code. +- Derive the wrapping key with the [password-based KDF](/design/cryptography/#password-based-kdf). Store the wrapped blob server-side. +- If you can run enclaves (SGX/Nitro/SEV-SNP), do Signal's SVR trick: rate-limit PIN attempts inside the enclave so a weak PIN is still safe. Without enclaves, require a real passphrase or recovery code — don't let users pick 4-digit PINs. + +## Recovery Mechanisms + +Two recovery mechanisms ship by default; a third is available opt-in for users who want extra redundancy without compromising the default's simplicity. + +### Default mechanisms + +- **Recovery passphrase / BIP39-style seed** shown at setup; the user prints it or stores it in a password manager. 
It unwraps the master-key escrow above. +- **Cross-device recovery** — any existing signed-in device can re-bootstrap a new one over a verified channel. + +(We need at least two for redundancy; the third below is opt-in to keep the default flow simple.) + +### Opt-in: Shamir Secret Sharing + +Users who want to spread recovery across trusted parties or storage locations can enable **Shamir Secret Sharing** of the recovery seed. The default scheme is **2-of-3**: + +- The recovery seed (the same one that unwraps the master-key escrow) is split into 3 shares; any 2 reconstruct the seed; 1 alone reveals nothing. +- Each share is itself wrapped with a per-share passphrase via the [password-based KDF](/design/cryptography/#password-based-kdf), so storing a share on a less-trusted medium (cloud drive, second device, trusted family member) is safer. +- Reconstruction happens fully client-side. Capsule's server never sees more than one share at a time and never sees a reconstructed seed. +- Custom `m`-of-`n` (e.g. 3-of-5 for users who want broader distribution) is supported but not the default. + +This is the social-recovery escape hatch — useful for users who would otherwise lose access from a single forgotten passphrase plus a single dead device. + +## Backup Verification + +A restore that overwrites live state silently is the worst foot-gun a backup system can ship. Capsule therefore makes **dry-run the default**: a `restore` invocation runs in dry-run mode unless the user passes an explicit `--commit` flag (or its UI equivalent: a confirm-with-typed-phrase dialog after the dry-run report is shown). The mode hierarchy is: + +- **Preview mode (always safe).** Verify the shape of your content makes sense — counts, sizes, asset titles where readable. No decrypt, no write. 
+- **Dry-run mode (default for `restore`).** Verify everything can be decrypted, matches its hashes, and (as a sanity check) that images and videos decode properly in the [sandboxed decoder](/design/clients/#sandboxed-decoder). Compute the diff against the current live library: what would be added, what would conflict, what would be skipped as already present. No write. +- **Signature-chain verification.** Every [asset manifest](/design/cryptography/#provenance-and-signed-manifest) verifies against the published [device directory](/design/cryptography/#per-user-device-coordination), and every device certificate chains to a user IK. The MANIFEST.cbor itself must verify both HMAC and exporter signature (above). Any break is flagged and the restore is refused. +- **AMK completeness check.** Confirm every `amk_version` referenced by an asset is present in the backup, so no asset is silently unrecoverable. +- **Commit (only with explicit consent).** The user reviews the dry-run report and explicitly commits. Even at commit, the restore obeys the [stale-revival defense](/design/import-synchronization/#stale-revival-detection): a restored manifest whose `prior_provenance_hash` conflicts with the live library's current chain head goes to the [quarantine surface](/design/threat-model/#quarantine-surfaces) and the user resolves it explicitly. The interaction between backup restore and the stale-revival defense is flagged as an [open question](/design/threat-model/#open-questions) — the resolution will land here before the docs ship. + +## Backup Provenance + +The MANIFEST.cbor carries the exporter's device id, the export timestamp, the source library version, the `crypto_suite_id` at export time, and a list of every provenance-chain head per asset included in the backup. The MANIFEST is itself a [provenance record](/design/cryptography/#provenance-of-library-modifications) at the library level: who exported, when, from what device. 
A successful restore re-injects each per-asset provenance chain into the restored library, so the audit trail survives the round-trip — a restored library knows it was restored, from when, by whom. diff --git a/capsule-docs/src/content/docs/design/clients.md b/capsule-docs/src/content/docs/design/clients.md new file mode 100644 index 0000000..9668ed1 --- /dev/null +++ b/capsule-docs/src/content/docs/design/clients.md @@ -0,0 +1,49 @@ +--- +title: Clients for Capsule +description: An overview of the core architectural decisions for clients in Capsule. +--- + +This document outlines the core architectural decisions for clients in Capsule, including the rationale behind them and how they contribute to the overall design of the system. + +## Design Priorities + +- **Native:** We prioritize native implementations for each platform to ensure familiar usability and enable platform-specific optimizations. +- **Minimal divergence:** While we carefully version everything where applicable and minimize data that acts as sources of truth, we heavily centralize all the heavy and complex logic in `capsule-core` and `capsule-sdk`. Any client-specific logic is generally minimal and focused on display. + +## Platform Limitations + +Given the quantity of distinct native clients (each having distinct portions of platform-specific logic), certain features are limited to certain platforms. + +## Client Validation Duties + +Clients are not trusted to enforce their own correctness — but they are responsible for **refusing to apply** state they cannot validate. The full client-side validation checklist is owned by [Threat Model — Client-Side Validation Invariants](/design/threat-model/#client-side-validation-invariants); the duties are summarized here so client implementations have a single in-doc reference for what they must do: + +- **Run [`verify_asset`](/design/cryptography/#write-authorization)** on every received asset manifest. 
Quarantine on failure; never silent-drop, never silent-accept. +- **Refuse forward-version writes.** Reject any incoming `sidecar_schema`, `crypto_suite_id`, or `protocol_version` above the client's max known. Reading is allowed only in read-only mode if explicitly opted into. +- **Enforce the protocol handshake.** Send `X-Capsule-Protocol` on every request; honor `426 Upgrade Required` by stopping the request, never by silently downgrading. +- **Check the provenance chain.** Maintain a local `latest_provenance_hash` per asset; refuse to apply a manifest whose `prior_provenance_hash` is behind it. See [Import & Sync — Stale-Revival Detection](/design/import-synchronization/#stale-revival-detection). +- **Reject unknown closed-enum values.** `action`, `content_type`, `DerivativeManifest.role`, and `gps.source` are closed per protocol version; unknown values are structural errors, not "future to ignore." +- **Preserve unknown CBOR keys within a known schema** (Postel's Law) but never act on them. +- **Decode remote-origin asset bytes only in the [Sandboxed Decoder](#sandboxed-decoder).** +- **Never invoke `revoke_all_sessions` without master-key proof.** A pure session-token revoke-all is a [forbidden client behavior](/design/threat-model/#forbidden-client-behaviors). +- **Honor the [forbidden behaviors checklist](/design/threat-model/#forbidden-client-behaviors).** A client that backdates timestamps, strips unknown sidecar fields, overwrites provenance, or signs for an epoch it does not hold is *buggy by definition*. + +Centralizing the validation logic in `capsule-core` (per [Design Priorities](#design-priorities)) ensures each native client gets the same enforcement; the wrapper layer that issues UI surfaces for quarantine and protocol-mismatch errors is the platform-specific portion. + +## Sandboxed Decoder + +Capsule's server never holds plaintext, so server-side image/video decoding is impossible by design. 
**Decoding happens on the client**, and the decode path is the largest remaining attack surface — image-format CVEs (libjpeg, libwebp, libheif, libavif have all shipped exploits in recent years) reach the client directly with attacker-controlled bytes. + +The defense is structural isolation: + +- **Every remote-origin asset is decoded in a separate OS process or a WASM sandbox** that has no filesystem write access, no network access, and no shared memory with the host app process. +- The sandbox communicates with the host via a narrow IPC channel that exchanges only the produced pixel buffer (or an error code) — not arbitrary structured data. +- **The sandbox is allowed to crash.** A decoder CVE that triggers a segfault kills the sandbox, not the app. The host process logs the crash, surfaces "asset failed to decode," and continues. The sandbox is restarted on the next decode request. +- **Local-origin assets** (this device was the uploader and the bytes have never left local storage) bypass the sandbox at the user's option — they have not crossed a trust boundary. By default the sandbox is still used uniformly, because the modest perf cost is worth the categorical guarantee. +- A media file that fails to decode after N retries in the sandbox is flagged in the UI as "unreadable on this device" rather than removed from the library — the bytes are preserved (per the recovery-first principle in [Filesystem](/design/filesystem/#repair)) for inspection on another device. + +This is the canonical declaration of the sandbox; [Federation — Security Against Malicious Files](/design/federation/#security-against-malicious-files) references it for the federated-asset case, and [Backup & Recovery — Backup Verification](/design/backup-recovery/#backup-verification) references it for dry-run decode sanity checks. 
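The retry-then-flag behaviour described for the sandbox can be sketched as follows. This is a minimal illustration, not the `capsule-core` implementation: `SandboxReply`, `HostDecision`, and `decode_with_retries` are hypothetical names, `max_attempts` stands in for the unspecified "N", and the closure stands in for one IPC round-trip to a fresh or restarted sandbox.

```rust
/// Reply from one decode request to the sandbox (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SandboxReply {
    /// The sandbox produced a pixel buffer.
    Pixels(Vec<u8>),
    /// The decoder reported an error code.
    DecodeError(i32),
    /// The sandbox process died, e.g. a segfault inside a codec.
    Crashed,
}

/// What the host app does with the asset afterwards.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum HostDecision {
    /// Show the decoded pixels.
    Display(Vec<u8>),
    /// Keep the bytes in the library; only the UI state changes.
    UnreadableOnThisDevice { attempts: u32 },
}

/// Try decoding up to `max_attempts` times. Each call to `decode_once`
/// models one request to a sandbox that is restarted after a crash.
pub fn decode_with_retries<F>(max_attempts: u32, mut decode_once: F) -> HostDecision
where
    F: FnMut() -> SandboxReply,
{
    let mut attempts = 0;
    while attempts < max_attempts {
        attempts += 1;
        match decode_once() {
            SandboxReply::Pixels(buf) => return HostDecision::Display(buf),
            // A crash or decode error never propagates to the host; retry.
            SandboxReply::DecodeError(_) | SandboxReply::Crashed => continue,
        }
    }
    // Nothing is deleted here: the asset is only marked unreadable.
    HostDecision::UnreadableOnThisDevice { attempts }
}
```

The point of the shape is that the host only ever sees a pixel buffer or a flag; no failure path removes the asset from the library.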
+ +## Additional Comments + +- Compose Multiplatform was heavily considered initially for cross-platform logic, but since most format processing is in Rust and Kotlin/Native continues to have multiple limitations, we decided to stick with a Rust-first approach. diff --git a/capsule-docs/src/content/docs/design/cryptography.md b/capsule-docs/src/content/docs/design/cryptography.md new file mode 100644 index 0000000..9c683fb --- /dev/null +++ b/capsule-docs/src/content/docs/design/cryptography.md @@ -0,0 +1,619 @@ +--- +title: Cryptography +description: Details of the key cryptographic primitives Capsule is built on +--- + +## Pillars of Cryptography + +*These are key aspects for those out of the loop.* + +| Pillar              | The Core Question              | Primary Cryptographic / Security Tool       | +| ------------------- | ------------------------------ | ------------------------------------------- | +| **Confidentiality** | Can anyone else read this?     | Symmetric/Asymmetric Encryption (AES, RSA)  | +| **Integrity**       | Has this been tampered with?   | Hashing (SHA-256), MACs                     | +| **Availability**    | Can I access this right now?   | Redundancy, Backups, DDoS Protection        | +| **Authentication**  | Are you who you say you are?   | Digital Certificates, Passwords, Biometrics | +| **Authorization**   | Are you allowed to do this?    | Access Tokens, RBAC, ACLs                   | +| **Non-repudiation** | Can you deny doing this later? | Digital Signatures, Secure Audit Logs       | + +## E2E Security Model + +The E2E security model has been prevalent for the past decade, but applying the same restrictions to an asset-heavy application that aims to be performant and robust is not trivial. This document outlines the high-level details of balancing security and capability trade-offs. + +We need to encrypt assets (data) along with their metadata in a way that respects the hierarchy of accounts, albums, assets, and permissions.
Think of them in layers: + +- Identity: see [Signature Scheme](#signature-scheme) per device, cross-signed by the user master identity. See [Key Management](#key-management) for details. +- Group membership: One MLS group per shared album; each device is a leaf. See [Group Membership](#group-membership) for details. +- Asset encryption: [bulk AEAD](#bulk-aead) per file, keyed via the [KDF](#key-derivation) from per-album keys. See [Authenticated Asset Encryption](#authenticated-asset-encryption) for details. +- CBOR Metadata encryption: [bulk AEAD](#bulk-aead) per metadata blob, keyed via the [KDF](#key-derivation) from per-album keys. (We do not have a STREAM construction since it's typically fetched all together.) See [Metadata Encryption](#metadata-encryption) for details. + +## Primitives Inventory + +This table is **the single source of truth** for every cryptographic primitive Capsule +uses. Other docs (and the rest of this doc) reference these by anchor — they never +restate the choice. Swapping a primitive is a single-row edit here plus its dedicated +section below. 
+ +| Primitive | Choice | Used for | +| ----------------------------------------- | -------------------------------------------------------- | ------------------------------------------------------ | +| [Cryptographic hash](#cryptographic-hash) | SHA-256 | Content addressing, integrity verification | +| [Key derivation (KDF)](#key-derivation) | HKDF-SHA512 | Per-file and per-album key derivation | +| [Password-based KDF](#password-based-kdf) | Argon2id (device-tier-aware parameters) | Master-key escrow unwrap, backup unwrap | +| [Bulk AEAD](#bulk-aead) | AES-256-GCM with [STREAM](#stream-construction) | Asset and metadata ciphertext | +| [MLS control AEAD](#mls-control-aead) | ChaCha20-Poly1305 | Inherited from the [MLS ciphersuite](#mls-ciphersuite) | +| [Signature scheme](#signature-scheme) | Hybrid Ed25519 + ML-DSA-65 | Identity, device, asset manifest, write tier | +| [KEM](#kem) | X-Wing (X25519 + ML-KEM-768) | MLS HPKE | +| [MLS ciphersuite](#mls-ciphersuite) | `MLS_256_XWING_CHACHA20POLY1305_SHA256_Ed25519` (0x004D) | Group key management | +| [Randomness](#randomness) | OS CSPRNG (`getrandom`) | All keys, salts, nonces | +| [Transport](#transport-security) | TLS 1.3 with hybrid X25519+ML-KEM | Client-server, server-server | + +The per-primitive sections below carry the rationale; the table is the at-a-glance +reference. + +## Versioning Identifiers + +A faulty, malicious, or version-mismatched client could damage data by writing +under a primitive set the receiving side does not implement (see +[Threat Model](/design/threat-model/)). 
Three identifiers — owned here, in +[Versioning](/design/versioning/), and in [Metadata](/design/metadata/) — bind +each on-disk and on-wire structure to a specific set of primitives or schema so +that mismatches **fail closed** rather than corrupting state: + +| Identifier | Type | Declared in | Carried in | +| ------------------ | ------------------- | ---------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `crypto_suite_id` | `u16` | this doc | every [AssetManifest](#provenance-and-signed-manifest), every [metadata blob](#metadata-encryption), the backup [MANIFEST.cbor](/design/backup-recovery/) | +| `protocol_version` | string `YYYY-MM-DD` | [Versioning](/design/versioning/) | every AssetManifest, every wire request (see [Threat Model — Protocol Handshake](/design/threat-model/)), the album's MLS pin | +| `sidecar_schema` | `u16` | [Metadata — Sidecar Schema](/design/metadata/#sidecar-schema-v1) | CBOR sidecar field 0 (readable before parsing the rest) | + +`crypto_suite_id = 0x0001` denotes exactly the [Primitives Inventory](#primitives-inventory) above. Retiring any primitive (a broken SHA-256, a deprecated AEAD) **does not edit the row** — it adds a new row and a new suite id. An old AssetManifest carrying `0x0001` keeps verifying against the original row forever; new writes use the new suite id. This is the single-doc edit the inventory promises, generalized to the bundle. + +The signatures on the manifest cover `crypto_suite_id` and `protocol_version`, so a downgrade-attempt (re-signing an existing manifest under a weaker suite) cannot be silently produced. + +## Key Cryptographic Primitives + +### Cryptographic Hash + +We use SHA-256 (SHA-2) for content hashing, addressing, and integrity verification — everywhere, with no second hash algorithm. 
It is the most prevalent, audited, NIST-approved standard, and is hardware-accelerated on most modern platforms. + +- Using exactly one hash means one less algorithm and implementation to maintain and audit. +- We reuse SHA-256 values across layers rather than recomputing them: the ciphertext hash used for content-addressing (see [Authenticated Asset Encryption](#authenticated-asset-encryption)) is the same value the [signed manifest](#provenance-and-signed-manifest) commits to, and the same value the upload protocol declares and verifies. +- SHA-3 was rejected for weaker hardware support; BLAKE3's parallelism is attractive but unneeded given simultaneous uploads, and its keyed mode is redundant against our already-authenticated encryption. + +### Key Derivation + +We use **HKDF-SHA512** for per-file and per-album key derivation. The wider 512-bit hash matches the post-quantum posture of the rest of the stack: under Grover's algorithm a 256-bit hash collapses to ~128-bit PQ security, while SHA-512 retains ~256-bit. KDFs are not on the hot path, so the cost difference is negligible. SHA-256 stays for *content addressing* — a different security goal where universal hardware acceleration matters more than PQ margin. + +Every derivation includes a versioned `info` string (e.g. `"asset-file/v1"`, `"albums/v1"`) and a scope-unique salt (e.g. `album_id`, `file_id`) so a future KDF change can land alongside v1 derivations without a flag day. + +### Password-based KDF + +For password-based key derivation we use **Argon2id** with device-tier-aware parameters. Password-based derivation only runs at account recovery and device bootstrap — never on a hot path — so the cost is acceptable even on constrained hardware. Parameters are recorded inside the wrapped-blob [construction](#versioning) so they can be raised later without a flag day. 
+ +| Device tier | Memory | Iterations (`t`) | Parallelism (`p`) | When applies | +| ----------------------- | ------- | ---------------- | ----------------- | ---------------------------------------- | +| Low-RAM (≤ 2 GiB total) | 128 MiB | 3 | 1 | Entry-level Android, low-end embedded | +| Normal mobile / laptop | 256 MiB | 3 | 1 | Default for phones and laptops | +| Desktop (≥ 8 GiB) | 512 MiB | 4 | 1 | Wrapping new escrow blobs from a desktop | + +The salt is always a 32-byte CSPRNG draw. The tier chosen at *wrap* time is recorded +in the blob; *unwrap* respects whatever tier was recorded, so a desktop-wrapped blob +unwraps correctly on a phone (slowly) and vice versa. + +### Bulk AEAD + +For bulk data and metadata encryption we use **AES-256-GCM**. Combined with the [STREAM construction](#stream-construction) it covers asset ciphertext; standalone AES-256-GCM (fresh random nonce per blob) covers CBOR metadata blobs. + +- AES hardware acceleration (Intel AES-NI, ARMv8 AES extensions, Apple Silicon dedicated AES units) is universal on every platform Capsule targets, so AEAD is never the bottleneck. +- We standardize on AES-GCM rather than ChaCha20-Poly1305 for stack consistency with the [SHA-2 family](#cryptographic-hash) and to keep one bulk-AEAD choice across the codebase. MLS retains ChaCha20-Poly1305 from its [ciphersuite spec](#mls-ciphersuite); that's a separate layer. +- Nonce misuse is the structural risk of GCM. We close it two ways: every file uses a freshly-derived per-file key (so the STREAM counter can safely start at zero), and standalone metadata blobs each draw a fresh CSPRNG nonce. + +### MLS Control AEAD + +For MLS control traffic we use **ChaCha20-Poly1305**, inherited from the [MLS ciphersuite](#mls-ciphersuite). This protects MLS's own membership and key messages, not user data; user data uses the [bulk AEAD](#bulk-aead) above. 
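On the bulk-AEAD side, the statement that the STREAM counter can start at zero for every file can be made concrete with a nonce-layout sketch. The layout below (7-byte prefix, 32-bit big-endian chunk counter, 1-byte last-chunk flag) is an assumption chosen for illustration; the authoritative layout belongs to the STREAM Construction section, and `stream_nonce` and `file_nonces` are hypothetical helper names. Where the prefix comes from is left open here.

```rust
/// Build the 12-byte AES-GCM nonce for one chunk of a file.
/// Assumed layout: 7-byte prefix | 4-byte big-endian counter | 1-byte last flag.
pub fn stream_nonce(prefix: [u8; 7], counter: u32, is_last: bool) -> [u8; 12] {
    let mut nonce = [0u8; 12];
    nonce[..7].copy_from_slice(&prefix);
    nonce[7..11].copy_from_slice(&counter.to_be_bytes());
    // The flag byte distinguishes the final chunk, so a truncated
    // stream cannot pass off an earlier chunk as the last one.
    nonce[11] = u8::from(is_last);
    nonce
}

/// Nonces for a whole file of `chunk_count` chunks. The counter starts at
/// zero and only the final chunk carries the last flag.
pub fn file_nonces(prefix: [u8; 7], chunk_count: u32) -> Vec<[u8; 12]> {
    (0..chunk_count)
        .map(|i| stream_nonce(prefix, i, i + 1 == chunk_count))
        .collect()
}
```

Because every file key is freshly derived, two files may produce identical nonce sequences without ever reusing a (key, nonce) pair, which is the property GCM needs.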
+ +### Signature Scheme + +We use **hybrid Ed25519 + ML-DSA-65** for identity, device, asset manifest, and write-tier signatures. Both halves must verify before a peer is accepted. The classical and post-quantum halves are independent, so neither algorithm being broken compromises authentication. MLS LeafNode signatures stay Ed25519-only (pinned by the ciphersuite); the ML-DSA half lives at the identity layer — see [Group Membership](#group-membership). + +### KEM + +We use **X-Wing (X25519 + ML-KEM-768)**. This is the KEM defined by the [MLS ciphersuite](#mls-ciphersuite) we adopt. + +### MLS Ciphersuite + +We use **`MLS_256_XWING_CHACHA20POLY1305_SHA256_Ed25519`** (OpenMLS ciphersuite 0x004D) — MLS (RFC 9420) with the PQ ciphersuites from `draft-ietf-mls-pq-ciphersuites`. See [Group Membership](#group-membership) for how the ciphersuite's choices (X-Wing KEM, ChaCha20-Poly1305 control AEAD, SHA-256 hash, Ed25519 leaf sigs) interact with the identity layer. + +### Randomness + +All keys, salts, and nonces are drawn from the operating system CSPRNG (`getrandom`). We never seed our own PRNG. + +Nonces are never hand-rolled. The [STREAM construction](#stream-construction) derives per-chunk nonces deterministically; standalone [bulk-AEAD](#bulk-aead) metadata blobs each receive a fresh random nonce. + +## Key Management + +Capsule's keys form a single hierarchy with one backed-up root: + +- The **account master key** is the only key that is escrowed/backed up. It does not encrypt assets directly. Its job is to (1) wrap the per-device identity private keys and (2) anchor the encrypted backup that escrows album keys. +- **Device keys** are hardware-bound, non-exportable, and therefore disposable — a device is re-bootstrapped from the master key rather than recovered. +- **Album keys** (AMKs) are random per-epoch keys ledgered in MLS, escrowed both in the master-key backup and in the [Owner Group](#owner-group-keys-ogks). 
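The three key classes above differ mainly in how they come back after loss. A small type-level sketch of that distinction follows; `KeyClass`, `Recovery`, `KeyPolicy`, and `policy` are hypothetical names used only for illustration and are not taken from `capsule-core`.

```rust
/// The three key classes in the hierarchy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyClass {
    AccountMaster,
    Device,
    AlbumMaster,
}

/// How a key of a given class is restored after loss.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Recovery {
    /// The single backed-up root, restored from its escrow.
    EscrowedRoot,
    /// Never restored; a new device is re-bootstrapped from the master key.
    Rebootstrap,
    /// Restored from the master-key backup or the Owner Group escrow.
    EscrowedUnderRoot,
}

/// Per-class properties stated in the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct KeyPolicy {
    pub hardware_bound: bool,
    pub recovery: Recovery,
}

pub fn policy(class: KeyClass) -> KeyPolicy {
    match class {
        KeyClass::AccountMaster => KeyPolicy {
            hardware_bound: false,
            recovery: Recovery::EscrowedRoot,
        },
        KeyClass::Device => KeyPolicy {
            hardware_bound: true,
            recovery: Recovery::Rebootstrap,
        },
        KeyClass::AlbumMaster => KeyPolicy {
            hardware_bound: false,
            recovery: Recovery::EscrowedUnderRoot,
        },
    }
}
```

Encoding the recovery path as an enum makes it awkward to add a fourth key class without deciding how it is restored.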
+ +The guiding rule is to **keep the backup path independent of the MLS ratchet** so that losing all devices, but holding the recovery passphrase, still restores every photo. Do not be like Matrix, where undecryptable content is a routine failure mode. See [Failure Modes and Recovery](#failure-modes-and-recovery). + +### Key Generation + +All key generation happens client-side, from the OS CSPRNG. We use a PQ-safe ("post-quantum") hybrid scheme throughout: classical + PQ primitives combined so that breaking either one alone does not break security. + +#### User Identity Keys (User IKs) + +User IKs are generated once per user ever, and live forever (or until account compromise). This is the root of trust and signs everything below it. It is always verified out-of-band or via safety numbers. + +A User IK is a **hybrid Ed25519 + ML-DSA-65** signing keypair generated entirely on the client at account creation. The private halves are wrapped under the [account master key](#registered-accounts) and never leave the client in the clear; the public halves are published in the signed [device directory](#per-user-device-coordination). + +It can be revoked for a global account reset (irreversible, non-recoverable nuclear operation). Revocation is published as a separate revocation certificate, hybrid-signed by the IK itself, to a well-known location so clients can check for it. + +#### Device Keys + +Using the [user IK](#user-identity-keys-user-iks), each device's keys are cross-signed into the [device directory](#per-user-device-coordination): + +1. **DSK** (Device Signing Key): hybrid **Ed25519 + ML-DSA-65**. +2. **DEK** (Device Encryption Key): hybrid **X25519 + ML-KEM-768**. + +Both are signed by the IK (hybrid signature). Device private keys are **generated inside and never leave hardware** — Secure Enclave (iOS), StrongBox/Keystore (Android), TPM (desktop) — and are non-exportable. 
Because they cannot be backed up, devices are treated as disposable: a lost device is simply removed and a new one re-bootstrapped from the master key. + +A device key can be revoked without affecting the user's identity or other devices. This allows for per-device access control and recovery from lost devices without a full account reset. Revocation is done by signing a revocation statement with the IK and publishing it to a well-known location. The server then refuses to deliver new key wraps to that device, and remaining devices rotate any group keys the revoked device had access to. + +#### Owner Group Keys (OGKs) + +Since assets' `owner_id` maps to a set of users, treat each owner as an MLS group. + +- **Type:** Symmetric AES-256 root key of an MLS group whose members are the owner's user set. +- **Purpose:** A recovery/escrow layer. The OGK does **not** wrap individual file keys. Instead, it escrows every album's [AMK versions](#album-master-keys-amks), so any current owner member can always recover every album key — and therefore every asset — independent of album membership. This avoids double-wrapping each file while still guaranteeing the owner never loses access. +- **Epoch:** Bumps on any owner-set change. Every member's client commits to MLS, producing a new OGK; the server stores the welcome/commit messages. +- **Revocation:** Remove a user from the owner set → MLS Remove proposal → new epoch → the removed user's device can no longer derive future OGKs and is dropped from future AMK escrow. + +#### Album Master Keys (AMKs) + +Each album is its own MLS group. Members = users with any permission on the album. + +- **Type:** Random 32-byte symmetric key, minted per epoch. AMKs are *not* derived from MLS epoch state (which is complicated to handle at edge cases) — they are random keys distributed *over* MLS application messages and ledgered. + +Capsule separates **secrecy** (enforced by encryption) from **authorization** (enforced by signatures). 
We use one content key plus two signing capabilities, to minimize keys which can be possibly leaked: + +- **`AMK` — the content key.** Read access. MLS delivers it to *all* album members. Holding it lets you decrypt; not holding it means you cannot. +- **Write capability — a per-epoch write-tier signing keypair.** Distributed via MLS to writers only. Used to sign [asset manifests](#provenance-and-signed-manifest). It rotates with the AMK epoch, so a removed writer cannot sign for future epochs. This is authorization, not secrecy. See [Write Authorization](#write-authorization). +- **Admin capability — an admin-tier signing keypair.** Distributed to admins only; used to sign MLS membership commits. + +Epoch bump triggers: member add/remove, permission change, scheduled rotation (e.g., every 30 days for long-lived albums). + +#### Write Authorization + +A device signature on an [asset manifest](#provenance-and-signed-manifest) proves *which device* produced an asset — but not that the device was *authorized to write* to that album at that time. The server is **not trusted for authorization**: it could replay, reorder, or surface an asset signed by a reader-only device, a removed writer, or a device acting outside its write window. A bug could also produce such an asset. Both must be rejected robustly, with the verification logic kept small enough to be hard to get wrong. + +- **Epoch-bound write proof.** Every asset manifest carries, in addition to the device DSK signature, a signature under the album's **per-epoch write-tier signing key**. Only writers at that epoch hold that key. The manifest's `amk_version` identifies the epoch. +- **Authorization authority is MLS history, not the server.** The client verifies the write-tier signature against the write-tier public key it learned for that epoch *from MLS* — the album's MLS commit chain (admin-signed) is the sole authority on who could write when. A server-asserted authorization is never sufficient. 
+- **What this accepts vs. rejects.** An asset signed by a writer who was *later* removed is still acknowledged — it was valid when written, and nothing after removal un-seeds it. An asset signed at an epoch where the signer lacked write capability is **rejected**: an attacker (or a buggy/colluding server) cannot produce a valid write-tier signature for an epoch they were not a writer in. +- **Single verification chokepoint.** All of this lives in one `verify_asset(manifest, ciphertext, mls_state)` function in `capsule-core/crypto` — the only path by which a client acknowledges an asset. Per [contract-driven development](#implementation), it ships with exhaustive negative test cases: reader-signed, removed-writer, wrong-epoch, forged certificate chain, replayed manifest. +- **Defensive failure handling.** A verification failure is *never* silently dropped and *never* silently accepted. The asset is quarantined and surfaced in the [provenance/audit trail](#provenance-of-library-modifications) so an operator can distinguish a bug from an attack after the fact. This bounds the blast radius of an implementation bug. +- **Downgrade-resistant.** Both signatures cover `crypto_suite_id`, `protocol_version`, and `prior_provenance_hash`. A manifest cannot be silently re-signed under a weaker suite or back-dated onto a different chain position without breaking either signature; an attempt to do so is rejected at the same `verify_asset` chokepoint. +- **Timestamp grammar.** Servers refuse a manifest whose `timestamp` is outside **±30 days of server clock** (configurable). The cryptography proves "this asset was signed by a device that held epoch-N write capability"; the time window prevents a buggy or hostile client from injecting timestamps decades in the past or future that would silently distort the audit trail. The grammar lives in [Threat Model](/design/threat-model/) and is mirrored in [Server-Side Validation Invariants](/design/threat-model/). 
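The decision logic in the list above can be sketched in a few lines. This is a simplified model under stated assumptions, not the real `verify_asset`: signature verification is reduced to precomputed booleans, `known_write_epochs` stands in for the epochs whose write-tier public key the client learned from MLS, and every name other than `amk_version` is hypothetical.

```rust
/// Why an asset was quarantined (never dropped, never accepted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reason {
    UnknownEpoch,
    BadDeviceSignature,
    BadWriteTierSignature,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Accept,
    Quarantine(Reason),
}

/// Stand-in for a parsed manifest plus the results of both signature checks.
#[derive(Debug, Clone, Copy)]
pub struct ManifestChecks {
    pub amk_version: u64,
    pub device_sig_ok: bool,
    pub write_tier_sig_ok: bool,
}

/// Client-side chokepoint: both signatures must verify, and the epoch must
/// be one the client learned from MLS history, not from the server.
pub fn verify_asset_sketch(m: &ManifestChecks, known_write_epochs: &[u64]) -> Verdict {
    if !known_write_epochs.contains(&m.amk_version) {
        return Verdict::Quarantine(Reason::UnknownEpoch);
    }
    if !m.device_sig_ok {
        return Verdict::Quarantine(Reason::BadDeviceSignature);
    }
    if !m.write_tier_sig_ok {
        // Covers reader-signed and removed-writer manifests alike.
        return Verdict::Quarantine(Reason::BadWriteTierSignature);
    }
    Verdict::Accept
}

/// Server-side timestamp grammar: accept only timestamps within
/// `window_days` of the server clock (30 by default in the text above).
/// Both timestamps are Unix seconds.
pub fn timestamp_in_window(manifest_ts: i64, server_now: i64, window_days: u64) -> bool {
    manifest_ts.abs_diff(server_now) <= window_days * 86_400
}
```

Every failure path returns `Quarantine` with a reason, which mirrors the requirement that an operator can later tell a bug from an attack.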
+ +#### Forward Secrecy & Post-Compromise Security + +The MLS-based scheme provides forward secrecy (FS) and post-compromise security (PCS). The specific implementation we follow is MLS (RFC 9420) with the PQ ciphersuites from `draft-ietf-mls-pq-ciphersuites`. + +**Clarification:** True FS on data-at-rest is a contradiction (the ciphertext persists). What MLS gives you at each epoch bump is: a compromise of the current epoch's keys doesn't help an attacker read past epochs, and removed members can't read future epochs. That's the practical security property you want. + +For data-in-transit between clients and server (uploads, key-bundle fetches), use TLS 1.3 with ephemeral ECDHE — that's where session-level FS lives. See [Transport Security](#transport-security). + +#### Resisting Key Loss + +Loss of keys — and thus loss of data — is a first-class failure mode. The master key, not any MLS ratchet state, is the single backed-up root. All safeguards and the redundant restore paths are consolidated in [Failure Modes and Recovery](#failure-modes-and-recovery). + +#### Key Chain + +The account master key does **not** derive album keys — albums are MLS groups with random AMKs. The master key's role is to wrap device identity keys and to anchor the encrypted backup that escrows AMKs: + +```plaintext +account_master_key (backed up — see Resisting Key Loss) + ├─ wraps device identity private keys (IK / DSK / DEK private halves) + └─ anchors the encrypted backup that escrows: + AMK_v{n} (random 32 bytes, per album, minted per MLS epoch) + └─ HKDF-SHA512(ikm=AMK_v{n}, salt=file_id, info="asset-file/v1") → 32-byte AES file key + └─ AES-256-GCM-STREAM +``` + +Important details on construction: + +- Always include a version string in `info` so you can rotate the KDF later. +- Salt with something unique per scope (`album_id`, `file_id`) — don't reuse salts across scopes. +- The 512-bit KDF output is truncated to 32 bytes (256-bits) for the AES-256 file key. 
See [Key Derivation](#key-derivation) for the SHA-512 rationale. +- Each file gets a fresh derived key, so the STREAM nonce can safely start at zero per file. + +Photo/media keys specifically: separate the "MLS/ratchet" world from "data at rest." Per-album AMKs are escrowed in the server-side encrypted backup (see [Backup and Recovery](/design/backup-recovery/)) and the [OGK](#owner-group-keys-ogks) — not derived from ratchet state — so losing all devices but holding the recovery passphrase still restores photos. Ratchet keys are expected to be ephemeral. + +### Identity-based Key Derivation + +Since all assets are encrypted via keys ultimately recoverable from an account's master key, we encapsulate user identity keys differently depending on the [account type](/design/authentication/#account-types). + +#### Registered accounts + +Most users have their own unique master key. It is **generated client-side** at account creation from the OS CSPRNG. The server never holds the naked master key. Each device stores its own copy wrapped under that device's DEK; a new device obtains the master key either via [cross-device recovery](/design/backup-recovery/#recovery-mechanisms) or by unwrapping the [encrypted server-side backup](/design/backup-recovery/#master-key-escrow) with the recovery passphrase. + +#### Delegated/Sponsored accounts + +A sponsored account is anchored under the sponsor's master key but holds its own encryption keys. The mechanism — and the only sound way to revoke — is: + +1. **Per-sponsoree KEK.** When a sponsor creates a sponsored account, the sponsor draws a fresh 32-byte **sponsoree KEK** from the CSPRNG (it is *not* derived from the master key — a deterministic derivation would be reproducible by the sponsor at any future point, defeating revocation). The KEK is wrapped under the sponsor's master key and stored in the sponsor's escrowed hierarchy. +2. 
**Sponsoree key material.** The sponsoree's own identity, device, and album keys are generated normally (see the rest of this section). Their private halves are wrapped under the sponsoree KEK rather than directly under the sponsor's master key, so the sponsor can re-wrap or destroy a single sponsoree's keys without touching its own or the other sponsorees'. +3. **Shared-asset access.** Sponsorees gain access to a sponsor's shared albums via ordinary MLS membership (the sponsoree's devices are added as MLS leaves in the sponsor's album groups). The KEK is *not* a content key — it only wraps the sponsoree's private keys. +4. **Revocation.** Revocation is a three-step operation, all signed by the sponsor's IK: + - **Rotate** the sponsoree KEK: draw a new KEK, re-wrap surviving sponsorees if any, drop the old KEK. + - **Publish** an IK-signed revocation certificate naming the revoked sponsoree's identity and the timestamp. + - **Remove** the revoked sponsoree's devices from every MLS group they were a member of (album groups, owner group) via the standard [MLS Remove](#membership-operations) flow, bumping AMK epochs. + +The sponsor's *own* master key is untouched by any sponsoree revocation. The published revocation certificate is what clients and [federated](/design/federation/) peers check to refuse traffic from a revoked sponsoree. + +#### Non-registered accounts + +**Reading.** Since key management operates at the user level, userless share links are handled distinctly. We encapsulate the decryption keys around the secret stored in the link. The owner can optionally attach a password, in which case the [password-based KDF](#password-based-kdf) adds a second encapsulation layer on top of the link secret. + +**Writing.** Writing is **not supported** for non-registered accounts. 
Every uploaded asset must be encrypted under an album key and signed with a write-tier key; a non-registered user has neither a device encryption key (DEK) nor a place in any album's MLS group, so it cannot produce a valid [asset manifest](#provenance-and-signed-manifest). Supporting guest uploads would require an ephemeral link-scoped key hierarchy; this is a deliberate non-goal to keep the design simple. + +### Key Rotation and Revocation + +- **Master key rotation.** The master key can be replaced at will. Rotation re-wraps the key hierarchy (device-key wraps and the AMK escrow blob) under the new master key; the old master key is retained only long enough to complete the re-wrap, then discarded. Existing signed-in sessions hold device and derived keys directly and are **unaffected** — they keep working through the rotation. +- **Device revocation.** Handled via the [device key](#device-keys) revocation certificate plus an MLS `Remove` for that device's leaves (see [Membership operations](#membership-operations)). +- **Album-member revocation.** Handled by an MLS `Remove` and an AMK epoch bump (see [Membership operations](#membership-operations)). + +## Group Membership + +Capsule's group layer is the [MLS ciphersuite](#mls-ciphersuite) from the inventory. The ciphersuite's choice of [ChaCha20-Poly1305](#mls-control-aead) (rather than [AES-GCM](#bulk-aead) used for user data) is acceptable because: + +- It only protects MLS's own control messages (kilobytes of membership and key data, not your photos). +- ChaCha20-Poly1305 is one of the two most-audited AEADs in existence. +- The alternative is a classical-only MLS ciphersuite plus a hand-rolled PQ retrofit — exactly the custom crypto we're trying to avoid. 
+ +One follow-on: MLS binds LeafNode signatures to Ed25519 in this suite, so the ML-DSA half of the [hybrid signature scheme](#signature-scheme) lives at the **application layer** — identity certificates sign the Ed25519 MLS key with both Ed25519 and ML-DSA, and peers verify both before accepting a device into a group. This keeps MLS pure while preserving PQ authentication end-to-end. + +### Membership operations + +**Add user Bob to album:** + +1. Fetch Bob's device directory (list of his devices with KeyPackages published to the server) +2. MLS `Add` proposal + `Commit` adding all Bob's devices as leaves +3. The `Welcome` message to Bob's devices carries current `AMK_v_current` as a Welcome extension +4. If full history is desired (usually yes for shared albums), also include `AMK_v1..AMK_{current-1}` in the Welcome — Bob's devices can now decrypt everything +5. If only post-join history, omit older AMKs — Bob sees only future photos + +**Remove user Charlie:** + +1. MLS `Remove` proposal + `Commit` removing all Charlie's devices +2. MLS advances to a new epoch; Charlie's devices can no longer read MLS traffic +3. Committer generates fresh `AMK_v{current+1}` and broadcasts via MLS to remaining members +4. All future photo uploads use `AMK_v{current+1}` +5. Charlie retains `AMK_v1..current` locally, so he can still decrypt photos he *already had access to* — this is correct behavior (he already had those photos; nothing you do after removal un-seeds them). But new uploads are invisible to him. + +**Add new device for existing member:** + +1. Alice's existing device adds Alice's new device as a leaf in the MLS group +2. Welcome carries all AMK versions Alice is entitled to +3. New device is now equivalent to Alice's other devices + +**Remove lost device:** + +1. Any of user's remaining devices issues MLS `Remove` for the lost device +2. 
Treat like a removal above — bump AMK version, since you must assume the lost device's keys are compromised + +## Per-user device coordination + +Each user publishes a signed device directory: + +```rust +DeviceDirectory { + user_id, + devices: [ + { device_id, ed25519_pk, mldsa_pk, key_package_ref, added_at, signed_by_master }, + ... + ], + signature: Hybrid(master_ed25519, master_mldsa) +} +``` + +When Alice's device A1 adds Bob to an album, it fetches Bob's directory, verifies the hybrid signature against Bob's published master identity, and adds all Bob's listed devices. Alice's other devices (A2, A3) see the MLS commit and update local state — MLS handles idempotent application of commits, so this just works. + +Conflicts (A1 and A2 trying to add different people simultaneously) are handled by MLS's proposal/commit ordering — one wins, the other re-proposes on top. OpenMLS exposes this. + +### History delivery for new joiners + +This is the one spot where you write real custom code. Two patterns: + +**Full history (recommended for shared albums):** +Welcome message carries encrypted blob of `[AMK_v1, AMK_v2, ..., AMK_current]`. New joiner decrypts all, can now read every photo. + +**Capped history (e.g., last 90 days):** +Only include AMKs corresponding to epochs ≥ threshold. Older photos remain visible but not decryptable — you show placeholders. + +Matrix supports both; most photo-sharing products default to full history. Pick one default, expose the choice if needed later. + +### Notes on Scaling + +MLS scales to thousands of leaves, so even a 50-user album (200+ devices) is fine. Note that every `Commit` touches the whole tree and each `Welcome` carries `log(N)` path secrets plus the AMK blob — a cost to watch for very large shared albums. + +## Authenticated Asset Encryption + +Every asset is content-addressed by the SHA-256 of its ciphertext and encrypted with a unique file key. 
We use AES-256-GCM with the STREAM construction for authenticated encryption. The file key is derived from the appropriate [AMK](#album-master-keys-amks); the AMK itself is recoverable from the account's master key (see [Identity-based Key Derivation](#identity-based-key-derivation)). + +### Asset Key Derivation + +Each asset is encrypted with a key derived from a versioned album master key (AMK), distributed and ledgered over MLS (see [Group Membership](#group-membership)). Note we never derive a key from the MLS epoch's internal state. + +An album's AMK ledger looks like this: + +```rust +Album { + id: UUID, + mls_group: MlsGroup, + keys: [ + AMK_v1: (random 32 bytes, created at album creation), + AMK_v2: (random 32 bytes, created when member X was removed), + AMK_v3: ... + ], + current_version: 3, +} +``` + +The per-file key is derived from the AMK version that encrypted it, using the [KDF](#key-derivation): + +```rust +file_key = HKDF_SHA512( + ikm: AMK_v{amk_version}, + salt: file_id, + info: "asset-file/v1", + length: 32 // 32 bytes for AES-256; HKDF-SHA512 expand truncates safely +) +``` + +AMKs are delivered over MLS application messages. When epoch N's MLS group is established, the creating device sends an `AlbumKeyDistribution { amk_version, amk_bytes }` message through MLS. Every current member's device receives and stores it locally (hardware-wrapped). + +### Provenance and Signed Manifest + +Capsule frequently needs a verifiable trace of *who* produced an asset, so the provenance signature must be cryptographically bound to the ciphertext — while still allowing streaming. We do this with a small **signed manifest** rather than a Merkle tree: the STREAM construction already detects per-chunk tampering, truncation, and reordering, so a Merkle tree's only marginal gain (early-abort on a forged *whole-file* signature) is not worth the extra format complexity. 
+ +Each asset is stored as: + +```rust +AssetManifest { + version: "asset-manifest/v1", + crypto_suite_id: u16, // see Versioning Identifiers above + protocol_version: String, // YYYY-MM-DD; matches album pin + file_id: UUID, + album_id: UUID, + amk_version: u32, // identifies the AMK epoch + write-tier key + ciphertext_hash: { algo: String, value: bytes }, // content address; reused by upload protocol + plaintext_size: u64, + chunk_size: u32, // plaintext bytes per chunk (65,520) + nonce_prefix: [u8; 7], // STREAM nonce prefix, random per file + created_by_user: UUID, + created_by_device: UUID, + client_version: String, + timestamp: RFC3339, // bounded to ±30 days of server clock at accept + action: enum, // create | replace | delete | metadata-update + // | derivative-add | derivative-replace | trash-restore + prior_provenance_hash: Option<[u8;32]>, // SHA-256 over the previous manifest in this asset's + // provenance chain. null only for `action = create`. + // See Provenance of Library Modifications. + + device_sig: Hybrid(Ed25519, ML-DSA-65), // over all fields above + write_sig: Signature, // under epoch write-tier key, over all fields above +} + +AssetBlob { + manifest: AssetManifest, + chunks: [AES-256-GCM-STREAM encrypted chunks], +} +``` + +The manifest carries **two signatures**, and a client acknowledges the asset only if **both** verify: + +1. `device_sig` — hybrid Ed25519 + ML-DSA-65 by the uploading device's [DSK](#device-keys). Provides provenance; the device certificate chains to the user IK via the [device directory](#per-user-device-coordination). +2. `write_sig` — a signature under the epoch's [write-tier key](#album-master-keys-amks). Proves the signer held write authorization at `amk_version` (see [Write Authorization](#write-authorization)). + +The signed manifest is stored as the encrypted asset's header and is itself part of the [provenance record](#provenance-of-library-modifications). 
The same signing approach applies to other surfaces — [metadata blobs and sidecars](#metadata-encryption) and the [device directory](#per-user-device-coordination) are each hybrid, device-signed, and versioned. + +**Streaming is preserved.** The STREAM authentication tags verify every chunk *during* the stream. The manifest signature is a one-time provenance check. `ciphertext_hash.value` is computed incrementally as bytes arrive and confirmed at stream end — no separate pass, no buffering the whole file. + +### Encryption Workflow + +Encrypting an asset for upload: + +1. Derive `file_key` from `AMK_v{current}` (see [Asset Key Derivation](#asset-key-derivation)). +2. Generate a random 7-byte `nonce_prefix` from the OS CSPRNG. +3. Split the plaintext into 65,520-byte chunks and encrypt sequentially with `EncryptorBE32`, producing 64 KiB ciphertext chunks (16-byte tag each); the final chunk is flagged as last. +4. Compute `ciphertext_hash.value` incrementally over the produced ciphertext (the `algo` is fixed by `crypto_suite_id`). +5. Build and sign the [manifest](#provenance-and-signed-manifest) (device signature + write-tier signature). +6. Upload the blob (see [Import Synchronization](/design/import-synchronization/)). + +Streaming download / ranged reads: + +- **Sequential:** `DecryptorBE32` consumes chunks in order, verifying each tag. +- **Ranged:** To start at plaintext byte `B`, the client computes `chunk_index = B / 65,520`. Because the [STREAM construction](#stream-construction) derives each chunk's nonce deterministically, chunk `i` decrypts independently given `file_key` and `i` — the server need only serve that 64 KiB ciphertext chunk, which the client decrypts and verifies. + +### STREAM Construction + +Our scheme strictly requires streaming. + +The chosen method is AES-256-GCM with the STREAM construction (Hoang-Reyhanitabar-Rogaway-Vizár, 2015). 
STREAM splits the file into chunks, encrypts each with AES-GCM using a structured nonce (`prefix || counter || last-chunk-flag`), and guarantees you detect truncation, reordering, and chunk deletion. + +In Rust: the RustCrypto `aead` crate exposes `stream::EncryptorBE32` and `stream::DecryptorBE32` — drop-in. We use a 65,520-byte plaintext chunk → 64 KiB ciphertext chunk. (Note the upload transport's 4 KiB chunk alignment, described in [Import Synchronization](/design/import-synchronization/), is a separate concern from this crypto chunk size.) + +## Metadata Encryption + +Not all metadata can be encrypted — some must stay server-readable for routing and preview. The split is deliberate: + +- **Encrypted** (AES-256-GCM under a key derived from the album's AMK, fresh random nonce per blob): the CBOR sidecar / metadata blobs. Each blob is independently versioned and signed like an [asset manifest](#provenance-and-signed-manifest). +- **Server-plaintext by necessity:** `owner_id`, the [ciphertext content hash](#primitives-inventory), the ciphertext size, the [chromahash LQIP](/design/thumbnails/#lqip), and `dominant_color`. These are needed for routing and for generating previews without decryption. This is a deliberate, documented trade-off. +- **AI embeddings** (semantic-search vectors, face embeddings) are sensitive — a user can be re-identified from them. They are kept plaintext *locally* (vector search requires it) but encrypted at rest in the server-side backup. + +CBOR metadata blobs use **deterministic encoding** (RFC 8949 §4.2). Because a blob's hash is what content-addresses it and what the [signed manifest](#provenance-and-signed-manifest) commits to, two implementations encoding the same logical metadata must produce byte-identical output — otherwise the hash diverges and the signature fails to verify across [federated](/design/federation/) peers. + +### Metadata Blob Wire Format + +An encrypted metadata blob is a single contiguous byte string. 
Implementations MUST produce and consume exactly this layout, with no framing variations, so two correct implementations can compute identical content hashes byte-for-byte. + +```text ++---------------------+---------------------+--------------------------+---------------+ +| crypto_suite_id (2) | nonce (12 bytes) | ciphertext (variable) | tag (16 bytes)| ++---------------------+---------------------+--------------------------+---------------+ +| big-endian u16 | fresh CSPRNG draw | AES-256-GCM(plaintext) | GCM tag | +``` + +- `crypto_suite_id` (2 bytes, big-endian `u16`) — pins the AEAD and KDF used to derive the key. Identical to the field carried inside the manifest (see [Versioning Identifiers](#versioning-identifiers)), and a mismatch with the manifest's value rejects the blob at decode. +- `nonce` (12 bytes) — fresh OS-CSPRNG per blob; never reused, never derived. +- `ciphertext` — the deterministically-encoded CBOR plaintext, sealed with AES-256-GCM under `HKDF-SHA512(ikm=AMK_v{n}, salt=blob_id, info="metadata-blob/v1", length=32)`. +- `tag` (16 bytes) — GCM authentication tag. + +The total blob's `ciphertext_hash` (in the asset's [signed manifest](#provenance-and-signed-manifest)) is computed over the full byte string above — header, nonce, ciphertext, and tag concatenated. + +## Provenance of Library Modifications + +Every modification of data or metadata produces a **provenance record** — timestamp, device, client version, action — anchored by a [signed manifest](#provenance-and-signed-manifest). The records form an **append-only, hash-chained log per asset**, which is what lets an operator distinguish a legitimate delete from a malicious or bug-induced one after the fact, and what defeats the [stale-revival attack](/design/threat-model/) described in the Threat Model. 
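The hash-chaining idea above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `Record` is a simplified stand-in for the real record type, the way the link and body are concatenated before hashing is invented, and `toy_digest` is a non-cryptographic placeholder so the example runs with the standard library alone. The design itself specifies SHA-256 and hybrid signatures, neither of which is modeled here.

```rust
/// A 32-byte digest. The design uses SHA-256; this sketch is hash-agnostic.
type Digest = [u8; 32];

/// Hypothetical, simplified stand-in for a provenance record.
#[derive(Clone, Debug)]
struct Record {
    /// Serialized, signed manifest bytes (opaque in this sketch).
    body: Vec<u8>,
    /// Hash of the previous record; `None` only for the `create` record.
    prior_provenance_hash: Option<Digest>,
}

/// Hash one record. The chain link is hashed together with the body, so
/// changing either one changes the digest. The byte layout is an assumption.
fn record_hash(rec: &Record, hash: &dyn Fn(&[u8]) -> Digest) -> Digest {
    let mut buf = Vec::with_capacity(33 + rec.body.len());
    match &rec.prior_provenance_hash {
        Some(prior) => {
            buf.push(1);
            buf.extend_from_slice(prior);
        }
        None => buf.push(0),
    }
    buf.extend_from_slice(&rec.body);
    hash(&buf)
}

/// Append a record whose link points at the current tail of the log.
fn append(log: &mut Vec<Record>, body: &[u8], hash: &dyn Fn(&[u8]) -> Digest) {
    let prior = log.last().map(|tail| record_hash(tail, hash));
    log.push(Record { body: body.to_vec(), prior_provenance_hash: prior });
}

/// Walk forward from `create`. Returns the index of the first record whose
/// link does not match the hash of its predecessor.
fn verify_chain(log: &[Record], hash: &dyn Fn(&[u8]) -> Digest) -> Result<(), usize> {
    let mut expected: Option<Digest> = None;
    for (i, rec) in log.iter().enumerate() {
        if rec.prior_provenance_hash != expected {
            return Err(i);
        }
        expected = Some(record_hash(rec, hash));
    }
    Ok(())
}

/// Placeholder digest for the demo only. NOT cryptographic; a real
/// implementation would call SHA-256 from an audited library.
fn toy_digest(data: &[u8]) -> Digest {
    use std::collections::hash_map::DefaultHasher;
    use std::hash::Hasher;
    let mut out = [0u8; 32];
    for (lane, chunk) in out.chunks_mut(8).enumerate() {
        let mut h = DefaultHasher::new();
        h.write_u8(lane as u8);
        h.write(data);
        chunk.copy_from_slice(&h.finish().to_be_bytes());
    }
    out
}
```

In this sketch, editing the body of record `i` is reported at record `i + 1`, because that is the first link that no longer matches. Dropping a record is reported at the position where its successor now sits.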
+ +### Chained, Append-Only Structure + +```rust +ProvenanceRecord { + asset_id: UUID, + manifest: AssetManifest, // see Provenance and Signed Manifest + prior_provenance_hash: Option<[u8;32]>, // SHA-256 over the previous record; + // null only for `action = create` + // The manifest's own `prior_provenance_hash` mirrors this value, so signature + // coverage of the manifest is signature coverage of the chain link itself. +} +``` + +Each non-create record references its predecessor by hash; a rewrite of any past record breaks the chain at that point and is detectable by any client walking forward from `create`. + +### What an Attacker With All Current Keys Still Cannot Do + +Even if every current key (every device's DSK, every album's current AMK and write-tier key) is compromised: + +- **Forward writes are possible** — the attacker can append new records, just like any holder of those keys. +- **Past records cannot be rewritten** — the prior record was signed by a (possibly retired) device whose hybrid signature is still verifiable against the public half published in the [device directory](#per-user-device-coordination). Replacing the past record would require forging that earlier device's signature, which the hybrid construction prevents. +- **Past records cannot be silently removed** — every later record carries the prior hash, so a removal breaks the chain. + +This bounds the blast radius of a credential compromise: history is read-only. + +### Physical Storage + +- **Client.** An append-only CBOR file at `media/{YYYY}/{YYYY-MM}/{uuid}.provenance.cbor`, alongside the asset and its sidecar. The file is a sequence of `ProvenanceRecord` entries. The client never deletes this file — on hard-delete of an asset the log persists as a tombstone-with-history. +- **Server.** A content-addressed encrypted blob, distinct from the [encrypted metadata blob](#metadata-encryption), so a metadata edit (which mints a new metadata blob) never rewrites history. 
The server's no-key envelope of every provenance write includes `prior_provenance_hash`, so the server can enforce monotonic chain advance without holding any key — see [Threat Model — Server-Side Validation Invariants](/design/threat-model/). + +The server is **append-only** for provenance: there is no API path that overwrites or deletes an existing entry. An attempt is rejected at the [server's structural validation layer](/design/threat-model/). + +### Derivative Provenance + +Thumbnails, previews, and embeddings are generated client-side and uploaded as ordinary encrypted blobs. Without provenance they would be silently overwritable by any client with write capability — a buggy v4 client could quietly replace a v3 client's good thumbnail with a corrupt one. To prevent this, every derivative carries a small signed manifest of its own: + +```rust +DerivativeManifest { + version: "derivative-manifest/v1", + crypto_suite_id: u16, + source_asset_id: UUID, + role: enum, // thumbnail | preview | lqip | embedding + format: String, // e.g. "image/avif", "embedding/mobileclip-b" + ciphertext_hash: { algo, value }, + generated_by_device: UUID, + generated_by_client: String, + model_id: Option, // for embeddings; see ML Models + model_version: Option, // for embeddings + generated_at: RFC3339, + prior_provenance_hash: Option<[u8;32]>, // chained per (asset_id, role) + device_sig: Hybrid(Ed25519, ML-DSA-65), + write_sig: Signature, // under the album's epoch write-tier key +} +``` + +A derivative overwrite is therefore a `derivative-replace` lifecycle action that appends to the provenance chain like any other write. Quarantine semantics from [Write Authorization](#write-authorization) apply: a derivative whose manifest fails verification is surfaced, never silently applied — a buggy client cannot poison a derivative under the receiving side's nose. 
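As a rough sketch of how a receiver might apply the per-`(asset_id, role)` chaining and quarantine behavior described above: the `DerivativeHeads` index, the `Verdict` type, and the `sigs_ok` flag are all hypothetical names introduced for illustration. Signature verification is assumed to happen elsewhere and arrives here as a boolean, and asset ids are shown as `u128` rather than a UUID type to keep the example standard-library only.

```rust
use std::collections::HashMap;

type Digest = [u8; 32];

/// Derivative roles, mirroring the `role` field of the manifest.
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Role {
    Thumbnail,
    Preview,
    Lqip,
    Embedding,
}

/// Outcome of applying an incoming derivative manifest (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Accepted,
    Quarantined,
}

/// Hypothetical receiver-side index: the hash of the latest accepted
/// manifest for each (asset_id, role) chain.
#[derive(Default)]
struct DerivativeHeads {
    heads: HashMap<(u128, Role), Digest>,
}

impl DerivativeHeads {
    /// Accept a manifest only if its signatures verified and its
    /// `prior_provenance_hash` equals the current head of its chain.
    /// Anything else is quarantined and the head is left untouched.
    fn apply(
        &mut self,
        asset_id: u128,
        role: Role,
        prior_provenance_hash: Option<Digest>,
        manifest_hash: Digest,
        sigs_ok: bool,
    ) -> Verdict {
        let key = (asset_id, role);
        let head = self.heads.get(&key).copied();
        if !sigs_ok || prior_provenance_hash != head {
            return Verdict::Quarantined;
        }
        self.heads.insert(key, manifest_hash);
        Verdict::Accepted
    }
}
```

Under these assumptions, a second replacement that still points at an older head is quarantined rather than applied, and a thumbnail chain advancing does not affect the preview chain of the same asset.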
+ +## Failure Modes and Recovery + +Capsule treats loss of data — and loss of the keys that decrypt it — as a first-class concern. This section enumerates what can go wrong, how each failure is detected or contained, and the redundant, independent paths that restore a user's *entire* asset collection — including after catastrophic software bugs, not just key loss. + +### Failure Mode Catalog + +| Failure mode | Detected / contained by | Recovery path | +| ---------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | +| **Master key loss** | — | Master-key escrow (path 1) or cross-device recovery (path 2) | +| **Device key loss** | Device keys are disposable by design | Re-bootstrap from the master key (path 1/2); device keys are never recovered | +| **AMK loss** (album key) | — | OGK escrow (path 3) and the master-key-anchored backup escrow (path 4) | +| **Write-tier key loss** | — | Re-minted and redistributed over MLS at the next epoch; no asset is lost | +| **Master key compromise** | — | Master-key rotation re-wraps the hierarchy — see [Key Rotation and Revocation](#key-rotation-and-revocation) | +| **Device compromise** | — | Device revocation certificate + MLS `Remove`; surviving devices rotate group keys | +| **AMK / write-tier compromise** | — | MLS epoch bump mints a fresh AMK and write-tier key; the compromised epoch cannot read or sign future epochs | +| **Server compromise** | Server is never trusted for authorization or plaintext | Authorization is verified against MLS history; data is E2E-encrypted at rest | +| **Classical primitive broken** (Ed25519, X25519) | Hybrid construction | The PQ half (ML-DSA-65 / 
ML-KEM-768) still holds — confidentiality and authentication survive | +| **PQ primitive broken** (ML-DSA, ML-KEM) | Hybrid construction | The classical half still holds | +| **Ciphertext corruption; chunk truncation, reorder, or deletion** | AES-256-GCM-STREAM per-chunk tags + `ciphertext_sha256` | Re-fetch the blob from a content-addressed copy (path 6) | +| **Reader-signed / removed-writer / wrong-epoch / forged-chain / replayed manifest** | The single [`verify_asset`](#write-authorization) chokepoint | Asset is quarantined and surfaced in the [audit trail](#provenance-of-library-modifications) | +| **MLS ratchet corruption or loss** | — | The recovery path is independent of ratchet state (paths 1, 3, 4) | +| **Backup incompleteness** (a referenced `amk_version` missing from the escrow) | Backup verification's AMK-completeness check | Caught before the backup is relied on; re-export | +| **Nonce reuse** | Structurally prevented | STREAM derives per-chunk nonces; metadata blobs draw fresh random nonces; a fresh per-file key lets the STREAM counter start at zero | +| **CBOR non-determinism** breaking cross-peer signature verification | RFC 8949 §4.2 deterministic encoding | Byte-identical re-encoding; the signature verifies | +| **Catastrophic software bug** corrupting the library DB / index | The DB is a rebuildable cache, not a source of truth | Filesystem rebuild from CBOR sidecars (path 5) | +| **Erroneous delete** (bug or user) | Soft-delete is the default | Restore from trash within the retention window (path 7) | +| **Stale-revival attempt** (peer or restore sends an old-but-validly-signed manifest) | `prior_provenance_hash` chain (see [Provenance](#provenance-of-library-modifications)) and matching server-side envelope check (see [Threat Model](/design/threat-model/)) | Manifest is quarantined; chain advance is refused on both client and server | +| **Suite-downgrade attempt** (re-sign a manifest under a weaker `crypto_suite_id`) | Signature covers 
`crypto_suite_id` and `protocol_version` | Verification fails at `verify_asset`; manifest is quarantined | +| **Derivative poisoning** (buggy or hostile client overwrites a good thumbnail/embedding) | Every derivative carries a [`DerivativeManifest`](#derivative-provenance) on its own chain | Overwrite without a valid manifest is rejected; provenance chain detects an unauthorized replacement | +| **Cross-schema sidecar overwrite** (old client writes back a sidecar after stripping unknown fields) | Sidecar signature covers every byte including unknown fields; old client `refuses to write` when `sidecar_schema` exceeds its max known | Old client cannot strip-and-resign; new client detects schema regression and quarantines | + +### Redundant Recovery Paths + +Restoring a complete asset collection does not depend on any single mechanism. The following paths are **independent** — each is annotated with the failures it survives: + +1. **Master-key escrow.** A recovery passphrase or BIP39-style seed unwraps the server-side escrow blob → account master key → AMK escrow → every asset. *Survives: total device loss.* See [Master-Key Escrow](/design/backup-recovery/#master-key-escrow). +2. **Cross-device recovery.** Any signed-in device re-bootstraps a new device over a verified channel. *Survives: partial device loss, and loss of the master-key backup — as long as one device survives.* +3. **Owner Group Key (OGK).** Any current member of the [owner set](#owner-group-keys-ogks) recovers every album's AMK versions, independent of album membership. *Survives: lost album membership, gaps in AMK distribution over MLS.* +4. **Portable backup artifact.** A self-describing, versioned, encrypted archive, stored offline. *Survives: server data loss, account compromise, escrow-blob corruption.* See [Backup Artifact](/design/backup-recovery/#backup-artifact) for the container format. +5. 
**Recovery-first filesystem rebuild.** CBOR sidecars are the canonical metadata store; the database is a rebuildable query cache. The idempotent `rebuild_index()` (`capsule-core/src/library/rebuild.rs`) walks `.cbor` sidecars and reconstructs the index. *Survives: DB corruption and catastrophic bugs in the index/query layer.* +6. **Content-addressed durability redundancy.** Ciphertext is addressed by the SHA-256 of its bytes, so any byte-identical copy — on another device or a [federated](/design/federation/) peer — is independently verifiable. This is a *durability* path: it restores ciphertext, not keys. *Survives: single-server data loss.* +7. **Trash soft-delete window.** Deletes are soft first — `soft_delete()` / `purge_expired_trash()` (`capsule-core/src/library/trash.rs`) give a reversal window before a hard purge. *Survives: erroneous deletes by a bug or user.* + +**Account-type coverage.** Registered accounts have all seven paths. [Delegated/sponsored accounts](/design/authentication/#account-types) are recovered via the sponsoring account's master key, since their private keys are wrapped under a sponsoree KEK that is itself wrapped under that master key (they are not derived from it). Non-registered (share-link) accounts hold no collection of their own — recovery is not applicable. + +### Bug-Resistance Invariants + +These cross-cutting properties make recovery robust specifically against *catastrophic bugs*, not just key loss: + +- **The backup path is independent of the MLS ratchet.** Restore never reconstructs ratchet state, so a ratchet bug cannot strand data. The master key — not any ratchet state — is the single backed-up root. +- **Hardware-bound, disposable device keys.** Device keys live inside hardware, are non-exportable, and are never backed up — a lost device is re-bootstrapped, not recovered. +- **Cross-signing (Matrix-style).** The master identity signs every device key; adding a device means an existing device signs it, so losing one device never compromises the account. 
+- **Every construction is versioned.** KDF `info` strings, in-blob Argon2id parameters, the [`crypto_suite_id`](#versioning-identifiers) on every manifest and metadata blob, and the [`sidecar_schema`](/design/metadata/#sidecar-schema-v1) on every sidecar mean a buggy v2 never strands v1 data — v2 keys and structures coexist with v1 without a flag day. Signature coverage of `crypto_suite_id` defeats downgrade attempts. +- **`verify_asset` quarantines, never drops.** A bug-produced invalid asset is neither silently dropped nor silently accepted; it is quarantined and surfaced in the audit trail so an operator can tell a bug from an attack. +- **Provenance is append-only.** Each `ProvenanceRecord` carries the hash of its predecessor (`prior_provenance_hash`), and every record is hybrid-signed by the producing device. An attacker holding every *current* key still cannot rewrite a past record without forging an earlier (possibly retired) device's signature — history is read-only. See [Provenance of Library Modifications](#provenance-of-library-modifications). +- **Stale-revival is rejected.** An incoming manifest whose `prior_provenance_hash` is behind the receiver's stored `latest_provenance_hash` is treated as stale and quarantined — a deleted asset cannot be silently resurrected by a peer or a backup restore. The check is enforced both client-side and server-side (no key needed); see [Threat Model](/design/threat-model/). +- **Backup verification runs before reliance.** Preview, dry-run, signature-chain, and AMK-completeness checks (see [Backup Verification](/design/backup-recovery/#backup-verification)) detect an incomplete or broken backup *before* it is needed. + +## Transport Security + +All client-server communication is over HTTPS. While our stack aims to stay PQ-safe (in due course), the transport layer (TLS) must be configured by the server administrator to be PQ-resistant as well. 
As of writing, the standard is TLS 1.3 with hybrid X25519+ML-KEM key exchange enabled. Since application servers do not terminate TLS, ensure your ingress/reverse proxy is properly configured. + +## Implementation + +- **Centralized audit paths:** All key cryptographic primitives are centralized in `capsule-core/crypto`. Asset acknowledgement goes through the single `verify_asset` chokepoint (see [Write Authorization](#write-authorization)). +- **Contract-driven development:** Define the crypto interfaces, data structures, and the full set of test cases — especially negative cases — before implementing logic. +- **Backward compatibility:** The server stores all data and metadata encrypted; its database model is distinct from the client's and records `crypto_suite_id` and `protocol_version` for every manifest. Old suite ids and protocol versions remain decryptable forever — retiring a primitive adds an inventory row and a new suite id, never edits or removes an old one. Clients outside the server's supported `protocol_version` range are rejected at the [protocol handshake](/design/threat-model/), before any state is written. +- **Trust the server (and only the server) for storage, never for authorization:** The server owns, provisions, and maintains the encrypted user data, so we rely on it to *hold* data — but authorization decisions are verified cryptographically against MLS-distributed keys, never taken on the server's word. +- **Memory hygiene:** All keys and decrypted data are zeroed in memory immediately after use. We also use secure memory allocation where possible to prevent swapping to disk. + +Further guidance: + +- Use audited libraries only — libcrux (formally verified), RustCrypto, ed25519-dalek, x25519-dalek; never be the first serious user. +- Use MLS rather than inventing group crypto; it handles the 1:1 case and shifts the audit burden to the IETF and OpenMLS. 
+- Keep the backup path independent of the ratchet — album keys live in the backed-up hierarchy, so recovery never reconstructs ratchet state. +- Version every key derivation with an `info` string (`"albums/v1"`, `"asset-file/v1"`) so v2 keys can derive alongside v1 without a flag day. +- Store device private keys in hardware (Secure Enclave, StrongBox, TPM) to eliminate memory-extraction attacks. +- Write test vectors against known implementations (libsignal, OpenMLS, RFC vectors) before writing anything novel. + +### Versioning + +The construction of every encryption metadata structure is always versioned. Parameters (e.g. for Argon2id) must be saved inside the construction to ensure future changes do not break previous constructions. diff --git a/capsule-docs/src/content/docs/design/federation.md b/capsule-docs/src/content/docs/design/federation.md new file mode 100644 index 0000000..5bf5c61 --- /dev/null +++ b/capsule-docs/src/content/docs/design/federation.md @@ -0,0 +1,153 @@ +--- +title: Federation +description: How Capsule implements server-to-server federation for sharing and collaboration +--- + +Federation lets an album owned on one Capsule server be shared with users whose +accounts live on another. This document covers **server-to-server** federation +only; direct device-to-device sync for a single user is [Peering](/design/peering/). + +## Threat Model + +Federation is designed under one assumption: **a remote server is hostile until +proven otherwise.** It may be running ancient, buggy code; it may be compromised; +it may be actively malicious; peers may collude. The only thing Capsule trusts is +cryptography it verifies itself. Every other claim a peer makes is unverified +input until a signature or a content hash says otherwise. + +This is in line with the security posture established in the [cryptography](/design/cryptography/) design toward Capsule's *own* server ("trust the server for storage, never for authorization"). 
Federation extends it to servers Capsule does not even operate. + +## Federation Reuses Existing Primitives + +Federation deliberately introduces **no new data protocol**. A remote server fetches exactly the same content-addressed primitives a client uses (see [Import and Synchronization](/design/import-synchronization/#discovering-what-changed)): + +| Operation | Purpose | +| -------------------------- | --------------------------------------------------------------------------------------------- | +| `GET /sync` (album-scoped) | A page of metadata-blob changes after a cursor, for an album the peer holds a capability for. | +| `GET /blob/{hash}` | Fetch an opaque ciphertext blob by its content address. | +| `POST` capability proof | Present a [federation capability](#federation-capabilities) to establish or refresh access. | + +Everything else — notifications, presence — rides a separate, lower-trust channel and never feeds the validation pipeline directly. + +Because blobs are content-addressed by their [ciphertext content hash](/design/cryptography/#primitives-inventory), a peer *physically cannot* lie about what a hash contains: Capsule recomputes the hash on arrival and rejects a mismatch. This collapses most of the trust problem — Capsule never trusts a peer's *claim* about an object, it fetches and verifies. + +ActivityPub and Nextcloud Federated Sharing were considered and rejected as the wire protocol: Capsule's E2EE model (ciphertext-only blobs, MLS-gated album membership) does not map onto either, and adopting one would mean tunnelling Capsule's real primitives through a foreign envelope for no gain. + +## Pull-Only Federation + +Peers **pull**; they never push into Capsule's database. A remote server fetches on Capsule's schedule, through Capsule's validation pipeline, and the result is written only after it verifies. 
The single thing a peer may push is a **notification** — "a new event exists in album A" — over the separate low-trust channel; Capsule then fetches and validates on its own terms. Push-based writes are where most federation exploits live, so the design simply does not have them. + +## Album Ownership (v1: Single Home Server) + +For v1, **each album has exactly one home server** — the server that issued the album's initial capability is the authoritative origin for every blob in it. A peer server that holds cached blobs from a federated album is exactly that: a cache, not an origin. The home server alone serves the *current* manifest for any asset in its album. + +This rule keeps the v1 federation API surface small (no replication, no cross-server commit ordering) and forecloses several damage classes — split-brain ownership, two-server delete races, conflicting AMK-epoch advances — that would otherwise need explicit cross-server consensus to prevent. + +Cross-server replication of a *single* album (where two users on different home servers each want to write the same album) is **out of scope for v1** and deferred to v2. v1 supports cross-server sharing in the read direction (Alice on `home.tld` shares an album to Bob on `other.tld`; Bob reads via federation; Bob's writes either remain on `home.tld` via a registered or sponsored account, or are out of scope). The v2 design space is flagged in [Threat Model — Open Questions](/design/threat-model/#open-questions). + +## Federation Capabilities + +Sharing an album with `alice@other.tld` requires her server to be *able* to fetch that album's blobs. Capsule issues her server an **album-scoped capability token**: a signed, expiring, revocable grant naming the album, the scope, and an expiry, reusing the [EdDSA-JWT machinery](/design/authentication/#access-tokens) already built for access tokens — no separate macaroon or ZCAP format is introduced. 
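As a rough illustration of what "signed, expiring, revocable" means in practice, the sketch below checks the non-cryptographic parts of a capability after its signature has already been verified elsewhere. It is a minimal sketch under assumptions, not Capsule's implementation: the function name `check_capability`, the rejection strings, and the presence of an `iat` claim in the decoded claims are all hypothetical, and timestamps are assumed to be RFC 3339 strings with an explicit UTC offset.

```python
from datetime import datetime, timedelta, timezone

MAX_LIFETIME = timedelta(hours=24)          # exp may be at most 24 h after iat
SCOPES = {"read", "read-derivative-only"}   # closed set of scopes


def check_capability(claims, album_urn, peer, revoked_jti, now):
    """Return "ok" or a short (hypothetical) rejection reason.

    Assumes the token signature was already verified; this only checks claims.
    """
    iat = datetime.fromisoformat(claims["iat"])
    nbf = datetime.fromisoformat(claims["nbf"])
    exp = datetime.fromisoformat(claims["exp"])
    if claims["sub"] != peer:
        return "wrong-peer"          # token was minted for another server
    if claims["aud"] != album_urn:
        return "wrong-album"         # capability is scoped to one album
    if claims["scope"] not in SCOPES:
        return "unknown-scope"       # unknown values are rejected, not ignored
    if exp - iat > MAX_LIFETIME:
        return "lifetime-too-long"
    if now < nbf:
        return "not-yet-valid"
    if now >= exp:
        return "expired"
    if claims["jti"] in revoked_jti:
        return "revoked"             # jti is the revocation key
    return "ok"


now = datetime(2026, 5, 24, 12, 0, tzinfo=timezone.utc)
claims = {
    "iss": "home.tld", "sub": "other.tld", "aud": "urn:capsule:album:1",
    "scope": "read", "jti": "j1",
    "iat": "2026-05-24T00:00:00+00:00",
    "nbf": "2026-05-24T00:00:00+00:00",
    "exp": "2026-05-25T00:00:00+00:00",
}
print(check_capability(claims, "urn:capsule:album:1", "other.tld", set(), now))
```

Signature verification is deliberately left out because the Python standard library has no Ed25519 support; a real verifier would check the signature first and only then look at the claims.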
+ +### Token Contents + +A federation capability token is an EdDSA-JWT with the following claims: + +| Claim | Type | Meaning | +| ---------------------- | -------- | ---------------------------------------------------------------------------------------- | +| `iss` | string | The issuing home server (`home.tld`). | +| `sub` | string | The peer server identity (`other.tld`). | +| `aud` | string | The album id this capability scopes to (`urn:capsule:album:UUID`). | +| `scope` | enum | `read` (full) or `read-derivative-only` (thumbnails and previews only, never originals). | +| `exp` | RFC 3339 | Expiry; never more than **24 h** after `iat`. | +| `nbf` | RFC 3339 | Not-before; clock-skew tolerance against the peer's wall-clock. | +| `jti` | UUIDv7 | Unique token identifier; the revocation key. | +| `min_protocol_version` | string | Lowest `protocol_version` the issuing server still serves; matches the album's pin. | + +Signed under the home server's [Ed25519 signing key from the cryptographic primitives inventory](/design/cryptography/#signature-scheme) — classical only at this layer (operational server keys rotate easily; the [hybrid PQ scheme](/design/cryptography/#signature-scheme) is reserved for user/device identity). + +### Token Lifecycle and Chain of Trust + +1. **Issuance.** A user on `home.tld` shares an album with `alice@other.tld`. `home.tld` mints a capability token for `other.tld` and delivers it as part of the share-invite message to Alice's client. Alice's client posts the token to `other.tld`; `other.tld` caches it server-side and uses it on every subsequent pull. +2. **Verification.** Capsule (the verifier, `home.tld` in this case) verifies the token offline against its own published signing key — no third-party PKI, no network call to a notary except for key rotation (see [Server Identity and Key Rotation](#server-identity-and-key-rotation)). +3. 
**Refresh.** A token nearing `exp` is replaced by `other.tld` requesting a new one on Alice's behalf; the request is itself authenticated by the previous token. Idempotency keyed by `(peer_id, jti)` per [Threat Model — Idempotency Invariants](/design/threat-model/#idempotency-invariants). +4. **Revocation.** Revocation is a short TTL (`exp ≤ 24h`) plus a published **revocation list** at `/.well-known/capsule/revoked-jti`. Peers fetch and cache the list with a **maximum staleness of 15 minutes**. A peer holding a revoked-but-not-yet-expired token will still be honored for up to 15 minutes after revocation — this is the deliberate trade-off between revocation latency and revocation-list polling overhead. +5. **Expiry.** A token past `exp` is rejected unconditionally; the verifier returns `401` and the peer must obtain a fresh token before continuing. + +This capability is a **transport-scoped control, not a confidentiality control.** A peer holding it can fetch ciphertext and nothing more — confidentiality is already enforced by [MLS album membership](/design/cryptography/#group-membership): without the album master key, fetched bytes are unreadable. The capability exists to gate *who may fetch at all* — rate-limiting, anti-enumeration, and clean revocation of a sharing relationship — not to keep content secret. + +## Validation at the Boundary + +Every byte from a peer crosses a hard boundary before it is trusted. The exhaustive checklist — refuse-by-default, applied to every federated write — is owned by [Threat Model — Server-Side Validation Invariants](/design/threat-model/#server-side-validation-invariants); the rules that follow are the federation-specific specialization of that list. + +- **Strict schema match.** Input must conform exactly to the schema for its declared protocol version (see [album version pinning](/design/versioning/#album-protocol-version-pinning)). Anything else is rejected. 
`crypto_suite_id` and `sidecar_schema` must each be values the verifying server recognizes; an unknown value is **not** preserved-and-ignored, it is rejected (cf. the asymmetric Postel's Law in [Principles](/design/principles/) and [Threat Model — Schema Evolution](/design/threat-model/#schema-evolution-and-field-grammar)). +- **Closed enums.** `action`, `content_type`, `DerivativeManifest.role`, and `gps.source` are closed per protocol version. An unknown value is a structural error, not a "future to ignore." +- **Hard caps.** Size caps on every field, depth caps on nested structures, length caps on bounded collections (e.g. `superseded_captions ≤ 16`), rate caps per peer. No unbounded input reaches a parser. +- **Unknown fields within a known schema preserved, never executed.** Top-level unknown fields are rejected; field-level unknown CBOR keys within a known schema are preserved verbatim for forward compatibility but are never interpreted. +- **Manifest envelope checks.** All items 1–18 of [Server-Side Validation Invariants](/design/threat-model/#server-side-validation-invariants) apply — `protocol_version` in range, `crypto_suite_id` in inventory, hash algorithm matches the suite, declared size against received bytes, `created_by_device` in the user's device directory, `timestamp` within ±30 days, monotonic `amk_version`, and the [stale-revival check](/design/import-synchronization/#stale-revival-detection) on `prior_provenance_hash`. +- **Capability token.** Items 19–21 of the same list: token verifies under the home server's signing key, `exp` in future, `jti` not in the revocation list, per-peer rate budgets unbroken. 
+- **The parser is a security boundary.** Capsule's decoders for federated input are written in memory-safe Rust against audited libraries (`ciborium`, `serde_cbor`); we explicitly assume the host language and decoder are memory-safe (the same assumption [Federation — Security Against Malicious Files](#security-against-malicious-files) makes at the client edge). Decoder CVEs in client decode paths for *opaque media bytes* are handled by the [sandboxed decoder](/design/clients/#sandboxed-decoder), not by re-implementing the decoder. The federation CBOR decode path is additionally fuzzed. + +## Per-Peer Compartmentalization + +Each peer is its own blast-radius boundary — a bad peer cannot starve good ones: + +- **Quotas.** Per-peer budgets on events/hour, bytes/hour, and CPU/hour. Exceeding a budget queues or drops further requests. +- **Error budget + circuit breaker.** Malformed input spends a per-peer error budget; enough failures trip a circuit breaker that backs the peer off exponentially (e.g. 5 / 30 / 60 minutes). A buggy peer cannot DoS Capsule. +- **Quarantine for new peers.** First contact puts a server in a probationary tier: tighter quotas, stricter validation, no push notifications accepted. It graduates after a period of clean behavior. This cuts off the "spin up a fresh instance to attack" vector, mirroring email reputation systems. + +## Stale-Revival Defense + +A federated peer may have cached an old manifest for an asset that the home server has since marked deleted (or otherwise advanced beyond). Submitting that old manifest back must not silently resurrect the asset. The defense is owned by [Cryptography — Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications) and surfaced for federation here: + +- The home server only serves the **current** manifest for any asset — it does not expose an API to fetch an arbitrary past manifest. A peer can therefore only present a manifest it has previously cached. 
+- A peer presenting a manifest whose `prior_provenance_hash` is behind the home server's stored `latest_provenance_hash` is rejected with `409` (stale-revival), and the rejected manifest's hash is added to the bounded rejected-hash table (see [Soft-Fail Semantics](#soft-fail-semantics)). The same defense runs on the receiving client when a peer's pull serves a stale manifest forward. +- The chain check is fully no-key: the server reads `prior_provenance_hash` from the manifest envelope and compares it to its own stored value. + +This is the federation-layer specialization of [Threat Model — § 4 (Damage Scenario Map)](/design/threat-model/#damage-scenario--invariant-map), row #4. + +## Soft-Fail Semantics + +A federated event that fails validation is rejected **locally** — not applied, not shown, no authority derived from it — but its hash is **remembered**. Remembering the hash keeps Capsule's view from silently diverging from peers that (wrongly) accepted it: divergence is the real enemy, and explicit rejection-with-memory is the cure. This is the federation-facing counterpart of the [`verify_asset` quarantine](/design/cryptography/#write-authorization) — a failure is never silently dropped and never silently accepted. + +**Bounded memory.** A hostile peer could otherwise flood the rejected-hash table indefinitely, so the table is **capped**: default 100,000 entries with a 90-day TTL per entry, both deployment-configurable. Eviction is LRU within the cap. The hashes that age out are the ones Capsule hasn't seen referenced again — by the time they age out they are no longer load-bearing for divergence detection. + +## Reconstructing State Without Trusting Peers + +Capsule never trusts the *order* in which a peer returns results. Federated state is reconciled from cryptographic signals — content hashes and signatures on [asset manifests](/design/cryptography/#provenance-and-signed-manifest) — not from peer-supplied ordering. 
A manifest's `timestamp` is self-asserted and used for audit only. + +**Cross-peer consistency checks.** As a cheap backstop, a client may periodically fetch the same album state from the home server and from a peer and diff them. A mismatch flags a potentially misbehaving server. This is rare and off the hot path, but one server cannot rewrite history without another noticing. + +## Robustness Against Connectivity Loss + +Assets linked from external servers may be unreachable — server downtime, network issues. Capsule indicates the asset is currently unavailable and retries fetching it later. It does **not** thrash and remove the external asset's metadata from the local index; the unavailability is logged for debugging and monitoring. + +Under the v1 [single-home-server rule](#album-ownership-v1-single-home-server), an unreachable home server makes its assets unreachable but does not produce conflicting state — there is no second authoritative server to diverge from. After a configurable downtime budget (**default 30 days of failed pulls**), Capsule marks the album **degraded** in the UI ("Owned by an unreachable server"). The local index entries are **never removed** — the assets are unreachable, not deleted, and resuming federation with the home server (when it recovers) re-validates and re-enables the album. There is no "kick the server" mechanism in v1 because there is nothing to kick: a single home server's silence is observed as unavailability, full stop. + +## Security Against Malicious Files + +Linking assets from an external server means a client inherently trusts bytes from that server. Two defenses apply: + +- **Untrusted-server whitelist.** If an album contains assets from an untrusted external server, clients skip loading them unless the user explicitly consents, accepting the risk. 
+- **Sandboxed decoding on the client.** The Capsule *server* never decodes media — it handles only ciphertext — so image-decoder CVEs (libjpeg, libwebp, libheif, libavif have all shipped exploits recently) are a **client-side** risk. Clients decode untrusted remote-origin assets through an isolated/sandboxed decode path that can be crashed freely (see [Clients](/design/clients/)). + +## Federated Breadcrumb Index + +Search spanning federated albums uses a two-tier index: + +- **Tier 1 — local full-fidelity index.** Everything on the home server — own uploads plus cached remote content — gets the full treatment described in [AI/ML Integrations](/design/ai/): embeddings, tags, perceptual hashes. +- **Tier 2 — federated breadcrumb index.** For accessible remote albums, Capsule keeps only a lightweight record per asset — content hash, timestamp, author, size, album membership. When the user actually views the remote album, relevant assets are fetched and **promoted** into the Tier-1 index. Promotion is lazy and on-demand; Capsule never pre-indexes every federated album wholesale. + +## Moderation and Abuse + +Capsule is end-to-end encrypted, so a server **cannot** scan content it holds — server-side content or CSAM scanning is impossible by design, and no content scanner is built. Moderation instead operates on what *is* available: + +- **Federated reporting protocol.** A report against `alice@other.tld`'s asset is routed to her home server's administrators, since they are the only party that can act on her account. +- **Blocklists.** Server-level blocklists, plus per-user blocks that federate. +- **Untrusted-server whitelist.** The same [whitelist](#security-against-malicious-files) that gates malicious files is the front-line abuse control for content from servers Capsule does not trust. 
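The interaction between blocklists, the untrusted-server whitelist, and explicit user consent can be sketched as a single client-side decision. This is an illustration only: the function name `should_load` and the precedence order (a block always wins over consent) are assumptions, not something the design text specifies.

```python
def should_load(origin, home, trusted, blocked, consented):
    """Decide whether a client loads an asset served by `origin`.

    Assumed precedence: blocklist first, then home/whitelist, then consent.
    """
    if origin in blocked:
        return False                 # a block is never overridden by consent
    if origin == home or origin in trusted:
        return True                  # home server and whitelisted peers load
    return origin in consented       # untrusted: only with explicit consent


print(should_load("other.tld", "home.tld", {"other.tld"}, set(), set()))
```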
+ +## Server Identity and Key Rotation + +- Server-to-server requests are signed under the [server's signing key](/design/cryptography/#signature-scheme) (classical-only at this layer is acceptable since operational server keys rotate easily; the [hybrid PQ scheme](/design/cryptography/#signature-scheme) is reserved for user/device identity), published at a well-known path. Matrix, ActivityPub (HTTP Signatures), and AT Protocol all converge on this pattern. +- Servers cache each other's public keys, so key rotation needs a notary / perspective endpoint so a peer can confirm a rotated key. +- Album protocol versions are pinned per album — see [Album Protocol Version Pinning](/design/versioning/#album-protocol-version-pinning). diff --git a/capsule-docs/src/content/docs/design/filesystem.md b/capsule-docs/src/content/docs/design/filesystem.md index 1e441c9..d39cb3a 100644 --- a/capsule-docs/src/content/docs/design/filesystem.md +++ b/capsule-docs/src/content/docs/design/filesystem.md @@ -1,949 +1,496 @@ --- title: Filesystem -description: How filesystem is structured in Capsule +description: How Capsule structures files on disk, on the server and on clients --- -Content in Capsule is organized on both server and client filesystems. Design priorities differ between the two, but both follow the same core principles. - -## Core Principles - -These principles apply universally to both client and server. - -**Recovery-First**: The filesystem must be reconstructible from partial corruption. No database is required to interpret critical data — sidecar files are the canonical metadata store; the database is a rebuildable query cache. - -**Deterministic**: File placement is algorithmic. The same media capture timestamp always produces the same path, enabling verification and repair. - -**Self-Describing**: Each media file is paired with a CBOR sidecar containing all user-editable and stable metadata. Files are independently interpretable without a running database. 
- -**Atomic Writes**: Use temp-file + rename throughout. Direct overwrites risk corruption on power loss. - -**Postel's Law**: Liberal in what we accept — unknown sidecar fields preserved, missing optional fields tolerated. Strict in what we create — every required field must be present and valid before committing. - ---- - -## On-Server Filesystem - -The server stores authoritative copies of media files and sidecars for each library. Cache artifacts (thumbnails, transcodes, `.meta.cbor`) are ephemeral and regenerated on demand. - -### Ownership Model - -Content is partitioned at the root by `owner_id`. An owner is the billing and namespace entity (a person, organization, or team). The bijective mapping between `owner_id` and the set of authorized `user_id`s is managed entirely in PostgreSQL — the filesystem has no knowledge of individual user identities. Access control is enforced at the API layer, not the filesystem layer. `owner_id`, `album_id`, and `user_id` are all independent namespaces; no two of these ever share a value. - -### Server Layout +Capsule's end-to-end encryption splits the filesystem into two fundamentally +different roles. The **server** stores only opaque, content-addressed +ciphertext — it never holds a decryption key and cannot interpret a single byte +it stores (see [Cryptography](/design/cryptography/)). **Clients** hold the keys, so a +client filesystem is a working library of plaintext media, sidecar metadata, and +rebuildable caches. The two layouts share a small set of principles but +otherwise have little in common. + +This document covers on-disk structure only. 
The import pipeline, the upload +protocol, and synchronization are covered in +[Import and Synchronization](/design/import-synchronization/); metadata extraction in +[Metadata](/design/metadata/); derivative generation in +[Thumbnails and Previews](/design/thumbnails/); grouping and trash semantics in +[Asset Organization](/design/organization/); backup and recovery in +[Backup and Recovery](/design/backup-recovery/). + +## Shared Principles + +These follow directly from [Core Principles](/design/principles/): + +- **Recovery-first.** No database is required to interpret canonical data. On + the client, sidecar files are the source of truth and the index is a + rebuildable cache. On the server, PostgreSQL is the authoritative index, but + it holds only key-free facts. +- **Atomic writes.** Every write that must not tear uses temp-file + atomic + rename on the same filesystem. Direct overwrites risk corruption on power loss. +- **Ephemeral derived data.** Only originals and their canonical metadata are + irreplaceable. Thumbnails, transcodes, parsed-metadata caches, and the query + index can all be regenerated and are treated as such. +- **4 KiB alignment.** Data is processed and written block-aligned to 4 KiB, + which matches memory and disks and enables the reflink assembly path below. +- **Content-addressing.** Stored blobs are named by their ciphertext content hash — + the same hash everywhere a content address is needed (see + [Cryptography Primitives Inventory](/design/cryptography/#primitives-inventory)). 
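The atomic-write principle above can be sketched as follows. This is a minimal illustration assuming a POSIX-style filesystem; the helper name `atomic_write` is hypothetical and not taken from the Capsule codebase.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes so readers see the old file or the new one, never a torn mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file is created in the destination directory so the final
    # rename stays on one filesystem, which is what makes it atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # push bytes to disk before publishing
        os.replace(tmp_path, path)   # atomic rename over any existing file
    except BaseException:
        # On any failure the destination is untouched; remove the temp file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A stricter variant would also fsync the containing directory after the rename so the new directory entry itself survives power loss; that step is omitted here for brevity.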
+ +## Server vs Client at a Glance + +| Concern | Server | Client | +| ------------ | ------------------------------------------ | --------------------------------------------- | +| Holds keys | No | Yes | +| Stored form | Opaque ciphertext blobs | Plaintext media + CBOR sidecars | +| Naming | Content-addressed by ciphertext hash | UUIDv7 stems, date-bucketed | +| Index | PostgreSQL (key-free facts only) | SQLite (rebuildable, full plaintext metadata) | +| Derived data | Stored as client-generated encrypted blobs | Generated locally, cached, rebuildable | +| Originals | Always retained while referenced | Present only if synced locally | + +## Server Filesystem + +### Stores by Deployment Profile + +The server's durable state is always split across **two required systems** plus an **optional third** for high-concurrency deployments: + +- **Blob store** (filesystem) — the encrypted bytes of every asset. *Required.* +- **PostgreSQL** — the authoritative index: ownership, album references, blob + references, lifecycle state, and (in the default profile) upload-session state. + *Required.* +- **Valkey** — volatile upload-session state (offsets, status) with a 24-hour + TTL. *Optional.* Recommended only for deployments where upload-session hot-path + contention on PostgreSQL becomes measurable. + +This gives two concrete deployment profiles: + +| Profile | Session state lives in | When to choose it | +| --------------------------- | ------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------- | +| **Default (Postgres-only)** | `upload_sessions` table with `expires_at` TTL column and a periodic sweep | Self-hosted, small-to-medium servers, single-node deployments. Reduces ops surface. 
| +| **High-concurrency** | Valkey (keyed `upload:session:{id}`) with native 24-hour TTL; PostgreSQL still holds the durable pending-asset row | Large multi-tenant deployments where session-table contention is a measured bottleneck | + +Switching profiles is operationally invisible to clients — the upload protocol does not change, only where the server stores volatile session counters. The [upload protocol](/design/import-synchronization/) is written to be store-agnostic. + +The server performs no decoding, no metadata extraction, and no thumbnail +generation — it cannot, since it never holds a key. + +### Blob Store Layout ```text -{data_root}/ # Configured at server startup (e.g., /var/capsule) -├── {owner_id}/ # One directory per owner; UUID-named, no prefix -│ └── {album_id}/ # One directory per library under that owner -│ ├── media/ -│ │ └── {YYYY}/ -│ │ └── {YYYY-MM}/ -│ │ ├── {uuid}.{ext} # Media file (read-only after commit) -│ │ └── {uuid}.cbor # Sidecar metadata (mutable) -│ ├── cache/ -│ │ ├── meta/ -│ │ │ └── {uuid[0:2]}/{uuid[2:4]}/{uuid}.meta.cbor # Ephemeral full metadata -│ │ ├── thumbnails/ -│ │ │ └── {xs|s|m|l|xl|o}/ -│ │ │ └── {uuid[0:2]}/{uuid[2:4]}/ -│ │ │ ├── {uuid}.jxl -│ │ │ └── {uuid}.webp -│ │ └── transcodes/ -│ │ ├── h264/ -│ │ │ └── {uuid[0:2]}/{uuid[2:4]}/{uuid}.mp4 # H.264/MP4 (ephemeral) -│ │ └── live/ -│ │ └── {uuid[0:2]}/{uuid[2:4]}/{uuid}.mov # Live Photo video (ephemeral) -│ └── trash/ -│ └── {uuid}.{ext} # Soft-deleted media (30-day quarantine) +{blob_root}/ +├── incoming/ +│ ├── {upload_id}_{n}.part # in-flight chunk +│ └── {upload_id}.bin # assembled blob, pre-verification +├── blobs/ +│ └── {hash[0:2]}/{hash[2:4]}/ +│ └── {hash} # finalized blob, content-addressed └── .server/ - ├── version.cbor # Server filesystem schema version - └── config.cbor # Server-wide configuration + ├── version # server filesystem schema version + └── config # server-wide configuration ``` -**Path derivation**: Every file path on the server is fully 
deterministic from `(owner_id, album_id, asset_id, artifact_type)`. No scanning or database lookup is needed to compute a path. - -**`{data_root}`**: Absolute path configured at server startup. The entire tree must be on a single filesystem to guarantee atomic renames within any library. - -**`owner_id`**: Assigned at account creation. Never reused or changed. Because UUIDs are lowercase hex + hyphens and `.server` starts with a dot, there is no naming collision between owner directories and the server config directory. - -**Transcode type directories** (`h264/` and `live/`): Each transcode type is a separate subdirectory instead of a filename suffix. The path for any transcode is unambiguously derived from the asset UUID and the type — no suffix convention to remember, no risk of name collision within a directory. - -### Differences from Client Layout - -| Concern | Client | Server | -| ------------------ | ----------------------------------------------- | -------------------------------------------- | -| Root partitioning | N/A (single library per root) | `{owner_id}/{album_id}/` | -| Index | `index/library.sqlite` (file-based) | PostgreSQL (rebuildable from sidecars) | -| Lock | `.library/lock` (per-library process lock file) | Process-managed; no lock file | -| Config / version | `.library/{version,config}.cbor` | `.server/{version,config}.cbor` at data root | -| Migrations | `.library/migrations/` | Server-managed; not on the filesystem | -| Cache artifacts | `index/{meta,thumbnails,transcodes}/` | `cache/{meta,thumbnails,transcodes}/` | -| Transcode subtypes | `index/transcodes/{h264,live}/{shard}/` | `cache/transcodes/{h264,live}/{shard}/` | -| Trash | `.library/trash/` | `{album_id}/trash/` | - -### Server Config Schema (`.server/config.cbor`) - -```text -schema_version u8 REQUIRED. Config schema version. Current: 1. -``` - -### Server Index (PostgreSQL) - -The server uses PostgreSQL as its authoritative index. 
The core tables (`assets`, `asset_stacks`, `stack_members`, `asset_tags`) mirror the client's SQLite schema (same names and semantics). The server schema is a superset with additional server-only columns and tables. - -**Server-only columns on `assets`**: `owner_id`, `chromahash`, `dominant_color`, `content_type`, `is_favorite`, `upload_user_id`, `uploaded`. - -**Server-only tables**: `asset_stacks.metadata` (JSONB — stack-type-specific structured data), `owners`, `users`, `owner_members`, `albums`, `album_shares`, `people`, `faces`, `smart_tags`, `asset_smart_tags`, `share_links`, `memories`, `passkeys`. - -Schema version is stored in a `_schema_meta` table row instead of `PRAGMA user_version`. - -The server database is rebuildable from sidecars by the same scan-and-ingest logic as the client. On startup, if the database is empty and sidecar files are present, trigger a full rebuild. Stack relationships are reconstructed from `stack_hint` fields in sidecars (see Stack Reconstruction on Rebuild). - ---- - -## On-Client Filesystem - -*Desktop clients only. Mobile clients (Android/iOS) use platform-sandboxed storage and are handled separately.* - -### Design Priorities - -**Performance**: SQLite index caches queries but is rebuildable. Thumbnails sharded to avoid directory entry limits on large libraries. - -### Client Layout +- **`{blob_root}`**: absolute path configured at server startup. The entire tree + must be on a single filesystem so that finalization renames are atomic. +- **`incoming/`**: live uploads. Chunks land as `{upload_id}_{n}.part`; on + finalization they are concatenated into `{upload_id}.bin`. The 4 KiB chunk + alignment is what allows each chunk to be reflinked into place on + copy-on-write filesystems, turning assembly into a near-instant metadata + operation. See the upload protocol in + [Import and Synchronization](/design/import-synchronization/). +- **`blobs/`**: the finalized store. 
A blob's filename is its [ciphertext content hash](/design/cryptography/#primitives-inventory); the two-level hex-prefix shard keeps directory sizes bounded for + multi-million-blob stores. A finalized blob is immutable. +- **`.server/`**: the server operator's own configuration and schema version. + This is plaintext server metadata, not user data — it is the one thing under + `{blob_root}` that is not an encrypted blob. + +### Uniform, Opaque Blobs + +A single asset produces a **bundle** of blobs (see +[Import and Synchronization](/design/import-synchronization/) — "What Gets Uploaded"): +the encrypted original, encrypted derivatives (thumbnails, previews, LQIP), the +encrypted CBOR metadata blob, and the encrypted provenance blob (see +[Cryptography — Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications)). +The blob store does not distinguish them — every blob is just content-addressed +ciphertext. The mapping from an asset to its constituent blobs, and the role of +each blob, lives entirely in PostgreSQL. + +### Recovering the Index from Blobs Alone + +The PostgreSQL index is authoritative but **not the only copy** of what the +server knows. Every blob carries enough server-visible structural metadata — +the [unencrypted portion](/design/cryptography/#provenance-and-signed-manifest) +of the asset manifest — to rebuild the index row that referenced it. This is +the server-side counterpart of the recovery-first principle that lets a client +rebuild its index from CBOR sidecars. 
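Any walk over `blobs/` relies on the sharded layout described under Blob Store Layout, so it may help to see the path derivation spelled out. The sketch below is illustrative: `blob_path` is a hypothetical helper, `/var/capsule` is just an example root, and SHA-256 stands in for whatever hash the primitives inventory actually specifies.

```python
import hashlib
from pathlib import PurePosixPath

HEX = set("0123456789abcdef")


def blob_path(blob_root, content_hash):
    """Map a hex content hash to blobs/{hash[0:2]}/{hash[2:4]}/{hash}."""
    if len(content_hash) < 4 or not set(content_hash) <= HEX:
        raise ValueError("content hash must be lowercase hex")
    return (PurePosixPath(blob_root) / "blobs"
            / content_hash[0:2] / content_hash[2:4] / content_hash)


# Stand-in for the real content hash: SHA-256 over the ciphertext bytes.
digest = hashlib.sha256(b"opaque ciphertext bytes").hexdigest()
print(blob_path("/var/capsule", digest))
```

Two hex-prefix levels give 256 × 256 = 65,536 leaf directories, so a store of ten million blobs averages roughly 150 files per directory.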
+ +The server-visible portion of a blob includes: + +- `crypto_suite_id`, `protocol_version`, `amk_version` — what bundle of + primitives encrypted this asset and which album epoch +- the ciphertext hash (`hash.value`) and declared size — content address and + storage attribution +- `created_by_user`, `created_by_device`, `album_id`, `file_id`, + `prior_provenance_hash`, `action` — owner, provenance chain link, and + lifecycle action +- the device's hybrid signature — provenance attribution; verifiable against + the public device directory even without any key Capsule's server holds + +A rebuild walks `blobs/`, reads the manifest envelope of each blob, verifies +the device signature against the cached device directory, and writes an index +row. The rebuild is idempotent: re-running it against an existing index +produces no changes. The full envelope check list a server runs at recovery is +the same list it runs at write time — see +[Threat Model — Server-Side Validation Invariants](/design/threat-model/#server-side-validation-invariants). + +A blob whose manifest envelope fails structural validation during rebuild is +**quarantined**, not silently dropped — moved to `{blob_root}/quarantine/` +with a sibling `.reason.json` recording the rejection code. This guarantees +that an unrecoverable byte sequence is preserved for forensic inspection +rather than vanishing on rebuild. + +Operationally the rebuild is invoked when a PostgreSQL restore is incomplete +or a logical-corruption event is detected; it is **never** the hot path. The +hot path runs through the authoritative PG index. The recovery path's job is +to make the index reconstructible if PG is lost, not to substitute for it. + +### Manifest Envelope Validation (Server-Side) + +Every write — `POST /upload`, `PATCH /upload/{id}`, finalization, any +lifecycle manifest, any federation pull — passes through structural +validation of the manifest envelope **before** any state is persisted. 
The +server holds no decryption key, so it cannot verify the cryptographic +signatures; but it does enforce that every envelope field is present, +structurally well-formed, within bounds, and consistent with the album the +manifest claims to address. + +The complete refuse-by-default checklist is owned by +[Threat Model — Server-Side Validation Invariants](/design/threat-model/#server-side-validation-invariants). +A rejection at any check returns the rejection code listed there and writes +no state. This is what defeats the version-mismatched-client damage class +without requiring the server to hold a key. + +### Content-Addressing and Deduplication + +Naming blobs by their [ciphertext content hash](/design/cryptography/#primitives-inventory) makes deduplication free: a blob already present +is never stored twice. At upload-session creation the server checks for a blob +with the same content hash already owned by the uploader — an exact +local-and-remote duplicate is rejected up front, and an asset that exists +remotely under a *different* ciphertext resolves to a **merge** that links the +existing blob rather than storing a second copy (see +[Import and Synchronization](/design/import-synchronization/) — "Deduplication and +Merge"). Reference counting in PostgreSQL determines when a blob is genuinely +unreferenced. 
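
A minimal sketch of the content-addressing and reference-counting ideas above, not the server's actual implementation. The names `blob_path` and `BlobRefs` are hypothetical, the `{hash[0:2]}/{hash[2:4]}/{hash}` layout is an assumption modeled on the two-level hex-prefix shard described for `blobs/`, and the in-memory map only stands in for the reference counts the design keeps in PostgreSQL.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Builds the sharded on-disk path for a blob from its lowercase hex content hash.
/// Assumed layout: {blob_root}/blobs/{hash[0:2]}/{hash[2:4]}/{hash}.
/// Returns None for hashes that are too short or not lowercase hex.
fn blob_path(blob_root: &str, hash_hex: &str) -> Option<PathBuf> {
    let is_hex = hash_hex
        .bytes()
        .all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    if hash_hex.len() < 4 || !is_hex {
        return None;
    }
    let mut path = PathBuf::from(blob_root);
    path.push("blobs");
    path.push(&hash_hex[0..2]);
    path.push(&hash_hex[2..4]);
    path.push(hash_hex);
    Some(path)
}

/// Hypothetical in-memory stand-in for the PostgreSQL reference counts.
#[derive(Default)]
struct BlobRefs {
    counts: HashMap<String, u64>,
}

impl BlobRefs {
    /// Records one more reference. Returns true only for the first reference,
    /// i.e. the only case in which the blob bytes need to be stored.
    fn add_ref(&mut self, hash_hex: &str) -> bool {
        let count = self.counts.entry(hash_hex.to_string()).or_insert(0);
        *count += 1;
        *count == 1
    }

    /// Drops one reference. Returns true when the count reaches zero,
    /// which makes the blob eligible for a garbage-collection sweep.
    fn drop_ref(&mut self, hash_hex: &str) -> bool {
        let remaining = match self.counts.get_mut(hash_hex) {
            Some(count) => {
                *count = count.saturating_sub(1);
                *count
            }
            None => return false, // unknown blob: nothing to collect
        };
        if remaining == 0 {
            self.counts.remove(hash_hex);
            true
        } else {
            false
        }
    }
}
```

Returning a boolean from `add_ref` and `drop_ref` keeps the "store the bytes" and "eligible for GC" decisions next to the counter update, which mirrors the conservative removal rule: nothing is deleted until the last reference is provably gone.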
+ +### PostgreSQL: What the Server Knows + +The server index records only what can be known without a key: + +- `asset_id`, `owner_id`, `album_id`, `upload_user_id` +- references to the asset's blobs (their [content hashes](/design/cryptography/#primitives-inventory)) and each blob's role +- `amk_version` — which album-key epoch encrypted the asset (see + [Cryptography](/design/cryptography/)) +- declared ciphertext size and `content_type` +- the `uploaded` flag and server-visible lifecycle state +- creation/modification timestamps and provenance records (see + [Cryptography](/design/cryptography/) — "Provenance of Library Modifications") + +No plaintext capture date, dimensions, EXIF, tags, or filename ever reaches the +server. Those live inside the encrypted metadata blob (see [Metadata Encryption](/design/cryptography/#metadata-encryption)) and are readable only by authorized clients. + +Session creation writes a *pending* asset row (`uploaded = false`) that reserves +the asset ID the bundle's blobs reference; finalization flips it. See the +session lifecycle in [Import and Synchronization](/design/import-synchronization/). + +### Ownership, Partitioning, and Quota + +`owner_id` is the billing and namespace entity; the `owner_id` → user-set +mapping lives in PostgreSQL and is mirrored as an MLS group (the Owner Group +Key — see [Cryptography](/design/cryptography/)). Storage quota is accounted to +`upload_user_id`, which is distinct from `owner_id`. The blob store itself is +not partitioned by owner — content-addressing is global — but every blob +*reference* is owner-scoped in PostgreSQL, and deduplication checks are scoped +to the owner. + +### Deletion and Garbage Collection + +The server cannot read an asset's `is_deleted` flag — it is inside the encrypted +metadata blob. Lifecycle transitions are therefore signalled by the client and +recorded as server-visible state on the asset row; soft delete is a state +change, not a file operation. 
Permanent deletion drops the asset's blob +references, and a blob whose reference count reaches zero becomes eligible for a +garbage-collection sweep. Consistent with the data-integrity principle, blob +removal is conservative — a blob is deleted only after its references are +provably gone. + +## Client Filesystem + +Clients hold keys, so a client stores plaintext. Desktop clients keep a +self-contained library directory; mobile clients use platform-sandboxed storage. + +What a client keeps locally depends on its sync setting — *metadata only*, +*metadata + thumbnails*, or *metadata + thumbnails + original* (see +[Import and Synchronization](/design/import-synchronization/) — "Synchronization +Scope"). A library therefore routinely contains assets whose original is +server-only, and the layout must represent an asset whether or not its original +bytes are present locally. + +### Desktop Library Layout ```text {library_root}/ ├── media/ -│ └── {YYYY}/ -│ └── {YYYY-MM}/ -│ ├── {uuid}.{ext} # Media file (read-only after commit) -│ └── {uuid}.cbor # Sidecar metadata (mutable) +│ └── {YYYY}/{YYYY-MM}/ +│ ├── {uuid}.{ext} # original media (plaintext; absent if not synced locally) +│ ├── {uuid}.cbor # canonical metadata sidecar (plaintext, signed) +│ └── {uuid}.provenance.cbor # append-only signed provenance chain +├── cache/ +│ ├── thumbnails/{size}/{uuid[0:2]}/{uuid[2:4]}/{uuid}.{fmt} +│ ├── meta/{uuid[0:2]}/{uuid[2:4]}/{uuid}.meta.cbor # verbose parsed metadata +│ └── transcodes/{uuid[0:2]}/{uuid[2:4]}/{uuid}.{ext} ├── index/ -│ ├── library.sqlite # Rebuildable query cache -│ ├── meta/ -│ │ └── {uuid[0:2]}/{uuid[2:4]}/{uuid}.meta.cbor # Full parsed metadata (ephemeral) -│ ├── transcodes/ -│ │ ├── h264/ -│ │ │ └── {uuid[0:2]}/{uuid[2:4]}/{uuid}.mp4 # H.264/MP4 transcode (ephemeral) -│ │ └── live/ -│ │ └── {uuid[0:2]}/{uuid[2:4]}/{uuid}.mov # Live Photo video (ephemeral) -│ └── thumbnails/ -│ └── {xs|s|m|l|xl|o}/ -│ └── {uuid[0:2]}/{uuid[2:4]}/ -│ ├── {uuid}.jxl # JXL 
(default) -│ └── {uuid}.webp # WebP (fallback) +│ └── library.sqlite # rebuildable query + vector index └── .library/ - ├── version.cbor # Library schema version - ├── config.cbor # User preferences and library state - ├── lock # Process lock file (ephemeral) + ├── version # library schema version + ├── config # user preferences and library state + ├── lock # process lock file (ephemeral) ├── trash/ - │ └── {uuid}.{ext} # Soft-deleted media (30-day quarantine) - └── migrations/ # Schema upgrade scripts + │ └── {uuid}.{ext} # soft-deleted media + └── quarantine/ + ├── {uuid}.{ext} # irreplaceable bytes that failed validation + └── {uuid}.reason.json # parse error / signature failure / schema mismatch ``` -### File Naming and Placement - -**Naming**: `{UUIDv7}.{extension}` — lowercase UUID, lowercase extension. - -- UUIDv7 is time-ordered, sortable, and globally unique. -- Extension is the original file extension, lowercased (e.g., `.jpg`, `.arw`). -- All UUIDs and path components are always lowercase. Never mix case. - -**Placement**: `media/{YYYY}/{YYYY-MM}/` based on the media capture timestamp (see EXIF Timezone Resolution for the fallback chain). - -**Filesystem requirements**: FAT32 is not supported (4 GB file limit, 512-entry directory limit). exFAT, APFS, NTFS, and any modern Linux filesystem are supported. Filesystem type is not checked at runtime. - -**Case sensitivity**: macOS defaults to case-insensitive. Always write lowercase; query case-insensitively. - ---- - -## Supported File Formats - -Capsule accepts all formats supported by [rawshift](https://github.com/justin13888/rawshift#format-support), the decoding and thumbnail-generation backend. - -At import time, files are filtered by extension (case-insensitive). Files with unrecognized extensions are skipped and included in the import summary as `Unsupported`. XMP sidecar files (`.xmp`) are a recognized extension and are handled as stack members (see Stack Detection and Pairing). 
ZIP archives and other container formats are not extracted. - ---- - -## Data Model - -### Asset-to-File Model - -**Every file is its own asset.** Each imported file receives a unique UUID, a CBOR sidecar, and an `assets` row. Related files (e.g., a RAW+JPEG pair, a burst sequence) are connected via the **stack relationship layer** — a separate set of tables (`asset_stacks`, `stack_members`) — not by collapsing them into a single asset. - -This model is consistent with the Recovery-First principle: every file is independently recoverable from its sidecar alone, with no dependency on a companion file. - -**Version prioritization**: Which file to show in the grid for a collapsed stack is determined by `asset_stacks.cover_asset_id` (user-overridable) falling back to `asset_stacks.primary_asset_id` (set at detection time). The `member_role` and `sequence_order` fields on `stack_members` encode the semantic relationship and display order within an expanded stack. See Timeline Ordering and Stack Display. - -### CBOR Sidecar Schema - -Every `.cbor` sidecar must include the fields below. Fields marked **REQUIRED** must be non-null on every write for new imports. Fields marked **WRITE-ONCE** are set at import and must not be overwritten on subsequent metadata updates, even if the caller provides a new value. Nullable fields may be absent on read — sidecars written by an older version will not have fields added in later versions. - -```text -version u8 REQUIRED. Schema version. Current: 1. -uuid string REQUIRED. UUIDv7 of this asset (matches filename stem). -asset_type string REQUIRED. "photo" | "video" | "motion_photo". -original_filename string REQUIRED. Filename at import time (e.g., "IMG_1234.JPG"). -import_timestamp i64 REQUIRED. Unix epoch seconds UTC. When this file was imported. -modified_timestamp i64 REQUIRED. Unix epoch seconds UTC. Last metadata edit time. - Set to import_timestamp at import. Updated on every sidecar write. -hash_blake3 string REQUIRED. 
64-char lowercase hex. BLAKE3 hash of the media file bytes. -file_size u64 REQUIRED. Byte size of the media file. -is_deleted bool REQUIRED. Soft-delete flag. true = in trash. -rating u8 REQUIRED. 0–5 star rating. 0 = unrated. -tags [string] REQUIRED. User-assigned tags. Empty list if none. -import_mode string REQUIRED. WRITE-ONCE. "copy" | "move". Source handling at import. -importer_version string REQUIRED. WRITE-ONCE. Semver of Capsule client that imported this file. -rawshift_version string REQUIRED. WRITE-ONCE. Semver of rawshift at import time. - -capture_timestamp i64 Nullable. Local wall-clock time stored as if it were Unix epoch - (no UTC offset applied). Fallback chain: EXIF DateTimeOriginal → - EXIF DateTime → file mtime → import_timestamp. - WARNING: Do NOT use for timeline sorting. Use capture_utc instead. - Preserved for repair and display of the raw EXIF value. -capture_utc i64 Nullable. Capture time as Unix epoch seconds UTC. - Null when capture_tz_source = "floating". -capture_tz string Nullable. IANA timezone name (e.g., "America/New_York") or UTC - offset string (e.g., "+09:00"). Null = floating/unknown. -capture_tz_source string Nullable. "offset_exif" | "gps_lookup" | "floating". -tz_db_version string Nullable. IANA tz-db release tag (e.g., "2024b") used for GPS - lookup. Non-null only when capture_tz_source = "gps_lookup". - -width u32 Nullable. Pixel width. Null for unknown/corrupt files. -height u32 Nullable. Pixel height. Null for unknown/corrupt files. -duration_ms u64 Nullable. Duration in milliseconds. Null for photos. - -stack_hint map Nullable. Mutable. Stack reconstruction hint used to rebuild - stack relationships during a full index rebuild. - Null if this asset has never been part of a stack. - Set at import for auto-detected stacks. Written or updated - when the user manually stacks, re-stacks, or changes roles. - Set to null when the user dissolves a stack. - .detection_key string REQUIRED if stack_hint present. 
Shared key used to group - peers during rebuild. Always lowercase. - - "filename_stem": lowercase stem (e.g., "img_1234") - - "content_identifier": Apple Live Photo UUID string - - "timecode": shared SMPTE timecode string - - "manual": UUIDv7 of the stack (stable, assigned at stack creation) - .detection_method string REQUIRED if stack_hint present. - "filename_stem" | "content_identifier" | "timecode" | "manual" - .member_role string REQUIRED if stack_hint present. Role of this file within its stack. - "primary" | "raw" | "video" | "audio" | "depth_map" | "processed" - | "source" | "alternate" | "sidecar" | "proxy" | "master" - .stack_type string REQUIRED if stack_hint present. Classification of the stack. - "raw_jpeg" | "burst" | "live_photo" | "portrait" - | "smart_selection" | "hdr_bracket" | "focus_stack" - | "pixel_shift" | "panorama" | "proxy" | "chaptered" - | "dual_audio" | "custom" - -album_id string Nullable. Album UUID. Null if not assigned. - One album per asset; multi-album support is a future concern. - -deleted_at i64 Nullable. Unix epoch seconds UTC. Null if not deleted. - -camera_make string Nullable. EXIF Make (e.g., "Apple"). Null if absent. -camera_model string Nullable. EXIF Model (e.g., "iPhone 15 Pro"). Null if absent. -gps_lat f64 Nullable. Decimal degrees latitude. Null if absent. -gps_lon f64 Nullable. Decimal degrees longitude. Null if absent. -``` - -**Forward compatibility**: Unknown fields encountered on read must be preserved verbatim on write. Never deserialize into a strict struct that drops unknown keys — use a map-based merge strategy. - -**`stack_hint` mutability rules**: All `stack_hint` writes follow the atomic temp + rename sidecar write pattern. When multiple assets in a stack are affected by a single operation (manual stack, re-stack, dissolve), all their sidecars are updated atomically as a group — write all `.cbor.tmp` files first, then rename each in sequence. 
If any rename fails, delete all `.tmp` files and do not commit any writes. See Stack Metadata Writes. - -### Stack Reconstruction on Rebuild - -When rebuilding the index from sidecars (empty DB + sidecars present, or explicit repair): - -1. Scan all `media/**/*.cbor` sidecars. -2. For each sidecar with a non-null `stack_hint`: group by `(detection_key, detection_method)`. -3. For each group: create one `asset_stacks` row using the `stack_type` from the hint. The file with `member_role: "primary"` becomes `primary_asset_id`; if no primary is present, use the file with the lowest `capture_utc`. Set `is_auto_generated` based on whether `detection_method` is `"manual"` (false) or any other value (true). -4. Insert one `stack_members` row per file in the group with `sequence_order`, `member_role` from the hint. -5. Recompute `is_stack_hidden` for all stack members (see Timeline Ordering and Stack Display). -6. Assets with `stack_hint: null` are imported as standalone (no stack row created). - -### Comprehensive Metadata Cache (`.meta.cbor`) - -Full decoded metadata for formats that produce verbose parsed output (RAW files especially) is stored at: - -- Client: `index/meta/{uuid[0:2]}/{uuid[2:4]}/{uuid}.meta.cbor` -- Server: `cache/meta/{uuid[0:2]}/{uuid[2:4]}/{uuid}.meta.cbor` - -These files are **ephemeral** — deleted and regenerated at any time, including on app upgrade if the parser version changes. - -**Why separate from the sidecar**: Raw camera metadata can be hundreds of fields per file and is parser-version-sensitive. Keeping it separate avoids sidecar bloat and prevents parser changes from requiring sidecar migrations. - -**Generation timing**: Deferred — generated on first access (e.g., when the client opens the detail view). Not generated during import. Background pre-generation is permitted as a low-priority idle task after import completes but must not block import progress reporting. - -**Authority**: `.meta.cbor` is a pure cache. 
Parser-version-sensitive fields (verbose EXIF, proprietary maker notes) never get promoted to the sidecar. Each device generates `.meta.cbor` independently from the media file using its local rawshift version. `.meta.cbor` is never synced between devices. - -#### `.meta.cbor` Schema - -```text -version u8 REQUIRED. Schema version. Current: 1. -uuid string REQUIRED. UUIDv7 of the asset this metadata belongs to. -schema string REQUIRED. Parser family identifier. - Format: "capsule.meta.<family>.v<N>" - Examples: "capsule.meta.exif.v1", "capsule.meta.raw.v1" - Used to detect stale caches from a different parser generation. -rawshift_version string REQUIRED. Semver of rawshift used to generate this file. -generated_at i64 REQUIRED. Unix epoch seconds UTC. When this file was generated. -``` - -All remaining fields are schema-specific. Unknown fields must not be interpreted by code that does not recognize the `schema` value. - -**Stale detection**: - -- If `schema` does not match the current app's expected schema string → treat as missing, regenerate. -- If `rawshift_version` **major** component differs from current rawshift major → regenerate. -- Minor/patch differences: preserve (rawshift maintains semver compat within a major). - -### Library Config Schema (`.library/config.cbor`) - -```text -schema_version u8 REQUIRED. Config schema version. Current: 1. -library_name string REQUIRED. User-visible name for this library. -last_opened_at i64 REQUIRED. Unix epoch seconds UTC. Updated on every library open. -last_scrubbed_at i64 Nullable. Unix epoch seconds UTC. Updated after each .tmp cleanup. - Null if never scrubbed. -``` - -`last_opened_at` triggers a full index rebuild on startup if >30 days have elapsed, indicating the index may have drifted from external edits or OS file operations. - -`last_scrubbed_at` controls the 7-day cooldown for `.tmp` cleanup scans (see Temp File Staging and Recovery).
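
The `.meta.cbor` stale-detection rules and the two config timers described above reduce to small pure functions. The following is a sketch under stated assumptions rather than the client's real code: the function names are hypothetical, an unparsable rawshift version is treated as stale, a never-scrubbed library is treated as due for a scrub, and the cooldown is taken to expire at exactly 7 days.

```rust
const DAY_SECS: i64 = 86_400;

/// True when a cached `.meta.cbor` should be treated as missing and regenerated.
fn meta_is_stale(
    cached_schema: &str,
    expected_schema: &str,
    cached_rawshift: &str,
    current_rawshift: &str,
) -> bool {
    // Extracts the major component of a semver string such as "1.4.2".
    fn major(version: &str) -> Option<u64> {
        version.split('.').next()?.parse().ok()
    }
    if cached_schema != expected_schema {
        return true; // different parser generation
    }
    match (major(cached_rawshift), major(current_rawshift)) {
        // Minor/patch differences are preserved; only a major change regenerates.
        (Some(cached), Some(current)) => cached != current,
        // Assumption: a version that cannot be parsed is not trusted.
        _ => true,
    }
}

/// True when more than 30 days have elapsed since the library was last opened.
fn needs_full_rebuild(last_opened_at: i64, now: i64) -> bool {
    now - last_opened_at > 30 * DAY_SECS
}

/// True when a `.tmp` cleanup scan is allowed to run again.
fn scrub_due(last_scrubbed_at: Option<i64>, now: i64) -> bool {
    match last_scrubbed_at {
        None => true, // assumption: never scrubbed means due
        Some(last) => now - last >= 7 * DAY_SECS,
    }
}
```

All timestamps are Unix epoch seconds UTC, matching the schema fields, so the comparisons need no timezone handling.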
- -### SQLite Index Schema - -The index is fully rebuildable by scanning `media/**/*.cbor` sidecars. Schema is intentionally minimal — only fields required for querying, browsing, and duplicate detection. Verbose metadata lives in `.meta.cbor`, not here. - -```sql -CREATE TABLE assets ( - uuid TEXT PRIMARY KEY, - asset_type TEXT NOT NULL, -- 'photo' | 'video' | 'motion_photo' - capture_timestamp INTEGER NOT NULL, -- Local wall-clock as epoch; NOT UTC - capture_utc INTEGER, -- Unix epoch seconds UTC; null if floating - capture_tz_source TEXT, -- 'offset_exif' | 'gps_lookup' | 'floating' - import_timestamp INTEGER NOT NULL, -- Unix epoch seconds UTC - hash_blake3 TEXT NOT NULL, - width INTEGER, - height INTEGER, - duration_ms INTEGER, -- null for photos - stack_id TEXT, -- FK to asset_stacks.id; null if unstacked - is_stack_hidden INTEGER NOT NULL DEFAULT 0, -- 1 = hidden in collapsed stack view - chromahash TEXT, -- base64 Chromahash; null if not generated - dominant_color TEXT, -- '#rrggbb' hex; null if not generated - album_id TEXT, - rating INTEGER NOT NULL DEFAULT 0, - is_deleted INTEGER NOT NULL DEFAULT 0, -- 0/1 boolean - deleted_at INTEGER -- null if not deleted -); - -CREATE TABLE asset_stacks ( - id TEXT PRIMARY KEY, -- UUIDv7 - stack_type TEXT NOT NULL, -- StackType value (e.g., 'raw_jpeg') - primary_asset_id TEXT NOT NULL REFERENCES assets(uuid), - cover_asset_id TEXT REFERENCES assets(uuid), - is_collapsed INTEGER NOT NULL DEFAULT 1, -- 0/1; 1 = collapsed in grid view - is_auto_generated INTEGER NOT NULL DEFAULT 1, -- 0/1; 0 = manually created - created_at INTEGER NOT NULL, -- Unix epoch seconds UTC - modified_at INTEGER NOT NULL -- Unix epoch seconds UTC -); - -CREATE TABLE stack_members ( - id TEXT PRIMARY KEY, -- UUIDv7 - stack_id TEXT NOT NULL REFERENCES asset_stacks(id), - asset_id TEXT NOT NULL REFERENCES assets(uuid), - sequence_order INTEGER NOT NULL, -- Display order within expanded stack - member_role TEXT NOT NULL, -- MemberRole value (e.g., 
'primary', 'raw') - created_at INTEGER NOT NULL, -- Unix epoch seconds UTC - UNIQUE (stack_id, asset_id) -); - -CREATE TABLE asset_tags ( - uuid TEXT NOT NULL REFERENCES assets(uuid), - tag TEXT NOT NULL, - PRIMARY KEY (uuid, tag) -); - --- Core asset indices -CREATE INDEX idx_assets_hash ON assets(hash_blake3); -CREATE INDEX idx_assets_utc ON assets(capture_utc, capture_timestamp); -CREATE INDEX idx_assets_deleted ON assets(is_deleted); -CREATE INDEX idx_assets_album ON assets(album_id); -CREATE INDEX idx_assets_stack ON assets(stack_id); --- Composite index for the main timeline query (is_deleted=0, is_stack_hidden=0) -CREATE INDEX idx_assets_timeline ON assets(is_deleted, is_stack_hidden, capture_utc, capture_timestamp); - --- Stack indices -CREATE INDEX idx_stacks_type ON asset_stacks(stack_type); -CREATE INDEX idx_stacks_primary ON asset_stacks(primary_asset_id); -CREATE INDEX idx_stack_members_stack ON stack_members(stack_id); -CREATE INDEX idx_stack_members_asset ON stack_members(asset_id); - -CREATE INDEX idx_tags_tag ON asset_tags(tag); -``` - -Schema version is stored via `PRAGMA user_version = 1`. Increment on any structural change. Because the index is always rebuildable, migrations may drop and rebuild rather than `ALTER TABLE`. - -`capture_utc` is preferred for timeline sorting. Fall back to `capture_timestamp` only when `capture_tz_source = 'floating'` (UTC is unknowable). - ---- - -## Import Pipeline - -Import is a four-phase pipeline. Each phase produces a typed result passed to the next. The pipeline is fully offline — no network calls at any point. - -### Phase 1 — Scan (fast, read-only, no hashing) - -- **Input**: One or more file or directory paths from the user. -- **Directory input**: Recursive traversal, extension-filtered (case-insensitive). 
-- **Per file**: Parse header via `capsule-media` → `MotionPhotoInfo { format, content_identifier }`; detect Apple Live Photo pairs within the same source directory by `content_identifier`; detect stacks by filename stem and file type (see Stack Detection and Pairing). -- **Output**: `ScanResult { candidates: Vec<ImportCandidate> }` where each `ImportCandidate` is: - ``` - ImportCandidate { - source_paths: Vec<Path>, - detected_type: AssetType, - stack_type: Option<StackType>, // None if standalone - detection_method: Option, // None if standalone - detection_key: Option, // None if standalone - members: Vec<(Path, MemberRole)>, // role per file - } - ``` - `StackType` and `MemberRole` map directly to the server enum values (`"raw_jpeg"`, `"live_photo"`, etc.). -- Cancellable between files. - -### Phase 2 — Plan (hashing, I/O-intensive) - -- For each candidate: compute BLAKE3 of source file(s) → query SQLite for duplicates (Phase A hash — see BLAKE3 Hash Timing). -- Determine action per candidate: `Import | SkipDuplicate | SkipUnsupported | SkipError`. -- If `target_album_id` is provided: validate it exists in SQLite. If not found, fail the entire plan before any files are copied. -- **Output**: `ImportActionPlan { actions: Vec, counts: PlanCounts }`. -- Cancellable between files. - -### Phase 3 — Review (UI, no side effects) - -Caller (UI layer) receives `ImportActionPlan` and presents counts. User confirms or cancels; optionally selects target album, `import_mode`, and whether to force-import duplicates. - -### Phase 4 — Execute (commit, streaming progress) - -Walks `ImportActionPlan`; commits each `Import` action per the atomic two-phase commit spec below. Emits `ImportProgressEvent` per file. Produces `ImportExecutionSummary` at completion. - -#### Atomic Two-Phase Commit (per file) - -```text -1. Generate UUIDv7. -2. Create target directory if needed (idempotent mkdir). -3. Copy source to {uuid}.{ext}.tmp (same directory as the committed file). -4.
Compute BLAKE3 of .tmp → verify equals source_hash (Phase B integrity check). - On mismatch: delete .tmp, record CorruptTransfer, skip this file. -5. Build sidecar struct with all REQUIRED fields populated. - If this file has a detected stack: populate stack_hint. -6. Write sidecar to {uuid}.cbor.tmp. -7. Atomic rename {uuid}.{ext}.tmp → {uuid}.{ext}. -8. Atomic rename {uuid}.cbor.tmp → {uuid}.cbor. -9. Insert row into SQLite assets table. Failure here = stale cache only; file is already committed. -10. Update stack tables in SQLite (see Stack Index Updates). -``` - -**Invariant**: Both files commit or neither does. If step 6 fails, delete `{uuid}.{ext}.tmp` before returning. If step 7 succeeds but step 8 fails, the media file exists without a sidecar (orphaned) — log as `OrphanedMedia`; attempt cleanup on next startup scrub. - -**Move mode**: Delete source files only after step 10 succeeds for all files in the group (see Copy vs. Move Policy). - -#### Stack Index Updates (step 10) - -After the `assets` row is inserted, update stack tables if the file belongs to a stack: - -```text -1. Check if an asset_stacks row exists for this (detection_key, detection_method) pair - by looking up stack_members for the existing candidates in this batch. -2. No existing stack: - - Create asset_stacks row: id = new UUIDv7, stack_type from candidate, - primary_asset_id = this UUID (tentative), cover_asset_id = null initially, - is_collapsed = 1, is_auto_generated = 1, created_at = now, modified_at = now. - - Insert stack_members row: stack_id, asset_id, sequence_order = 0, member_role. - - This file is the cover for now; set is_stack_hidden = 0. -3. Existing stack: - - Insert stack_members row with next available sequence_order. - - If member_role = "primary": update asset_stacks.primary_asset_id. - - If no cover_asset_id set and member_role = "primary": update cover_asset_id too. - - Non-cover member: set is_stack_hidden = 1 on this asset's row. -4. 
Finalize cover/primary after all batch files are committed: - - Prefer the file with member_role = "primary" as both primary_asset_id and cover_asset_id. - - If multiple primaries: use the one with the lowest sequence_order. - - Recompute is_stack_hidden for all stack members in the batch. -``` - ---- - -## Import Details - -### Copy vs. Move Policy - -The pipeline accepts an explicit `import_mode` parameter: `"copy"` (default) or `"move"`. - -**Copy**: Source files are never touched. Safe for read-only media (SD cards, network shares, shared folders). - -**Move**: Source files are deleted only after the destination is fully committed — both renames and the SQLite insert have succeeded. Never delete a source file speculatively or mid-flight. - -**Move with stacks**: All files in the stack must be individually committed (through step 10) before any source in the stack is deleted. If the stack commits partially (one file fails), no sources are deleted. - -`import_mode` is WRITE-ONCE in the sidecar and preserved on all subsequent metadata updates. - -### BLAKE3 Hash Timing and Copy Integrity - -Two sequential hash phases are required for every imported file. - -**Phase A — Duplicate detection** (before any library I/O): - -```text -1. Open source file. -2. Compute BLAKE3 → source_hash. -3. Query SQLite WHERE hash_blake3 = source_hash. -4. If match found → handle per duplicate policy; skip copy. -``` - -**Phase B — Copy integrity verification** (after writing `.tmp`): - -```text -5. Copy source to {uuid}.{ext}.tmp. -6. Compute BLAKE3 of .tmp → dest_hash. -7. If source_hash != dest_hash: - delete .tmp - record CorruptTransfer - skip this file — do not proceed -8. hash_blake3 stored in sidecar = source_hash (verified equal to dest_hash). -``` - -Phase A hashes the source on the source filesystem (works for read-only media). Phase B hashes the destination `.tmp` on the library filesystem, catching silent bitrot or hardware faults during the copy. 
A file whose integrity cannot be verified is never committed. - -### Duplicate Detection - -At Phase A, if a matching `hash_blake3` is found in SQLite: - -- **Default**: Skip import; log existing asset UUID in summary as `DuplicateSkipped`. -- **Force re-import** (user-triggered via Phase 3): Assign a new UUID and import as a separate asset. - -Duplicates do not block bulk imports. They are resolved per-candidate in Phase 2 and reported in the final summary. - -### Stack Detection and Pairing - -Stack detection during the scan phase identifies multi-file relationships and classifies them into the `StackType` taxonomy. Detection is best-effort and fully offline. Cross-directory pairing is not performed. - -#### Stack Type Reference - -| Stack Type | `stack_type` value | Detection method | Detected by | Valid member roles | -| ----------------- | ------------------- | ---------------------- | ---------------------------------------------------- | ----------------------------------------------- | -| RAW + JPEG | `raw_jpeg` | `filename_stem` | Matching stem with RAW + primary extension | `primary`, `raw`, `sidecar` | -| Burst | `burst` | `filename_stem` | Sequential stems + EXIF burst sequence metadata | `primary`, `alternate` | -| Live Photo | `live_photo` | `content_identifier` | Apple `content_identifier` matching HEIC + MOV | `primary`, `video` | -| Portrait / Depth | `portrait` | `filename_stem` | Depth map companion (`.heic` + depth-flagged variant)| `primary`, `depth_map` | -| Smart Selection | `smart_selection` | `manual` (AI-created) | AI-similarity grouping (post-import AI pipeline) | `primary`, `alternate` | -| HDR Bracket | `hdr_bracket` | `filename_stem` | Sequential stems + EXIF EV values (±EV bracket set) | `source`, `processed` | -| Focus Stack | `focus_stack` | `filename_stem` | Sequential stems + EXIF focus distance progression | `source`, `processed` | -| Pixel Shift | `pixel_shift` | `filename_stem` | Proprietary maker note (Olympus, Sony) pixel 
shift | `source`, `processed` | -| Panorama | `panorama` | `filename_stem` | Sequential stems + EXIF panorama sequence metadata | `source`, `processed` | -| Proxy / Optimized | `proxy` | `filename_stem` | Matching stem with master (8K RAW) + proxy (HD) | `master`, `proxy` | -| Chaptered Video | `chaptered` | `filename_stem` | GoPro/action cam chapter naming (`GOPR001`, `GP001`)| `source` (each chapter) | -| Dual-System Audio | `dual_audio` | `timecode` | Shared SMPTE timecode (video + external WAV/AIFF) | `primary`, `audio` | -| Manual | `custom` | `manual` | User-created in UI | Any; user-assigned | - -**RAW extensions** (`member_role: "raw"`): - -```text -ARW, CR2, CR3, NEF, NRW, RW2, ORF, PEF, RAF, SRW, -3FR, DCR, DNG, ERF, MEF, MOS, MRW, PTX, RWL, X3F -``` - -DNG is classified as raw even though it is a standardized format — it is typically the lossless capture file and the JPEG is the display file. - -**Primary extensions** (`member_role: "primary"` in a `raw_jpeg` stack): - -```text -JPG, JPEG, HEIC, HEIF, AVIF, PNG, TIFF, TIF -``` - -**XMP sidecar extension** (`member_role: "sidecar"`): - -```text -XMP -``` - -**Pairing rules (RAW+JPEG)**: - -- Match by identical filename stem, case-insensitive, within the same source directory. -- One primary + one or more raws → all share a stack; primary has `member_role: "primary"`. -- Multiple raw formats for the same stem (e.g., `.ARW` + `.DNG` + `.JPG`) → all imported into the same stack; each raw gets `member_role: "raw"`. -- No primary found for a raw → raw imported as standalone (`stack_hint: null`). -- XMP paired to the raw of the same stem; if no raw exists, XMP imports standalone. -- Apple Live Photo pairing (by `content_identifier`) is handled separately and is not stem-based. - -**XMP sidecar handling**: An `.xmp` file receives its own UUID and is stored at `media/{YYYY}/{YYYY-MM}/{uuid}.xmp`. 
It shares the `stack_hint.detection_key` of the RAW it is paired with, with `member_role: "sidecar"` and `stack_type: "raw_jpeg"`. XMP content is not parsed into the Capsule sidecar — preserved verbatim as an opaque binary blob for third-party tool compatibility (Lightroom, Capture One, etc.). Capsule never modifies the XMP file's content. - -### Stack Metadata Writes - -Edits (tags, rating, album) applied to a stack are written atomically: - -1. Write all updated sidecars to `.tmp` files. -2. Atomic rename each `.tmp` → final in sequence. -3. If any rename fails: delete all `.tmp` files; do not commit any writes in the stack. -4. Update SQLite for all rows after all renames succeed. - -**Manual stack/re-stack/dissolve**: Updating `stack_hint` across multiple sidecars follows the same atomic group write pattern. A dissolve sets `stack_hint` to null on all affected sidecars and removes the `asset_stacks` and `stack_members` rows from SQLite. A re-stack (moving an asset from one stack to another) updates the sidecar's `stack_hint` and migrates `stack_members` rows. - -### Partial Stack Duplicate Handling - -| Primary state | RAW state | Action | -| ------------- | --------- | ---------------------------------------------------------------------------------------- | -| Duplicate | Duplicate | Skip both. Log both existing UUIDs. | -| Duplicate | New | Import RAW as standalone (`stack_hint: null`). Log skipped primary. | -| New | Duplicate | Import primary as standalone. Log skipped RAW. | -| New | New | Standard stack import. | - -For stacks of 3+ files: same logic — any non-duplicate member is imported as a standalone asset. - -Never link a new file into an existing stack — that would require modifying the existing stack's `primary_asset_id` or cover selection, and silently mutating another user's import. The user must manually merge stacks in the UI. - -Summary outcome: `PartialStackImported` — includes imported UUID(s) and skipped duplicate UUID(s). 
- -### Timeline Ordering and Stack Display - -**Sorting key**: `capture_utc` (preferred), falling back to `capture_timestamp` when `capture_tz_source = 'floating'`. - -**Collapsed stacks in the timeline**: When a stack has `is_collapsed = 1`, only the cover asset represents the stack in the timeline grid. The cover is determined by: -1. `asset_stacks.cover_asset_id` if explicitly set (user override). -2. Otherwise, `asset_stacks.primary_asset_id`. - -All other stack members have `is_stack_hidden = 1` on their `assets` row and are excluded from the main timeline query. - -**Timeline query** (efficient): - -```sql -SELECT * FROM assets -WHERE is_deleted = 0 AND is_stack_hidden = 0 -ORDER BY COALESCE(capture_utc, capture_timestamp) DESC -LIMIT ? OFFSET ?; -``` - -The composite index `idx_assets_timeline ON assets(is_deleted, is_stack_hidden, capture_utc, capture_timestamp)` makes this efficient even for multi-million-asset libraries. - -**Stack expansion**: When the user expands a stack in the UI, query: - -```sql -SELECT a.* FROM stack_members sm -JOIN assets a ON sm.asset_id = a.uuid -WHERE sm.stack_id = ? -ORDER BY sm.sequence_order ASC; -``` - -Display all members inline in the expanded area. `is_stack_hidden` is not consulted for this query — it is only a timeline optimization. - -**Maintaining `is_stack_hidden`**: This flag is a denormalized cache. It is recomputed whenever: -- A stack is created or dissolved. -- `is_collapsed` changes on a stack. -- `primary_asset_id` or `cover_asset_id` changes. -- A member is added, removed, or reassigned. - -The flag is never stored in the sidecar — it is a pure DB optimization. On index rebuild, it is recomputed from the reconstructed stack state. - -**Uncollapsed stacks** (`is_collapsed = 0`): All members have `is_stack_hidden = 0` and appear individually in the timeline at their own `capture_utc` positions. The UI visually groups them (shared border, stack count badge) but each occupies its own grid cell. 
- -### Motion Photo Detection - -**Detection algorithm** (fully offline, via `capsule-media`): - -```text -1. Parse candidate file via capsule-media → MotionPhotoInfo { format, content_identifier }. -2. Dispatch on format: - - GoogleMicroVideo → single file, self-contained; asset_type = "motion_photo" - - SamsungMotion → single file, self-contained; asset_type = "motion_photo" - - AppleLivePhoto → still component; resolve video companion (see below) - - Unknown → treat as regular photo/video based on extension -``` - -**Apple Live Photo pairing** (scan phase): - -```text -1. Extract content_identifier from .HEIC - (XMP field: com.apple.quicktime.content.identifier). -2. Scan the same source directory for a .MOV with a matching content_identifier. -3. Paired .MOV found → treat the pair as a single motion_photo asset with: - stack_type = "live_photo", detection_method = "content_identifier", - detection_key = content_identifier value. - HEIC gets member_role = "primary", MOV gets member_role = "video". -4. No .MOV found → import .HEIC as asset_type = "photo" (still only). - Log outcome as LivePhotoWithoutPair. -``` - -**Storage by format**: - -| Format | Primary file | Video component | Sidecars | -| -------------------- | ----------------------------- | ------------------------------------------------ | -------- | -| Google MicroVideo | `{uuid}.jpg` (self-contained) | Embedded — no separate file | 1 | -| Samsung Motion (SEF) | `{uuid}.jpg` | Embedded | 1 | -| Apple Live Photo | `{uuid}.heic` | `transcodes/live/{shard}/{uuid}.mov` (ephemeral) | 1 | - -The Apple Live Photo `.MOV` is stored at `index/transcodes/live/{uuid[0:2]}/{uuid[2:4]}/{uuid}.mov` (client) or `cache/transcodes/live/…` (server). It is ephemeral — deleted and re-sourced on re-import or server fetch. The `live/` subdirectory separates it from the H.264/MP4 transcode at `transcodes/h264/{shard}/{uuid}.mp4`; path derivation for both is unambiguous from UUID + type alone. 
- -**Thumbnails**: Generated from the still frame using the standard thumbnail pipeline. No special handling. - -**Playback**: - -| Option | Container | Codec | When | -| ---------- | ---------------------------- | -------------------- | -------------------- | -| Original | MOV / HEIC / vendor-specific | HEVC or vendor codec | Platform supports it | -| Transcoded | MP4 | H.264 (AVC) | Universal fallback | - -H.264/MP4 was chosen over AV1/VP9: motion photos are short, low-bitrate clips where universal hardware decode support outweighs compression efficiency. - -### Album Assignment at Import Time - -The pipeline accepts an optional `target_album_id: Option`. If provided, all committed assets in the batch receive `album_id = target_album_id`. Stack members all receive the same `album_id`. - -**Constraints**: - -- The album must already exist in the library. Validated in Phase 2 — if not found in SQLite, the entire plan fails before any files are copied. -- No album is auto-created during import. -- `target_album_id` is uniform across the batch; per-file album assignment is not supported at import time. -- Duplicate-skipped files are not re-assigned — the existing asset's `album_id` is not modified. - -### Import Cancellation - -Cancellation is honoured only between files, not mid-file. When the user cancels: - -1. Finish writing the current file pair (`{uuid}.{ext}` + `{uuid}.cbor`) to completion. -2. Do not start the next file. -3. All fully committed files remain in the library. -4. Update the SQLite index for committed files before returning. - -Partial stacks (e.g., RAW written but JPEG cancelled) are committed as-is — the RAW becomes a standalone asset with `stack_hint: null`. The partial `asset_stacks` row (if created) is cleaned up: if only one member remains after cancellation, dissolve the stack and update that member to standalone. 
- -### Import Progress and Error Reporting - -The pipeline emits typed progress events through a channel or callback for real-time UI updates. Errors are surfaced as events, not buffered to the end. - -**Event types**: - -```text -ImportStarted { total_files: u64, total_bytes: u64 } -FileStarted { index: u64, total: u64, source_path: string } -FileCompleted { index: u64, uuid: string, outcome: ImportOutcome, bytes: u64 } -ImportCompleted { summary: ImportExecutionSummary } -``` - -**`ImportOutcome` values**: `Imported | DuplicateSkipped | Unsupported | CorruptUnreadable | CorruptTransfer | PermissionDenied | PartialStackImported | LivePhotoWithoutPair`. - -**Guarantees**: - -- `total_files` counts only `Import`-action candidates; duplicates and unsupported are already resolved in Phase 2. -- `FileCompleted` is emitted for every file including errors — never silently dropped. -- `ImportCompleted` is emitted exactly once, after all files (including any stack rollbacks). -- Progress indices are 1-based and monotonically increasing — never reset on error. - -**Summary outcome table**: - -| Outcome | In library | Logged info | -| ---------------------- | ------------------- | --------------------------- | -| `Imported` | Yes | — | -| `DuplicateSkipped` | No (already exists) | Existing asset UUID | -| `Unsupported` | No | Source path | -| `CorruptUnreadable` | No | Source path + error | -| `CorruptTransfer` | No | Source path + hash mismatch | -| `PermissionDenied` | No | Source path + OS error | -| `PartialStackImported` | Partial | Imported + skipped UUIDs | -| `LivePhotoWithoutPair` | Yes (as photo) | Source path | - -The summary is returned to the UI layer and not written to disk. Do not surface per-file dialogs during a bulk import. - ---- - -## Operational Behaviors - -### Library Initialization - -When creating a new library at a given directory path: - -```text -1. Verify the target directory is empty or does not exist. - Abort if it contains unrecognized files. 
-2. Create directory skeleton: - media/ - index/thumbnails/{xs,s,m,l,xl,o}/ - index/meta/ - index/transcodes/h264/ - index/transcodes/live/ - .library/migrations/ - .library/trash/ -3. Initialize empty SQLite database at index/library.sqlite. -4. Write .library/version.cbor: { version: 1 } -5. Write .library/config.cbor: { schema_version: 1, last_opened_at: , - library_name: , last_scrubbed_at: null } -``` - -All steps must succeed atomically. If any step fails, remove all created files and directories and surface an error. Do not leave a partially initialized library on disk. - -On subsequent opens: validate that `.library/version.cbor` is present and readable. If missing or unreadable, treat as a corrupt library and refuse to open — do not auto-repair silently. - -### Concurrent Access and Library Locking - -**Library-level locking**: `.library/lock` prevents two app instances from opening the same library simultaneously. The lock file is a JSON object (human-readable for debugging): - -```text -{ "pid": , "hostname": "", "locked_at": } -``` - -**Acquire**: Create exclusively (`O_CREAT|O_EXCL` on Unix; `CreateFile` with `CREATE_NEW` on Windows). If creation fails (file already exists): read the record; if the PID is no longer alive **and** the hostname matches the current host → overwrite (stale lock recovery). Otherwise surface an error and refuse to open. - -**Release**: Delete the lock file on library close. Never included in index rebuilds or library scans. - -**Sidecar-level locking**: External drives may be accessed by multiple devices. Use file locking (`fcntl` on Unix, `LockFileEx` on Windows) when writing sidecars to prevent interleaved writes from concurrent processes. - -### Temp File Staging and Startup Recovery - -Temp files are written to the final destination directory to guarantee same-filesystem atomic renames. 
- -```text -media/2024/2024-07/ -├── {uuid}.jpg.tmp # in-flight media -└── {uuid}.cbor.tmp # in-flight sidecar -``` - -**Startup scrub**: On library open, if `last_scrubbed_at` is null or >7 days ago, scan all `media/**/*.tmp` files. Any `.tmp` file older than 5 minutes is a crashed import. Delete both `{uuid}.{ext}.tmp` and `{uuid}.cbor.tmp` for that UUID. Log the cleanup. Write updated `last_scrubbed_at` to `config.cbor`. - -### Directory Auto-Creation - -The library initialization skeleton creates only top-level directories. Per-date and per-UUID-prefix directories are created on demand (idempotent mkdir — no error if already exists). - -| Directory | Created by | -| -------------------------------------------------- | ---------------------------------------------------------- | -| `media/{YYYY}/{YYYY-MM}/` | Importer, before writing the first `.tmp` for that month | -| `index/meta/{uuid[0:2]}/{uuid[2:4]}/` | Whatever generates the first `.meta.cbor` for that prefix | -| `index/thumbnails/{size}/{uuid[0:2]}/{uuid[2:4]}/` | Whatever generates the first thumbnail for that prefix | -| `index/transcodes/h264/{uuid[0:2]}/{uuid[2:4]}/` | Whatever writes the first H.264 transcode for that prefix | -| `index/transcodes/live/{uuid[0:2]}/{uuid[2:4]}/` | Whatever writes the first Live Photo video for that prefix | - -If the app crashes after mkdir but before writing the `.tmp`, the empty directory is harmless — ignored by startup scrub and left in place. - -### Index Staleness - -SQLite may lag reality. Always verify file existence before operations. Trigger a full index rebuild on startup if `last_opened_at` >30 days ago or if the library reports structural inconsistencies on open. - -### Soft Deletion and Trash - -```text -1. Mark in SQLite: is_deleted = 1, deleted_at = -2. Update sidecar: is_deleted = true, deleted_at = (atomic rename pattern) -3. 
Move media to quarantine: .library/trash/{uuid}.{ext} - Sidecar stays at media/{YYYY}/{YYYY-MM}/{uuid}.cbor with is_deleted = true -4. After 30-day trash period (checked at startup or on explicit purge): - permanent deletion — remove sidecar first, then media from .library/trash/ -``` - -**Stacked assets**: Deleting a stack member also updates `is_stack_hidden` for remaining members. Deleting the current cover or primary triggers cover reassignment: promote the next member by `sequence_order`. Deleting all members of a stack dissolves the stack (remove `asset_stacks` and `stack_members` rows). - -Sidecar is removed first on permanent deletion: orphaned media in trash is recoverable (re-import); orphaned sidecars serve no purpose. Never immediate deletion — the trash period allows recovery from accidental deletes. - -### EXIF Handling - -EXIF is preserved in the original media file untouched. Key fields (capture date, GPS, camera model) are copied into the sidecar at import time. The sidecar is the authoritative metadata source for Capsule; EXIF in the media file is left intact for third-party tool compatibility. The media file is read-only after import — Capsule never writes to it. - -### EXIF Timezone Resolution - -`DateTimeOriginal` represents local wall-clock time. To establish an absolute timeline, resolve UTC + timezone at import using the following algorithm: - -**Case 1 — `OffsetTimeOriginal` present**: - -- `capture_tz` = offset string (e.g., `"+09:00"`) -- `capture_tz_source` = `"offset_exif"` -- `capture_utc` = `DateTimeOriginal` + offset → UTC -- `tz_db_version` = null - -**Case 2 — `OffsetTimeOriginal` absent, GPS present**: - -Perform a fully offline reverse-geocoded timezone lookup (see Reverse Geocoding). 
- -- `capture_tz` = IANA timezone name (e.g., `"America/New_York"`) -- `capture_tz_source` = `"gps_lookup"` -- `capture_utc` = calculated UTC timestamp -- `tz_db_version` = IANA tz-db release tag used (e.g., `"2024b"`) - -If the GPS lookup fails (ocean, Antarctica, corrupt db): fall through to Case 3. Do not fail the import. - -**Case 3 — No offset, no GPS (or lookup failed)**: - -- `capture_tz` = null -- `capture_tz_source` = `"floating"` -- `capture_utc` = null -- `tz_db_version` = null - -**Display**: Clients must use the sidecar's stored `capture_tz` to display local capture time. Use `capture_utc` for all timeline sorting and cross-library queries. Fall back to `capture_timestamp` only when `capture_utc` is null. - -**Immutability**: `capture_tz` and `capture_utc` are written once at import. If the server later derives a different timezone from a newer tz-db version, it records that in its own layer — it does not silently overwrite the sidecar's fields without an explicit user-triggered repair. `tz_db_version` makes GPS-derived zone provenance auditable. - -### Reverse Geocoding (Offline) - -Network calls for timezone lookup are prohibited. GPS → timezone resolution must be fully offline. - -**Mechanism**: Bundle an offline timezone boundary database compiled into the binary or shipped as a read-only asset (e.g., `tzf-rs` with the IANA timezone boundary dataset for Rust). No DNS lookup, HTTP request, or IPC call to external services is permitted during import. - -### Thumbnail Generation - -Thumbnails are generated on-the-fly when first needed and cached locally. Not pre-generated during import. - -**Formats**: Two formats per variant. Client requests JXL first, falls back to WebP. - -| Format | Role | Notes | -| ------ | -------- | ------------------------------- | -| JXL | Default | Progressive decoding; preferred | -| WebP | Fallback | Used when JXL is unsupported | - -**Size variants**: Defined by **minor dimension** (shorter side). 
Ensures sufficient pixel coverage regardless of orientation. - -| Variant | Key | Minor dimension | -| ----------- | --- | --------------- | -| Micro | xs | 200 px | -| Small | s | 450 px | -| Medium | m | 900 px | -| Large | l | 1500 px | -| Extra Large | xl | 2400 px | -| Original | o | No downscale | - -**No upscaling**: If the source minor dimension is smaller than the requested variant, the `o` variant is used instead. - -**Aspect ratio clamping**: Clamped to 3:1 maximum in either orientation. Panoramas and extreme crops are clamped to 3:1. - -**Client size selection**: Select the smallest variant whose minor dimension meets or exceeds the display slot requirement. - -- **Landscape** (width ≥ height): minor dimension = height. -- **Portrait** (height > width): minor dimension = width. - -Example: a 400×300 display slot → needs ≥ 300 px minor → select `s` (450 px). - -**Path**: `index/thumbnails/{size}/{uuid[0:2]}/{uuid[2:4]}/{uuid}.{format}` (client) or `cache/thumbnails/…` (server). - -**Client vs. server thumbnails**: Client-generated thumbnails are cached locally and never transmitted to the server. When fetching an asset from the server, the client may seed its local thumbnail cache from the server-provided thumbnail rather than generating its own. The server generates thumbnails independently; the client never pushes locally-generated thumbnails back. - -**Stacked assets**: Thumbnails are generated per-asset, not per-stack. The UI selects which asset's thumbnail to display based on the stack's cover (see Timeline Ordering and Stack Display). Thumbnails for `is_stack_hidden` members are generated lazily on stack expansion — not eagerly. - -### LQIP (Low-Quality Image Placeholder) - -LQIP provides an instant visual placeholder before the full thumbnail loads. The implementation uses **ThumbHash** — a compact (~28 byte) perceptual encoding that stores a blurred preview, approximate aspect ratio, and average color in a single byte sequence. 
- -**Server-side generation**: After an asset is uploaded and committed, the server generates its LQIP as a background task: - -1. Decode the media file to an RGBA buffer via `capsule-media`. -2. Call `capsule-media::image::lqip::LQIP::from_image_buffer` — internally resizes to a maximum of 100 px on the longest dimension, then calls `thumbhash::rgba_to_thumb_hash`. -3. Extract dominant color via `LQIP::average_rgba()` → convert to `#rrggbb` hex. -4. Base64-encode the raw ThumbHash bytes. -5. Store in `assets.chromahash` (varchar, base64) and `assets.dominant_color` (varchar, hex). Both are nullable — null means not yet generated. - -**Client-side (local library)**: LQIP generation is **skipped**. The client already has the full-resolution file locally and generates thumbnails on demand. The LQIP overhead is unnecessary when the source file is already accessible. - -**Client-side (synced library)**: The client fetches `chromahash` and `dominant_color` from the server as part of asset metadata. These values are stored in the local SQLite `assets` table. The client does not regenerate LQIP locally. - -**Sidecar**: LQIP is **not** stored in the sidecar. It is an ephemeral derived value (like thumbnails and `.meta.cbor`) that can be regenerated from the media file at any time. Storing it in the sidecar would bloat permanent archival metadata with a cache artifact. - -**Dominant color fallback**: While the ThumbHash decodes asynchronously in the browser, `dominant_color` is available immediately as a CSS background-color — visible even before the LQIP image renders. - -**API exposure**: `chromahash` and `dominant_color` are exposed on the GraphQL `AssetMetadata` type. The frontend decodes the base64 ThumbHash bytes to an RGBA bitmap using the `thumbhash` npm package (`thumbHashToRGBA`) and renders it as a blurred data-URL image. - -**Stacked assets**: LQIP is stored per-asset (not per-stack). Only the cover asset's LQIP is displayed in the collapsed grid cell. 
No special handling is needed — the UI simply uses the cover asset's `chromahash`. +- **`media/`**: originals, their sidecars, and their provenance chains. Filenames are + `{UUIDv7}.{extension}` (always lowercase), `{UUIDv7}.cbor`, and + `{UUIDv7}.provenance.cbor` respectively. The CBOR sidecar is the client's + canonical, self-describing metadata record (see + [Metadata — Sidecar Schema v1](/design/metadata/#sidecar-schema-v1)) — the + plaintext counterpart of the encrypted metadata blob the server stores. The + `.provenance.cbor` file is an append-only signed log per asset (see + [Cryptography — Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications)); + the client never deletes it, so a hard-deleted asset leaves a + tombstone-with-history. Per the recovery-first principle, the entire library + is reconstructible from these three files alone. Files are date-bucketed by + capture timestamp because the client, unlike the server, can read capture + dates. +- **`cache/`**: purely derived and rebuildable — thumbnails and previews (formats declared in [Thumbnails and Previews](/design/thumbnails/#thumbnail-and-preview-formats)), verbose + parsed-metadata caches, and transcodes. Sharded by UUID prefix to bound + directory sizes. Deletable at any time; never a source of truth. +- **`index/library.sqlite`**: a rebuildable query cache over the sidecars, and + the local vector index backing AI features (`sqlite-vec` — see + [AI/ML Integrations](/design/ai/)). On a schema change it may be dropped and rebuilt + rather than migrated, since it is always reconstructible. 
+- **`.library/`**: library-scoped state — schema version, user configuration, a + process lock file that prevents two app instances from opening the same + library, the trash (soft-delete retention area), and `quarantine/` (where + irreplaceable bytes that failed structural or signature validation are + preserved verbatim alongside a `.reason.json` recording the rejection). The + quarantine area is the union surface listed in + [Threat Model — Quarantine Surfaces](/design/threat-model/#quarantine-surfaces). + +The full sidecar and SQLite schemas are owned by [Metadata](/design/metadata/) and not +duplicated here. + +### Mobile Clients + +Android and iOS use platform-sandboxed storage rather than a user-visible +library directory. The logical model is the same — originals (when synced), +canonical metadata, rebuildable caches, and a local SQLite index — but placement +follows each platform's sandbox rules. Capsule deliberately does not store +rebuildable derivatives in OS-managed cache locations: the OS may evict them +indiscriminately, and a thumbnail that is expensive to regenerate is not +genuinely disposable (see [Import and Synchronization](/design/import-synchronization/) +— "Space Recovery"). + +### Local Index Staleness + +SQLite may lag the filesystem after external edits or interrupted operations. +The client verifies file existence before acting on an index row and triggers a +full rebuild from sidecars when it detects structural inconsistency. Because the +index is always rebuildable, this recovery is safe. + +### Space Recovery + +Most data, apart from files that are not yet backed up, is ephemeral, yet it is +neither disposable nor kept in OS-managed cache storage. The Capsule app is +better placed than the OS to decide which versions of the same data to retain +and which to delete: thumbnails stored as OS cache may be evicted +indiscriminately even while they are still useful. 
We provide tools +to analyze the biggest storage consumers and allow users to selectively delete +data. + +## Library Self-Maintenance + +The data-integrity principle treats client storage as *potentially lost* (see +[Core Principles](/design/principles/)): unlike the server, a client library +sits on consumer hardware, syncs only partially, and is edited by a long-lived +process that can be killed mid-write. A client therefore never assumes its +library is consistent — it periodically *proves* it is, repairs what it can +repair safely, and surfaces what it cannot. Three routines do this: +**scrubbing** removes the debris of interrupted operations, **self-validation** +confirms the library is structurally and bitwise intact, and **deduplication** +collapses byte-identical assets. All three are conservative — consistent with +"we can NEVER delete data unexpectedly," irreplaceable data is never removed +without explicit user confirmation. + +### Scrubbing + +A startup **scrub** sweeps the debris of interrupted writes. Atomic writes +(below) stage to `.tmp` files; a crash between the write and the rename strands +them. The scrub walks `media/` and removes `.tmp` files older than a few minutes +— the age floor avoids racing a write that is legitimately in flight elsewhere +in the process. It runs at most once every seven days, gated by a +`last_scrubbed_at` timestamp in the library config, since stale temp files are +harmless clutter rather than an urgent fault. Every removal is logged. The +server performs the equivalent sweep of stale `.part`/`.bin` files (see +[Atomic Writes and Crash Recovery](#atomic-writes-and-crash-recovery)). + +### Self-Validation + +Validation answers a stronger question than scrubbing: *is the library still a +faithful, interpretable copy of its assets?* It runs in two tiers, separated by +cost. + +**Structural validation** is a cheap directory walk, run at startup. 
It checks +the invariants of the [layout](#desktop-library-layout): + +- Every `{uuid}.{ext}` original has a matching `{uuid}.cbor` sidecar and + `{uuid}.provenance.cbor` chain. Every sidecar parses as valid CBOR with its + required fields present, has a `sidecar_schema` ≤ the client's max known + (per the [tightened Postel's Law](/design/principles/)), and bears a valid + signature from a device in the user's directory. +- A sidecar's `uuid` field matches its filename, and its date bucket matches its + capture timestamp. +- Every `cache/` entry (thumbnail, transcode, parsed-metadata cache) and every + `.library/trash/` file refers to an asset the library still knows. +- The provenance chain for each asset is walkable from `create` to head, with + each record's `prior_provenance_hash` matching the preceding record's content + hash. A break in the chain is a quarantine surface, not a silent skip. +- Index rows reference files that exist — this subsumes + [Local Index Staleness](#local-index-staleness) above. + +**Content validation** is expensive — it recomputes the [content hash](/design/cryptography/#primitives-inventory) of each locally +present original and compares it against the sidecar's `hash` field (the +algorithm-tagged form declared in [Metadata — Sidecar Schema v1](/design/metadata/#sidecar-schema-v1); +the algorithm itself follows whatever `crypto_suite_id` the sidecar carries). +The original is the only irreplaceable thing on a client, so +silent bit rot is the worst failure a client can suffer and nothing else detects +it. Because hashing every original is heavy I/O, content validation is not run +at startup: it is scheduled opportunistically (device idle, on power, unmetered) +and throttled, can be triggered on demand, and re-verifies each original on a +slow rolling cadence rather than all at once. 
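As a rough illustration of the first structural invariant above (every original has a matching sidecar and provenance chain), the following sketch walks a single date bucket using only the Rust standard library. The function name `find_incomplete_assets` is hypothetical and is not taken from the Capsule codebase. The sketch assumes the flat `{uuid}.{ext}` / `{uuid}.cbor` / `{uuid}.provenance.cbor` naming described in the layout, and it deliberately skips CBOR parsing, schema checks, and signature verification.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: returns the UUID stems of originals in `bucket`
/// that lack a `{uuid}.cbor` sidecar or a `{uuid}.provenance.cbor` chain.
fn find_incomplete_assets(bucket: &Path) -> io::Result<Vec<String>> {
    let mut originals = BTreeSet::new();
    let mut sidecars = BTreeSet::new();
    let mut chains = BTreeSet::new();

    for entry in fs::read_dir(bucket)? {
        let name = entry?.file_name().to_string_lossy().into_owned();
        if name.ends_with(".tmp") {
            continue; // in-flight or stranded writes are the scrub's concern
        }
        // Check the longer suffix first so provenance files are not
        // mistaken for sidecars.
        if let Some(stem) = name.strip_suffix(".provenance.cbor") {
            chains.insert(stem.to_string());
        } else if let Some(stem) = name.strip_suffix(".cbor") {
            sidecars.insert(stem.to_string());
        } else if let Some((stem, _ext)) = name.split_once('.') {
            originals.insert(stem.to_string());
        }
    }

    // An original is incomplete if either companion file is missing.
    Ok(originals
        .into_iter()
        .filter(|uuid| !sidecars.contains(uuid) || !chains.contains(uuid))
        .collect())
}
```

A sidecar with no original is intentionally not reported here, since the text treats that as expected under a metadata-only sync scope.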
+ +### Repair + +Repair follows directly from the data-integrity principle — *ephemeral data is +rebuilt silently; irreplaceable data is never destroyed to resolve an +inconsistency.* + +| Finding | Action | +| -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Stale `.tmp` / partial file | Deleted by the scrub. | +| Orphaned `cache/` entry | Deleted — derived and rebuildable. | +| Index inconsistency | Index dropped and rebuilt from sidecars — always safe. | +| Orphaned sidecar (no original) | Expected when the [sync scope](/design/import-synchronization/#synchronization-scope) is metadata-only — not a fault. Flagged only if the scope says the original should be present locally, in which case the original is re-fetched from the server. | +| Orphaned original (no sidecar) | The file is irreplaceable, so it is never deleted. It is moved to `.library/quarantine/` and surfaced to the user; the client attempts to re-derive a minimal sidecar from the file itself and the server index. | +| Malformed CBOR sidecar | The bytes are preserved — moved verbatim to `.library/quarantine/{uuid}.cbor` with a sibling `.reason.json` recording the parse error, and surfaced to the user. **Never silent-skipped:** a sidecar whose CBOR does not parse, whose required fields are missing, or whose `sidecar_schema` is above the client's max known is treated as a quarantine surface (see [Threat Model — Quarantine Surfaces](/design/threat-model/#quarantine-surfaces)). 
The client attempts to re-fetch a current sidecar from the server before treating the asset as lost. | +| Sidecar signature invalid | Same as malformed: quarantined, never auto-overwritten. The client re-fetches; a persistent failure surfaces the asset as "provenance broken" rather than silently dropping it. | +| Corrupt original (hash mismatch) | If the asset also exists on the server, the ciphertext blob is re-fetched and its derivatives re-generated. If the corrupt copy is the only copy — this device was its uploader and it was never synced — it cannot be auto-healed and is surfaced loudly. | + +Every finding and every repair is logged, so the state of the library is +reconstructible after the fact. + +### Deduplication + +Capsule deduplicates at three distinct layers, and they must not be confused: + +- **Server-side ciphertext dedup** — content-addressed blobs are never stored + twice (see [Content-Addressing and Deduplication](#content-addressing-and-deduplication)). +- **Import-time dedup** — import refuses an asset already uploaded from this + library and resolves a remote-only match to a merge (see + [Import and Synchronization](/design/import-synchronization/#deduplication-and-merge)). +- **Intra-library dedup** — described here: two assets *within one client + library* whose originals are byte-identical. + +Import-time dedup catches most duplicates as they arrive, but it cannot catch +all of them. Byte-identical assets still accumulate — the same file imported +from two different sources, a folder import that overlaps an earlier one, an +asset re-imported after its sidecar was lost, or a backup restored over a +library that still holds the originals. + +The dedup key is the plaintext **`hash.value`** recorded in every sidecar (the +algorithm-tagged form from [Metadata — Sidecar Schema v1](/design/metadata/#sidecar-schema-v1)) — +the same value the index lets the client look up directly. Two assets that share +it are exact duplicates. 
This is deliberately distinct from the server's +*ciphertext* hash: two devices may encrypt the same plaintext under different +album keys, so only the plaintext hash identifies duplicates across a library. + +Deduplication is **not** stacking. A RAW+JPEG pair, a burst, and a Live Photo +are *different bytes* deliberately kept together — they are +[stacked](/design/organization/#asset-stacking), never deduplicated. +Visually-similar but non-identical photos are a separate AI grouping feature +(Smart Selection) that never deletes. Dedup only ever acts on originals that are +bit-for-bit identical. + +Resolution is conservative and never silent. The client presents each duplicate +set and lets the user choose the survivor. On merge, the survivor inherits the +union of album memberships and tags (merged through the OR-set CRDT — see +[Metadata](/design/metadata/#collaborative-metadata)), the highest rating, and +the earliest import and capture timestamps; the losing copy is soft-deleted into +the trash, so the action is reversible and is recorded as a signed, +provenance-tracked modification like any other deletion (see +[Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications)). +Whole-library deduplication is a user-initiated maintenance action or a surfaced +suggestion — never an automatic background deletion — consistent with the rule +that data is never removed unexpectedly. + +## Atomic Writes and Crash Recovery + +Every write that must not tear uses temp-file + atomic rename, staged on the +same filesystem as its destination. The atomicity rule is enforced at three +granularities — the single file, the per-asset bundle, and the multi-asset +edit — each of which is owned by a section of +[Threat Model — Atomicity Invariants](/design/threat-model/#atomicity-invariants). 
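The temp-file + atomic rename pattern for a single file can be sketched as follows. This is a minimal illustration under stated assumptions, not Capsule's actual code: `atomic_write` is a hypothetical helper name, and the staged file is assumed to be the destination path with `.tmp` appended, in the same directory.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: write `bytes` to `dest` so that readers see either
/// the old file or the new file, never a partially written one.
pub fn atomic_write(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    // Stage next to the destination so the rename stays on one filesystem.
    let mut tmp_name = dest.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);

    let result = (|| -> io::Result<()> {
        let mut file = File::create(&tmp)?;
        file.write_all(bytes)?;
        // Flush the staged contents to stable storage before publishing.
        file.sync_all()?;
        // On POSIX filesystems, rename replaces `dest` in one atomic step.
        fs::rename(&tmp, dest)
    })();

    if result.is_err() {
        // Best-effort cleanup of the staged file; errors here are ignored.
        let _ = fs::remove_file(&tmp);
    }
    result
}
```

A caller would use it as `atomic_write(Path::new("asset.cbor"), &encoded)?`. The sketch does not sync the parent directory after the rename, which a production implementation would likely also need for the rename itself to survive a power loss.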
+ +- **Client — single-file writes.** Sidecar and provenance appends stage to + `{uuid}.cbor.tmp` and `{uuid}.provenance.cbor.tmp` in the destination + directory, then rename into place. A direct overwrite is never used. +- **Client — per-asset bundle.** An asset import or update is a *bundle*: + original (when present locally), sidecar, and a new provenance record. + All `.tmp` files stage first; only after every staged file is on disk do + the renames execute, and only in a fixed order (original → sidecar → + provenance). A failure at any rename discards every remaining `.tmp` and + rolls back the renames already done by deleting the just-renamed targets, + so the on-disk state never reflects a partial bundle. The + `.provenance.cbor` is the last to be renamed, so the existence of a new + provenance record implies the rest of the bundle is committed. +- **Client — stack edit.** A stack edit touches multiple sidecars and writes + a single provenance record per affected asset. All `.tmp` files (one per + sidecar plus one per provenance file) stage first and rename together; any + rename failure discards the entire batch. There is no partial stack. +- **Server — chunk assembly.** Chunks stage as `{upload_id}_{n}.part`; the + assembled blob is `{upload_id}.bin`. The blob is renamed into its + content-addressed location under `blobs/` only after the ciphertext hash + is recomputed and matches the declared value (see + [Import and Synchronization — Finalization and Integrity](/design/import-synchronization/#finalization-and-integrity)). +- **Server — finalization transaction.** The manifest envelope insert, the + blob rename, the metadata blob insert, the provenance blob insert, and + the asset row update commit in a single PostgreSQL transaction. 
The + server never exposes an asset whose bundle is partially persisted; a + crash between any pair leaves the session in `WaitingForProcessing` and + the next finalization attempt either completes the bundle or fails it + cleanly. + +On startup, each side scrubs incomplete work: stale `.part`, `.tmp`, and `.bin` +files left by an interrupted upload or import are identified and removed, and +the cleanup is logged. A blob or media file is never published, on either side, +until its integrity has been verified. + +## Encrypted Backups + +A backup is an export artifact — encrypted, self-describing, and kept outside +both `{library_root}` and `{blob_root}` — so it is not part of the live library +or the server blob store, and may be stored on external or cloud storage. Its +format, the master-key escrow, and the recovery flow are covered in +[Backup and Recovery](/design/backup-recovery/). diff --git a/capsule-docs/src/content/docs/design/import-prioritization.md b/capsule-docs/src/content/docs/design/import-prioritization.md deleted file mode 100644 index bc8ff43..0000000 --- a/capsule-docs/src/content/docs/design/import-prioritization.md +++ /dev/null @@ -1,15 +0,0 @@ ---- -title: Import Prioritization -description: How to prioritize imports for best user experience for large collections. ---- - -## Import & Upload Prioritization Criterias - -- **File Size:** Smaller files might be processed first to give a quicker sense of progress, or larger files might be prioritized if they are deemed more critical. - - While file size is a useful heuristic, for internal ordering, we should let the order files are uploaded be naturally determined by simultaneous uploads and the network conditions, which would fall to the responsibility of the underlying file transfer protocol (i.e., as of writing, ) -- **Last Modified Times:** Newer or recently modified files might be more relevant to the user. 
(Note this filesystem metadata may not be always reliable so some fallbacks may be needed. Last accessed time was also considered but relatime makes this heuristic relatively noisy.) -- **Directory Depth:** Files closer to the root of the specified paths might be processed first. - -### Non-Criterias - -- **File Type/Extension:** Prioritizing purely by file types may result in anomalies. Instead we should have exceptions for certain sidecar files (e.g. `.xmp` associated with an image, or `.wav` associated with a video file). diff --git a/capsule-docs/src/content/docs/design/import-synchronization.md b/capsule-docs/src/content/docs/design/import-synchronization.md new file mode 100644 index 0000000..69de6f9 --- /dev/null +++ b/capsule-docs/src/content/docs/design/import-synchronization.md @@ -0,0 +1,270 @@ +--- +title: Import and Synchronization +description: How Capsule imports and synchronizes assets across devices and platforms +--- + +We define **import** as the process of taking assets from an external source (e.g. a camera, a directory on the filesystem) and bringing them into Capsule's management. This involves scanning the files, extracting metadata, and preparing them for upload. + +We split [synchronization](#synchronization) into two parts: + +- Upload: Locally stored assets are uploaded to the server and made available across devices. +- Download: Assets are downloaded from the server to local devices as needed. + +Capsule additionally produces [encrypted backups](/design/backup-recovery/) — encrypted, portable exports of a library — which are covered separately. + +## Import + +Every import is deterministic and idempotent. But imports can be partially completed. Every import is identified by an *import ID*. + +### Import Pipeline + +Our import pipeline is as follows: + +- Initiate import: Users initiate an import in one of the following methods: + - Manual: User selects files or directories to import through the UI. 
The selection can point to either a flat structure or a standardized directory structure (e.g. DCIM). + - Automated: Platforms (primarily mobile) can automatically detect new media in directories being watched and appropriately trigger imports. +- File scanning and metadata extraction: *See [Metadata](/design/metadata/)* for details on how we extract metadata and organize files. +- Import planning and confirmation: + - Before we import any file, we parse and verify it is a format we support. We strictly reject unsupported formats to avoid any issues later on. The server independently enforces a closed-enum `content_type` allow-list at session creation (see [Threat Model — Server-Side Validation Invariants](/design/threat-model/#server-side-validation-invariants)), so a malicious or buggy client declaring an unsupported format is rejected before any bytes are uploaded. Bytes received over the wire are decoded only inside the [client's sandboxed decoder](/design/clients/#sandboxed-decoder), so a format-mismatch attack cannot reach the host process. + - Based on the scanned files and extracted metadata, we can provide users with a summary of what will be imported (e.g. number of files, total size, any issues detected) and allow them to confirm or adjust the import. + - If an asset that has already been uploaded is detected locally, we refuse to import it. If the asset exists only remotely, we still allow the import: because encryption and hashing of the encrypted blob are deferred until upload, the match is discovered at upload time and resolved as a merge. +- Execute import on each new file to be imported in the order specified by [Upload Prioritization](#upload-prioritization): + + - Import into detected space: We can automatically move the files that are to be imported into the appropriate space. We compute the necessary metadata for cryptography (detailed in [Cryptography](/design/cryptography/)) and prepare the files for upload.
This step can be optimized by parallelizing the processing of files and prioritizing certain files based on heuristics (see [Upload Prioritization](#upload-prioritization)). + - Generate thumbnails and previews: *See [Thumbnails](/design/thumbnails/)* for details on how we generate thumbnails and previews. + - Upload files: We choose to upload the files based on the criteria outlined in [Sync](#synchronization). + +## Synchronization + +Core to the synchronization mechanism are the end-to-end encryption requirements (see [Cryptography](/design/cryptography/)). This means that uploading and downloading require careful management of all asset metadata to ensure each asset is accessible and properly decrypted on all devices (and inaccessible to unauthorized parties). + +### Upload + +Every upload is idempotent but stateful. Uploads can be completed partially and are identified by an *upload ID*. + +The upload path is a critical hot path. Its design is held to a higher standard of correctness and performance than the rest of the API: it must behave predictably under interrupted connections, concurrent transfers, and constrained hardware. The protocol below is deliberately *strict* — ambiguity in a resumable transfer protocol is what produces silent corruption and orphaned state. + +#### Protocol & Mechanics + +##### What Gets Uploaded + +An asset is never uploaded as a single plaintext file. Because Capsule is end-to-end encrypted (see [Cryptography](/design/cryptography/)), the client **encrypts and signs** everything *before* transmission, and the server only ever stores opaque, content-addressed ciphertext blobs. A single imported asset produces a **bundle** of blobs: + +- The **original blob** — the source asset, encrypted under the [bulk AEAD](/design/cryptography/#bulk-aead) with the [STREAM construction](/design/cryptography/#stream-construction).
+- **Derivative blobs** — thumbnails, previews, and LQIP, generated client-side during import (see [Thumbnails](/design/thumbnails/)), each encrypted independently. +- The **metadata blob** — the CBOR metadata document (capture date, dimensions, EXIF-derived fields, provenance), encrypted under the [bulk AEAD](/design/cryptography/#bulk-aead) (see [Metadata](/design/metadata/)). + +Each blob is its own upload with its own upload ID; the protocol does not couple them. The client is responsible for completing the full set, and the server exposes the asset to other devices only once its required members (at minimum the original and metadata blobs) are finalized. Using one uniform mechanism for every blob type keeps the protocol small, and decoupling lets small derivatives land quickly while a large original is still transferring. + +The server performs no decoding, no metadata extraction, and no thumbnail generation — it cannot, since it never holds a decryption key. All such work happens client-side during [import](#import). + +##### Design Invariants + +The upload protocol guarantees the following, and every endpoint is designed to uphold them: + +- **Content-addressed.** Every blob is identified by its [ciphertext content hash](/design/cryptography/#primitives-inventory). The plaintext hash is never transmitted to the server. +- **Idempotent.** Re-creating a session for a blob already stored is a no-op that resolves to the existing asset. Re-sending a chunk at an already-acknowledged offset is accepted and simply returns the current offset. +- **Resumable.** A session survives connection loss for the lifetime of its TTL. A client resumes by querying the authoritative offset and continuing from there — no bytes are re-sent unnecessarily. +- **Strictly bounded.** The total ciphertext size is declared at session creation and immutable thereafter. The cumulative received bytes may never exceed it, nor exceed the server's per-file limit. 
+- **Verified.** No upload is marked complete until the server has recomputed the ciphertext hash and confirmed it matches the declared value. +- **Recoverable.** Every session is either driven to a terminal state or garbage-collected. There are no permanently orphaned chunks or pending asset rows. + +##### Upload Protocol + +We use a custom resumable-upload protocol modeled on [TUS](https://tus.io/) but trimmed to our needs: no per-request capability negotiation, no metadata smuggled in headers, ciphertext-only payloads. All endpoints are authenticated with a bearer JWT. Compatibility is instead gated once, up front — see [Protocol Versioning](#protocol-versioning). + +| Method | Path | Purpose | +| -------- | ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `POST` | `/upload` | Create a session. Body declares ciphertext `size`, `hash` (the [content hash](/design/cryptography/#primitives-inventory) as a tagged object `{ algo, value }`), `content_type` (closed enum), `crypto_suite_id`, `protocol_version`, `manifest_envelope` (the unencrypted manifest fields the server validates per [Threat Model — Server-Side Validation Invariants](/design/threat-model/#server-side-validation-invariants)), optional `album_id`, optional `owner_id`, optional `intent_id` (required only during an [album upgrade](/design/versioning/#album-upgrade-ceremony)). Returns `201` with `Location: /upload/{id}` and `X-Capsule-Suggested-Chunk-Size`. Rejects with `400` / `403` / `426` per the validation invariants. | +| `HEAD` | `/upload/{id}` | Query progress. Returns `X-Capsule-Offset` (next expected byte), `X-Capsule-Content-Length`, and session status. This is the resumption primitive. 
| +| `PATCH` | `/upload/{id}` | Append a chunk at `X-Capsule-Offset`, with an optional per-chunk `X-Capsule-Checksum`. Returns `204` and the new offset. | +| `DELETE` | `/upload/{id}` | Cancel the session — removes chunks, the session record, and the pending asset row. | +| `GET` | `/upload/sessions` | List the caller's active sessions, so a client can resume across app restarts or devices. | + +Creating a session writes a *pending* asset row to Postgres (`uploaded = false`) and a session record to the configured **session-state store** (see [Filesystem — Stores by Deployment Profile](/design/filesystem/#stores-by-deployment-profile): Postgres by default, Valkey in the high-concurrency profile). The pending row reserves the asset ID that derivative and metadata blobs reference. + +**Chunk rules.** These are enforced strictly; a violation fails the request rather than being silently corrected: + +- Every chunk except the final one MUST be a multiple of 4 KiB (4096 bytes). This keeps server-side writes block-aligned, which is what makes the reflink assembly path (below) work. A non-aligned, non-final chunk is rejected with `400`. +- Offsets are strictly sequential. A `PATCH` must arrive at exactly the current received-byte count; an out-of-order or gapped write is rejected with `409`, and the client recovers by issuing `HEAD` to learn the authoritative offset. +- **Idempotency tuple.** The server keys each accepted PATCH by `(upload_id, offset, chunk_hash)` where `chunk_hash` is the SHA-256 of the chunk bytes (carried in the `X-Capsule-Checksum` header). A duplicate PATCH with the same tuple returns the same response — a re-send after a lost ACK is a no-op. A PATCH at an already-acknowledged offset *with a different `chunk_hash`* is rejected with `409` + a corruption error: this is the structural defense against a faulty client that retries with garbage. 
The complete idempotency contract is owned by [Threat Model — Idempotency Invariants](/design/threat-model/#idempotency-invariants). +- Cumulative size may never exceed the declared `size` nor the server's `max_file_size`. The server checks the cumulative count **at every chunk arrival**, not only at finalization — a buggy client that streams past the declared size is cut off before more bytes are persisted. Either ceiling is rejected (`400` / `413`) and the session is moved to a failed state. +- The upload completes exactly when received bytes equal the declared size; finalization then runs automatically. + +##### Protocol Versioning + +The upload protocol is the most fragile contract between client and server: a client that misunderstands chunk alignment, offset semantics, or finalization can silently corrupt or orphan data. The upload session is therefore gated by Capsule's universal protocol handshake, defined in [Threat Model — Protocol and Capability Negotiation](/design/threat-model/#protocol-and-capability-negotiation), so a client never begins a transfer against a server it is not known to be compatible with. This section names the upload-specific specializations. + +Versioning is **date-based** (`YYYY-MM-DD` — the day a protocol revision is frozen), not integer or semver. An integer version conveys nothing about ordering granularity and invites a bump for every change; semver implies a minor/patch backward-compatibility contract finer than we are willing to maintain on a hot path. A date is unambiguously ordered, human-readable, and maps directly onto a release. + +- Every client sends `X-Capsule-Protocol: ` on every request (the upload-specific alias `X-Capsule-Upload-Protocol` remains accepted but is deprecated). The server advertises the inclusive range it accepts via `X-Capsule-Protocol-Min` and `-Max` on every response, errors included. 
+- A `POST /upload` whose version falls outside the accepted range is rejected with `426 Upgrade Required` *before* any session or pending asset row is created. The response names the supported range so the client can show an actionable message ("update Capsule to keep uploading"). Per [Threat Model](/design/threat-model/#protocol-and-capability-negotiation), the same rule applies to every other write surface. +- This is a one-shot **compatibility gate**, not negotiation: there is no back-and-forth to settle on a shared version, and the protocol carries no capability flags. A client either speaks a version the server accepts, or it does not upload. +- The server supports a *window* of past protocol versions, not only the newest, so a staggered client rollout keeps working. A version leaves the window only after the deprecation period defined in [Threat Model — Min-Supported-Client Deprecation Policy](/design/threat-model/#min-supported-client-deprecation-policy); dropping one is a breaking change announced ahead of time. +- The date is bumped only for an **incompatible** wire change — offset semantics, alignment rules, finalization, the state machine. Purely additive, safely-ignorable changes do not bump it, and server-tunable parameters such as suggested chunk sizes and adaptive-sizing tiers are not protocol surface at all. + +##### Session Lifecycle + +A session moves through a strict state machine: + +```plaintext +Pending ─▶ Uploading ─▶ WaitingForProcessing ─▶ Completed + └─▶ FailedProcessing +``` + +- **Pending** — session created, no bytes received. +- **Uploading** — at least one chunk received, transfer in progress. +- **WaitingForProcessing** — all declared bytes received; finalization (assembly + hash verification) is running. +- **Completed** — hash verified, asset marked uploaded, now visible to other devices. Terminal. +- **FailedProcessing** — terminal failure (hash mismatch, assembly error). Chunks and the pending asset row are removed. Terminal. 
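The state machine above can be written down as an explicit transition table. This is a sketch rather than the server's actual types: the `SessionState` enum and its method names are hypothetical, and it assumes the `FailedProcessing` branch hangs off `WaitingForProcessing`, as the listed failure causes (hash mismatch, assembly error) suggest. Failures raised while chunks are still arriving are not modeled here.

```rust
/// Hypothetical mirror of the five session states described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionState {
    Pending,
    Uploading,
    WaitingForProcessing,
    Completed,
    FailedProcessing,
}

impl SessionState {
    /// Terminal states never transition again.
    pub fn is_terminal(self) -> bool {
        matches!(self, Self::Completed | Self::FailedProcessing)
    }

    /// Returns true only for the edges drawn in the diagram.
    /// Assumption: failure branches from `WaitingForProcessing` only.
    pub fn can_transition_to(self, next: SessionState) -> bool {
        use SessionState::*;
        matches!(
            (self, next),
            (Pending, Uploading)
                | (Uploading, WaitingForProcessing)
                | (WaitingForProcessing, Completed)
                | (WaitingForProcessing, FailedProcessing)
        )
    }
}
```

Encoding the allowed edges in one place means any other move, such as `Completed` back to `Uploading`, is rejected by construction instead of by scattered checks.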
+ +Session records live in the [session-state store](/design/filesystem/#stores-by-deployment-profile) with a 24-hour TTL and a per-owner index for listing. This split is intentional: the session store holds only volatile transfer state, so the hot path — offset increments and status transitions — never touches the durable Postgres asset row. (In the default Postgres-only profile, sessions live in an `upload_sessions` table with an `expires_at` column and a periodic sweep; in the high-concurrency profile, they live in Valkey under keys `upload:session:{id}` with atomic `HINCRBY`/`HSET` and native TTL.) Postgres's durable asset record is written exactly twice per upload regardless of profile: once at session creation (the pending row) and once at finalization (mark uploaded). A session that reaches its TTL before completing is garbage-collected — chunks deleted, pending asset row removed — and the client treats an expired session as gone and re-imports. (The client should retry when this happens, but give up after a bounded number of attempts.) + +#### Reliability & Integrity + +##### Server-Side Storage and Assembly + +Each chunk is written to disk as `{upload_id}_{n}.part`; the assembled blob is `{upload_id}.bin`. Because this is a hot path, the storage layer is aggressively optimized: + +- **Streaming writes.** Chunk bytes are streamed from the request body straight to disk; large transfers must never accumulate in hot memory. On Linux, the write path uses `io_uring`. +- **Reflink assembly.** Finalization concatenates chunks into the final blob using `FICLONERANGE` (copy-on-write reflink) on CoW filesystems such as Btrfs and XFS. The 4 KiB chunk alignment is precisely what allows each chunk to be reflinked at its destination offset; only the final (possibly unaligned) chunk needs a plain copy. Reflink turns assembly into a near-instant metadata operation instead of an O(file size) copy. On filesystems without reflink support, the code falls back to a sequential copy.
+- **Offloaded blocking work.** Chunk assembly and hashing run on a blocking thread pool, never on the async reactor. +- **Backpressure.** `max_cache_size` bounds the total in-flight upload bytes held on disk; `max_file_size` bounds any single blob. The configuration asserts `max_file_size < max_cache_size` and warns if fewer than ~10 concurrent maximum-size uploads would fit. The distinct task pools — network I/O, file I/O, and hashing — are sized and load-tested independently against realistic hardware limits. + +##### Finalization and Integrity + +When received bytes reach the declared size, the server finalizes: + +1. Session transitions to **WaitingForProcessing**. +2. Chunks are assembled into the final blob. +3. The server recomputes the [content hash](/design/cryptography/#primitives-inventory) over the assembled ciphertext on the blocking pool and compares it to the declared `hash`. +4. **On match** — the pending asset is marked uploaded inside a Postgres transaction and the session transitions to **Completed**. +5. **On mismatch** — the blob and the pending asset row are deleted, the session transitions to **FailedProcessing**, and a checksum-mismatch error is returned. A mismatch is always treated as corruption or tampering and is never silently retried server-side. + +The server verifies only the *ciphertext* hash — it has no other option. The client independently verifies the *plaintext* on download via the [STREAM construction](/design/cryptography/#stream-construction)'s per-chunk authentication tags, which detect truncation, reordering, and chunk deletion. The two checks are complementary: the server guarantees "the bytes I stored are the bytes you declared," and the AEAD guarantees "the plaintext I decrypted is authentic." + +##### Robustness + +- An upload is not expected to run to completion in a single connection. The server tolerates arbitrarily long pauses within the session TTL, and clients resume via `HEAD`. 
[Auto syncing](#auto-syncing) explicitly assumes interrupted transfers are normal. +- A chunk re-sent at an already-acknowledged offset is idempotent. A chunk at a stale offset receives `409` together with the authoritative offset so the client can re-align. +- Concurrent finalization attempts on a single session are guarded — a second attempt observes a non-`Pending`/`Uploading` status and returns a conflict rather than double-processing. +- Every critical step — session creation, each chunk, assembly, hash verification, finalization — is logged with the upload ID so an interrupted or failed upload can be reconstructed and recovered after the fact. + +#### Performance + +##### Adaptive Chunk Sizing + +The server suggests an initial chunk size by file-size tier — `< 10 MB` → 256 KiB, `< 100 MB` → 1 MiB, `≥ 100 MB` → 4 MiB. The client may then adapt *within a tier-bounded range* based on throughput measured over a sliding 30-second window: doubling the chunk size when sustained throughput is high (`> 5 MB/s`), halving it when low (`< 1 MB/s`), and always staying 4 KiB-aligned. The rationale is a direct trade-off — chunks that are too small waste round-trips, while chunks that are too large waste re-transmission on a flaky link and pin more memory per in-flight request. + +Adaptation is purely a client concern; the server only enforces alignment and bounds. The client must never let adaptation regress effective throughput — if a tier's range is mis-tuned, the conservative choice is the tier minimum. + +We deliberately do **not** expose per-blob upload *ordering* as a protocol concern. Concurrent sessions plus the OS and TCP stack settle ordering naturally; see [Upload Prioritization](#upload-prioritization) for the client-side heuristics that decide which assets to *start*. + +##### Upload Prioritization + +When many files are queued at once, the client decides which uploads to start first using the heuristics below.
+ +- **File Size:** Smaller files might be processed first to give a quicker sense of progress, or larger files might be prioritized if they are deemed more critical. + - While file size is a useful heuristic, for internal ordering, we should let the order files are uploaded be naturally determined by simultaneous uploads and the network conditions, which fall to the underlying file transfer protocol — the custom resumable-upload protocol described above, running as concurrent sessions over the OS and TCP stack (see [Adaptive Chunk Sizing](#adaptive-chunk-sizing)). +- **Last Modified Times:** Newer or recently modified files might be more relevant to the user. (Note this filesystem metadata may not be always reliable so some fallbacks may be needed. Last accessed time was also considered but relatime makes this heuristic relatively noisy.) +- **Directory Depth:** Files closer to the root of the specified paths might be processed first. + +Note that file **type/extension** is deliberately *not* a prioritization criterion — prioritizing purely by file type may result in anomalies. Instead we have exceptions for certain sidecar files (e.g. `.xmp` associated with an image, or `.wav` associated with a video file). + +#### Access Control + +##### Deduplication and Merge + +Because blobs are addressed by their [ciphertext content hash](/design/cryptography/#primitives-inventory), the protocol can avoid redundant transfers: + +- At session creation, the server checks for an asset with the same content hash already owned by the user. An exact duplicate that exists both locally and remotely is rejected up front — nothing is re-uploaded. The dedup check and the pending-row insert run inside a single PostgreSQL transaction (a `SELECT ... FOR UPDATE` followed by `INSERT ... ON CONFLICT`), so two concurrent uploaders cannot both observe "no existing row" and each insert their own — the TOCTOU race is closed at the database layer. 
+- [Import](#import) treats already-uploaded *local* assets as non-importable. But because encryption and hashing are deferred until upload, an asset may already exist remotely under a *different* ciphertext (for example, re-encrypted under a newer album key). Import still admits such an asset, and the upload then resolves to a **merge**: the server links the existing stored blob to the new asset and album reference rather than storing a second copy. The original blob's upload short-circuits, and only the new metadata blob is transferred. +- **Merge is strictly additive on the server.** A merge **never** deletes an existing blob or rewrites an existing manifest — it only adds a new reference. The blob's reference count goes up, never down, on merge. Reference removal happens only through an explicit `delete` lifecycle action signed by a current writer (see [Authorization](/design/authorization/)), and the underlying blob is hard-purged only after every reference is provably gone. + +These checks deduplicate at upload time. Byte-identical assets that still slip into a client library — for example through overlapping folder imports or a restore over an existing library — are collapsed separately by client-side [intra-library deduplication](/design/filesystem/#deduplication). + +##### Quota and Permissions + +- An upload is attributed to `upload_user_id` (the authenticated uploader) for storage-quota accounting, which is distinct from `owner_id` (the asset's owner). Uploading on behalf of a different owner requires a verified relationship and is permission-checked at session creation. +- Adding an asset to an album requires write-tier album access (`AMK_write`; see [Cryptography](/design/cryptography/)); the server validates album write permission before creating the session. +- Only the uploader may append chunks. The uploader or the owner may query (`HEAD`) or cancel (`DELETE`) a session. 
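The per-session permission rules above reduce to a small check. The sketch below is illustrative only: `SessionAction`, `SessionParties`, and `is_allowed` are hypothetical names, and user IDs are assumed to be plain `u64` values. It covers only who may touch an existing session, not the quota or album-write checks performed at session creation.

```rust
/// Hypothetical set of operations on an existing upload session.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionAction {
    AppendChunk, // PATCH /upload/{id}
    Query,       // HEAD /upload/{id}
    Cancel,      // DELETE /upload/{id}
}

/// The two identities attached to a session (assumed representation).
pub struct SessionParties {
    pub upload_user_id: u64,
    pub owner_id: u64,
}

/// Only the uploader may append; the uploader or the owner may query or cancel.
pub fn is_allowed(parties: &SessionParties, caller_id: u64, action: SessionAction) -> bool {
    let is_uploader = caller_id == parties.upload_user_id;
    let is_owner = caller_id == parties.owner_id;
    match action {
        SessionAction::AppendChunk => is_uploader,
        SessionAction::Query | SessionAction::Cancel => is_uploader || is_owner,
    }
}
```

For example, with `upload_user_id: 1` and `owner_id: 2`, caller `2` may cancel the session but may not append a chunk to it.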
+ +### Download + +Download is the inverse of upload, and rests on the same two foundations: blobs are **content-addressed by ciphertext hash**, and the server never holds a key, so it serves only opaque ciphertext. Where the upload path optimises for correctness under interruption, the download path optimises for **bandwidth and storage frugality** — a client fetches the smallest representation that satisfies the user's current intent, and nothing more. + +#### Discovering What Changed + +A client never polls assets individually. It holds a single opaque **sync cursor** and asks the server for everything that changed after it: + +| Method | Path | Purpose | +| ------ | -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- | +| `GET` | `/sync` | Returns a page of asset changes (created, metadata-updated, deleted) after `cursor`, with a `next_cursor`. The feed is monotonic and resumable. | +| `GET` | `/blob/{hash}` | Fetch a ciphertext blob by its content address. Supports HTTP `Range` for resumable and partial reads. | + +The `/sync` feed carries only the small encrypted **metadata blobs** and each asset's **blob manifest** — the content hashes of its original and derivative blobs — never original or derivative bytes. Discovering a thousand new assets costs a few hundred kilobytes. The client decrypts each metadata blob, learns the asset's dimensions, capture date, and LQIP, and only *then* decides what else, if anything, to fetch. A deleted or modified asset arrives as a tombstone or an updated metadata reference; the client reconciles local state against it (see [Synchronization Scope](#synchronization-scope)). + +**Sync feed validation.** Every entry in a `/sync` response carries a `protocol_version` (matching the album's pin) and a per-album monotonic `sync_seq`. 
The client refuses to apply an entry whose `protocol_version` is above its max known (per the [tightened Postel's Law](/design/principles/)) and refuses any page whose `sync_seq` regresses against what the client has already seen for that album — a regressing `sync_seq` indicates a malicious or buggy server attempting to rewind the client's view, and the client surfaces it rather than applying it. + +#### Stale-Revival Detection + +A malicious or buggy server, peer, or backup could submit an old-but-validly-signed manifest to resurrect an asset that the receiving device has tombstoned at a later state. The defense — owned by [Cryptography — Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications) — is the per-asset `prior_provenance_hash` chain. Two layers enforce it: + +- **Client.** Every device's local index stores a `latest_provenance_hash` per `asset_id`. When a sync entry, federation pull, peering artifact, or backup restore proposes a manifest whose `prior_provenance_hash` is **behind** that local value, the entry is **quarantined** (see [Threat Model — Quarantine Surfaces](/design/threat-model/#quarantine-surfaces)) and surfaced as "peer sent stale state." +- **Server (no-key).** The server stores the same `latest_provenance_hash` per asset in PostgreSQL and rejects any incoming non-`create` manifest whose `prior_provenance_hash` does not match. This is described in the [server-side validation invariants](/design/threat-model/#server-side-validation-invariants). + +A deleted asset cannot be silently resurrected, on either side, without the resurrection appearing as a quarantine surface to the user. + +#### Tiered, On-Demand Fetch + +Each asset has a ladder of representations, cheapest first: + +1. **LQIP** — embedded in the metadata blob (see [Thumbnails](/design/thumbnails/)); available the instant metadata syncs, at zero extra request. +2. **Thumbnail** — fetched when the asset scrolls into, or near, view in a grid. 
+3. **Preview** — a screen-resolution derivative, fetched when the asset is opened. +4. **Original** — fetched only on explicit demand: viewing at full fidelity, exporting, or sharing the original. + +The default policy follows the per-library setting in [Synchronization Scope](#synchronization-scope) — *metadata only*, *metadata + thumbnails*, or *metadata + thumbnails + original*. Anything above the configured tier is fetched lazily, on demand. The original is never fetched speculatively unless the device was its uploader, in which case it already holds the plaintext locally and downloads nothing. + +Because every blob is content-addressed, a fetch is skipped entirely when the blob is already in the local cache — the client looks up its cache by hash before issuing any request, so a representation shared between assets (an identical thumbnail, a merged original) is only ever fetched once. + +#### Resumption and Verification + +- Large originals are fetched with HTTP `Range` requests; an interrupted download resumes from the last persisted byte instead of restarting, mirroring the upload protocol's resumability. +- The client verifies integrity itself. Since the server can only attest to ciphertext, the client recomputes the [ciphertext content hash](/design/cryptography/#primitives-inventory) against the requested content address, then decrypts and relies on the [STREAM construction](/design/cryptography/#stream-construction)'s authentication tags to detect truncation, reordering, or chunk deletion. Any failure discards the blob and re-fetches it. + +#### Prefetch and Frugality + +- Prefetch is bounded and predictive — thumbnails for assets just beyond the viewport, the preview for the likely-next asset in a sequence — and is cancelled as soon as the user's focus moves. 
+- Prefetch and any above-tier fetch obey the same connection rules as [Auto Syncing](#auto-syncing): on a metered connection the client fetches only what the user explicitly opens, and defers the rest. +- Fetched-but-unpinned blobs are ordinary cache citizens, subject to [Space Recovery](/design/filesystem/#space-recovery); the client transparently re-fetches them on demand if they are evicted. + +### Auto Syncing + +On mobile clients, we support auto syncing, which ensures that new assets are backed up to the server (not to be confused with [encrypted backups](/design/backup-recovery/)) and that assets from other devices are loaded onto the device. + +#### Synchronization Criteria + +We are conservative about when we check whether synchronization is needed. To avoid acting on an outdated reconciliation, we reconcile the assets that require syncing (both uploads and downloads) and start the transfer immediately, continuing only while the criteria hold throughout the data transfer. If conditions change (e.g. the internet connection becomes metered), the criteria are re-evaluated and the transfer may be paused gracefully. The upload server does not expect the client to always complete its transfers (e.g., due to network conditions). + +Finally, the actual synchronization criteria are strict and scale with the reconciliation amount (i.e. total upload + download transfer): + +- **Small reconciliation** — a handful of new assets, or metadata-only deltas: synced proactively whenever the device has any non-metered connection. +- **Large reconciliation** — bulk uploads, or original-tier downloads: deferred until the device is connected to unmetered Wi-Fi. + +#### Platform Limitations + +We implement auto sync ONLY where we can guarantee it will behave appropriately under all scenarios. We explicitly do not implement it on platforms that do not expose all the APIs we need (e.g., detecting a metered connection), to avoid surprises. 
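The synchronization criteria above can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Connection` and `Reconciliation` types, the `classify` and `may_sync` functions, and especially the `LARGE_THRESHOLD_BYTES` cut-off are hypothetical, since the design does not fix a number for what counts as "large".

```rust
/// Connection states a platform would need to be able to report.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Connection {
    Offline,
    Metered,
    UnmeteredOther, // e.g. wired or otherwise unmetered, but not Wi-Fi
    UnmeteredWifi,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reconciliation {
    Small,
    Large,
}

/// Hypothetical cut-off between small and large reconciliations.
pub const LARGE_THRESHOLD_BYTES: u64 = 50 * 1024 * 1024;

/// Classifies a pending reconciliation by its total upload + download size.
/// Original-tier downloads are always treated as large.
pub fn classify(total_transfer_bytes: u64, has_original_tier_downloads: bool) -> Reconciliation {
    if has_original_tier_downloads || total_transfer_bytes > LARGE_THRESHOLD_BYTES {
        Reconciliation::Large
    } else {
        Reconciliation::Small
    }
}

/// Decides whether auto sync may run (or keep running) right now.
pub fn may_sync(conn: Connection, size: Reconciliation) -> bool {
    match (size, conn) {
        // Never auto sync while offline or on a metered connection.
        (_, Connection::Offline) | (_, Connection::Metered) => false,
        // Small reconciliations run on any non-metered connection.
        (Reconciliation::Small, _) => true,
        // Large reconciliations wait for unmetered Wi-Fi.
        (Reconciliation::Large, Connection::UnmeteredWifi) => true,
        (Reconciliation::Large, _) => false,
    }
}
```

Calling `may_sync` again whenever the connection state changes mid-transfer gives the "re-evaluate and pause gracefully" behaviour described in the criteria.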
+ +#### Notifications + +When the auto sync criteria have not been met for a prolonged period — **two weeks** specifically — the library falls silently out of date, which defeats the purpose of a backup. The client surfaces this rather than letting it pass unnoticed: + +- After two weeks without a completed sync, the user is notified that the library is behind and offered a one-tap **force sync now**, which proceeds regardless of the metered/Wi-Fi criteria with their explicit consent. +- The notification can be **snoozed** until a later date (e.g. another two weeks) or **disabled** outright. Snoozing only suppresses the warning; disabling opts out of the warning entirely and does not affect auto sync itself. + +### Synchronization Scope + +- Uploadable new content: We upload the source (i.e. original) asset as well as all associated metadata and derivatives. +- Modified/deleted content: We update the associated metadata. +- Fetch new content: Depending on setting, it either fetches *metadata only*, *metadata + thumbnails*, or *metadata + thumbnails + original* for all new assets. Unless original already exists locally (e.g., if device was the original uploader), the original is only fetched on demand (e.g. user explicitly tries to view original or share original with others). This is to save bandwidth and storage on client devices. Note that metadata includes LQIP which can be used as a preview before even thumbnails are fetched. diff --git a/capsule-docs/src/content/docs/design/metadata.md b/capsule-docs/src/content/docs/design/metadata.md new file mode 100644 index 0000000..14389cd --- /dev/null +++ b/capsule-docs/src/content/docs/design/metadata.md @@ -0,0 +1,156 @@ +--- +title: Metadata +description: How Capsule extracts and utilizes metadata from assets +--- + +## Design Philosophy + +All metadata processing in Capsule is handled by `capsule-core`, which is implemented in Rust and exposed to all languages via FFI. 
It handles the I/O natively and is generally opaque to minimize the FFI surface. + +This doc is the **single source of truth** for the CBOR sidecar schema. Per the [single-source-of-truth rule](/design/principles/#single-source-of-truth), other docs reference fields here by name and never re-declare them. + +## Metadata Capabilities + +We minimize the logic kept in this repository and lean on dependencies where useful. This is a rough breakdown (it may be out of date): + +- `capsule-core`: Extracts the filesystem metadata for verification and indexing. + +## Sidecar Schema v1 + +The CBOR sidecar is the client's canonical, plaintext-local-only metadata record (see [Filesystem — Client Filesystem](/design/filesystem/#client-filesystem)). It is **self-describing**: field 0 carries the schema version so any reader can detect a schema it does not implement *before* parsing the rest. Versioning the schema in-band is what prevents a faulty or old client from corrupting state with a partial parse (see [Threat Model — Schema Evolution](/design/threat-model/)). + +```rust +SidecarV1 { + sidecar_schema: u16, // FIELD 0 — readable before parsing the rest. Currently 1. 
+ crypto_suite_id: u16, // matches the asset's manifest; see Cryptography + uuid: UUIDv7, + hash: { algo: String, value: bytes }, // canonical plaintext hash + capture_timestamp: RFC3339, + import_timestamp: RFC3339, + content_type: String, // closed enum per protocol_version + dimensions: Option<{ width: u32, height: u32 }>, + + // collaborative metadata (see Collaborative Metadata below) + tags_user: OR_set<(tag: String, add_id)>, + tags_ai: OR_set<(tag: String, add_id, model_id: String, model_version: String)>, + caption_lww: Option<{ value: String, ts: RFC3339, by: device_id }>, + superseded_captions: Vec<{ value: String, written_by: device_id, ts: RFC3339 }>, // bounded ≤ 16 + rating_lww: Option<{ value: u8, ts: RFC3339, by: device_id }>, + + // identifiers (see Identifiers below; privacy-on-export rules apply) + camera_id: Option<{ model: String, serial: String }>, + device_id: UUIDv4, + session_id: UUIDv7, + + // geolocation (see Geolocation below) + gps: Option<{ lat: f64, lon: f64, source: GpsSource }>, + + // provenance binding + provenance_chain_hash: [u8; 32], // hash of the latest ProvenanceRecord for this asset + + // forward-compat + _unknown: Map, // unknown CBOR keys preserved verbatim, never executed + + // signature + signature: Hybrid(Ed25519, ML-DSA-65), // covers every byte above, including _unknown +} +``` + +### Schema Versioning Rules + +- `sidecar_schema` is **CBOR field 0 by deterministic key order** (RFC 8949 §4.2). A reader can determine the schema before allocating a parser for the rest. +- A client whose `max_known_sidecar_schema < this.sidecar_schema` **refuses to write** to that sidecar. Reading is allowed only in read-only mode if explicitly opted-in. This is the [refuse-by-default rule](/design/threat-model/) from the threat model — an old client cannot strip-and-resign a newer sidecar. +- The signature covers every byte including `_unknown`, so stripping unknown fields invalidates the signature and is detectable. 
+- A schema bump is a coordinated change; per [Versioning — Album Protocol Version Pinning](/design/versioning/#album-protocol-version-pinning), an album's pinned protocol version constrains which sidecar schemas may be written into it. + +### Add-id Binding + +`add_id` is the tuple `(device_id: UUIDv4, monotonic_counter: u64)`, where `monotonic_counter` is incremented per-device per-(asset, OR-set) pair. Every OR-set add carries an `add_id`; every OR-set remove targets a specific `add_id`. A remove that names an `add_id` the receiver has never observed an add for is **rejected**, not silently no-op — preventing the "remove an element you never added" attack noted in the [Threat Model](/design/threat-model/). + +## Identifiers + +The three identifying fields defined inside the sidecar schema are subject to the [Privacy on Export](#privacy-on-export) rules below when an asset crosses a trust boundary. + +- **Camera identifier (`camera_id`).** Model ID of the device plus a unique identifier for the specific device (e.g. serial number). Useful for grouping shots from the same physical camera across libraries. +- **Device identifier (`device_id`).** UUIDv4 generated on the original importing device. Useful for provenance. +- **Session ID (`session_id`).** Identifies the authenticated session in which the asset was imported. Defined in [Session Management](/design/authentication/#session-id). + +## Privacy on Export + +The identifiers above and several other metadata fields are **fingerprinting surface** if they leave the user's trust boundary unredacted: a camera serial uniquely links every photo to one physical device, and precise GPS reveals home addresses. When an asset crosses a boundary, Capsule strips these fields by default and only includes them on explicit opt-in. + +A boundary crossing is any of: + +- A **share link** is generated for a non-member of the album. +- An **external backup** is exported to media the user will hand off (e.g. 
cloud storage shared with someone else, a physical drive given to a friend). +- A **federated peer** outside the owning user's home server fetches the asset (see [Federation](/design/federation/)). + +When the boundary is crossed, the following fields are stripped from the exported metadata blob unless the user has explicitly opted in to retain them: + +| Field | Default on export | Opt-in retains | +| ------------------------------------------------------- | ----------------------------------------- | -------------- | +| Camera serial number | Stripped | Full value | +| Device identifier (UUIDv4) | Stripped | Full value | +| Session ID | Stripped | Full value | +| GPS coordinates | Truncated to city-level precision (~1 km) | Full precision | +| Personal contact tags (faces matched to a known person) | Stripped | Retained | + +Stripping happens at the moment of export — the encrypted sidecar inside the user's library is untouched, so the user does not lose the data locally. Retention opt-in is per-export, not a sticky account setting, to prevent foot-guns where a user opts in once and forgets. + +Capsule's *own* devices syncing the *same user's* library do **not** trigger this redaction — that is intra-trust, not a boundary crossing. + +## Collaborative Metadata + +User-editable metadata on a shared album — tags, captions, ratings — can be edited concurrently on different devices, including offline. To make these merges deterministic, such fields are modelled as CRDTs: + +- **Tags:** an OR-set (observed-remove set) with explicit [`add_id` binding](#add-id-binding), so a tag added on one device and removed on another converge predictably, and a remove that targets an unknown `add_id` is rejected rather than treated as a no-op. +- **Single-value fields** (`caption_lww`, `rating_lww`): last-writer-wins registers keyed by a signed timestamp and the writing `device_id` as the lexicographic tiebreaker. 
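A minimal sketch of the OR-set tag behaviour described above, assuming an in-memory representation. The `TagSet` type and its methods are hypothetical; `AddId` follows the `(device_id, monotonic_counter)` tuple from the schema but uses a plain string for the device id, and signatures and encryption are omitted.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// `add_id` as described above: (device_id, monotonic_counter).
pub type AddId = (String, u64);

#[derive(Debug, PartialEq, Eq)]
pub enum RemoveError {
    /// The remove names an add this replica has never observed.
    UnknownAddId,
}

#[derive(Default)]
pub struct TagSet {
    /// Every add observed so far, keyed by its add_id.
    adds: BTreeMap<AddId, String>,
    /// add_ids that have been removed (tombstones).
    removed: BTreeSet<AddId>,
}

impl TagSet {
    /// Records an add. Replaying the same add_id is a no-op.
    pub fn add(&mut self, id: AddId, tag: &str) {
        self.adds.entry(id).or_insert_with(|| tag.to_string());
    }

    /// Removes one specific add. Unknown add_ids are rejected rather than
    /// silently ignored.
    pub fn remove(&mut self, id: &AddId) -> Result<(), RemoveError> {
        if !self.adds.contains_key(id) {
            return Err(RemoveError::UnknownAddId);
        }
        self.removed.insert(id.clone());
        Ok(())
    }

    /// A tag is visible while at least one of its adds has not been removed.
    pub fn tags(&self) -> BTreeSet<&str> {
        self.adds
            .iter()
            .filter(|(id, _)| !self.removed.contains(*id))
            .map(|(_, tag)| tag.as_str())
            .collect()
    }
}
```

Because a remove targets one `add_id` rather than a tag string, a concurrent add of the same tag from another device survives the remove, which is the convergence property the OR-set is chosen for.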
+ +### Surfacing Concurrent Edits + +A plain LWW register loses one side of a tied edit silently — a real problem when two people caption the same photo from different devices within seconds. Capsule keeps the most recent value as authoritative *and* preserves the displaced ones: + +- The losing value of every concurrent caption edit lands in `superseded_captions`, capped at 16 entries (oldest evicted). Each entry carries who wrote it and when, so the UI can surface a "this caption replaced another" hint and let the user restore the earlier value. +- Ratings are unambiguous numerically; they do not need a superseded log. + +This converts a silent-data-loss damage vector (a buggy client clobbering another device's edit) into an explicit, recoverable surface. See [Threat Model — Forbidden Client Behaviors](/design/threat-model/) for the corresponding rule that clients must never strip `superseded_captions`. + +### How Operations Travel + +We encrypt the **operations**, not the resulting state. Merges are then commutative and associative, so order of arrival does not matter and a peer replaying a stale operation cannot corrupt current state. The operation log reconciles into the canonical CBOR sidecar, which remains the source of truth (see [Core Principles](/design/principles/) — recovery-first). + +Each operation carries the same `prior_provenance_hash` chain link as any [lifecycle action](/design/authorization/#asset-lifecycle), so a metadata-update is provenance-tracked exactly like a create or delete. + +Album *membership* is deliberately **not** a CRDT here — it is driven by MLS proposals and commits (see [Group Membership](/design/cryptography/#group-membership)), which already resolve concurrent changes. + +This LWW/OR-set approach is intentionally simpler than a full event-graph with state resolution: photo metadata does not need it, and the extra machinery would not be functionally justified. 
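To make the order-independence claim concrete, here is a small sketch of a caption register that keeps displaced values in a bounded log. It is an illustration only: `CaptionEdit`, `CaptionState`, and `apply` are hypothetical names, timestamps are integer milliseconds instead of RFC 3339 strings, signatures are omitted, and for simplicity every displaced value is logged rather than only concurrent ones.

```rust
/// One caption write: value, timestamp, and the writing device.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CaptionEdit {
    pub value: String,
    pub ts_millis: u64,
    pub by: String, // device_id, used as the lexicographic tiebreaker
}

/// Cap on the superseded log, as stated above.
pub const MAX_SUPERSEDED: usize = 16;

#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct CaptionState {
    pub current: Option<CaptionEdit>,
    /// Displaced values, oldest first, at most MAX_SUPERSEDED entries.
    pub superseded: Vec<CaptionEdit>,
}

impl CaptionState {
    /// Applies one edit. The result depends only on the set of edits seen,
    /// not on their arrival order, and replaying an edit changes nothing.
    pub fn apply(&mut self, edit: CaptionEdit) {
        // Gather every edit known so far, plus the incoming one.
        let mut all: Vec<CaptionEdit> = self.superseded.drain(..).collect();
        all.extend(self.current.take());
        if !all.contains(&edit) {
            all.push(edit);
        }
        // Order by (timestamp, device_id); the greatest pair wins.
        all.sort_by(|a, b| (a.ts_millis, &a.by).cmp(&(b.ts_millis, &b.by)));
        self.current = all.pop();
        // Evict the oldest entries beyond the cap.
        let excess = all.len().saturating_sub(MAX_SUPERSEDED);
        self.superseded = all.split_off(excess);
    }
}
```

Two replicas that receive the same edits in different orders end in the same state, and the losing caption stays available for the UI to offer as a restore.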
+ +## Tag Provenance and Namespacing + +User tags and AI-suggested tags live in **structurally separate OR-sets** (`tags_user` and `tags_ai` in the [sidecar schema](#sidecar-schema-v1)). The separation is structural, not policy: + +- An AI tag can never overwrite a user tag and vice versa — they are different fields, so the question does not arise. A hallucinating model cannot pollute user intent. +- Every `tags_ai` entry carries `model_id` and `model_version` (see [ML Models](/design/ml-models/)). When the canonical model for that slot changes, AI tags from the old model are flagged as stale; cross-model semantic comparison is forbidden (see [Threat Model — Client-Side Validation Invariants](/design/threat-model/)). +- A user can **promote** an AI tag — explicit user action copies the entry to `tags_user` (with a fresh user-scoped `add_id`) and may optionally remove it from `tags_ai`. Promotion is a signed lifecycle operation and is never automatic. +- A user can **dismiss** an AI tag — an OR-set remove on `tags_ai` keyed by the original `add_id`. + +The same dual-namespace structure applies to any future ML-derived metadata field that overlays a user-editable one (face labels, location guesses, etc.). The owner doc for the model is [ML Models](/design/ml-models/); the storage shape is owned here. + +## Geolocation + +Most modern cameras record geolocation data, almost universally in **WGS-84 (Earth Coordinates)**. However, maps in China (and possibly in other countries) use obfuscated coordinate systems, namely: + +- GCJ-02 (Mars Coordinates): The obfuscated coordinate system mandated by the Chinese government for national security. All authorized maps inside mainland China (AMap/Gaode, Tencent Maps, Apple Maps via AMap) use this. +- BD-09 (Baidu Coordinates): Baidu Maps takes GCJ-02 and applies a second layer of obfuscation. You only need to worry about this if you specifically use the Baidu Maps SDK. 
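For orientation, a WGS-84 to GCJ-02 translation can be sketched as below. This is not the `capsule-core` implementation: the function names are hypothetical, and the constants and polynomial terms are the ones found in widely circulated public GCJ-02 implementations, which is an assumption since no official specification is published. The bounding box is a coarse approximation of mainland China; coordinates outside it are returned unchanged.

```rust
use std::f64::consts::PI;

// Constants as used by common public GCJ-02 implementations (assumption).
const A: f64 = 6378245.0; // semi-major axis of the Krasovsky 1940 ellipsoid, metres
const EE: f64 = 0.00669342162296594323; // first eccentricity squared

/// Coarse bounding box check; outside it no offset is applied.
fn outside_china(lat: f64, lon: f64) -> bool {
    !(72.004..=137.8347).contains(&lon) || !(0.8293..=55.8271).contains(&lat)
}

fn transform_lat(x: f64, y: f64) -> f64 {
    let mut r = -100.0 + 2.0 * x + 3.0 * y + 0.2 * y * y + 0.1 * x * y + 0.2 * x.abs().sqrt();
    r += (20.0 * (6.0 * x * PI).sin() + 20.0 * (2.0 * x * PI).sin()) * 2.0 / 3.0;
    r += (20.0 * (y * PI).sin() + 40.0 * (y / 3.0 * PI).sin()) * 2.0 / 3.0;
    r += (160.0 * (y / 12.0 * PI).sin() + 320.0 * (y * PI / 30.0).sin()) * 2.0 / 3.0;
    r
}

fn transform_lon(x: f64, y: f64) -> f64 {
    let mut r = 300.0 + x + 2.0 * y + 0.1 * x * x + 0.1 * x * y + 0.1 * x.abs().sqrt();
    r += (20.0 * (6.0 * x * PI).sin() + 20.0 * (2.0 * x * PI).sin()) * 2.0 / 3.0;
    r += (20.0 * (x * PI).sin() + 40.0 * (x / 3.0 * PI).sin()) * 2.0 / 3.0;
    r += (150.0 * (x / 12.0 * PI).sin() + 300.0 * (x / 30.0 * PI).sin()) * 2.0 / 3.0;
    r
}

/// Converts a WGS-84 (lat, lon) pair in degrees to GCJ-02.
pub fn wgs84_to_gcj02(lat: f64, lon: f64) -> (f64, f64) {
    if outside_china(lat, lon) {
        return (lat, lon);
    }
    // Offsets are computed relative to a fixed reference point (35N, 105E).
    let (x, y) = (lon - 105.0, lat - 35.0);
    let rad_lat = lat.to_radians();
    let magic = 1.0 - EE * rad_lat.sin().powi(2);
    let sqrt_magic = magic.sqrt();
    // Convert the metre-scale offsets into degrees on the ellipsoid.
    let d_lat = transform_lat(x, y) * 180.0 / ((A * (1.0 - EE)) / (magic * sqrt_magic) * PI);
    let d_lon = transform_lon(x, y) * 180.0 / (A / sqrt_magic * rad_lat.cos() * PI);
    (lat + d_lat, lon + d_lon)
}
```

The resulting offset is typically a few hundred metres, which is why plotting raw WGS-84 points on a GCJ-02 basemap looks visibly shifted.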
+ +While annoying, we can translate WGS-84 coordinates into the obfuscated coordinates with a deterministic algorithm before plotting on maps. Capsule does this strictly on the client-side with the capability found in `capsule-core`. + +### Mapping Providers + +These are the recommended mapping providers for all scenarios: + +- All Apple devices: Apple Maps (uses AMap data in China so it works globally) +- Web clients in China: AMap (Gaode) JavaScript API +- Web clients outside of China: Google Maps JavaScript API +- All non-Apple devices in China: AMap/Gaode (Tencent Maps is also fine but AMap has better support for geolocation and POI search) +- All non-Apple devices outside China: Google Maps (this is the most robust and developer-friendly provider). diff --git a/capsule-docs/src/content/docs/design/ml-models.md b/capsule-docs/src/content/docs/design/ml-models.md new file mode 100644 index 0000000..1d551d4 --- /dev/null +++ b/capsule-docs/src/content/docs/design/ml-models.md @@ -0,0 +1,106 @@ +--- +title: ML Models and Algorithms +description: The model inventory and key algorithmic implementations behind Capsule's AI features +--- + +This is the reference companion to [AI/ML Integrations](/design/ai/): the +concrete model chosen for each task, and the key algorithms that combine them. + +**This doc is the canonical model inventory.** Per the [single-source-of-truth rule](/design/principles/#single-source-of-truth), every ML model identity Capsule uses is declared here and referenced from other docs by link. Swapping a model is a one-row edit in the table below. + +> **Status:** The table below is **provisional** pending experimentation and field testing on Capsule's target devices in 2026. The doc *structure* — one canonical row per task with an explicit `model_id`/`model_version` — is the stable contract; the specific row choices are subject to revision and individual rows may be marked WIP or alt as the inventory matures. 
+ +**E2EE constraint on embedding models.** Capsule's server never holds plaintext, so embeddings are generated client-side. Every device that ingests assets must therefore run the *same* embedding model — otherwise vectors aren't comparable across devices. The model size floor is set by the lowest-end device Capsule supports, not by what runs comfortably on a desktop. + +## Embedding Provenance + +Every embedding stored in Capsule — locally in the SQLite vector index, in an encrypted backup, or inside a [`DerivativeManifest`](/design/cryptography/#derivative-provenance) for an embedding-class derivative — carries the tuple `(model_id, model_version)` identifying which row of the table below produced it. Embeddings are not comparable across `(model_id, model_version)` pairs: the vector spaces are different. The invariant: + +- The vector index **refuses inserts** whose `model_id` is not the current canonical row for its task (the row marked `WIP (high priority)` or its successor). A buggy or new client uploading embeddings from an unrecognized model is rejected at the insert API, never silently mixed in. +- A model swap (a new row replacing an old one) increments `model_version` for that task. Old embeddings are **flagged as stale** and excluded from queries until they are regenerated from the originals. Cross-version semantic comparison is forbidden — see [Threat Model — Client-Side Validation Invariants](/design/threat-model/#client-side-validation-invariants). +- Regenerating embeddings after a model swap is a background task that walks the library and produces fresh embeddings at the new `model_version`. The old entries are removed only after the new ones are persisted (atomicity: per-asset replace, not a global truncate-and-rebuild). +- The mapping from `model_id` to a row in this table is what gives a swap its *single-doc-edit* property: changing the canonical model is a one-row edit here, the `model_id` string changes, and every downstream consumer follows. 
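The insert and staleness rules above can be illustrated with a small in-memory index. This is a sketch under stated assumptions, not the SQLite-backed implementation: `ModelKey`, `VectorIndex`, and their methods are hypothetical names, a single task is modelled, and similarity search itself is left out.

```rust
use std::collections::HashMap;

/// Identifies the model row that produced an embedding.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModelKey {
    pub model_id: String,
    pub model_version: String,
}

impl ModelKey {
    pub fn new(model_id: &str, model_version: &str) -> Self {
        Self { model_id: model_id.to_string(), model_version: model_version.to_string() }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum InsertError {
    /// The embedding was produced by a model that is not the canonical one.
    NonCanonicalModel,
}

pub struct VectorIndex {
    canonical: ModelKey,
    /// asset_id -> (producing model, embedding)
    entries: HashMap<String, (ModelKey, Vec<f32>)>,
}

impl VectorIndex {
    pub fn new(canonical: ModelKey) -> Self {
        Self { canonical, entries: HashMap::new() }
    }

    /// Refuses embeddings from any model other than the canonical one.
    /// An accepted insert replaces the previous entry for that asset only.
    pub fn insert(&mut self, asset_id: &str, model: ModelKey, embedding: Vec<f32>) -> Result<(), InsertError> {
        if model != self.canonical {
            return Err(InsertError::NonCanonicalModel);
        }
        self.entries.insert(asset_id.to_string(), (model, embedding));
        Ok(())
    }

    /// Switches the canonical model. Existing entries stay stored but
    /// become stale until they are regenerated.
    pub fn swap_model(&mut self, new_canonical: ModelKey) {
        self.canonical = new_canonical;
    }

    /// Asset ids eligible for queries: stale entries are excluded.
    pub fn queryable_assets(&self) -> Vec<&str> {
        let mut ids: Vec<&str> = self
            .entries
            .iter()
            .filter(|(_, (model, _))| *model == self.canonical)
            .map(|(id, _)| id.as_str())
            .collect();
        ids.sort_unstable();
        ids
    }
}
```

Stale entries are kept until a fresh embedding for the same asset is inserted, which mirrors the per-asset replace rule rather than a global truncate-and-rebuild.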
+ +This invariant lives in [Threat Model — § Damage Scenario Map](/design/threat-model/#damage-scenario--invariant-map) row #14 and is what defeats the "silent invalidation of the vector index" damage class identified in the audit. + +## Specific ML Tasks & Models + + + + + + +| Task | Category | Model(s) | Dataset(s) | Function | Implementation Status | +| --------------------------------- | ---------------- | ------------------------------------------------------------------------------------------- | --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------- | +| **Semantic Search** | Natural Language | **MobileCLIP-B** (ONNX, INT8) — canonical; quantized SigLIP-tiny as fallback[^semantic-alt] | | Generates global image embeddings for natural language search. Sized for the lowest-end device Capsule supports (see the E2EE constraint above). | WIP (high priority) | +| **Dense Tagging & OCR** | Dense Tagging | Florence-2 | | Unified vision-language model for bounding boxes, dense captions, and reading text. | +| **VLM / Image Chat** | Natural Language | Qwen2.5-VL or LLaVA-1.6 | | Quantized models for on-demand conversational queries about an image. | +| **Image Captioning** | Natural Language | BLIP-2 | | Generates a natural language description of the image content. | +| **Face Detection** | People | SCRFD | | Highly efficient face bounding box and landmark detection. | WIP (high priority) | +| **Face Recognition** | People | InsightFace (AdaFace) | | Generates face embeddings. AdaFace excels at handling low-quality/dark images. | WIP (high priority) | +| **Person Detection** | People | YOLOv10 | | Object detection for identifying "person" bounding boxes. | +| **Person Re-ID** | People | OSNet or TorReID | | Generates embeddings based on clothing and body shape when faces are hidden. 
| +| **Expression Analysis**           | People           | EmotioNet                                                                                   |                             | Detects facial action units to infer emotions.                                                                                                                                    | +| **Quality Scoring**               | People           | LIQE / TOPIQ                                                                                |                             | Blind image quality assessment for noise, blur, and lighting without a reference image.                                                                                           | +| **Object Detection**              | Scene            | **YOLOv10**[^objdet-alt]                                                                    |                             | Detects objects and background elements for dense tagging.                                                                                                                        | WIP (high priority)   | +| **Scene Classification**          | Scene            | ViT-L, ConvNeXt-L                                                                           | Places365, SUN397           | Classifies the overall scene (e.g., "beach", "wedding", "cityscape").                                                                                                             | +| **Landmark Detection**            | Scene            | DINOv2 + GeM pooling                                                                        | Google Landmarks v2         | Detects key landmarks (e.g., Eiffel Tower, Golden Gate Bridge) for geotagging.                                                                                                    | +| **Bird/plant Detection**          | Scene            | BioCLIP                                                                                     | iNaturalist 2021            | Identifies and classifies birds and plants within images.                                                                                                                         | +| **General Animal Detection**      | Scene            | YOLOv8 finetuned on Open Images Animals                                                     | Open Images Animals         | Detects common animals (dogs, cats, horses) for tagging and search.                                                                                                               | +| **OCR**                           | Text             | TrOCR                                                                                       | SynthText, IIIT-5K          | Extracts text from images, including handwriting and signage.                                                                                                                     | +| **Screenshot Detection**          | Scene            | Custom CNN classifier                                                                       |                             | Identifies screenshots to help with culling.                                                                                                                                      | +| **Voice Transcription**           | Audio            | **Distil-Whisper-large**[^asr-alt]                                                          |                             | Speech recognition for generating transcripts from video audio tracks. ~6× faster than Whisper-large-v3 at ~1% WER cost — the trade is the right one for on-device transcription. | +| **Aesthetic Scoring**             | Quality          | NIMA (EfficientNet head)                                                                    | AVA Dataset                 | Rates the aesthetic quality of images to help users find their best shots.                                                                                                        | +| **Blur detection**                | Quality          | Laplacian variance + CNN regressor                                                          | DefocusNet, CUHK            | Detects blurry images.                                                                                                                                                            | +| **Exposure Assessment**           | Quality          | Custom CNN regressor                                                                        | Custom                      | Evaluates the exposure level of images to ensure optimal lighting conditions.                                                                                                    
| +| **Noise Estimation**              | Quality          | Custom CNN regressor                                                                        | Custom                      | Estimates the noise level in images to help users identify and filter out noisy shots.                                                                                            | +| **Near-duplicate / burst**        | Similarity       | pHash/dHash + CNN                                                                           | Custom                      | Same moment, slightly different                                                                                                                                                   | +| **Semantic near-duplicate**       | Similarity       | Embeddings from the canonical Semantic Search row + ANN                                     | Custom                      | Same subject, different angle/day                                                                                                                                                 | +| **Best-shot selection**           | Similarity       | Quality models combined?                                                                    | Custom                      | Select sharpest/best-exposed from burst                                                                                                                                           | +| **Shot/scene boundary detection** | Video            | TransNet v2, PySceneDetect                                                                  | BBC Planet Earth, ClipShots | Segment video for thumbnail/highlights                                                                                                                                            | +| **Highlight extraction**          | Video            | Temporal attention + quality score                                                          | SumMe, TVSum                | Extract best moments from videos for highlights and thumbnails.                                                                                                                   | +| **Action/activity recognition**   | Video            | VideoMAE, TimeSformer                                                                       | Kinetics-700, ActivityNet   | Sports, cooking, playing, travel                                                                                                                                                  | +| **NSFW Detection**                | Categorization   | OpenCLIP or custom CNN                                                                      | NSFW datasets               | Detects explicit content to help users filter and manage sensitive media.                                                                                                         | +| **Violence / Graphic Content**    | Categorization   | ViT classifier                                                                              | Custom                      | Detects and flags sensitive content (e.g. in shared albums)                                                                                                                       | + +[^semantic-alt]: Considered and rejected: SigLIP-so400m (~400M params, impractical on the lowest-end mobile we support — the E2EE constraint forces every device to run the same model), full CLIP ViT-L/14 (similar size class), OpenCLIP ViT-G (much larger). MobileCLIP-B is the size sweet spot; quantized SigLIP-tiny stays as a fallback if MobileCLIP semantic quality is insufficient in field tests. +[^objdet-alt]: Considered and rejected for the *committed* slot: Grounding DINO (open-vocabulary; heavier; revisit if dense-tagging breadth becomes the bottleneck), RT-DETR (transformer-based; comparable accuracy, slower on mobile). YOLOv10 is the committed choice; alternatives may run as additional specialized passes later. 
+[^asr-alt]: Considered and rejected: Whisper-large-v3 (best accuracy but too slow on mobile for opportunistic background transcription), Whisper-medium (similar speed to Distil-Whisper-large but worse accuracy), faster-whisper CT2 ports (a runtime optimization layer; can be applied on top of Distil-Whisper). + +## Key Algorithmic Implementations + + + +### Video-as-Sparse-Photos Algorithm + +Processing every frame of a video through heavy ML models is computationally prohibitive. This algorithm treats video as a sparse collection of keyframes. + +1. **Cut Detection:** Use PySceneDetect (content-aware detection) to chunk the video into visually distinct scenes. +2. **Temporal Sampling:** Extract frames at the 10%, 50%, and 90% timestamps of each scene. +3. **Blur Rejection:** Calculate the variance of the Laplacian for each extracted frame: + + $$V = \text{var}(\nabla^2 I)$$ + + If $V$ is below a defined threshold, the frame is too blurry and is discarded. +4. **Audio Processing:** Run the canonical ASR model (see the **Voice Transcription** row above) concurrently to generate a timestamped transcript. +5. **Integration:** The surviving keyframes are pushed into the standard image-processing queue. Database records map the keyframe embeddings to the parent `video_id` and specific timestamp. + +### The Re-ID & Pseudo-Labeling Loop + +This algorithm identifies individuals even when they turn away from the camera during an event. + +1. **The Anchor Pass:** When an image contains a high-confidence frontal face, run InsightFace. If the embedding matches a known profile (e.g., "Bride"), record the bounding box. +2. **The Body Pass:** Run a standard object detector (YOLOv10) to find all "person" bounding boxes. Pass these crops through OSNet to get a 512-dimensional body embedding. +3. **The Linking Phase:** Calculate the Intersection over Union (IoU) of the Face bounding box and the Body bounding box. 
If $\text{IoU} > 0.7$, link the OSNet body embedding to the "Bride" profile for the duration of this specific album/event. +4. **Pseudo-Labeling:** When an image features a person facing away (no face detected), compare the OSNet body embedding against the temporary event-specific body embeddings using cosine similarity: + + $$\text{sim}(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \|\mathbf{v}\|}$$ + + If the similarity exceeds the threshold, tag the individual as the "Bride." + +### High-Dimensional Vector Search in Postgres + +Exact K-Nearest Neighbors (KNN) search is too slow for millions of rows, so to maintain high throughput in Postgres: + +1. Implement **HNSW (Hierarchical Navigable Small World)** indexes on the `pgvector` columns. +2. Use the inner product operator (`<#>`, which pgvector returns negated) for normalized embeddings, as it is computationally cheaper than calculating $L_2$ distance (`<->`) or cosine distance (`<=>`) at scale. diff --git a/capsule-docs/src/content/docs/design/organization.md b/capsule-docs/src/content/docs/design/organization.md new file mode 100644 index 0000000..160bb15 --- /dev/null +++ b/capsule-docs/src/content/docs/design/organization.md @@ -0,0 +1,56 @@ +--- +title: Asset Organization +description: Details on how assets are organized and grouped in Capsule +--- + +## Keywords + +- [Albums and Collections](#albums-and-collections): Organize your media into albums and collections for easy browsing and sharing. +- [Asset Stacking](#asset-stacking): Group related files (e.g., RAW+JPEG pairs, burst photos, video chapters) into a single "stack" to keep your library organized. + +## Albums and Collections + +## Asset Stacking + +In large media collections, it’s common for related files to belong together. Instead of cluttering your library with dozens of nearly identical files, Capsule "stacks" them into a single unit. 
+ +You’ve likely seen this in action before—think of how photo apps group RAW+JPG pairs or how video editors sync external audio with camera footage. Capsule uses a "best-effort" auto-detection system to identify these relationships and keep your workspace clean. + +**Stacking is metadata-only.** A stack edit modifies the `stack_membership` field of each member asset's sidecar and emits a `metadata-update` provenance record per affected asset. It **never** deletes, rewrites, or merges the underlying asset bytes — even a "best photo" choice within a burst is a pointer in metadata, not a destructive operation. A buggy or malicious stack edit therefore cannot lose original bytes. The full atomicity rule (stage all `.tmp` files, rename together, discard on any rename failure) lives in [Filesystem — Atomic Writes and Crash Recovery](/design/filesystem/#atomic-writes-and-crash-recovery) and [Threat Model — Atomicity Invariants](/design/threat-model/#atomicity-invariants). + +### Photography & Mobile Stacks + +* **RAW + JPEG Pairs:** The classic "prosumer" stack. We treat the uncompressed RAW file and the processed JPEG as one asset to keep your grid tidy. +* **Burst Stacks:** A sequence of high-speed stills (e.g., 10–30 fps). The app identifies a "Best Photo" and tucks the rest behind it. +* **Live Photos:** A JPEG or HEIC paired with a 1.5–3 second video clip, managed as a single interactive unit. +* **Portrait/Depth Stacks:** An image paired with its depth map. This allows you to adjust the bokeh (background blur) after the shot is taken. +* **Smart Selection:** AI-driven grouping of visually similar images taken within seconds of each other to reduce "clutter." + +### Technical & Creative Stacks + +* **Exposure Bracketing (HDR):** Multiple shots of the same scene at different exposure levels (e.g., -2, 0, +2 EV) to be merged into a single High Dynamic Range image. +* **Focus Stacks:** A series of shots with shifting focus points. 
Often used in macro photography to create "infinite" depth of field. +* **Pixel Shift Stacks:** Found in high-end mirrorless cameras. The sensor moves slightly to capture multiple shots, which are stacked for ultra-high resolution and perfect color. +* **Panorama (Stitched):** A sequence of horizontal or vertical shots intended to be merged into a single wide-field image. + +### Video & Audio Stacks + +* **Proxy/Optimized Stacks:** Pairs a heavy "Master" file (like 8K RAW) with a lightweight "Proxy" (like 1080p ProRes) for smoother editing performance. +* **Chaptered Video:** Action cameras (like GoPro) often split long recordings into 4GB chunks. We stack files like `GOPR001.mp4` and `GOPR002.mp4` so they appear as one continuous video. +* **Dual-System Audio:** Groups video files with high-quality external audio (WAV/AIFF) using timecode or waveform matching. + +## Recycling + +When you delete an asset, it defaults to trash (i.e. soft delete). On sync, new items in trash are essentially a metadata update rather than removal. A true "delete" operation is only performed when the user explicitly empties the trash, the asset has been in the trash for its full retention period, or the user requests immediate deletion. + +For consistency, deletion of assets is functionally similar to addition and modification of assets. See [Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications) and [Authorization — The Closed Action Set](/design/authorization/#the-closed-action-set). + +### Retention Window + +The trash retention window is **signed into the `delete` manifest at delete time** as the `retention_until` field — not server-configured at purge time. The default is 30 days; the user can extend it per delete or per album-policy. 
Because the retention is part of the signed manifest: + +- The server **cannot accelerate** a purge by changing a server-side config — the cryptographic floor on retention is the signed manifest's `retention_until`. A hard purge before that timestamp is rejected (the server's purge worker reads `retention_until` from the manifest, not from a local policy). +- The server **cannot delay** a purge beyond an order issued by a `trash-restore` or a signed shorter-retention re-issue — the user remains in control. +- A `trash-restore` action issued before `retention_until` recovers the asset, appends a new provenance record, and rewinds the local lifecycle state. The original delete manifest is **not removed** from the provenance chain — it remains as a record of "this was deleted on date X and restored on date Y." + +This addresses the damage scenario where a hostile server unilaterally accelerates a purge to delete an asset the user expected to be recoverable, as well as the scenario where a buggy server retains data past the user's chosen window. diff --git a/capsule-docs/src/content/docs/design/peering.md b/capsule-docs/src/content/docs/design/peering.md new file mode 100644 index 0000000..649f620 --- /dev/null +++ b/capsule-docs/src/content/docs/design/peering.md @@ -0,0 +1,175 @@ +--- +title: Peering +description: How Capsule implements peering for direct device-to-device communication and synchronization +--- + +Peering is **device-to-device** sync within a single user's own devices. It is +distinct from [Federation](/design/federation/), which is server-to-server +sharing across *different* users. + +Peering exists as an **accelerator, never a replacement** for normal +[server synchronization](/design/import-synchronization/#synchronization). 
It +earns its place in three situations: + +- **LAN-speed transfer.** Two of a user's devices on the same network can move + a freshly imported asset directly, instead of round-tripping every byte + through the server and the internet. +- **Offline operation.** When the server is unreachable, devices on a shared + LAN still converge. This satisfies the + [offline/online divide](/design/principles/) — peering works fully offline. +- **Best-effort opportunism.** If no peer is found, peering simply does nothing + and the device falls back to server sync. Nothing depends on it succeeding. + +## Peering Reuses, Not Reinvents + +Peering deliberately introduces **no new payload format and no new sync +engine** — the same discipline [Federation](/design/federation/#federation-reuses-existing-primitives) +applies. The unit of transfer is a delta-scoped +[backup artifact](/design/backup-recovery/#backup-artifact): a self-describing, +versioned, encrypted, content-addressed blob that already exists for +[Backup and Recovery](/design/backup-recovery/). + +The receiving device ingests that artifact through the **same restore path** it +would use for any backup. Peering therefore owns only two things of its own — a +LAN **discovery** mechanism and a **transport**. Everything else (what an asset +is, how it is encrypted, how it is verified, what "changed" means) is borrowed +from designs that already exist and are already audited. Fewer moving parts +means a smaller blast radius and far less code unique to peering. + +## Trust Model + +Federation assumes [a remote server is hostile](/design/federation/#threat-model). +Peering does not: both endpoints are the *same user's* devices, each holding a +hardware-bound DSK cross-signed into that user's +[device directory](/design/cryptography/#per-user-device-coordination). A peer +is accepted only after a mutual hybrid-signature check confirms both devices +chain to the same User IK. 
+ +Identity-trusted is **not** content-trusted, however. A device can still be +buggy, or compromised at the application layer above its hardware keys. So +peering keeps Federation's posture toward *data*: every received asset is +re-verified — its [ciphertext content hash](/design/cryptography/#primitives-inventory) +recomputed, its [STREAM tags](/design/cryptography/#stream-construction) checked, +its [asset manifest](/design/cryptography/#provenance-and-signed-manifest) +run through the single [`verify_asset`](/design/cryptography/#write-authorization) +chokepoint. The channel authenticates *who* you are talking to; it never +exempts *what* they send from validation. + +### Peer-Class Containment + +Even two of the same user's devices are separate failure-containment boundaries +([Threat Model — Damage Containment Layers](/design/threat-model/#damage-containment-layers)). +A buggy $v_k$ device cannot overwrite a $v_{k+1}$ device's state via a stale-but-valid +backup artifact, and a $v_{k+1}$ device's writes are not retroactively applied to +a $v_k$ device's view of an older album. Specifically: + +- Every received manifest is checked against the receiver's local + `latest_provenance_hash` for that asset (see [Applying Received Data](#applying-received-data)) + — a stale manifest is quarantined, not silently applied. +- Every received structure that announces a `sidecar_schema`, `crypto_suite_id`, + or `protocol_version` above the receiver's max known is rejected at decode — + the receiver refuses to interpret bytes it cannot validate. This is the + client-side counterpart of the [server-side schema lockdown](/design/threat-model/#schema-evolution-and-field-grammar). +- Device-directory revocations are honored immediately: a device that has been + removed from the user's directory cannot complete the TLS handshake (its + certificate no longer chains to a current IK signature), and any prior cached + state from that device is treated as suspect.
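The first two containment checks above can be sketched as a single classification step on the receiving device. This is a minimal illustration, not Capsule's implementation: the `Verdict`, `MaxKnown`, `ReceivedManifest`, and `classify` names are hypothetical, version fields are modeled as plain integers for simplicity, and an asset the receiver has never seen is assumed to be applicable.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Verdict(Enum):
    APPLY = "apply"
    QUARANTINE = "quarantine"  # stale manifest: kept and surfaced, never applied
    REJECT = "reject"          # announces a version the receiver cannot validate


@dataclass(frozen=True)
class MaxKnown:
    """Highest versions this receiver implements (hypothetical integer encoding)."""
    sidecar_schema: int
    crypto_suite_id: int
    protocol_version: int


@dataclass(frozen=True)
class ReceivedManifest:
    asset_id: str
    sidecar_schema: int
    crypto_suite_id: int
    protocol_version: int
    prior_provenance_hash: Optional[str]


def classify(manifest: ReceivedManifest,
             max_known: MaxKnown,
             latest_provenance_hash: Dict[str, str]) -> Verdict:
    # Refuse-by-default: never interpret a structure from a newer version.
    if (manifest.sidecar_schema > max_known.sidecar_schema
            or manifest.crypto_suite_id > max_known.crypto_suite_id
            or manifest.protocol_version > max_known.protocol_version):
        return Verdict.REJECT

    # Stale-revival check: the manifest must extend the receiver's current head.
    local_head = latest_provenance_hash.get(manifest.asset_id)
    if local_head is not None and manifest.prior_provenance_hash != local_head:
        return Verdict.QUARANTINE

    return Verdict.APPLY
```

The version check runs first on purpose: a structure from an unknown version is rejected before any of its other fields are interpreted, which mirrors the "rejected at decode" wording above.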
+ +## Discovery + +Discovery is the one genuinely new mechanism. Devices advertise a peering +service over **mDNS** on the local network and accept connections over **TCP**. + +Discovery is **LAN-only** — there is no relay, no internet-wide rendezvous. mDNS +broadcasts are visible to every host on the segment, so the advertisement must +not leak identity: a device advertises an **opaque, rotating service instance**, +not `user@server.tld` or a device name. Whether two advertisements belong to the +same user is established *inside* the encrypted channel (below), never from the +broadcast itself. + +If no peer answers, discovery fails silently and the device proceeds with +ordinary server sync. + +## Establishing the Channel + +A peer connection is HTTP over a **mutually authenticated TLS 1.3** channel. The +certificates presented are the **device keys themselves** — there is no CA. +Each side verifies that the other's device certificate carries a valid hybrid +signature chaining to the shared User IK, exactly as published in the +[device directory](/design/cryptography/#per-user-device-coordination). The +directory *is* the trust anchor; a device not in it cannot complete the +handshake. + +This doc covers sync between devices that are **already provisioned** — both +already hold the account master key. Bootstrapping a brand-new device (handing +it the master key for the first time) is **cross-device recovery** and is +specified in [Backup and Recovery](/design/backup-recovery/#recovery-mechanisms); +peering does not re-document it. + +## Determining the Delta + +Before building an artifact, the two devices must agree on what is missing. +Peering reuses the [sync cursor](/design/import-synchronization/#discovering-what-changed) +model rather than inventing a diff: each side offers its set of held +[ciphertext content addresses](/design/cryptography/#primitives-inventory) and its cursor, and the delta is the +complement. 
"What changed" is already defined by the `/sync` feed — peering +borrows that definition wholesale. + +## What Moves Over the Wire + +The transfer payload is a [backup artifact](/design/backup-recovery/#backup-artifact) +scoped to the delta — backup artifacts are explicitly *"constructed from a list +of assets, albums, and so on,"* so a delta-scoped one needs no special +construction path. + +Its contents honor the receiver's existing per-library +[Synchronization Scope](/design/import-synchronization/#synchronization-scope) +setting — there is no peering-specific knob: + +- **Always included:** the encrypted metadata blobs and the AMK versions needed + to decrypt the transferred assets. Without these the receiver cannot + interpret anything. +- **Per scope:** original and derivative blobs are included only up to the + receiver's configured tier (*metadata only* / *+ thumbnails* / + *+ original*). Tiers above the setting are fetched lazily later, just as with + server download. + +Because every blob is content-addressed, dedup is free: the receiver skips any +blob whose [content hash](/design/cryptography/#primitives-inventory) it already holds — the same lookup the +`/blob/{hash}` download path performs against its local cache. + +## Transfer Protocol + +Peering is **pull-only**, mirroring [Federation](/design/federation/#pull-only-federation): the device that is behind initiates the pull and applies the result only after it verifies. A peer that has new content may send a lightweight **notification hint** — "new content exists" — over a separate low-trust channel to prompt a pull sooner; that hint never feeds the validation pipeline directly and carries no authority. + +The artifact is fetched with HTTP `GET` and `Range` requests, which makes a transfer **resumable** across the flaky-by-nature LAN and **idempotent** — content-addressing turns a re-fetch of an already-held blob into a no-op. 
This is the same resumability the [upload](/design/import-synchronization/#protocol--mechanics) and [download](/design/import-synchronization/#resumption-and-verification) paths rely on. + +## Applying Received Data + +A received artifact is ingested through the **backup restore path** — peering adds no separate deserialization. Restore already re-verifies every blob's [ciphertext content hash](/design/cryptography/#primitives-inventory), checks [STREAM tags](/design/cryptography/#stream-construction) on decrypt, and runs each asset manifest through [`verify_asset`](/design/cryptography/#write-authorization). + +Additionally, every received manifest's `prior_provenance_hash` is checked against the receiver's local `latest_provenance_hash` for that asset (see [Import & Sync — Stale-Revival Detection](/design/import-synchronization/#stale-revival-detection)). A peering pull cannot resurrect an asset the local device has tombstoned at a later provenance step — even if the artifact was honestly produced from an older state of the sending device. The stale entry is **quarantined and surfaced** as "peer sent stale state." + +Failures follow Federation's [soft-fail semantics](/design/federation/#soft-fail-semantics): an asset that fails verification is **quarantined and surfaced** in the [provenance/audit trail](/design/cryptography/#provenance-of-library-modifications), never silently dropped and never silently accepted — so a bug can be told apart from an attack after the fact. + +## Reconciliation with the Server + +Peering does not fork a device's state away from the server. 
A peering-received asset arrives with its signed manifest intact, so when the server later sees the same asset — uploaded by whichever device the [upload policy](/design/import-synchronization/#synchronization-scope) assigns — it resolves through the existing [deduplication and merge](/design/import-synchronization/#deduplication-and-merge) path on the [content hash](/design/cryptography/#primitives-inventory). A device never re-uploads a blob the server already holds, and the two devices remain convergent with the server's view. + +## Versioning + +Peering has two independently versioned surfaces, both checked **once, up front**, crashing early on mismatch per [Principles](/design/principles/) and the universal [protocol handshake](/design/threat-model/#protocol-and-capability-negotiation): + +- The peering **transport protocol** — date-based (`YYYY-MM-DD`), exchanged via `X-Capsule-Protocol` at channel establishment. Mismatch terminates the TLS connection **before any payload byte is sent** — `426 Upgrade Required` in the channel's framing layer. There is no degraded-mode fallback; peering simply fails and the device proceeds to ordinary server sync. +- The **artifact format** — versioned by [Backup and Recovery](/design/backup-recovery/#backup-artifact), so a newer device can still ingest an artifact built by an older one. The artifact's `crypto_suite_id` and album `protocol_version` are validated against the receiver's max known on ingest; a forward-jumping value is rejected (refuse-by-default), never best-effort-parsed. + +These two surfaces are independent: a device with up-to-date transport protocol may still receive an artifact format it does not implement (and vice versa). Both checks must pass before any bytes are applied to local state. + +## Robustness + +Peering's failure posture falls out of the designs it reuses: + +- **Interruption.** `Range`-based transfers resume; nothing is re-sent unnecessarily. 
+- **Peer disappears.** A vanished peer is indistinguishable from "no peer found" — the device falls back to server sync. Peering is best-effort. +- **Offline.** With no server reachable, devices on a shared LAN still converge; the feature works solely offline. +- **No order trust.** Content-addressed, immutable blobs and signed manifests mean a peer cannot influence state by reordering a transfer — the same guarantee Federation states in [Reconstructing State Without Trusting Peers](/design/federation/#reconstructing-state-without-trusting-peers). diff --git a/capsule-docs/src/content/docs/design/principles.md b/capsule-docs/src/content/docs/design/principles.md new file mode 100644 index 0000000..57bb795 --- /dev/null +++ b/capsule-docs/src/content/docs/design/principles.md @@ -0,0 +1,62 @@ +--- +title: Core Principles +description: The core principles that guide the design and development of Capsule +--- + +These principles apply universally to all components of Capsule from clients to server. + +Determinism and idempotent processes. Raw and original data is the source of truth. +All data is processed aligned to 4KiB (matches memory and disks). Open item: verify that no edge case requires a smaller or larger multiple. +Forward and backwards compatibility: old clients ignore new fields and new clients tolerate missing ones gracefully. + +Data integrity: We can NEVER delete data unexpectedly. We act under strict scenarios and crash early otherwise. We implement multiple layers of safeguards to avoid current and future bugs. We trust that data on the server is safe (and on robust hardware) and treat data on clients as potentially lost. +Treat most data as ephemeral. If it wasn’t original data, it can be rebuilt. +Encryption, security, and isolation: Keep sensitive code that requires auditing separate from the storage of data. Encrypt metadata as well as data.
Compartmentalize every boundary as a failure-containment boundary — per-album, per-peer, per-event, per-user, per-version — so a bug or compromise on one side of a boundary cannot cross it. +Divide between offline and online functionalities: a feature should work either solely online or solely offline. It should not exhibit different behaviours depending on resource connectivity. This simplifies business logic and reduces the risk of state shifts. + +**Recovery-First**: The filesystem must be reconstructible from partial corruption. No database is required to interpret critical data — sidecar files are the canonical metadata store; the database is a rebuildable query cache. + +**Self-Describing**: Each media file is paired with a CBOR sidecar containing all user-editable and stable metadata. Files are independently interpretable without a running database. + +**Atomic Writes**: Use temp-file + rename throughout. Direct overwrites risk corruption on power loss. + +**Postel's Law**: Liberal in what we accept *within a known schema version* — unknown sidecar fields are preserved verbatim and missing optional fields are tolerated. **Cross-version is closed**: a structure announcing a schema version (`sidecar_schema`, `crypto_suite_id`, `protocol_version`) above the receiver's max known is rejected, never best-effort-parsed. The asymmetry is what prevents a faulty or new client from silently corrupting state — see [Threat Model — Schema Evolution and Field Grammar](/design/threat-model/#schema-evolution-and-field-grammar). + +## Single Source of Truth + +Every primitive, construction, format, or component identity Capsule depends on is **declared in exactly one design doc**. Other docs reference the declaration by anchor; they never restate the choice. The goal is that swapping a primitive (a hash, a model, a container format) is a single-doc edit, not a 10-doc cascade that silently leaves inconsistencies.
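As an aside on the **Atomic Writes** principle stated earlier in this doc, the temp-file + rename pattern can be sketched as follows. This is a minimal single-file illustration under stated assumptions: `atomic_write` is a hypothetical helper name, the temp file is created in the destination directory so the rename stays on one filesystem, and the directory-level `fsync` that a production implementation on POSIX would likely add is omitted.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data to path so readers see either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # Stage in the same directory: a rename across filesystems is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # atomically swaps the new file into place
    except BaseException:
        # Discard the staged file on any failure; the original stays untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A crash before `os.replace` leaves only a stray `.tmp` file next to an intact original, which is the property the principle relies on.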
+ +The owner docs are: + +| Domain | Owner doc | +| ----------------------------------------------------------- | ------------------------------------------------------------- | +| All cryptographic primitives + constructions | [Cryptography](/design/cryptography/#primitives-inventory) | +| ML model identities | [ML Models and Algorithms](/design/ml-models/) | +| LQIP scheme + thumbnail/preview formats | [Thumbnails and Previews](/design/thumbnails/) | +| Server storage stack + topology | [Filesystem](/design/filesystem/) | +| Session/access tokens + auth flow | [Authentication](/design/authentication/) | +| Backup artifact container + escrow | [Backup and Recovery](/design/backup-recovery/) | +| CRDT scheme, identifiers, geolocation | [Metadata](/design/metadata/) | +| Upload/download protocol semantics | [Import and Synchronization](/design/import-synchronization/) | +| Federation trust model, capability tokens, soft-fail policy | [Federation](/design/federation/) | +| LAN discovery + peer channel | [Peering](/design/peering/) | +| Album protocol version pinning | [Versioning](/design/versioning/) | +| Stacking taxonomy + trash semantics | [Asset Organization](/design/organization/) | +| Lifecycle action set | [Authorization](/design/authorization/) | +| Damage containment, client-class taxonomy, server-side validation duties | [Threat Model](/design/threat-model/) | + +**Permitted secondary mentions.** Mechanism-explanatory phrasing inside a non-owner doc is fine — for example, "STREAM tags catch chunk reordering" inside [Peering](/design/peering/) is explaining a *behavior*, not declaring a *choice*. What the rule forbids is restating the choice itself ("we use SHA-256") outside the owner doc. When in doubt, link. + +## Damage Containment + +A faulty, malicious, or version-mismatched client must not be able to inflict irreparable damage on user data. 
The principles above (data integrity, atomic writes, recovery-first, self-describing, Postel's Law, encryption + compartmentalization) name the *posture*; the [Threat Model](/design/threat-model/) names the *defenses*. + +In particular, the threat model owns: + +- The **client class taxonomy** (honest, faulty, malicious, old, new) — how each is authenticated and what stops each from doing harm. +- The **damage scenario → invariant map** — for every concrete attack or bug class, the single owner doc that defeats it. +- **Server-side validation invariants** — the refuse-by-default structural checks a key-less server runs on every write. +- **Protocol and capability negotiation** — the universal fail-closed handshake that rejects version mismatches before any state is written. +- **Idempotency, atomicity, and quarantine** rules that span owner docs. + +Each owner doc grows a short section pointing into the relevant threat-model section, but the cross-cutting statements live there. Principles continues to own the universal *posture*; threat model owns the universal *defenses*. 
diff --git a/capsule-docs/src/content/docs/design/search.md b/capsule-docs/src/content/docs/design/search.md deleted file mode 100644 index d1e8159..0000000 --- a/capsule-docs/src/content/docs/design/search.md +++ /dev/null @@ -1,6 +0,0 @@ ---- -title: Search -description: What search features Capsule offers ---- - -TODO diff --git a/capsule-docs/src/content/docs/design/threat-model.md b/capsule-docs/src/content/docs/design/threat-model.md new file mode 100644 index 0000000..1366ee3 --- /dev/null +++ b/capsule-docs/src/content/docs/design/threat-model.md @@ -0,0 +1,313 @@ +--- +title: Threat Model +description: How Capsule contains damage from faulty, malicious, or version-mismatched clients +--- + +This doc catalogues the ways a client can damage user data, the invariant in each owner doc that defeats each scenario, and the universal rules that bind them — protocol negotiation, server-side validation duties, idempotency, atomicity, and provenance immutability. + +It is **not** a primitives doc. Every primitive Capsule uses is declared in its [owner doc](/design/principles/#single-source-of-truth); this doc references those declarations rather than re-stating them. Where a specific invariant lives, the relevant owner doc enforces it; where a *defense* spans multiple docs, the canonical statement lives here. + +## Purpose and Scope + +E2EE shifts most of the trust to the client. The server holds no keys; clients write the canonical state. That makes the question "what damage can a client cause?" load-bearing for the design — a single buggy implementation, a hostile keyholder inside an album, a stranded old build, or a too-new prototype all have to fail safely. 
+ +A faulty, malicious, or version-mismatched client must not be able to cause **irreparable** damage (loss of original bytes, loss of audit trail, undetected silent overwrite of user intent) and should not be able to cause more than **transient** damage (a quarantined asset surfaces to the user; a rejected write returns a clear error; a divergence is detected and reconciled). The recovery paths in [Cryptography — Failure Modes and Recovery](/design/cryptography/#failure-modes-and-recovery) cover key loss; this doc covers the *write-path* harm a wrong-but-signed client can attempt. + +## Client Class Taxonomy + +Every client request can be classified by one of these models. The defenses listed below apply to **all** of them — none of them are trusted to enforce their own correctness: + +| Class | Description | What authenticates them | What stops them | +| ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **Honest** | Conforming implementation, correct keys, correct version. | Session token + access token + device DSK + epoch write-tier signature. | Nothing to stop. This is the baseline. | +| **Faulty** | Conforming intent, buggy implementation. Writes structurally invalid or semantically wrong manifests under real keys. | Same as honest — the keys are correct. | Server-side [structural validation](#server-side-validation-invariants) + client-side [`verify_asset`](/design/cryptography/#write-authorization) chokepoint + quarantine surfaces. | +| **Malicious** | Adversary in possession of a current device's DSK and the album's epoch write-tier key. 
Writes deliberately malformed or destructive operations. | Same as honest — the keys are real, because the adversary owns them. | Provenance chain immutability + soft-delete window + per-album/per-event compartmentalization + audit trail for after-the-fact attribution. | +| **Old** | A signed-in client that predates a feature, schema, or suite the server now considers minimum. Cannot produce structurally valid writes for albums pinned above its version. | Authenticated, but `X-Capsule-Protocol` is below the server's accepted range. | [Protocol handshake](#protocol-and-capability-negotiation) rejects writes with `426 Upgrade Required` before any state is written. | +| **New** | A prototype or staging build that writes a `protocol_version`/`crypto_suite_id`/`sidecar_schema` ahead of what the receiver knows. | Authenticated, but the version is higher than the receiver's max known. | Receiver's refuse-by-default rule on unknown enum values, unknown schemas, and forward-jumping protocol versions; closed schema evolution boundary (see below). | + +The deliberate choice in the matrix above: a *malicious* client with real keys is the hardest to stop, because confidentiality and authentication don't help when the adversary already holds the keys. Capsule's response is to ensure such an adversary can do nothing **silently** — every write produces a signed provenance record, soft-delete is the default, and history is append-only. The audit trail is the recovery surface. 
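To make the append-only claim concrete, here is a minimal sketch of a hash-chained history in which each record names its predecessor's hash. It is an illustration only: signatures are omitted, `hashlib.sha256` over canonical JSON stands in for whatever hash and encoding the owner doc declares, and the `record_hash`, `append`, and `first_broken_link` helpers and the `action`/`payload` field names are hypothetical. Only `prior_provenance_hash` and the action names come from the design docs.

```python
import hashlib
import json
from typing import List, Optional


def record_hash(record: dict) -> str:
    # Canonical encoding (sorted keys, no whitespace) so the hash is stable.
    encoded = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()


def append(chain: List[dict], action: str, payload: dict) -> dict:
    """Append a record that commits to the current head of the chain."""
    prior = record_hash(chain[-1]) if chain else None
    record = {"action": action, "payload": payload, "prior_provenance_hash": prior}
    chain.append(record)
    return record


def first_broken_link(chain: List[dict]) -> Optional[int]:
    """Return the index of the first record whose predecessor link fails, if any."""
    if chain and chain[0]["prior_provenance_hash"] is not None:
        return 0
    for i in range(1, len(chain)):
        if chain[i]["prior_provenance_hash"] != record_hash(chain[i - 1]):
            return i
    return None


history: List[dict] = []
append(history, "metadata-update", {"caption": "first light"})
append(history, "delete", {"retention_until": "2026-07-01"})
append(history, "trash-restore", {})
```

Editing any earlier record changes its hash, so the successor's `prior_provenance_hash` no longer matches and `first_broken_link` reports the successor's index. In the real design each record is also signed, so repairing the link would require forging the signature as well.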
+ +## Damage Containment Layers + +Restating the boundary hierarchy from [Core Principles](/design/principles/) as concentric containment shells, with the owner doc that enforces each: + +| Shell | Boundary | Owner doc | +| ------------------------- | ---------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------- | +| **Per-version** | Album protocol pinning isolates a buggy v_k from v_{k-1} albums. | [Versioning](/design/versioning/#album-protocol-version-pinning) | +| **Per-album** | MLS group + per-epoch AMK + per-epoch write-tier key. | [Cryptography — Group Membership](/design/cryptography/#group-membership) | +| **Per-event** (manifest) | Each lifecycle action is its own signed, chained record. | [Cryptography — Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications) | +| **Per-user** | Owner Group Key, sponsored-account isolation. | [Cryptography — Owner Group Keys](/design/cryptography/#owner-group-keys-ogks) | +| **Per-peer** (federation) | Capability tokens, error budgets, quarantine for new peers. | [Federation](/design/federation/) | +| **Per-device** (peering) | Device directory enforced via the TLS handshake. | [Peering — Establishing the Channel](/design/peering/#establishing-the-channel) | + +A bug or compromise on one side of any shell cannot cross it. + +## Damage Scenario → Invariant Map + +The lookup table for "what damage X is prevented by which invariant Y in which doc Z." Each row names a concrete vector found during the audit and the single owner-doc anchor that defeats it. 
+ +| # | Damage scenario | Defense | Owner doc | +| --- | ------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| 1 | Old client writes a sidecar after stripping unknown fields | Sidecar signature covers `_unknown`; old client refuses to write when `sidecar_schema` > its max known | [Metadata — Schema Versioning Rules](/design/metadata/#schema-versioning-rules) | +| 2 | Faulty client uploads bytes that don't match the declared content type | Server's `content_type` allow-list per protocol version (no-key check) + receiving client decoder sandbox | [Threat Model §5](#server-side-validation-invariants), [Clients — Sandboxed Decoder](/design/clients/#sandboxed-decoder) | +| 3 | Buggy client uploads chunk with wrong offset and re-tries | Idempotency tuple `(upload_id, offset, chunk_hash)`; duplicate at offset with different hash → reject | [Import & Sync — Upload Protocol](/design/import-synchronization/#upload-protocol) | +| 4 | Hostile peer sends an old-but-validly-signed manifest to revive a deleted asset | `prior_provenance_hash` chain advance check on both client and server | [Cryptography — Provenance](/design/cryptography/#provenance-of-library-modifications), [§ Server-Side Validation Invariants](#server-side-validation-invariants) | +| 5 | Malicious client re-signs an existing manifest under a weaker `crypto_suite_id` | Signatures cover `crypto_suite_id` and `protocol_version` | [Cryptography — Write Authorization](/design/cryptography/#write-authorization) | +| 6 | Two devices concurrently caption the same photo | Caption LWW + `superseded_captions` array surfaces the loser | [Metadata — Surfacing 
Concurrent Edits](/design/metadata/#surfacing-concurrent-edits) | +| 7 | Client issues an OR-set remove for an element it never observed an add for | Add-id binding: removes target a specific `add_id`; unknown `add_id` is rejected | [Metadata — Add-id Binding](/design/metadata/#add-id-binding) | +| 8 | Buggy client overwrites a good thumbnail with a corrupt one | Every derivative carries a signed `DerivativeManifest` on its own chain; overwrite is a `derivative-replace` lifecycle action | [Cryptography — Derivative Provenance](/design/cryptography/#derivative-provenance) | +| 9 | A client declares `timestamp = 2099-01-01` to distort the audit | Server rejects timestamp outside ±30 days of server clock at accept | [Cryptography — Write Authorization](/design/cryptography/#write-authorization) | +| 10 | Server-side TOCTOU on blob dedup creates a duplicate | Dedup-check and pending-row insert are atomic on a single Postgres transaction | [Filesystem — Content-Addressing and Deduplication](/design/filesystem/#content-addressing-and-deduplication) | +| 11 | A faulty client uploads bytes that exceed its declared size | Server bounds cumulative received at every chunk, not only at finalization | [Import & Sync — Chunk rules](/design/import-synchronization/#upload-protocol) | +| 12 | A new client writes a manifest with a `crypto_suite_id` the server does not recognize | Refuse-by-default at handshake: 400 before any session is created | [§ Protocol and Capability Negotiation](#protocol-and-capability-negotiation) | +| 13 | A federated peer floods the rejected-hash table to exhaust memory | Per-peer quota; bounded LRU memory | [Federation — Soft-Fail Semantics](/design/federation/#soft-fail-semantics) | +| 14 | A model swap silently invalidates the AI tag namespace | Every `tags_ai` entry carries `model_id`+`model_version`; cross-model comparison is forbidden | [Metadata — Tag Provenance and Namespacing](/design/metadata/#tag-provenance-and-namespacing) | +| 15 | A leaked 
session token revokes all of a user's other sessions to lock them out | `revoke_all_sessions` requires master-key proof, not session auth | [Authentication — Explicit revocation](/design/authentication/#explicit-revocation) | +| 16 | An attacker holding every current key tries to rewrite the asset's history | Provenance chain references each predecessor's hash; rewriting any past record requires forging an earlier (possibly retired) device's hybrid signature | [Cryptography — Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications) | +| 17 | A client picks a random `amk_version` to skip MLS | Server's no-key check: `amk_version` must be monotonic per album and known to the server | [§ Server-Side Validation Invariants](#server-side-validation-invariants) | +| 18 | A v_old client tries to write into an album that has been upgraded to v_new | Album pinning + upgrade ceremony quiescence: server returns `409` for writes carrying a stale `intent_id` | [Versioning — Album Upgrade Ceremony](/design/versioning/#album-upgrade-ceremony) | +| 19 | A malformed CBOR sidecar lands on disk after a crash mid-write | Malformed sidecar → quarantined to `.library/quarantine/`; never silent-skipped | [Filesystem — Repair](/design/filesystem/#repair) | +| 20 | A federation pull returns a manifest claiming a device that's not in the user's directory | Server's no-key check: `created_by_device` must be in the user's published device directory | [§ Server-Side Validation Invariants](#server-side-validation-invariants) | +| 21 | A buggy client uploads a metadata blob with a hand-crafted wire format | Metadata blob wire format is byte-exact; mismatched envelope rejected at decode | [Cryptography — Metadata Blob Wire Format](/design/cryptography/#metadata-blob-wire-format) | +| 22 | A retry of a delete manifest decrements blob refcount twice | Manifest idempotency keyed by `prior_provenance_hash`: a duplicate manifest is a no-op | [§ Idempotency 
Invariants](#idempotency-invariants) | +| 23 | A backup restore from 6 months ago silently overwrites current state | Restore-as-chain-fork: every restored manifest with a stale `prior_provenance_hash` is quarantined and surfaced for explicit merge | [Backup & Recovery — Backup Verification](/design/backup-recovery/#backup-verification) ([open question](#open-questions)) | +| 24 | A new device claims its key is older than the account itself | Device entry in the device directory is signed by the IK and carries `added_at`; a server rejects an upload from a device whose `added_at` postdates the manifest | [Cryptography — Device Keys](/design/cryptography/#device-keys), [§ Server-Side Validation Invariants](#server-side-validation-invariants) | +| 25 | A peer floods notifications to make Capsule pull garbage | Notifications are advisory; pull is on Capsule's schedule and goes through full validation | [Federation — Pull-Only Federation](/design/federation/#pull-only-federation) | +| 26 | A federated server's TLS endpoint silently changes its public key | Servers cache each other's keys; rotation requires a notary endpoint co-sign | [Federation — Server Identity and Key Rotation](/design/federation/#server-identity-and-key-rotation) | +| 27 | A buggy client writes a stack edit that updates one member's sidecar and not the others | Stack edits are bundle-atomic: all `.tmp` files staged first, all renamed together; any rename failure discards the bundle | [Filesystem — Atomic Writes and Crash Recovery](/design/filesystem/#atomic-writes-and-crash-recovery) | +| 28 | A federated peer serves a stale capability token after revocation | Capability TTL ≤ 24h + published revocation list polled ≤ 15 min | [Federation — Federation Capabilities](/design/federation/#federation-capabilities) | +| 29 | A faulty client uploads embeddings derived from a model the receiver does not run | Vector index refuses inserts whose `model_id` is unknown | [ML Models — Embedding 
Provenance](/design/ml-models/#embedding-provenance) | +| 30 | A client tries to write directly to a server-derived field (e.g. computed ciphertext hash) | Server recomputes ciphertext hash at finalization and rejects mismatch | [Import & Sync — Finalization and Integrity](/design/import-synchronization/#finalization-and-integrity) | + +When a scenario surfaces during implementation that does not match any of the above, the rule is: add a row here, then declare the defense in exactly one owner doc. Never restate a defense in multiple docs. + +## Server-Side Validation Invariants + +The server holds no keys — it cannot verify any signature against a key it owns. But it **does** validate the *structure* of every write before persisting state. These checks are refuse-by-default and intentionally exhaustive; a buggy server that skips one of them silently widens the blast radius for the entire client class taxonomy above. + +This list is the canonical statement; [Filesystem](/design/filesystem/), [Import & Synchronization](/design/import-synchronization/), [Federation](/design/federation/), [Authorization](/design/authorization/), and [Authentication](/design/authentication/) reference it without restating. + +### On `POST /upload` (session creation) + +1. `X-Capsule-Protocol` is within the server's `[Min, Max]` range. Otherwise `426 Upgrade Required`, no session created. +2. `crypto_suite_id` is a row of the [Primitives Inventory](/design/cryptography/#primitives-inventory). Otherwise `400`. +3. `hash.algo` matches the algorithm declared by `crypto_suite_id`. Otherwise `400`. +4. `size` ∈ (0, `max_file_size`]. Otherwise `400` / `413`. +5. `content_type` ∈ closed enum for this protocol version. Otherwise `400`. +6. `album_id` exists; authenticated user has server-visible write capability on it; album's pinned `protocol_version` equals the request's. Otherwise `403`. +7. 
`created_by_device` is in the user's published device directory, and the directory entry's `added_at` precedes the request's `timestamp`. Otherwise `403`. +8. `timestamp` is within ±30 days of server clock. Otherwise `400`. + +### On each `PATCH /upload/{id}` chunk + +9. Offset is exactly the current received-byte count. Otherwise `409`, with `X-Capsule-Offset` returned. +10. Non-final chunk size is a multiple of 4 KiB. Otherwise `400`. +11. Cumulative received ≤ declared `size`. Otherwise `400` / `413`, session moves to `FailedProcessing`. +12. The `(upload_id, offset, chunk_hash)` idempotency tuple is new OR matches an exact prior PATCH. Otherwise (same offset, different hash) `409` + corruption error. + +### At finalization + +13. Total received == declared `size`. Otherwise `FailedProcessing`. +14. Recomputed ciphertext hash == declared `hash.value`. Otherwise `FailedProcessing` + corruption error. +15. Manifest envelope re-validated (rerun 1–8) inside the finalization transaction. + +### On non-upload writes (lifecycle action manifest, metadata-update, derivative-add/replace, trash-restore) + +16. `action` is in the closed enum. Otherwise `400`. +17. `prior_provenance_hash` equals the last accepted manifest's content hash for this `asset_id`. Otherwise `409` (stale-revival). +18. `amk_version` is monotonic per album (never regresses). Otherwise `400`. + +### On federation pull (server-to-server) + +19. Capability token verifies under home server's signing key; `exp` in future; `jti` not in revocation list (cached ≤ 15 min). Otherwise `401` / `403`. +20. All checks (1)–(18) re-applied — federation does not unlock looser rules. +21. Per-peer rate budgets unbroken (events/hour, bytes/hour, CPU/hour). Otherwise `429`. 
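
As a rough illustration of the chunk rules (9) to (12) above, the following sketch models one `PATCH` against an in-memory session. It is a sketch under stated assumptions, not the server's implementation: `UploadSession`, `check_chunk`, the reason strings, the `200` returned for an exact replay, and the choice of `413` over `400` for check 11 are all hypothetical. It also assumes the replay check runs before the offset check, since an exact retry necessarily arrives at an offset behind the current byte count.

```python
from dataclasses import dataclass, field

CHUNK_ALIGN = 4 * 1024  # check 10: non-final chunks are multiples of 4 KiB


@dataclass
class UploadSession:
    """Hypothetical in-memory stand-in for a server-side upload session row."""
    upload_id: str
    declared_size: int
    received: int = 0
    state: str = "Uploading"
    # offset -> chunk_hash for every PATCH accepted so far
    seen: dict = field(default_factory=dict)


def check_chunk(session, offset, chunk_len, chunk_hash, is_final):
    """Return (http_status, reason); the session is mutated only on accept or hard failure."""
    # Check 12: a replay at a known offset is a no-op only if the hash matches.
    if offset in session.seen:
        if session.seen[offset] == chunk_hash:
            return 200, "duplicate-noop"  # assumed status; the text only says "no double-write"
        return 409, "corruption"
    # Check 9: a new chunk must start exactly at the current received-byte count.
    if offset != session.received:
        return 409, "offset-mismatch"
    # Check 10: only the final chunk may be unaligned.
    if not is_final and chunk_len % CHUNK_ALIGN != 0:
        return 400, "unaligned-chunk"
    # Check 11: bound the cumulative size at every chunk, not only at finalization.
    if session.received + chunk_len > session.declared_size:
        session.state = "FailedProcessing"
        return 413, "size-exceeded"
    session.seen[offset] = chunk_hash
    session.received += chunk_len
    return 200, "accepted"
```

A real handler would also return `X-Capsule-Offset` on the offset-mismatch path so the client can re-align before retrying.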
+ +Every rejection is logged with a structured reason code; the rejected hash is remembered (bounded, see [Federation — Soft-Fail Semantics](/design/federation/#soft-fail-semantics)) so divergence between Capsule's view and a permissive peer's view is detectable. + +## Client-Side Validation Invariants + +Mirror checklist that every client implements before applying any received data — local or remote. A client that skips one of these is in the *faulty* class. + +- Run [`verify_asset`](/design/cryptography/#write-authorization) on every received `AssetManifest`. Quarantine on failure; never silent-drop, never silent-accept. +- Reject an incoming `sidecar_schema` greater than the client's `max_known_sidecar_schema`. Refuse to write that sidecar; refuse to read in normal mode (read-only opt-in is allowed). +- Reject an incoming `protocol_version` outside `[Min, Max]` known to the client. The same handshake the server runs. +- Reject an unknown enum value for any field whose enum is closed at the current schema (notably `action`, `content_type`, `gps.source`, `DerivativeManifest.role`). Unknown CBOR map keys are preserved per [Postel's Law](/design/principles/) and never executed. +- Maintain a local `latest_provenance_hash` per `asset_id`. Refuse to apply any manifest whose `prior_provenance_hash` is behind the local value. Surface it. +- Reject an OR-set remove whose `add_id` was never observed locally as an add. +- Refuse to follow a `revoke_all_sessions` confirmation that did not include a master-key proof. +- Decode remote-origin asset bytes only in the [sandboxed decoder](/design/clients/#sandboxed-decoder). + +## Protocol and Capability Negotiation + +Every versioned API surface — client-to-server uploads, sync feed, federation pull, peering — runs the same compatibility gate. The gate is **fail-closed**: a mismatch is a hard reject before any state is written, never a silent degrade. 
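
A minimal sketch of such a gate, using the header names and status codes defined in the subsections that follow. `ServerPolicy`, `protocol_gate`, and the treatment of a missing or malformed header as `400` are assumptions made for illustration; the design text does not specify them.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ServerPolicy:
    """Hypothetical container for the server's accepted window and inventories."""
    protocol_min: date
    protocol_max: date
    known_suites: frozenset
    max_sidecar_schema: int


def _u16(text):
    """Parse a decimal u16, raising ValueError on anything else."""
    value = int(text)
    if not 0 <= value <= 0xFFFF:
        raise ValueError("out of u16 range")
    return value


def protocol_gate(is_write, headers, policy):
    """Return 200 to proceed, or the fail-closed status before any state is written."""
    try:
        version = date.fromisoformat(headers.get("X-Capsule-Protocol", ""))
    except ValueError:
        return 400  # assumption: an absent or malformed version is a plain bad request
    if not is_write:
        # Reads are not gated in this sketch; the text only promises that
        # reads of past versions succeed.
        return 200
    if not policy.protocol_min <= version <= policy.protocol_max:
        return 426
    try:
        suite = _u16(headers.get("X-Capsule-Crypto-Suite", ""))
        schema_text = headers.get("X-Capsule-Sidecar-Schema")
        schema = None if schema_text is None else _u16(schema_text)
    except ValueError:
        return 400
    if suite not in policy.known_suites:
        return 400
    if schema is not None and schema > policy.max_sidecar_schema:
        return 400
    return 200
```

The function takes one request and returns one verdict, which mirrors the one-shot character of the handshake: there is no second round in which either side could adjust.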
+ +### Universal Headers + +| Header | Sent by | Meaning | +| ---------------------------- | ------------------------- | ------------------------------------------------------------------------------------------ | +| `X-Capsule-Protocol` | client / peer | `YYYY-MM-DD` protocol version the request is written against | +| `X-Capsule-Crypto-Suite` | client / peer on writes | `u16` suite id from the [Primitives Inventory](/design/cryptography/#primitives-inventory) | +| `X-Capsule-Sidecar-Schema` | client on metadata-update | `u16` schema version declared at `sidecar_schema` field 0 | +| `X-Capsule-Protocol-Min` | server on every response | the lowest protocol version this server accepts | +| `X-Capsule-Protocol-Max` | server on every response | the highest protocol version this server accepts | +| `X-Capsule-Min-Client-Build` | server on responses | semver deprecation cutoff; advisory unless the path is hard-deprecated | + +### Fail-Closed Rules + +- `X-Capsule-Protocol` outside `[Min, Max]` on a **write**: `426 Upgrade Required`. No session created, no row written. +- `X-Capsule-Crypto-Suite` not in the inventory: `400 Bad Request`. +- `X-Capsule-Sidecar-Schema` above the server's max known: `400 Bad Request`. (The server does not parse sidecars itself, but it refuses to acknowledge writes whose schema number it does not index.) +- **Reads of any past version succeed.** Read invariants are deliberately stable per [Versioning](/design/versioning/), so a current server still serves v_{k-N} blobs from years ago. +- Federation capability is an additional `401` / `403` layer on top of the protocol gate. A valid token never substitutes for a valid protocol header. + +The handshake is **one-shot per request**, not a negotiation. Either both sides agree by inspection, or the request fails. There is no back-and-forth that could leak partial state. + +## Idempotency Invariants + +Every write surface has a single idempotency key. 
Duplicates are no-ops; conflicts (same key, different content) are corruption errors. + +| Surface | Idempotency key | Duplicate behavior | +| ----------------------------------- | ---------------------------------------------------------------------------------- | ------------------------------------------------- | +| Upload chunk (`PATCH /upload/{id}`) | `(upload_id, offset, chunk_hash)` | Returns current offset; no double-write | +| Session creation (`POST /upload`) | `(owner_id, hash.value, album_id)` — server's existing dedup check | Returns the existing session; no second session | +| Lifecycle manifest write | `(asset_id, prior_provenance_hash, manifest_hash)` | No-op append; chain advances exactly once | +| Metadata-update operation | Operation id (UUIDv7) + `(asset_id, prior_provenance_hash)` | Re-applying the same op is structurally identical | +| Federation capability proof | `(peer_id, jti)` | Refresh with same `jti` returns the same response | +| Federation pull | `(peer_id, sync_cursor)` — the sync cursor itself is the key | Re-pull returns the same page | +| MLS commit | Handled by OpenMLS; commits are ordered by the group's commit chain | OpenMLS rejects duplicates | +| Album upgrade ceremony | `intent_id` (UUIDv7); see [Versioning](/design/versioning/#album-upgrade-ceremony) | Same intent never produces two forks | + +A write surface that does not appear here is, by default, **not** idempotent and must be designed before it ships. + +## Atomicity Invariants + +Multi-write operations that must succeed-as-one or not at all. A partial success on any of these is itself a damage scenario. + +- **Asset bundle finalization.** The manifest, ciphertext blob, metadata blob, and provenance blob commit together in a single Postgres transaction. Server failure between any pair leaves the entire bundle un-finalized; the session moves to `FailedProcessing` and the partial blobs are GC'd. 
([Filesystem — Atomic Writes](/design/filesystem/#atomic-writes-and-crash-recovery)) +- **Stack edits.** All affected sidecars stage as `.tmp` files first; renames happen together. Any rename failure discards every `.tmp` in the bundle. ([Filesystem — Atomic Writes](/design/filesystem/#atomic-writes-and-crash-recovery)) +- **AMK epoch bump + write-tier key rotation.** A new AMK and a new write-tier key are minted as a single MLS commit. The two cannot exist out of sync. +- **Album upgrade ceremony.** The cutover is one MLS commit, the `AlbumTombstone`. Until applied, the client is in v_old; after, in v_new. ([Versioning — Album Upgrade Ceremony](/design/versioning/#album-upgrade-ceremony)) +- **Lifecycle manifest + provenance record.** Writing a lifecycle manifest and appending its provenance entry are the same act, because the provenance entry **is** the manifest plus the chain link. There is no separate "now record provenance" step that can race. + +## Quarantine Surfaces + +Every "don't apply, surface it" code path. The union exists so the UI surface and operator audit have a single inventory of "things that need a human to look at." 
+ +| Surface | Where it lives on disk (client) | Source of truth doc | +| ------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------------------------------------------------------- | +| `verify_asset` reject (any signature or chain failure) | Quarantine area surfaced via the audit log | [Cryptography — Write Authorization](/design/cryptography/#write-authorization) | +| Federation soft-fail | Rejected-hash table, bounded LRU | [Federation — Soft-Fail Semantics](/design/federation/#soft-fail-semantics) | +| Orphaned original (no sidecar) | `.library/quarantine/` after a failed recovery | [Filesystem — Repair](/design/filesystem/#repair) | +| Malformed CBOR sidecar | `.library/quarantine/` (the unparseable bytes are preserved) | [Filesystem — Repair](/design/filesystem/#repair) | +| Stale-revival (peer or restore sends old manifest) | Audit log + UI surface "peer sent stale state" | [Cryptography — Provenance](/design/cryptography/#provenance-of-library-modifications) | +| Album upgrade stranded write | Local `pending_until_upgrade` queue | [Versioning — Album Upgrade Ceremony](/design/versioning/#album-upgrade-ceremony) | +| Backup restore chain conflict | Audit log + UI surface "restore conflicts" | [Backup & Recovery — Backup Verification](/design/backup-recovery/#backup-verification) | + +A quarantined item is **never silently dropped and never silently applied**. The user (or operator) can inspect, repair, or discard explicitly. + +## Provenance Immutability Rules + +The append-only hash-chained record per asset is defined in [Cryptography — Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications). This section is the policy layer. 
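
To make the shape of that record concrete, here is a small sketch of an append-only hash chain with the stale-revival check (`prior_provenance_hash` must equal the current head, otherwise `409`). It is illustrative only: signatures are omitted entirely, SHA-256 over canonical JSON stands in for whatever hash and CBOR encoding the Cryptography doc specifies, and `ProvenanceChain`, `GENESIS`, and the record layout are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical marker meaning "no predecessor"


def content_hash(record):
    """Hash a record over a canonical encoding (JSON here, CBOR in the real format)."""
    blob = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def verify_chain(records):
    """True when every record names the hash of the record before it."""
    prev = GENESIS
    for record in records:
        if record["prior_provenance_hash"] != prev:
            return False
        prev = content_hash(record)
    return True


class ProvenanceChain:
    """Per-asset chain exposing append and read, with no method that edits history."""

    def __init__(self):
        self._records = []

    @property
    def head(self):
        return content_hash(self._records[-1]) if self._records else GENESIS

    def records(self):
        return tuple(self._records)  # shallow snapshot for readers

    def append(self, action, prior_provenance_hash, payload):
        record = {
            "action": action,
            "prior_provenance_hash": prior_provenance_hash,
            "payload": payload,
        }
        # A byte-identical retry of the latest record is a no-op.
        if self._records and content_hash(record) == self.head:
            return 200
        # Anything else must extend the current head, or it is a stale revival.
        if prior_provenance_hash != self.head:
            return 409
        self._records.append(record)
        return 200
```

Changing any earlier record alters its hash and breaks the link held by its successor, which is why `verify_chain` fails on a tampered copy even when the final record is untouched.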
+ +- **No path exists to overwrite or delete an existing provenance entry.** Not via the API, not via the local filesystem (the client treats `.provenance.cbor` as append-only), not via federation. The constraint is structural, not enforced by a permission check. +- **Even a hard-delete preserves provenance.** When an asset is purged, its `media/{YYYY}/{YYYY-MM}/{uuid}.provenance.cbor` remains as a tombstone-with-history. The bytes that go away are the ciphertext blob and the encrypted metadata; the audit trail does not. +- **Export and backup carry the chain.** A backup artifact includes every asset's full provenance chain. On restore, the chain re-enters the local index — see the [open question on restore conflicts](#open-questions). +- **What a key-holding attacker still cannot do.** A complete current-key compromise lets the attacker append forward. It does not let them rewrite the past — every prior record is bound by a signature from a (possibly retired) device whose public half is still in the device directory. + +## Schema Evolution and Field Grammar + +The owner of "what a Capsule schema looks like" is each individual schema's owner doc; the owner of "what evolution is allowed" is this doc. + +### Deny-by-Default for Unknown Request Fields + +[Postel's Law](/design/principles/) — as tightened in principles — applies asymmetrically: + +- **In requests (client → server, or peer → server):** unknown fields at known positions in a known schema are accepted and preserved verbatim. Unknown fields at the **top level** that the receiver does not declare are **rejected**. Schema-bearing requests that announce a `sidecar_schema` or `crypto_suite_id` the receiver does not implement are rejected. The asymmetry is deliberate: liberal acceptance in requests is what lets new clients write extensions, but only *inside* a known schema envelope. +- **In responses (server → client):** unknown fields are preserved verbatim. 
A new server sending an old client a response with a new field does not break the old client. + +### Closed Enums + +The following enums are closed per `protocol_version` — a value outside the enum is a structural error, never a "future value to ignore": + +- `AssetManifest.action` +- `Sidecar.content_type` +- `Sidecar.gps.source` +- `DerivativeManifest.role` + +Adding a value to a closed enum bumps `protocol_version`. Old albums never see the new value because they are pinned. + +### Timestamp Grammar + +All `timestamp` and `ts` fields are RFC 3339 strings. Server-accepted values are bounded to **±30 days** of server wall-clock at the moment of accept (configurable per deployment). The bound applies to writes; reads serve whatever timestamp was historically accepted. + +A client whose system clock drifts more than 30 days from the server is rejected at handshake. This is the *honest* class's protection from a faulty NTP — the bound surfaces the drift instead of silently distorting audit timestamps. + +### Bounded String and Collection Sizes + +Every field has a maximum length declared in the schema (e.g. `caption_lww.value ≤ 4096 bytes`; `superseded_captions ≤ 16 entries`). The receiver rejects an oversized value. No field is unbounded. + +## Forbidden Client Behaviors + +A correct Capsule client implementation must never: + +- Back-date or post-date a `timestamp` outside the ±30-day window. +- Re-sign or re-issue a manifest under a `crypto_suite_id` lower than the original. +- Sign for an album epoch the client does not currently hold the write-tier key for. +- Issue an OR-set remove for an `add_id` it has not locally observed an add for. +- Strip `_unknown` fields from a sidecar it intends to write back. Round-trips must preserve everything the schema allows. +- Strip `superseded_captions` entries. +- Overwrite an existing `.provenance.cbor` file (the file is append-only). +- Submit a `revoke_all_sessions` without proof of master-key possession. 
+- Decode bytes received from a non-home peer outside the [sandboxed decoder](/design/clients/#sandboxed-decoder). +- Promote an AI tag to a user tag silently — promotion is an explicit, signed lifecycle operation. +- Treat a `429`, `409`, or `426` as a retry-with-the-same-payload. Each one requires a fix on the client (back off, re-align offset, upgrade) before retry. + +A client implementation that does any of the above is **buggy by definition**. The check belongs in the client implementation's own correctness tests; the network layers above protect against the consequences. + +## Min-Supported-Client Deprecation Policy + +Dropping a `protocol_version` from the server's accepted window is a breaking change. The policy: + +1. **Announcement.** A deprecation cutoff date is published at `/.well-known/capsule/deprecation` ahead of the cutoff by at least the announcement window (default 90 days, deployment-configurable). The announcement names the cutoff date and the minimum `protocol_version` that will remain accepted. +2. **Server response.** Below the cutoff, every response carries `X-Capsule-Min-Client-Build` and a `Warning:` header pointing to the deprecation URL. +3. **Hard cutoff.** On the cutoff date, the dropped version moves outside `[Min, Max]`. Writes from clients pinned to that version receive `426`. Reads still succeed. +4. **Stranded user.** A user whose only client is below the cutoff still has every recovery path from [Cryptography — Failure Modes and Recovery](/design/cryptography/#failure-modes-and-recovery): master key, cross-device, OGK, backup artifact. The deprecation does not strand data; it strands a specific old binary. + +The deprecation surface is **never** retroactive against historical state. Old albums pinned to a dropped version remain readable forever — they just cannot be written to from a current client. + +## Open Questions + +These survive the current design and should be resolved before the docs are considered final. + +1. 
**Restore-vs-stale-revival.** A restore from a 6-month-old backup hands the system manifests whose `prior_provenance_hash` is older than the local `latest_provenance_hash`. The naive defense quarantines every entry, which is a foot-gun. Two candidate resolutions: (a) restore enters a `restore_from_backup` chain branch the user explicitly merges, or (b) restore resets `latest_provenance_hash` from the backup contents under additional authentication. Resolution lives in [Backup & Recovery](/design/backup-recovery/). +2. **Sync cursor authenticity.** A malicious server could hand a client an older `sync_cursor` to rewind its view. The cursor is currently opaque; making it MAC'd by the server and validated as monotonic by the client is the leading fix. +3. **Cross-server album replication (v2).** v1 pins each album to a single home server; v2 will need a story for cross-server MLS state and federated commit ordering. +4. **Sponsored-account write damage.** A compromised registered account holds its sponsorees' KEKs and can manipulate their histories without their device keys. Enumerate the damage and bound it. +5. **AMK epoch monotonicity bootstrap.** A brand-new client cannot know the previous max `amk_version` without trusting the server. The fix bootstraps monotonicity from the MLS commit chain rather than the server's stored counter. +6. **Cross-language deterministic CBOR.** FFI consumers re-serializing may drift; no byte-identical cross-language test surface is documented. +7. **Federated quota DoS via honest user.** Per-peer quotas protect Capsule from a peer, but a single user receiving from many peers can exhaust the home server's storage. Needs a peer-attribution dimension. +8. **"New client" UI surface.** A client speaking a `protocol_version` ahead of an album's pin is rejected on writes but may *read* state a future client wrote. The unknown-extension UI surface needs definition in [Clients](/design/clients/). 
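
For open question 2, one possible shape of the "MAC'd and monotonic" cursor is sketched below. Nothing here is decided by the design: the `seq.tag` layout, the client-readable sequence number, HMAC-SHA-256, and every function name are assumptions chosen only to show how the two checks could divide between server and client.

```python
import hashlib
import hmac


def issue_cursor(server_key: bytes, seq: int) -> str:
    """Server side: bind a sequence number to a MAC only the server can produce."""
    tag = hmac.new(server_key, str(seq).encode(), hashlib.sha256).hexdigest()
    return f"{seq}.{tag}"


def _parse_seq(cursor: str):
    """Return the sequence number, or None if the cursor is malformed."""
    seq_text = cursor.partition(".")[0]
    if seq_text.isascii() and seq_text.isdigit():
        return int(seq_text)
    return None


def server_accepts(server_key: bytes, cursor: str) -> bool:
    """Server side: reject a cursor whose tag does not match its sequence number."""
    seq_text, _, tag = cursor.partition(".")
    if _parse_seq(cursor) is None:
        return False
    expected = hmac.new(server_key, seq_text.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), tag.encode())


class ClientCursorState:
    """Client side: remember the highest sequence seen and refuse to rewind."""

    def __init__(self):
        self.last_seq = -1

    def observe(self, cursor: str) -> bool:
        seq = _parse_seq(cursor)
        if seq is None or seq < self.last_seq:
            return False  # malformed, or the server is handing back an older view
        self.last_seq = seq
        return True
```

The client cannot verify the MAC because it does not hold the server's key; its protection comes from the monotonic check alone, which is why the sequence number has to be readable rather than opaque in this variant.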
+ +## Cross-References + +Each owner doc gains an invariant section or two that links back to this doc. The mapping: + +| Owner doc | Threat-model section(s) it ties into | +| ----------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| [Principles](/design/principles/) | [§ Damage Containment Layers](#damage-containment-layers) | +| [Versioning](/design/versioning/) | [§ Protocol and Capability Negotiation](#protocol-and-capability-negotiation), [§ Atomicity Invariants](#atomicity-invariants) | +| [Filesystem](/design/filesystem/) | [§ Server-Side Validation Invariants](#server-side-validation-invariants), [§ Atomicity Invariants](#atomicity-invariants), [§ Quarantine Surfaces](#quarantine-surfaces) | +| [Cryptography](/design/cryptography/) | [§ Provenance Immutability Rules](#provenance-immutability-rules), [§ Damage Scenario Map](#damage-scenario--invariant-map) (signature/chain rows) | +| [Metadata](/design/metadata/) | [§ Schema Evolution and Field Grammar](#schema-evolution-and-field-grammar), [§ Damage Scenario Map](#damage-scenario--invariant-map) (CRDT rows) | +| [Import & Synchronization](/design/import-synchronization/) | [§ Server-Side Validation Invariants](#server-side-validation-invariants), [§ Idempotency Invariants](#idempotency-invariants) | +| [Federation](/design/federation/) | [§ Server-Side Validation Invariants](#server-side-validation-invariants), [§ Quarantine Surfaces](#quarantine-surfaces) | +| [Peering](/design/peering/) | [§ Client-Side Validation Invariants](#client-side-validation-invariants), [§ Damage Scenario Map](#damage-scenario--invariant-map) (peer rows) | +| [Authentication](/design/authentication/) | [§ Forbidden Client Behaviors](#forbidden-client-behaviors), [§ Damage Scenario Map](#damage-scenario--invariant-map) (revoke-all row) | +| 
[Authorization](/design/authorization/) | [§ Server-Side Validation Invariants](#server-side-validation-invariants) | +| [Backup & Recovery](/design/backup-recovery/) | [§ Quarantine Surfaces](#quarantine-surfaces), [§ Open Questions](#open-questions) | +| [Thumbnails](/design/thumbnails/) | [§ Damage Scenario Map](#damage-scenario--invariant-map) (derivative row) | +| [ML Models](/design/ml-models/) | [§ Damage Scenario Map](#damage-scenario--invariant-map) (embedding model row) | +| [AI](/design/ai/) | [§ Forbidden Client Behaviors](#forbidden-client-behaviors) (AI tag namespace) | +| [Organization](/design/organization/) | [§ Atomicity Invariants](#atomicity-invariants), [§ Forbidden Client Behaviors](#forbidden-client-behaviors) | +| [Clients](/design/clients/) | [§ Client-Side Validation Invariants](#client-side-validation-invariants), [§ Min-Supported-Client Deprecation Policy](#min-supported-client-deprecation-policy) | diff --git a/capsule-docs/src/content/docs/design/thumbnails.md b/capsule-docs/src/content/docs/design/thumbnails.md new file mode 100644 index 0000000..701ee1f --- /dev/null +++ b/capsule-docs/src/content/docs/design/thumbnails.md @@ -0,0 +1,40 @@ +--- +title: Thumbnails and Previews +description: How we generate and manage thumbnails and previews for media assets in Capsule +--- + +We generate thumbnails and previews for all photos and videos. This doc is the **single source of truth** for the LQIP scheme and the thumbnail/preview formats — per the [single-source-of-truth rule](/design/principles/#single-source-of-truth), other docs reference these by link rather than restating the choice. + +## Thumbnail and Preview Formats + +> **Status:** The format table below is **provisional**. The choice between AVIF and JXL as the primary still-image codec is pending field testing of decoder availability and quality-per-byte across Capsule's target devices in 2026. 
The single-source-of-truth structure means any later swap is a one-row edit here, propagated nowhere else — see [Single Source of Truth](/design/principles/#single-source-of-truth). + + +Three derivative tiers per photo asset and one preview tier for video assets: + +| Tier | Photo format | Video format | Notes | +| ------------------------------------------ | ----------------------------------------------------------- | ------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **Thumbnail** (grid display) | **AVIF** (primary), WebP fallback for browsers without AVIF | First-frame AVIF still | AVIF q=50, 4:2:0 chroma, ~256 px long edge. | +| **Preview** (lightbox / single-asset view) | **AVIF** (primary), WebP fallback | **H.264 baseline** transcode at original resolution capped to 1080p | AVIF q=70 for stills; H.264 CRF 23 for video, 30 fps cap, AAC audio. | +| **Desktop-only optional cache** | **JXL** | (n/a) | JXL is generated only when the client is a desktop and the user opts in — best quality-per-byte but decoder support is still uneven in 2026. Never produced for shared/server-side derivatives. | + +- **AVIF** is the primary because in 2026 it ships in every major browser and on every major OS (iOS 16+, Android 12+, Chrome/Firefox/Safari current). Hardware decode is widespread. +- **WebP** is the fallback for the rare client that lacks AVIF. We deliberately do not fall back to JPEG — WebP covers everything JPEG would. +- **JXL** is kept as a *desktop-only optional* tier rather than the primary because cross-platform decoder coverage is still patchy. It is purely a local-cache choice; remote/sharing paths never use JXL. +- **H.264 baseline** for video previews — universally decodable, cheap CPU/GPU cost on every platform. 
AV1 was considered but encode cost is still high on mobile in 2026. + +If an original asset is lower-resolution than the highest thumbnail tier, the affected tier simply references the original instead of generating a redundant derivative. This is **distinct** from a missing derivative (an unintentional failure during generation) — the recovery-first principle treats missing derivatives as rebuildable from the original. + +## LQIP + +We use [chromahash](https://github.com/justin13888/chromahash) as a perceptual hash that decodes into a low-quality image placeholder. Chromahash was chosen for its color accuracy across color spaces and it was precisely developed for Capsule's particular needs. The hash is inlined into the encrypted CBOR metadata blob (see [Metadata Encryption](/design/cryptography/#metadata-encryption)), so it is available the instant metadata syncs, before any thumbnail fetch. + +Considered and rejected: ThumbHash (smaller wire size but worse color fidelity for the wide-gamut and HDR sources Capsule expects), BlurHash (older, blurrier, less color-accurate). The single-LQIP choice avoids exactly the kind of "chromahash/ThumbHash" hedge that previously caused doc drift. + +## Derivative Provenance + +Thumbnails and previews are *ephemeral by recovery posture* (they can always be regenerated from the original) but not *unowned*. A buggy or hostile client could otherwise quietly replace a good thumbnail with a corrupted one, and the receiving side would have no way to tell. To prevent this, every thumbnail and preview is uploaded as a derivative whose addition or replacement is an authorized, signed lifecycle action. 
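As a rough illustration of what "authorized, signed" could mean for the receiving side, here is a minimal Python sketch. It is not Capsule's implementation: the real scheme uses device signatures and a manifest owned by the cryptography docs, whereas this sketch swaps in an HMAC from the standard library as a stand-in. Only the `format` field, the two action names, and the AVIF/WebP/H.264 formats come from this doc; the format strings, the other manifest fields, and the `sign`/`accept` helpers are assumptions made for the example.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Formats from the table above that may appear on uploaded derivatives.
# JXL is left out because it is a local-only cache tier. The string
# spellings are assumptions.
KNOWN_FORMATS = frozenset({"avif", "webp", "h264"})
ACTIONS = frozenset({"derivative-add", "derivative-replace"})


@dataclass(frozen=True)
class DerivativeManifest:
    # Only `format` is named in this doc; the other fields are hypothetical.
    action: str
    format: str
    content_sha256: str
    tag: bytes  # HMAC stand-in for the real device signature


def _message(action: str, fmt: str, content_sha256: str) -> bytes:
    """Bytes covered by the tag: action, format, and content hash."""
    return "\n".join((action, fmt, content_sha256)).encode()


def sign(key: bytes, action: str, fmt: str, data: bytes) -> DerivativeManifest:
    """Build a manifest for `data` and tag it with `key`."""
    digest = hashlib.sha256(data).hexdigest()
    tag = hmac.new(key, _message(action, fmt, digest), hashlib.sha256).digest()
    return DerivativeManifest(action, fmt, digest, tag)


def accept(key: bytes, manifest: DerivativeManifest, data: bytes) -> bool:
    """Return True to keep the derivative, False to discard and regenerate."""
    if manifest.action not in ACTIONS or manifest.format not in KNOWN_FORMATS:
        return False  # closed sets: unknown action or format is refused
    if hashlib.sha256(data).hexdigest() != manifest.content_sha256:
        return False  # bytes do not match what was authorized
    expected = hmac.new(
        key,
        _message(manifest.action, manifest.format, manifest.content_sha256),
        hashlib.sha256,
    ).digest()
    return hmac.compare_digest(expected, manifest.tag)
```

A `False` result here maps to the cheap recovery path: the caller drops the received bytes and rebuilds the derivative from the original, since a derivative never carries irreplaceable data.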
+ +The full derivative manifest structure and the `derivative-add` / `derivative-replace` action set are owned by [Cryptography — Derivative Provenance](/design/cryptography/#derivative-provenance) and [Authorization — The Closed Action Set](/design/authorization/#the-closed-action-set); this doc owns only the *format* of the derivative bytes. The two interact at exactly one point: the `DerivativeManifest.format` field names the codec/format from the table above, and the verifying side rejects a manifest whose `format` is not currently recognized (the closed-enum rule from [Threat Model — Schema Evolution](/design/threat-model/#schema-evolution-and-field-grammar)). + +A thumbnail whose `DerivativeManifest` fails verification is **regenerated locally from the original** rather than trusted — the [recovery-first principle](/design/principles/) means a derivative is always rebuildable, so refusal-and-regenerate is the safe default. The corrupt copy is discarded (not quarantined — it carries no irreplaceable bytes), and the corresponding regeneration appends a new `derivative-replace` provenance record. diff --git a/capsule-docs/src/content/docs/design/versioning.md b/capsule-docs/src/content/docs/design/versioning.md new file mode 100644 index 0000000..4ad3772 --- /dev/null +++ b/capsule-docs/src/content/docs/design/versioning.md @@ -0,0 +1,73 @@ +--- +title: Versioning +description: Handling versioning gracefully +--- + +Changes are inevitable. Capsule minimizes breaking changes but generously accepts compatible ones. The aim is backward-compatible reads forever and a deliberately fail-closed write path — a [version-mismatched client](/design/threat-model/) never silently corrupts state, it is rejected at the handshake. + +Versioning happens on multiple layers: + +- **Metadata CBOR schema** — `sidecar_schema` field 0 of every sidecar (see [Metadata — Schema Versioning Rules](/design/metadata/#schema-versioning-rules)). 
+- **Cryptographic primitive bundle** — `crypto_suite_id` on every manifest and metadata blob (see [Cryptography — Versioning Identifiers](/design/cryptography/#versioning-identifiers)). +- **Wire protocol** — `protocol_version` (date-based, `YYYY-MM-DD`) on every API request and album pin. See [Threat Model — Protocol and Capability Negotiation](/design/threat-model/#protocol-and-capability-negotiation) for the universal handshake. +- **Client cache** — internal and rebuildable; cache schema changes drop and rebuild rather than migrate. +- **Server data structures** — PostgreSQL schema migrations forward-only. The session-state store is a deployment choice, not a versioned API surface: by default `upload_sessions` lives in PostgreSQL, and high-concurrency deployments may relocate it to Valkey for hot-path performance only. The wire protocol is identical in both cases (see [Filesystem — Stores by Deployment Profile](/design/filesystem/#stores-by-deployment-profile)). + +## Compatibility Verification + +Initial startups of a client and server always strictly check for version compatibility and **crash early** rather than soft-degrade. The single handshake in [Threat Model — Protocol and Capability Negotiation](/design/threat-model/#protocol-and-capability-negotiation) is the only point at which compatibility is determined; once an operation is past the handshake, both sides know they agree on `protocol_version`, `crypto_suite_id`, and `sidecar_schema`. + +Capsule does **not** support backwards migrations or version downgrades. Server-side schema migrations are forward-only; if a migration fails, the server refuses to start and the operator restores from backup. There is no "rollback then continue" — that path is what corrupts data. + +## Album Protocol Version Pinning + +Each album declares a **protocol version at creation, and that version is immutable** for the album's lifetime. Every event in the album must conform to it. 
Adopting a new protocol feature does not mutate an existing album — it requires either creating a new album, or an explicit [upgrade ceremony](#album-upgrade-ceremony) that tombstones the old album and creates a new one. + +This bounds the blast radius of a buggy or malicious implementation: a faulty v4 implementation can only ever corrupt v4 albums, because v1–v3 validation rules never change. It matters most under [Federation](/design/federation/), where Capsule cannot assume a peer is running the same version — pinning is what lets old albums keep working when a peer ships bad v4 code. + +## Album Upgrade Ceremony + +A version-pinned album is upgraded by a **tombstone-plus-fork** ceremony: the old album is frozen, a new album at the target version is forked from its frozen state, and all members migrate. The ceremony is **atomic at the user level** — there is no halfway state visible to one client — and **resumable** if any participant crashes partway through. Every step is keyed by an `intent_id: UUIDv7` to defeat duplicate or contradictory upgrade proposals. + +```text +[v_old normal] --UpgradeIntent--> [v_old quiescing] --drain--> [v_old frozen] + | + AlbumTombstone commit + | + v + [v_new active] + ^ + queued v_old writes replayed +``` + +### Steps + +1. **Freeze proposal.** An album admin issues an MLS application message `UpgradeIntent { from_version, to_version, intent_id, proposer_device, deadline }`, hybrid-signed by the admin's [DSK](/design/cryptography/#device-keys). The proposal carries a deadline (default 7 days). Any member's client receiving an `UpgradeIntent` for an album that is already in upgrade quiescence under a *different* `intent_id` rejects the new proposal — only one upgrade can be in flight per album. +2. **Quiesce writes.** Members enter upgrade quiescence on receipt of `UpgradeIntent`: + - In-flight uploads against the album are allowed to reach a terminal state. 
+ - New writes are queued **locally** with a `pending_until_upgrade` flag and the `intent_id`; they are not sent to the server. + - The server augments the album row with `upgrade_pending_to = to_version, intent_id`. New upload sessions for this album whose `manifest.intent_id` does **not** match are rejected with `409 Conflict` — preventing a stale v_old client from writing past the freeze. +3. **Drain.** The upgrade cannot proceed while any session for this album is in `Uploading` or `WaitingForProcessing`. The server exposes the in-flight count to the proposer's client. The deadline from step 1 bounds the wait; on deadline expiry the upgrade aborts cleanly (state returns to v_old normal; queued local writes are flushed back to v_old). +4. **Tombstone.** Once drained, the proposing admin issues an MLS commit `AlbumTombstone { intent_id, frozen_state_hash }`. `frozen_state_hash` is a SHA-256 over the canonical CBOR of the album's full state: the sorted member list, every accepted manifest's hash, and the head of the album's provenance log. Every receiving member's client recomputes the hash against its own state; on mismatch the upgrade aborts (each member independently — the album returns to normal operation). Hash mismatch means at least one member's view of the album diverges and must be resolved before any upgrade. +5. **Fork.** A new album group is created at `to_version`, MLS-named `parent_id_v{n}`, with the manifest field `upgraded_from: { old_album_id, intent_id, frozen_state_hash }`. Assets are **not** re-encrypted: the new album references the existing ciphertext blobs by content hash. Members are added to the new MLS group via standard `Add` proposals; fresh `AMK_v1` and a fresh write-tier key are minted. +6. **Apply queued writes.** Each member's locally queued `pending_until_upgrade` writes are re-encoded against `to_version` (the album pin and `crypto_suite_id` may have changed) and replayed into the new album. +7. 
**Resumption (partial-failure recovery).** A client that crashes between step 2 and step 6 reads its local `upgrade_pending_to` on restart, queries the server for the upgrade's current phase via the album row, and resumes from there. The `intent_id` is the idempotency key — the same `UpgradeIntent` never produces two forks, and a duplicate `AlbumTombstone` commit is a no-op at the MLS layer. +8. **Atomicity guarantee.** The cutover is the single MLS commit in step 4. Until that commit is applied by a member's client, the client is operating in v_old; after, in v_new. There is no in-between state visible to one client. Cross-member, the cutover is observed as each member processes the commit; until the slowest member processes it, that member is still in v_old (and its `pending_until_upgrade` writes remain queued locally, never lost). + +### What This Defends Against + +- **Version-mismatched-client damage.** A v_old client cannot write into a v_new album because every write carries `protocol_version`, which is rejected by the [protocol handshake](/design/threat-model/#protocol-and-capability-negotiation) and the [server-side validation invariants](/design/threat-model/#server-side-validation-invariants). +- **Partial-upgrade corruption.** Quiescence + drain ensures no v_old write is mid-flight at the moment of cutover. The `intent_id` keys every step so a retried, duplicated, or contradictory proposal cannot produce two divergent v_new albums. +- **Hostile member sabotage.** A member whose computed `frozen_state_hash` differs from the proposer's rejects the tombstone, aborting the upgrade. A malicious member cannot trick the rest into a forged "post-upgrade" state. + +The full atomicity rule lives in [Threat Model — Atomicity Invariants](/design/threat-model/#atomicity-invariants); stranded `pending_until_upgrade` writes are a [quarantine surface](/design/threat-model/#quarantine-surfaces). 
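The ceremony's phases can be read as a small state machine keyed by `intent_id`. The Python sketch below is a toy model under stated assumptions, not the project's code: `Phase`, `AlbumRow`, and every function name are invented for illustration, and it folds the server's album row and a member's local view into one object. It covers only proposal, quiescence, drain, and the tombstone cutover.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Phase(Enum):
    NORMAL = "v_old normal"
    QUIESCING = "v_old quiescing"
    FROZEN = "v_old frozen"
    UPGRADED = "v_new active"


@dataclass
class AlbumRow:
    phase: Phase = Phase.NORMAL
    intent_id: Optional[str] = None
    in_flight: int = 0  # sessions in Uploading / WaitingForProcessing


def propose(album: AlbumRow, intent_id: str) -> bool:
    """Step 1: accept an UpgradeIntent unless a different one is in flight."""
    if album.phase is Phase.NORMAL:
        album.phase, album.intent_id = Phase.QUIESCING, intent_id
        return True
    return album.intent_id == intent_id  # replaying the same intent is a no-op


def session_rejection(album: AlbumRow, manifest_intent_id: Optional[str]) -> Optional[int]:
    """Step 2: while an upgrade is pending, a mismatched intent_id gets 409."""
    if album.phase is Phase.NORMAL:
        return None
    return None if manifest_intent_id == album.intent_id else 409


def drain_tick(album: AlbumRow, deadline_passed: bool) -> None:
    """Step 3: freeze once drained; abort back to normal on deadline expiry."""
    if album.phase is not Phase.QUIESCING:
        return
    if album.in_flight == 0:
        album.phase = Phase.FROZEN
    elif deadline_passed:
        album.phase, album.intent_id = Phase.NORMAL, None


def tombstone(album: AlbumRow, intent_id: str, proposer_hash: str, local_hash: str) -> bool:
    """Step 4: cut over only if the locally recomputed hash matches."""
    if album.phase is Phase.UPGRADED and album.intent_id == intent_id:
        return True  # duplicate AlbumTombstone is a no-op
    if album.phase is not Phase.FROZEN or album.intent_id != intent_id:
        return False
    if proposer_hash != local_hash:
        album.phase, album.intent_id = Phase.NORMAL, None  # abort the upgrade
        return False
    album.phase = Phase.UPGRADED
    return True
```

Because every transition checks `intent_id`, a retried or duplicated message either repeats a transition that already happened or is refused, which is the idempotency property the steps rely on.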
+ +## Min-Supported-Client Window + +The server accepts a *window* of past `protocol_version` values, not only the newest, so a staggered client rollout keeps working. A version leaves the window only after a deprecation period; the policy is owned by [Threat Model — Min-Supported-Client Deprecation Policy](/design/threat-model/#min-supported-client-deprecation-policy). + +The interaction with album pinning: + +- A client whose `protocol_version` falls below the server's `Min` is rejected at the handshake for *any* write — it cannot upload into any album, including ones pinned to the version it can still parse. +- A client whose `protocol_version` falls below an album's pin is rejected for writes to *that album* — the album's pin is a per-album minimum, often higher than the server's minimum (e.g., a v_2024-09-01 album rejects v_2024-06-01 clients even on a server that still accepts v_2024-06-01 for other albums). +- **Reads are unaffected.** A v_old client can always *read* an album it cannot write to. The deprecation policy never makes historical state unreadable. diff --git a/capsule-docs/src/content/docs/development/upload.md b/capsule-docs/src/content/docs/development/upload.md deleted file mode 100644 index eaf7e2e..0000000 --- a/capsule-docs/src/content/docs/development/upload.md +++ /dev/null @@ -1,6 +0,0 @@ ---- -title: Upload API -description: Architecture and implementation details for the Capsule upload API. 
---- - - diff --git a/capsule-docs/src/content/docs/guides/self-hosting.md b/capsule-docs/src/content/docs/guides/self-hosting.md index 88348c8..02b3eed 100644 --- a/capsule-docs/src/content/docs/guides/self-hosting.md +++ b/capsule-docs/src/content/docs/guides/self-hosting.md @@ -30,12 +30,13 @@ Since Capsule extensively uses container technologies for both development and p The Capsule API is written almost entirely in Rust with several binary components serving distinct purposes: -- [GraphQL](/capsule-api/graphql/): GraphQL API for majority of user-facing functionality. Flexible and cross-platform. -- [Upload](/capsule-api/upload/): A performant TUS-based upload service. Enables high-throughput, resumable uploads. -- [Metadata](/capsule-api/metadata/): Used for efficient metadata fetching and updating. Consists of two parts: +- **GraphQL**: GraphQL API for majority of user-facing functionality. Flexible and cross-platform. +- **Upload**: A performant TUS-based upload service. Enables high-throughput, resumable uploads. +- **Metadata**: Used for efficient metadata fetching and updating. Consists of two parts: - A gRPC (web) service for efficient fetching and updating metadata. We strictly prefer binary-based protocols (i.e. no JSON) for lower-serialization costs with mobile clients. - WebSocket + ProtoBuf service for efficient real-time updates + *Note: These components may be combined into a single web server for low-resource environments. 
It is used in the one-click Docker installer as well.*

From fdde85417fc01caec610b53ca5199f93e34efb8c Mon Sep 17 00:00:00 2001
From: Justin Chung <20733699+justin13888@users.noreply.github.com>
Date: Sun, 31 May 2026 14:10:56 -0400
Subject: [PATCH 4/4] docs: expand design docs, add validation checklists, curate sidebar
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Replace the autogenerated Design sidebar with an explicit ordered hierarchy (Foundations → Cryptography → Identity & Access → Storage → Import/Sync → Sharing → Clients → Threat Model) so navigation follows logical reading order rather than directory order.

Substantially expand ai.md, authentication.md, and authorization.md: add Validation sections with unit/smoke test checklists per contract-driven methodology, a canonical model-inventory table (v1-committed slots + post-v1 candidates), Embedding Provenance invariants (model_id/version tuple, stale-flag, platform-partition E2EE fallback), algorithm implementations (Video-as-Sparse-Photos, Re-ID, HNSW search), and update all cross-references to the new per-page URL structure from earlier design-doc splits.

Sync the Self-validation code-style bullet from CLAUDE.md into AGENTS.md so agent and human instructions stay in step.
---
 AGENTS.md                                     |   2 +
 capsule-docs/astro.config.mjs                 |  81 ++-
 capsule-docs/src/content/docs/design/ai.md    | 130 +++-
 .../src/content/docs/design/authentication.md |  98 +--
 .../src/content/docs/design/authorization.md  |  67 +-
 .../content/docs/design/backup-recovery.md    |  83 ++-
 .../src/content/docs/design/clients.md        |  50 +-
 .../src/content/docs/design/cryptography.md   | 619 ------------------
 .../docs/design/cryptography/encryption.md    | 111 ++++
 .../docs/design/cryptography/failure-modes.md |  82 +++
 .../content/docs/design/cryptography/index.md |  33 +
 .../content/docs/design/cryptography/keys.md  | 188 ++++++
 .../content/docs/design/cryptography/mls.md   |  89 +++
 .../docs/design/cryptography/primitives.md    | 113 ++++
 .../docs/design/cryptography/provenance.md    | 133 ++++
 .../content/docs/design/device-enrollment.md  |  92 +++
 .../src/content/docs/design/federation.md     |  83 +--
 .../src/content/docs/design/filesystem.md     | 496 --------------
 .../content/docs/design/filesystem/client.md  |  65 ++
 .../content/docs/design/filesystem/index.md   |  39 ++
 .../docs/design/filesystem/maintenance.md     |  94 +++
 .../content/docs/design/filesystem/server.md  | 118 ++++
 .../docs/design/import-synchronization.md     | 270 --------
 .../docs/design/import/download-sync.md       | 102 +++
 .../src/content/docs/design/import/index.md   |  33 +
 .../content/docs/design/import/pipeline.md    |  78 +++
 .../docs/design/import/upload-protocol.md     | 156 +++++
 capsule-docs/src/content/docs/design/index.md |  25 +
 .../src/content/docs/design/metadata.md       |  84 ++-
 .../src/content/docs/design/ml-models.md      | 106 ---
 .../src/content/docs/design/mls-resilience.md |  70 ++
 .../src/content/docs/design/moderation.md     |  76 +++
 .../src/content/docs/design/module-map.md     | 157 +++++
 .../src/content/docs/design/organization.md   |  95 ++-
 .../src/content/docs/design/peering.md        | 157 ++---
 .../src/content/docs/design/principles.md     | 105 ++-
 capsule-docs/src/content/docs/design/quota.md |  82 +++
 .../src/content/docs/design/share-links.md    |  71 ++
 .../src/content/docs/design/threat-model.md   | 313 ---------
 .../content/docs/design/threat-model/index.md |  71 ++
 .../docs/design/threat-model/scenarios.md     |  73 +++
 .../docs/design/threat-model/schema-rules.md  |  74 +++
 .../docs/design/threat-model/validation.md    | 126 ++++
 .../src/content/docs/design/thumbnails.md     |  50 +-
 .../src/content/docs/design/versioning.md     |  49 +-
 45 files changed, 3080 insertions(+), 2209 deletions(-)
 delete mode 100644 capsule-docs/src/content/docs/design/cryptography.md
 create mode 100644 capsule-docs/src/content/docs/design/cryptography/encryption.md
 create mode 100644 capsule-docs/src/content/docs/design/cryptography/failure-modes.md
 create mode 100644 capsule-docs/src/content/docs/design/cryptography/index.md
 create mode 100644 capsule-docs/src/content/docs/design/cryptography/keys.md
 create mode 100644 capsule-docs/src/content/docs/design/cryptography/mls.md
 create mode 100644 capsule-docs/src/content/docs/design/cryptography/primitives.md
 create mode 100644 capsule-docs/src/content/docs/design/cryptography/provenance.md
 create mode 100644 capsule-docs/src/content/docs/design/device-enrollment.md
 delete mode 100644 capsule-docs/src/content/docs/design/filesystem.md
 create mode 100644 capsule-docs/src/content/docs/design/filesystem/client.md
 create mode 100644 capsule-docs/src/content/docs/design/filesystem/index.md
 create mode 100644 capsule-docs/src/content/docs/design/filesystem/maintenance.md
 create mode 100644 capsule-docs/src/content/docs/design/filesystem/server.md
 delete mode 100644 capsule-docs/src/content/docs/design/import-synchronization.md
 create mode 100644 capsule-docs/src/content/docs/design/import/download-sync.md
 create mode 100644 capsule-docs/src/content/docs/design/import/index.md
 create mode 100644 capsule-docs/src/content/docs/design/import/pipeline.md
 create mode 100644 capsule-docs/src/content/docs/design/import/upload-protocol.md
 create mode 100644 capsule-docs/src/content/docs/design/index.md
 delete mode
100644 capsule-docs/src/content/docs/design/ml-models.md create mode 100644 capsule-docs/src/content/docs/design/mls-resilience.md create mode 100644 capsule-docs/src/content/docs/design/moderation.md create mode 100644 capsule-docs/src/content/docs/design/module-map.md create mode 100644 capsule-docs/src/content/docs/design/quota.md create mode 100644 capsule-docs/src/content/docs/design/share-links.md delete mode 100644 capsule-docs/src/content/docs/design/threat-model.md create mode 100644 capsule-docs/src/content/docs/design/threat-model/index.md create mode 100644 capsule-docs/src/content/docs/design/threat-model/scenarios.md create mode 100644 capsule-docs/src/content/docs/design/threat-model/schema-rules.md create mode 100644 capsule-docs/src/content/docs/design/threat-model/validation.md diff --git a/AGENTS.md b/AGENTS.md index ef63635..5f4af6b 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -2,8 +2,10 @@ ## Code Style +- Self-validation: Most if not all code should be modular, reusable, and testable. The code that requires human review and manual testing should be minimal and focused on user facing features. All critical code must be primarily covered by complete and automated tests. - Contract-driven development: Define the interfaces and data structures first, along with all test cases, before implementing the actual logic. - Cohesion: All code should be split into cohesive modules that have a single responsibility and clear interfaces. Encapsulate unnecessary details. - Minimalism: Choose to use a dependency if it reduces the scope of testing and quantity of code and as long as it does not compromise on performance and required capabilities. - Traceability: all critical processes are verbosely logged so it is clear what happened after the fact and recovery can be feasible. Use INFO logs where necessary and DEBUG,TRACE aggressively for all critical processes. Logs should be structured and easily queryable. Instrument hot paths (e.g. 
major functions) for performance monitoring and debugging in production. + - Mocking: Use mocks for all external dependencies and critical internal processes. This allows us to have deterministic tests and easily simulate edge cases and failure scenarios that are hard to reproduce with real dependencies. Do not try to wire up two incomplete complex systems to mock each other. diff --git a/capsule-docs/astro.config.mjs b/capsule-docs/astro.config.mjs index 7883ae9..23103a3 100644 --- a/capsule-docs/astro.config.mjs +++ b/capsule-docs/astro.config.mjs @@ -41,7 +41,86 @@ export default defineConfig({ }, { label: 'Design', - autogenerate: { directory: 'design' }, + items: [ + { + label: 'Foundations', + items: [ + { slug: 'design' }, + { slug: 'design/principles' }, + { slug: 'design/module-map' }, + ], + }, + { + label: 'Cryptography', + items: [ + { slug: 'design/cryptography' }, + { slug: 'design/cryptography/primitives' }, + { slug: 'design/cryptography/keys' }, + { slug: 'design/cryptography/encryption' }, + { slug: 'design/cryptography/mls' }, + { slug: 'design/cryptography/provenance' }, + { slug: 'design/cryptography/failure-modes' }, + ], + }, + { + label: 'Identity & Access', + items: [ + { slug: 'design/authentication' }, + { slug: 'design/authorization' }, + { slug: 'design/device-enrollment' }, + { slug: 'design/mls-resilience' }, + ], + }, + { + label: 'Storage', + items: [ + { slug: 'design/filesystem' }, + { slug: 'design/filesystem/server' }, + { slug: 'design/filesystem/client' }, + { slug: 'design/filesystem/maintenance' }, + { slug: 'design/metadata' }, + { slug: 'design/thumbnails' }, + { slug: 'design/quota' }, + ], + }, + { + label: 'Import & Sync', + items: [ + { slug: 'design/import' }, + { slug: 'design/import/pipeline' }, + { slug: 'design/import/upload-protocol' }, + { slug: 'design/import/download-sync' }, + { slug: 'design/backup-recovery' }, + { slug: 'design/versioning' }, + ], + }, + { + label: 'Sharing & Federation', + items: [ + { 
slug: 'design/federation' }, + { slug: 'design/peering' }, + { slug: 'design/share-links' }, + { slug: 'design/moderation' }, + ], + }, + { + label: 'Organization & Clients', + items: [ + { slug: 'design/organization' }, + { slug: 'design/clients' }, + { slug: 'design/ai' }, + ], + }, + { + label: 'Threat Model', + items: [ + { slug: 'design/threat-model' }, + { slug: 'design/threat-model/scenarios' }, + { slug: 'design/threat-model/schema-rules' }, + { slug: 'design/threat-model/validation' }, + ], + }, + ], }, { label: 'Development', diff --git a/capsule-docs/src/content/docs/design/ai.md b/capsule-docs/src/content/docs/design/ai.md index 4adb697..5cddc71 100644 --- a/capsule-docs/src/content/docs/design/ai.md +++ b/capsule-docs/src/content/docs/design/ai.md @@ -1,67 +1,131 @@ --- -title: AI/ML Integrations in Capsule -description: How do AI features fit into Capsule's architecture and design principles? +title: AI/ML Integrations +description: AI feature architecture, the canonical model inventory, embedding provenance, and AI/user metadata separation --- -> **Status:** Details below are **provisional** pending experimentation. The structure of categories, the namespace separation in [AI Output Containment](#ai-output-containment), and the canonical-model invariant from [ML Models — Embedding Provenance](/design/ml-models/#embedding-provenance) are stable; the specific feature list and per-feature behavior may evolve. +Capsule runs a hierarchy of ML models, all **client-side** (the server never holds plaintext). The stable contract is the *structure*: three functional categories, the AI/user namespace separation in [AI Output Containment](#ai-output-containment), the canonical model inventory in [Models and Algorithms](#models-and-algorithms), and the [embedding-provenance](#embedding-provenance) invariant. The specific feature list and per-model choices are current defaults that will evolve with field testing. 
-Capsule runs a hierarchy of ML models for various tasks. The E2E nature of Capsule's architecture requires careful consideration of device capabilities and latency requirements for different features. We broadly categorize the AI/ML processing into three functions: +The three categories: -- **[Semantic Indexing](#semantic-indexing):** Generate a *global* embedding for each asset to enable natural language search and similarity search. -- **[Dense Tagging](#dense-tagging):** Generate *local* embeddings for objects, faces, and background elements to enable granular search and auto-album generation. -- **[Quality Assessment](#quality-assessment):** Generate quality scores for each asset to enable quality-based filtering and sorting. +- **[Semantic Indexing](#semantic-indexing):** a *global* embedding per asset for natural-language and similarity search. +- **[Dense Tagging](#dense-tagging):** *local* embeddings for objects, faces, and scene elements for granular search and auto-albums. +- **[Quality Assessment](#quality-assessment):** per-asset quality scores for filtering and sorting. -Additional AI/ML categories may be added; the canonical inventory is [ML Models](/design/ml-models/). +Inference orchestration lives in `capsule-core::ml`; per-platform model runners (CoreML, NNAPI, ONNX Runtime) live in `capsule-sdk`; the local vector index lives in `capsule-core::db` (SQLite + `sqlite-vec`). ## AI Output Containment -AI inference can be wrong, biased, or hallucinatory. A core design rule prevents AI output from corrupting user intent: **AI outputs land in a separate namespace from user-authored metadata, structurally, not by policy.** +AI inference can be wrong, biased, or hallucinatory. 
A core rule prevents it from corrupting user intent: **AI outputs land in a separate namespace from user-authored metadata, structurally, not by policy.** The shape of the separation — `tags_ai` vs `tags_user` OR-sets, plus distinct sidecar fields for AI-derived facets — is owned by [Metadata — Tag Provenance and Namespacing](/design/metadata/#tag-provenance-and-namespacing); the consequences for AI features: -- AI-suggested tags live in `tags_ai` (a separate OR-set from `tags_user`) — see [Metadata — Tag Provenance and Namespacing](/design/metadata/#tag-provenance-and-namespacing). An AI tag can never overwrite a user tag because they are different fields. -- AI-derived face identities, scene labels, and quality scores live in distinct sidecar fields (e.g. `ai_face_labels`, `ai_scene`, `ai_quality_score`) that the user does not directly edit; user corrections write to *user* fields and AI re-runs leave the user fields alone. -- Every AI output entry carries `model_id` and `model_version` (see [ML Models — Embedding Provenance](/design/ml-models/#embedding-provenance)). When the canonical model for that slot changes, old AI outputs are flagged as stale and excluded from queries until regenerated. -- Promoting an AI tag to a user tag is an explicit, signed lifecycle operation — never automatic, never silent. See [Authorization — The Closed Action Set](/design/authorization/#the-closed-action-set). +- An AI tag can never overwrite a user tag — they live in different fields, so the question does not arise. +- As each AI facet ships, it lands in its own AI-namespaced sidecar field the user does not directly edit (illustratively `ai_face_labels`, `ai_scene`, `ai_quality_score`) — reserved alongside `tags_ai` in the [sidecar schema](/design/metadata/#sidecar-schema-v1) and added when the feature is committed, never overlapping a user field. User corrections write to *user* fields; AI re-runs leave them alone. 
+- Every AI output carries `(model_id, model_version)` ([Embedding Provenance](#embedding-provenance)). When the canonical model for a slot changes, old outputs are flagged stale and excluded from queries until regenerated. +- Promoting an AI tag to a user tag is an explicit, signed [lifecycle operation](/design/authorization/#the-closed-action-set) — never automatic. -A hallucinating model can pollute its own namespace; it cannot pollute user intent. This is the structural defense against the "AI mistake silently overwrites user-authored data" damage class — see [Threat Model — Forbidden Client Behaviors](/design/threat-model/#forbidden-client-behaviors). +A hallucinating model can pollute its own namespace, never user intent. This is the structural defense against the "AI mistake silently overwrites user data" damage class — see [Threat Model — Forbidden Client Behaviors](/design/threat-model/schema-rules/#forbidden-client-behaviors). ## Semantic Indexing -To do semantic search, you convert an image and a text query into arrays of numbers (vectors) and measure the distance between them. Every embedding model maps the universe differently, and Capsule is end-to-end encrypted, so every device must run the *same* embedding model — vectors are otherwise incomparable across devices. The canonical model for this slot is declared in [ML Models](/design/ml-models/) (see the **Semantic Search** row). +Semantic search converts an image and a text query into vectors and measures their distance. Because embeddings are generated client-side, every device must run the same canonical model along a deterministic path so vectors are comparable — the constraint and its platform-partition fallback are specified in [Embedding Provenance](#embedding-provenance). ### Image Categorization & Tagging -We reuse the semantic embeddings for zero-shot classification to generate tags. This enables faceted search and auto-album generation without a separate classifier model. 
+The semantic embeddings are reused for zero-shot classification to generate tags, enabling faceted search and auto-album generation without a separate classifier. ## Dense Tagging -We have the following ordering of operations: - -- Face Detection & Matching (Clustering): see the **Face Detection** and **Face Recognition** rows in [ML Models](/design/ml-models/). The chosen detector and embedder are SOTA-small models that run near-instantly on mobile devices. - - +Face Detection & Matching (clustering) runs the **Face Detection** and **Face Recognition** rows of the [model inventory](#models-and-algorithms) — SOTA-small models that run near-instantly on mobile. ## Quality Assessment -TODO +Deferred to post-v1. The category and its sidecar fields are reserved in the [containment model](#ai-output-containment) so it can land later without a schema change; the Quality candidate models in the [inventory](#models-and-algorithms) are not part of the v1 pipeline. ## Model Batching -Memory is at a premium in mobile devices. We want to be as power-efficient as possible while fulfilling the computational needs of the models. As such, we batch the execution of models in the following ways: +On-device inference is memory- and power-bound, so execution mode is chosen per device: -- Horizontal Batching (model-by-model): Run each model sequentially across all assets. This minimizes the number of models that need to be loaded in memory at once but it incurs lots of IO (since you are reading assets multiple times). -- Vertical Batching (end-to-end): Run all models at once for each asset. This minimizes IO but it is memory intensive since you need to load all models at once, and may result in OOM killing the application process (on mobile OSes). +- **Horizontal (model-by-model)** vs. **vertical (all models per asset)**: horizontal minimizes resident models at the cost of re-reading assets; vertical minimizes I/O but risks OOM on mobile. 
The mode is picked from available RAM at task start. +- **Micro-batching** (1/4/8 images) keeps the NPU cache hot; **INT8/FP16 quantization** cuts memory bandwidth relative to FP32 (roughly half for FP16, a quarter for INT8); **thermal throttling** pauses the pipeline past a temperature threshold (e.g. 40 °C) so the OS does not kill the app. -We pick the execution model with the following process: +## Database Indexing and View Generation -- Calculate RAM capacity upfront: Upon starting the task, check the device's available memory. Decide dynamically whether to use Horizontal or Vertical batching based on the device's resources. -- Enforce Micro-Batching: Never pass a massive batch to the inference engine. Break your "huge batch" down into micro-batches of 1, 4, or 8 images. This keeps the NPU cache hot and prevents battery-draining DRAM fetches. -- Quantize everything: Ensure your models are quantized to INT8 or FP16. This halves the memory bandwidth required, which directly translates to less battery consumed and less heat generated. -- Throttle based on thermals: Modern mobile APIs allow you to monitor device temperature. If the device hits 40°C, artificially pause the pipeline for a few seconds. A slightly slower job is better than the OS terminating your app or the hardware thermal-throttling your speeds to a crawl. +Embeddings share a common vector space and are stored locally in **SQLite + `sqlite-vec`**. The vector index is **derived state, not a source of truth** ([recovery-first](/design/principles/)): if lost or corrupted it is rebuilt by re-running inference over the originals — the same path a model-version bump takes ([Embedding Provenance](#embedding-provenance)). -## Database Indexing and View Generation +## Embedding Provenance -Since each model (except for a few) generate embeddings in a common vector space, we store them locally in a database. We use SQLite + `sqlite-vec`.
+Every embedding Capsule stores — in the local SQLite vector index, in an encrypted backup, or inside a [`DerivativeManifest`](/design/cryptography/provenance/#derivative-provenance) for an embedding-class derivative — carries the tuple `(model_id, model_version)` identifying which [inventory](#models-and-algorithms) row produced it. Vector spaces differ across pairs, so embeddings are not comparable across `(model_id, model_version)`. Every `model_id` is declared in exactly one inventory row ([SSoT](/design/principles/#single-source-of-truth)); a swap is a one-row edit that propagates by `model_id` to every consumer. The invariant: + +- The vector index **refuses inserts** whose `model_id` is not the current canonical row for its task. A buggy or new client uploading embeddings from an unrecognized model is rejected at the insert API, never silently mixed in. +- A model swap increments `model_version` for that task. Old embeddings are **flagged stale** and excluded from queries until regenerated from the originals. Cross-version comparison is forbidden — see [Threat Model — Client-Side Validation Invariants](/design/threat-model/validation/#client-side-validation-invariants). +- Regeneration is a background task that walks the library producing fresh embeddings at the new version; old entries are removed only after new ones persist (per-asset replace, not a global truncate-and-rebuild). + +**E2EE constraint and its fallback.** Comparable embeddings need byte-identical inference output across heterogeneous NPUs/CPUs, and floating-point inference is not inherently reproducible across execution providers. So every device pins a **deterministic execution path** for the canonical model (fixed operator set, nondeterministic kernels disabled, quantized weights) and passes a byte-identical known-answer check; the model size floor is the lowest-end device Capsule supports, not the desktop. 
If a device cannot reach bit-exactness in field testing, the **fallback is explicit, never silent**: its embeddings are tagged with a `platform` discriminator and are **not merged** into another platform's index — they are regenerated locally and compared only within their own partition via tolerance-based ANN. The worst case is duplicated per-platform regeneration, never wrong search results. This defeats the "silent invalidation of the vector index" damage class ([Scenario Map](/design/threat-model/scenarios/#damage-scenario--invariant-map) row #14). ## Models and Algorithms -The concrete model chosen for each task, and the key algorithms that combine them, are catalogued in [ML Models and Algorithms](/design/ml-models/). +One row per task. Where the size/accuracy trade allows, a single backbone is **reused across tasks** rather than loading one model per task — e.g. the canonical Semantic Search embedder also feeds zero-shot tagging and semantic-duplicate detection, and YOLOv10 serves both person and object detection. Reuse is the default whenever it does not measurably hurt quality; it is the main lever bounding peak VRAM on mobile. + +### v1-Committed Slots + +These four are the launch pipeline: + +| Task | Model(s) | Function | +| -------------------- | ---------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- | +| **Semantic Search** | MobileCLIP-B (ONNX, INT8); quantized SigLIP-tiny fallback[^semantic-alt] | Global image embedding for natural-language + similarity search; sized for the lowest-end device (see the E2EE constraint above). | +| **Object Detection** | YOLOv10[^objdet-alt] | Object/background detection feeding dense tagging; the backbone is reused for person detection. | +| **Face Detection** | SCRFD | Efficient face bounding-box + landmark detection. 
| +| **Face Recognition** | InsightFace (AdaFace) | Face embeddings; AdaFace handles low-quality/dark images well. | + +### Candidate Tasks (post-v1) + +Planned tasks whose model choice is still subject to 2026 field testing. Each commits to a full inventory row (with datasets and the embedding-provenance tuple) when it ships: + +- **Natural language & VLM** — Dense Tagging & OCR (Florence-2); Image Chat (Qwen2.5-VL or LLaVA-1.6); Captioning (BLIP-2). +- **People** — Person Detection (YOLOv10); Person Re-ID (OSNet or TransReID); Expression Analysis (EmotioNet); Quality Scoring (LIQE / TOPIQ). +- **Scene** — Scene Classification (ViT-L, ConvNeXt-L); Landmark Detection (DINOv2 + GeM); Bird/plant (BioCLIP); General animal (YOLOv8 fine-tuned); Screenshot detection (custom CNN). +- **Text & audio** — OCR (TrOCR); Voice Transcription (Distil-Whisper-large[^asr-alt]). +- **Quality** — Aesthetic (NIMA); Blur (Laplacian variance + CNN); Exposure (CNN regressor); Noise (CNN regressor). +- **Similarity** — Near-duplicate / burst (pHash/dHash + CNN); Semantic near-duplicate (canonical Semantic Search embeddings + ANN); Best-shot selection (quality models combined). +- **Video** — Shot/scene boundary (TransNet v2, PySceneDetect); Highlight extraction (temporal attention + quality score); Action recognition (VideoMAE, TimeSformer). +- **Categorization** — NSFW (OpenCLIP or custom CNN); Violence / graphic content (ViT classifier), e.g. for shared-album flagging. + +[^semantic-alt]: Considered and rejected: SigLIP-so400m (~400M params, impractical on the lowest-end mobile we support — the E2EE constraint forces every device to run the same model), full CLIP ViT-L/14 (similar size class), OpenCLIP ViT-G (much larger). MobileCLIP-B is the size sweet spot; quantized SigLIP-tiny stays as a fallback if MobileCLIP semantic quality is insufficient in field tests. 
+[^objdet-alt]: Considered and rejected for the *committed* slot: Grounding DINO (open-vocabulary; heavier; revisit if dense-tagging breadth becomes the bottleneck), RT-DETR (transformer-based; comparable accuracy, slower on mobile). YOLOv10 is the committed choice; alternatives may run as additional specialized passes later. +[^asr-alt]: Considered and rejected: Whisper-large-v3 (best accuracy but too slow on mobile for opportunistic background transcription), Whisper-medium (similar speed to Distil-Whisper-large but worse accuracy), faster-whisper CT2 ports (a runtime optimization layer; can be applied on top of Distil-Whisper). + +### Key Algorithmic Implementations + +#### Video-as-Sparse-Photos + +Processing every frame through heavy models is prohibitive, so video is treated as a sparse collection of keyframes: + +1. **Cut Detection:** PySceneDetect (content-aware) chunks the video into visually distinct scenes. +2. **Temporal Sampling:** extract frames at the 10%, 50%, and 90% timestamps of each scene. +3. **Blur Rejection:** compute the variance of the Laplacian $V = \text{var}(\nabla^2 I)$; frames below a threshold are discarded as too blurry. +4. **Audio Processing:** run the canonical ASR model (the **Voice Transcription** row) concurrently for a timestamped transcript. +5. **Integration:** surviving keyframes enter the standard image queue; records map keyframe embeddings to the parent `video_id` and timestamp. + +#### Re-ID & Pseudo-Labeling + +Identifies individuals even when they turn away from the camera during an event: + +1. **Anchor Pass:** on a high-confidence frontal face, run InsightFace; if it matches a known profile (e.g. "Bride"), record the bounding box. +2. **Body Pass:** run YOLOv10 to find "person" boxes; pass crops through OSNet for a 512-dim body embedding. +3. **Linking:** if the face/body box IoU $> 0.7$, link the body embedding to the profile for this event. +4. 
**Pseudo-Labeling:** for a person facing away, compare the body embedding against event-specific embeddings via cosine similarity $\text{sim}(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \|\mathbf{v}\|}$; above threshold, apply the label. + +#### High-Dimensional Vector Search + +Exact KNN is too slow at millions of rows: use **HNSW** indexes on the vector columns, and the inner-product operator (`<#>`) for normalized embeddings (cheaper than $L_2$ or cosine at scale). + +## Validation + +- **Registry lookup (unit).** Each canonical `model_id` matches exactly one inventory row; non-canonical IDs are rejected at the insert API. +- **Stale-model rejection / version bump (unit).** Swap a `model_id`/bump `model_version`; assert pre-swap entries are flagged stale and excluded from queries, and background regen replaces them per-asset (not via global truncate). +- **Embedding-provenance round-trip (unit).** Insert an embedding tagged `(model_id, model_version)`; query; assert the tuple is preserved. +- **Namespace separation (unit).** Promote an AI tag to a user tag; assert the user-tag entry has a fresh user-scoped `add_id` and the AI entry remains separately editable. +- **Inference parity across devices (smoke per platform).** Run the canonical Semantic Search model on two devices over the same fixture; assert vectors are byte-identical (quantization-permitting). On a platform that cannot reach bit-exactness, assert the platform-partition fallback engages rather than silently merging incomparable vectors. +- **Algorithm correctness (smoke).** Video-as-sparse-photos selects keyframes at expected timestamps; the Re-ID loop produces expected per-event pseudo-labels. +- **Thermal throttle / batching bound (smoke).** Past the temperature threshold the pipeline pauses and resumes after cooldown; on a low-memory testbed micro-batch sizes stay within the ceiling and OOM never occurs. 
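The registry-lookup, stale-model and provenance round-trip cases above can be sketched against an in-memory stand-in for the vector index. This is a minimal illustration, not Capsule's implementation: `VectorIndex`, `Provenance`, `InsertError`, `set_canonical` and `queryable` are hypothetical names, and comparing the full `(model_id, model_version)` tuple at insert time is an assumption (the text only requires the `model_id` to be canonical).

```rust
use std::collections::HashMap;

/// Provenance tuple carried by every stored embedding.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Provenance {
    pub model_id: String,
    pub model_version: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum InsertError {
    /// The tuple is not the canonical one for the task (assumed rule).
    NonCanonicalModel,
}

/// Hypothetical in-memory stand-in for the SQLite-backed vector index.
#[derive(Default)]
pub struct VectorIndex {
    /// task name -> canonical (model_id, model_version)
    canonical: HashMap<String, Provenance>,
    /// (task, asset_id) -> (provenance, vector)
    entries: HashMap<(String, u64), (Provenance, Vec<f32>)>,
}

impl VectorIndex {
    /// Declare or swap the canonical model for a task.
    pub fn set_canonical(&mut self, task: &str, model_id: &str, model_version: u32) {
        let prov = Provenance { model_id: model_id.to_string(), model_version };
        self.canonical.insert(task.to_string(), prov);
    }

    /// Refuse inserts that do not carry the canonical tuple for the task.
    /// Inserting for an existing asset replaces only that asset's entry.
    pub fn insert(
        &mut self,
        task: &str,
        asset_id: u64,
        prov: Provenance,
        vector: Vec<f32>,
    ) -> Result<(), InsertError> {
        match self.canonical.get(task) {
            Some(current) if *current == prov => {
                self.entries.insert((task.to_string(), asset_id), (prov, vector));
                Ok(())
            }
            _ => Err(InsertError::NonCanonicalModel),
        }
    }

    /// Asset ids whose embeddings are current; stale entries are skipped.
    pub fn queryable(&self, task: &str) -> Vec<u64> {
        let Some(current) = self.canonical.get(task) else {
            return Vec::new();
        };
        let mut ids: Vec<u64> = self
            .entries
            .iter()
            .filter(|((t, _), (prov, _))| t.as_str() == task && prov == current)
            .map(|((_, id), _)| *id)
            .collect();
        ids.sort_unstable();
        ids
    }
}
```

In this sketch a version bump leaves old entries in place but hides them from `queryable` until a per-asset re-insert at the new version replaces them, which mirrors the "flag stale, then regenerate" order described above.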
+ +Two bounded E2E cases live in [Module Map](/design/module-map/#e2e-test-surface): index an asset → query semantically → match, and model regen after a version bump rebuilding the index. diff --git a/capsule-docs/src/content/docs/design/authentication.md b/capsule-docs/src/content/docs/design/authentication.md index bb4f664..caac1e9 100644 --- a/capsule-docs/src/content/docs/design/authentication.md +++ b/capsule-docs/src/content/docs/design/authentication.md @@ -1,90 +1,104 @@ --- title: Authentication -description: Authentication design +description: Identity, account portability, session and access tokens --- -Authentication is executed with a few key principles: +Authentication binds a user identity to their master key, which is the root of every encryption and decryption operation in Capsule. The server can prove "this request is from a session it issued" but cannot prove "this user is who they say they are" — the master key, owned client-side, is the actual identity root. Everything below works to keep that binding intact through the lifetime of a session and across server moves. -- Minimal surface: We implement the full OpenID Connect specification so identity is offloaded to an external provider. -- Cryptographic binding: We cryptographically bind the user's identity to their master key, which is the root of all encryption and decryption operations. This ensures that only authenticated users can access their encrypted assets, and the server never has access to the plaintext master key. +Implemented in `capsule-api-auth`: OIDC handler (`oidc`), session ledger (`session`), claim validation (`claims`), per-device records (`devices`). The session token format and the OIDC discovery surface below are the contracts other components — including federated peers — depend on. 
-## Authentication API +## Design Principles -We have a few parts: - -- OpenIDC endpoints: These facilitate authentication flows -- Identity and discovery: We expose standardized endpoints for clients to discover the authentication capabilities and endpoints of the server. See [Identity and Discovery](#identity-and-discovery) for details. -- Session management: Clients are given a permanent session secret (note this is not a JWT token) which permanently identifies the client to the server. See [Session Management](#session-management) for details. +- **Minimal surface.** The full OpenID Connect specification is implemented so identity is offloaded to an external provider where the user prefers it. +- **Cryptographic binding.** The user's identity is cryptographically bound to their master key. The server never sees the plaintext master key. ## Account Types -- **Registered accounts:** These accounts are associated with a unique identity and have their own master key. They can be authenticated using password+TOTP or passkeys, which cryptographically bind the user to their master key. -- **Delegated/Sponsored accounts:** These accounts are encrypted with keys derived from a registered account's master key. They do not have their own identity and rely on the registered account for authentication and key management. Owners of the sponsored account have full access to the sponsored account. -- **Non-registered accounts:** These accounts do not have an associated identity or master key. They are typically used for share links, where the decryption keys are encapsulated around the secret stored. +- **Registered accounts.** Associated with a unique identity and have their own master key. Authenticated using password+TOTP or passkeys, which cryptographically bind the user to their master key. +- **Delegated/sponsored accounts.** Encrypted with keys derived from a registered account's master key. 
They do not have their own identity and rely on the registered account for authentication and key management. Owners of the sponsored account have full access. See [Cryptography — Keys: Delegated/Sponsored accounts](/design/cryptography/keys/#delegatedsponsored-accounts) for the key derivation. +- **Non-registered accounts.** No associated identity or master key — typically used for [share links](/design/share-links/), where the decryption keys are encapsulated around the secret stored in the link. ## Identity and Discovery -We borrow from Matrix 2.0's patterns, with one critical departure: **`.well-known/` never enumerates the user list**. A federated setting where a peer can list every user on a server is unacceptable — both from an abuse-surface perspective (spam, harassment-target discovery, account-enumeration attacks) and a privacy perspective. - - +Patterns borrowed from Matrix 2.0, with one critical departure: **`.well-known/` never enumerates the user list**. A federated setting where a peer can list every user on a server is unacceptable — both from an abuse-surface perspective (spam, harassment-target discovery, account-enumeration attacks) and a privacy perspective. -- All users have a handle like `user@yourserver.tld` (this resembles Matrix's MXID pattern). -- `.well-known/capsule/server-info` is **public** and returns only server-scoped facts: the API base URL, the auth endpoints, the federation endpoint, the server's signing key, supported `protocol_version` range, and a list of `min_protocol_version` cutoffs for active deprecation windows. It **never** returns a user list. -- **User lookup is authenticated.** A client or a peer server must present credentials to resolve `user@server.tld`: +- All users have a handle like `user@yourserver.tld` (resembling Matrix's MXID pattern). 
+- `.well-known/capsule/server-info` is **public** and returns only server-scoped facts: the API base URL, auth endpoints, the federation endpoint, the server's signing key, supported `protocol_version` range, and `min_protocol_version` cutoffs for active deprecation windows. It **never** returns a user list. +- **User lookup is authenticated.** A client or peer server must present credentials to resolve `user@server.tld`: - **Local client lookup** (resolving another user on the same server, e.g. for sharing): authenticated by the looker's session token. - **Federated peer lookup** (resolving a user across servers): authenticated by a federation capability token (see [Federation — Federation Capabilities](/design/federation/#federation-capabilities)) and rate-limited per peer. - - **Anonymous WebFinger**: returns only records the target user has explicitly opted into making public. The default is opt-out: no anonymous record. This is deliberately stricter than Matrix's default and follows the [deny-by-default rule](/design/threat-model/#schema-evolution-and-field-grammar) from the threat model. + - **Anonymous WebFinger**: returns only records the target user has explicitly opted into making public. Publication is opt-in: by default no anonymous record exists. This is deliberately stricter than Matrix's default and follows the [deny-by-default rule](/design/threat-model/schema-rules/#schema-evolution-and-field-grammar) from the threat model. ## Account Portability -A user must be able to move servers without losing their identity. Capsule does **not** need a separate DID system for this: the user identity key (User IK — see [Key Management](/design/cryptography/#user-identity-keys-user-iks)) is *already* a server-independent root of trust. Only the `user@server.tld` handle is host-bound. +A user must be able to move servers without losing their identity.
Capsule does **not** need a separate DID system: the user identity key (User IK — see [Cryptography — Keys](/design/cryptography/keys/#user-identity-keys-user-iks)) is *already* a server-independent root of trust. Only the `user@server.tld` handle is host-bound. -Migration therefore re-homes the handle while keeping the same IK: +Migration re-homes the handle while keeping the same IK: -- The new server registers the account under the same IK; nothing in the [key hierarchy](/design/cryptography/#key-management) changes. -- The old server publishes an IK-signed **"moved" certificate** at its `.well-known/` path, naming the new handle. This is the one well-known record that names a specific user — it is also opted-into (the user actively migrates) and carries the user's own signature, so it does not constitute the kind of enumeration leak we forbid. -- Clients and [federated](/design/federation/) peers that resolve the old handle follow the certificate, verifying the IK signature, and update to the new location. +- The new server registers the account under the same IK; nothing in the [key hierarchy](/design/cryptography/keys/) changes. +- The old server publishes an IK-signed **"moved" certificate** at its `.well-known/` path, naming the new handle. This is the one well-known record that names a specific user — opted-into (the user actively migrates) and carrying the user's own signature, so it does not constitute the kind of enumeration leak we forbid. +- Clients and [federated](/design/federation/) peers that resolve the old handle fetch this certificate, verify its IK signature, and re-resolve to the new handle it names. Because the IK signs the move and every device cross-signs to that IK, no server — old or new — can forge a migration or hijack the handle. -## Session Management +## Session and Access Tokens + +These are the two token shapes consumers depend on. Both are issued by `capsule-api-auth::session` after a successful authentication ceremony. 
### Session ID -All sessions are identified by a session ID with an associated [session token](#session-tokens). The session ID is a UUIDv7 that is generated by the server upon successful authentication and is used to track the session state and associated metadata. +Sessions are identified by a UUIDv7 generated by the server upon successful authentication. It tracks session state and associated metadata. + +### Session Token + +A long-lived **128-bit secret** generated by the server upon successful authentication and stored securely on the client. It is **not a JWT** — it is an opaque bearer secret. The session token's only purpose is to obtain [access tokens](#access-token) for API requests. -### Session Tokens +### Access Token -Session tokens are a long-lived 128-bit secret that is generated by the server upon successful authentication and stored securely on the client. The session token is used to obtain an [access token](#access-tokens) for more frequent API requests. +Short-lived tokens derived from the session token, used to authenticate API requests. They have a limited lifespan and are refreshed using the session token without re-authenticating the user. -### Session Expiry and Revocation +Capsule uses **EdDSA JWTs** as access tokens, signed under the server's Ed25519 signing key — classical only, per the [operational-signature carve-out](/design/cryptography/primitives/#signature-scheme) (access tokens are short-lived, so PQ hybridization buys no margin). + +## Session Expiry and Revocation Sessions expire in two ways: **sliding inactivity expiry** (automatic) and **explicit revocation** (user-initiated). They coexist; either causes the session token to stop being honored. -#### Sliding inactivity expiry +### Sliding Inactivity Expiry -A session that has not been used for **180 days** (default; deployment-configurable) expires automatically. 
"Used" means a successful [access-token](#access-tokens) issuance against the session token — each issuance refreshes the inactivity clock. This bounds the lifetime of a session on a device the user has forgotten about (a phone in a drawer, a laptop given to a relative) without forcing re-authentication on actively-used devices. +A session that has not been used for **180 days** (default; deployment-configurable) expires automatically. "Used" means a successful [access-token](#access-token) issuance against the session token — each issuance refreshes the inactivity clock. This bounds the lifetime of a session on a device the user has forgotten about (a phone in a drawer, a laptop given to a relative) without forcing re-authentication on actively-used devices. -#### Hard expiry +### Hard Expiry -In addition to the sliding inactivity expiry, every session token has a **hard expiry of 365 days** from issuance (default; deployment-configurable). The hard expiry **does not reset** on use — it is the upper bound on the lifetime of a token regardless of activity. +Every session token has a **hard expiry of 365 days** from issuance (default; deployment-configurable). The hard expiry **does not reset** on use — it is the upper bound on the lifetime of a token regardless of activity. -The rationale is the malicious-keyholder class from [Threat Model — Client Class Taxonomy](/design/threat-model/#client-class-taxonomy): an attacker who silently exfiltrates a session token from a device the user actively uses would otherwise have an indefinite window of access. The hard expiry caps that window at one year; the user re-authenticates (passkey / password+TOTP) at most once a year per device, which is acceptable friction in exchange for a bounded leak-window. 
+The rationale is the malicious-keyholder class from [Threat Model — Client Class Taxonomy](/design/threat-model/#client-class-taxonomy): an attacker who silently exfiltrates a session token from a device the user actively uses would otherwise have an indefinite window of access. The hard expiry caps that window at one year; the user re-authenticates (passkey / password+TOTP) at most once a year per device — acceptable friction in exchange for a bounded leak-window. Both expiries are enforced server-side at access-token issuance; the session token itself is not invalidated for any other reason than these expiries or an explicit revoke. -#### Explicit revocation +### Explicit Revocation -A common user session ledger is used with the following capabilities for any authenticated sessions: +A common user session ledger supports: -1. List all active sessions (with last-used timestamp, so an expiring session is visible). +1. **List all active sessions** (with last-used timestamp, so an expiring session is visible). 2. **Revoke any single session** by invalidating its session token — authenticated by any active session token. -3. **Revoke all sessions at once** (e.g. "log out of all devices") — authenticated by **proof of master-key possession** (a signature with the user's IK over a server-issued challenge), not by an active session token. +3. **Revoke all sessions at once** ("log out of all devices") — authenticated by **proof of master-key possession** (a signature with the user's IK over a server-issued challenge), not by an active session token. The asymmetric authentication on (3) addresses a damage scenario that pure session-token auth opens up: an attacker holding a stolen session token could otherwise invoke "log out of all devices" and lock the legitimate user out of every other device. Requiring master-key proof for the global revoke means an attacker with a session token can only revoke *that* session — they cannot escalate to denial-of-service. 
A user who has lost their master key is no worse off: they can still revoke individual sessions one at a time. The single-session revoke (2) is the everyday tool; the global revoke (3) is the nuclear option, gated accordingly. -Note: Server can theoretically just kick off sessions because session tokens are stored server-side and server holds the encrypted data. But this should not ever be implemented and an attempt to do so would be a bug — it bypasses the audit trail of a user-initiated revoke. +Note: the server could in principle terminate sessions unilaterally, because session tokens are stored server-side and the server holds the encrypted data. This must never be implemented, and any attempt to do so is a bug, because it bypasses the audit trail of a user-initiated revoke. + +## Validation + +- **Token issuance round-trip (unit).** Generate a session token; issue an access JWT from it; verify the JWT under the server's Ed25519 key. Repeat with rotated keys; assert old JWTs verify under the old key for their grace window. +- **Expiry enforcement (unit).** Mock the clock; assert sliding expiry refreshes on use, hard expiry does not. Assert an expired token is rejected at access-token issuance, not earlier or later. +- **Revoke-all master-key proof (unit).** Issue a revoke-all without master-key proof; assert rejection. With proof; assert success and invalidation of every other session. +- **Login flow (smoke).** Full OIDC handshake against a testcontainer IdP; assert session token issued, persisted, and usable for an immediate access-token request. Re-run after a server restart; assert resilience. +- **Account portability (smoke).** Issue a moved certificate from server A; assert server B can register the same IK; assert federated peers honor the move after fetching A's well-known. + +The cross-module case — auth → query library schema — is one bounded E2E test listed in [Module Map](/design/module-map/#e2e-test-surface).
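The expiry-enforcement case above (mocked clock, sliding expiry refreshes on use, hard expiry does not) can be sketched as follows. This is an illustration under stated assumptions, not the `capsule-api-auth::session` code: `Session`, `SessionError` and `authorize_issuance` are hypothetical names, time is plain Unix seconds passed in by the caller, and treating the boundary as inclusive (`>=`) is an assumption.

```rust
/// Default windows from the text, in seconds (deployment-configurable there).
pub const SLIDING_INACTIVITY_SECS: u64 = 180 * 24 * 60 * 60;
pub const HARD_EXPIRY_SECS: u64 = 365 * 24 * 60 * 60;

#[derive(Debug, PartialEq, Eq)]
pub enum SessionError {
    Revoked,
    HardExpired,
    InactivityExpired,
}

/// Hypothetical server-side session record.
pub struct Session {
    issued_at: u64,
    last_used: u64,
    revoked: bool,
}

impl Session {
    pub fn new(now: u64) -> Self {
        Session { issued_at: now, last_used: now, revoked: false }
    }

    /// Explicit, user-initiated revocation.
    pub fn revoke(&mut self) {
        self.revoked = true;
    }

    /// Called at access-token issuance, the only place expiry is enforced.
    /// On success the inactivity clock is refreshed; `issued_at` never moves.
    pub fn authorize_issuance(&mut self, now: u64) -> Result<(), SessionError> {
        if self.revoked {
            return Err(SessionError::Revoked);
        }
        if now.saturating_sub(self.issued_at) >= HARD_EXPIRY_SECS {
            return Err(SessionError::HardExpired);
        }
        if now.saturating_sub(self.last_used) >= SLIDING_INACTIVITY_SECS {
            return Err(SessionError::InactivityExpired);
        }
        self.last_used = now;
        Ok(())
    }
}
```

Passing `now` in explicitly keeps the clock mockable, so a unit test can step through day 100, day 250 and day 365 without sleeping.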
-### Access Tokens +## Related -Access tokens are short-lived tokens derived from the session token that are used for authenticating API requests. They have a limited lifespan and can be refreshed using the session token without requiring the user to re-authenticate. Capsule uses **EdDSA JWTs** as access tokens, signed under the server's [Ed25519 signing key from the cryptographic primitives inventory](/design/cryptography/#signature-scheme) (classical half only — access tokens are short-lived enough that PQ hybridization is not worth the wire-size cost). +- [Authorization](/design/authorization/) — the closed lifecycle-action set every write proves against. +- [Device Enrollment](/design/device-enrollment/) — how a device joins the account and the [device directory](/design/cryptography/keys/#device-directory). +- [Backup & Recovery](/design/backup-recovery/) — recovering the master key and account after device loss. diff --git a/capsule-docs/src/content/docs/design/authorization.md b/capsule-docs/src/content/docs/design/authorization.md index 1156297..f014d7d 100644 --- a/capsule-docs/src/content/docs/design/authorization.md +++ b/capsule-docs/src/content/docs/design/authorization.md @@ -1,50 +1,46 @@ --- title: Authorization -description: Ensuring access is done by someone authorized +description: The closed lifecycle-action set and how every destructive operation is signed and audited --- -We want to pull out all authorization-related logic (validated by both server and client) into a centralized core to minimize implementation risks and isolating sensitive code to enforce authorization end-to-end. Both server and client validate against the same core, so a client cannot be tricked into accepting an operation the server would reject, and vice versa. 
+Authorization in Capsule is **the same proof as a write**: every lifecycle transition — create, replace, delete, metadata-update, derivative add/replace, trash-restore — is an [asset manifest](/design/cryptography/provenance/#asset-manifest) signed under the album's per-epoch write-tier key. There is no weaker path to destroy data than to add it. -## Asset Lifecycle +This rule pulls authorization decisions out of any single trust boundary: the server can refuse to execute (it cannot forge destruction), and the client can refuse to apply (it cannot be tricked by a server-asserted change). The logic lives in two places that share the same verification machinery: `capsule-api-auth::roles` enforces structural envelope checks server-side, and `capsule-core::crypto::provenance` runs the [`verify_asset`](/design/cryptography/keys/#write-authorization) chokepoint client-side. Both pull from the same closed action enum below. -**Key Problem:** Clients may want to destructively delete or replace assets, which servers must execute remotely. We want robust, centralized control over the lifecycle of every asset. +## The Closed Action Set -Capsule treats every lifecycle transition as an authorized, signed, auditable operation. The design reuses the cryptographic machinery already defined for asset writes rather than inventing a parallel mechanism. +Every lifecycle operation's `action` field is one of the following **closed enum**. A value outside this set is a structural error, never a "future value to ignore" — see [Threat Model — Schema Rules](/design/threat-model/schema-rules/#closed-enums): -### The Closed Action Set +| Action | Meaning | +| -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | +| `create` | First write of an asset; `prior_provenance_hash` is `null`. | +| `replace` | Replace the original bytes (e.g. 
re-encryption under a new AMK epoch); same `file_id`/`album_id`, new ciphertext + manifest. | +| `delete` | Soft-delete; the asset enters trash with a [retention window](/design/organization/#retention-window). | +| `metadata-update` | Edit to the encrypted metadata blob or sidecar fields. | +| `derivative-add` | Add a thumbnail, preview, or embedding (see [Cryptography — Derivative Provenance](/design/cryptography/provenance/#derivative-provenance)). | +| `derivative-replace` | Replace an existing derivative — the only authorized path; a silent overwrite is rejected. | +| `trash-restore` | Recover a soft-deleted asset from trash within its retention window. | -Every lifecycle operation is expressed as an [asset manifest](/design/cryptography/#provenance-and-signed-manifest) whose `action` field is one of the following **closed enum** (a value outside this set is a structural error, never a "future value to ignore" — see [Threat Model — Schema Evolution and Field Grammar](/design/threat-model/#schema-evolution-and-field-grammar)): +Adding a value to this enum bumps `protocol_version` and old albums remain pinned to their original set — a faulty or new client cannot inject an unknown action into a v_k album. -| Action | Meaning | -| -------------------- | --------------------------------------------------------------------------------------------------------------------------------------- | -| `create` | First write of an asset; `prior_provenance_hash` is `null`. | -| `replace` | Replace the original bytes (e.g. a re-encryption under a new AMK epoch); identity preserved. | -| `delete` | Soft-delete; the asset enters trash with a [retention window](/design/organization/#recycling). | -| `metadata-update` | Edit to the encrypted metadata blob or sidecar fields. | -| `derivative-add` | Add a thumbnail, preview, LQIP, or embedding (see [Cryptography — Derivative Provenance](/design/cryptography/#derivative-provenance)). 
| -| `derivative-replace` | Replace an existing derivative — the only authorized path; a silent overwrite is rejected. | -| `trash-restore` | Recover a soft-deleted asset from trash within its retention window. | - -Adding a value to this enum bumps `protocol_version` and the old albums remain pinned to their original set — a faulty or new client cannot inject an unknown action into a v_k album. - -### Authorizing a lifecycle operation +## Authorizing a Lifecycle Operation Authorization is established exactly as for a write: - The operation must carry a valid signature under the album's per-epoch **write-tier key** — only writers at that epoch hold it. - It must also carry the device's hybrid `device_sig` for provenance. -- A client acknowledges the operation only after **both** signatures verify through the single [`verify_asset`](/design/cryptography/#write-authorization) chokepoint. -- The manifest's `prior_provenance_hash` must match the asset's current chain head — a stale or forked chain position is rejected (see [Cryptography — Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications)). This applies uniformly to every action except `create`. +- A client acknowledges the operation only after **both** signatures verify through the single [`verify_asset`](/design/cryptography/keys/#write-authorization) chokepoint. +- The manifest's `prior_provenance_hash` must match the asset's current chain head — a stale or forked chain position is rejected (see [Cryptography — Provenance](/design/cryptography/provenance/#provenance-of-library-modifications)). This applies uniformly to every action except `create`. A `delete` or `replace` is therefore authorized by the same proof as the original `create`: there is no weaker path to destroy data than to add it. Similarly, a `derivative-replace` is authorized as strongly as the original `derivative-add` — a buggy client cannot quietly poison a thumbnail. 
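
The structural half of the checks above (closed-enum membership and chain position) can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not the `capsule-core` implementation: the names `Action`, `Reject`, `Hash`, and `check_chain_position` are hypothetical, the 32-byte digest width is assumed, and signature verification (write-tier key and `device_sig`) is deliberately left out.

```rust
/// Closed set of lifecycle actions; the wire names are taken from the action table.
/// The type itself is a hypothetical illustration, not the capsule-core enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Create,
    Replace,
    Delete,
    MetadataUpdate,
    DerivativeAdd,
    DerivativeReplace,
    TrashRestore,
}

/// Hypothetical structural rejection codes.
#[derive(Debug, PartialEq, Eq)]
pub enum Reject {
    UnknownAction(String),
    CreateWithPrior,
    MissingPrior,
    StaleChainHead,
}

impl Action {
    /// Parse a wire value. Anything outside the closed set is an error,
    /// never a "future value to ignore".
    pub fn parse(s: &str) -> Result<Action, Reject> {
        match s {
            "create" => Ok(Action::Create),
            "replace" => Ok(Action::Replace),
            "delete" => Ok(Action::Delete),
            "metadata-update" => Ok(Action::MetadataUpdate),
            "derivative-add" => Ok(Action::DerivativeAdd),
            "derivative-replace" => Ok(Action::DerivativeReplace),
            "trash-restore" => Ok(Action::TrashRestore),
            other => Err(Reject::UnknownAction(other.to_string())),
        }
    }
}

/// Assumed digest width for this sketch only.
pub type Hash = [u8; 32];

/// Check the manifest's `prior_provenance_hash` against the stored chain head.
/// `create` must carry no prior hash; every other action must match the head exactly.
pub fn check_chain_position(
    action: Action,
    prior: Option<Hash>,
    head: Option<Hash>,
) -> Result<(), Reject> {
    match (action, prior) {
        (Action::Create, None) => Ok(()),
        (Action::Create, Some(_)) => Err(Reject::CreateWithPrior),
        (_, None) => Err(Reject::MissingPrior),
        (_, Some(p)) if head == Some(p) => Ok(()),
        (_, Some(_)) => Err(Reject::StaleChainHead),
    }
}
```

In this sketch a stale or forked position surfaces as `Reject::StaleChainHead`, which a caller would route to quarantine rather than drop silently; the real rejection codes and quarantine plumbing are defined elsewhere in the design docs.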
-### The server executes but never authorizes +## The Server Executes But Never Authorizes -Per the principle of [trusting the server for storage, never for authorization](/design/cryptography/#implementation), the server **carries out** a remote delete or replace but is **never** the authority that permits it. A server-asserted lifecycle change with no valid write-tier signature is rejected by every client. This bounds the damage a compromised or buggy server can do: it can refuse to store data, but it cannot forge its destruction. +Per the principle of [trusting the server for storage, never for authorization](/design/cryptography/), the server **carries out** a remote delete or replace but is **never** the authority that permits it. A server-asserted lifecycle change with no valid write-tier signature is rejected by every client. This bounds the damage a compromised or buggy server can do: it can refuse to store data, but it cannot forge its destruction. -That said, the server is not *passive*. Even without keys, it enforces the structural envelope of every manifest before persisting it — `action` is in the closed enum, `prior_provenance_hash` matches the stored chain head, `created_by_device` is in the user's published device directory, the device's hybrid signature is structurally well-formed (correct curve, correct key lengths), `crypto_suite_id` and `protocol_version` match the album's pin, and the timestamp is within the ±30-day window. The full checklist is owned by [Threat Model — Server-Side Validation Invariants](/design/threat-model/#server-side-validation-invariants). A rejection here means no row is written and no provenance record is appended; the rejection itself is logged. +That said, the server is not *passive*. 
Even without keys, it enforces the structural envelope of every manifest before persisting it — `action` is in the closed enum, `prior_provenance_hash` matches the stored chain head, `created_by_device` is in the user's published device directory, the device's hybrid signature is structurally well-formed (correct curve, correct key lengths), `crypto_suite_id` and `protocol_version` match the album's pin, and the `timestamp` passes the [sanity bound](/design/threat-model/schema-rules/#timestamp-grammar). The full checklist is owned by [Threat Model — Server-Side Validation Invariants](/design/threat-model/validation/#server-side-validation-invariants). A rejection here means no row is written and no provenance record is appended; the rejection itself is logged. -### Deletes are soft first +## Deletes Are Soft First Destructive operations are staged, not immediate: @@ -52,12 +48,21 @@ Destructive operations are staged, not immediate: - The retention window is **signed into the delete manifest at delete time**, not server-configured, so a hostile server cannot accelerate or delay a user-configured window (see [Asset Organization — Recycling](/design/organization/#recycling)). - Only after the window expires is the underlying blob hard-purged. A `trash-restore` action issued before expiry returns the asset to the live set and appends another provenance record — recovery is itself audited. -This is the [trash soft-delete recovery path](/design/cryptography/#failure-modes-and-recovery) and gives a reversal window for both buggy and erroneous deletes. +This is the [trash soft-delete recovery path](/design/cryptography/failure-modes/#redundant-recovery-paths) and gives a reversal window for both buggy and erroneous deletes. + +## Every Transition Is Auditable + +Each lifecycle operation emits a [provenance record](/design/cryptography/provenance/#provenance-of-library-modifications) — timestamp, device, client version, and action — anchored by the signed manifest. 
The chain is **append-only** (see [Threat Model — Provenance Immutability Rules](/design/threat-model/scenarios/#provenance-immutability-rules)): even an attacker holding every current key cannot rewrite a past record. This audit trail is what lets an operator distinguish a legitimate delete from a malicious or bug-induced one after the fact. + +## Federated Peers -### Every transition is auditable +A lifecycle operation arriving from a [federated](/design/federation/) peer is subject to the same `verify_asset` check plus the server's structural envelope check; peer-asserted ordering and timestamps are never trusted for authorization. Peer attempts at [stale revival](/design/import/download-sync/#stale-revival-detection) — submitting an old-but-validly-signed manifest to resurrect a deleted asset — are caught by the `prior_provenance_hash` chain check and quarantined. -Each lifecycle operation emits a [provenance record](/design/cryptography/#provenance-of-library-modifications) — timestamp, device, client version, and action — anchored by the signed manifest. The chain is **append-only** (see [Threat Model — Provenance Immutability Rules](/design/threat-model/#provenance-immutability-rules)): even an attacker holding every current key cannot rewrite a past record. This audit trail is what lets an operator distinguish a legitimate delete from a malicious or bug-induced one after the fact. +## Validation -### Federated peers +- **Per-action signing/verify (unit).** Each of the seven actions gets a unit test: build a manifest of that action, sign with the correct (device DSK, epoch write-tier) pair, run `verify_asset`, assert acceptance. Then build the same with the wrong write-tier key, wrong device, missing `prior_provenance_hash`, wrong `prior_provenance_hash`; assert rejection with the right structural code. +- **Closed-enum rejection (unit).** Submit a manifest with `action = "future-action-not-yet-defined"`; assert structural rejection at the envelope layer. 
+- **Stale-chain detection (unit).** Build a delete-then-restore chain; submit a second delete with a stale `prior_provenance_hash`; assert quarantine. +- **Server-side envelope (smoke).** All [Threat Model — Server-Side Validation Invariants](/design/threat-model/validation/#server-side-validation-invariants) items 16–18 (non-upload action manifests) are exercised against a real Postgres instance. -A lifecycle operation arriving from a [federated](/design/federation/) peer is subject to the same `verify_asset` check plus the server's structural envelope check; peer-asserted ordering and timestamps are never trusted for authorization. Peer attempts at [stale revival](/design/import-synchronization/#stale-revival-detection) — submitting an old-but-validly-signed manifest to resurrect a deleted asset — are caught by the `prior_provenance_hash` chain check and quarantined. +The cross-module case — full lifecycle (create → metadata-update → trash → restore → re-delete → hard-purge after retention) across server + client — is a bounded E2E case in [Module Map](/design/module-map/#e2e-test-surface). diff --git a/capsule-docs/src/content/docs/design/backup-recovery.md b/capsule-docs/src/content/docs/design/backup-recovery.md index 24451da..b247b79 100644 --- a/capsule-docs/src/content/docs/design/backup-recovery.md +++ b/capsule-docs/src/content/docs/design/backup-recovery.md @@ -1,53 +1,71 @@ --- title: Backup and Recovery -description: How Capsule backs up libraries and recovers them after device or key loss +description: The portable backup artifact, the master-key escrow, and the recovery flows --- -Capsule treats loss of data — and loss of the keys that decrypt it — as a first-class failure mode to design against. Recovery rests on a single rule: -holding the recovery secret must restore every asset, even after every device is lost. This document consolidates the artifacts and mechanisms that uphold it. 
+Capsule treats loss of data — and loss of the keys that decrypt it — as a first-class failure mode. Recovery rests on a single rule: holding the recovery secret must restore every asset, even after every device is lost. This document defines the two artifacts and the mechanisms that uphold it. Two distinct things are called a "backup" here, and they are kept separate on purpose: -- The **encrypted backup artifact** — a portable, encrypted export of a library's assets. -- The **master-key escrow** — a small server-side blob that lets a passphrase reconstruct the key hierarchy. +- The **[backup artifact](#backup-artifact)** — a portable, encrypted export of a library's assets. +- The **[master-key escrow](#master-key-escrow)** — a small server-side blob that lets a passphrase reconstruct the key hierarchy. + +The artifact format is the contract that backup-restore implementations on every platform must conform to byte-for-byte (otherwise a backup made on one device could not be restored on another). It is implemented in `capsule-core::backup` — export, container assembly, manifest signing, and the inverse restore path — and called by per-platform UI flows. ## Backup Artifact -A backup is a single self-describing, versioned, **streamable** archive containing everything needed to restore a library's assets. It is itself encrypted and kept independent of the device key hierarchy, so recovery does not depend on reconstructing MLS ratchet state (see [Cryptography](/design/cryptography/#failure-modes-and-recovery)). +A backup is a single self-describing, versioned, **streamable** archive containing everything needed to restore a library's assets. It is itself encrypted and kept independent of the device key hierarchy, so recovery does not depend on reconstructing MLS ratchet state (see [Cryptography — Failure Modes](/design/cryptography/failure-modes/)). 
-A backup is an export artifact — not part of the live library or the server blob store — and may be stored locally or on external storage such as hard drives or cloud storage. It is used to restore assets after data loss or when setting up a new device. The format is versioned to allow future improvements and changes without breaking older backups. +A backup is an export artifact — not part of the live library or the server blob store — and may be stored locally or on external storage such as hard drives or cloud storage. It is used to restore assets after data loss or when setting up a new device. The format is versioned to allow future improvements without breaking older backups. ### Container Format The container is an **uncompressed POSIX tar** with deterministic entry ordering and a top-level signed integrity manifest: -- **Uncompressed.** Asset ciphertext is incompressible (it's the output of [AES-256-GCM-STREAM](/design/cryptography/#bulk-aead)); compressing it buys nothing and adds CPU cost. Metadata blobs are likewise encrypted before they hit the archive, so the same applies. +```text +backup.tar +├── VERSION # plaintext: artifact-format version, crypto_suite_id, min_protocol_version +├── MANIFEST.cbor # CBOR: entry list, hashes, sizes, exporter identity; HMAC + hybrid signature +├── keys/ +│ └── amk-ledger.cbor # every album's AMK versions needed to decrypt the included assets, +│ # wrapped under the backup wrap key (derived from the recovery passphrase) +└── + ├── blobs/{hash} # encrypted ciphertext blobs + ├── meta/{blob_id} # encrypted metadata blobs + └── provenance/{asset_id} # full per-asset provenance chains +``` + +The artifact carries its own [AMK](/design/cryptography/keys/#album-master-keys-amks) ledger so it is **self-sufficient**: a holder of the recovery passphrase can decrypt `amk-ledger.cbor` and then every included blob, without contacting the server or reconstructing MLS ratchet state. 
The ledger is wrapped under the backup wrap key (same passphrase-derived key that authenticates `MANIFEST.cbor`), not under any device key — this is what makes the [AMK-completeness check](#backup-verification) a check the artifact can answer about *itself* rather than a promise about a separate server-side blob. + +The container properties below are what make it both safe and portable: + +- **Uncompressed.** Asset ciphertext is incompressible (it is the output of [AES-256-GCM-STREAM](/design/cryptography/primitives/#bulk-aead)); compressing it buys nothing and adds CPU cost. Metadata blobs are likewise encrypted before they hit the archive, so the same applies. - **Streamable.** Tar is append-friendly and has no central directory, so a backup of arbitrary size can be written and read end-to-end without seeking — important when exporting a terabyte-scale library to spinning rust or an external drive. - **Deterministic ordering.** Entries are written in sorted order by `(album_id, asset_id, blob_role)`, so two exports of the same logical content produce byte-identical archives. This lets the integrity manifest's signature verify across re-exports. -- **Top-level integrity manifest.** The first entry is `MANIFEST.cbor` — a CBOR document listing every entry's path, [content hash](/design/cryptography/#primitives-inventory), declared size, and the exporting device's identity. The manifest is authenticated **two ways**: - - An **HMAC** keyed by the backup's wrap key (derived from the user passphrase via the [password-based KDF](/design/cryptography/#password-based-kdf)) catches truncation, reordering, and corruption *before* any decrypt is attempted. - - A **hybrid Ed25519 + ML-DSA-65 signature** from the exporting device's [DSK](/design/cryptography/#device-keys) — the same [signature scheme](/design/cryptography/#signature-scheme) used for asset manifests. 
The signature defeats a symmetric-key attacker who could otherwise re-HMAC after tampering: an attacker who steals the wrap key can re-HMAC but cannot forge the device signature. - Both checks must pass before restore proceeds. The signing device must be present in the user's [device directory](/design/cryptography/#per-user-device-coordination) at restore time; an exporter device that was later revoked is rejected. -- **Versioned.** A `VERSION` entry pins the artifact format version, `crypto_suite_id`, and `min_protocol_version` per [Versioning](/design/versioning/) and [Cryptography — Versioning Identifiers](/design/cryptography/#versioning-identifiers). Older backup artifacts remain restorable by newer Capsule versions; an artifact whose `crypto_suite_id` is not in the current inventory is rejected at restore (per [Threat Model — Schema Evolution](/design/threat-model/#schema-evolution-and-field-grammar)). +- **Top-level integrity manifest.** Written before any blob entry (right after the tiny `VERSION` header), `MANIFEST.cbor` lists every entry's path, [content hash](/design/cryptography/primitives/), declared size, and the exporting device's identity — so a streaming reader holds the full integrity list before the first blob arrives. The manifest is authenticated **two ways**: + - An **HMAC** keyed by the backup's wrap key (derived from the user passphrase via the [password-based KDF](/design/cryptography/primitives/#password-based-kdf)) catches truncation, reordering, and corruption *before* any decrypt is attempted. + - A **hybrid Ed25519 + ML-DSA-65 signature** from the exporting device's [DSK](/design/cryptography/keys/#device-keys) — the same [signature scheme](/design/cryptography/primitives/#signature-scheme) used for asset manifests. The signature defeats a symmetric-key attacker who could otherwise re-HMAC after tampering: an attacker who steals the wrap key can re-HMAC but cannot forge the device signature. 
+ Both checks must pass before restore proceeds. The signing device must be present in the user's [device directory](/design/cryptography/keys/#device-directory) at restore time; an exporter device that was later revoked is rejected. +- **Versioned.** The `VERSION` entry pins the artifact format version, `crypto_suite_id`, and `min_protocol_version` per [Versioning](/design/versioning/) and [Cryptography — Versioning Identifiers](/design/cryptography/primitives/#versioning-identifiers). Older backup artifacts remain restorable by newer Capsule versions; an artifact whose `crypto_suite_id` is not in the current inventory is rejected at restore (per [Threat Model — Schema Rules](/design/threat-model/schema-rules/)). ZIP was considered and rejected: its central-directory-at-end makes streaming writes awkward at terabyte scale, ZIP64 tooling support is inconsistent, and there is no compression benefit to gain from ZIP-internal deflate. ## Master-Key Escrow -The account master key is the single backed-up root of the key hierarchy (see [Cryptography](/design/cryptography/#key-management)). It is escrowed server-side so a user holding only their recovery secret can reconstruct it: +The account master key is the single backed-up root of the key hierarchy (see [Cryptography — Keys](/design/cryptography/keys/)). It is escrowed server-side so a user holding only their recovery secret can reconstruct it: - Wrap the account master key with a user-chosen high-entropy passphrase or a randomly generated 48+ bit recovery code. -- Derive the wrapping key with the [password-based KDF](/design/cryptography/#password-based-kdf). Store the wrapped blob server-side. +- Derive the wrapping key with the [password-based KDF](/design/cryptography/primitives/#password-based-kdf). Store the wrapped blob server-side. - If you can run enclaves (SGX/Nitro/SEV-SNP), do Signal's SVR trick: rate-limit PIN attempts inside the enclave so a weak PIN is still safe. 
Without enclaves, require a real passphrase or recovery code — don't let users pick 4-digit PINs. ## Recovery Mechanisms -Two recovery mechanisms ship by default; a third is available opt-in for users who want extra redundancy without compromising the default's simplicity. +Two recovery mechanisms ship by default; a third is available opt-in for users who want extra redundancy without compromising the default's simplicity. These complement the [seven independent recovery paths](/design/cryptography/failure-modes/#redundant-recovery-paths); this section names the mechanisms a user actually invokes. -### Default mechanisms +### Default Mechanisms - **Recovery passphrase / BIP39-style seed** shown at setup; the user prints it or stores it in a password manager. It unwraps the master-key escrow above. -- **Cross-device recovery** — any existing signed-in device can re-bootstrap a new one over a verified channel. +- **Cross-device recovery** — any existing signed-in device can re-bootstrap a new one over a verified channel. The first-device-ever flow is owned by [Device Enrollment](/design/device-enrollment/). (We need at least two for redundancy; the third below is opt-in to keep the default flow simple.) @@ -56,7 +74,7 @@ Two recovery mechanisms ship by default; a third is available opt-in for users w Users who want to spread recovery across trusted parties or storage locations can enable **Shamir Secret Sharing** of the recovery seed. The default scheme is **2-of-3**: - The recovery seed (the same one that unwraps the master-key escrow) is split into 3 shares; any 2 reconstruct the seed; 1 alone reveals nothing. -- Each share is itself wrapped with a per-share passphrase via the [password-based KDF](/design/cryptography/#password-based-kdf), so storing a share on a less-trusted medium (cloud drive, second device, trusted family member) is safer. 
+- Each share is itself wrapped with a per-share passphrase via the [password-based KDF](/design/cryptography/primitives/#password-based-kdf), so storing a share on a less-trusted medium (cloud drive, second device, trusted family member) is safer. - Reconstruction happens fully client-side. Capsule's server never sees more than one share at a time and never sees a reconstructed seed. - Custom `m`-of-`n` (e.g. 3-of-5 for users who want broader distribution) is supported but not the default. @@ -64,14 +82,31 @@ This is the social-recovery escape hatch — useful for users who would otherwis ## Backup Verification -A restore that overwrites live state silently is the worst foot-gun a backup system can ship. Capsule therefore makes **dry-run the default**: a `restore` invocation runs in dry-run mode unless the user passes an explicit `--commit` flag (or its UI equivalent: a confirm-with-typed-phrase dialog after the dry-run report is shown). The mode hierarchy is: +A restore that overwrites live state silently is the worst foot-gun a backup system can ship. Capsule therefore makes **dry-run the default**: a `restore` invocation runs in dry-run mode unless the user passes an explicit `--commit` flag (or its UI equivalent: a confirm-with-typed-phrase dialog after the dry-run report is shown). The mode hierarchy: - **Preview mode (always safe).** Verify the shape of your content makes sense — counts, sizes, asset titles where readable. No decrypt, no write. - **Dry-run mode (default for `restore`).** Verify everything can be decrypted, matches its hashes, and (as a sanity check) that images and videos decode properly in the [sandboxed decoder](/design/clients/#sandboxed-decoder). Compute the diff against the current live library: what would be added, what would conflict, what would be skipped as already present. No write. 
-- **Signature-chain verification.** Every [asset manifest](/design/cryptography/#provenance-and-signed-manifest) verifies against the published [device directory](/design/cryptography/#per-user-device-coordination), and every device certificate chains to a user IK. The MANIFEST.cbor itself must verify both HMAC and exporter signature (above). Any break is flagged and the restore is refused. -- **AMK completeness check.** Confirm every `amk_version` referenced by an asset is present in the backup, so no asset is silently unrecoverable. -- **Commit (only with explicit consent).** The user reviews the dry-run report and explicitly commits. Even at commit, the restore obeys the [stale-revival defense](/design/import-synchronization/#stale-revival-detection): a restored manifest whose `prior_provenance_hash` conflicts with the live library's current chain head goes to the [quarantine surface](/design/threat-model/#quarantine-surfaces) and the user resolves it explicitly. The interaction between backup restore and the stale-revival defense is flagged as an [open question](/design/threat-model/#open-questions) — the resolution will land here before the docs ship. +- **Signature-chain verification.** Every [asset manifest](/design/cryptography/provenance/#asset-manifest) verifies against the published [device directory](/design/cryptography/keys/#device-directory), and every device certificate chains to a user IK. The MANIFEST.cbor itself must verify both HMAC and exporter signature (above). Any break is flagged and the restore is refused. +- **AMK completeness check.** Decrypt `keys/amk-ledger.cbor` and confirm every `amk_version` referenced by any included asset is present in it, so no asset is silently unrecoverable. Because the ledger ships *inside* the artifact, this check is answerable from the artifact alone — it does not depend on a separate server-side escrow blob that could have drifted. 
+- **Commit (only with explicit consent).** The user reviews the dry-run report and explicitly commits. Even at commit, a restore **never silently overwrites newer local state** — it is a chain-reconciliation, not a blind overwrite. Each restored manifest is reconciled against the live library's `latest_provenance_hash` for its `asset_id`: + - **Identical head** → no-op; already current (restore is idempotent). + - **Live head chains *forward* from the restored head** → the live copy is newer; the older restored manifest is **not applied**. It is surfaced read-only ("an older version exists in this backup") so the user may deliberately roll back, but nothing is overwritten automatically. + - **Divergent, behind, or locally tombstoned at a later step** → **not applied**; the restored manifest goes to the ["restore conflicts" quarantine surface](/design/threat-model/scenarios/#quarantine-surfaces) for explicit user merge. A six-month-old backup therefore cannot resurrect an asset the user later deleted, nor clobber edits made after the backup was taken — directly addressing [Damage Scenario #23](/design/threat-model/scenarios/#damage-scenario--invariant-map). + - **Asset absent locally** → applied directly; the restored provenance chain becomes the local chain. + + This is the committed resolution of the former restore-vs-stale-revival question: the conservative default is that newer local state always wins unless the user explicitly chooses an older version, and no restore is ever silently destructive. ## Backup Provenance -The MANIFEST.cbor carries the exporter's device id, the export timestamp, the source library version, the `crypto_suite_id` at export time, and a list of every provenance-chain head per asset included in the backup. The MANIFEST is itself a [provenance record](/design/cryptography/#provenance-of-library-modifications) at the library level: who exported, when, from what device. 
A successful restore re-injects each per-asset provenance chain into the restored library, so the audit trail survives the round-trip — a restored library knows it was restored, from when, by whom. +The MANIFEST.cbor carries the exporter's device id, the export timestamp, the source library version, the `crypto_suite_id` at export time, and a list of every provenance-chain head per asset included in the backup. The MANIFEST is itself a [provenance record](/design/cryptography/provenance/#provenance-of-library-modifications) at the library level: who exported, when, from what device. A successful restore re-injects each per-asset provenance chain into the restored library, so the audit trail survives the round-trip — a restored library knows it was restored, from when, by whom. + +## Validation + +- **Artifact round-trip (unit).** Export → import a small library; assert byte-equal blob set, sidecars, and provenance chains. Determinism check: re-export the same library twice; assert byte-identical archives. +- **MANIFEST verification (unit).** Tamper individual entries; assert HMAC mismatch detected. Tamper MANIFEST itself and re-HMAC; assert exporter-signature mismatch detected. Strip the exporter from the device directory; assert restore refusal. +- **AMK-completeness check (unit).** Build an artifact whose `keys/amk-ledger.cbor` is deliberately missing an `amk_version` that an included asset references; assert detection at dry-run, before any commit. Build a self-sufficient artifact; assert every included asset decrypts from the artifact's own ledger with no server contact. +- **Per-recovery-path smoke** (passphrase, cross-device, Shamir 2-of-3): each is a separate scenario that ends with the library restored on a fresh device. +- **Dry-run determinism (smoke).** Run dry-run twice against an unchanged backup + library; assert byte-identical diff report. 
+- **Restore reconciliation (smoke).** Exercise each reconciliation case and assert the outcome: identical head → no-op; live head ahead of the restored head → restored *not* applied (offered read-only); divergent or locally-tombstoned-later → quarantined for explicit merge, never silent overwrite; asset absent locally → applied. A six-month-old backup restored over a library with subsequent deletes and edits leaves no live state overwritten. + +The cross-module case — full backup → full restore on a fresh client → verify every asset readable — is one bounded E2E case in [Module Map](/design/module-map/#e2e-test-surface). diff --git a/capsule-docs/src/content/docs/design/clients.md b/capsule-docs/src/content/docs/design/clients.md index 9668ed1..7dd995b 100644 --- a/capsule-docs/src/content/docs/design/clients.md +++ b/capsule-docs/src/content/docs/design/clients.md @@ -1,34 +1,45 @@ --- -title: Clients for Capsule -description: An overview of the core architectural decisions for clients in Capsule. +title: Clients +description: Native client priorities, what every client must validate, and the sandboxed decoder --- -This document outlines the core architectural decisions for clients in Capsule, including the rationale behind them and how they contribute to the overall design of the system. +Capsule's clients are native per platform, with as little divergence as possible. The cross-platform logic — including the entire [`verify_asset`](/design/cryptography/keys/#write-authorization) chokepoint, the [import pipeline](/design/import/pipeline/), and the [library layout](/design/filesystem/client/) — lives in `capsule-core` and is consumed by every native client through `capsule-sdk`. Each native client's job is the surface above that: rendering, input, and platform integration. 
+ +The boundary this doc owns is **what every client must do** — the client-class duties that, if skipped, put the client in the *faulty* class (see [Threat Model — Client Class Taxonomy](/design/threat-model/#client-class-taxonomy)). Plus the sandboxed-decoder pattern, which is the largest remaining attack surface on the client. ## Design Priorities -- **Native:** We prioritize native implementations for each platform to ensure familiar usability and enable platform-specific optimizations. -- **Minimal divergence:** While we carefully version everything where applicable and minimize data that acts as sources of truth, we heavily centralize all the heavy and complex logic in `capsule-core` and `capsule-sdk`. Any client-specific logic is generally minimal and focused on display. +- **Native.** Native implementations per platform ensure familiar usability and enable platform-specific optimizations. +- **Minimal divergence.** Heavy and complex logic is centralized in `capsule-core` and `capsule-sdk`; client-specific code is generally minimal and focused on display. ## Platform Limitations -Given the quantity of distinct native clients (each having distinct portions of platform-specific logic), certain features are limited to certain platforms. +Given the quantity of distinct native clients (each with its own platform-specific portion), certain features are limited to certain platforms — notably [auto sync](/design/import/download-sync/#auto-syncing) on platforms where the necessary APIs are not available. ## Client Validation Duties -Clients are not trusted to enforce their own correctness — but they are responsible for **refusing to apply** state they cannot validate. 
The full client-side validation checklist is owned by [Threat Model — Client-Side Validation Invariants](/design/threat-model/#client-side-validation-invariants); the duties are summarized here so client implementations have a single in-doc reference for what they must do: +Clients are not trusted to enforce their own correctness — but they **are** responsible for **refusing to apply** state they cannot validate. The full client-side validation checklist is owned by [Threat Model — Client-Side Validation Invariants](/design/threat-model/validation/#client-side-validation-invariants); the duties are summarized here so client implementations have a single in-doc reference for what they must do: -- **Run [`verify_asset`](/design/cryptography/#write-authorization)** on every received asset manifest. Quarantine on failure; never silent-drop, never silent-accept. +- **Run [`verify_asset`](/design/cryptography/keys/#write-authorization)** on every received asset manifest. Quarantine on failure; never silent-drop, never silent-accept. This is *the* chokepoint every client must route through — it is implemented once in `capsule-core::crypto` and called by every receiving path (sync, federation, peering, backup-restore). - **Refuse forward-version writes.** Reject any incoming `sidecar_schema`, `crypto_suite_id`, or `protocol_version` above the client's max known. Reading is allowed only in read-only mode if explicitly opted into. - **Enforce the protocol handshake.** Send `X-Capsule-Protocol` on every request; honor `426 Upgrade Required` by stopping the request, never by silently downgrading. -- **Check the provenance chain.** Maintain a local `latest_provenance_hash` per asset; refuse to apply a manifest whose `prior_provenance_hash` is behind it. See [Import & Sync — Stale-Revival Detection](/design/import-synchronization/#stale-revival-detection). 
+- **Check the provenance chain.** Maintain a local `latest_provenance_hash` per asset; refuse to apply a manifest whose `prior_provenance_hash` is behind it. See [Import — Stale-Revival Detection](/design/import/download-sync/#stale-revival-detection). - **Reject unknown closed-enum values.** `action`, `content_type`, `DerivativeManifest.role`, and `gps.source` are closed per protocol version; unknown values are structural errors, not "future to ignore." - **Preserve unknown CBOR keys within a known schema** (Postel's Law) but never act on them. - **Decode remote-origin asset bytes only in the [Sandboxed Decoder](#sandboxed-decoder).** -- **Never invoke `revoke_all_sessions` without master-key proof.** A pure session-token revoke-all is a [forbidden client behavior](/design/threat-model/#forbidden-client-behaviors). -- **Honor the [forbidden behaviors checklist](/design/threat-model/#forbidden-client-behaviors).** A client that backdates timestamps, strips unknown sidecar fields, overwrites provenance, or signs for an epoch it does not hold is *buggy by definition*. +- **Honor the [forbidden behaviors checklist](/design/threat-model/schema-rules/#forbidden-client-behaviors).** A client that backdates timestamps, strips unknown sidecar fields, overwrites provenance, signs for an epoch it does not hold, or invokes `revoke_all_sessions` without master-key proof is *buggy by definition*. + +Centralizing the validation logic in `capsule-core` ensures each native client gets the same enforcement; the wrapper layer that issues UI surfaces for quarantine and protocol-mismatch errors is the platform-specific portion. + +## Reading State From a Newer Client + +A client routinely encounters state a *newer* client wrote: unknown CBOR keys inside a known schema (always preserved per Postel's Law), or — under an explicit read-only opt-in — a sidecar whose `sidecar_schema` exceeds the reader's max known. 
The duty is to render what it can without ever destroying what it cannot interpret: -Centralizing the validation logic in `capsule-core` (per [Design Priorities](#design-priorities)) ensures each native client gets the same enforcement; the wrapper layer that issues UI surfaces for quarantine and protocol-mismatch errors is the platform-specific portion. +- **Render the known, surface the unknown.** The client displays every field it understands and shows a **non-destructive indicator** on the affected asset/album — "Created with a newer version of Capsule; some details may not be shown or editable here" — rather than failing, hiding, or quarantining the asset. +- **Never strip, never rewrite.** Unknown CBOR keys and forward-schema sidecars are strictly read-only: the client never writes back a structure it cannot fully represent, because doing so would strip the extension and invalidate the signature — a [forbidden behavior](/design/threat-model/schema-rules/#forbidden-client-behaviors). Editing such an asset is disabled behind the same indicator, pointing the user to update. +- **Writes still fail closed.** Reading newer state is best-effort and read-only; *writing* under a `protocol_version`, `crypto_suite_id`, or `sidecar_schema` the client does not implement remains rejected at the [handshake](/design/threat-model/validation/#protocol-and-capability-negotiation). Tolerant reads, fail-closed writes — the [tightened Postel's Law](/design/principles/#postels-law-asymmetric). + +This is the resolution of the former "new client UI surface" question: forward-written state is legible and safe, never silently dropped and never destructively rewritten. ## Sandboxed Decoder @@ -40,10 +51,19 @@ The defense is structural isolation: - The sandbox communicates with the host via a narrow IPC channel that exchanges only the produced pixel buffer (or an error code) — not arbitrary structured data. 
- **The sandbox is allowed to crash.** A decoder CVE that triggers a segfault kills the sandbox, not the app. The host process logs the crash, surfaces "asset failed to decode," and continues. The sandbox is restarted on the next decode request. - **Local-origin assets** (this device was the uploader and the bytes have never left local storage) bypass the sandbox at the user's option — they have not crossed a trust boundary. By default the sandbox is still used uniformly, because the modest perf cost is worth the categorical guarantee. -- A media file that fails to decode after N retries in the sandbox is flagged in the UI as "unreadable on this device" rather than removed from the library — the bytes are preserved (per the recovery-first principle in [Filesystem](/design/filesystem/#repair)) for inspection on another device. +- A media file that still fails to decode after a small fixed retry budget (default 3 attempts, to absorb a transient sandbox crash) is flagged in the UI as "unreadable on this device" rather than removed from the library — the bytes are preserved (per [Filesystem — Repair](/design/filesystem/maintenance/#repair)) for inspection on another device. This is the canonical declaration of the sandbox; [Federation — Security Against Malicious Files](/design/federation/#security-against-malicious-files) references it for the federated-asset case, and [Backup & Recovery — Backup Verification](/design/backup-recovery/#backup-verification) references it for dry-run decode sanity checks. -## Additional Comments +## Validation + +The validation duties above translate directly to test surface. Most live in `capsule-core` (so they apply uniformly to every client); the per-platform pieces are the sandbox harness. + +- **`verify_asset` per-receiver-path (unit).** Every receiver code path (sync entry, federation pull, peering artifact, restore) routes through `verify_asset`; assertion test confirms the same chokepoint is used, not a divergent implementation. 
+- **Forward-version rejection (unit).** Per-validation-duty unit test: synthesize an input whose declared version exceeds the client's max; assert *write* refusal. +- **Forward-state read surface (unit).** Present a sidecar with unknown CBOR keys and (opt-in) a higher `sidecar_schema`; assert known fields render, the non-destructive "newer version" indicator shows, editing is disabled, and any write-back attempt is refused *without* stripping the unknown keys. +- **Sandbox crash isolation (smoke per platform).** Feed the sandbox a known-CVE corpus; assert the host process survives every crash; assert the asset is surfaced as "unreadable on this device" and not removed from the library. +- **Sandbox boundary (smoke per platform).** Assert the sandbox cannot read the parent process's filesystem, open network sockets, or write outside its scratch area. Per-platform fixtures verify each restriction. +- **Forbidden-behavior tripwire (unit).** For each item in the [forbidden-behaviors checklist](/design/threat-model/schema-rules/#forbidden-client-behaviors), a unit test confirms that calling the corresponding `capsule-core` API in the forbidden way panics or returns a structural error (so a buggy client cannot accidentally do the wrong thing). -- Compose Multiplatform was heavily considered initially for cross-platform logic but since most format processing is Rust and Kotlin/Native continues to have multiple limitations, we decided to stick to Rust-first approach. +There is no client-only E2E case; the closest cross-module test is the upload-and-display round-trip used by the [Import](/design/import/) pipeline, which is bounded E2E in [Module Map](/design/module-map/#e2e-test-surface). 
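The "tolerant reads, fail-closed writes" rule from the Client Validation Duties above can be sketched as a single decision function. This is a minimal illustration under stated assumptions, not the `capsule-core` API: `Versions`, `Intent`, `Decision`, and `evaluate` are hypothetical names, and the sketch assumes `protocol_version` is a fixed-width `YYYY-MM-DD` string, so plain string ordering matches date ordering.

```rust
/// Hypothetical version triple. Field names mirror the identifiers in the
/// doc, but this struct is an illustration, not a `capsule-core` type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Versions {
    pub sidecar_schema: u16,
    pub crypto_suite_id: u16,
    /// `YYYY-MM-DD`; fixed width, so string order equals date order.
    pub protocol_version: String,
}

/// What the caller wants to do with the incoming structure.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Intent {
    Read,
    Write,
}

/// Outcome of the forward-version check.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Accept,
    ReadOnly,
    Reject,
}

/// True when any incoming identifier exceeds what this client implements.
fn is_forward(incoming: &Versions, max_known: &Versions) -> bool {
    incoming.sidecar_schema > max_known.sidecar_schema
        || incoming.crypto_suite_id > max_known.crypto_suite_id
        || incoming.protocol_version > max_known.protocol_version
}

/// Writes under an unknown version always fail closed; reads of newer
/// state are allowed only under an explicit read-only opt-in.
pub fn evaluate(
    incoming: &Versions,
    max_known: &Versions,
    intent: Intent,
    read_only_opt_in: bool,
) -> Decision {
    if !is_forward(incoming, max_known) {
        return Decision::Accept;
    }
    match intent {
        Intent::Write => Decision::Reject,
        Intent::Read if read_only_opt_in => Decision::ReadOnly,
        Intent::Read => Decision::Reject,
    }
}
```

Keeping the check in one function of this shape would let the forward-version unit test drive every combination of intent and opt-in through the same code path, which is the property the validation list asks for.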
diff --git a/capsule-docs/src/content/docs/design/cryptography.md b/capsule-docs/src/content/docs/design/cryptography.md deleted file mode 100644 index 9c683fb..0000000 --- a/capsule-docs/src/content/docs/design/cryptography.md +++ /dev/null @@ -1,619 +0,0 @@ ---- -title: Cryptography -description: Details of the key cryptography primitives for building Capsule on ---- - -## Pillars of Cryptography - -*These are key aspects for those out of the loop.* - -| Pillar | The Core Question | Primary Cryptographic / Security Tool | -| ------------------- | ------------------------------ | ------------------------------------------- | -| **Confidentiality** | Can anyone else read this? | Symmetric/Asymmetric Encryption (AES, RSA) | -| **Integrity** | Has this been tampered with? | Hashing (SHA-256), MACs | -| **Availability** | Can I access this right now? | Redundancy, Backups, DDoS Protection | -| **Authentication** | Are you who you say you are? | Digital Certificates, Passwords, Biometrics | -| **Authorization** | Are you allowed to do this? | Access Tokens, RBAC, ACLs | -| **Non-repudiation** | Can you deny doing this later? | Digital Signatures, Secure Audit Logs | - -## E2E Security Model - -E2E security model has been prevalent for the past decade but applying the same restrictions on an asset-heavy application that aims to be performant and robust is not as trivial. This document outlines the high-level details of balancing security and capability trade-offs. - -We need to encrypt assets (data) along with their metadata in a way that respects the hierarchy of accounts, albums, assets, and permissions. Think of them in layers: - -- Identity: see [Signature Scheme](#signature-scheme) per device, cross-signed by the user master identity. See [Key Management](#key-management) for details. -- Group membership: One MLS group per shared album; each device is a leaf. See [Group Membership](#group-membership) for details. 
-- Asset encryption: [bulk AEAD](#bulk-aead) per file, keyed via the [KDF](#key-derivation) from per-album keys. See [Authenticated Asset Encryption](#authenticated-asset-encryption) for details. -- CBOR Metadata encryption: [bulk AEAD](#bulk-aead) per metadata blob, keyed via the [KDF](#key-derivation) from per-album keys. (We do not have a STREAM construction since it's typically fetched all together.) See [Metadata Encryption](#metadata-encryption) for details. - -## Primitives Inventory - -This table is **the single source of truth** for every cryptographic primitive Capsule -uses. Other docs (and the rest of this doc) reference these by anchor — they never -restate the choice. Swapping a primitive is a single-row edit here plus its dedicated -section below. - -| Primitive | Choice | Used for | -| ----------------------------------------- | -------------------------------------------------------- | ------------------------------------------------------ | -| [Cryptographic hash](#cryptographic-hash) | SHA-256 | Content addressing, integrity verification | -| [Key derivation (KDF)](#key-derivation) | HKDF-SHA512 | Per-file and per-album key derivation | -| [Password-based KDF](#password-based-kdf) | Argon2id (device-tier-aware parameters) | Master-key escrow unwrap, backup unwrap | -| [Bulk AEAD](#bulk-aead) | AES-256-GCM with [STREAM](#stream-construction) | Asset and metadata ciphertext | -| [MLS control AEAD](#mls-control-aead) | ChaCha20-Poly1305 | Inherited from the [MLS ciphersuite](#mls-ciphersuite) | -| [Signature scheme](#signature-scheme) | Hybrid Ed25519 + ML-DSA-65 | Identity, device, asset manifest, write tier | -| [KEM](#kem) | X-Wing (X25519 + ML-KEM-768) | MLS HPKE | -| [MLS ciphersuite](#mls-ciphersuite) | `MLS_256_XWING_CHACHA20POLY1305_SHA256_Ed25519` (0x004D) | Group key management | -| [Randomness](#randomness) | OS CSPRNG (`getrandom`) | All keys, salts, nonces | -| [Transport](#transport-security) | TLS 1.3 with hybrid X25519+ML-KEM | 
Client-server, server-server | - -The per-primitive sections below carry the rationale; the table is the at-a-glance -reference. - -## Versioning Identifiers - -A faulty, malicious, or version-mismatched client could damage data by writing -under a primitive set the receiving side does not implement (see -[Threat Model](/design/threat-model/)). Three identifiers — owned here, in -[Versioning](/design/versioning/), and in [Metadata](/design/metadata/) — bind -each on-disk and on-wire structure to a specific set of primitives or schema so -that mismatches **fail closed** rather than corrupting state: - -| Identifier | Type | Declared in | Carried in | -| ------------------ | ------------------- | ---------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `crypto_suite_id` | `u16` | this doc | every [AssetManifest](#provenance-and-signed-manifest), every [metadata blob](#metadata-encryption), the backup [MANIFEST.cbor](/design/backup-recovery/) | -| `protocol_version` | string `YYYY-MM-DD` | [Versioning](/design/versioning/) | every AssetManifest, every wire request (see [Threat Model — Protocol Handshake](/design/threat-model/)), the album's MLS pin | -| `sidecar_schema` | `u16` | [Metadata — Sidecar Schema](/design/metadata/#sidecar-schema-v1) | CBOR sidecar field 0 (readable before parsing the rest) | - -`crypto_suite_id = 0x0001` denotes exactly the [Primitives Inventory](#primitives-inventory) above. Retiring any primitive (a broken SHA-256, a deprecated AEAD) **does not edit the row** — it adds a new row and a new suite id. An old AssetManifest carrying `0x0001` keeps verifying against the original row forever; new writes use the new suite id. This is the single-doc edit the inventory promises, generalized to the bundle. 
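The append-only behaviour described here can be pictured as a lookup table keyed by `crypto_suite_id`. The sketch below is illustrative only: `Suite`, `SUITES`, and `resolve_suite` are hypothetical names, and the single row copies three entries from the Primitives Inventory rather than the full bundle.

```rust
/// Hypothetical, abbreviated view of one primitive bundle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Suite {
    pub hash: &'static str,
    pub kdf: &'static str,
    pub bulk_aead: &'static str,
}

/// Append-only table: retiring a primitive adds a new row under a new id,
/// and never edits an existing row.
const SUITES: &[(u16, Suite)] = &[(
    0x0001,
    Suite {
        hash: "SHA-256",
        kdf: "HKDF-SHA512",
        bulk_aead: "AES-256-GCM",
    },
)];

/// Resolve a suite id, failing closed on anything this build does not know.
pub fn resolve_suite(id: u16) -> Result<Suite, String> {
    SUITES
        .iter()
        .find(|(known, _)| *known == id)
        .map(|(_, suite)| *suite)
        .ok_or_else(|| format!("unknown crypto_suite_id {:#06x}: refusing to proceed", id))
}
```

Because old rows are never rewritten, a manifest carrying `0x0001` would resolve to the same bundle in every future build, while an id the build has never seen produces an error instead of a guess.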
- -The signatures on the manifest cover `crypto_suite_id` and `protocol_version`, so a downgrade-attempt (re-signing an existing manifest under a weaker suite) cannot be silently produced. - -## Key Cryptographic Primitives - -### Cryptographic Hash - -We use SHA-256 (SHA-2) for content hashing, addressing, and integrity verification — everywhere, with no second hash algorithm. It is the most prevalent, audited, NIST-approved standard, and is hardware-accelerated on most modern platforms. - -- Using exactly one hash means one less algorithm and implementation to maintain and audit. -- We reuse SHA-256 values across layers rather than recomputing them: the ciphertext hash used for content-addressing (see [Authenticated Asset Encryption](#authenticated-asset-encryption)) is the same value the [signed manifest](#provenance-and-signed-manifest) commits to, and the same value the upload protocol declares and verifies. -- SHA-3 was rejected for weaker hardware support; BLAKE3's parallelism is attractive but unneeded given simultaneous uploads, and its keyed mode is redundant against our already-authenticated encryption. - -### Key Derivation - -We use **HKDF-SHA512** for per-file and per-album key derivation. The wider 512-bit hash matches the post-quantum posture of the rest of the stack: under Grover's algorithm a 256-bit hash collapses to ~128-bit PQ security, while SHA-512 retains ~256-bit. KDFs are not on the hot path, so the cost difference is negligible. SHA-256 stays for *content addressing* — a different security goal where universal hardware acceleration matters more than PQ margin. - -Every derivation includes a versioned `info` string (e.g. `"asset-file/v1"`, `"albums/v1"`) and a scope-unique salt (e.g. `album_id`, `file_id`) so a future KDF change can land alongside v1 derivations without a flag day. - -### Password-based KDF - -For password-based key derivation we use **Argon2id** with device-tier-aware parameters. 
Password-based derivation only runs at account recovery and device bootstrap — never on a hot path — so the cost is acceptable even on constrained hardware. Parameters are recorded inside the wrapped-blob [construction](#versioning) so they can be raised later without a flag day. - -| Device tier | Memory | Iterations (`t`) | Parallelism (`p`) | When applies | -| ----------------------- | ------- | ---------------- | ----------------- | ---------------------------------------- | -| Low-RAM (≤ 2 GiB total) | 128 MiB | 3 | 1 | Entry-level Android, low-end embedded | -| Normal mobile / laptop | 256 MiB | 3 | 1 | Default for phones and laptops | -| Desktop (≥ 8 GiB) | 512 MiB | 4 | 1 | Wrapping new escrow blobs from a desktop | - -The salt is always a 32-byte CSPRNG draw. The tier chosen at *wrap* time is recorded -in the blob; *unwrap* respects whatever tier was recorded, so a desktop-wrapped blob -unwraps correctly on a phone (slowly) and vice versa. - -### Bulk AEAD - -For bulk data and metadata encryption we use **AES-256-GCM**. Combined with the [STREAM construction](#stream-construction) it covers asset ciphertext; standalone AES-256-GCM (fresh random nonce per blob) covers CBOR metadata blobs. - -- AES hardware acceleration (Intel AES-NI, ARMv8 AES extensions, Apple Silicon dedicated AES units) is universal on every platform Capsule targets, so AEAD is never the bottleneck. -- We standardize on AES-GCM rather than ChaCha20-Poly1305 for stack consistency with the [SHA-2 family](#cryptographic-hash) and to keep one bulk-AEAD choice across the codebase. MLS retains ChaCha20-Poly1305 from its [ciphersuite spec](#mls-ciphersuite); that's a separate layer. -- Nonce misuse is the structural risk of GCM. We close it two ways: every file uses a freshly-derived per-file key (so the STREAM counter can safely start at zero), and standalone metadata blobs each draw a fresh CSPRNG nonce. 
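To make the "counter can safely start at zero" point concrete, here is a sketch of how a per-chunk nonce might be laid out. It assumes a common STREAM layout for a 12-byte GCM nonce (7-byte prefix, 32-bit big-endian chunk counter, 1-byte last-chunk flag); the source does not state which layout Capsule uses, and `chunk_nonce` is a hypothetical helper, not a `capsule-core` function.

```rust
/// AES-GCM nonce length in bytes.
pub const NONCE_LEN: usize = 12;
/// Assumed prefix length: nonce length minus 4 counter bytes and 1 flag byte.
pub const PREFIX_LEN: usize = 7;

/// Build the nonce for one chunk: prefix || counter (big-endian) || last flag.
/// With a fresh per-file key, an all-zero prefix and a counter starting at
/// zero never repeat a (key, nonce) pair across files.
pub fn chunk_nonce(prefix: [u8; PREFIX_LEN], counter: u32, last: bool) -> [u8; NONCE_LEN] {
    let mut nonce = [0u8; NONCE_LEN];
    nonce[..PREFIX_LEN].copy_from_slice(&prefix);
    nonce[PREFIX_LEN..PREFIX_LEN + 4].copy_from_slice(&counter.to_be_bytes());
    // The final byte marks the last chunk, so a truncated stream cannot
    // pass off an intermediate chunk as the end of the file.
    nonce[NONCE_LEN - 1] = u8::from(last);
    nonce
}
```

Within one file the counter guarantees distinct nonces per chunk; across files the distinct derived keys carry the uniqueness, which is why no random per-file prefix is required in this sketch.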
- -### MLS Control AEAD - -For MLS control traffic we use **ChaCha20-Poly1305**, inherited from the [MLS ciphersuite](#mls-ciphersuite). This protects MLS's own membership and key messages, not user data; user data uses the [bulk AEAD](#bulk-aead) above. - -### Signature Scheme - -We use **hybrid Ed25519 + ML-DSA-65** for identity, device, asset manifest, and write-tier signatures. Both halves must verify before a peer is accepted. The classical and post-quantum halves are independent, so neither algorithm being broken compromises authentication. MLS LeafNode signatures stay Ed25519-only (pinned by the ciphersuite); the ML-DSA half lives at the identity layer — see [Group Membership](#group-membership). - -### KEM - -We use **X-Wing (X25519 + ML-KEM-768)**. This is the KEM defined by the [MLS ciphersuite](#mls-ciphersuite) we adopt. - -### MLS Ciphersuite - -We use **`MLS_256_XWING_CHACHA20POLY1305_SHA256_Ed25519`** (OpenMLS ciphersuite 0x004D) — MLS (RFC 9420) with the PQ ciphersuites from `draft-ietf-mls-pq-ciphersuites`. See [Group Membership](#group-membership) for how the ciphersuite's choices (X-Wing KEM, ChaCha20-Poly1305 control AEAD, SHA-256 hash, Ed25519 leaf sigs) interact with the identity layer. - -### Randomness - -All keys, salts, and nonces are drawn from the operating system CSPRNG (`getrandom`). We never seed our own PRNG. - -Nonces are never hand-rolled. The [STREAM construction](#stream-construction) derives per-chunk nonces deterministically; standalone [bulk-AEAD](#bulk-aead) metadata blobs each receive a fresh random nonce. - -## Key Management - -Capsule's keys form a single hierarchy with one backed-up root: - -- The **account master key** is the only key that is escrowed/backed up. It does not encrypt assets directly. Its job is to (1) wrap the per-device identity private keys and (2) anchor the encrypted backup that escrows album keys. 
-- **Device keys** are hardware-bound, non-exportable, and therefore disposable — a device is re-bootstrapped from the master key rather than recovered. -- **Album keys** (AMKs) are random per-epoch keys ledgered in MLS, escrowed both in the master-key backup and in the [Owner Group](#owner-group-keys-ogks). - -The guiding rule is to **keep the backup path independent of the MLS ratchet** so that losing all devices, but holding the recovery passphrase, still restores every photo. Do not be like Matrix, where undecryptable content is a routine failure mode. See [Failure Modes and Recovery](#failure-modes-and-recovery). - -### Key Generation - -All key generation happens client-side, from the OS CSPRNG. We use a PQ-safe ("post-quantum") hybrid scheme throughout: classical + PQ primitives combined so that breaking either one alone does not break security. - -#### User Identity Keys (User IKs) - -User IKs are generated once per user ever, and live forever (or until account compromise). This is the root of trust and signs everything below it. It is always verified out-of-band or via safety numbers. - -A User IK is a **hybrid Ed25519 + ML-DSA-65** signing keypair generated entirely on the client at account creation. The private halves are wrapped under the [account master key](#registered-accounts) and never leave the client in the clear; the public halves are published in the signed [device directory](#per-user-device-coordination). - -It can be revoked for a global account reset (irreversible, non-recoverable nuclear operation). Revocation is published as a separate revocation certificate, hybrid-signed by the IK itself, to a well-known location so clients can check for it. - -#### Device Keys - -Using the [user IK](#user-identity-keys-user-iks), each device's keys are cross-signed into the [device directory](#per-user-device-coordination): - -1. **DSK** (Device Signing Key): hybrid **Ed25519 + ML-DSA-65**. -2. 
**DEK** (Device Encryption Key): hybrid **X25519 + ML-KEM-768**. - -Both are signed by the IK (hybrid signature). Device private keys are **generated inside and never leave hardware** — Secure Enclave (iOS), StrongBox/Keystore (Android), TPM (desktop) — and are non-exportable. Because they cannot be backed up, devices are treated as disposable: a lost device is simply removed and a new one re-bootstrapped from the master key. - -A device key can be revoked without affecting the user's identity or other devices. This allows for per-device access control and recovery from lost devices without a full account reset. Revocation is done by signing a revocation statement with the IK and publishing it to a well-known location. The server then refuses to deliver new key wraps to that device, and remaining devices rotate any group keys the revoked device had access to. - -#### Owner Group Keys (OGKs) - -Since assets' `owner_id` maps to a set of users, treat each owner as an MLS group. - -- **Type:** Symmetric AES-256 root key of an MLS group whose members are the owner's user set. -- **Purpose:** A recovery/escrow layer. The OGK does **not** wrap individual file keys. Instead, it escrows every album's [AMK versions](#album-master-keys-amks), so any current owner member can always recover every album key — and therefore every asset — independent of album membership. This avoids double-wrapping each file while still guaranteeing the owner never loses access. -- **Epoch:** Bumps on any owner-set change. Every member's client commits to MLS, producing a new OGK; the server stores the welcome/commit messages. -- **Revocation:** Remove a user from the owner set → MLS Remove proposal → new epoch → the removed user's device can no longer derive future OGKs and is dropped from future AMK escrow. - -#### Album Master Keys (AMKs) - -Each album is its own MLS group. Members = users with any permission on the album. - -- **Type:** Random 32-byte symmetric key, minted per epoch. 
AMKs are *not* derived from MLS epoch state (which is complicated to handle at edge cases) — they are random keys distributed *over* MLS application messages and ledgered. - -Capsule separates **secrecy** (enforced by encryption) from **authorization** (enforced by signatures). We use one content key plus two signing capabilities, to minimize keys which can be possibly leaked: - -- **`AMK` — the content key.** Read access. MLS delivers it to *all* album members. Holding it lets you decrypt; not holding it means you cannot. -- **Write capability — a per-epoch write-tier signing keypair.** Distributed via MLS to writers only. Used to sign [asset manifests](#provenance-and-signed-manifest). It rotates with the AMK epoch, so a removed writer cannot sign for future epochs. This is authorization, not secrecy. See [Write Authorization](#write-authorization). -- **Admin capability — an admin-tier signing keypair.** Distributed to admins only; used to sign MLS membership commits. - -Epoch bump triggers: member add/remove, permission change, scheduled rotation (e.g., every 30 days for long-lived albums). - -#### Write Authorization - -A device signature on an [asset manifest](#provenance-and-signed-manifest) proves *which device* produced an asset — but not that the device was *authorized to write* to that album at that time. The server is **not trusted for authorization**: it could replay, reorder, or surface an asset signed by a reader-only device, a removed writer, or a device acting outside its write window. A bug could also produce such an asset. Both must be rejected robustly, with the verification logic kept small enough to be hard to get wrong. - -- **Epoch-bound write proof.** Every asset manifest carries, in addition to the device DSK signature, a signature under the album's **per-epoch write-tier signing key**. Only writers at that epoch hold that key. The manifest's `amk_version` identifies the epoch. 
-- **Authorization authority is MLS history, not the server.** The client verifies the write-tier signature against the write-tier public key it learned for that epoch *from MLS* — the album's MLS commit chain (admin-signed) is the sole authority on who could write when. A server-asserted authorization is never sufficient. -- **What this accepts vs. rejects.** An asset signed by a writer who was *later* removed is still acknowledged — it was valid when written, and nothing after removal un-seeds it. An asset signed at an epoch where the signer lacked write capability is **rejected**: an attacker (or a buggy/colluding server) cannot produce a valid write-tier signature for an epoch they were not a writer in. -- **Single verification chokepoint.** All of this lives in one `verify_asset(manifest, ciphertext, mls_state)` function in `capsule-core/crypto` — the only path by which a client acknowledges an asset. Per [contract-driven development](#implementation), it ships with exhaustive negative test cases: reader-signed, removed-writer, wrong-epoch, forged certificate chain, replayed manifest. -- **Defensive failure handling.** A verification failure is *never* silently dropped and *never* silently accepted. The asset is quarantined and surfaced in the [provenance/audit trail](#provenance-of-library-modifications) so an operator can distinguish a bug from an attack after the fact. This bounds the blast radius of an implementation bug. -- **Downgrade-resistant.** Both signatures cover `crypto_suite_id`, `protocol_version`, and `prior_provenance_hash`. A manifest cannot be silently re-signed under a weaker suite or back-dated onto a different chain position without breaking either signature; an attempt to do so is rejected at the same `verify_asset` chokepoint. -- **Timestamp grammar.** Servers refuse a manifest whose `timestamp` is outside **±30 days of server clock** (configurable). 
The cryptography proves "this asset was signed by a device that held epoch-N write capability"; the time window prevents a buggy or hostile client from injecting timestamps decades in the past or future that would silently distort the audit trail. The grammar lives in [Threat Model](/design/threat-model/) and is mirrored in [Server-Side Validation Invariants](/design/threat-model/). - -#### Forward Secrecy & Post-Compromise Security - -The MLS-based scheme provides forward secrecy (FS) and post-compromise security (PCS). The specific implementation we follow is MLS (RFC 9420) with the PQ ciphersuites from `draft-ietf-mls-pq-ciphersuites`. - -**Clarification:** True FS on data-at-rest is a contradiction (the ciphertext persists). What MLS gives you at each epoch bump is: a compromise of the current epoch's keys doesn't help an attacker read past epochs, and removed members can't read future epochs. That's the practical security property you want. - -For data-in-transit between clients and server (uploads, key-bundle fetches), use TLS 1.3 with ephemeral ECDHE — that's where session-level FS lives. See [Transport Security](#transport-security). - -#### Resisting Key Loss - -Loss of keys — and thus loss of data — is a first-class failure mode. The master key, not any MLS ratchet state, is the single backed-up root. All safeguards and the redundant restore paths are consolidated in [Failure Modes and Recovery](#failure-modes-and-recovery). - -#### Key Chain - -The account master key does **not** derive album keys — albums are MLS groups with random AMKs. 
The master key's role is to wrap device identity keys and to anchor the encrypted backup that escrows AMKs: - -```plaintext -account_master_key (backed up — see Resisting Key Loss) - ├─ wraps device identity private keys (IK / DSK / DEK private halves) - └─ anchors the encrypted backup that escrows: - AMK_v{n} (random 32 bytes, per album, minted per MLS epoch) - └─ HKDF-SHA512(ikm=AMK_v{n}, salt=file_id, info="asset-file/v1") → 32-byte AES file key - └─ AES-256-GCM-STREAM -``` - -Important details on construction: - -- Always include a version string in `info` so you can rotate the KDF later. -- Salt with something unique per scope (`album_id`, `file_id`) — don't reuse salts across scopes. -- The 512-bit KDF output is truncated to 32 bytes (256-bits) for the AES-256 file key. See [Key Derivation](#key-derivation) for the SHA-512 rationale. -- Each file gets a fresh derived key, so the STREAM nonce can safely start at zero per file. - -Photo/media keys specifically: separate the "MLS/ratchet" world from "data at rest." Per-album AMKs are escrowed in the server-side encrypted backup (see [Backup and Recovery](/design/backup-recovery/)) and the [OGK](#owner-group-keys-ogks) — not derived from ratchet state — so losing all devices but holding the recovery passphrase still restores photos. Ratchet keys are expected to be ephemeral. - -### Identity-based Key Derivation - -Since all assets are encrypted via keys ultimately recoverable from an account's master key, we encapsulate user identity keys differently depending on the [account type](/design/authentication/#account-types). - -#### Registered accounts - -Most users have their own unique master key. It is **generated client-side** at account creation from the OS CSPRNG. The server never holds the naked master key. 
Each device stores its own copy wrapped under that device's DEK; a new device obtains the master key either via [cross-device recovery](/design/backup-recovery/#recovery-mechanisms) or by unwrapping the [encrypted server-side backup](/design/backup-recovery/#master-key-escrow) with the recovery passphrase. - -#### Delegated/Sponsored accounts - -A sponsored account is anchored under the sponsor's master key but holds its own encryption keys. The mechanism — and the only sound way to revoke — is: - -1. **Per-sponsoree KEK.** When a sponsor creates a sponsored account, the sponsor draws a fresh 32-byte **sponsoree KEK** from the CSPRNG (it is *not* derived from the master key — a deterministic derivation would be reproducible by the sponsor at any future point, defeating revocation). The KEK is wrapped under the sponsor's master key and stored in the sponsor's escrowed hierarchy. -2. **Sponsoree key material.** The sponsoree's own identity, device, and album keys are generated normally (see the rest of this section). Their private halves are wrapped under the sponsoree KEK rather than directly under the sponsor's master key, so the sponsor can re-wrap or destroy a single sponsoree's keys without touching its own or the other sponsorees'. -3. **Shared-asset access.** Sponsorees gain access to a sponsor's shared albums via ordinary MLS membership (the sponsoree's devices are added as MLS leaves in the sponsor's album groups). The KEK is *not* a content key — it only wraps the sponsoree's private keys. -4. **Revocation.** Revocation is a three-step operation, all signed by the sponsor's IK: - - **Rotate** the sponsoree KEK: draw a new KEK, re-wrap surviving sponsorees if any, drop the old KEK. - - **Publish** an IK-signed revocation certificate naming the revoked sponsoree's identity and the timestamp. 
- - **Remove** the revoked sponsoree's devices from every MLS group they were a member of (album groups, owner group) via the standard [MLS Remove](#membership-operations) flow, bumping AMK epochs. - -The sponsor's *own* master key is untouched by any sponsoree revocation. The published revocation certificate is what clients and [federated](/design/federation/) peers check to refuse traffic from a revoked sponsoree. - -#### Non-registered accounts - -**Reading.** Since key management operates at the user level, userless share links are handled distinctly. We encapsulate the decryption keys around the secret stored in the link. The owner can optionally attach a password, in which case the [password-based KDF](#password-based-kdf) adds a second encapsulation layer on top of the link secret. - -**Writing.** Writing is **not supported** for non-registered accounts. Every uploaded asset must be encrypted under an album key and signed with a write-tier key; a non-registered user has neither a device encryption key (DEK) nor a place in any album's MLS group, so it cannot produce a valid [asset manifest](#provenance-and-signed-manifest). Supporting guest uploads would require an ephemeral link-scoped key hierarchy; this is a deliberate non-goal to keep the design simple. - -### Key Rotation and Revocation - -- **Master key rotation.** The master key can be replaced at will. Rotation re-wraps the key hierarchy (device-key wraps and the AMK escrow blob) under the new master key; the old master key is retained only long enough to complete the re-wrap, then discarded. Existing signed-in sessions hold device and derived keys directly and are **unaffected** — they keep working through the rotation. -- **Device revocation.** Handled via the [device key](#device-keys) revocation certificate plus an MLS `Remove` for that device's leaves (see [Membership operations](#membership-operations)). 
-- **Album-member revocation.** Handled by an MLS `Remove` and an AMK epoch bump (see [Membership operations](#membership-operations)). - -## Group Membership - -Capsule's group layer is the [MLS ciphersuite](#mls-ciphersuite) from the inventory. The ciphersuite's choice of [ChaCha20-Poly1305](#mls-control-aead) (rather than [AES-GCM](#bulk-aead) used for user data) is acceptable because: - -- It only protects MLS's own control messages (kilobytes of membership and key data, not your photos). -- ChaCha20-Poly1305 is one of the two most-audited AEADs in existence. -- The alternative is a classical-only MLS ciphersuite plus a hand-rolled PQ retrofit — exactly the custom crypto we're trying to avoid. - -One follow-on: MLS binds LeafNode signatures to Ed25519 in this suite, so the ML-DSA half of the [hybrid signature scheme](#signature-scheme) lives at the **application layer** — identity certificates sign the Ed25519 MLS key with both Ed25519 and ML-DSA, and peers verify both before accepting a device into a group. This keeps MLS pure while preserving PQ authentication end-to-end. - -### Membership operations - -**Add user Bob to album:** - -1. Fetch Bob's device directory (list of his devices with KeyPackages published to the server) -2. MLS `Add` proposal + `Commit` adding all Bob's devices as leaves -3. The `Welcome` message to Bob's devices carries current `AMK_v_current` as a Welcome extension -4. If full history is desired (usually yes for shared albums), also include `AMK_v1..AMK_{current-1}` in the Welcome — Bob's devices can now decrypt everything -5. If only post-join history, omit older AMKs — Bob sees only future photos - -**Remove user Charlie:** - -1. MLS `Remove` proposal + `Commit` removing all Charlie's devices -2. MLS advances to a new epoch; Charlie's devices can no longer read MLS traffic -3. Committer generates fresh `AMK_v{current+1}` and broadcasts via MLS to remaining members -4. All future photo uploads use `AMK_v{current+1}` -5. 
Charlie retains `AMK_v1..current` locally, so he can still decrypt photos he *already had access to* — this is correct behavior (he already had those photos; nothing you do after removal un-seeds them). But new uploads are invisible to him. - -**Add new device for existing member:** - -1. Alice's existing device adds Alice's new device as a leaf in the MLS group -2. Welcome carries all AMK versions Alice is entitled to -3. New device is now equivalent to Alice's other devices - -**Remove lost device:** - -1. Any of user's remaining devices issues MLS `Remove` for the lost device -2. Treat like a removal above — bump AMK version, since you must assume the lost device's keys are compromised - -## Per-user device coordination - -Each user publishes a signed device directory: - -```rust -DeviceDirectory { - user_id, - devices: [ - { device_id, ed25519_pk, mldsa_pk, key_package_ref, added_at, signed_by_master }, - ... - ], - signature: Hybrid(master_ed25519, master_mldsa) -} -``` - -When Alice's device A1 adds Bob to an album, it fetches Bob's directory, verifies the hybrid signature against Bob's published master identity, and adds all Bob's listed devices. Alice's other devices (A2, A3) see the MLS commit and update local state — MLS handles idempotent application of commits, so this just works. - -Conflicts (A1 and A2 trying to add different people simultaneously) are handled by MLS's proposal/commit ordering — one wins, the other re-proposes on top. OpenMLS exposes this. - -### History delivery for new joiners - -This is the one spot where you write real custom code. Two patterns: - -**Full history (recommended for shared albums):** -Welcome message carries encrypted blob of `[AMK_v1, AMK_v2, ..., AMK_current]`. New joiner decrypts all, can now read every photo. - -**Capped history (e.g., last 90 days):** -Only include AMKs corresponding to epochs ≥ threshold. Older photos remain visible but not decryptable — you show placeholders. 
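The two patterns above differ only in which AMK versions go into the Welcome. A minimal sketch of that selection, assuming the caller has already mapped any time cap (such as "last 90 days") to a threshold AMK version; `HistoryPolicy` and `amks_for_welcome` are hypothetical names for illustration, not part of `capsule-core`:

```rust
/// How much key history a new joiner receives in the Welcome (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HistoryPolicy {
    /// Every AMK version from album creation onward.
    Full,
    /// Only AMK versions at or after this threshold version.
    Since(u32),
}

/// Pick the AMK versions to ship to a new joiner, ordered by version.
/// `ledger` holds (amk_version, amk_bytes) pairs in any order.
pub fn amks_for_welcome(
    ledger: &[(u32, [u8; 32])],
    policy: HistoryPolicy,
) -> Vec<(u32, [u8; 32])> {
    let mut selected: Vec<(u32, [u8; 32])> = ledger
        .iter()
        .filter(|(version, _)| match policy {
            HistoryPolicy::Full => true,
            HistoryPolicy::Since(min) => *version >= min,
        })
        .copied()
        .collect();
    // Deterministic order keeps the encrypted Welcome blob reproducible.
    selected.sort_by_key(|(version, _)| *version);
    selected
}
```

An asset whose `amk_version` is absent from the selected set is one the joiner would render as a placeholder.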
- -Matrix supports both; most photo-sharing products default to full history. Pick one default, expose the choice if needed later. - -### Notes on Scaling - -MLS scales to thousands of leaves, so even a 50-user album (200+ devices) is fine. Note that every `Commit` touches the whole tree and each `Welcome` carries `log(N)` path secrets plus the AMK blob — a cost to watch for very large shared albums. - -## Authenticated Asset Encryption - -Every asset is content-addressed by the SHA-256 of its ciphertext and encrypted with a unique file key. We use AES-256-GCM with the STREAM construction for authenticated encryption. The file key is derived from the appropriate [AMK](#album-master-keys-amks); the AMK itself is recoverable from the account's master key (see [Identity-based Key Derivation](#identity-based-key-derivation)). - -### Asset Key Derivation - -Each asset is encrypted with a key derived from a versioned album master key (AMK), distributed and ledgered over MLS (see [Group Membership](#group-membership)). Note we never derive a key from the MLS epoch's internal state. - -An album's AMK ledger looks like this: - -```rust -Album { - id: UUID, - mls_group: MlsGroup, - keys: [ - AMK_v1: (random 32 bytes, created at album creation), - AMK_v2: (random 32 bytes, created when member X was removed), - AMK_v3: ... - ], - current_version: 3, -} -``` - -The per-file key is derived from the AMK version that encrypted it, using the [KDF](#key-derivation): - -```rust -file_key = HKDF_SHA512( - ikm: AMK_v{amk_version}, - salt: file_id, - info: "asset-file/v1", - length: 32 // 32 bytes for AES-256; HKDF-SHA512 expand truncates safely -) -``` - -AMKs are delivered over MLS application messages. When epoch N's MLS group is established, the creating device sends an `AlbumKeyDistribution { amk_version, amk_bytes }` message through MLS. Every current member's device receives and stores it locally (hardware-wrapped). 
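As a rough illustration of the local bookkeeping this implies, the sketch below keeps received AMKs in a version-keyed ledger. Only the `AlbumKeyDistribution { amk_version, amk_bytes }` shape comes from the text above; `AmkLedger`, `LedgerError`, and the method names are hypothetical, and hardware wrapping and the MLS transport are left out.

```rust
use std::collections::BTreeMap;

/// Message shape from the text above, delivered as an MLS application message.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct AlbumKeyDistribution {
    pub amk_version: u32,
    pub amk_bytes: [u8; 32],
}

#[derive(Debug, PartialEq, Eq)]
pub enum LedgerError {
    /// The same version arrived twice with different key bytes.
    ConflictingKey(u32),
}

/// Hypothetical in-memory ledger of every AMK version a device holds.
/// A real client would keep the key bytes hardware-wrapped at rest.
#[derive(Default)]
pub struct AmkLedger {
    keys: BTreeMap<u32, [u8; 32]>,
}

impl AmkLedger {
    /// Record a distributed AMK. Re-delivery of an identical key is a no-op.
    pub fn apply(&mut self, msg: &AlbumKeyDistribution) -> Result<(), LedgerError> {
        if let Some(existing) = self.keys.get(&msg.amk_version) {
            if existing != &msg.amk_bytes {
                return Err(LedgerError::ConflictingKey(msg.amk_version));
            }
            return Ok(());
        }
        self.keys.insert(msg.amk_version, msg.amk_bytes);
        Ok(())
    }

    /// Highest version held; new uploads would encrypt under this one.
    pub fn current_version(&self) -> Option<u32> {
        self.keys.keys().next_back().copied()
    }

    /// Look up the AMK named by a manifest's `amk_version` for decryption.
    pub fn get(&self, version: u32) -> Option<&[u8; 32]> {
        self.keys.get(&version)
    }
}
```

Keeping old versions addressable by number is what lets a device decrypt an asset written under any earlier epoch it was entitled to.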
- -### Provenance and Signed Manifest - -Capsule frequently needs a verifiable trace of *who* produced an asset, so the provenance signature must be cryptographically bound to the ciphertext — while still allowing streaming. We do this with a small **signed manifest** rather than a Merkle tree: the STREAM construction already detects per-chunk tampering, truncation, and reordering, so a Merkle tree's only marginal gain (early-abort on a forged *whole-file* signature) is not worth the extra format complexity. - -Each asset is stored as: - -```rust -AssetManifest { - version: "asset-manifest/v1", - crypto_suite_id: u16, // see Versioning Identifiers above - protocol_version: String, // YYYY-MM-DD; matches album pin - file_id: UUID, - album_id: UUID, - amk_version: u32, // identifies the AMK epoch + write-tier key - ciphertext_hash: { algo: String, value: bytes }, // content address; reused by upload protocol - plaintext_size: u64, - chunk_size: u32, // plaintext bytes per chunk (65,520) - nonce_prefix: [u8; 7], // STREAM nonce prefix, random per file - created_by_user: UUID, - created_by_device: UUID, - client_version: String, - timestamp: RFC3339, // bounded to ±30 days of server clock at accept - action: enum, // create | replace | delete | metadata-update - // | derivative-add | derivative-replace | trash-restore - prior_provenance_hash: Option<[u8;32]>, // SHA-256 over the previous manifest in this asset's - // provenance chain. null only for `action = create`. - // See Provenance of Library Modifications. - - device_sig: Hybrid(Ed25519, ML-DSA-65), // over all fields above - write_sig: Signature, // under epoch write-tier key, over all fields above -} - -AssetBlob { - manifest: AssetManifest, - chunks: [AES-256-GCM-STREAM encrypted chunks], -} -``` - -The manifest carries **two signatures**, and a client acknowledges the asset only if **both** verify: - -1. `device_sig` — hybrid Ed25519 + ML-DSA-65 by the uploading device's [DSK](#device-keys). 
Provides provenance; the device certificate chains to the user IK via the [device directory](#per-user-device-coordination). -2. `write_sig` — a signature under the epoch's [write-tier key](#album-master-keys-amks). Proves the signer held write authorization at `amk_version` (see [Write Authorization](#write-authorization)). - -The signed manifest is stored as the encrypted asset's header and is itself part of the [provenance record](#provenance-of-library-modifications). The same signing approach applies to other surfaces — [metadata blobs and sidecars](#metadata-encryption) and the [device directory](#per-user-device-coordination) are each hybrid, device-signed, and versioned. - -**Streaming is preserved.** The STREAM authentication tags verify every chunk *during* the stream. The manifest signature is a one-time provenance check. `ciphertext_hash.value` is computed incrementally as bytes arrive and confirmed at stream end — no separate pass, no buffering the whole file. - -### Encryption Workflow - -Encrypting an asset for upload: - -1. Derive `file_key` from `AMK_v{current}` (see [Asset Key Derivation](#asset-key-derivation)). -2. Generate a random 7-byte `nonce_prefix` from the OS CSPRNG. -3. Split the plaintext into 65,520-byte chunks and encrypt sequentially with `EncryptorBE32`, producing 64 KiB ciphertext chunks (16-byte tag each); the final chunk is flagged as last. -4. Compute `ciphertext_hash.value` incrementally over the produced ciphertext (the `algo` is fixed by `crypto_suite_id`). -5. Build and sign the [manifest](#provenance-and-signed-manifest) (device signature + write-tier signature). -6. Upload the blob (see [Import Synchronization](/design/import-synchronization/)). - -Streaming download / ranged reads: - -- **Sequential:** `DecryptorBE32` consumes chunks in order, verifying each tag. -- **Ranged:** To start at plaintext byte `B`, the client computes `chunk_index = B / 65,520`. 
Because the [STREAM construction](#stream-construction) derives each chunk's nonce deterministically, chunk `i` decrypts independently given `file_key` and `i` — the server need only serve that 64 KiB ciphertext chunk, which the client decrypts and verifies. - -### STREAM Construction - -Our scheme strictly requires streaming. - -The chosen method is AES-256-GCM with the STREAM construction (Hoang-Reyhanitabar-Rogaway-Vizár, 2015). STREAM splits the file into chunks, encrypts each with AES-GCM using a structured nonce (`prefix || counter || last-chunk-flag`), and guarantees you detect truncation, reordering, and chunk deletion. - -In Rust: the RustCrypto `aead` crate exposes `stream::EncryptorBE32` and `stream::DecryptorBE32` — drop-in. We use a 65,520-byte plaintext chunk → 64 KiB ciphertext chunk. (Note the upload transport's 4 KiB chunk alignment, described in [Import Synchronization](/design/import-synchronization/), is a separate concern from this crypto chunk size.) - -## Metadata Encryption - -Not all metadata can be encrypted — some must stay server-readable for routing and preview. The split is deliberate: - -- **Encrypted** (AES-256-GCM under a key derived from the album's AMK, fresh random nonce per blob): the CBOR sidecar / metadata blobs. Each blob is independently versioned and signed like an [asset manifest](#provenance-and-signed-manifest). -- **Server-plaintext by necessity:** `owner_id`, the [ciphertext content hash](#primitives-inventory), the ciphertext size, the [chromahash LQIP](/design/thumbnails/#lqip), and `dominant_color`. These are needed for routing and for generating previews without decryption. This is a deliberate, documented trade-off. -- **AI embeddings** (semantic-search vectors, face embeddings) are sensitive — a user can be re-identified from them. They are kept plaintext *locally* (vector search requires it) but encrypted at rest in the server-side backup. - -CBOR metadata blobs use **deterministic encoding** (RFC 8949 §4.2). 
Because a blob's hash is what content-addresses it and what the [signed manifest](#provenance-and-signed-manifest) commits to, two implementations encoding the same logical metadata must produce byte-identical output — otherwise the hash diverges and the signature fails to verify across [federated](/design/federation/) peers. - -### Metadata Blob Wire Format - -An encrypted metadata blob is a single contiguous byte string. Implementations MUST produce and consume exactly this layout, with no framing variations, so two correct implementations can compute identical content hashes byte-for-byte. - -```text -+---------------------+---------------------+--------------------------+---------------+ -| crypto_suite_id (2) | nonce (12 bytes) | ciphertext (variable) | tag (16 bytes)| -+---------------------+---------------------+--------------------------+---------------+ -| big-endian u16 | fresh CSPRNG draw | AES-256-GCM(plaintext) | GCM tag | -``` - -- `crypto_suite_id` (2 bytes, big-endian `u16`) — pins the AEAD and KDF used to derive the key. Identical to the field carried inside the manifest (see [Versioning Identifiers](#versioning-identifiers)), and a mismatch with the manifest's value rejects the blob at decode. -- `nonce` (12 bytes) — fresh OS-CSPRNG per blob; never reused, never derived. -- `ciphertext` — the deterministically-encoded CBOR plaintext, sealed with AES-256-GCM under `HKDF-SHA512(ikm=AMK_v{n}, salt=blob_id, info="metadata-blob/v1", length=32)`. -- `tag` (16 bytes) — GCM authentication tag. - -The total blob's `ciphertext_hash` (in the asset's [signed manifest](#provenance-and-signed-manifest)) is computed over the full byte string above — header, nonce, ciphertext, and tag concatenated. - -## Provenance of Library Modifications - -Every modification of data or metadata produces a **provenance record** — timestamp, device, client version, action — anchored by a [signed manifest](#provenance-and-signed-manifest). 
The records form an **append-only, hash-chained log per asset**, which is what lets an operator distinguish a legitimate delete from a malicious or bug-induced one after the fact, and what defeats the [stale-revival attack](/design/threat-model/) described in the Threat Model. - -### Chained, Append-Only Structure - -```rust -ProvenanceRecord { - asset_id: UUID, - manifest: AssetManifest, // see Provenance and Signed Manifest - prior_provenance_hash: Option<[u8;32]>, // SHA-256 over the previous record; - // null only for `action = create` - // The manifest's own `prior_provenance_hash` mirrors this value, so signature - // coverage of the manifest is signature coverage of the chain link itself. -} -``` - -Each non-create record references its predecessor by hash; a rewrite of any past record breaks the chain at that point and is detectable by any client walking forward from `create`. - -### What an Attacker With All Current Keys Still Cannot Do - -Even if every current key (every device's DSK, every album's current AMK and write-tier key) is compromised: - -- **Forward writes are possible** — the attacker can append new records, just like any holder of those keys. -- **Past records cannot be rewritten** — the prior record was signed by a (possibly retired) device whose hybrid signature is still verifiable against the public half published in the [device directory](#per-user-device-coordination). Replacing the past record would require forging that earlier device's signature, which the hybrid construction prevents. -- **Past records cannot be silently removed** — every later record carries the prior hash, so a removal breaks the chain. - -This bounds the blast radius of a credential compromise: history is read-only. - -### Physical Storage - -- **Client.** An append-only CBOR file at `media/{YYYY}/{YYYY-MM}/{uuid}.provenance.cbor`, alongside the asset and its sidecar. The file is a sequence of `ProvenanceRecord` entries. 
The client never deletes this file — on hard-delete of an asset the log persists as a tombstone-with-history. -- **Server.** A content-addressed encrypted blob, distinct from the [encrypted metadata blob](#metadata-encryption), so a metadata edit (which mints a new metadata blob) never rewrites history. The server's no-key envelope of every provenance write includes `prior_provenance_hash`, so the server can enforce monotonic chain advance without holding any key — see [Threat Model — Server-Side Validation Invariants](/design/threat-model/). - -The server is **append-only** for provenance: there is no API path that overwrites or deletes an existing entry. An attempt is rejected at the [server's structural validation layer](/design/threat-model/). - -### Derivative Provenance - -Thumbnails, previews, and embeddings are generated client-side and uploaded as ordinary encrypted blobs. Without provenance they would be silently overwritable by any client with write capability — a buggy v4 client could quietly replace a v3 client's good thumbnail with a corrupt one. To prevent this, every derivative carries a small signed manifest of its own: - -```rust -DerivativeManifest { - version: "derivative-manifest/v1", - crypto_suite_id: u16, - source_asset_id: UUID, - role: enum, // thumbnail | preview | lqip | embedding - format: String, // e.g. "image/avif", "embedding/mobileclip-b" - ciphertext_hash: { algo, value }, - generated_by_device: UUID, - generated_by_client: String, - model_id: Option, // for embeddings; see ML Models - model_version: Option, // for embeddings - generated_at: RFC3339, - prior_provenance_hash: Option<[u8;32]>, // chained per (asset_id, role) - device_sig: Hybrid(Ed25519, ML-DSA-65), - write_sig: Signature, // under the album's epoch write-tier key -} -``` - -A derivative overwrite is therefore a `derivative-replace` lifecycle action that appends to the provenance chain like any other write. 
Quarantine semantics from [Write Authorization](#write-authorization) apply: a derivative whose manifest fails verification is surfaced, never silently applied — a buggy client cannot poison a derivative under the receiving side's nose. - -## Failure Modes and Recovery - -Capsule treats loss of data — and loss of the keys that decrypt it — as a first-class concern. This section enumerates what can go wrong, how each failure is detected or contained, and the redundant, independent paths that restore a user's *entire* asset collection — including after catastrophic software bugs, not just key loss. - -### Failure Mode Catalog - -| Failure mode | Detected / contained by | Recovery path | -| ---------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | -| **Master key loss** | — | Master-key escrow (path 1) or cross-device recovery (path 2) | -| **Device key loss** | Device keys are disposable by design | Re-bootstrap from the master key (path 1/2); device keys are never recovered | -| **AMK loss** (album key) | — | OGK escrow (path 3) and the master-key-anchored backup escrow (path 4) | -| **Write-tier key loss** | — | Re-minted and redistributed over MLS at the next epoch; no asset is lost | -| **Master key compromise** | — | Master-key rotation re-wraps the hierarchy — see [Key Rotation and Revocation](#key-rotation-and-revocation) | -| **Device compromise** | — | Device revocation certificate + MLS `Remove`; surviving devices rotate group keys | -| **AMK / write-tier compromise** | — | MLS epoch bump mints a fresh AMK and write-tier key; the compromised epoch cannot read or sign future epochs | -| **Server compromise** | 
Server is never trusted for authorization or plaintext | Authorization is verified against MLS history; data is E2E-encrypted at rest | -| **Classical primitive broken** (Ed25519, X25519) | Hybrid construction | The PQ half (ML-DSA-65 / ML-KEM-768) still holds — confidentiality and authentication survive | -| **PQ primitive broken** (ML-DSA, ML-KEM) | Hybrid construction | The classical half still holds | -| **Ciphertext corruption; chunk truncation, reorder, or deletion** | AES-256-GCM-STREAM per-chunk tags + `ciphertext_sha256` | Re-fetch the blob from a content-addressed copy (path 6) | -| **Reader-signed / removed-writer / wrong-epoch / forged-chain / replayed manifest** | The single [`verify_asset`](#write-authorization) chokepoint | Asset is quarantined and surfaced in the [audit trail](#provenance-of-library-modifications) | -| **MLS ratchet corruption or loss** | — | The recovery path is independent of ratchet state (paths 1, 3, 4) | -| **Backup incompleteness** (a referenced `amk_version` missing from the escrow) | Backup verification's AMK-completeness check | Caught before the backup is relied on; re-export | -| **Nonce reuse** | Structurally prevented | STREAM derives per-chunk nonces; metadata blobs draw fresh random nonces; a fresh per-file key lets the STREAM counter start at zero | -| **CBOR non-determinism** breaking cross-peer signature verification | RFC 8949 §4.2 deterministic encoding | Byte-identical re-encoding; the signature verifies | -| **Catastrophic software bug** corrupting the library DB / index | The DB is a rebuildable cache, not a source of truth | Filesystem rebuild from CBOR sidecars (path 5) | -| **Erroneous delete** (bug or user) | Soft-delete is the default | Restore from trash within the retention window (path 7) | -| **Stale-revival attempt** (peer or restore sends an old-but-validly-signed manifest) | `prior_provenance_hash` chain (see [Provenance](#provenance-of-library-modifications)) and matching server-side envelope 
check (see [Threat Model](/design/threat-model/)) | Manifest is quarantined; chain advance is refused on both client and server | -| **Suite-downgrade attempt** (re-sign a manifest under a weaker `crypto_suite_id`) | Signature covers `crypto_suite_id` and `protocol_version` | Verification fails at `verify_asset`; manifest is quarantined | -| **Derivative poisoning** (buggy or hostile client overwrites a good thumbnail/embedding) | Every derivative carries a [`DerivativeManifest`](#derivative-provenance) on its own chain | Overwrite without a valid manifest is rejected; provenance chain detects an unauthorized replacement | -| **Cross-schema sidecar overwrite** (old client writes back a sidecar after stripping unknown fields) | Sidecar signature covers every byte including unknown fields; old client `refuses to write` when `sidecar_schema` exceeds its max known | Old client cannot strip-and-resign; new client detects schema regression and quarantines | - -### Redundant Recovery Paths - -Restoring a complete asset collection does not depend on any single mechanism. The following paths are **independent** — each is annotated with the failures it survives: - -1. **Master-key escrow.** A recovery passphrase or BIP39-style seed unwraps the server-side escrow blob → account master key → AMK escrow → every asset. *Survives: total device loss.* See [Master-Key Escrow](/design/backup-recovery/#master-key-escrow). -2. **Cross-device recovery.** Any signed-in device re-bootstraps a new device over a verified channel. *Survives: partial device loss, and loss of the master-key backup — as long as one device survives.* -3. **Owner Group Key (OGK).** Any current member of the [owner set](#owner-group-keys-ogks) recovers every album's AMK versions, independent of album membership. *Survives: lost album membership, gaps in AMK distribution over MLS.* -4. **Portable backup artifact.** A self-describing, versioned, encrypted archive, stored offline. 
*Survives: server data loss, account compromise, escrow-blob corruption.* See [Backup Artifact](/design/backup-recovery/#backup-artifact) for the container format. -5. **Recovery-first filesystem rebuild.** CBOR sidecars are the canonical metadata store; the database is a rebuildable query cache. The idempotent `rebuild_index()` (`capsule-core/src/library/rebuild.rs`) walks `.cbor` sidecars and reconstructs the index. *Survives: DB corruption and catastrophic bugs in the index/query layer.* -6. **Content-addressed durability redundancy.** Ciphertext is addressed by the SHA-256 of its bytes, so any byte-identical copy — on another device or a [federated](/design/federation/) peer — is independently verifiable. This is a *durability* path: it restores ciphertext, not keys. *Survives: single-server data loss.* -7. **Trash soft-delete window.** Deletes are soft first — `soft_delete()` / `purge_expired_trash()` (`capsule-core/src/library/trash.rs`) give a reversal window before a hard purge. *Survives: erroneous deletes by a bug or user.* - -**Account-type coverage.** Registered accounts have all seven paths. [Delegated/sponsored accounts](/design/authentication/#account-types) are recovered via the sponsoring account's master key, since the sponsoree KEK that wraps their keys is itself wrapped under it. Non-registered (share-link) accounts hold no collection of their own — recovery is not applicable. - -### Bug-Resistance Invariants - -These cross-cutting properties make recovery robust specifically against *catastrophic bugs*, not just key loss: - -- **The backup path is independent of the MLS ratchet.** Restore never reconstructs ratchet state, so a ratchet bug cannot strand data. The master key — not any ratchet state — is the single backed-up root. -- **Hardware-bound, disposable device keys.** Device keys live inside hardware, are non-exportable, and are never backed up — a lost device is re-bootstrapped, not recovered. 
-- **Cross-signing (Matrix-style).** The master identity signs every device key; adding a device means an existing device signs it, so losing one device never compromises the account. -- **Every construction is versioned.** KDF `info` strings, in-blob Argon2id parameters, the [`crypto_suite_id`](#versioning-identifiers) on every manifest and metadata blob, and the [`sidecar_schema`](/design/metadata/#sidecar-schema-v1) on every sidecar mean a buggy v2 never strands v1 data — v2 keys and structures coexist with v1 without a flag day. Signature coverage of `crypto_suite_id` defeats downgrade-attempts. -- **`verify_asset` quarantines, never drops.** A bug-produced invalid asset is neither silently dropped nor silently accepted; it is quarantined and surfaced in the audit trail so an operator can tell a bug from an attack. -- **Provenance is append-only.** Each `ProvenanceRecord` carries the hash of its predecessor (`prior_provenance_hash`), and every record is hybrid-signed by the producing device. An attacker holding every *current* key still cannot rewrite a past record without forging an earlier (possibly retired) device's signature — history is read-only. See [Provenance of Library Modifications](#provenance-of-library-modifications). -- **Stale-revival is rejected.** An incoming manifest whose `prior_provenance_hash` is behind the receiver's stored `latest_provenance_hash` is treated as stale and quarantined — a deleted asset cannot be silently resurrected by a peer or a backup restore. The check is enforced both client-side and server-side (no key needed); see [Threat Model](/design/threat-model/). -- **Backup verification runs before reliance.** Preview, dry-run, signature-chain, and AMK-completeness checks (see [Backup Verification](/design/backup-recovery/#backup-verification)) detect an incomplete or broken backup *before* it is needed. - -## Transport Security - -All client-server communication is over HTTPS. 
While our stack aims to stay PQ-safe (within due course), the transport layer (TLS) must be configured by the server administrator to be PQ-resistant as well. As of writing, the standard is TLS 1.3 with hybrid X25519+ML-KEM key exchange enabled. Since application servers do not terminate TLS, ensure your ingress/reverse proxy is properly configured. - -## Implementation - -- **Centralized audit paths:** All key cryptographic primitives are centralized in `capsule-core/crypto`. Asset acknowledgement goes through the single `verify_asset` chokepoint (see [Write Authorization](#write-authorization)). -- **Contract-driven development:** Define the crypto interfaces, data structures, and the full set of test cases — especially negative cases — before implementing logic. -- **Backward compatibility:** The server stores all data and metadata encrypted; its database model is distinct from the client's and records `crypto_suite_id` and `protocol_version` for every manifest. Old suite ids and protocol versions remain decryptable forever — retiring a primitive adds an inventory row and a new suite id, never edits or removes an old one. Clients outside the server's supported `protocol_version` range are rejected at the [protocol handshake](/design/threat-model/), before any state is written. -- **Trust the server (and only the server) for storage, never for authorization:** The server owns, provisions, and maintains the encrypted user data, so we rely on it to *hold* data — but authorization decisions are verified cryptographically against MLS-distributed keys, never taken on the server's word. -- **Memory hygiene:** All keys and decrypted data are zeroed in memory immediately after use. We also use secure memory allocation where possible to prevent swapping to disk. - -Further guidance: - -- Use audited libraries only — libcrux (formally verified), RustCrypto, ed25519-dalek, x25519-dalek; never be the first serious user. 
-- Use MLS rather than inventing group crypto; it handles the 1:1 case and shifts the audit burden to the IETF and OpenMLS. -- Keep the backup path independent of the ratchet — album keys live in the backed-up hierarchy, so recovery never reconstructs ratchet state. -- Version every key derivation with an `info` string (`"albums/v1"`, `"asset-file/v1"`) so v2 keys can derive alongside v1 without a flag day. -- Store device private keys in hardware (Secure Enclave, StrongBox, TPM) to eliminate memory-extraction attacks. -- Write test vectors against known implementations (libsignal, OpenMLS, RFC vectors) before writing anything novel. - -### Versioning - -The construction of every encryption metadata structure is always versioned. Parameters (e.g. for Argon2id) must be saved inside the construction to ensure future changes do not break previous constructions. diff --git a/capsule-docs/src/content/docs/design/cryptography/encryption.md b/capsule-docs/src/content/docs/design/cryptography/encryption.md new file mode 100644 index 0000000..1af7bb0 --- /dev/null +++ b/capsule-docs/src/content/docs/design/cryptography/encryption.md @@ -0,0 +1,111 @@ +--- +title: Asset and Metadata Encryption +description: How Capsule encrypts asset bytes and metadata blobs, including streaming and wire formats +--- + +Every asset Capsule stores — original bytes, derivative bytes, metadata blob — is encrypted client-side before it ever crosses a network boundary. The encryption code lives in `capsule-core::crypto::encryption` and is the only place AES-256-GCM is invoked in the codebase. Two constructions live here: + +- **STREAM** for asset bytes (originals + derivatives) — supports streaming, ranged reads, and per-chunk authentication. +- **Standalone AEAD** for metadata blobs — a single contiguous byte string with a fixed wire format. + +The split is intentional: assets are huge and accessed in pieces; metadata blobs are small and always fetched whole. 
+ +## Authenticated Asset Encryption + +Every asset is content-addressed by the SHA-256 of its ciphertext and encrypted with a unique file key. The file key is derived from the appropriate [AMK](/design/cryptography/keys/#album-master-keys-amks); the AMK itself is recoverable from the account's master key (see [Identity-Based Key Derivation](/design/cryptography/keys/#identity-based-key-derivation)). + +### Asset Key Derivation + +Each asset is encrypted with a key derived from a versioned album master key (AMK), distributed and ledgered over MLS (see [MLS](/design/cryptography/mls/)). Capsule never derives a key from the MLS epoch's internal state. + +An album's AMK ledger: + +```rust +Album { + id: UUID, + mls_group: MlsGroup, + keys: [ + AMK_v1: (random 32 bytes, created at album creation), + AMK_v2: (random 32 bytes, created when member X was removed), + AMK_v3: ... + ], + current_version: 3, +} +``` + +The per-file key is derived from the AMK version that encrypted it, using the [KDF](/design/cryptography/primitives/#key-derivation): + +```rust +file_key = HKDF_SHA512( + ikm: AMK_v{amk_version}, + salt: file_id, + info: "asset-file/v1", + length: 32 // 32 bytes for AES-256; HKDF-SHA512 expand truncates safely +) +``` + +AMKs are delivered over MLS application messages. When epoch N's MLS group is established, the creating device sends an `AlbumKeyDistribution { amk_version, amk_bytes }` message through MLS. Every current member's device receives and stores it locally (hardware-wrapped). + +**Distribution lag is expected and is not a failure.** An epoch bump and its `AlbumKeyDistribution` broadcast are separate MLS messages, so during a bump a device can legitimately receive an asset manifest referencing an `amk_version` whose key bytes have not yet arrived. 
A device that lacks the AMK for an `amk_version` that is otherwise **within the [MLS-attested epoch range](/design/cryptography/keys/#write-authorization)** treats the asset as *pending* — held and retried as MLS state catches up — rather than as a decryption failure or a forged manifest. Only an `amk_version` beyond the MLS-attested epoch, or one still missing after the retry timeout, is escalated. This is the `verify_asset` *pending* outcome and the matching [Failure Modes](/design/cryptography/failure-modes/#failure-mode-catalog) row; it is what keeps a concurrent upload during an epoch bump from being misread as an attack. + +### Encryption Workflow + +Encrypting an asset for upload: + +1. Derive `file_key` from `AMK_v{current}` (above). +2. Generate a random 7-byte `nonce_prefix` from the OS CSPRNG (7 = the 12-byte AES-GCM nonce minus STREAM's 4-byte chunk counter and 1-byte last-chunk flag). +3. Split the plaintext into 65,520-byte chunks and encrypt sequentially with `EncryptorBE32`, producing 64 KiB ciphertext chunks (16-byte tag each); the final chunk is flagged as last. +4. Compute the `ciphertext_hash` incrementally over the produced ciphertext (algorithm fixed by `crypto_suite_id`). +5. Build and sign the [manifest](/design/cryptography/provenance/#asset-manifest) (device signature + write-tier signature). +6. Upload the blob (see [Upload Protocol](/design/import/upload-protocol/)). + +Streaming download / ranged reads: + +- **Sequential:** `DecryptorBE32` consumes chunks in order, verifying each tag. +- **Ranged:** to start at plaintext byte `B`, compute `chunk_index = B / 65,520`. Because the [STREAM construction](#stream-construction) derives each chunk's nonce deterministically, chunk `i` decrypts independently given `file_key` and `i` — the server need only serve that 64 KiB ciphertext chunk, which the client decrypts and verifies. + +### STREAM Construction + +Capsule strictly requires streaming. 
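The chunk geometry and nonce layout from the Encryption Workflow can be sketched in plain Rust. This is a minimal illustration under stated assumptions, not the code in `capsule-core`: the names `PLAINTEXT_CHUNK`, `ChunkLocation`, `locate_chunk`, and `chunk_nonce` are hypothetical, the blob is assumed to hold ciphertext chunks back to back with no header, and in practice the nonces are built internally by `EncryptorBE32` / `DecryptorBE32` rather than by hand.

```rust
/// Plaintext bytes per STREAM chunk (the 65,520-byte figure from the workflow).
pub const PLAINTEXT_CHUNK: u64 = 65_520;
/// AES-GCM authentication tag appended to every chunk.
pub const TAG_LEN: u64 = 16;
/// Ciphertext bytes per full chunk: 65,520 + 16 = 65,536 (64 KiB).
pub const CIPHERTEXT_CHUNK: u64 = PLAINTEXT_CHUNK + TAG_LEN;

/// Where a plaintext byte offset lives inside the encrypted blob.
/// Assumption: chunks are stored contiguously with no leading header.
#[derive(Debug, PartialEq, Eq)]
pub struct ChunkLocation {
    /// Index of the chunk that contains the requested plaintext byte.
    pub chunk_index: u64,
    /// Byte offset of that chunk's first ciphertext byte within the blob.
    pub ciphertext_offset: u64,
    /// Offset of the requested byte inside the decrypted chunk.
    pub offset_in_chunk: u64,
}

/// Map a plaintext offset `B` to the chunk a ranged read must fetch.
pub fn locate_chunk(plaintext_offset: u64) -> ChunkLocation {
    let chunk_index = plaintext_offset / PLAINTEXT_CHUNK;
    ChunkLocation {
        chunk_index,
        ciphertext_offset: chunk_index * CIPHERTEXT_CHUNK,
        offset_in_chunk: plaintext_offset % PLAINTEXT_CHUNK,
    }
}

/// Build the 12-byte per-chunk nonce: 7-byte prefix, 4-byte big-endian
/// counter, 1-byte last-chunk flag (7 + 4 + 1 = 12).
pub fn chunk_nonce(prefix: [u8; 7], chunk_index: u32, is_last: bool) -> [u8; 12] {
    let mut nonce = [0u8; 12];
    nonce[..7].copy_from_slice(&prefix);
    nonce[7..11].copy_from_slice(&chunk_index.to_be_bytes());
    nonce[11] = u8::from(is_last);
    nonce
}
```

Because the nonce depends only on the prefix, the chunk index, and the last-chunk flag, a client holding `file_key` can decrypt chunk `i` without touching any other chunk, which is what makes ranged reads possible.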
+ +The chosen method is AES-256-GCM with the STREAM construction (Hoang-Reyhanitabar-Rogaway-Vizár, 2015). STREAM splits the file into chunks, encrypts each with AES-GCM using a structured nonce (`prefix || counter || last-chunk-flag`), and guarantees you detect truncation, reordering, and chunk deletion. + +In Rust: the RustCrypto `aead` crate exposes `stream::EncryptorBE32` and `stream::DecryptorBE32` — drop-in. We use a 65,520-byte plaintext chunk → 64 KiB ciphertext chunk. (Note the upload transport's 4 KiB chunk alignment, described in [Upload Protocol](/design/import/upload-protocol/), is a separate concern from this crypto chunk size.) + +## Metadata Encryption + +Not all metadata can be encrypted — some must stay server-readable for routing and preview. The split is deliberate: + +- **Encrypted** (AES-256-GCM under a key derived from the album's AMK, fresh random nonce per blob): the CBOR sidecar / metadata blobs — including the [chromahash LQIP](/design/thumbnails/#lqip) and `dominant_color`, so image-derived display hints never leak to a server that never decodes assets. Each blob is independently versioned and signed like an [asset manifest](/design/cryptography/provenance/#asset-manifest). +- **Server-plaintext by necessity:** `owner_id`, the [ciphertext content hash](/design/cryptography/primitives/), and the ciphertext size — the routing and storage-accounting facts a key-less server needs. This is a deliberate, documented trade-off. +- **AI embeddings** (semantic-search vectors, face embeddings) are sensitive — a user can be re-identified from them. They are kept plaintext *locally* (vector search requires it) but encrypted at rest in the server-side backup. + +CBOR metadata blobs use **deterministic encoding** per the [canonical CBOR ruleset](/design/metadata/#canonical-cbor-encoding) owned by [Metadata](/design/metadata/) — the same byte-exact rules the plaintext sidecar follows, since the metadata blob's plaintext *is* that CBOR document. 
Because a blob's hash is what content-addresses it and what the [signed manifest](/design/cryptography/provenance/#asset-manifest) commits to, two implementations encoding the same logical metadata must produce byte-identical output — otherwise the hash diverges and the signature fails to verify across [federated](/design/federation/) peers. Conformance to the canonical ruleset is mandatory and is the load-bearing check behind cross-platform and cross-language interop. + +### Metadata Blob Wire Format + +An encrypted metadata blob is a single contiguous byte string. **Implementations MUST produce and consume exactly this layout**, with no framing variations, so two correct implementations can compute identical content hashes byte-for-byte. This wire format is itself the contract: any byte-level deviation breaks cross-peer signature verification. + +```text ++---------------------+---------------------+--------------------------+---------------+ +| crypto_suite_id (2) | nonce (12 bytes) | ciphertext (variable) | tag (16 bytes)| ++---------------------+---------------------+--------------------------+---------------+ +| big-endian u16 | fresh CSPRNG draw | AES-256-GCM(plaintext) | GCM tag | +``` + +- `crypto_suite_id` (2 bytes, big-endian `u16`) — pins the AEAD and KDF used to derive the key. Identical to the field carried inside the manifest (see [Versioning Identifiers](/design/cryptography/primitives/#versioning-identifiers)), and a mismatch with the manifest's value rejects the blob at decode. +- `nonce` (12 bytes) — fresh OS-CSPRNG per blob; never reused, never derived. +- `ciphertext` — the deterministically-encoded CBOR plaintext, sealed with AES-256-GCM under `HKDF-SHA512(ikm=AMK_v{n}, salt=blob_id, info="metadata-blob/v1", length=32)`. +- `tag` (16 bytes) — GCM authentication tag. 
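The field layout above can be sketched as a small, allocation-free parser. This is an illustration under stated assumptions, not Capsule's decoder: `MetadataBlob`, `BlobError`, and `parse_metadata_blob` are hypothetical names, and the sketch only splits and checks the fields. The AES-256-GCM open step and key derivation are deliberately left out.

```rust
/// Fixed-size fields of the metadata blob layout.
const SUITE_ID_LEN: usize = 2;
const NONCE_LEN: usize = 12;
const GCM_TAG_LEN: usize = 16;

/// Borrowed view over one encrypted metadata blob (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct MetadataBlob<'a> {
    pub crypto_suite_id: u16,
    pub nonce: [u8; NONCE_LEN],
    pub ciphertext: &'a [u8],
    pub tag: [u8; GCM_TAG_LEN],
}

/// Reasons a blob is rejected before any decryption is attempted.
#[derive(Debug, PartialEq, Eq)]
pub enum BlobError {
    /// Fewer bytes than header + nonce + tag.
    TooShort,
    /// The blob's suite id disagrees with the manifest's value.
    SuiteMismatch { expected: u16, found: u16 },
}

/// Split `bytes` into suite id, nonce, ciphertext, and tag, rejecting
/// a suite id that differs from the one carried in the manifest.
pub fn parse_metadata_blob(
    bytes: &[u8],
    manifest_suite_id: u16,
) -> Result<MetadataBlob<'_>, BlobError> {
    if bytes.len() < SUITE_ID_LEN + NONCE_LEN + GCM_TAG_LEN {
        return Err(BlobError::TooShort);
    }
    let (header, rest) = bytes.split_at(SUITE_ID_LEN);
    // The suite id is a big-endian u16.
    let found = u16::from_be_bytes([header[0], header[1]]);
    if found != manifest_suite_id {
        return Err(BlobError::SuiteMismatch { expected: manifest_suite_id, found });
    }
    let (nonce_bytes, rest) = rest.split_at(NONCE_LEN);
    // Everything between the nonce and the trailing 16 bytes is ciphertext.
    let (ciphertext, tag_bytes) = rest.split_at(rest.len() - GCM_TAG_LEN);
    let mut nonce = [0u8; NONCE_LEN];
    nonce.copy_from_slice(nonce_bytes);
    let mut tag = [0u8; GCM_TAG_LEN];
    tag.copy_from_slice(tag_bytes);
    Ok(MetadataBlob { crypto_suite_id: found, nonce, ciphertext, tag })
}
```

Checking the suite id before touching the ciphertext mirrors the rule that a mismatch with the manifest rejects the blob at decode, so no key is derived for a blob that is already known to be inconsistent.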
+ +The blob's `ciphertext_hash` (in the asset's [signed manifest](/design/cryptography/provenance/#asset-manifest)) is computed over the full byte string above — header, nonce, ciphertext, and tag concatenated. + +## Validation + +- **Encrypt-decrypt round-trip** — for both STREAM and standalone metadata AEAD, unit tests assert that randomized plaintexts survive an encrypt-then-decrypt round trip unchanged. Fixed-vector cases pin the per-primitive parameters. +- **STREAM tamper-detection** — unit tests that mutate each chunk in turn (single bit flip, chunk swap, chunk drop, final-chunk-flag toggle) and assert `DecryptorBE32` rejects each mutation. +- **Ranged-read correctness** — unit test that fetching chunk `i` in isolation decrypts to the matching plaintext slice (no off-by-one), and that ranged reads stitched together byte-match a sequential decrypt. +- **Metadata blob wire-format determinism** — cross-language conformance test (Rust ↔ any FFI consumer) that encoding the same logical CBOR map produces byte-identical blobs against the shared [canonical CBOR known-answer vectors](/design/metadata/#canonical-cbor-encoding). This is a **blocking conformance gate**, not advisory: a consumer that drifts cannot be shipped, because its signatures would not verify across peers. +- **Nonce-misuse refusal** — unit test that the metadata-blob writer rejects an attempt to reuse a previously-emitted nonce (defense in depth on top of the CSPRNG fresh-draw rule). + +Wire-format compatibility with the upload protocol is exercised by [Upload Protocol](/design/import/upload-protocol/) smoke tests; this doc's responsibility is the byte-level correctness of the AEAD itself.
diff --git a/capsule-docs/src/content/docs/design/cryptography/failure-modes.md b/capsule-docs/src/content/docs/design/cryptography/failure-modes.md new file mode 100644 index 0000000..0b11192 --- /dev/null +++ b/capsule-docs/src/content/docs/design/cryptography/failure-modes.md @@ -0,0 +1,82 @@ +--- +title: Failure Modes and Recovery +description: What can go wrong with Capsule's cryptographic state, and the independent paths that restore it +--- + +Capsule treats loss of data — and loss of the keys that decrypt it — as a first-class concern. This doc catalogues what can go wrong, how each failure is detected or contained, and the redundant, independent paths that restore a user's *entire* asset collection — including after catastrophic software bugs, not just key loss. + +It is a cross-cutting doc by nature: the failure-mode logic lives in many modules (key handling in `capsule-core::crypto::keys`, restore in `capsule-core::backup`, blob durability in `capsule-api`, etc.). The contract this doc owns is the **set of failures the system is required to survive** and the **independent paths that must each remain workable**. The closing [transport security](#transport-security) section is the one piece of crypto config that lives outside the application layer. 
+ +## Failure Mode Catalog + +| Failure mode | Detected / contained by | Recovery path | +| ---------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| **Master key loss** | — | Master-key escrow (path 1) or cross-device recovery (path 2) | +| **Device key loss** | Device keys are disposable by design | Re-bootstrap from the master key (path 1/2); device keys are never recovered | +| **AMK loss** (album key) | — | OGK escrow (path 3) and the master-key-anchored backup escrow (path 4) | +| **Write-tier key loss** | — | Re-minted and redistributed over MLS at the next epoch; no asset is lost | +| **Master key compromise** | — | Rotate the master key + re-key affected albums — see [Master-Key Compromise](/design/cryptography/keys/#master-key-compromise) | +| **Device compromise** | — | Device revocation certificate + MLS `Remove`; surviving devices rotate group keys | +| **AMK / write-tier compromise** | — | MLS epoch bump mints a fresh AMK and write-tier key; the compromised epoch cannot read or sign future epochs | +| **Server compromise** | Server is never trusted for authorization or plaintext | Authorization is verified against MLS history; data is E2E-encrypted at rest | +| **Classical primitive broken** (Ed25519, X25519) | Hybrid construction | The PQ half (ML-DSA-65 / ML-KEM-768) still holds — confidentiality and authentication survive | +| **PQ primitive broken** (ML-DSA, ML-KEM) | Hybrid construction | The classical half still holds | +| **Ciphertext corruption; chunk truncation, reorder, or deletion** | 
AES-256-GCM-STREAM per-chunk tags + `ciphertext_sha256` | Re-fetch the blob from a content-addressed copy (path 6) | +| **Reader-signed / removed-writer / wrong-epoch / forged-chain / replayed manifest** | The single [`verify_asset`](/design/cryptography/keys/#write-authorization) chokepoint | Asset is quarantined and surfaced in the [audit trail](/design/cryptography/provenance/#provenance-of-library-modifications) | +| **AMK distribution lag** (manifest cites an in-flight epoch whose key has not yet arrived) | `verify_asset` *pending* outcome — `amk_version` is within the [MLS-attested range](/design/cryptography/keys/#write-authorization) but the [AlbumKeyDistribution](/design/cryptography/encryption/#asset-key-derivation) message is still in transit | Asset is **held and retried** as MLS state catches up; escalated to quarantine only if the key never arrives within the timeout — never misread as a forgery | +| **MLS ratchet corruption or loss** | — | The recovery path is independent of ratchet state (paths 1, 3, 4). 
State-divergence repair owned by [MLS Resilience](/design/mls-resilience/) | +| **Backup incompleteness** (a referenced `amk_version` missing from the escrow) | Backup verification's AMK-completeness check | Caught before the backup is relied on; re-export | +| **Nonce reuse** | Structurally prevented | STREAM derives per-chunk nonces; metadata blobs draw fresh random nonces; a fresh per-file key lets the STREAM counter start at zero | +| **CBOR non-determinism** breaking cross-peer signature verification | RFC 8949 §4.2 deterministic encoding | Byte-identical re-encoding; the signature verifies | +| **Catastrophic software bug** corrupting the library DB / index | The DB is a rebuildable cache, not a source of truth | Filesystem rebuild from CBOR sidecars (path 5) | +| **Erroneous delete** (bug or user) | Soft-delete is the default | Restore from trash within the retention window (path 7) | +| **Stale-revival attempt** (peer or restore sends an old-but-validly-signed manifest) | `prior_provenance_hash` chain (see [Provenance](/design/cryptography/provenance/#provenance-of-library-modifications)) and matching server-side envelope check (see [Threat Model](/design/threat-model/validation/)) | Manifest is quarantined; chain advance is refused on both client and server | +| **Suite-downgrade attempt** (re-sign a manifest under a weaker `crypto_suite_id`) | Signature covers `crypto_suite_id` and `protocol_version` | Verification fails at `verify_asset`; manifest is quarantined | +| **Derivative poisoning** (buggy or hostile client overwrites a good thumbnail/embedding) | Every derivative carries a [`DerivativeManifest`](/design/cryptography/provenance/#derivative-provenance) on its own chain | Overwrite without a valid manifest is rejected; provenance chain detects an unauthorized replacement | +| **Cross-schema sidecar overwrite** (old client writes back a sidecar after stripping unknown fields) | Sidecar signature covers every byte including unknown fields; old 
client `refuses to write` when `sidecar_schema` exceeds its max known | Old client cannot strip-and-resign; new client detects schema regression and quarantines | + +## Redundant Recovery Paths + +Restoring a complete asset collection does not depend on any single mechanism. The following paths are **independent** — each annotated with the failures it survives: + +1. **Master-key escrow.** A recovery passphrase or BIP39-style seed unwraps the server-side escrow blob → account master key → AMK escrow → every asset. *Survives: total device loss.* See [Master-Key Escrow](/design/backup-recovery/#master-key-escrow). +2. **Cross-device recovery.** Any signed-in device re-bootstraps a new device over a verified channel. *Survives: partial device loss, and loss of the master-key backup — as long as one device survives.* The first-device flow is owned by [Device Enrollment](/design/device-enrollment/). +3. **Owner Group Key (OGK).** Any current member of the [owner set](/design/cryptography/keys/#owner-group-keys-ogks) recovers every album's AMK versions, independent of album membership. *Survives: lost album membership, gaps in AMK distribution over MLS.* +4. **Portable backup artifact.** A self-describing, versioned, encrypted archive, stored offline. *Survives: server data loss, account compromise, escrow-blob corruption.* See [Backup Artifact](/design/backup-recovery/#backup-artifact) for the container format. +5. **Recovery-first filesystem rebuild.** CBOR sidecars are the canonical metadata store; the database is a rebuildable query cache. The idempotent `rebuild_index()` (`capsule-core::library::rebuild`) walks `.cbor` sidecars and reconstructs the index. *Survives: DB corruption and catastrophic bugs in the index/query layer.* +6. **Content-addressed durability redundancy.** Ciphertext is addressed by the SHA-256 of its bytes, so any byte-identical copy — on another device or a [federated](/design/federation/) peer — is independently verifiable. 
This is a *durability* path: it restores ciphertext, not keys. *Survives: single-server data loss.* +7. **Trash soft-delete window.** Deletes are soft first — `soft_delete()` / `purge_expired_trash()` (`capsule-core::library::trash`) give a reversal window before a hard purge. *Survives: erroneous deletes by a bug or user.* + +**Account-type coverage.** Registered accounts have all seven paths. [Delegated/sponsored accounts](/design/authentication/#account-types) are recovered via the sponsoring account's master key, since their keys derive from it. Non-registered ([share-link](/design/share-links/)) accounts hold no collection of their own — recovery is not applicable. + +## Bug-Resistance Invariants + +These cross-cutting properties make recovery robust specifically against *catastrophic bugs*, not just key loss: + +- **The backup path is independent of the MLS ratchet.** Restore never reconstructs ratchet state, so a ratchet bug cannot strand data. The master key — not any ratchet state — is the single backed-up root. +- **Hardware-bound, disposable device keys.** Device keys live inside hardware, are non-exportable, and are never backed up — a lost device is re-bootstrapped, not recovered. +- **Cross-signing (Matrix-style).** The master identity signs every device key; adding a device means an existing device signs it, so losing one device never compromises the account. +- **Every construction is versioned.** KDF `info` strings, in-blob Argon2id parameters, the [`crypto_suite_id`](/design/cryptography/primitives/#versioning-identifiers) on every manifest and metadata blob, and the [`sidecar_schema`](/design/metadata/#sidecar-schema-v1) on every sidecar mean a buggy v2 never strands v1 data — v2 keys and structures coexist with v1 without a flag day. Signature coverage of `crypto_suite_id` defeats downgrade attempts.
+- **`verify_asset` quarantines, never drops.** A bug-produced invalid asset is neither silently dropped nor silently accepted; it is quarantined and surfaced in the audit trail so an operator can tell a bug from an attack. +- **Provenance is append-only.** Each `ProvenanceRecord` carries the hash of its predecessor (`prior_provenance_hash`), and every record is hybrid-signed by the producing device. An attacker holding every *current* key still cannot rewrite a past record without forging an earlier (possibly retired) device's signature — history is read-only. See [Provenance](/design/cryptography/provenance/#provenance-of-library-modifications). +- **Stale-revival is rejected.** An incoming manifest whose `prior_provenance_hash` is behind the receiver's stored `latest_provenance_hash` is treated as stale and quarantined — a deleted asset cannot be silently resurrected by a peer or a backup restore. The check is enforced both client-side and server-side (no key needed); see [Threat Model](/design/threat-model/validation/). +- **Backup verification runs before reliance.** Preview, dry-run, signature-chain, and AMK-completeness checks (see [Backup Verification](/design/backup-recovery/#backup-verification)) detect an incomplete or broken backup *before* it is needed. + +## Transport Security + +All client-server communication is over HTTPS. While Capsule's stack aims to stay PQ-safe (in due course), the transport layer (TLS) must be configured by the server administrator to be PQ-resistant as well. As of 2026, the standard is **TLS 1.3 with hybrid X25519+ML-KEM key exchange** enabled. Since application servers do not terminate TLS, ensure the ingress/reverse proxy is properly configured. + +This is the one piece of cryptographic configuration that lives outside the application layer — the application code cannot enforce it, only document the requirement.
+ +## Validation + +The failure-mode catalog itself is the verification spec: each row must have an executable test that exercises both the *detection* (the catalog's middle column) and the *recovery* (the right column). + +- **Per-recovery-path scenarios** — seven smoke tests, one per path. Each takes a library to a "lost" state corresponding to the path's *Survives* annotation, runs the recovery, and asserts every asset is recoverable. The tests share fixtures from the [Keys](/design/cryptography/keys/) and [Provenance](/design/cryptography/provenance/) test surfaces. +- **Bug-resistance invariant checks** — unit-test surface that asserts each invariant holds structurally: + - The backup-artifact format does not embed MLS ratchet state (assert by inspecting an exported artifact). + - Device private keys cannot be exported (asserted per-platform in the [Keys](/design/cryptography/keys/#validation) hardware smoke). + - A v2 client can read a v1 sidecar and write a v2 sidecar that a v1 client still validates as v1 (cross-version round-trip). +- **Catalog completeness** — a CI check that every row in the catalog has at least one referenced test. Adding a row without a test is a structural error, not a TODO. + +The full bounded recovery surface — including which paths must be exercised end-to-end across the full system — is in [Module Map — E2E Test Surface](/design/module-map/#e2e-test-surface). diff --git a/capsule-docs/src/content/docs/design/cryptography/index.md b/capsule-docs/src/content/docs/design/cryptography/index.md new file mode 100644 index 0000000..ef00ccb --- /dev/null +++ b/capsule-docs/src/content/docs/design/cryptography/index.md @@ -0,0 +1,33 @@ +--- +title: Cryptography +description: Capsule's cryptographic stack — the entry point to the sub-docs +--- + +Cryptography in Capsule is everything that makes E2EE work over an asset-heavy, sync-heavy workload. 
The choices and constructions are split across focused sub-docs because each is implementable and testable on its own, but they share one home: every primitive and key-handling routine lives in `capsule-core::crypto`, so every client and the server's no-key envelope-validation path use exactly the same code. + +## End-to-End Model in Layers + +Capsule's E2E security stacks four layers, each owned by its own sub-doc: + +- **Identity** — per-device keys, cross-signed by a user master identity. See [Keys](/design/cryptography/keys/). +- **Group membership** — one MLS group per shared album; each device is a leaf. See [MLS](/design/cryptography/mls/). +- **Asset encryption** — bulk AEAD per file, keyed via the album-scoped KDF. See [Encryption](/design/cryptography/encryption/). +- **Metadata encryption** — bulk AEAD per metadata blob, keyed the same way. (No streaming construction; metadata is fetched whole.) See [Encryption](/design/cryptography/encryption/). + +## Sub-docs + +| Sub-doc | Owns | +| ---------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------- | +| [Primitives](/design/cryptography/primitives/) | **SSoT primitives inventory** — every hash, KDF, AEAD, signature scheme, KEM, ciphersuite, and version identifier | +| [Keys](/design/cryptography/keys/) | Key hierarchy (master, device, AMK, write-tier), account-type key derivation, device directory, rotation/revocation | +| [MLS](/design/cryptography/mls/) | Group membership protocol, ciphersuite binding, history delivery, FS/PCS | +| [Encryption](/design/cryptography/encryption/) | Asset AEAD (AES-256-GCM-STREAM), metadata AEAD, deterministic CBOR encoding, wire formats | +| [Provenance](/design/cryptography/provenance/) | Signed manifests, append-only provenance chains, derivative provenance | +| [Failure Modes](/design/cryptography/failure-modes/) | Failure-mode catalog, 7 independent recovery 
paths, bug-resistance invariants, transport security | + +## Implementation Posture + +- **Centralized.** All cryptographic primitives, key handling, and `verify_asset` live in `capsule-core::crypto`. There is no per-platform divergence in what gets verified or how — only in where keys are physically held. +- **Audited libraries only.** libcrux (formally verified), RustCrypto, ed25519-dalek, x25519-dalek, OpenMLS. Capsule is never the first serious user of a primitive's implementation. +- **Memory hygiene.** Decrypted bytes and key material are zeroed on drop; secure-allocation is used where the platform supports it, to prevent swap leaks. +- **Trust the server for storage, never for authorization.** The server holds opaque ciphertext and key-free index facts. Every authorization is verified against MLS-distributed material; a server's assertion of access is never sufficient. diff --git a/capsule-docs/src/content/docs/design/cryptography/keys.md b/capsule-docs/src/content/docs/design/cryptography/keys.md new file mode 100644 index 0000000..898f9e8 --- /dev/null +++ b/capsule-docs/src/content/docs/design/cryptography/keys.md @@ -0,0 +1,188 @@ +--- +title: Key Management +description: Capsule's key hierarchy, device coordination, and write authorization +--- + +Capsule's keys form a single hierarchy with one backed-up root. The hierarchy is implemented in `capsule-core::crypto::keys`; hardware-bound storage adapters (Secure Enclave, StrongBox/Keystore, TPM) live in per-platform glue under `capsule-sdk::hardware-keys`. + +- The **account master key** is the only key that is escrowed/backed up. It does not encrypt assets directly. Its job is to (1) wrap the per-device identity private keys and (2) anchor the encrypted backup that escrows album keys. +- **Device keys** are hardware-bound, non-exportable, and therefore disposable — a device is re-bootstrapped from the master key rather than recovered. 
+- **Album keys** (AMKs) are random per-epoch keys ledgered in MLS, escrowed both in the master-key backup and in the [Owner Group Keys](#owner-group-keys-ogks). + +The guiding rule is **the backup path is independent of the MLS ratchet**, so losing every device but holding the recovery passphrase still restores every photo. We deliberately avoid the Matrix failure mode where undecryptable content is routine. See [Failure Modes](/design/cryptography/failure-modes/). + +## Key Chain + +The account master key does **not** derive album keys — albums are MLS groups with random AMKs. The master key's role is to wrap device identity keys and to anchor the encrypted backup that escrows AMKs: + +```plaintext +account_master_key (backed up — see Failure Modes) + ├─ wraps device identity private keys (IK / DSK / DEK private halves) + └─ anchors the encrypted backup that escrows: + AMK_v{n} (random 32 bytes, per album, minted per MLS epoch) + └─ HKDF-SHA512(ikm=AMK_v{n}, salt=file_id, info="asset-file/v1") → 32-byte AES file key + └─ AES-256-GCM-STREAM +``` + +Construction rules (consistent with the [KDF choice](/design/cryptography/primitives/#key-derivation)): + +- Always include a version string in `info` so the KDF can be rotated later. +- Salt with something unique per scope (`album_id`, `file_id`) — never reuse salts across scopes. +- The 512-bit KDF output is truncated to 32 bytes for the AES-256 file key. +- Each file gets a fresh derived key, so the STREAM nonce can safely start at zero per file. + +The master key also derives one **identifier** — the [default album](/design/organization/#the-default-album)'s `album_id`, via HKDF with a dedicated `info` label — so any device can recompute which album is the de facto default from the master key alone, even after recovery. This derives an *ID*, not a key: the default album is an ordinary album with a random per-epoch AMK like any other. 
+ +Per-album AMKs are escrowed in the server-side encrypted backup (see [Backup and Recovery](/design/backup-recovery/)) and the [OGK](#owner-group-keys-ogks) — not derived from MLS ratchet state — so losing all devices but holding the recovery passphrase still restores photos. Ratchet keys are expected to be ephemeral. + +## Key Generation + +All key generation happens client-side, drawn from the [OS CSPRNG](/design/cryptography/primitives/#randomness). The scheme is PQ-safe ("post-quantum"): classical + PQ primitives combined so that breaking either alone does not break security. + +### User Identity Keys (User IKs) + +A User IK is generated once per user ever, and lives forever (or until account compromise). This is the root of trust and signs everything below it. It is always verified out-of-band or via safety numbers. + +A User IK is a hybrid **Ed25519 + ML-DSA-65** signing keypair generated entirely on the client at account creation. The private halves are wrapped under the [account master key](#registered-accounts) and never leave the client in the clear; the public halves are published in the signed [device directory](#device-directory). + +Revocation is a global account reset — irreversible, non-recoverable, nuclear. It is published as a separate revocation certificate, hybrid-signed by the IK itself, to a well-known location so clients can check for it. + +### Device Keys + +Each device's keys are cross-signed into the [device directory](#device-directory) by the user's IK: + +1. **DSK** (Device Signing Key): hybrid **Ed25519 + ML-DSA-65**. +2. **DEK** (Device Encryption Key): hybrid **X25519 + ML-KEM-768**. + +Both are signed by the IK (hybrid signature). Device private keys are **generated inside and never leave hardware** — Secure Enclave (iOS), StrongBox/Keystore (Android), TPM (desktop) — and are non-exportable. 
Because they cannot be backed up, devices are treated as disposable: a lost device is removed and a new one re-bootstrapped from the master key. + +A device key can be revoked without affecting the user's identity or other devices. Revocation is done by signing a revocation statement with the IK and publishing it to a well-known location. The server then refuses to deliver new key wraps to that device, and remaining devices rotate any group keys the revoked device had access to. The revoked device's directory entry is **retained** — marked with `revoked_at` (RFC3339), never deleted — so the manifests it signed *before* revocation stay verifiable forever (provenance is append-only; see [Provenance](/design/cryptography/provenance/#what-an-attacker-with-all-current-keys-still-cannot-do)). + +### Owner Group Keys (OGKs) + +Assets' `owner_id` maps to a set of users; treat each owner as an MLS group. + +- **Type:** Symmetric AES-256 root key of an MLS group whose members are the owner's user set. +- **Purpose:** A recovery/escrow layer. The OGK does **not** wrap individual file keys. Instead, it escrows every album's [AMK versions](#album-master-keys-amks), so any current owner member can recover every album key — and therefore every asset — independent of album membership. This avoids double-wrapping each file while still guaranteeing the owner never loses access. +- **Epoch:** Bumps on any owner-set change. Every member's client commits to MLS, producing a new OGK; the server stores the welcome/commit messages. +- **Revocation:** Remove a user from the owner set → MLS Remove proposal → new epoch → the removed user's device can no longer derive future OGKs and is dropped from future AMK escrow. + +### Album Master Keys (AMKs) + +Each album is its own MLS group. Members = users with any permission on the album. + +- **Type:** Random 32-byte symmetric key, minted per epoch. 
AMKs are *not* derived from MLS epoch state (which is complicated at edge cases) — they are random keys distributed *over* MLS application messages and ledgered. + +Capsule separates **secrecy** (enforced by encryption) from **authorization** (enforced by signatures). One content key plus two signing capabilities, to minimize the surface that can leak: + +- **`AMK` — the content key.** Read access. MLS delivers it to *all* album members. Holding it lets you decrypt; not holding it means you cannot. +- **Write capability — a per-epoch write-tier signing keypair.** A **hybrid Ed25519 + ML-DSA-65** keypair (see the [signature scheme](/design/cryptography/primitives/#signature-scheme); both halves must verify). Distributed via MLS to writers only, used to sign [asset manifests](/design/cryptography/provenance/#asset-manifest). It rotates with the AMK epoch, so a removed writer cannot sign for future epochs. This is authorization, not secrecy — see [Write Authorization](#write-authorization). +- **Admin capability — an admin-tier signing keypair.** Also **hybrid Ed25519 + ML-DSA-65**. Distributed to admins only; used to sign MLS membership commits. + +Epoch bump triggers: member add/remove, permission change, scheduled rotation (e.g., every 30 days for long-lived albums). + +## Write Authorization + +A device signature on an [asset manifest](/design/cryptography/provenance/#asset-manifest) proves *which device* produced an asset — but not that the device was *authorized to write* to that album at that time. The server is **not trusted for authorization**: it could replay, reorder, or surface an asset signed by a reader-only device, a removed writer, or a device acting outside its write window. A bug could also produce such an asset. Both must be rejected robustly, with the verification logic kept small enough to be hard to get wrong. + +This is **the** contract every consumer of `capsule-core::crypto` depends on. 
It is invoked from import, sync, federation, peering, and backup-restore — anywhere an asset enters the local trusted set. + +- **Epoch-bound write proof.** Every asset manifest carries, in addition to the device DSK signature, a signature under the album's **per-epoch write-tier signing key**. Only writers at that epoch hold that key. The manifest's `amk_version` identifies the epoch. +- **Authorization authority is MLS history, not the server.** The client verifies the write-tier signature against the write-tier public key it learned for that epoch *from MLS* — the album's MLS commit chain (admin-signed) is the sole authority on who could write when. A server-asserted authorization is never sufficient. +- **Epoch ceiling is MLS-attested, not server-asserted.** The monotonic `amk_version` ceiling a client enforces is derived from the album's admin-signed [MLS commit chain](/design/cryptography/mls/#membership-operations) — the same authority that admits writers — never from the server's stored counter. A brand-new client learns the current epoch from the MLS group state delivered in its [Welcome](/design/cryptography/mls/#history-delivery-for-new-joiners), then rejects any manifest whose `amk_version` exceeds that MLS-attested epoch, so a server can neither fabricate a future epoch nor rewind to an old one and have a client honor it. The server's own no-key monotonicity check (invariant 18 in [Server-Side Validation Invariants](/design/threat-model/validation/#server-side-validation-invariants)) is a structural backstop, not the authority — it stops a *client* from skipping epochs, while MLS stops the *server* from fabricating them. +- **What this accepts vs. rejects.** An asset signed by a writer who was *later* removed is still acknowledged — it was valid when written, and nothing after removal un-seeds it. 
An asset signed at an epoch where the signer lacked write capability is **rejected**: an attacker (or a buggy/colluding server) cannot produce a valid write-tier signature for an epoch they were not a writer in. +- **Backdating buys nothing.** Ordering and authorization ride the provenance hash-chain and the `amk_version` epoch, never the self-asserted `timestamp` (see the timestamp note below). The "pre-sign backdated assets, then upload them after removal" attack therefore fails on the *epoch*, not the clock: a manifest must carry a `write_sig` for the epoch it claims, and a removed writer holds no write-tier key for any epoch past their removal. Anything they upload afterward either names an old epoch the chain has already advanced beyond (rejected by the monotonic `amk_version` + chain-head checks) or a new epoch they cannot sign for. +- **Single verification chokepoint.** All of this lives in one `verify_asset(manifest, ciphertext, mls_state)` function in `capsule-core::crypto`. It is the only path by which a client acknowledges an asset, and per [contract-driven development](/design/principles/) it ships with exhaustive negative test cases: reader-signed, removed-writer, wrong-epoch, forged certificate chain, replayed manifest. +- **Defensive failure handling.** A verification failure is *never* silently dropped and *never* silently accepted. The asset is quarantined and surfaced in the [provenance/audit trail](/design/cryptography/provenance/#provenance-of-library-modifications) so an operator can distinguish a bug from an attack after the fact. This bounds the blast radius of an implementation bug. +- **Transient vs. terminal outcomes.** `verify_asset` returns one of three outcomes, not two: **accept**, **terminal-reject** (reader-signed, removed-writer, wrong-epoch, forged chain, suite-downgrade → quarantined as above), and **pending**. 
*Pending* is the narrow, recoverable case where the manifest's `amk_version` is within the MLS-attested epoch range but the corresponding AMK content key has not yet arrived over the in-band [AlbumKeyDistribution](/design/cryptography/encryption/#asset-key-derivation) message (an epoch bump whose key broadcast is still in flight). A pending asset is **held and retried** as MLS state catches up — never quarantined as forged and never accepted unverified — until the key arrives or a configurable timeout elapses, after which it escalates to a surfaced quarantine. This distinction stops an in-flight epoch bump from flagging honest concurrent uploads as attacks; see [Failure Modes](/design/cryptography/failure-modes/#failure-mode-catalog). +- **Downgrade-resistant.** Both signatures cover `crypto_suite_id`, `protocol_version`, and `prior_provenance_hash`. A manifest cannot be silently re-signed under a weaker suite or back-dated onto a different chain position without breaking either signature; an attempt is rejected at the same `verify_asset` chokepoint. +- **Timestamp is audit-only.** A manifest's `timestamp` is the client's self-asserted capture/write time and is **never** load-bearing for authorization or ordering — those ride the epoch and the chain above. The server stamps its own trusted `received_at` in the [server-visible envelope](/design/filesystem/server/#postgresql-what-the-server-knows) as the authoritative wall-clock for any time-based policy (retention, rate limits); the client `timestamp` is preserved verbatim for display and audit. A server-side *sanity* bound on `timestamp` is a gross-drift guard for honest clients, **not** an authorization control — it surfaces a wildly-wrong clock rather than silently distorting the audit trail. Grammar owned by [Threat Model — Schema Rules](/design/threat-model/schema-rules/) and mirrored in [Server-Side Validation Invariants](/design/threat-model/validation/#server-side-validation-invariants). 
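
As a rough illustration of the three-outcome contract described in the list above, the following Rust sketch classifies a manifest as accept, pending, or terminal-reject. Every name in it (`Outcome`, `Manifest`, `MlsState`, `classify`) is a hypothetical stand-in rather than the actual `verify_asset` signature. The two boolean fields stand in for real hybrid signature verification, and the order of the checks is an assumption.

```rust
use std::collections::HashSet;

/// Hypothetical outcome type mirroring accept / pending / terminal-reject.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Accept,
    Pending,
    TerminalReject(&'static str),
}

/// Hypothetical, heavily simplified manifest view.
struct Manifest {
    amk_version: u64,
    /// Stand-in for the hybrid device (DSK) signature check.
    device_sig_valid: bool,
    /// Stand-in for the hybrid per-epoch write-tier signature check.
    write_sig_valid: bool,
}

/// Hypothetical view of what the client learned from MLS.
struct MlsState {
    /// Highest epoch attested by the admin-signed commit chain.
    attested_epoch: u64,
    /// Epochs whose AMK content key has already arrived in-band.
    held_amk_versions: HashSet<u64>,
}

fn classify(manifest: &Manifest, mls: &MlsState) -> Outcome {
    // An epoch beyond the MLS-attested ceiling is never honored.
    if manifest.amk_version > mls.attested_epoch {
        return Outcome::TerminalReject("amk_version exceeds MLS-attested epoch");
    }
    // Both signatures must verify; a failure is terminal, not retried.
    if !manifest.device_sig_valid || !manifest.write_sig_valid {
        return Outcome::TerminalReject("signature verification failed");
    }
    // Epoch is attested but its AMK is still in flight: hold and retry.
    if !mls.held_amk_versions.contains(&manifest.amk_version) {
        return Outcome::Pending;
    }
    Outcome::Accept
}
```

In this sketch a manifest at an attested epoch whose AMK has not yet arrived yields `Outcome::Pending`, while one claiming an epoch past the ceiling is rejected outright. Timeout escalation and quarantine bookkeeping are omitted.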
+ +## Device Directory + +Each user publishes a signed device directory. Other users (and federated peers) read it to learn which devices belong to whom and which public keys to trust. + +```rust +DeviceDirectory { + user_id, + directory_version: u64, // monotonic; +1 on every change (add, revoke, rotate) + updated_at: RFC3339, + devices: [ + { device_id, ed25519_pk, mldsa_pk, key_package_ref, added_at, revoked_at, signed_by_master }, + ... + ], + signature: Hybrid(master_ed25519, master_mldsa) // covers directory_version + updated_at +} +``` + +When Alice's device A1 adds Bob to an album, it fetches Bob's directory, verifies the hybrid signature against Bob's published master identity, and adds all Bob's listed devices. Alice's other devices (A2, A3) see the MLS commit and update local state — MLS handles idempotent application of commits, so this just works. + +Concurrent edits (A1 and A2 trying to add different people simultaneously) are handled by MLS's proposal/commit ordering — one wins, the other re-proposes on top. OpenMLS exposes this. + +**Monotonic version (anti-rollback).** The directory is the trust anchor every peer reads to learn which device keys are current, so a server that could silently serve a *stale* directory — one that still lists a revoked device, or omits a freshly-added one — would undo a revocation or hide a device. `directory_version` closes this: it is master-signed and **strictly monotonic**, and every reader (local client, [federated](/design/federation/) peer, [peering](/design/peering/) handshake) caches the highest version it has seen per user and **refuses a directory whose `directory_version` is below that high-water mark**, surfacing the regression rather than applying it. A reader with no cached version trusts-on-first-use and pins from there. This makes a revocation durable: once a peer has seen the post-revocation directory, the server cannot walk it back. 
The check is the directory-layer counterpart of the [stale-revival defense](/design/import/download-sync/#stale-revival-detection) for manifests, and is enforced as a [client-](/design/threat-model/validation/#client-side-validation-invariants) and [server-side](/design/threat-model/validation/#server-side-validation-invariants) invariant. + +The directory entry's `added_at` field is what blocks the damage scenario where a new device claims its key is older than the account itself: a server rejects an upload from a device whose `added_at` postdates the manifest's `timestamp`. See [Threat Model — Server-Side Validation Invariants](/design/threat-model/validation/#server-side-validation-invariants). + +## Identity-Based Key Derivation + +Since all assets are encrypted via keys ultimately recoverable from an account's master key, identity keys are encapsulated differently depending on the [account type](/design/authentication/#account-types). + +### Registered accounts + +Most users have their own unique master key. It is **generated client-side** at account creation from the OS CSPRNG. The server never holds the naked master key. Each device stores its own copy wrapped under that device's DEK; a new device obtains the master key either via [cross-device recovery](/design/backup-recovery/#recovery-mechanisms) or by unwrapping the [encrypted server-side backup](/design/backup-recovery/#master-key-escrow) with the recovery passphrase. + +### Delegated/Sponsored accounts + +A sponsored account is anchored under the sponsor's master key but holds its own encryption keys. The mechanism — and the only sound way to revoke — is: + +1. **Per-sponsoree KEK.** When a sponsor creates a sponsored account, the sponsor draws a fresh 32-byte **sponsoree KEK** from the CSPRNG (it is *not* derived from the master key — a deterministic derivation would be reproducible by the sponsor at any future point, defeating revocation). 
The KEK is wrapped under the sponsor's master key and stored in the sponsor's escrowed hierarchy. +2. **Sponsoree key material.** The sponsoree's own identity, device, and album keys are generated normally. Their private halves are wrapped under the sponsoree KEK rather than directly under the sponsor's master key, so the sponsor can re-wrap or destroy a single sponsoree's keys without touching its own or the other sponsorees'. +3. **Shared-asset access.** Sponsorees gain access to a sponsor's shared albums via ordinary MLS membership (the sponsoree's devices are added as MLS leaves in the sponsor's album groups). The KEK is *not* a content key — it only wraps the sponsoree's private keys. +4. **Revocation.** Revocation is a three-step operation, all signed by the sponsor's IK: + - **Rotate** the sponsoree KEK: draw a new KEK, re-wrap surviving sponsorees if any, drop the old KEK. + - **Publish** an IK-signed revocation certificate naming the revoked sponsoree's identity and the timestamp. + - **Remove** the revoked sponsoree's devices from every MLS group they were a member of (album groups, owner group) via the standard MLS Remove flow, bumping AMK epochs. + +The sponsor's *own* master key is untouched by any sponsoree revocation. The published revocation certificate is what clients and [federated](/design/federation/) peers check to refuse traffic from a revoked sponsoree. + +#### Damage bound under sponsor compromise + +A compromised sponsor holds every sponsoree's KEK, which wraps that sponsoree's private identity and device keys. It can therefore impersonate a sponsoree's device — forge its `device_sig`, append to or rewrite the sponsoree's [provenance history](/design/cryptography/provenance/#provenance-of-library-modifications), and write under the sponsoree's albums. 
Unlike a registered account — whose past records are protected even against a full current-key compromise because retired device keys are hardware-bound and non-recoverable — a sponsoree's history is **not** independent of its sponsor. This is inherent to delegation and is **bounded, not eliminated**: + +- **The trust is explicit and directional.** A sponsoree's integrity is, by construction, only ever as strong as its sponsor; its keys derive from the sponsor's hierarchy and it is never promised independence. A sponsoree is the right model only when that trust already holds (a family member, a managed device). +- **The blast radius stops at the sponsor's own sponsorees.** Per-sponsoree KEKs (step 1) mean a compromise cannot cross to a *different* sponsor's sponsorees, and registered users an album is shared with verify the sponsoree's published [device directory](#device-directory) and provenance like any peer — they are unaffected. +- **Revocation is clean.** The IK-signed revocation certificate (step 4) cuts off a sponsoree, and federated peers refuse a revoked sponsoree's traffic. +- **Escape hatch.** A user who needs provenance integrity that survives sponsor compromise must hold a **registered account** with hardware-bound device keys that are not derivable from any sponsor KEK. Sponsored accounts deliberately trade that independence for managed simplicity; the choice is the user's to make at account creation. + +### Non-registered accounts + +**Reading.** Since key management operates at the user level, userless [share links](/design/share-links/) are handled distinctly. We encapsulate the decryption keys around the secret stored in the link. The owner can optionally attach a password, in which case the [password-based KDF](/design/cryptography/primitives/#password-based-kdf) adds a second encapsulation layer on top of the link secret. + +**Writing.** Writing is **not supported** for non-registered accounts. 
Every uploaded asset must be encrypted under an album key and signed with a write-tier key; a non-registered user has neither a device encryption key (DEK) nor a place in any album's MLS group, so it cannot produce a valid [asset manifest](/design/cryptography/provenance/#asset-manifest). Supporting guest uploads would require an ephemeral link-scoped key hierarchy; this is a deliberate non-goal to keep the design simple. + +## Key Rotation and Revocation + +- **Master key rotation.** The master key can be replaced at will. Rotation re-wraps the key hierarchy (device-key wraps and the AMK escrow blob) under the new master key; the old master key is retained only long enough to complete the re-wrap, then discarded. Existing signed-in sessions hold device and derived keys directly and are **unaffected** — they keep working through the rotation. +- **Device revocation.** Handled via the [device key](#device-keys) revocation certificate plus an MLS `Remove` for that device's leaves (see [MLS — Membership Operations](/design/cryptography/mls/#membership-operations)). +- **Album-member revocation.** Handled by an MLS `Remove` and an AMK epoch bump (see [MLS — Membership Operations](/design/cryptography/mls/#membership-operations)). + +### Master-Key Compromise + +The hierarchy deliberately wraps *downward* — the master key wraps device identity keys and anchors the AMK escrow; it never wraps individual file keys. AMKs are not derived from it, and compartmentalization is per album (one AMK lineage per album). This is what makes recovery sound: it is **not** inverted to wrap AMKs under the hardware-bound device keys, because device keys are non-exportable and disposable — wrapping AMKs under them would forfeit the [recovery-first guarantee](/design/cryptography/failure-modes/) that holding the recovery passphrase restores every photo after every device is lost. + +A suspected master-key compromise is therefore recovered in two moves, not a hierarchy redesign: + +1. 
**Rotate the master key** (above), re-wrapping the hierarchy so the attacker's copy no longer unwraps current device or escrow material. +2. **Re-key the affected albums** — an MLS epoch bump per album mints fresh AMKs and write-tier keys (see [MLS Resilience — Group re-keying](/design/mls-resilience/#group-re-keying-ceremony)), so every future write uses keys the attacker never held. + +Ciphertext the attacker already exfiltrated under the old AMKs stays readable to them — inherent to any E2EE system once keys leak — but the blast radius is bounded to the albums whose AMKs were exposed, and all forward writes are clean. + +## Validation + +- **Derivation determinism** — unit tests assert that HKDF over the same `(AMK, salt, info)` produces byte-identical output, across platforms (no endianness drift, no truncation differences). Cross-checked against RFC 5869 test vectors. +- **Hardware-bound storage round-trip** — per-platform smoke harness: generate a DSK inside the Secure Enclave / StrongBox / TPM, sign a fixed payload, verify the signature against the published public key. Non-exportability is asserted by attempting to read the private bytes and confirming failure. +- **Rotation ceremony** — smoke test the full master-key rotation against a real wrapped escrow blob: rotate, re-unwrap with the new passphrase, confirm every device wrap and AMK escrow entry is recoverable. +- **`verify_asset` negative cases** — exhaustive unit-test surface owned by [Provenance](/design/cryptography/provenance/). Covers reader-signed, removed-writer, wrong-epoch, forged certificate chain, replayed manifest, suite-downgrade. Reuses the AMK + write-tier key fixtures defined here. +- **`verify_asset` pending outcome (unit).** Present a manifest whose `amk_version` is within the MLS-attested epoch range but whose AMK key is not yet locally held; assert the outcome is *pending* (held + retried), not a quarantine. Deliver the key; re-run; assert acceptance. 
Push the `amk_version` past the MLS-attested epoch; assert terminal-reject. +- **Directory monotonicity (unit).** Cache a `DeviceDirectory` at `directory_version = N`; present one at `N-1` (e.g. still listing a revoked device); assert it is refused and surfaced, not applied. Present `N+1`; assert acceptance and high-water-mark advance. +- **Write-tier signature is hybrid (unit).** Verify a `write_sig` with only the Ed25519 half valid and the ML-DSA half corrupted; assert rejection (both halves required), mirroring the `device_sig` hybrid check. diff --git a/capsule-docs/src/content/docs/design/cryptography/mls.md b/capsule-docs/src/content/docs/design/cryptography/mls.md new file mode 100644 index 0000000..567d1ef --- /dev/null +++ b/capsule-docs/src/content/docs/design/cryptography/mls.md @@ -0,0 +1,89 @@ +--- +title: MLS Group Membership +description: How Capsule binds MLS (RFC 9420) to its identity layer and uses it for album membership +--- + +Capsule's group layer is the [MLS ciphersuite](/design/cryptography/primitives/#mls-ciphersuite) from the inventory. It is implemented in `capsule-core::crypto::mls` as a thin wrapper over OpenMLS — the wrapper is what binds MLS to Capsule's identity layer ([Keys](/design/cryptography/keys/)) and to the in-band AMK distribution. + +The ciphersuite's choice of [ChaCha20-Poly1305](/design/cryptography/primitives/#mls-control-aead) (rather than the [AES-GCM](/design/cryptography/primitives/#bulk-aead) used for user data) is acceptable because: + +- It only protects MLS's own control messages (kilobytes of membership and key data, not your photos). +- ChaCha20-Poly1305 is one of the two most-audited AEADs in existence. +- The alternative is a classical-only MLS ciphersuite plus a hand-rolled PQ retrofit — exactly the custom crypto we are trying to avoid. 
+ +One follow-on: MLS binds LeafNode signatures to Ed25519 in this suite, so the ML-DSA half of the [hybrid signature scheme](/design/cryptography/primitives/#signature-scheme) lives at the **identity layer** — identity certificates sign the Ed25519 MLS key with both Ed25519 and ML-DSA, and peers verify both before accepting a device into a group. This keeps MLS pure while preserving PQ authentication end-to-end. + +For the broader principle of preferring MLS over custom group crypto: it handles the 1:1 case, shifts the audit burden to the IETF and OpenMLS, and gives forward secrecy + post-compromise security ([below](#forward-secrecy--post-compromise-security)) as a property of the ratchet rather than something Capsule has to reinvent. + +## Membership Operations + +The four lifecycle ceremonies the wrapper exposes. Each is an idempotent entry-point: replaying the same proposal produces the same group state (MLS commits are ordered by the chain, and OpenMLS rejects duplicates at the protocol layer — see [Threat Model — Idempotency Invariants](/design/threat-model/validation/#idempotency-invariants)). + +### Add user Bob to album + +1. Fetch Bob's [device directory](/design/cryptography/keys/#device-directory) (list of his devices with KeyPackages published to the server). +2. MLS `Add` proposal + `Commit` adding all Bob's devices as leaves. +3. The `Welcome` message to Bob's devices carries current `AMK_v_current` as a Welcome extension. +4. If full history is desired (usually yes for shared albums), also include `AMK_v1..AMK_{current-1}` in the Welcome — Bob's devices can now decrypt everything. +5. If only post-join history, omit older AMKs — Bob sees only future photos. + +### Remove user Charlie + +1. MLS `Remove` proposal + `Commit` removing all Charlie's devices. +2. MLS advances to a new epoch; Charlie's devices can no longer read MLS traffic. +3. Committer generates fresh `AMK_v{current+1}` and broadcasts via MLS to remaining members. +4. 
All future photo uploads use `AMK_v{current+1}`. +5. Charlie retains `AMK_v1..current` locally, so he can still decrypt photos he *already had access to* — correct behavior (he already had those photos; nothing you do after removal un-seeds them). But new uploads are invisible to him. + +### Add new device for existing member + +1. Alice's existing device adds Alice's new device as a leaf in the MLS group. +2. Welcome carries all AMK versions Alice is entitled to. +3. New device is now equivalent to Alice's other devices. + +For first-device enrollment (a brand new account with no other device), see [Device Enrollment](/design/device-enrollment/). + +### Remove lost device + +1. Any of the user's remaining devices issues MLS `Remove` for the lost device. +2. Treat like a removal above — bump AMK version, since you must assume the lost device's keys are compromised. + +## History Delivery for New Joiners + +The one spot where the wrapper writes real custom code. Two patterns: + +**Full history (recommended for shared albums):** Welcome message carries an encrypted blob of `[AMK_v1, AMK_v2, ..., AMK_current]`. The new joiner decrypts all, can now read every photo. + +**Capped history (e.g., last 90 days):** Only include AMKs corresponding to epochs ≥ threshold. Older photos remain visible but not decryptable — show placeholders. + +Matrix supports both; most photo-sharing products default to full history. **Capsule fixes the policy per album**, not per add: `history_policy` is part of the album's MLS metadata, set at album creation (full history is the default for shared albums; capped history is the opt-in). Every `Add` into that album applies the album's declared policy, so a member's history visibility never depends on which device added them or in what order — eliminating the divergence where the same user ends up able to decrypt different ranges on different devices. 
Changing an album's `history_policy` is an [album upgrade ceremony](/design/versioning/#album-upgrade-ceremony), never an ad-hoc per-add decision. + +**Epoch ceiling on join.** The Welcome's commit chain is also the joiner's authority on the album's *current* epoch: the new member adopts the highest `amk_version` the admin-signed chain attests as its monotonic ceiling and rejects any later manifest claiming a higher epoch. This is what lets a brand-new client enforce `amk_version` monotonicity without trusting the server's counter (see [Write Authorization](/design/cryptography/keys/#write-authorization)). + +## Forward Secrecy & Post-Compromise Security + +The MLS-based scheme provides forward secrecy (FS) and post-compromise security (PCS). The specific implementation is MLS (RFC 9420) with the PQ ciphersuites from `draft-ietf-mls-pq-ciphersuites`. + +**Clarification:** True FS on data-at-rest is a contradiction (the ciphertext persists). What MLS gives you at each epoch bump is: a compromise of the current epoch's keys does not help an attacker read past epochs, and removed members cannot read future epochs. That is the practical security property you want. + +For data-in-transit between clients and server (uploads, key-bundle fetches), use TLS 1.3 with ephemeral ECDHE — that is where session-level FS lives. See [Transport Security](/design/cryptography/failure-modes/#transport-security). + +## Notes on Scaling + +MLS scales to thousands of leaves, so even a 50-user album (200+ devices) is fine. Every `Commit` touches the whole tree and each `Welcome` carries `log(N)` path secrets plus the AMK blob — a cost to watch for very large shared albums. + +## Resilience to Edge Cases + +MLS can encounter a state-divergence or lost-commit scenario that the basic protocol does not solve — handling those (group re-keying, repair after partition, reconciliation of two divergent commit chains) is owned by [MLS Resilience](/design/mls-resilience/). 
+ +## Validation + +- **Protocol round-trip** — unit tests run the four ceremonies (add user, add device, remove user, remove device) plus an AMK rotation against an in-process OpenMLS group, asserting that every member's view of the group state matches after each commit. +- **Welcome correctness** — unit test that a Welcome for a new joiner with `full_history = true` contains every prior AMK and decrypts every prior asset; with `capped_history = N`, contains only the last N epochs. +- **History-policy consistency (unit).** Add the same user via two different devices/orders against an album with a fixed `history_policy`; assert both Welcomes deliver the identical AMK range — the policy is read from album metadata, not chosen per add. +- **Epoch ceiling from chain (unit).** Construct a Welcome whose commit chain attests epoch N; assert the joiner adopts N as its monotonic `amk_version` ceiling and rejects a subsequently-presented manifest claiming epoch N+1 that the chain does not attest. +- **Idempotency** — replay the same commit twice; OpenMLS rejects the second; group state unchanged. +- **MLS + identity binding** — smoke test that the wrapper rejects a LeafNode whose Ed25519 key is not also covered by an ML-DSA signature at the identity layer (the hybrid binding from [primitives](/design/cryptography/primitives/#signature-scheme)). +- **Concurrent commits** — smoke test that two clients proposing in parallel converge after MLS's commit-ordering resolution; no group splits. + +The ceremony-level cross-module test (full enroll + add to album + upload as a real client) is the bounded E2E case listed in [Module Map](/design/module-map/#e2e-test-surface). 
diff --git a/capsule-docs/src/content/docs/design/cryptography/primitives.md b/capsule-docs/src/content/docs/design/cryptography/primitives.md new file mode 100644 index 0000000..2d088b5 --- /dev/null +++ b/capsule-docs/src/content/docs/design/cryptography/primitives.md @@ -0,0 +1,113 @@ +--- +title: Cryptographic Primitives +description: Single-source-of-truth inventory of every cryptographic primitive Capsule uses +--- + +This doc is **the single source of truth** for every cryptographic primitive Capsule uses. Other docs (and the rest of the cryptography sub-docs) reference these by anchor — they never restate the choice. Swapping a primitive is a single-row edit here, plus a new `crypto_suite_id` and the dedicated section below. + +The primitive identities themselves live in `capsule-core::crypto::primitives` as compile-time constants and tagged enums. Every wire format and on-disk record that depends on a primitive carries the [versioning identifiers](#versioning-identifiers) below, so two structures encrypted under different suite versions can coexist without a flag day. 
+ +## Primitives Inventory + +| Primitive | Choice | Used for | +| ------------------------------------------------------------------- | ------------------------------------------------------------------------------- | ------------------------------------------------------ | +| [Cryptographic hash](#cryptographic-hash) | SHA-256 | Content addressing, integrity verification | +| [Key derivation (KDF)](#key-derivation) | HKDF-SHA512 | Per-file and per-album key derivation | +| [Password-based KDF](#password-based-kdf) | Argon2id (device-tier-aware parameters) | Master-key escrow unwrap, backup unwrap | +| [Bulk AEAD](#bulk-aead) | AES-256-GCM with [STREAM](/design/cryptography/encryption/#stream-construction) | Asset and metadata ciphertext | +| [MLS control AEAD](#mls-control-aead) | ChaCha20-Poly1305 | Inherited from the [MLS ciphersuite](#mls-ciphersuite) | +| [Signature scheme](#signature-scheme) | Hybrid Ed25519 + ML-DSA-65 | Identity, device, asset manifest, write tier | +| [KEM](#kem) | X-Wing (X25519 + ML-KEM-768) | MLS HPKE | +| [MLS ciphersuite](#mls-ciphersuite) | `MLS_256_XWING_CHACHA20POLY1305_SHA256_Ed25519` (0x004D) | Group key management | +| [Randomness](#randomness) | OS CSPRNG (`getrandom`) | All keys, salts, nonces | +| [Transport](/design/cryptography/failure-modes/#transport-security) | TLS 1.3 with hybrid X25519+ML-KEM | Client-server, server-server | + +The per-primitive sections below carry the rationale; the table is the at-a-glance reference. + +## Versioning Identifiers + +A faulty, malicious, or version-mismatched client could damage data by writing under a primitive set the receiving side does not implement (see [Threat Model](/design/threat-model/)). 
Three identifiers — owned here, in [Versioning](/design/versioning/), and in [Metadata](/design/metadata/) — bind each on-disk and on-wire structure to a specific set of primitives or schema so mismatches **fail closed** rather than corrupt state: + +| Identifier | Type | Declared in | Carried in | +| ------------------ | ------------------- | ---------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `crypto_suite_id` | `u16` | this doc | every [AssetManifest](/design/cryptography/provenance/#asset-manifest), every [metadata blob](/design/cryptography/encryption/#metadata-blob-wire-format), the backup [MANIFEST.cbor](/design/backup-recovery/) | +| `protocol_version` | string `YYYY-MM-DD` | [Versioning](/design/versioning/) | every AssetManifest, every wire request (see [Threat Model — Protocol Negotiation](/design/threat-model/validation/#protocol-and-capability-negotiation)), the album's MLS pin | +| `sidecar_schema` | `u16` | [Metadata — Sidecar Schema](/design/metadata/#sidecar-schema-v1) | CBOR sidecar field 0 (readable before parsing the rest) | + +`crypto_suite_id = 0x0001` denotes exactly the [Primitives Inventory](#primitives-inventory) above. Retiring any primitive (a broken SHA-256, a deprecated AEAD) **does not edit the row** — it adds a new row and a new suite id. An old AssetManifest carrying `0x0001` keeps verifying against the original row forever; new writes use the new suite id. This is the single-doc edit the inventory promises, generalized to the bundle. + +The signatures on every manifest cover `crypto_suite_id` and `protocol_version`, so a downgrade-attempt (re-signing an existing manifest under a weaker suite) cannot be silently produced. 
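
The fail-closed behaviour of `crypto_suite_id` can be illustrated with a small dispatch table. This is a sketch only: the `Suite` struct, its field names, `SUITE_0001`, `UnknownSuite`, and `suite_for` are hypothetical and are not taken from `capsule-core::crypto::primitives`. The values for suite `0x0001` are copied from the Primitives Inventory.

```rust
/// Hypothetical sketch of fail-closed dispatch on `crypto_suite_id`.
/// The `Suite` type and its field names are illustrative assumptions.
#[derive(Debug, PartialEq)]
pub struct Suite {
    pub hash: &'static str,
    pub kdf: &'static str,
    pub bulk_aead: &'static str,
    pub signature: &'static str,
}

/// Suite 0x0001 mirrors the Primitives Inventory table.
pub static SUITE_0001: Suite = Suite {
    hash: "SHA-256",
    kdf: "HKDF-SHA512",
    bulk_aead: "AES-256-GCM",
    signature: "Ed25519+ML-DSA-65",
};

/// Error returned for any suite id this build does not implement.
#[derive(Debug, PartialEq)]
pub struct UnknownSuite(pub u16);

/// Resolve a suite id to its row. Unknown ids are an error, never a default:
/// retiring a primitive adds a new match arm, it does not edit an old one.
pub fn suite_for(id: u16) -> Result<&'static Suite, UnknownSuite> {
    match id {
        0x0001 => Ok(&SUITE_0001),
        other => Err(UnknownSuite(other)),
    }
}
```

A reader that meets an id it does not know gets `Err(UnknownSuite(..))` and stops before touching any state. A future suite would add a second `static` and a second match arm while leaving `0x0001` untouched.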
+ +### Backward Compatibility + +Old suite ids and protocol versions remain decryptable forever: every encryption-metadata structure is versioned in-band, with its parameters (e.g. Argon2id memory/iterations) saved inside the construction, so a future change never breaks a previous construction. Clients outside the server's supported `protocol_version` range are rejected at the [protocol handshake](/design/threat-model/validation/#protocol-and-capability-negotiation), before any state is written. + +## Per-Primitive Choices + +### Cryptographic Hash + +**SHA-256** (SHA-2) for all content hashing, addressing, and integrity verification — one hash algorithm everywhere: the most prevalent, audited, NIST-approved standard, hardware-accelerated on every target, and one fewer implementation to maintain. + +The same SHA-256 value is reused across layers rather than recomputed — the content-addressing hash (see [Asset Encryption](/design/cryptography/encryption/#authenticated-asset-encryption)) is the value the [signed manifest](/design/cryptography/provenance/#asset-manifest) commits to and the upload protocol declares and verifies. Rejected: SHA-3 (weaker hardware support); BLAKE3 (parallelism unneeded given concurrent uploads, keyed mode redundant against our already-authenticated encryption). + +### Key Derivation + +**HKDF-SHA512** for per-file and per-album key derivation. The wider hash keeps the stack's PQ posture: under Grover a 256-bit hash falls to ~128-bit security while SHA-512 retains ~256-bit, and KDFs are off the hot path so the cost is negligible. + +Every derivation includes a versioned `info` string (e.g. `"asset-file/v1"`) and a scope-unique salt (`album_id`, `file_id`), so a future KDF change lands alongside v1 derivations without a flag day. + +### Password-based KDF + +**Argon2id** with device-tier-aware parameters (canonical defaults below). 
It runs only at account recovery and device bootstrap — never on a hot path — so the cost is acceptable even on constrained hardware. Each tier's parameters are recorded inside the wrapped blob, so they can be raised as device telemetry accrues without a flag day. + +| Device tier | Memory | Iterations (`t`) | Parallelism (`p`) | When applies | +| ----------------------- | ------- | ---------------- | ----------------- | ---------------------------------------- | +| Low-RAM (≤ 2 GiB total) | 128 MiB | 3 | 1 | Entry-level Android, low-end embedded | +| Normal mobile / laptop | 256 MiB | 3 | 1 | Default for phones and laptops | +| Desktop (≥ 8 GiB) | 512 MiB | 4 | 1 | Wrapping new escrow blobs from a desktop | + +The salt is always a 32-byte CSPRNG draw. The tier chosen at *wrap* time is recorded in the blob; *unwrap* respects whatever tier was recorded, so a desktop-wrapped blob unwraps correctly on a phone (slowly) and vice versa. + +### Bulk AEAD + +**AES-256-GCM**. Combined with the [STREAM construction](/design/cryptography/encryption/#stream-construction) it covers asset ciphertext; standalone AES-256-GCM (fresh random nonce per blob) covers CBOR metadata blobs. + +- AES hardware acceleration (Intel AES-NI, ARMv8 AES extensions, Apple Silicon dedicated AES units) is universal on every platform Capsule targets, so AEAD is never the bottleneck. +- AES-GCM over ChaCha20-Poly1305 for stack consistency with the [SHA-2 family](#cryptographic-hash) and to keep one bulk-AEAD choice across the codebase. MLS retains ChaCha20-Poly1305 from its [ciphersuite spec](#mls-ciphersuite); that's a separate layer. +- Nonce misuse is the structural risk of GCM. Closed two ways: every file uses a freshly-derived per-file key (so the STREAM counter can safely start at zero), and standalone metadata blobs each draw a fresh CSPRNG nonce. + +### MLS Control AEAD + +**ChaCha20-Poly1305**, inherited from the [MLS ciphersuite](#mls-ciphersuite). 
This protects MLS's own membership and key messages, not user data; user data uses the [bulk AEAD](#bulk-aead) above. + +### Signature Scheme + +**Hybrid Ed25519 + ML-DSA-65** for all long-lived **identity** signatures: the user IK, device keys, asset manifests, and write-tier keys. Both halves must verify before a peer is accepted, so neither algorithm being broken alone compromises authentication. + +**Short-lived operational signatures are classical Ed25519 only** — server-to-server federation, [federation capability tokens](/design/federation/#federation-capabilities), and [access-token JWTs](/design/authentication/#access-token). These live minutes to hours and rotate cheaply, so PQ hybridization buys no meaningful margin (a harvest-now-decrypt-later adversary gains nothing from a long-expired signature) and is not worth the wire-size and verification cost. This carve-out is owned here; consumers link to it rather than restating the choice. + +MLS LeafNode signatures stay Ed25519-only (pinned by the ciphersuite); the ML-DSA half of a device's identity lives at the identity layer — see [MLS](/design/cryptography/mls/). + +### KEM + +**X-Wing (X25519 + ML-KEM-768)**. This is the KEM defined by the [MLS ciphersuite](#mls-ciphersuite) we adopt. + +### MLS Ciphersuite + +**`MLS_256_XWING_CHACHA20POLY1305_SHA256_Ed25519`** (OpenMLS ciphersuite 0x004D) — MLS (RFC 9420) with the PQ ciphersuites from `draft-ietf-mls-pq-ciphersuites`. See [MLS](/design/cryptography/mls/) for how the ciphersuite's choices (X-Wing KEM, ChaCha20-Poly1305 control AEAD, SHA-256 hash, Ed25519 leaf sigs) interact with the identity layer. + +### Randomness + +All keys, salts, and nonces are drawn from the operating system CSPRNG (`getrandom`). Capsule never seeds its own PRNG. + +Nonces are never hand-rolled. 
The [STREAM construction](/design/cryptography/encryption/#stream-construction) derives per-chunk nonces deterministically; standalone [bulk-AEAD](#bulk-aead) metadata blobs each receive a fresh random nonce. + +## Validation + +Per-primitive verification is straightforward unit-test work: + +- **Known-answer parity** against RFC test vectors and the well-known implementations (libsignal, OpenMLS, RustCrypto vectors). Every primitive ships with its vector set. +- **Suite-id round-trip** — encrypt/sign under suite `0x0001`, persist, re-read; the decoded `crypto_suite_id` must dispatch to exactly the row in the table. A test that asserts two suite ids cannot coexist except via a new row is the structural guard against accidental SSoT drift. +- **Downgrade-rejection** — attempt to verify a manifest whose declared `crypto_suite_id` differs from the value inside its signed envelope. Must reject. + +Cross-doc test linkage: this doc owns *what is correct*; [Provenance](/design/cryptography/provenance/) owns *what `verify_asset` does with it*; [Threat Model — Validation](/design/threat-model/validation/) owns *what a key-less server rejects up front*. diff --git a/capsule-docs/src/content/docs/design/cryptography/provenance.md b/capsule-docs/src/content/docs/design/cryptography/provenance.md new file mode 100644 index 0000000..017e435 --- /dev/null +++ b/capsule-docs/src/content/docs/design/cryptography/provenance.md @@ -0,0 +1,133 @@ +--- +title: Signed Manifests and Provenance +description: Capsule's signed asset manifest, append-only provenance chains, and derivative provenance +--- + +Every asset Capsule stores has a verifiable trace of *who* produced it. The trace is anchored in a small **signed manifest** — bound to the ciphertext, cheap to verify, streaming-compatible — and extended by an **append-only, hash-chained provenance log per asset**. 
Together these are what let an operator distinguish a legitimate delete from a malicious or bug-induced one after the fact, and what defeats the [stale-revival attack](/design/threat-model/scenarios/#damage-scenario--invariant-map). + +The schemas live here and are the **single source of truth** for `AssetManifest`, `ProvenanceRecord`, and `DerivativeManifest`. They are implemented in `capsule-core::crypto::provenance`; verification flows through the single `verify_asset` chokepoint in `capsule-core::crypto` ([Write Authorization](/design/cryptography/keys/#write-authorization)). + +## Asset Manifest + +A small signed manifest rather than a Merkle tree: the [STREAM construction](/design/cryptography/encryption/#stream-construction) already detects per-chunk tampering, truncation, and reordering, so a Merkle tree's only marginal gain (early-abort on a forged *whole-file* signature) is not worth the extra format complexity. + +Each asset is stored as: + +```rust +AssetManifest { + version: "asset-manifest/v1", + crypto_suite_id: u16, // see Cryptography — Primitives + protocol_version: String, // YYYY-MM-DD; matches album pin + file_id: UUID, + album_id: UUID, + amk_version: u32, // identifies the AMK epoch + write-tier key + ciphertext_hash: bytes, // content-address digest; algorithm fixed by crypto_suite_id; reused by upload protocol + plaintext_size: u64, + chunk_size: u32, // plaintext bytes per chunk (65,520) + nonce_prefix: [u8; 7], // STREAM nonce prefix, random per file + created_by_user: UUID, + created_by_device: UUID, + client_version: String, + timestamp: RFC3339, // self-asserted capture/write time; audit-only (see Keys — Write Authorization) + action: enum, // create | replace | delete | metadata-update + // | derivative-add | derivative-replace | trash-restore + prior_provenance_hash: Option<[u8;32]>, // SHA-256 over the previous manifest in this asset's + // provenance chain. 
null only for `action = create`; a non-create manifest + // with a null prior hash is rejected at verify_asset and by the + // server's no-key chain-advance check (not a soft warning). + retention_until: Option, // server-visible; set only for `action = delete` (see Organization — Retention Window) + + device_sig: Hybrid(Ed25519, ML-DSA-65), // over all fields above + write_sig: Hybrid(Ed25519, ML-DSA-65), // under epoch write-tier key, over all fields above; both halves required +} + +AssetBlob { + manifest: AssetManifest, + chunks: [AES-256-GCM-STREAM encrypted chunks], +} +``` + +The manifest carries **two signatures**, and a client acknowledges the asset only if **both** verify: + +1. `device_sig` — hybrid Ed25519 + ML-DSA-65 by the uploading device's [DSK](/design/cryptography/keys/#device-keys). Provides provenance; the device certificate chains to the user IK via the [device directory](/design/cryptography/keys/#device-directory). +2. `write_sig` — a **hybrid Ed25519 + ML-DSA-65** signature under the epoch's [write-tier key](/design/cryptography/keys/#album-master-keys-amks); both halves must verify. Proves the signer held write authorization at `amk_version` (see [Write Authorization](/design/cryptography/keys/#write-authorization)). The signature being hybrid is what keeps its coverage of `crypto_suite_id` non-downgradable even if one algorithm is later broken. + +The signed manifest is stored as the encrypted asset's header and is itself part of the [provenance record](#provenance-of-library-modifications). The same signing approach applies to other surfaces — [metadata blobs](/design/cryptography/encryption/#metadata-encryption) and the [device directory](/design/cryptography/keys/#device-directory) are each hybrid, device-signed, and versioned. + +**Streaming is preserved.** STREAM authentication tags verify every chunk *during* the stream. The manifest signature is a one-time provenance check. 
`ciphertext_hash` is computed incrementally as bytes arrive and confirmed at stream end — no separate pass, no buffering the whole file. + +The closed action enum is owned by [Authorization — The Closed Action Set](/design/authorization/#the-closed-action-set). + +## Provenance of Library Modifications + +Every modification of data or metadata produces a **provenance record** — timestamp, device, client version, action — anchored by the [signed manifest](#asset-manifest) above. The records form an **append-only, hash-chained log per asset**, which is the only structure that lets a key-holding attacker be detected after the fact. + +### Chained, Append-Only Structure + +```rust +ProvenanceRecord { + asset_id: UUID, + manifest: AssetManifest, // see Asset Manifest above + prior_provenance_hash: Option<[u8;32]>, // SHA-256 over the previous record; + // null only for `action = create` + // The manifest's own `prior_provenance_hash` mirrors this value, so signature + // coverage of the manifest is signature coverage of the chain link itself. +} +``` + +Each non-create record references its predecessor by hash; a rewrite of any past record breaks the chain at that point and is detectable by any client walking forward from `create`. + +### What an Attacker With All Current Keys Still Cannot Do + +Even if every current key (every device's DSK, every album's current AMK and write-tier key) is compromised: + +- **Forward writes are possible** — the attacker can append new records, just like any holder of those keys. +- **Past records cannot be rewritten** — the prior record was signed by a (possibly retired) device whose hybrid signature is still verifiable against the public half published in the [device directory](/design/cryptography/keys/#device-directory). Replacing the past record would require forging that earlier device's signature, which the hybrid construction prevents. 
+- **Past records cannot be silently removed** — every later record carries the prior hash, so a removal breaks the chain. + +This bounds the blast radius of a credential compromise: history is read-only. + +### Physical Storage + +- **Client.** An append-only CBOR file at `media/{YYYY}/{YYYY-MM}/{uuid}.provenance.cbor`, alongside the asset and its sidecar — a sequence of `ProvenanceRecord` entries; on hard-delete the log persists as a tombstone-with-history. This file is a **non-authoritative local cache**. A faulty or malicious client can corrupt or truncate *its own* copy, but cannot rewrite history: the chain is self-authenticating — each record is signed and carries the prior record's hash, so dropping or altering any record breaks the forward walk from `create` — and the authoritative copy is the server's append-only blob sequence plus the replicas every other album member holds, any of which re-detects the tamper on next sync as a chain-head mismatch. A client that finds its local cache inconsistent with the authoritative chain rebuilds it from the server. +- **Server.** A content-addressed encrypted blob, distinct from the [encrypted metadata blob](/design/cryptography/encryption/#metadata-encryption), so a metadata edit (which mints a new metadata blob) never rewrites history. The server's no-key envelope of every provenance write includes `prior_provenance_hash`, so the server can enforce monotonic chain advance without holding any key — see [Threat Model — Server-Side Validation Invariants](/design/threat-model/validation/#server-side-validation-invariants). + +The server is **append-only** for provenance: there is no API path that overwrites or deletes an existing entry. An attempt is rejected at the [server's structural validation layer](/design/threat-model/validation/). + +## Derivative Provenance + +Thumbnails, previews, and embeddings are generated client-side and uploaded as ordinary encrypted blobs. 
Without provenance they would be silently overwritable by any client with write capability — a buggy v4 client could quietly replace a v3 client's good thumbnail with a corrupt one. To prevent this, every derivative carries a small signed manifest of its own: + +```rust +DerivativeManifest { + version: "derivative-manifest/v1", + crypto_suite_id: u16, + source_asset_id: UUID, + role: enum, // thumbnail | preview | embedding (LQIP lives in the signed sidecar, not here) + format: String, // e.g. "image/avif", "embedding/mobileclip-b" + ciphertext_hash: bytes, + generated_by_device: UUID, + generated_by_client: String, + model_id: Option, // for embeddings; see AI/ML Integrations + model_version: Option, // for embeddings + generated_at: RFC3339, + prior_provenance_hash: Option<[u8;32]>, // chained per (asset_id, role) + device_sig: Hybrid(Ed25519, ML-DSA-65), + write_sig: Hybrid(Ed25519, ML-DSA-65), // under the album's epoch write-tier key; both halves required +} +``` + +A derivative overwrite is therefore a `derivative-replace` lifecycle action that appends to the provenance chain like any other write. Quarantine semantics from [Write Authorization](/design/cryptography/keys/#write-authorization) apply: a derivative whose manifest fails verification is surfaced, never silently applied — a buggy client cannot poison a derivative under the receiving side's nose. + +## Validation + +This is the cryptography sub-doc most directly responsible for the `verify_asset` chokepoint that every consumer module depends on. Its unit-test surface must be exhaustive — every negative case is a real damage scenario from [Threat Model — § Damage Scenarios](/design/threat-model/scenarios/#damage-scenario--invariant-map). + +- **`verify_asset` positive cases** — a manifest signed by the correct device + correct epoch write-tier key, with a matching `prior_provenance_hash`, verifies. Tested with fixed test vectors so a refactor cannot silently shift the contract. 
+- **`verify_asset` negative cases (exhaustive)** — reader-signed (no write-tier sig), removed-writer (write-tier sig from a now-retired epoch), wrong-epoch (sig from the wrong AMK version), forged certificate chain (device not in the user's directory or `added_at` postdates the manifest), replayed manifest (`prior_provenance_hash` does not match local chain head), suite-downgrade (re-signed under a weaker `crypto_suite_id`). Each case is its own unit test with a hand-crafted manifest fixture. +- **Chain advance enforcement** — unit test that appending a record whose `prior_provenance_hash` does not match the current head is rejected. Both client-side (`verify_asset`) and server-side (no-key envelope check) reject the same way. +- **Append-only enforcement (cryptographic, not just storage).** The guarantee is the signature chain, not the file mode. A unit test drops or rewrites a record in a serialized chain and asserts the forward walk from `create` detects the break (a non-matching prior hash, or a signature that no longer verifies). A companion test confirms the server rejects any overwrite or delete of an existing provenance entry at its structural validation layer (invariant 17), and that a client whose local `.provenance.cbor` has been tampered re-derives the authoritative chain from the server rather than trusting the local bytes. +- **Derivative poisoning rejection** — unit test that a `derivative-replace` whose `prior_provenance_hash` does not chain to the current head for `(asset_id, role)` is rejected; the existing derivative is preserved. +- **What-an-attacker-with-all-current-keys-still-cannot-do** — scenario test that holds every *current* key, attempts to rewrite a past record, and confirms the chain walker detects the break. + +The cross-module case (a manifest moving through upload → server envelope validation → finalization → client `verify_asset` on download) is bounded E2E surface, listed in [Module Map](/design/module-map/#e2e-test-surface). 
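
The forward walk from `create` that the chain-advance and append-only tests depend on can be sketched as below. Several things here are assumptions for illustration: `Record` is a simplified stand-in for `ProvenanceRecord`, signature checks are left out entirely, and `toy_digest` is a non-cryptographic placeholder so the sketch runs without external crates. Real code would hash with SHA-256 and verify both signatures on every record.

```rust
/// Hypothetical, simplified stand-in for `ProvenanceRecord` (no signatures).
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub is_create: bool,
    pub payload: Vec<u8>,
    pub prior_hash: Option<[u8; 32]>,
}

#[derive(Debug, PartialEq)]
pub enum ChainError {
    Empty,
    /// First record is not a `create`, or a `create` carries a prior hash.
    BadGenesis,
    /// The record at this index does not link to its predecessor.
    BrokenLink(usize),
}

/// Placeholder digest so the sketch needs no crates.
/// NOT cryptographic: a real implementation would use SHA-256 here.
pub fn toy_digest(bytes: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (i, b) in bytes.iter().enumerate() {
        let slot = i % 32;
        out[slot] = out[slot]
            .wrapping_mul(31)
            .wrapping_add(*b)
            .wrapping_add(i as u8);
    }
    out
}

/// Bytes that the next record's `prior_hash` commits to.
fn record_bytes(r: &Record) -> Vec<u8> {
    let mut buf = vec![r.is_create as u8];
    match &r.prior_hash {
        Some(h) => {
            buf.push(1);
            buf.extend_from_slice(h);
        }
        None => buf.push(0),
    }
    buf.extend_from_slice(&r.payload);
    buf
}

/// Walk forward from `create`, returning the chain head or the first break.
pub fn walk_chain(
    records: &[Record],
    digest: fn(&[u8]) -> [u8; 32],
) -> Result<[u8; 32], ChainError> {
    let first = records.first().ok_or(ChainError::Empty)?;
    if !first.is_create || first.prior_hash.is_some() {
        return Err(ChainError::BadGenesis);
    }
    let mut head = digest(&record_bytes(first));
    for (i, r) in records.iter().enumerate().skip(1) {
        // A non-create record must point at the current head.
        if r.is_create || r.prior_hash != Some(head) {
            return Err(ChainError::BrokenLink(i));
        }
        head = digest(&record_bytes(r));
    }
    Ok(head)
}

/// Append a record that links to the current last record (or starts the chain).
pub fn append(records: &mut Vec<Record>, payload: &[u8], digest: fn(&[u8]) -> [u8; 32]) {
    let is_create = records.is_empty();
    let prior_hash = records.last().map(|r| digest(&record_bytes(r)));
    records.push(Record {
        is_create,
        payload: payload.to_vec(),
        prior_hash,
    });
}
```

In this sketch, rewriting a past record changes the digest the next record committed to, and dropping a record leaves its successor pointing at a head that no longer exists. Either way the walk reports the index where the chain stops linking.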
diff --git a/capsule-docs/src/content/docs/design/device-enrollment.md b/capsule-docs/src/content/docs/design/device-enrollment.md new file mode 100644 index 0000000..0614e19 --- /dev/null +++ b/capsule-docs/src/content/docs/design/device-enrollment.md @@ -0,0 +1,92 @@ +--- +title: Device Enrollment +description: First-device bootstrap and cross-device add ceremonies for Capsule accounts +--- + +A Capsule account has one or more devices, each holding a hardware-bound DSK + DEK cross-signed into the user's [device directory](/design/cryptography/keys/#device-directory). This doc owns the two enrollment ceremonies a device can go through to *get into* that directory: + +- **[First-device enrollment](#first-device-enrollment).** Brand-new account, no other device exists. The first device generates the master key and the initial device keys. +- **[Cross-device add](#cross-device-add).** An existing signed-in device adds a new device to the directory (the new device gets the master key handed to it over a verified channel). + +These are distinct from **[cross-device recovery](/design/backup-recovery/#default-mechanisms)** (which is also a way to bring up a new device, but in the recovery context — the user has lost their other devices and is using the recovery passphrase + master-key escrow to restore). + +Implementation will live in `capsule-core::crypto::keys` (key generation and wrapping) and `capsule-api-auth::devices` (the device directory and the enrollment authentication surface). The ceremony glue lives in per-platform native client code (QR scan, biometric prompt). + +## First-Device Enrollment + +When a user creates a brand-new Capsule account, the very first device runs the full setup ceremony: + +1. **Generate the master key.** A 32-byte CSPRNG draw becomes the account master key. 
It is wrapped under a recovery passphrase via [Argon2id](/design/cryptography/primitives/#password-based-kdf); the wrapped blob is uploaded to the server-side [master-key escrow](/design/backup-recovery/#master-key-escrow). The plaintext recovery passphrase is shown to the user and never persisted. +2. **Generate the User IK.** A hybrid Ed25519 + ML-DSA-65 keypair (the [User Identity Keys](/design/cryptography/keys/#user-identity-keys-user-iks)). The private halves are wrapped under the master key; the public halves go into the (initial, single-member) device directory. +3. **Generate this device's keys.** A DSK (hybrid Ed25519 + ML-DSA-65) and a DEK (hybrid X25519 + ML-KEM-768), both generated inside the hardware secure element and non-exportable. Both are signed by the IK and added to the device directory. +4. **Publish the device directory.** The IK-signed directory is uploaded to the server. +5. **Create the default album.** Establish the owner's [default album](/design/organization/#the-default-album) — a new MLS group at the `album_id` derived from the master key (see [Keys — Key Chain](/design/cryptography/keys/#key-chain)), with this device as the sole admin/writer — and set the owner's `default_album_id` pointer ([Filesystem — Server](/design/filesystem/server/#ownership-partitioning-and-quota)) to it. This guarantees a writable import destination from the first moment the account exists. +6. **Show the recovery passphrase.** This is the only path back into the account if every device is lost, so saving it is **gated, not advisory**: the user must type back a short slice of the passphrase before setup completes, forcing them to actually record it rather than dismiss the screen. The plaintext passphrase is never persisted. + +Two design points: + +- **Account-creation auth.** The very first request authenticates via the [authentication](/design/authentication/) flow for new registration (OIDC, or the server's own credential ceremony). 
This establishes the *account* and its server-side metadata only — it confers no data access. All data authority is cryptographic: the master key generated here, and device keys validated device-to-device against the [device directory](/design/cryptography/keys/#device-directory). The server authenticates *who owns the account*; cryptography authenticates *what can read the data*. +- **Multi-device-from-start.** Enrolling a second device right after signup uses the ordinary [cross-device add](#cross-device-add) ceremony — there is no separate "freshly-created" path. One device is signed in and healthy, which is exactly cross-device add's precondition. + +## Cross-Device Add + +When an existing signed-in device adds a new device to the same account: + +1. **Initiate from the existing device.** The user opens "Add another device" on device A (already signed in). Initiating an add requires a **fresh local device authorization** on A (biometric or device passcode) — a valid session token alone is **not** sufficient, so an attacker holding only a stolen session token cannot enroll a rogue device without physical control of A. Device A then generates a one-time **enrollment code** — **single-use, ≥64 bits of entropy, valid 10 minutes**, scoped to this one ceremony, collision-checked at generation, and deleted by the server on redemption or expiry — and displays it as a QR code (with a text fallback). +2. **Scan or enter on the new device.** Device B scans the QR (or types the code). +3. **Establish a short-lived channel.** Devices A and B perform an ephemeral X25519 ECDH to derive a one-time channel key, carried over a **server relay by default, or a direct LAN connection when both devices are on the same network** (discovered via mDNS; LAN preferred — fewer moving parts, no relay trust). The channel is mutually authenticated by the enrollment code plus the ephemeral DH. +4. 
**Verify the channel.** A short safety code derived from the channel transcript is displayed on both devices, **alongside each device's identity (model + a short key fingerprint)**; the user confirms both that the codes match and that the device being added is the one physically in front of them. Binding the code to device identity defends against a MITM on the relay channel and against a relay that swaps in a different device. +5. **Transfer the master key.** Device A wraps the account master key under the channel-derived key and sends to device B. Device B unwraps, generates its own DSK + DEK in hardware, and presents them to device A for signing. +6. **Cross-sign and publish.** Device A signs B's device keys with the user's IK, updates the device directory, and uploads it. The IK private halves are wrapped under the [master key](/design/cryptography/keys/#registered-accounts), which device A already holds while signed in — so **any fully-enrolled device can unwrap the IK and authorize an add**. There is no special "IK-holder" device or extra key class; holding the master key is the single requirement for identity signing. +7. **B joins MLS groups.** With its keys now in the directory, device B can be added as a leaf to each album's MLS group (via the standard [Add new device](/design/cryptography/mls/#add-new-device-for-existing-member) flow). + +Two presentation choices: + +- **Enrollment code.** Presented as a QR code with a **friendly numeric** text fallback — the channel is independently authenticated by the safety code, so the code itself only needs to be conveniently transcribable, not dense. Entropy (≥64-bit), single-use, and 10-minute expiry are fixed in step 1. +- **Safety-code check.** Step 4 binds the code to each device's identity (model + key fingerprint). 
To make the human comparison failure-resistant, both devices show the code in the same chunked, fixed-length format, and confirming requires an explicit match-and-identity acknowledgement on **both** devices — a mismatch is the abort path, not a missed default. + +## Relationship to Cross-Device Recovery + +Cross-device recovery (owned by [Backup and Recovery](/design/backup-recovery/#default-mechanisms)) is operationally similar — both involve handing the master key to a new device over a verified channel — but the trigger is different: + +- **Cross-device add** is *additive*: an existing device is healthy and is bringing up a new sibling. +- **Cross-device recovery** is *substitutive*: every device has been lost; one is being bootstrapped from the recovery passphrase, possibly assisted by a surviving device. + +The two ceremonies may share underlying code (channel-establishment, key-transfer wrapping) but the entry surfaces and the user expectations are distinct. + +## Contract Skeleton + +```rust +// in capsule-core::crypto::keys +fn first_device_setup(passphrase: &str) -> Result; + +// in capsule-api-auth::devices +fn issue_enrollment_code() -> EnrollmentCode; // server stores a short-lived record +fn redeem_enrollment_code(code: EnrollmentCode) -> Result; + +// on the existing device +fn complete_cross_device_add(channel: ChannelHandle, b_keys: DeviceKeyBundle) -> Result<(), EnrollmentError>; +``` + +The channel and enrollment-code wire formats are an implementation detail; channel dispatch is LAN-direct when both devices share a network and server-relay otherwise (step 3). + +## Failure Modes + +Each enrollment ceremony must handle: + +- **User abandons mid-flow.** The enrollment code expires; no state is persisted; the user starts over. +- **Channel hijack attempt.** The safety-code verification catches an active MITM on the channel; if codes don't match, the user is told to abort. 
+- **Stolen session token.** A session token alone cannot start an add — the fresh local device authorization (step 1) gates initiation on physical control of an already-trusted device, so a remotely-exfiltrated token cannot enroll a rogue device. +- **Hardware-key generation failure.** The new device's secure element refuses to generate keys (rare but happens); enrollment fails with an actionable error. +- **Server unavailable.** The directory upload fails; the new device is locally functional but invisible to other devices until the upload succeeds. The client retries with backoff and surfaces "finishing setup — will complete when the server is reachable"; the device's keys are already generated and valid, so the delay loses nothing. +- **Default-album creation fails.** Account creation still completes — the master key, identity, and device are fully valid. Because the [default album](/design/organization/#the-default-album)'s ID is [derivable from the master key](/design/cryptography/keys/#key-chain), any device recreates it lazily before the first import, so a transient failure here never blocks setup or loses data. + +## Validation + +- **First-device setup round-trip (smoke).** Run the full ceremony; assert master key wrapped + escrowed; assert device directory has exactly one entry; assert recovery passphrase unwraps the escrow. +- **Cross-device add safety-code check (unit).** Inject mismatched safety codes; assert the ceremony aborts. +- **MITM defense (smoke).** Mock a relay that swaps the channel keys; assert safety codes diverge; assert abort. +- **Enrollment-code expiry (unit).** Generate code; let it expire; assert redemption fails with the right structural error. +- **Enrollment-code single-use (unit).** Redeem a code; attempt to redeem it again; assert rejection; assert the server deletes it on both redemption and expiry. 
+- **Local-auth gate (unit).** Attempt to initiate a cross-device add with only a session token and no fresh local device authorization; assert refusal. +- **Hardware-key failure (smoke per platform).** Mock hardware-element refusal; assert the ceremony surfaces a clear error rather than partially completing. diff --git a/capsule-docs/src/content/docs/design/federation.md b/capsule-docs/src/content/docs/design/federation.md index 5bf5c61..a9ad34c 100644 --- a/capsule-docs/src/content/docs/design/federation.md +++ b/capsule-docs/src/content/docs/design/federation.md @@ -1,25 +1,21 @@ --- title: Federation -description: How Capsule implements server-to-server federation for sharing and collaboration +description: How Capsule servers share albums across users on different home servers --- -Federation lets an album owned on one Capsule server be shared with users whose -accounts live on another. This document covers **server-to-server** federation -only; direct device-to-device sync for a single user is [Peering](/design/peering/). +Federation lets an album owned on one Capsule server be shared with users whose accounts live on another. This document covers **server-to-server** federation only; direct device-to-device sync for a single user is [Peering](/design/peering/). + +Federation reuses the existing read primitives — `/sync`, `/blob/{hash}`, the standard manifest envelope. The only new things federation introduces are a **capability token** (the contract that gates which peers may fetch what) and a **per-peer compartmentalization layer**. Implemented in `capsule-api-sync::federation`: capability issuance, verification, the pull path, and per-peer rate budgeting. ## Threat Model -Federation is designed under one assumption: **a remote server is hostile until -proven otherwise.** It may be running ancient, buggy code; it may be compromised; -it may be actively malicious; peers may collude. The only thing Capsule trusts is -cryptography it verifies itself. 
Every other claim a peer makes is unverified -input until a signature or a content hash says otherwise. +Federation is designed under one assumption: **a remote server is hostile until proven otherwise.** It may be running ancient, buggy code; it may be compromised; it may be actively malicious; peers may collude. The only thing Capsule trusts is cryptography it verifies itself. Every other claim a peer makes is unverified input until a signature or a content hash says otherwise. -This is in line with the security posture established in the [cryptography](/design/cryptography/) design toward Capsule's *own* server ("trust the server for storage, never for authorization"). Federation extends it to servers Capsule does not even operate. +This extends the security posture established in the [cryptography](/design/cryptography/) design toward Capsule's *own* server ("trust the server for storage, never for authorization") to servers Capsule does not even operate. ## Federation Reuses Existing Primitives -Federation deliberately introduces **no new data protocol**. A remote server fetches exactly the same content-addressed primitives a client uses (see [Import and Synchronization](/design/import-synchronization/#discovering-what-changed)): +Federation deliberately introduces **no new data protocol**. A remote server fetches exactly the same content-addressed primitives a client uses (see [Import — Download & Sync](/design/import/download-sync/#discovering-what-changed)): | Operation | Purpose | | -------------------------- | --------------------------------------------------------------------------------------------- | @@ -29,7 +25,7 @@ Federation deliberately introduces **no new data protocol**. A remote server fet Everything else — notifications, presence — rides a separate, lower-trust channel and never feeds the validation pipeline directly. 
-Because blobs are content-addressed by their [ciphertext content hash](/design/cryptography/#primitives-inventory), a peer *physically cannot* lie about what a hash contains: Capsule recomputes the hash on arrival and rejects a mismatch. This collapses most of the trust problem — Capsule never trusts a peer's *claim* about an object, it fetches and verifies. +Because blobs are content-addressed by their [ciphertext content hash](/design/cryptography/primitives/), a peer *physically cannot* lie about what a hash contains: Capsule recomputes the hash on arrival and rejects a mismatch. This collapses most of the trust problem — Capsule never trusts a peer's *claim* about an object, it fetches and verifies. ActivityPub and Nextcloud Federated Sharing were considered and rejected as the wire protocol: Capsule's E2EE model (ciphertext-only blobs, MLS-gated album membership) does not map onto either, and adopting one would mean tunnelling Capsule's real primitives through a foreign envelope for no gain. @@ -43,11 +39,13 @@ For v1, **each album has exactly one home server** — the server that issued th This rule keeps the v1 federation API surface small (no replication, no cross-server commit ordering) and forecloses several damage classes — split-brain ownership, two-server delete races, conflicting AMK-epoch advances — that would otherwise need explicit cross-server consensus to prevent. -Cross-server replication of a *single* album (where two users on different home servers each want to write the same album) is **out of scope for v1** and deferred to v2. v1 supports cross-server sharing in the read direction (Alice on `home.tld` shares an album to Bob on `other.tld`; Bob reads via federation; Bob's writes either remain on `home.tld` via a registered or sponsored account, or are out of scope). The v2 design space is flagged in [Threat Model — Open Questions](/design/threat-model/#open-questions). 
+Cross-server replication of a *single* album (where two users on different home servers each want to write the same album) is **out of scope for v1** and deferred to v2. v1 supports cross-server sharing in the read direction (Alice on `home.tld` shares an album to Bob on `other.tld`; Bob reads via federation; Bob's writes either remain on `home.tld` via a registered or sponsored account, or are out of scope). The v2 design space is flagged in [Threat Model — Open Questions](/design/threat-model/schema-rules/#open-questions). ## Federation Capabilities -Sharing an album with `alice@other.tld` requires her server to be *able* to fetch that album's blobs. Capsule issues her server an **album-scoped capability token**: a signed, expiring, revocable grant naming the album, the scope, and an expiry, reusing the [EdDSA-JWT machinery](/design/authentication/#access-tokens) already built for access tokens — no separate macaroon or ZCAP format is introduced. +Sharing an album with `alice@other.tld` requires her server to be *able* to fetch that album's blobs. Capsule issues her server an **album-scoped capability token**: a signed, expiring, revocable grant naming the album, the scope, and an expiry, reusing the [EdDSA-JWT machinery](/design/authentication/#access-token) already built for access tokens — no separate macaroon or ZCAP format is introduced. + +The capability token format is the contract every federated peer parses and that this server signs. Its shape and lifecycle below are normative. ### Token Contents @@ -64,57 +62,58 @@ A federation capability token is an EdDSA-JWT with the following claims: | `jti` | UUIDv7 | Unique token identifier; the revocation key. | | `min_protocol_version` | string | Lowest `protocol_version` the issuing server still serves; matches the album's pin. 
| -Signed under the home server's [Ed25519 signing key from the cryptographic primitives inventory](/design/cryptography/#signature-scheme) — classical only at this layer (operational server keys rotate easily; the [hybrid PQ scheme](/design/cryptography/#signature-scheme) is reserved for user/device identity). +Signed under the home server's signing key — classical Ed25519 only, per the [operational-signature carve-out](/design/cryptography/primitives/#signature-scheme). ### Token Lifecycle and Chain of Trust 1. **Issuance.** A user on `home.tld` shares an album with `alice@other.tld`. `home.tld` mints a capability token for `other.tld` and delivers it as part of the share-invite message to Alice's client. Alice's client posts the token to `other.tld`; `other.tld` caches it server-side and uses it on every subsequent pull. 2. **Verification.** Capsule (the verifier, `home.tld` in this case) verifies the token offline against its own published signing key — no third-party PKI, no network call to a notary except for key rotation (see [Server Identity and Key Rotation](#server-identity-and-key-rotation)). -3. **Refresh.** A token nearing `exp` is replaced by `other.tld` requesting a new one on Alice's behalf; the request is itself authenticated by the previous token. Idempotency keyed by `(peer_id, jti)` per [Threat Model — Idempotency Invariants](/design/threat-model/#idempotency-invariants). -4. **Revocation.** Revocation is a short TTL (`exp ≤ 24h`) plus a published **revocation list** at `/.well-known/capsule/revoked-jti`. Peers fetch and cache the list with a **maximum staleness of 15 minutes**. A peer holding a revoked-but-not-yet-expired token will still be honored for up to 15 minutes after revocation — this is the deliberate trade-off between revocation latency and revocation-list polling overhead. +3. 
**Refresh.** A token nearing `exp` is replaced by `other.tld` requesting a new one on Alice's behalf; the request is itself authenticated by the previous token. Idempotency keyed by `(peer_id, jti)` per [Threat Model — Idempotency Invariants](/design/threat-model/validation/#idempotency-invariants). +4. **Revocation.** Revocation is a short TTL (`exp ≤ 24h`) plus a published **revocation list** at `/.well-known/capsule/revoked-jti`. Peers fetch and cache the list with a **maximum staleness of 15 minutes**. A peer holding a revoked-but-not-yet-expired token will still be honored for up to 15 minutes after revocation — this is the deliberate trade-off between revocation latency and revocation-list polling overhead. **List unavailability fails closed:** a verifier that relies on a *cached* copy of an issuer's revocation list and cannot refresh it must reject, past the 15-minute bound, any token whose `jti` it can no longer confirm against a current list — it never honors tokens indefinitely on a stale list. The `exp ≤ 24h` ceiling caps the worst case regardless, but the explicit rule means revocation cannot be outlived by making the list unreachable. (A server verifying its *own* tokens checks its own always-fresh list and is never stale.) 5. **Expiry.** A token past `exp` is rejected unconditionally; the verifier returns `401` and the peer must obtain a fresh token before continuing. -This capability is a **transport-scoped control, not a confidentiality control.** A peer holding it can fetch ciphertext and nothing more — confidentiality is already enforced by [MLS album membership](/design/cryptography/#group-membership): without the album master key, fetched bytes are unreadable. The capability exists to gate *who may fetch at all* — rate-limiting, anti-enumeration, and clean revocation of a sharing relationship — not to keep content secret. 
+This capability is a **transport-scoped control, not a confidentiality control**: it gates *who may fetch at all* (rate-limiting, anti-enumeration, clean revocation of a sharing relationship), nothing more. Confidentiality is already enforced by [MLS album membership](/design/cryptography/mls/) — without the album master key, fetched bytes are unreadable. ## Validation at the Boundary -Every byte from a peer crosses a hard boundary before it is trusted. The exhaustive checklist — refuse-by-default, applied to every federated write — is owned by [Threat Model — Server-Side Validation Invariants](/design/threat-model/#server-side-validation-invariants); the rules that follow are the federation-specific specialization of that list. +Every byte from a peer crosses a hard boundary before it is trusted. The exhaustive checklist — refuse-by-default, applied to every federated write — is owned by [Threat Model — Server-Side Validation Invariants](/design/threat-model/validation/#server-side-validation-invariants); the rules that follow are the federation-specific specialization of that list. -- **Strict schema match.** Input must conform exactly to the schema for its declared protocol version (see [album version pinning](/design/versioning/#album-protocol-version-pinning)). Anything else is rejected. `crypto_suite_id` and `sidecar_schema` must each be values the verifying server recognizes; an unknown value is **not** preserved-and-ignored, it is rejected (cf. the asymmetric Postel's Law in [Principles](/design/principles/) and [Threat Model — Schema Evolution](/design/threat-model/#schema-evolution-and-field-grammar)). +- **Strict schema match.** Input must conform exactly to the schema for its declared protocol version (see [album version pinning](/design/versioning/#album-protocol-version-pinning)). Anything else is rejected. 
`crypto_suite_id` and `sidecar_schema` must each be values the verifying server recognizes; an unknown value is **not** preserved-and-ignored, it is rejected (cf. the asymmetric Postel's Law in [Principles](/design/principles/) and [Threat Model — Schema Rules](/design/threat-model/schema-rules/#schema-evolution-and-field-grammar)). - **Closed enums.** `action`, `content_type`, `DerivativeManifest.role`, and `gps.source` are closed per protocol version. An unknown value is a structural error, not a "future to ignore." - **Hard caps.** Size caps on every field, depth caps on nested structures, length caps on bounded collections (e.g. `superseded_captions ≤ 16`), rate caps per peer. No unbounded input reaches a parser. - **Unknown fields within a known schema preserved, never executed.** Top-level unknown fields are rejected; field-level unknown CBOR keys within a known schema are preserved verbatim for forward compatibility but are never interpreted. -- **Manifest envelope checks.** All items 1–18 of [Server-Side Validation Invariants](/design/threat-model/#server-side-validation-invariants) apply — `protocol_version` in range, `crypto_suite_id` in inventory, hash algorithm matches the suite, declared size against received bytes, `created_by_device` in the user's device directory, `timestamp` within ±30 days, monotonic `amk_version`, and the [stale-revival check](/design/import-synchronization/#stale-revival-detection) on `prior_provenance_hash`. 
+- **Manifest envelope checks.** All items 1–18 of [Server-Side Validation Invariants](/design/threat-model/validation/#server-side-validation-invariants) apply — `protocol_version` in range, `crypto_suite_id` in inventory, hash length matches the suite's digest size, declared size against received bytes, `created_by_device` in the user's device directory, `timestamp` within the sanity bound, monotonic `amk_version`, and the [stale-revival check](/design/import/download-sync/#stale-revival-detection) on `prior_provenance_hash`. - **Capability token.** Items 19–21 of the same list: token verifies under the home server's signing key, `exp` in future, `jti` not in the revocation list, per-peer rate budgets unbroken. -- **The parser is a security boundary.** Capsule's decoders for federated input are written in memory-safe Rust against audited libraries (`ciborium`, `serde_cbor`); we explicitly assume the host language and decoder are memory-safe (the same assumption [Federation — Security Against Malicious Files](#security-against-malicious-files) makes at the client edge). Decoder CVEs in client decode paths for *opaque media bytes* are handled by the [sandboxed decoder](/design/clients/#sandboxed-decoder), not by re-implementing the decoder. The federation CBOR decode path is additionally fuzzed. +- **The parser is a security boundary.** Capsule's decoders for federated input are written in memory-safe Rust against audited libraries (`ciborium`, `serde_cbor`); we explicitly assume the host language and decoder are memory-safe (the same assumption [Security Against Malicious Files](#security-against-malicious-files) makes at the client edge). Decoder CVEs in client decode paths for *opaque media bytes* are handled by the [sandboxed decoder](/design/clients/#sandboxed-decoder), not by re-implementing the decoder. The federation CBOR decode path is additionally fuzzed. 
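
The capability-token checks in the list above (`exp` in the future, `jti` not revoked), together with the fail-closed revocation rule from Token Lifecycle, can be sketched roughly as follows. This is a minimal illustration, not the `capsule-api-sync::federation` implementation: the type and function names (`CapabilityClaims`, `RevocationCache`, `check_capability`, `TokenError`) are hypothetical, signature verification is assumed to have already succeeded, and timestamps are plain seconds since the Unix epoch.

```rust
use std::collections::HashSet;
use std::time::Duration;

/// Maximum age of a cached revocation list before the verifier fails closed
/// (the 15-minute staleness bound described in Token Lifecycle).
const MAX_REVOCATION_STALENESS: Duration = Duration::from_secs(15 * 60);

/// Hypothetical view of a capability token whose signature has already
/// been verified under the issuer's key.
struct CapabilityClaims {
    /// Unique token identifier; the revocation key.
    jti: String,
    /// Expiry, in seconds since the Unix epoch.
    exp: u64,
}

/// Hypothetical cached copy of an issuer's revocation list.
struct RevocationCache {
    revoked_jti: HashSet<String>,
    /// When the list was last refreshed, in seconds since the Unix epoch.
    fetched_at: u64,
}

#[derive(Debug, PartialEq)]
enum TokenError {
    Expired,
    Revoked,
    RevocationListStale,
}

/// Applies the expiry, revocation, and staleness checks in order.
fn check_capability(
    claims: &CapabilityClaims,
    cache: &RevocationCache,
    now: u64,
) -> Result<(), TokenError> {
    // A token at or past `exp` is rejected unconditionally.
    if now >= claims.exp {
        return Err(TokenError::Expired);
    }
    // A `jti` already known to be revoked stays revoked, however old the list is.
    if cache.revoked_jti.contains(&claims.jti) {
        return Err(TokenError::Revoked);
    }
    // Fail closed: past the staleness bound the `jti` can no longer be
    // confirmed against a current list, so the token is not honored.
    let age = now.saturating_sub(cache.fetched_at);
    if age > MAX_REVOCATION_STALENESS.as_secs() {
        return Err(TokenError::RevocationListStale);
    }
    Ok(())
}
```

Checking the cached entries before the staleness bound means a stale list can only ever cause additional rejections, never an acceptance. A real verifier would also check the audience, scope, and per-peer rate budgets, which this sketch leaves out.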
## Per-Peer Compartmentalization Each peer is its own blast-radius boundary — a bad peer cannot starve good ones: -- **Quotas.** Per-peer budgets on events/hour, bytes/hour, and CPU/hour. Exceeding a budget queues or drops further requests. +- **Quotas.** Per-peer budgets (deployment-tuned) on events/hour, bytes/hour, and CPU/hour. Exceeding a budget queues or drops further requests. +- **Receiving-user storage budget.** The per-peer budgets above bound *transfer*; storage is bounded separately. Blobs a pull *caches* on the home server are charged to the **receiving user's** [quota](/design/quota/#accounting-model), deduped, under a per-`(receiving_user, source_peer)` cap — so a single user pulling from many peers cannot exhaust home storage even while staying within every individual peer's transfer budget. - **Error budget + circuit breaker.** Malformed input spends a per-peer error budget; enough failures trip a circuit breaker that backs the peer off exponentially (e.g. 5 / 30 / 60 minutes). A buggy peer cannot DoS Capsule. -- **Quarantine for new peers.** First contact puts a server in a probationary tier: tighter quotas, stricter validation, no push notifications accepted. It graduates after a period of clean behavior. This cuts off the "spin up a fresh instance to attack" vector, mirroring email reputation systems. +- **Quarantine for new peers.** First contact puts a server in a probationary tier: tighter quotas, stricter validation, no push notifications accepted. It graduates after a period of clean behavior. This cuts off the "spin up a fresh instance to attack" vector, mirroring email reputation systems. ## Stale-Revival Defense -A federated peer may have cached an old manifest for an asset that the home server has since marked deleted (or otherwise advanced beyond). Submitting that old manifest back must not silently resurrect the asset. 
The defense is owned by [Cryptography — Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications) and surfaced for federation here: +A federated peer may have cached an old manifest for an asset that the home server has since marked deleted (or otherwise advanced beyond). Submitting that old manifest back must not silently resurrect the asset. The defense is owned by [Cryptography — Provenance](/design/cryptography/provenance/#provenance-of-library-modifications) and surfaced for federation here: - The home server only serves the **current** manifest for any asset — it does not expose an API to fetch an arbitrary past manifest. A peer can therefore only present a manifest it has previously cached. - A peer presenting a manifest whose `prior_provenance_hash` is behind the home server's stored `latest_provenance_hash` is rejected with `409` (stale-revival), and the rejected manifest's hash is added to the bounded rejected-hash table (see [Soft-Fail Semantics](#soft-fail-semantics)). The same defense runs on the receiving client when a peer's pull serves a stale manifest forward. - The chain check is fully no-key: the server reads `prior_provenance_hash` from the manifest envelope and compares it to its own stored value. -This is the federation-layer specialization of [Threat Model — § 4 (Damage Scenario Map)](/design/threat-model/#damage-scenario--invariant-map), row #4. +This is the federation-layer specialization of [Threat Model — Damage Scenario Map](/design/threat-model/scenarios/#damage-scenario--invariant-map), row #4. ## Soft-Fail Semantics -A federated event that fails validation is rejected **locally** — not applied, not shown, no authority derived from it — but its hash is **remembered**. Remembering the hash keeps Capsule's view from silently diverging from peers that (wrongly) accepted it: divergence is the real enemy, and explicit rejection-with-memory is the cure. 
This is the federation-facing counterpart of the [`verify_asset` quarantine](/design/cryptography/#write-authorization) — a failure is never silently dropped and never silently accepted. +A federated event that fails validation is rejected **locally** — not applied, not shown, no authority derived from it — but its hash is **remembered**. Remembering the hash keeps Capsule's view from silently diverging from peers that (wrongly) accepted it: divergence is the real enemy, and explicit rejection-with-memory is the cure. This is the federation-facing counterpart of the [`verify_asset` quarantine](/design/cryptography/keys/#write-authorization) — a failure is never silently dropped and never silently accepted. -**Bounded memory.** A hostile peer could otherwise flood the rejected-hash table indefinitely, so the table is **capped**: default 100,000 entries with a 90-day TTL per entry, both deployment-configurable. Eviction is LRU within the cap. The hashes that age out are the ones Capsule hasn't seen referenced again — by the time they age out they are no longer load-bearing for divergence detection. +**Bounded memory.** A hostile peer could otherwise flood the rejected-hash table indefinitely, so the table is **capped**: default 100,000 entries with a 90-day TTL per entry, both deployment-configurable. Eviction is LRU by last reference within the cap: the hashes that age out are the ones Capsule hasn't seen referenced again, so by the time they age out they are no longer load-bearing for divergence detection. ## Reconstructing State Without Trusting Peers -Capsule never trusts the *order* in which a peer returns results. Federated state is reconciled from cryptographic signals — content hashes and signatures on [asset manifests](/design/cryptography/#provenance-and-signed-manifest) — not from peer-supplied ordering. A manifest's `timestamp` is self-asserted and used for audit only. +Capsule never trusts the *order* in which a peer returns results. 
Federated state is reconciled from cryptographic signals — content hashes and signatures on [asset manifests](/design/cryptography/provenance/#asset-manifest) — not from peer-supplied ordering. A manifest's `timestamp` is self-asserted and used for audit only. **Cross-peer consistency checks.** As a cheap backstop, a client may periodically fetch the same album state from the home server and from a peer and diff them. A mismatch flags a potentially misbehaving server. This is rare and off the hot path, but one server cannot rewrite history without another noticing. @@ -136,18 +135,26 @@ Linking assets from an external server means a client inherently trusts bytes fr Search spanning federated albums uses a two-tier index: - **Tier 1 — local full-fidelity index.** Everything on the home server — own uploads plus cached remote content — gets the full treatment described in [AI/ML Integrations](/design/ai/): embeddings, tags, perceptual hashes. -- **Tier 2 — federated breadcrumb index.** For accessible remote albums, Capsule keeps only a lightweight record per asset — content hash, timestamp, author, size, album membership. When the user actually views the remote album, relevant assets are fetched and **promoted** into the Tier-1 index. Promotion is lazy and on-demand; Capsule never pre-indexes every federated album wholesale. - -## Moderation and Abuse +- **Tier 2 — federated breadcrumb index.** For accessible remote albums, Capsule keeps only a lightweight record per asset — content hash, timestamp, author, size, album membership. When the user actually views the remote album, relevant assets are fetched and **promoted** to full Tier-1 treatment: the [AI/ML](/design/ai/) pipeline indexes them (embeddings, tags), and the fetched bytes count toward the receiver's [quota](/design/quota/#accounting-model). 
Promotion runs the indexing pipeline rather than copying pre-computed remote state — Tier 2 holds none — and is lazy and on-demand; Capsule never pre-indexes every federated album wholesale. -Capsule is end-to-end encrypted, so a server **cannot** scan content it holds — server-side content or CSAM scanning is impossible by design, and no content scanner is built. Moderation instead operates on what *is* available: +## Moderation Hooks -- **Federated reporting protocol.** A report against `alice@other.tld`'s asset is routed to her home server's administrators, since they are the only party that can act on her account. -- **Blocklists.** Server-level blocklists, plus per-user blocks that federate. -- **Untrusted-server whitelist.** The same [whitelist](#security-against-malicious-files) that gates malicious files is the front-line abuse control for content from servers Capsule does not trust. +Federation introduces moderation hooks for handling abuse across servers; the full policy (reports, suspensions, takedowns, blocklists) is owned by [Moderation](/design/moderation/). Federation provides the transport (federated reports between servers) and the boundary (untrusted-server whitelist that gates content from unknown peers). ## Server Identity and Key Rotation -- Server-to-server requests are signed under the [server's signing key](/design/cryptography/#signature-scheme) (classical-only at this layer is acceptable since operational server keys rotate easily; the [hybrid PQ scheme](/design/cryptography/#signature-scheme) is reserved for user/device identity), published at a well-known path. Matrix, ActivityPub (HTTP Signatures), and AT Protocol all converge on this pattern. -- Servers cache each other's public keys, so key rotation needs a notary / perspective endpoint so a peer can confirm a rotated key. 
+- Server-to-server requests are signed under the server's signing key (classical Ed25519 only, per the [operational-signature carve-out](/design/cryptography/primitives/#signature-scheme)), published at a well-known path. Matrix, ActivityPub (HTTP Signatures), and AT Protocol all converge on this pattern. +- Servers cache each other's public keys (TOFU-pinned on first contact). A rotation is confirmed by a **perspective check**: before accepting a rotated key, a peer corroborates it against one or more independent vantage points (other servers, or a configured notary) and accepts only on agreement — so a single compromised network path cannot substitute a forged key. A rotation that fails corroboration is surfaced, not silently accepted. This is the mechanism behind [Threat Model — scenario #26](/design/threat-model/scenarios/#damage-scenario--invariant-map). - Album protocol versions are pinned per album — see [Album Protocol Version Pinning](/design/versioning/#album-protocol-version-pinning). + +## Validation + +- **Capability token verify (unit).** Generate token; verify under issuer key; mutate each claim; assert each mutation rejected with the right structural code (expired, revoked, wrong audience, wrong scope, missing claim). +- **Revocation-list fail-closed (unit).** Cache a revocation list, then make refresh fail; advance past the 15-minute staleness bound; assert a token whose `jti` cannot be freshly confirmed is rejected, not honored on the stale list. +- **Pull boundary checks (unit).** Submit a peer-pull request with each kind of malformed envelope; assert refusal at the corresponding [server-side invariant](/design/threat-model/validation/#server-side-validation-invariants). +- **Rate-budget enforcement (smoke).** Exhaust a peer's events-per-hour budget against a testcontainer Postgres; assert `429`; assert successful pulls resume after the window. 
+- **Circuit breaker (smoke).** Submit N malformed payloads from a single peer; assert circuit opens; assert further requests are short-circuited until the back-off elapses. +- **Soft-fail bounded memory (unit).** Push the rejected-hash table past its cap; assert LRU eviction; assert no unbounded memory growth. +- **Cross-peer consistency (smoke).** Stand up two federated servers; produce a write on one; fetch from the other; assert byte-identical state. + +The cross-module case — full cross-server pull where Alice on `home.tld` shares to Bob on `other.tld` and Bob's client successfully fetches and verifies — is one bounded E2E case in [Module Map](/design/module-map/#e2e-test-surface). diff --git a/capsule-docs/src/content/docs/design/filesystem.md b/capsule-docs/src/content/docs/design/filesystem.md deleted file mode 100644 index d39cb3a..0000000 --- a/capsule-docs/src/content/docs/design/filesystem.md +++ /dev/null @@ -1,496 +0,0 @@ ---- -title: Filesystem -description: How Capsule structures files on disk, on the server and on clients ---- - -Capsule's end-to-end encryption splits the filesystem into two fundamentally -different roles. The **server** stores only opaque, content-addressed -ciphertext — it never holds a decryption key and cannot interpret a single byte -it stores (see [Cryptography](/design/cryptography/)). **Clients** hold the keys, so a -client filesystem is a working library of plaintext media, sidecar metadata, and -rebuildable caches. The two layouts share a small set of principles but -otherwise have little in common. - -This document covers on-disk structure only. 
The import pipeline, the upload -protocol, and synchronization are covered in -[Import and Synchronization](/design/import-synchronization/); metadata extraction in -[Metadata](/design/metadata/); derivative generation in -[Thumbnails and Previews](/design/thumbnails/); grouping and trash semantics in -[Asset Organization](/design/organization/); backup and recovery in -[Backup and Recovery](/design/backup-recovery/). - -## Shared Principles - -These follow directly from [Core Principles](/design/principles/): - -- **Recovery-first.** No database is required to interpret canonical data. On - the client, sidecar files are the source of truth and the index is a - rebuildable cache. On the server, PostgreSQL is the authoritative index, but - it holds only key-free facts. -- **Atomic writes.** Every write that must not tear uses temp-file + atomic - rename on the same filesystem. Direct overwrites risk corruption on power loss. -- **Ephemeral derived data.** Only originals and their canonical metadata are - irreplaceable. Thumbnails, transcodes, parsed-metadata caches, and the query - index can all be regenerated and are treated as such. -- **4 KiB alignment.** Data is processed and written block-aligned to 4 KiB, - which matches memory and disks and enables the reflink assembly path below. -- **Content-addressing.** Stored blobs are named by their ciphertext content hash — - the same hash everywhere a content address is needed (see - [Cryptography Primitives Inventory](/design/cryptography/#primitives-inventory)). 
- -## Server vs Client at a Glance - -| Concern | Server | Client | -| ------------ | ------------------------------------------ | --------------------------------------------- | -| Holds keys | No | Yes | -| Stored form | Opaque ciphertext blobs | Plaintext media + CBOR sidecars | -| Naming | Content-addressed by ciphertext hash | UUIDv7 stems, date-bucketed | -| Index | PostgreSQL (key-free facts only) | SQLite (rebuildable, full plaintext metadata) | -| Derived data | Stored as client-generated encrypted blobs | Generated locally, cached, rebuildable | -| Originals | Always retained while referenced | Present only if synced locally | - -## Server Filesystem - -### Stores by Deployment Profile - -The server's durable state is always split across **two required systems** plus an **optional third** for high-concurrency deployments: - -- **Blob store** (filesystem) — the encrypted bytes of every asset. *Required.* -- **PostgreSQL** — the authoritative index: ownership, album references, blob - references, lifecycle state, and (in the default profile) upload-session state. - *Required.* -- **Valkey** — volatile upload-session state (offsets, status) with a 24-hour - TTL. *Optional.* Recommended only for deployments where upload-session hot-path - contention on PostgreSQL becomes measurable. - -This gives two concrete deployment profiles: - -| Profile | Session state lives in | When to choose it | -| --------------------------- | ------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------- | -| **Default (Postgres-only)** | `upload_sessions` table with `expires_at` TTL column and a periodic sweep | Self-hosted, small-to-medium servers, single-node deployments. Reduces ops surface. 
| -| **High-concurrency** | Valkey (keyed `upload:session:{id}`) with native 24-hour TTL; PostgreSQL still holds the durable pending-asset row | Large multi-tenant deployments where session-table contention is a measured bottleneck | - -Switching profiles is operationally invisible to clients — the upload protocol does not change, only where the server stores volatile session counters. The [upload protocol](/design/import-synchronization/) is written to be store-agnostic. - -The server performs no decoding, no metadata extraction, and no thumbnail -generation — it cannot, since it never holds a key. - -### Blob Store Layout - -```text -{blob_root}/ -├── incoming/ -│ ├── {upload_id}_{n}.part # in-flight chunk -│ └── {upload_id}.bin # assembled blob, pre-verification -├── blobs/ -│ └── {hash[0:2]}/{hash[2:4]}/ -│ └── {hash} # finalized blob, content-addressed -└── .server/ - ├── version # server filesystem schema version - └── config # server-wide configuration -``` - -- **`{blob_root}`**: absolute path configured at server startup. The entire tree - must be on a single filesystem so that finalization renames are atomic. -- **`incoming/`**: live uploads. Chunks land as `{upload_id}_{n}.part`; on - finalization they are concatenated into `{upload_id}.bin`. The 4 KiB chunk - alignment is what allows each chunk to be reflinked into place on - copy-on-write filesystems, turning assembly into a near-instant metadata - operation. See the upload protocol in - [Import and Synchronization](/design/import-synchronization/). -- **`blobs/`**: the finalized store. A blob's filename is its [ciphertext content hash](/design/cryptography/#primitives-inventory); the two-level hex-prefix shard keeps directory sizes bounded for - multi-million-blob stores. A finalized blob is immutable. -- **`.server/`**: the server operator's own configuration and schema version. 
- This is plaintext server metadata, not user data — it is the one thing under - `{blob_root}` that is not an encrypted blob. - -### Uniform, Opaque Blobs - -A single asset produces a **bundle** of blobs (see -[Import and Synchronization](/design/import-synchronization/) — "What Gets Uploaded"): -the encrypted original, encrypted derivatives (thumbnails, previews, LQIP), the -encrypted CBOR metadata blob, and the encrypted provenance blob (see -[Cryptography — Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications)). -The blob store does not distinguish them — every blob is just content-addressed -ciphertext. The mapping from an asset to its constituent blobs, and the role of -each blob, lives entirely in PostgreSQL. - -### Recovering the Index from Blobs Alone - -The PostgreSQL index is authoritative but **not the only copy** of what the -server knows. Every blob carries enough server-visible structural metadata — -the [unencrypted portion](/design/cryptography/#provenance-and-signed-manifest) -of the asset manifest — to rebuild the index row that referenced it. This is -the server-side counterpart of the recovery-first principle that lets a client -rebuild its index from CBOR sidecars. 
- -The server-visible portion of a blob includes: - -- `crypto_suite_id`, `protocol_version`, `amk_version` — what bundle of - primitives encrypted this asset and which album epoch -- the ciphertext hash (`hash.value`) and declared size — content address and - storage attribution -- `created_by_user`, `created_by_device`, `album_id`, `file_id`, - `prior_provenance_hash`, `action` — owner, provenance chain link, and - lifecycle action -- the device's hybrid signature — provenance attribution; verifiable against - the public device directory even without any key Capsule's server holds - -A rebuild walks `blobs/`, reads the manifest envelope of each blob, verifies -the device signature against the cached device directory, and writes an index -row. The rebuild is idempotent: re-running it against an existing index -produces no changes. The full envelope check list a server runs at recovery is -the same list it runs at write time — see -[Threat Model — Server-Side Validation Invariants](/design/threat-model/#server-side-validation-invariants). - -A blob whose manifest envelope fails structural validation during rebuild is -**quarantined**, not silently dropped — moved to `{blob_root}/quarantine/` -with a sibling `.reason.json` recording the rejection code. This guarantees -that an unrecoverable byte sequence is preserved for forensic inspection -rather than vanishing on rebuild. - -Operationally the rebuild is invoked when a PostgreSQL restore is incomplete -or a logical-corruption event is detected; it is **never** the hot path. The -hot path runs through the authoritative PG index. The recovery path's job is -to make the index reconstructible if PG is lost, not to substitute for it. - -### Manifest Envelope Validation (Server-Side) - -Every write — `POST /upload`, `PATCH /upload/{id}`, finalization, any -lifecycle manifest, any federation pull — passes through structural -validation of the manifest envelope **before** any state is persisted. 
The -server holds no decryption key, so it cannot verify the cryptographic -signatures; but it does enforce that every envelope field is present, -structurally well-formed, within bounds, and consistent with the album the -manifest claims to address. - -The complete refuse-by-default checklist is owned by -[Threat Model — Server-Side Validation Invariants](/design/threat-model/#server-side-validation-invariants). -A rejection at any check returns the rejection code listed there and writes -no state. This is what defeats the version-mismatched-client damage class -without requiring the server to hold a key. - -### Content-Addressing and Deduplication - -Naming blobs by their [ciphertext content hash](/design/cryptography/#primitives-inventory) makes deduplication free: a blob already present -is never stored twice. At upload-session creation the server checks for a blob -with the same content hash already owned by the uploader — an exact -local-and-remote duplicate is rejected up front, and an asset that exists -remotely under a *different* ciphertext resolves to a **merge** that links the -existing blob rather than storing a second copy (see -[Import and Synchronization](/design/import-synchronization/) — "Deduplication and -Merge"). Reference counting in PostgreSQL determines when a blob is genuinely -unreferenced. 
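
As a rough illustration of content-addressed naming, the sketch below maps a hex content hash to the sharded `blobs/` location and skips the store when the blob is already present. It is filesystem-only and deliberately simplified: the owner-scoped duplicate check and reference counting described above live in PostgreSQL and are not modelled, and the function names `blob_path` and `publish_blob` are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Map a lowercase hex content hash to `blobs/{hash[0:2]}/{hash[2:4]}/{hash}`.
/// Returns None for input that is too short or not lowercase hex.
pub fn blob_path(blob_root: &Path, hash_hex: &str) -> Option<PathBuf> {
    let is_hex = hash_hex
        .bytes()
        .all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    if hash_hex.len() < 4 || !is_hex {
        return None;
    }
    Some(
        blob_root
            .join("blobs")
            .join(&hash_hex[0..2])
            .join(&hash_hex[2..4])
            .join(hash_hex),
    )
}

/// Publish an already-verified staged blob. Returns Ok(true) if it was
/// stored, Ok(false) if a blob with this hash already existed.
pub fn publish_blob(blob_root: &Path, staged: &Path, hash_hex: &str) -> io::Result<bool> {
    let dest = blob_path(blob_root, hash_hex)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "bad content hash"))?;
    if dest.exists() {
        // Same ciphertext is already stored: drop the staged copy.
        fs::remove_file(staged)?;
        return Ok(false);
    }
    fs::create_dir_all(dest.parent().expect("blob path always has a parent"))?;
    // Same-filesystem rename, matching the single-filesystem rule for {blob_root}.
    fs::rename(staged, &dest)?;
    Ok(true)
}
```

The `exists()` check followed by `rename()` is not race-free on its own; the sketch assumes something else, for example the database transaction around finalization, serializes concurrent writers of the same hash.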
- -### PostgreSQL: What the Server Knows - -The server index records only what can be known without a key: - -- `asset_id`, `owner_id`, `album_id`, `upload_user_id` -- references to the asset's blobs (their [content hashes](/design/cryptography/#primitives-inventory)) and each blob's role -- `amk_version` — which album-key epoch encrypted the asset (see - [Cryptography](/design/cryptography/)) -- declared ciphertext size and `content_type` -- the `uploaded` flag and server-visible lifecycle state -- creation/modification timestamps and provenance records (see - [Cryptography](/design/cryptography/) — "Provenance of Library Modifications") - -No plaintext capture date, dimensions, EXIF, tags, or filename ever reaches the -server. Those live inside the encrypted metadata blob (see [Metadata Encryption](/design/cryptography/#metadata-encryption)) and are readable only by authorized clients. - -Session creation writes a *pending* asset row (`uploaded = false`) that reserves -the asset ID the bundle's blobs reference; finalization flips it. See the -session lifecycle in [Import and Synchronization](/design/import-synchronization/). - -### Ownership, Partitioning, and Quota - -`owner_id` is the billing and namespace entity; the `owner_id` → user-set -mapping lives in PostgreSQL and is mirrored as an MLS group (the Owner Group -Key — see [Cryptography](/design/cryptography/)). Storage quota is accounted to -`upload_user_id`, which is distinct from `owner_id`. The blob store itself is -not partitioned by owner — content-addressing is global — but every blob -*reference* is owner-scoped in PostgreSQL, and deduplication checks are scoped -to the owner. - -### Deletion and Garbage Collection - -The server cannot read an asset's `is_deleted` flag — it is inside the encrypted -metadata blob. Lifecycle transitions are therefore signalled by the client and -recorded as server-visible state on the asset row; soft delete is a state -change, not a file operation. 
Permanent deletion drops the asset's blob -references, and a blob whose reference count reaches zero becomes eligible for a -garbage-collection sweep. Consistent with the data-integrity principle, blob -removal is conservative — a blob is deleted only after its references are -provably gone. - -## Client Filesystem - -Clients hold keys, so a client stores plaintext. Desktop clients keep a -self-contained library directory; mobile clients use platform-sandboxed storage. - -What a client keeps locally depends on its sync setting — *metadata only*, -*metadata + thumbnails*, or *metadata + thumbnails + original* (see -[Import and Synchronization](/design/import-synchronization/) — "Synchronization -Scope"). A library therefore routinely contains assets whose original is -server-only, and the layout must represent an asset whether or not its original -bytes are present locally. - -### Desktop Library Layout - -```text -{library_root}/ -├── media/ -│ └── {YYYY}/{YYYY-MM}/ -│ ├── {uuid}.{ext} # original media (plaintext; absent if not synced locally) -│ ├── {uuid}.cbor # canonical metadata sidecar (plaintext, signed) -│ └── {uuid}.provenance.cbor # append-only signed provenance chain -├── cache/ -│ ├── thumbnails/{size}/{uuid[0:2]}/{uuid[2:4]}/{uuid}.{fmt} -│ ├── meta/{uuid[0:2]}/{uuid[2:4]}/{uuid}.meta.cbor # verbose parsed metadata -│ └── transcodes/{uuid[0:2]}/{uuid[2:4]}/{uuid}.{ext} -├── index/ -│ └── library.sqlite # rebuildable query + vector index -└── .library/ - ├── version # library schema version - ├── config # user preferences and library state - ├── lock # process lock file (ephemeral) - ├── trash/ - │ └── {uuid}.{ext} # soft-deleted media - └── quarantine/ - ├── {uuid}.{ext} # irreplaceable bytes that failed validation - └── {uuid}.reason.json # parse error / signature failure / schema mismatch -``` - -- **`media/`**: originals, their sidecars, and their provenance chains. 
Filenames are - `{UUIDv7}.{extension}` (always lowercase), `{UUIDv7}.cbor`, and - `{UUIDv7}.provenance.cbor` respectively. The CBOR sidecar is the client's - canonical, self-describing metadata record (see - [Metadata — Sidecar Schema v1](/design/metadata/#sidecar-schema-v1)) — the - plaintext counterpart of the encrypted metadata blob the server stores. The - `.provenance.cbor` file is an append-only signed log per asset (see - [Cryptography — Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications)); - the client never deletes it, so a hard-deleted asset leaves a - tombstone-with-history. Per the recovery-first principle, the entire library - is reconstructible from these three files alone. Files are date-bucketed by - capture timestamp because the client, unlike the server, can read capture - dates. -- **`cache/`**: purely derived and rebuildable — thumbnails and previews (formats declared in [Thumbnails and Previews](/design/thumbnails/#thumbnail-and-preview-formats)), verbose - parsed-metadata caches, and transcodes. Sharded by UUID prefix to bound - directory sizes. Deletable at any time; never a source of truth. -- **`index/library.sqlite`**: a rebuildable query cache over the sidecars, and - the local vector index backing AI features (`sqlite-vec` — see - [AI/ML Integrations](/design/ai/)). On a schema change it may be dropped and rebuilt - rather than migrated, since it is always reconstructible. -- **`.library/`**: library-scoped state — schema version, user configuration, a - process lock file that prevents two app instances from opening the same - library, the trash (soft-delete retention area), and `quarantine/` (where - irreplaceable bytes that failed structural or signature validation are - preserved verbatim alongside a `.reason.json` recording the rejection). The - quarantine area is the union surface listed in - [Threat Model — Quarantine Surfaces](/design/threat-model/#quarantine-surfaces). 
- -The full sidecar and SQLite schemas are owned by [Metadata](/design/metadata/) and not -duplicated here. - -### Mobile Clients - -Android and iOS use platform-sandboxed storage rather than a user-visible -library directory. The logical model is the same — originals (when synced), -canonical metadata, rebuildable caches, and a local SQLite index — but placement -follows each platform's sandbox rules. Capsule deliberately does not store -rebuildable derivatives in OS-managed cache locations: the OS may evict them -indiscriminately, and a thumbnail that is expensive to regenerate is not -genuinely disposable (see [Import and Synchronization](/design/import-synchronization/) -— "Space Recovery"). - -### Local Index Staleness - -SQLite may lag the filesystem after external edits or interrupted operations. -The client verifies file existence before acting on an index row and triggers a -full rebuild from sidecars when it detects structural inconsistency. Because the -index is always rebuildable, this recovery is safe. - -### Space Recovery - -Most data, apart from files that have not been backed up, is considered ephemeral, but it is -neither treated as disposable nor kept in OS cache storage. The Capsule app is far better -placed than the OS to determine which versions of the same data can be retained and -which can be deleted. Thumbnails stored as OS cache may be deleted by the OS -indiscriminately, even though they are in fact useful. Capsule provides tools -to analyze the biggest storage consumers and lets users selectively delete -data. - -## Library Self-Maintenance - -The data-integrity principle treats client storage as *potentially lost* (see -[Core Principles](/design/principles/)): unlike the server, a client library -sits on consumer hardware, syncs only partially, and is edited by a long-lived -process that can be killed mid-write. 
A client therefore never assumes its -library is consistent — it periodically *proves* it is, repairs what it can -repair safely, and surfaces what it cannot. Three routines do this: -**scrubbing** removes the debris of interrupted operations, **self-validation** -confirms the library is structurally and bitwise intact, and **deduplication** -collapses byte-identical assets. All three are conservative — consistent with -"we can NEVER delete data unexpectedly," irreplaceable data is never removed -without explicit user confirmation. - -### Scrubbing - -A startup **scrub** sweeps the debris of interrupted writes. Atomic writes -(below) stage to `.tmp` files; a crash between the write and the rename strands -them. The scrub walks `media/` and removes `.tmp` files older than a few minutes -— the age floor avoids racing a write that is legitimately in flight elsewhere -in the process. It runs at most once every seven days, gated by a -`last_scrubbed_at` timestamp in the library config, since stale temp files are -harmless clutter rather than an urgent fault. Every removal is logged. The -server performs the equivalent sweep of stale `.part`/`.bin` files (see -[Atomic Writes and Crash Recovery](#atomic-writes-and-crash-recovery)). - -### Self-Validation - -Validation answers a stronger question than scrubbing: *is the library still a -faithful, interpretable copy of its assets?* It runs in two tiers, separated by -cost. - -**Structural validation** is a cheap directory walk, run at startup. It checks -the invariants of the [layout](#desktop-library-layout): - -- Every `{uuid}.{ext}` original has a matching `{uuid}.cbor` sidecar and - `{uuid}.provenance.cbor` chain. Every sidecar parses as valid CBOR with its - required fields present, has a `sidecar_schema` ≤ the client's max known - (per the [tightened Postel's Law](/design/principles/)), and bears a valid - signature from a device in the user's directory. 
-- A sidecar's `uuid` field matches its filename, and its date bucket matches its - capture timestamp. -- Every `cache/` entry (thumbnail, transcode, parsed-metadata cache) and every - `.library/trash/` file refers to an asset the library still knows. -- The provenance chain for each asset is walkable from `create` to head, with - each record's `prior_provenance_hash` matching the preceding record's content - hash. A break in the chain is a quarantine surface, not a silent skip. -- Index rows reference files that exist — this subsumes - [Local Index Staleness](#local-index-staleness) above. - -**Content validation** is expensive — it recomputes the [content hash](/design/cryptography/#primitives-inventory) of each locally -present original and compares it against the sidecar's `hash` field (the -algorithm-tagged form declared in [Metadata — Sidecar Schema v1](/design/metadata/#sidecar-schema-v1); -the algorithm itself follows whatever `crypto_suite_id` the sidecar carries). -The original is the only irreplaceable thing on a client, so -silent bit rot is the worst failure a client can suffer and nothing else detects -it. Because hashing every original is heavy I/O, content validation is not run -at startup: it is scheduled opportunistically (device idle, on power, unmetered) -and throttled, can be triggered on demand, and re-verifies each original on a -slow rolling cadence rather than all at once. 
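
The provenance-chain invariant from structural validation (each record's `prior_provenance_hash` must match the preceding record's content hash) can be sketched as a simple linear walk. This is an illustrative sketch only: the struct layout, the `content_hash` field, and the error type are assumptions, and record hashes are taken as already computed rather than derived from real CBOR bytes.

```rust
/// Minimal stand-in for one provenance record. Only `prior_provenance_hash`
/// and `action` are named in this document; everything else is hypothetical.
#[derive(Debug, Clone)]
pub struct ProvenanceRecord {
    pub content_hash: String,                  // hash of this record (assumed precomputed)
    pub prior_provenance_hash: Option<String>, // back-link to the preceding record
    pub action: String,                        // e.g. "create"
}

#[derive(Debug, PartialEq)]
pub enum ChainError {
    Empty,
    FirstIsNotCreate,
    BrokenLink { index: usize },
}

/// Walk the chain from `create` to head, checking every back-link.
pub fn verify_chain(records: &[ProvenanceRecord]) -> Result<(), ChainError> {
    let first = records.first().ok_or(ChainError::Empty)?;
    if first.action != "create" {
        return Err(ChainError::FirstIsNotCreate);
    }
    for (i, pair) in records.windows(2).enumerate() {
        let (prev, next) = (&pair[0], &pair[1]);
        if next.prior_provenance_hash.as_deref() != Some(prev.content_hash.as_str()) {
            // `index` is the position of the record whose back-link is wrong.
            return Err(ChainError::BrokenLink { index: i + 1 });
        }
    }
    Ok(())
}
```

Returning the index of the first broken link, rather than a bare boolean, gives the caller what it needs to record a quarantine reason instead of silently skipping the asset. Signature verification of each record is a separate check and is not shown.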
- -### Repair - -Repair follows directly from the data-integrity principle — *ephemeral data is -rebuilt silently; irreplaceable data is never destroyed to resolve an -inconsistency.* - -| Finding | Action | -| -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Stale `.tmp` / partial file | Deleted by the scrub. | -| Orphaned `cache/` entry | Deleted — derived and rebuildable. | -| Index inconsistency | Index dropped and rebuilt from sidecars — always safe. | -| Orphaned sidecar (no original) | Expected when the [sync scope](/design/import-synchronization/#synchronization-scope) is metadata-only — not a fault. Flagged only if the scope says the original should be present locally, in which case the original is re-fetched from the server. | -| Orphaned original (no sidecar) | The file is irreplaceable, so it is never deleted. It is moved to `.library/quarantine/` and surfaced to the user; the client attempts to re-derive a minimal sidecar from the file itself and the server index. | -| Malformed CBOR sidecar | The bytes are preserved — moved verbatim to `.library/quarantine/{uuid}.cbor` with a sibling `.reason.json` recording the parse error, and surfaced to the user. **Never silent-skipped:** a sidecar whose CBOR does not parse, whose required fields are missing, or whose `sidecar_schema` is above the client's max known is treated as a quarantine surface (see [Threat Model — Quarantine Surfaces](/design/threat-model/#quarantine-surfaces)). 
The client attempts to re-fetch a current sidecar from the server before treating the asset as lost. | -| Sidecar signature invalid | Same as malformed: quarantined, never auto-overwritten. The client re-fetches; a persistent failure surfaces the asset as "provenance broken" rather than silently dropping it. | -| Corrupt original (hash mismatch) | If the asset also exists on the server, the ciphertext blob is re-fetched and its derivatives re-generated. If the corrupt copy is the only copy — this device was its uploader and it was never synced — it cannot be auto-healed and is surfaced loudly. | - -Every finding and every repair is logged, so the state of the library is -reconstructible after the fact. - -### Deduplication - -Capsule deduplicates at three distinct layers, and they must not be confused: - -- **Server-side ciphertext dedup** — content-addressed blobs are never stored - twice (see [Content-Addressing and Deduplication](#content-addressing-and-deduplication)). -- **Import-time dedup** — import refuses an asset already uploaded from this - library and resolves a remote-only match to a merge (see - [Import and Synchronization](/design/import-synchronization/#deduplication-and-merge)). -- **Intra-library dedup** — described here: two assets *within one client - library* whose originals are byte-identical. - -Import-time dedup catches most duplicates as they arrive, but it cannot catch -all of them. Byte-identical assets still accumulate — the same file imported -from two different sources, a folder import that overlaps an earlier one, an -asset re-imported after its sidecar was lost, or a backup restored over a -library that still holds the originals. - -The dedup key is the plaintext **`hash.value`** recorded in every sidecar (the -algorithm-tagged form from [Metadata — Sidecar Schema v1](/design/metadata/#sidecar-schema-v1)) — -the same value the index lets the client look up directly. Two assets that share -it are exact duplicates. 
This is deliberately distinct from the server's -*ciphertext* hash: two devices may encrypt the same plaintext under different -album keys, so only the plaintext hash identifies duplicates across a library. - -Deduplication is **not** stacking. A RAW+JPEG pair, a burst, and a Live Photo -are *different bytes* deliberately kept together — they are -[stacked](/design/organization/#asset-stacking), never deduplicated. -Visually-similar but non-identical photos are a separate AI grouping feature -(Smart Selection) that never deletes. Dedup only ever acts on originals that are -bit-for-bit identical. - -Resolution is conservative and never silent. The client presents each duplicate -set and lets the user choose the survivor. On merge, the survivor inherits the -union of album memberships and tags (merged through the OR-set CRDT — see -[Metadata](/design/metadata/#collaborative-metadata)), the highest rating, and -the earliest import and capture timestamps; the losing copy is soft-deleted into -the trash, so the action is reversible and is recorded as a signed, -provenance-tracked modification like any other deletion (see -[Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications)). -Whole-library deduplication is a user-initiated maintenance action or a surfaced -suggestion — never an automatic background deletion — consistent with the rule -that data is never removed unexpectedly. - -## Atomic Writes and Crash Recovery - -Every write that must not tear uses temp-file + atomic rename, staged on the -same filesystem as its destination. The atomicity rule is enforced at three -granularities — the single file, the per-asset bundle, and the multi-asset -edit — each of which is owned by a section of -[Threat Model — Atomicity Invariants](/design/threat-model/#atomicity-invariants). 
- -- **Client — single-file writes.** Sidecar and provenance appends stage to - `{uuid}.cbor.tmp` and `{uuid}.provenance.cbor.tmp` in the destination - directory, then rename into place. A direct overwrite is never used. -- **Client — per-asset bundle.** An asset import or update is a *bundle*: - original (when present locally), sidecar, and a new provenance record. - All `.tmp` files stage first; only after every staged file is on disk do - the renames execute, and only in a fixed order (original → sidecar → - provenance). A failure at any rename discards every remaining `.tmp` and - rolls back the renames already done by deleting the just-renamed targets, - so the on-disk state never reflects a partial bundle. The - `.provenance.cbor` is the last to be renamed, so the existence of a new - provenance record implies the rest of the bundle is committed. -- **Client — stack edit.** A stack edit touches multiple sidecars and writes - a single provenance record per affected asset. All `.tmp` files (one per - sidecar plus one per provenance file) stage first and rename together; any - rename failure discards the entire batch. There is no partial stack. -- **Server — chunk assembly.** Chunks stage as `{upload_id}_{n}.part`; the - assembled blob is `{upload_id}.bin`. The blob is renamed into its - content-addressed location under `blobs/` only after the ciphertext hash - is recomputed and matches the declared value (see - [Import and Synchronization — Finalization and Integrity](/design/import-synchronization/#finalization-and-integrity)). -- **Server — finalization transaction.** The manifest envelope insert, the - blob rename, the metadata blob insert, the provenance blob insert, and - the asset row update commit in a single PostgreSQL transaction. 
The - server never exposes an asset whose bundle is partially persisted; a - crash between any pair leaves the session in `WaitingForProcessing` and - the next finalization attempt either completes the bundle or fails it - cleanly. - -On startup, each side scrubs incomplete work: stale `.part`, `.tmp`, and `.bin` -files left by an interrupted upload or import are identified and removed, and -the cleanup is logged. A blob or media file is never published, on either side, -until its integrity has been verified. - -## Encrypted Backups - -A backup is an export artifact — encrypted, self-describing, and kept outside -both `{library_root}` and `{blob_root}` — so it is not part of the live library -or the server blob store, and may be stored on external or cloud storage. Its -format, the master-key escrow, and the recovery flow are covered in -[Backup and Recovery](/design/backup-recovery/). diff --git a/capsule-docs/src/content/docs/design/filesystem/client.md b/capsule-docs/src/content/docs/design/filesystem/client.md new file mode 100644 index 0000000..eecd24c --- /dev/null +++ b/capsule-docs/src/content/docs/design/filesystem/client.md @@ -0,0 +1,65 @@ +--- +title: Client Filesystem +description: How clients lay out a library on disk — desktop, mobile, local index, and space recovery +--- + +Clients hold keys, so a client stores plaintext. Desktop clients keep a self-contained library directory; mobile clients use platform-sandboxed storage. The cross-platform logic lives in `capsule-core::library` (paths, init, open) and `capsule-core::db` (SQLite cache); per-platform glue lives in `capsule-sdk` and native client code. + +What a client keeps locally depends on its sync setting — *metadata only*, *metadata + thumbnails*, or *metadata + thumbnails + original* (see [Import — Synchronization Scope](/design/import/download-sync/#synchronization-scope)). 
A library therefore routinely contains assets whose original is server-only, and the layout must represent an asset whether or not its original bytes are present locally. + +The directory layout below is itself a contract — the recovery-first rebuild assumes exactly these filenames and sharding rules. + +## Desktop Library Layout + +```text +{library_root}/ +├── media/ +│ └── {YYYY}/{YYYY-MM}/ +│ ├── {uuid}.{ext} # original media (plaintext; absent if not synced locally) +│ ├── {uuid}.cbor # canonical metadata sidecar (plaintext, signed) +│ └── {uuid}.provenance.cbor # append-only signed provenance chain +├── cache/ +│ ├── thumbnails/{size}/{uuid[0:2]}/{uuid[2:4]}/{uuid}.{fmt} +│ ├── meta/{uuid[0:2]}/{uuid[2:4]}/{uuid}.meta.cbor # verbose parsed metadata +│ └── transcodes/{uuid[0:2]}/{uuid[2:4]}/{uuid}.{ext} +├── index/ +│ └── library.sqlite # rebuildable query + vector index +└── .library/ + ├── version # library schema version + ├── config # user preferences and library state + ├── lock # process lock file (ephemeral) + ├── trash/ + │ └── {uuid}.{ext} # soft-deleted media + └── quarantine/ + ├── {uuid}.{ext} # irreplaceable bytes that failed validation + └── {uuid}.reason.json # parse error / signature failure / schema mismatch +``` + +- **`media/`**: originals, their sidecars, and their provenance chains. Filenames are `{UUIDv7}.{extension}` (always lowercase), `{UUIDv7}.cbor`, and `{UUIDv7}.provenance.cbor` respectively. The CBOR sidecar is the client's canonical, self-describing metadata record (see [Metadata — Sidecar Schema v1](/design/metadata/#sidecar-schema-v1)) — the plaintext counterpart of the encrypted metadata blob the server stores. The `.provenance.cbor` file is an append-only signed log per asset (see [Cryptography — Provenance](/design/cryptography/provenance/#provenance-of-library-modifications)); the client never deletes it, so a hard-deleted asset leaves a tombstone-with-history. 
Per the recovery-first principle, the entire library is reconstructible from these three files alone. Files are date-bucketed by capture timestamp because the client, unlike the server, can read capture dates. +- **`cache/`**: purely derived and rebuildable — thumbnails and previews (formats declared in [Thumbnails — Thumbnail and Preview Formats](/design/thumbnails/#thumbnail-and-preview-formats)), verbose parsed-metadata caches, and transcodes. Sharded by UUID prefix to bound directory sizes. Deletable at any time; never a source of truth. +- **`index/library.sqlite`**: a rebuildable query cache over the sidecars, and the local vector index backing AI features (`sqlite-vec` — see [AI/ML Integrations](/design/ai/)). It is also the substrate for [view albums](/design/organization/#system--smart-albums-views) — system aggregations like *All* and user-defined smart albums are materialized by querying this index entirely client-side, with no server involvement. On a schema change it may be dropped and rebuilt rather than migrated, since it is always reconstructible. +- **`.library/`**: library-scoped state — schema version, user configuration, a process lock file that prevents two app instances from opening the same library, the trash (soft-delete retention area), and `quarantine/` (where irreplaceable bytes that failed structural or signature validation are preserved verbatim alongside a `.reason.json` recording the rejection). The quarantine area is the union surface listed in [Threat Model — Quarantine Surfaces](/design/threat-model/scenarios/#quarantine-surfaces). The `version` file pins the on-disk layout schema; a layout bump rebuilds derived structures (cache, index) and never touches the canonical original/sidecar/provenance files, so it cannot lose data. + +The full sidecar and SQLite schemas are owned by [Metadata](/design/metadata/) and not duplicated here. 
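
Because the layout is a contract, the path rules can be expressed as pure functions. The sketch below is one possible rendering and is not taken from `capsule-core::library`: the function names are hypothetical, the caller is assumed to have already derived the year and month from the capture timestamp, and the thumbnail size and format values are placeholders.

```rust
use std::path::{Path, PathBuf};

/// `media/{YYYY}/{YYYY-MM}/{uuid}.{ext}`, with the extension lowercased.
pub fn media_path(library_root: &Path, year: i32, month: u32, uuid: &str, ext: &str) -> PathBuf {
    library_root
        .join("media")
        .join(format!("{year:04}"))
        .join(format!("{year:04}-{month:02}"))
        .join(format!("{uuid}.{}", ext.to_ascii_lowercase()))
}

/// `cache/thumbnails/{size}/{uuid[0:2]}/{uuid[2:4]}/{uuid}.{fmt}`.
/// Returns None if `uuid` is not ASCII or is too short to shard.
pub fn thumbnail_path(library_root: &Path, size: &str, uuid: &str, fmt: &str) -> Option<PathBuf> {
    if uuid.len() < 4 || !uuid.is_ascii() {
        return None;
    }
    Some(
        library_root
            .join("cache")
            .join("thumbnails")
            .join(size)
            .join(&uuid[0..2])
            .join(&uuid[2..4])
            .join(format!("{uuid}.{fmt}")),
    )
}
```

Keeping these functions free of I/O makes the date-bucketing and sharding rules easy to unit-test in isolation, which is the property the recovery-first rebuild depends on.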
+ +## Mobile Clients + +Android and iOS use platform-sandboxed storage rather than a user-visible library directory. The logical model is the same — originals (when synced), canonical metadata, rebuildable caches, and a local SQLite index — but placement follows each platform's sandbox rules. Capsule deliberately does **not** store rebuildable derivatives in OS-managed cache locations: the OS may evict them indiscriminately, and a thumbnail that is expensive to regenerate is not genuinely disposable (see [Space Recovery](#space-recovery)). + +## Local Index Staleness + +SQLite may lag the filesystem after external edits or interrupted operations. The client verifies file existence before acting on an index row and triggers a full rebuild from sidecars when it detects structural inconsistency. Because the index is always rebuildable, this recovery is safe. Detection and rebuild details are owned by [Maintenance](/design/filesystem/maintenance/). + +## Space Recovery + +Rebuildable data is deliberately **not** stored in OS-managed cache locations: the OS evicts indiscriminately, and a thumbnail that is expensive to regenerate is not genuinely disposable. Capsule manages reclamation itself — it surfaces the biggest storage consumers and lets the user selectively delete, and an original that is server-only after eviction is transparently re-fetched on demand. + +## Validation + +- **Library init/open round-trip (unit).** Create an empty library; open it; assert all directories present and `version`/`config` populated. Re-open; assert idempotency. +- **Date-bucketing correctness (unit).** Given a sidecar's `capture_timestamp`, assert the asset lands in exactly `media/{YYYY}/{YYYY-MM}/`. Negative test: capture timestamp inconsistent with directory bucket triggers a [maintenance](/design/filesystem/maintenance/) repair. +- **Process lock contention (smoke).** Open the library in process A; attempt to open in process B; assert clean refusal with a structured error. 
+- **Mobile sandbox placement (smoke per platform).** A per-platform test asserts the library is placed in the OS-blessed location for app private storage and survives an app cold-start. +- **Local index rebuild from sidecars (smoke).** Populate a library; drop `library.sqlite`; re-open; assert the index is rebuilt and queries return the same results as before. + +The cross-module case (full library lifecycle: import → upload → restore on a fresh client) is a bounded E2E surface in [Module Map](/design/module-map/#e2e-test-surface). diff --git a/capsule-docs/src/content/docs/design/filesystem/index.md b/capsule-docs/src/content/docs/design/filesystem/index.md new file mode 100644 index 0000000..d14dd08 --- /dev/null +++ b/capsule-docs/src/content/docs/design/filesystem/index.md @@ -0,0 +1,39 @@ +--- +title: Filesystem +description: How Capsule structures files on disk — server vs client, and what they share +--- + +Capsule's end-to-end encryption splits the filesystem into two fundamentally different roles. The **server** stores only opaque, content-addressed ciphertext — it never holds a decryption key and cannot interpret a single byte it stores. **Clients** hold the keys, so a client filesystem is a working library of plaintext media, sidecar metadata, and rebuildable caches. The two layouts share a small set of principles but otherwise have little in common. + +The on-disk layout is itself part of the contract — the filenames, directory structure, and atomic-write conventions are how recovery-first becomes operational, so they appear here verbatim rather than as a suggestion. 
+ +## Sub-docs + +| Sub-doc | Concern | Primary crate(s) | +| ----------------------------------------------- | --------------------------------------------------------------------------------- | -------------------------------------------------------------------- | +| [Server Filesystem](/design/filesystem/server/) | Blob store layout, Postgres index, deployment profiles, ownership, deletion | `capsule-api` + storage glue | +| [Client Filesystem](/design/filesystem/client/) | Desktop / mobile library layout, local SQLite index, space recovery | `capsule-core::{library,db}` + per-platform glue | +| [Maintenance](/design/filesystem/maintenance/) | Self-validation, scrubbing, repair, intra-library dedup, atomic-write granularity | `capsule-core::library` (client) + `capsule-api` (server-side scrub) | + +This index covers the principles both sides share. The import pipeline, the upload protocol, and synchronization are covered in [Import and Synchronization](/design/import/); metadata extraction in [Metadata](/design/metadata/); derivative generation in [Thumbnails and Previews](/design/thumbnails/); grouping and trash semantics in [Asset Organization](/design/organization/); backup and recovery in [Backup and Recovery](/design/backup-recovery/). + +## Shared Principles + +These follow directly from [Core Principles](/design/principles/): + +- **Recovery-first.** No database is required to interpret canonical data. On the client, sidecar files are the source of truth and the index is a rebuildable cache. On the server, PostgreSQL is the authoritative index, but it holds only key-free facts. +- **Atomic writes.** Every write that must not tear uses temp-file + atomic rename on the same filesystem. Direct overwrites risk corruption on power loss. The full per-granularity rules live in [Maintenance — Atomic Writes](/design/filesystem/maintenance/#atomic-writes-and-crash-recovery). 
+- **Ephemeral derived data.** Only originals and their canonical metadata are irreplaceable. Thumbnails, transcodes, parsed-metadata caches, and the query index can all be regenerated and are treated as such. +- **4 KiB alignment.** Data is processed and written block-aligned to 4 KiB, which matches memory and disks and enables the [reflink assembly path](/design/import/upload-protocol/#server-side-storage-and-assembly). +- **Content-addressing.** Stored blobs are named by their ciphertext content hash — the same hash everywhere a content address is needed (see [Cryptography — Primitives](/design/cryptography/primitives/)). + +## Server vs Client at a Glance + +| Concern | Server | Client | +| ------------ | ------------------------------------------ | --------------------------------------------- | +| Holds keys | No | Yes | +| Stored form | Opaque ciphertext blobs | Plaintext media + CBOR sidecars | +| Naming | Content-addressed by ciphertext hash | UUIDv7 stems, date-bucketed | +| Index | PostgreSQL (key-free facts only) | SQLite (rebuildable, full plaintext metadata) | +| Derived data | Stored as client-generated encrypted blobs | Generated locally, cached, rebuildable | +| Originals | Always retained while referenced | Present only if synced locally | diff --git a/capsule-docs/src/content/docs/design/filesystem/maintenance.md b/capsule-docs/src/content/docs/design/filesystem/maintenance.md new file mode 100644 index 0000000..edde36c --- /dev/null +++ b/capsule-docs/src/content/docs/design/filesystem/maintenance.md @@ -0,0 +1,94 @@ +--- +title: Library Maintenance and Atomic Writes +description: How Capsule keeps client storage consistent, repairs what it can, and writes atomically +--- + +The data-integrity principle treats client storage as *potentially lost* (see [Core Principles](/design/principles/)): unlike the server, a client library sits on consumer hardware, syncs only partially, and is edited by a long-lived process that can be killed mid-write. 
A client therefore never assumes its library is consistent — it periodically *proves* it is, repairs what it can repair safely, and surfaces what it cannot. + +The maintenance routines live in `capsule-core::library`: [`scrub`](#scrubbing), [self-validation](#self-validation), [repair](#repair), and [`dedup`](#deduplication). The server runs an equivalent scrub of stale `.part`/`.bin` files under `incoming/`. All routines are **conservative** — consistent with "we can NEVER delete data unexpectedly," irreplaceable data is never removed without explicit user confirmation. + +This doc also owns the granularity rules for [atomic writes](#atomic-writes-and-crash-recovery), which other docs reference but should not restate. + +## Scrubbing + +A startup **scrub** sweeps the debris of interrupted writes. Atomic writes (below) stage to `.tmp` files; a crash between the write and the rename strands them. The scrub walks `media/` and removes `.tmp` files older than **10 minutes** (configurable) — the age floor avoids racing a write that is legitimately in flight elsewhere in the process. It runs at most once every seven days, gated by a `last_scrubbed_at` timestamp in the library config, since stale temp files are harmless clutter rather than an urgent fault. Every removal is logged. The server performs the equivalent sweep of stale `.part`/`.bin` files (see [Atomic Writes and Crash Recovery](#atomic-writes-and-crash-recovery)). + +## Self-Validation + +Validation answers a stronger question than scrubbing: *is the library still a faithful, interpretable copy of its assets?* It runs in two tiers, separated by cost. + +### Structural Validation (Cheap, at Startup) + +A directory walk that checks the invariants of the [client layout](/design/filesystem/client/#desktop-library-layout): + +- Every `{uuid}.{ext}` original has a matching `{uuid}.cbor` sidecar and `{uuid}.provenance.cbor` chain. 
Every sidecar parses as valid CBOR with its required fields present, has a `sidecar_schema` ≤ the client's max known (per the [tightened Postel's Law](/design/principles/#postels-law-asymmetric)), and bears a valid signature from a device in the user's directory. +- A sidecar's `uuid` field matches its filename, and its date bucket matches its capture timestamp. +- Every `cache/` entry (thumbnail, transcode, parsed-metadata cache) and every `.library/trash/` file refers to an asset the library still knows. +- The provenance chain for each asset is walkable from `create` to head, with each record's `prior_provenance_hash` matching the preceding record's content hash. A break — a missing record or a non-matching `prior_provenance_hash` — is a quarantine surface, not a silent skip. +- Index rows reference files that exist — this subsumes the [local index staleness](/design/filesystem/client/#local-index-staleness) check. + +### Content Validation (Expensive, Scheduled) + +Recomputes the [content hash](/design/cryptography/primitives/) of each locally present original and compares it against the sidecar's `hash` field (the algorithm-tagged form declared in [Metadata — Sidecar Schema v1](/design/metadata/#sidecar-schema-v1); the algorithm itself follows whatever `crypto_suite_id` the sidecar carries). The original is the only irreplaceable thing on a client, so silent bit rot is the worst failure a client can suffer and nothing else detects it. + +Because hashing every original is heavy I/O, content validation is **not** run at startup: it is scheduled opportunistically (device idle, on power, unmetered) and throttled, can be triggered on demand, and re-verifies each original on a slow rolling cadence rather than all at once. 
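A minimal sketch of the per-original content check described above, streaming the file so that large originals are never loaded into memory at once. The `algorithm:hexdigest` tag format, the `SUPPORTED` set, and both function names are assumptions made for illustration; the real tagged form is the one declared in the sidecar schema, and scheduling and throttling are not shown.

```python
import hashlib
from pathlib import Path

CHUNK = 1024 * 1024          # read in 1 MiB blocks to bound memory use
SUPPORTED = {"sha256"}       # assumption: only SHA-256 is shown here


def file_digest(path: Path, algorithm: str) -> str:
    """Stream the file through the named hash and return the hex digest."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def validate_original(path: Path, tagged_hash: str) -> bool:
    """Compare an original against a sidecar hash of the assumed form 'sha256:<hex>'."""
    algorithm, sep, expected = tagged_hash.partition(":")
    if not sep or algorithm not in SUPPORTED:
        # An unknown algorithm is reported, never treated as a match or a mismatch.
        raise ValueError(f"unsupported hash tag: {tagged_hash!r}")
    return file_digest(path, algorithm) == expected.lower()
```

A `False` result is a finding, not an instruction to delete: per the repair rules, a corrupt original is re-fetched or surfaced to the user.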
+ +## Repair + +Repair follows directly from the data-integrity principle — *ephemeral data is rebuilt silently; irreplaceable data is never destroyed to resolve an inconsistency.* + +| Finding | Action | +| -------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Stale `.tmp` / partial file | Deleted by the scrub. | +| Orphaned `cache/` entry | Deleted — derived and rebuildable. | +| Index inconsistency | Index dropped and rebuilt from sidecars — always safe. | +| Orphaned sidecar (no original) | Expected when the [sync scope](/design/import/download-sync/#synchronization-scope) is metadata-only — not a fault. Flagged only if the scope says the original should be present locally, in which case the original is re-fetched from the server. | +| Orphaned original (no sidecar) | The file is irreplaceable, so it is never deleted. It is moved to `.library/quarantine/` and surfaced to the user; the client attempts to re-derive a minimal sidecar from the file itself and the server index. | +| Malformed CBOR sidecar | The bytes are preserved — moved verbatim to `.library/quarantine/{uuid}.cbor` with a sibling `.reason.json` recording the parse error, and surfaced to the user. 
**Never silent-skipped:** a sidecar whose CBOR does not parse, whose required fields are missing, or whose `sidecar_schema` is above the client's max known is treated as a quarantine surface (see [Threat Model — Quarantine Surfaces](/design/threat-model/scenarios/#quarantine-surfaces)). The client attempts to re-fetch a current sidecar from the server before treating the asset as lost. | +| Sidecar signature invalid | Same as malformed: quarantined, never auto-overwritten. The client re-fetches; a persistent failure surfaces the asset as "provenance broken" rather than silently dropping it. | +| Corrupt original (hash mismatch) | If the asset also exists on the server, the ciphertext blob is re-fetched and its derivatives re-generated. If the corrupt copy is the only copy — this device was its uploader and it was never synced — it cannot be auto-healed and is surfaced loudly. | + +Every finding and every repair is logged, so the state of the library is reconstructible after the fact. + +## Deduplication + +Capsule deduplicates at three distinct layers, and they must not be confused: + +- **Server-side ciphertext dedup** — content-addressed blobs are never stored twice (see [Server — Content-Addressing and Deduplication](/design/filesystem/server/#content-addressing-and-deduplication)). +- **Import-time dedup** — import refuses an asset already uploaded from this library and resolves a remote-only match to a merge (see [Upload Protocol — Deduplication and Merge](/design/import/upload-protocol/#deduplication-and-merge)). +- **Intra-library dedup** — described here: two assets *within one client library* whose originals are byte-identical. + +Import-time dedup catches most duplicates as they arrive, but it cannot catch all of them. 
Byte-identical assets still accumulate — the same file imported from two different sources, a folder import that overlaps an earlier one, an asset re-imported after its sidecar was lost, or a backup restored over a library that still holds the originals. + +The dedup key is the plaintext **`hash`** digest recorded in every sidecar (see [Metadata — Sidecar Schema v1](/design/metadata/#sidecar-schema-v1)) — the same value the index lets the client look up directly. Two assets that share it are exact duplicates. This is deliberately distinct from the server's *ciphertext* hash: two devices may encrypt the same plaintext under different album keys, so only the plaintext hash identifies duplicates across a library. + +Deduplication is **not** stacking. A RAW+JPEG pair, a burst, and a Live Photo are *different bytes* deliberately kept together — they are [stacked](/design/organization/#asset-stacking), never deduplicated. Visually-similar but non-identical photos are a separate AI grouping feature (Smart Selection) that never deletes. Dedup only ever acts on originals that are bit-for-bit identical. + +Resolution is conservative and never silent. The client presents each duplicate set and lets the user choose the survivor. On merge, the survivor inherits the union of album memberships and tags (merged through the OR-set CRDT — see [Metadata — Collaborative Metadata](/design/metadata/#collaborative-metadata)), the highest rating, and the earliest import and capture timestamps; the losing copy is soft-deleted into the trash, so the action is reversible and is recorded as a signed, provenance-tracked modification like any other deletion (see [Provenance](/design/cryptography/provenance/#provenance-of-library-modifications)). Whole-library deduplication is a user-initiated maintenance action or a surfaced suggestion — never an automatic background deletion — consistent with the rule that data is never removed unexpectedly. 
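The grouping and merge rules above can be sketched as follows. This is a simplified illustration under stated assumptions: the `Asset` record and its field names are hypothetical, timestamps are plain integers, and a plain set union stands in for the OR-set CRDT merge. Choosing the survivor, soft-deleting the losers into the trash, and writing the provenance record are left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Asset:
    # Hypothetical in-memory view of the sidecar fields relevant to dedup.
    uuid: str
    hash: str                 # plaintext content hash from the sidecar
    imported_at: int
    captured_at: int
    rating: int = 0
    albums: set[str] = field(default_factory=set)
    tags: set[str] = field(default_factory=set)


def duplicate_sets(assets: list[Asset]) -> list[list[Asset]]:
    """Group assets by plaintext hash; only groups of two or more are duplicates."""
    by_hash: dict[str, list[Asset]] = defaultdict(list)
    for asset in assets:
        by_hash[asset.hash].append(asset)
    return [group for group in by_hash.values() if len(group) > 1]


def merged_survivor(survivor: Asset, losers: list[Asset]) -> Asset:
    """Return the survivor's record after inheriting from the losing copies."""
    group = [survivor, *losers]
    return Asset(
        uuid=survivor.uuid,
        hash=survivor.hash,
        imported_at=min(a.imported_at for a in group),   # earliest import
        captured_at=min(a.captured_at for a in group),   # earliest capture
        rating=max(a.rating for a in group),             # highest rating
        albums=set().union(*(a.albums for a in group)),  # union of memberships
        tags=set().union(*(a.tags for a in group)),
    )
```

The functions only propose a result; nothing here deletes anything, which matches the rule that resolution is user-confirmed and reversible.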
+ +## Atomic Writes and Crash Recovery + +Every write that must not tear uses temp-file + atomic rename, staged on the same filesystem as its destination. The atomicity rule is enforced at three granularities — the single file, the per-asset bundle, and the multi-asset edit. These are also the canonical statement of the rule; [Threat Model — Atomicity Invariants](/design/threat-model/validation/#atomicity-invariants) cross-references them and is where the cross-doc invariant lives. + +- **Client — single-file writes.** Sidecar and provenance appends stage to `{uuid}.cbor.tmp` and `{uuid}.provenance.cbor.tmp` in the destination directory, then rename into place. A direct overwrite is never used. +- **Client — per-asset bundle.** An asset import or update is a *bundle*: original (when present locally), sidecar, and a new provenance record. All `.tmp` files stage first; only after every staged file is on disk do the renames execute, and only in a fixed order (original → sidecar → provenance). A failure at any rename discards every remaining `.tmp` and rolls back the renames already done by deleting the just-renamed targets, so the on-disk state never reflects a partial bundle. The `.provenance.cbor` is the last to be renamed, so the existence of a new provenance record implies the rest of the bundle is committed. +- **Client — stack edit.** A stack edit touches multiple sidecars and writes a single provenance record per affected asset. All `.tmp` files (one per sidecar plus one per provenance file) stage first and rename together; any rename failure discards the entire batch. There is no partial stack. +- **Server — chunk assembly.** Chunks stage as `{upload_id}_{n}.part`; the assembled blob is `{upload_id}.bin`. 
The blob is renamed into its content-addressed location under `blobs/` only after the ciphertext hash is recomputed and matches the declared value (see [Upload Protocol — Finalization and Integrity](/design/import/upload-protocol/#finalization-and-integrity)). +- **Server — finalization transaction.** The blob rename into its content-addressed `blobs/` location is a filesystem operation and so necessarily happens *before* the Postgres commit; the manifest-envelope insert, metadata-blob insert, provenance-blob insert, and asset-row `uploaded` flip then commit in a **single PostgreSQL transaction**. That ordering is what makes every crash point safe: a crash *before* the rename leaves only `incoming/` debris (scrubbed below); a crash *after* the rename but *before* the commit leaves a finalized blob in `blobs/` that **no committed row references** — an orphan the [reference-count GC](/design/filesystem/server/#deletion-and-garbage-collection) reclaims, while the idempotent retry re-finalizes against the already-present blob (re-placing a content-addressed hash is a no-op). The "single transaction" guarantee is over the **index rows**; blob *placement* is idempotent and GC-safe precisely because it is content-addressed. The server never exposes an asset whose index bundle is partially persisted — the session stays in `WaitingForProcessing` until a finalization attempt commits the whole bundle or fails it cleanly. + +On startup, each side scrubs incomplete work: stale `.part`, `.tmp`, and `.bin` files left by an interrupted upload or import are identified and removed, and the cleanup is logged. A blob or media file is never published, on either side, until its integrity has been verified. + +## Encrypted Backups + +A backup is an export artifact — encrypted, self-describing, and kept outside both `{library_root}` and `{blob_root}` — so it is not part of the live library or the server blob store, and may be stored on external or cloud storage. 
Its format, the master-key escrow, and the recovery flow are covered in [Backup and Recovery](/design/backup-recovery/). + +## Validation + +- **Scrub age-floor (unit).** Create a `.tmp` file younger than the age floor; assert the scrub leaves it. Age it past the floor; assert removal. +- **Structural validation (unit).** Each invariant in the [Structural Validation](#structural-validation-cheap-at-startup) list gets a negative test case (missing sidecar, missing provenance, schema regression, signature failure, date-bucket drift, orphaned cache/trash entry, broken provenance chain). Each produces a structured finding. +- **Content validation throttling (smoke).** Inject many originals; trigger content validation; assert it does not stall the app and respects power/connectivity gates. +- **Repair safety (unit).** Each row of the repair table is a unit test: trigger the finding, run repair, assert the *exact* action (delete vs quarantine vs re-fetch) was taken. +- **Intra-library dedup correctness (unit).** Given two assets with an identical plaintext hash, assert dedup proposes the right survivor (union albums, max rating, earliest timestamps), records a soft-delete provenance record for the loser, and is reversible. +- **Atomic-write crash simulation (smoke).** Programmatically interrupt a bundle write between each pair of staged steps; assert no on-disk state reflects a partial bundle on next startup. + +The cross-module case (server crash mid-finalization → recovery on restart) is a bounded E2E surface in [Module Map](/design/module-map/#e2e-test-surface). 
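For the atomic-write cases in this document, the single-file rule (stage to a `.tmp` sibling in the destination directory, then rename into place) can be sketched as below. The function name is hypothetical, and this is a sketch of the general temp-file + rename pattern rather than the project's code; the per-asset bundle ordering and rollback are not shown.

```python
import os
from pathlib import Path


def atomic_write(dest: Path, data: bytes) -> None:
    """Write data to dest so a reader sees either the old file or the new one."""
    # Stage next to the destination so the rename stays on one filesystem,
    # e.g. {uuid}.cbor -> {uuid}.cbor.tmp
    tmp = dest.with_name(dest.name + ".tmp")
    try:
        with open(tmp, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # push file contents to disk before publishing
        os.replace(tmp, dest)      # atomic rename over any existing file
    except BaseException:
        # A failed write must not leave a staged file behind on the happy path;
        # a hard crash can still strand it, which is what the scrub cleans up.
        tmp.unlink(missing_ok=True)
        raise
```

Durability of the rename itself (syncing the parent directory) is platform-specific and is deliberately left out of this sketch.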
diff --git a/capsule-docs/src/content/docs/design/filesystem/server.md b/capsule-docs/src/content/docs/design/filesystem/server.md new file mode 100644 index 0000000..c8b0e8e --- /dev/null +++ b/capsule-docs/src/content/docs/design/filesystem/server.md @@ -0,0 +1,118 @@ +--- +title: Server Filesystem +description: The server's blob store layout, Postgres index, and deployment profiles +--- + +The server's job is to hold ciphertext blobs and a key-free index that maps assets to blobs. It performs no decoding, no metadata extraction, and no thumbnail generation — it cannot, since it never holds a decryption key. The blob layout below **is** the contract: a server-side rebuild (re-deriving the Postgres index from blob bytes) depends on the file naming and the manifest envelope being exactly as specified here. + +Implemented in `capsule-api` (blob storage, Postgres index, manifest envelope validation). The session-state store is a [deployment choice](#deployment-profiles), not a versioned API surface. + +## Deployment Profiles + +The server's durable state is always split across **two required systems** plus an **optional third** for high-concurrency deployments: + +- **Blob store** (filesystem) — the encrypted bytes of every asset. *Required.* +- **PostgreSQL** — the authoritative index: ownership, album references, blob references, lifecycle state, and (in the default profile) upload-session state. *Required.* +- **Valkey** — volatile upload-session state (offsets, status) with a 24-hour TTL. *Optional.* Recommended only for deployments where upload-session hot-path contention on PostgreSQL becomes measurable. 
+ +This gives two concrete deployment profiles: + +| Profile | Session state lives in | When to choose it | +| --------------------------- | ------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------- | +| **Default (Postgres-only)** | `upload_sessions` table with `expires_at` TTL column and a periodic sweep | Self-hosted, small-to-medium servers, single-node deployments. Reduces ops surface. | +| **High-concurrency** | Valkey (keyed `upload:session:{id}`) with native 24-hour TTL; PostgreSQL still holds the durable pending-asset row | Large multi-tenant deployments where session-table contention is a measured bottleneck | + +Switching profiles is operationally invisible to clients — the [upload protocol](/design/import/upload-protocol/) does not change, only where the server stores volatile session counters. The protocol is written to be store-agnostic. + +## Blob Store Layout + +```text +{blob_root}/ +├── incoming/ +│ ├── {upload_id}_{n}.part # in-flight chunk +│ └── {upload_id}.bin # assembled blob, pre-verification +├── blobs/ +│ └── {hash[0:2]}/{hash[2:4]}/ +│ └── {hash} # finalized blob, content-addressed +└── .server/ + ├── version # server filesystem schema version + └── config # server-wide configuration +``` + +- **`{blob_root}`**: absolute path configured at server startup. The entire tree must be on a single filesystem so that finalization renames are atomic. +- **`incoming/`**: live uploads. Chunks land as `{upload_id}_{n}.part`; on finalization they are concatenated into `{upload_id}.bin`. The 4 KiB chunk alignment is what allows each chunk to be reflinked into place on copy-on-write filesystems, turning assembly into a near-instant metadata operation. See the upload protocol in [Import — Upload Protocol](/design/import/upload-protocol/). +- **`blobs/`**: the finalized store. 
A blob's filename is its [ciphertext content hash](/design/cryptography/primitives/); the two-level hex-prefix shard keeps directory sizes bounded for multi-million-blob stores. A finalized blob is immutable. +- **`.server/`**: the server operator's own configuration and schema version. This is plaintext server metadata, not user data — it is the one thing under `{blob_root}` that is not an encrypted blob. + +## Uniform, Opaque Blobs + +A single asset produces a **bundle** of blobs (see [Import — Upload Protocol: What Gets Uploaded](/design/import/upload-protocol/#what-gets-uploaded)): the encrypted original, encrypted derivatives (thumbnails, previews), the encrypted CBOR metadata blob (which carries the LQIP), and the encrypted provenance blob (see [Cryptography — Provenance](/design/cryptography/provenance/)). The blob store does not distinguish them — every blob is just content-addressed ciphertext. The mapping from an asset to its constituent blobs, and the role of each blob, lives entirely in PostgreSQL. + +## Recovering the Index from Blobs Alone + +The PostgreSQL index is authoritative but **not the only copy** of what the server knows. Every blob carries enough server-visible structural metadata — the [unencrypted portion](/design/cryptography/provenance/#asset-manifest) of the asset manifest — to rebuild the index row that referenced it. This is the server-side counterpart of the recovery-first principle that lets a client rebuild its index from CBOR sidecars. 
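Because a rebuild depends on blobs sitting exactly where the layout says, the two-level shard rule from the Blob Store Layout can be sketched as a pure function plus a placement check. The function names are hypothetical, and the hash is assumed to be a lowercase hex string; the actual digest encoding is whatever the cryptography primitives define.

```python
from pathlib import PurePosixPath

HEX_DIGITS = set("0123456789abcdef")


def blob_path(hash_hex: str) -> PurePosixPath:
    """blobs/{hash[0:2]}/{hash[2:4]}/{hash}, relative to {blob_root}."""
    if len(hash_hex) < 4 or not set(hash_hex) <= HEX_DIGITS:
        raise ValueError(f"not a lowercase hex content hash: {hash_hex!r}")
    return PurePosixPath("blobs") / hash_hex[0:2] / hash_hex[2:4] / hash_hex


def is_well_placed(relative: PurePosixPath) -> bool:
    """True if a path found while walking blobs/ matches the shard rule for its name."""
    try:
        return relative == blob_path(relative.name)
    except ValueError:
        return False
```

A walker could use `is_well_placed` to flag misplaced files for inspection instead of indexing them.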
+ +The server-visible portion of a blob includes: + +- `crypto_suite_id`, `protocol_version`, `amk_version` — what bundle of primitives encrypted this asset and which album epoch +- the ciphertext hash and declared size — content address and storage attribution +- `created_by_user`, `created_by_device`, `album_id`, `file_id`, `prior_provenance_hash`, `action` — owner, provenance chain link, and lifecycle action +- the device's hybrid signature — provenance attribution; verifiable against the public device directory even without any key the server holds + +A rebuild walks `blobs/`, reads the manifest envelope of each blob, verifies the device signature against the cached device directory, and writes an index row. The rebuild is idempotent: re-running it against an existing index produces no changes. The full envelope check list a server runs at recovery is the same list it runs at write time — see [Threat Model — Server-Side Validation Invariants](/design/threat-model/validation/#server-side-validation-invariants). + +A blob whose manifest envelope fails structural validation during rebuild is **quarantined**, not silently dropped — moved to `{blob_root}/quarantine/` with a sibling `.reason.json` recording the rejection code. This guarantees that an unrecoverable byte sequence is preserved for forensic inspection rather than vanishing on rebuild. + +Operationally the rebuild is invoked when a PostgreSQL restore is incomplete or a logical-corruption event is detected; it is **never** the hot path. The hot path runs through the authoritative PG index. The recovery path's job is to make the index reconstructible if PG is lost, not to substitute for it. + +## Manifest Envelope Validation + +Every write — `POST /upload`, `PATCH /upload/{id}`, finalization, any lifecycle manifest, any federation pull — passes through structural validation of the manifest envelope **before** any state is persisted. 
The server holds no decryption key, so it cannot verify the cryptographic signatures; but it does enforce that every envelope field is present, structurally well-formed, within bounds, and consistent with the album the manifest claims to address. + +The complete refuse-by-default checklist is owned by [Threat Model — Server-Side Validation Invariants](/design/threat-model/validation/#server-side-validation-invariants). A rejection at any check returns the rejection code listed there and writes no state. This is what defeats the version-mismatched-client damage class without requiring the server to hold a key. + +## Content-Addressing and Deduplication + +Naming blobs by their [ciphertext content hash](/design/cryptography/primitives/) makes deduplication free: a blob already present is never stored twice. At upload-session creation the server checks for a blob with the same content hash already owned by the uploader — an exact local-and-remote duplicate is rejected up front, and an asset that exists remotely under a *different* ciphertext resolves to a **merge** that links the existing blob rather than storing a second copy (see [Import — Upload Protocol: Deduplication and Merge](/design/import/upload-protocol/#deduplication-and-merge)). Reference counting in PostgreSQL determines when a blob is genuinely unreferenced. 
+ +## PostgreSQL: What the Server Knows + +The server index records only what can be known without a key: + +- `asset_id`, `owner_id`, `album_id`, `upload_user_id` +- references to the asset's blobs (their [content hashes](/design/cryptography/primitives/)) and each blob's role +- `amk_version` — which album-key epoch encrypted the asset +- declared ciphertext size and `content_type` +- the `uploaded` flag and server-visible lifecycle state +- the server's own trusted `received_at` per write — the authoritative clock for time-based policy (retention, rate limits) — alongside the client's self-asserted, audit-only `timestamp` +- provenance records (see [Cryptography — Provenance](/design/cryptography/provenance/#provenance-of-library-modifications)) + +No plaintext capture date, dimensions, EXIF, tags, or filename ever reaches the server. Those live inside the encrypted metadata blob (see [Metadata Encryption](/design/cryptography/encryption/#metadata-encryption)) and are readable only by authorized clients. + +Session creation writes a *pending* asset row (`uploaded = false`) that reserves the asset ID the bundle's blobs reference; finalization flips it. See the [session lifecycle](/design/import/upload-protocol/#session-lifecycle). + +## Ownership, Partitioning, and Quota + +`owner_id` is the billing and namespace entity; the `owner_id` → user-set mapping lives in PostgreSQL and is mirrored as an MLS group (the [Owner Group Key](/design/cryptography/keys/#owner-group-keys-ogks)). Storage quota is accounted to `upload_user_id`, which is distinct from `owner_id` — the full quota model is owned by [Quota](/design/quota/). The blob store itself is not partitioned by owner — content-addressing is global — but every blob *reference* is owner-scoped in PostgreSQL, and deduplication checks are scoped to the owner. 
+ +The owner record also carries a non-secret **`default_album_id`** pointer (and an optional `(scope → album_id)` override map) naming the owner's [default album](/design/organization/#the-default-album) — the import destination when the user picks none. It is a plain UUID the server stores and serves but never acts on for authorization: a write is still gated on real album write capability ([invariant 6](/design/threat-model/validation/#server-side-validation-invariants)), so the pointer is discovery convenience, not a security control. Album *contents* stay E2E-encrypted; the server learns only which album UUID is currently the default. + +## Deletion and Garbage Collection + +The server cannot read an asset's `is_deleted` flag — it lives inside the encrypted metadata blob. Lifecycle transitions are signalled by the client and recorded as server-visible state on the asset row; soft delete is a state change, not a file operation. Permanent deletion drops the asset's blob references. A blob is removed **only** when it is provably unreferenced, and the mechanism is deliberately built so that a bug biases toward *keeping* bytes, never deleting live ones. + +- **Reference counting is the single source of truth.** A blob's reference count is a query over committed asset / derivative / metadata / provenance rows — never a separately-maintained counter that could drift out of sync. A blob is GC-eligible only when that query returns zero. +- **Two-phase mark-and-sweep with a grace window.** Reaching zero references *marks* a blob (records `collectable_since`); it is swept only after a configurable grace window (default 24–72 h) **and** only after the zero-reference count is re-confirmed inside the deleting transaction (`SELECT … FOR UPDATE` over the reference set). A reference reappearing during the window — an in-flight finalization retry, a concurrent merge — cancels the mark. 
This reclaims the finalization-crash orphan (a blob renamed into `blobs/` whose Postgres commit never landed; see [Maintenance — Atomic Writes](/design/filesystem/maintenance/#atomic-writes-and-crash-recovery)) without ever racing a legitimate late reference. +- **A Postgres↔filesystem mismatch is never resolved by deletion.** The two directions are asymmetric because only one risks data loss. A blob in `blobs/` with no referencing row is an orphan, reclaimed by the zero-reference sweep above. A committed row referencing a blob **missing** from `blobs/` is a *loud* integrity error — surfaced, logged, and quarantined for an operator — **never** auto-deleted: erasing the dangling row would destroy the only record that the asset should exist, exactly the data-loss class the [data-integrity principle](/design/principles/) forbids. +- **Auditable, reversible by default.** Every GC decision is logged with the blob hash, the observed reference count, and the mark/sweep timestamps (per the [traceability principle](/design/principles/)); a dry-run mode reports what *would* be collected without removing anything, so a suspect sweep can be inspected before it runs. + +## Validation + +- **Layout round-trip (unit).** Upload, finalize, rename, and assert the blob lives at exactly `blobs/{hash[0:2]}/{hash[2:4]}/{hash}` on disk. Recompute the hash from disk; assert match. +- **Index rebuild idempotency (smoke).** Take a real testcontainer Postgres + a populated `blobs/` tree, drop the index tables, run the rebuild routine, assert every row matches a hand-derived expected set. Re-run; assert zero changes. +- **Quarantine on malformed envelope (unit).** Inject a blob with a corrupted manifest envelope into `blobs/`; run rebuild; assert the blob moves to `quarantine/` with a `.reason.json` that names the structural check that failed. 
+- **Deployment-profile parity (smoke).** Run the upload-server smoke suite against the Postgres-only profile and the Postgres+Valkey profile; assert byte-identical client-observable behavior. +- **Reference-count GC safety (unit).** Decrement a blob's last reference; assert eligibility for GC; assert GC proceeds only after a configurable grace period; assert that a concurrent re-reference during the grace period cancels GC. +- **Dangling-reference safety (unit).** Point a committed row at a blob hash absent from `blobs/`; run the integrity check; assert the row is surfaced/quarantined and **never** auto-deleted, and that the missing blob is not treated as collectable. + +Cross-module cases (upload → finalize → rebuild from blobs) are bounded E2E surface listed in [Module Map](/design/module-map/#e2e-test-surface). diff --git a/capsule-docs/src/content/docs/design/import-synchronization.md b/capsule-docs/src/content/docs/design/import-synchronization.md deleted file mode 100644 index 69de6f9..0000000 --- a/capsule-docs/src/content/docs/design/import-synchronization.md +++ /dev/null @@ -1,270 +0,0 @@ ---- -title: Import and Synchronization -description: How Capsule imports and synchronizes assets across devices and platforms ---- - -We define **import** as the process of taking assets from an external source (e.g. a camera, a directory on the filesystem) and bringing them into Capsule's management. This involves scanning the files, extracting metadata, and preparing them for upload. - -We split [synchronization](#synchronization) into two parts: - -- Upload: Locally stored assets are uploaded to the server and made available across devices. -- Download: Assets are downloaded from the server to local devices as needed. - -Capsule additionally produces [encrypted backups](/design/backup-recovery/) — encrypted, portable exports of a library — which are covered separately. - -## Import - -Every import is deterministic and idempotent. But imports can be partially completed.
Every import is identified by an *import ID*. - -### Import Pipeline - -Our import pipeline is as follows: - -- Initiate import: Users initiate an import in one of the following methods: - - Manual: User selects files or directories to import through the UI. It can either point to a flat structure or a standardized directory structure (e.g. DCIM) - - Automated: Platforms (primarily mobile) can automatically detect new media in directories being watched and appropriately trigger imports. -- File scanning and metadata extraction: *See [Metadata](/design/metadata/)* for details on how we extract metadata and organize files. -- Import planning and confirmation: - - Before we import any file, we parse and verify it is a format we support. We strictly reject unsupported formats to avoid any issues later on. The server independently enforces a closed-enum `content_type` allow-list at session creation (see [Threat Model — Server-Side Validation Invariants](/design/threat-model/#server-side-validation-invariants)), so a malicious or buggy client declaring an unsupported format is rejected before any bytes are uploaded. Bytes received over the wire are decoded only inside the [client's sandboxed decoder](/design/clients/#sandboxed-decoder), so a format-mismatch attack cannot reach the host process. - - Based on the scanned files and extracted metadata, we can provide users with a summary of what will be imported (e.g. number of files, total size, any issues detected) and allow them to confirm or adjust the import. - - If uploaded assets are detected locally, we will refuse to import them. Note even if asset exists remotely, since we defer encryption and hash of encrypted blob until upload, we will allow import but upload will involve a merge operation. 
-- Execute import on each new file to be imported in order specified by [Upload Prioritization](#upload-prioritization): - - - Import into detected space: We can automatically move the files that are to be imported into the appropriate space. We compute the necessary metadata for cryptography (detailed in [Cryptography](/design/cryptography/)) and prepare the files for upload. This step can be optimized by parallelizing the processing of files and prioritizing certain files based on heuristics (see [Upload Prioritization](#upload-prioritization)). - - Generate thumbnails and previews: *See [Thumbnails](/design/thumbnails/)* for details on how we generate thumbnails and previews. - - Upload files: We choose to upload the files based on criterias outlined in [Sync](#synchronization). - -## Synchronization - -Core to the synchronization mechanism is the E2E/encryption requirements (see [Cryptography](/design/cryptography/)). This means that uploading and downloading require careful management of all asset metadata to ensure asset is accessible and properly decrypted on all devices (and inaccessible to unauthorized parties). - -### Upload - -Every upload is idempotent but stateful. Uploads can be completed partially and are identified by an *upload ID*. - -The upload path is a critical hot path. Its design is held to a higher standard of correctness and performance than the rest of the API: it must behave predictably under interrupted connections, concurrent transfers, and constrained hardware. The protocol below is deliberately *strict* — ambiguity in a resumable transfer protocol is what produces silent corruption and orphaned state. - -#### Protocol & Mechanics - -##### What Gets Uploaded - -An asset is never uploaded as a single plaintext file. 
Because Capsule is end-to-end encrypted (see [Cryptography](/design/cryptography/)), the client **encrypts and signs** everything *before* transmission, and the server only ever stores opaque, content-addressed ciphertext blobs. A single imported asset produces a **bundle** of blobs: - -- The **original blob** — the source asset, encrypted under the [bulk AEAD](/design/cryptography/#bulk-aead) with the [STREAM construction](/design/cryptography/#stream-construction). -- **Derivative blobs** — thumbnails, previews, and LQIP, generated client-side during import (see [Thumbnails](/design/thumbnails/)), each encrypted independently. -- The **metadata blob** — the CBOR metadata document (capture date, dimensions, EXIF-derived fields, provenance), encrypted under the [bulk AEAD](/design/cryptography/#bulk-aead) (see [Metadata](/design/metadata/)). - -Each blob is its own upload with its own upload ID; the protocol does not couple them. The client is responsible for completing the full set, and the server exposes the asset to other devices only once its required members (at minimum the original and metadata blobs) are finalized. Using one uniform mechanism for every blob type keeps the protocol small, and decoupling lets small derivatives land quickly while a large original is still transferring. - -The server performs no decoding, no metadata extraction, and no thumbnail generation — it cannot, since it never holds a decryption key. All such work happens client-side during [import](#import). - -##### Design Invariants - -The upload protocol guarantees the following, and every endpoint is designed to uphold them: - -- **Content-addressed.** Every blob is identified by its [ciphertext content hash](/design/cryptography/#primitives-inventory). The plaintext hash is never transmitted to the server. -- **Idempotent.** Re-creating a session for a blob already stored is a no-op that resolves to the existing asset. 
Re-sending a chunk at an already-acknowledged offset is accepted and simply returns the current offset. -- **Resumable.** A session survives connection loss for the lifetime of its TTL. A client resumes by querying the authoritative offset and continuing from there — no bytes are re-sent unnecessarily. -- **Strictly bounded.** The total ciphertext size is declared at session creation and immutable thereafter. The cumulative received bytes may never exceed it, nor exceed the server's per-file limit. -- **Verified.** No upload is marked complete until the server has recomputed the ciphertext hash and confirmed it matches the declared value. -- **Recoverable.** Every session is either driven to a terminal state or garbage-collected. There are no permanently orphaned chunks or pending asset rows. - -##### Upload Protocol - -We use a custom resumable-upload protocol modeled on [TUS](https://tus.io/) but trimmed to our needs: no per-request capability negotiation, no metadata smuggled in headers, ciphertext-only payloads. All endpoints are authenticated with a bearer JWT. Compatibility is instead gated once, up front — see [Protocol Versioning](#protocol-versioning). - -| Method | Path | Purpose | -| -------- | ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| `POST` | `/upload` | Create a session. 
Body declares ciphertext `size`, `hash` (the [content hash](/design/cryptography/#primitives-inventory) as a tagged object `{ algo, value }`), `content_type` (closed enum), `crypto_suite_id`, `protocol_version`, `manifest_envelope` (the unencrypted manifest fields the server validates per [Threat Model — Server-Side Validation Invariants](/design/threat-model/#server-side-validation-invariants)), optional `album_id`, optional `owner_id`, optional `intent_id` (required only during an [album upgrade](/design/versioning/#album-upgrade-ceremony)). Returns `201` with `Location: /upload/{id}` and `X-Capsule-Suggested-Chunk-Size`. Rejects with `400` / `403` / `426` per the validation invariants. | -| `HEAD` | `/upload/{id}` | Query progress. Returns `X-Capsule-Offset` (next expected byte), `X-Capsule-Content-Length`, and session status. This is the resumption primitive. | -| `PATCH` | `/upload/{id}` | Append a chunk at `X-Capsule-Offset`, with an optional per-chunk `X-Capsule-Checksum`. Returns `204` and the new offset. | -| `DELETE` | `/upload/{id}` | Cancel the session — removes chunks, the session record, and the pending asset row. | -| `GET` | `/upload/sessions` | List the caller's active sessions, so a client can resume across app restarts or devices. | - -Creating a session writes a *pending* asset row to Postgres (`uploaded = false`) and a session record to the configured **session-state store** (see [Filesystem — Stores by Deployment Profile](/design/filesystem/#stores-by-deployment-profile): Postgres by default, Valkey in the high-concurrency profile). The pending row reserves the asset ID that derivative and metadata blobs reference. - -**Chunk rules.** These are enforced strictly; a violation fails the request rather than being silently corrected: - -- Every chunk except the final one MUST be a multiple of 4 KiB (4096 bytes). This keeps server-side writes block-aligned, which is what makes the reflink assembly path (below) work. 
A non-aligned, non-final chunk is rejected with `400`. -- Offsets are strictly sequential. A `PATCH` must arrive at exactly the current received-byte count; an out-of-order or gapped write is rejected with `409`, and the client recovers by issuing `HEAD` to learn the authoritative offset. -- **Idempotency tuple.** The server keys each accepted PATCH by `(upload_id, offset, chunk_hash)` where `chunk_hash` is the SHA-256 of the chunk bytes (carried in the `X-Capsule-Checksum` header). A duplicate PATCH with the same tuple returns the same response — a re-send after a lost ACK is a no-op. A PATCH at an already-acknowledged offset *with a different `chunk_hash`* is rejected with `409` + a corruption error: this is the structural defense against a faulty client that retries with garbage. The complete idempotency contract is owned by [Threat Model — Idempotency Invariants](/design/threat-model/#idempotency-invariants). -- Cumulative size may never exceed the declared `size` nor the server's `max_file_size`. The server checks the cumulative count **at every chunk arrival**, not only at finalization — a buggy client that streams past the declared size is cut off before more bytes are persisted. Either ceiling is rejected (`400` / `413`) and the session is moved to a failed state. -- The upload completes exactly when received bytes equal the declared size; finalization then runs automatically. - -##### Protocol Versioning - -The upload protocol is the most fragile contract between client and server: a client that misunderstands chunk alignment, offset semantics, or finalization can silently corrupt or orphan data. The upload session is therefore gated by Capsule's universal protocol handshake, defined in [Threat Model — Protocol and Capability Negotiation](/design/threat-model/#protocol-and-capability-negotiation), so a client never begins a transfer against a server it is not known to be compatible with. This section names the upload-specific specializations. 
- -Versioning is **date-based** (`YYYY-MM-DD` — the day a protocol revision is frozen), not integer or semver. An integer version conveys nothing about ordering granularity and invites a bump for every change; semver implies a minor/patch backward-compatibility contract finer than we are willing to maintain on a hot path. A date is unambiguously ordered, human-readable, and maps directly onto a release. - -- Every client sends `X-Capsule-Protocol: ` on every request (the upload-specific alias `X-Capsule-Upload-Protocol` remains accepted but is deprecated). The server advertises the inclusive range it accepts via `X-Capsule-Protocol-Min` and `-Max` on every response, errors included. -- A `POST /upload` whose version falls outside the accepted range is rejected with `426 Upgrade Required` *before* any session or pending asset row is created. The response names the supported range so the client can show an actionable message ("update Capsule to keep uploading"). Per [Threat Model](/design/threat-model/#protocol-and-capability-negotiation), the same rule applies to every other write surface. -- This is a one-shot **compatibility gate**, not negotiation: there is no back-and-forth to settle on a shared version, and the protocol carries no capability flags. A client either speaks a version the server accepts, or it does not upload. -- The server supports a *window* of past protocol versions, not only the newest, so a staggered client rollout keeps working. A version leaves the window only after the deprecation period defined in [Threat Model — Min-Supported-Client Deprecation Policy](/design/threat-model/#min-supported-client-deprecation-policy); dropping one is a breaking change announced ahead of time. -- The date is bumped only for an **incompatible** wire change — offset semantics, alignment rules, finalization, the state machine. 
Purely additive, safely-ignorable changes do not bump it, and server-tunable parameters such as suggested chunk sizes and adaptive-sizing tiers are not protocol surface at all. - -##### Session Lifecycle - -A session moves through a strict state machine: - -```plaintext -Pending ─▶ Uploading ─▶ WaitingForProcessing ─▶ Completed - └─▶ FailedProcessing -``` - -- **Pending** — session created, no bytes received. -- **Uploading** — at least one chunk received, transfer in progress. -- **WaitingForProcessing** — all declared bytes received; finalization (assembly + hash verification) is running. -- **Completed** — hash verified, asset marked uploaded, now visible to other devices. Terminal. -- **FailedProcessing** — terminal failure (hash mismatch, assembly error). Chunks and the pending asset row are removed. Terminal. - -Session records live in the [session-state store](/design/filesystem/#stores-by-deployment-profile) with a 24-hour TTL and a per-owner index for listing. This split is intentional: the session store holds only volatile transfer state, so the hot path — offset increments and status transitions — never touches the durable Postgres asset row. (In the default Postgres-only profile, sessions live in an `upload_sessions` table with an `expires_at` column and a periodic sweep; in the high-concurrency profile, they live in Valkey under keys `upload:session:{id}` with atomic `HINCRBY`/`HSET` and native TTL.) Postgres's durable asset record is written exactly twice per upload regardless of profile: once at session creation (the pending row) and once at finalization (mark uploaded). A session that reaches its TTL before completing is garbage-collected — chunks deleted, pending asset row removed — and the client treats an expired session as gone and re-imports. (Client should imply retries if this happens but halt after too many retries.) 
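The session state machine above can be sketched as a small transition table. This is a minimal illustration, not the server's code: `Status`, `ALLOWED`, `Session`, `append`, and `finalize` are hypothetical names, the assumption that `FailedProcessing` is reached only from `WaitingForProcessing` is a reading of the diagram, and TTL expiry, cancellation, and chunk alignment are not modelled.

```python
from enum import Enum


class Status(Enum):
    PENDING = "Pending"
    UPLOADING = "Uploading"
    WAITING_FOR_PROCESSING = "WaitingForProcessing"
    COMPLETED = "Completed"
    FAILED_PROCESSING = "FailedProcessing"


# Allowed transitions; the two terminal states have no outgoing edges.
ALLOWED = {
    Status.PENDING: {Status.UPLOADING},
    Status.UPLOADING: {Status.UPLOADING, Status.WAITING_FOR_PROCESSING},
    Status.WAITING_FOR_PROCESSING: {Status.COMPLETED, Status.FAILED_PROCESSING},
    Status.COMPLETED: set(),
    Status.FAILED_PROCESSING: set(),
}


class Session:
    """Hypothetical upload session tracking received bytes and status."""

    def __init__(self, declared_size: int) -> None:
        self.declared_size = declared_size
        self.received = 0
        self.status = Status.PENDING

    def _move(self, target: Status) -> None:
        if target not in ALLOWED[self.status]:
            raise ValueError(f"illegal transition {self.status.value} -> {target.value}")
        self.status = target

    def append(self, chunk_len: int) -> None:
        """Account for one accepted chunk of chunk_len bytes."""
        if self.received + chunk_len > self.declared_size:
            # The document also fails the session here; this sketch only rejects the write.
            raise ValueError("chunk would exceed the declared size")
        self._move(Status.UPLOADING)
        self.received += chunk_len
        if self.received == self.declared_size:
            # All declared bytes received: finalization would start now.
            self._move(Status.WAITING_FOR_PROCESSING)

    def finalize(self, hash_matches: bool) -> None:
        """Record the outcome of hash verification."""
        self._move(Status.COMPLETED if hash_matches else Status.FAILED_PROCESSING)
```

Keeping the transitions in one table makes a second finalization attempt fail loudly, because a terminal state has no outgoing edges.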
- -#### Reliability & Integrity - -##### Server-Side Storage and Assembly - -Each chunk is written to disk as `{upload_id}_{n}.part`; the assembled blob is `{upload_id}.bin`. Because this is a hot path, the storage layer is aggressively optimized: - -- **Streaming writes.** Chunk bytes are streamed from the request body straight to disk; large transfers must never accumulate in hot memory. On Linux, the write path uses `io_uring`. -- **Reflink assembly.** Finalization concatenates chunks into the final blob using `FICLONERANGE` (copy-on-write reflink) on CoW filesystems such as Btrfs and XFS. The 4 KiB chunk alignment is precisely what allows each chunk to be reflinked at its destination offset; only the final (possibly unaligned) chunk needs a plain copy. Reflink turns assembly into a near-instant metadata operation instead of an O(file size) copy. On filesystems without reflink support, the code falls back to a sequential copy. -- **Offloaded blocking work.** Chunk assembly and hashing run on a blocking thread pool, never on the async reactor. -- **Backpressure.** `max_cache_size` bounds the total in-flight upload bytes held on disk; `max_file_size` bounds any single blob. The configuration asserts `max_file_size < max_cache_size` and warns if fewer than ~10 concurrent maximum-size uploads would fit. The distinct task pools — network I/O, file I/O, and hashing — are sized and load-tested independently against realistic hardware limits. - -##### Finalization and Integrity - -When received bytes reach the declared size, the server finalizes: - -1. Session transitions to **WaitingForProcessing**. -2. Chunks are assembled into the final blob. -3. The server recomputes the [content hash](/design/cryptography/#primitives-inventory) over the assembled ciphertext on the blocking pool and compares it to the declared `hash`. -4. **On match** — the pending asset is marked uploaded inside a Postgres transaction and the session transitions to **Completed**. -5. 
**On mismatch** — the blob and the pending asset row are deleted, the session transitions to **FailedProcessing**, and a checksum-mismatch error is returned. A mismatch is always treated as corruption or tampering and is never silently retried server-side. - -The server verifies only the *ciphertext* hash — it has no other option. The client independently verifies the *plaintext* on download via the [STREAM construction](/design/cryptography/#stream-construction)'s per-chunk authentication tags, which detect truncation, reordering, and chunk deletion. The two checks are complementary: the server guarantees "the bytes I stored are the bytes you declared," and the AEAD guarantees "the plaintext I decrypted is authentic." - -##### Robustness - -- An upload is not expected to run to completion in a single connection. The server tolerates arbitrarily long pauses within the session TTL, and clients resume via `HEAD`. [Auto syncing](#auto-syncing) explicitly assumes interrupted transfers are normal. -- A chunk re-sent at an already-acknowledged offset is idempotent. A chunk at a stale offset receives `409` together with the authoritative offset so the client can re-align. -- Concurrent finalization attempts on a single session are guarded — a second attempt observes a non-`Pending`/`Uploading` status and returns a conflict rather than double-processing. -- Every critical step — session creation, each chunk, assembly, hash verification, finalization — is logged with the upload ID so an interrupted or failed upload can be reconstructed and recovered after the fact. - -#### Performance - -##### Adaptive Chunk Sizing - -The server suggests an initial chunk size by file-size tier — `< 10 MB` → 256 KiB, `< 100 MB` → 1 MiB, `≥ 100 MB` → 4 MiB. 
The client may then adapt *within a tier-bounded range* based on throughput measured over a sliding 30-second window: doubling the chunk size when sustained throughput is high (`> 5 MB/s`), halving it when low (`< 1 MB/s`), and always staying 4 KiB-aligned. The rationale is a direct trade-off — chunks that are too small waste round-trips, while chunks that are too large waste re-transmission on a flaky link and pin more memory per in-flight request. - -Adaptation is purely a client concern; the server only enforces alignment and bounds. The client must never let adaptation regress effective throughput — if a tier's range is mis-tuned, the conservative choice is the tier minimum. - -We deliberately do **not** expose per-blob upload *ordering* as a protocol concern. Concurrent sessions plus the OS and TCP stack settle ordering naturally; see [Upload Prioritization](#upload-prioritization) for the client-side heuristics that decide which assets to *start*. - -##### Upload Prioritization - -We have a specific ordering which we pick how to upload many files simultaneously. - -- **File Size:** Smaller files might be processed first to give a quicker sense of progress, or larger files might be prioritized if they are deemed more critical. - - While file size is a useful heuristic, for internal ordering, we should let the order files are uploaded be naturally determined by simultaneous uploads and the network conditions, which fall to the underlying file transfer protocol — the custom resumable-upload protocol described above, running as concurrent sessions over the OS and TCP stack (see [Adaptive Chunk Sizing](#adaptive-chunk-sizing)). -- **Last Modified Times:** Newer or recently modified files might be more relevant to the user. (Note this filesystem metadata may not be always reliable so some fallbacks may be needed. Last accessed time was also considered but relatime makes this heuristic relatively noisy.) 
-- **Directory Depth:** Files closer to the root of the specified paths might be processed first. - -Note that file **type/extension** is deliberately *not* a prioritization criterion — prioritizing purely by file type may result in anomalies. Instead we have exceptions for certain sidecar files (e.g. `.xmp` associated with an image, or `.wav` associated with a video file). - -#### Access Control - -##### Deduplication and Merge - -Because blobs are addressed by their [ciphertext content hash](/design/cryptography/#primitives-inventory), the protocol can avoid redundant transfers: - -- At session creation, the server checks for an asset with the same content hash already owned by the user. An exact duplicate that exists both locally and remotely is rejected up front — nothing is re-uploaded. The dedup check and the pending-row insert run inside a single PostgreSQL transaction (a `SELECT ... FOR UPDATE` followed by `INSERT ... ON CONFLICT`), so two concurrent uploaders cannot both observe "no existing row" and each insert their own — the TOCTOU race is closed at the database layer. -- [Import](#import) treats already-uploaded *local* assets as non-importable. But because encryption and hashing are deferred until upload, an asset may already exist remotely under a *different* ciphertext (for example, re-encrypted under a newer album key). Import still admits such an asset, and the upload then resolves to a **merge**: the server links the existing stored blob to the new asset and album reference rather than storing a second copy. The original blob's upload short-circuits, and only the new metadata blob is transferred. -- **Merge is strictly additive on the server.** A merge **never** deletes an existing blob or rewrites an existing manifest — it only adds a new reference. The blob's reference count goes up, never down, on merge. 
Reference removal happens only through an explicit `delete` lifecycle action signed by a current writer (see [Authorization](/design/authorization/)), and the underlying blob is hard-purged only after every reference is provably gone. - -These checks deduplicate at upload time. Byte-identical assets that still slip into a client library — for example through overlapping folder imports or a restore over an existing library — are collapsed separately by client-side [intra-library deduplication](/design/filesystem/#deduplication). - -##### Quota and Permissions - -- An upload is attributed to `upload_user_id` (the authenticated uploader) for storage-quota accounting, which is distinct from `owner_id` (the asset's owner). Uploading on behalf of a different owner requires a verified relationship and is permission-checked at session creation. -- Adding an asset to an album requires write-tier album access (`AMK_write`; see [Cryptography](/design/cryptography/)); the server validates album write permission before creating the session. -- Only the uploader may append chunks. The uploader or the owner may query (`HEAD`) or cancel (`DELETE`) a session. - -### Download - -Download is the inverse of upload, and rests on the same two foundations: blobs are **content-addressed by ciphertext hash**, and the server never holds a key, so it serves only opaque ciphertext. Where the upload path optimises for correctness under interruption, the download path optimises for **bandwidth and storage frugality** — a client fetches the smallest representation that satisfies the user's current intent, and nothing more. - -#### Discovering What Changed - -A client never polls assets individually. 
It holds a single opaque **sync cursor** and asks the server for everything that changed after it: - -| Method | Path | Purpose | -| ------ | -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- | -| `GET` | `/sync` | Returns a page of asset changes (created, metadata-updated, deleted) after `cursor`, with a `next_cursor`. The feed is monotonic and resumable. | -| `GET` | `/blob/{hash}` | Fetch a ciphertext blob by its content address. Supports HTTP `Range` for resumable and partial reads. | - -The `/sync` feed carries only the small encrypted **metadata blobs** and each asset's **blob manifest** — the content hashes of its original and derivative blobs — never original or derivative bytes. Discovering a thousand new assets costs a few hundred kilobytes. The client decrypts each metadata blob, learns the asset's dimensions, capture date, and LQIP, and only *then* decides what else, if anything, to fetch. A deleted or modified asset arrives as a tombstone or an updated metadata reference; the client reconciles local state against it (see [Synchronization Scope](#synchronization-scope)). - -**Sync feed validation.** Every entry in a `/sync` response carries a `protocol_version` (matching the album's pin) and a per-album monotonic `sync_seq`. The client refuses to apply an entry whose `protocol_version` is above its max known (per the [tightened Postel's Law](/design/principles/)) and refuses any page whose `sync_seq` regresses against what the client has already seen for that album — a regressing `sync_seq` indicates a malicious or buggy server attempting to rewind the client's view, and the client surfaces it rather than applying it. - -#### Stale-Revival Detection - -A malicious or buggy server, peer, or backup could submit an old-but-validly-signed manifest to resurrect an asset that the receiving device has tombstoned at a later state. 
The defense — owned by [Cryptography — Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications) — is the per-asset `prior_provenance_hash` chain. Two layers enforce it: - -- **Client.** Every device's local index stores a `latest_provenance_hash` per `asset_id`. When a sync entry, federation pull, peering artifact, or backup restore proposes a manifest whose `prior_provenance_hash` is **behind** that local value, the entry is **quarantined** (see [Threat Model — Quarantine Surfaces](/design/threat-model/#quarantine-surfaces)) and surfaced as "peer sent stale state." -- **Server (no-key).** The server stores the same `latest_provenance_hash` per asset in PostgreSQL and rejects any incoming non-`create` manifest whose `prior_provenance_hash` does not match. This is described in the [server-side validation invariants](/design/threat-model/#server-side-validation-invariants). - -A deleted asset cannot be silently resurrected, on either side, without the resurrection appearing as a quarantine surface to the user. - -#### Tiered, On-Demand Fetch - -Each asset has a ladder of representations, cheapest first: - -1. **LQIP** — embedded in the metadata blob (see [Thumbnails](/design/thumbnails/)); available the instant metadata syncs, at zero extra request. -2. **Thumbnail** — fetched when the asset scrolls into, or near, view in a grid. -3. **Preview** — a screen-resolution derivative, fetched when the asset is opened. -4. **Original** — fetched only on explicit demand: viewing at full fidelity, exporting, or sharing the original. - -The default policy follows the per-library setting in [Synchronization Scope](#synchronization-scope) — *metadata only*, *metadata + thumbnails*, or *metadata + thumbnails + original*. Anything above the configured tier is fetched lazily, on demand. The original is never fetched speculatively unless the device was its uploader, in which case it already holds the plaintext locally and downloads nothing. 
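The fetch ladder and default policy above can be sketched as a single decision function. This is a hedged illustration only: `Representation`, `EAGER_BY_POLICY`, `should_request`, and the policy strings are hypothetical names chosen to mirror the three per-library settings, and local caching is not modelled.

```python
from enum import IntEnum


class Representation(IntEnum):
    """Rungs of the ladder, cheapest first."""
    LQIP = 0
    THUMBNAIL = 1
    PREVIEW = 2
    ORIGINAL = 3


# Representations fetched eagerly under each per-library setting (hypothetical keys).
EAGER_BY_POLICY = {
    "metadata_only": frozenset(),
    "metadata_thumbnails": frozenset({Representation.THUMBNAIL}),
    "metadata_thumbnails_original": frozenset(
        {Representation.THUMBNAIL, Representation.ORIGINAL}
    ),
}


def should_request(
    rep: Representation,
    policy: str,
    user_requested: bool = False,
    is_uploader: bool = False,
) -> bool:
    """Decide whether the client issues a blob request for this representation."""
    if rep is Representation.LQIP:
        return False  # embedded in the metadata blob, never a separate request
    if rep is Representation.ORIGINAL and is_uploader:
        return False  # the uploading device already holds the plaintext
    if rep in EAGER_BY_POLICY[policy]:
        return True   # covered by the configured tier
    return user_requested  # anything above the tier is fetched only on demand
```

For example, under `"metadata_only"` a thumbnail is requested only once the user scrolls it into view, which the caller would signal with `user_requested=True`.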
- -Because every blob is content-addressed, a fetch is skipped entirely when the blob is already in the local cache — the client looks up its cache by hash before issuing any request, so a representation shared between assets (an identical thumbnail, a merged original) is only ever fetched once. - -#### Resumption and Verification - -- Large originals are fetched with HTTP `Range` requests; an interrupted download resumes from the last persisted byte instead of restarting, mirroring the upload protocol's resumability. -- The client verifies integrity itself. Since the server can only attest to ciphertext, the client recomputes the [ciphertext content hash](/design/cryptography/#primitives-inventory) against the requested content address, then decrypts and relies on the [STREAM construction](/design/cryptography/#stream-construction)'s authentication tags to detect truncation, reordering, or chunk deletion. Any failure discards the blob and re-fetches it. - -#### Prefetch and Frugality - -- Prefetch is bounded and predictive — thumbnails for assets just beyond the viewport, the preview for the likely-next asset in a sequence — and is cancelled as soon as the user's focus moves. -- Prefetch and any above-tier fetch obey the same connection rules as [Auto Syncing](#auto-syncing): on a metered connection the client fetches only what the user explicitly opens, and defers the rest. -- Fetched-but-unpinned blobs are ordinary cache citizens, subject to [Space Recovery](/design/filesystem/#space-recovery); the client transparently re-fetches them on demand if they are evicted. - -### Auto Syncing - -On mobile clients, we support auto syncing which can be very useful for ensuring new assets are backed up (not to be confused with [encrypted backups](/design/backup-recovery/)) to the server and assets from other device loaded onto device. - -#### Synchronization Criteria - -We are conservative in when we check whether synchronization is needed. 
To bypass the possibility of outdated reconciliations, we reconcile the assets that required syncing (both uploading and downloading), and immediately execute backup as long as criterias remain throughout the data transfer process. If conditions change (e.g. internet connection became metered), it will be re-evaluated and potentially paused gracefully. Upload server does not expect the client to always complete transfers to completion (e.g., due to network conditions). - -Finally, the actual synchronization criteria are strict and scale with the reconciliation amount (i.e. total upload + download transfer): - -- **Small reconciliation** — a handful of new assets, or metadata-only deltas: synced proactively whenever the device has any non-metered connection. -- **Large reconciliation** — bulk uploads, or original-tier downloads: deferred until the device is connected to unmetered Wi-Fi. - -#### Platform Limitations - -We strictly implement auto sync ONLY if we can guarantee it will behave appropriately under all scenarios. We explicitly do not implement it on platforms that do not give all the APIs we need (e.g., detecting metered connection) to avoid surprises. - -#### Notifications - -When the auto sync criteria have not been met for a prolonged period — **two weeks** specifically — the library falls silently out of date, which defeats the purpose of a backup. The client surfaces this rather than letting it pass unnoticed: - -- After two weeks without a completed sync, the user is notified that the library is behind and offered a one-tap **force sync now**, which proceeds regardless of the metered/Wi-Fi criteria with their explicit consent. -- The notification can be **snoozed** until a later date (e.g. another two weeks) or **disabled** outright. Snoozing only suppresses the warning; disabling opts out of the warning entirely and does not affect auto sync itself. - -### Synchronization Scope - -- Uploadable new content: We upload the source (i.e. 
original) asset as well as all associated metadata and derivatives. -- Modified/deleted content: We update the associated metadata. -- Fetch new content: Depending on setting, it either fetches *metadata only*, *metadata + thumbnails*, or *metadata + thumbnails + original* for all new assets. Unless original already exists locally (e.g., if device was the original uploader), the original is only fetched on demand (e.g. user explicitly tries to view original or share original with others). This is to save bandwidth and storage on client devices. Note that metadata includes LQIP which can be used as a preview before even thumbnails are fetched. diff --git a/capsule-docs/src/content/docs/design/import/download-sync.md b/capsule-docs/src/content/docs/design/import/download-sync.md new file mode 100644 index 0000000..02ff2a2 --- /dev/null +++ b/capsule-docs/src/content/docs/design/import/download-sync.md @@ -0,0 +1,102 @@ +--- +title: Download and Synchronization +description: How Capsule clients discover changes, fetch blobs on demand, and auto-sync +--- + +Download is the inverse of [upload](/design/import/upload-protocol/), and rests on the same two foundations: blobs are **content-addressed by ciphertext hash**, and the server never holds a key, so it serves only opaque ciphertext. Where the upload path optimises for correctness under interruption, the download path optimises for **bandwidth and storage frugality** — a client fetches the smallest representation that satisfies the user's current intent, and nothing more. + +The download client lives in `capsule-sdk` (per-platform glue handles cache placement and connection-class detection); the server side — the sync feed and blob fetch — lives in `capsule-api-sync`. The `/sync` feed format is the **contract** other modules consume; its versioning and per-album monotonic ordering are what defeats the stale-rewind attack class. + +## Discovering What Changed + +A client never polls assets individually. 
It holds a single opaque **sync cursor** and asks the server for everything that changed after it: + +| Method | Path | Purpose | +| ------ | -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- | +| `GET` | `/sync` | Returns a page of asset changes (created, metadata-updated, deleted) after `cursor`, with a `next_cursor`. The feed is monotonic and resumable. | +| `GET` | `/blob/{hash}` | Fetch a ciphertext blob by its content address. Supports HTTP `Range` for resumable and partial reads. | + +The `/sync` feed carries only the small encrypted **metadata blobs** and each asset's **blob manifest** — the content hashes of its original and derivative blobs — never original or derivative bytes. Discovering a thousand new assets costs a few hundred kilobytes. The client decrypts each metadata blob, learns the asset's dimensions, capture date, and LQIP, and only *then* decides what else, if anything, to fetch. A deleted or modified asset arrives as a tombstone or an updated metadata reference; the client reconciles local state against it (see [Synchronization Scope](#synchronization-scope)). + +**Cursor authenticity.** The opaque sync cursor is **MAC'd by the server** (HMAC-SHA256) under a server-only key and verified on every `/sync` (and [federation pull](/design/federation/#federation-reuses-existing-primitives)) request, so a client cannot forge or mutate a cursor and a cursor lifted from another context is rejected at the boundary. The MAC is the *authenticity* layer; the per-album monotonic `sync_seq` check below is the independent *anti-rewind* layer. They are separate on purpose: a malicious server can always hand back one of its own *older*, validly-MAC'd cursors, and only the client-held high-water mark defeats that. Together they close the [sync-cursor rewind class](/design/threat-model/scenarios/#damage-scenario--invariant-map). 
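The client-held high-water mark mentioned above can be sketched as follows. This is a minimal illustration, not the `capsule-sdk` implementation: the names `HighWaterMarks`, `PageCheck`, `check_page`, and `commit_page` are hypothetical, and an entry whose `sync_seq` equals the stored mark is assumed to be a harmless replay of an already-applied page.

```rust
use std::collections::HashMap;

/// Result of checking one `/sync` page against locally held state.
#[derive(Debug, PartialEq)]
pub enum PageCheck {
    /// No entry regresses; the page may be applied.
    Apply,
    /// An entry is behind what was already seen for its album.
    Rewind { album_id: String, seen: u64, got: u64 },
}

/// Hypothetical client-side tracker: highest `sync_seq` applied per album.
#[derive(Default)]
pub struct HighWaterMarks {
    seen: HashMap<String, u64>,
}

impl HighWaterMarks {
    /// Check a page of `(album_id, sync_seq)` entries without mutating state,
    /// so a rejected page is never partially applied.
    pub fn check_page(&self, entries: &[(String, u64)]) -> PageCheck {
        // Marks advanced by earlier entries of this same page.
        let mut running: HashMap<&str, u64> = HashMap::new();
        for (album, seq) in entries {
            let floor = running
                .get(album.as_str())
                .copied()
                .or_else(|| self.seen.get(album).copied());
            if let Some(floor) = floor {
                // Assumption: equality is an idempotent replay, not a rewind.
                if *seq < floor {
                    return PageCheck::Rewind {
                        album_id: album.clone(),
                        seen: floor,
                        got: *seq,
                    };
                }
            }
            running.insert(album.as_str(), *seq);
        }
        PageCheck::Apply
    }

    /// Record a page that passed `check_page` and has been applied.
    pub fn commit_page(&mut self, entries: &[(String, u64)]) {
        for (album, seq) in entries {
            let mark = self.seen.entry(album.clone()).or_insert(*seq);
            *mark = (*mark).max(*seq);
        }
    }
}
```

Splitting the check from the commit keeps the rejection path free of side effects: a page that trips the rewind check leaves the stored marks untouched and can be surfaced to the user as-is.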
+ +**Sync feed validation.** Every entry in a `/sync` response carries a `protocol_version` (matching the album's pin) and a per-album monotonic `sync_seq`. The client refuses to apply an entry whose `protocol_version` is above its max known (per the [tightened Postel's Law](/design/principles/#postels-law-asymmetric)) and refuses any page whose `sync_seq` regresses against what the client has already seen for that album — a regressing `sync_seq` indicates a malicious or buggy server attempting to rewind the client's view, and the client surfaces it rather than applying it. + +## Stale-Revival Detection + +A malicious or buggy server, peer, or backup could submit an old-but-validly-signed manifest to resurrect an asset that the receiving device has tombstoned at a later state. The defense — owned by [Cryptography — Provenance](/design/cryptography/provenance/#provenance-of-library-modifications) — is the per-asset `prior_provenance_hash` chain. Two layers enforce it: + +- **Client.** Every device's local index stores a `latest_provenance_hash` per `asset_id`. When a sync entry, federation pull, peering artifact, or backup restore proposes a manifest whose `prior_provenance_hash` is **behind** that local value, the entry is **quarantined** (see [Threat Model — Quarantine Surfaces](/design/threat-model/scenarios/#quarantine-surfaces)) and surfaced as "peer sent stale state." +- **Server (no-key).** The server stores the same `latest_provenance_hash` per asset in PostgreSQL and rejects any incoming non-`create` manifest whose `prior_provenance_hash` does not match. This is described in the [server-side validation invariants](/design/threat-model/validation/#server-side-validation-invariants). + +A deleted asset cannot be silently resurrected, on either side, without the resurrection appearing as a quarantine surface to the user. + +## Tiered, On-Demand Fetch + +Each asset has a ladder of representations, cheapest first: + +1. 
**LQIP** — embedded in the metadata blob (see [Thumbnails](/design/thumbnails/)); available the instant metadata syncs, at zero extra request. +2. **Thumbnail** — fetched when the asset scrolls into, or near, view in a grid. +3. **Preview** — a screen-resolution derivative, fetched when the asset is opened. +4. **Original** — fetched only on explicit demand: viewing at full fidelity, exporting, or sharing the original. + +The default policy follows the per-library setting in [Synchronization Scope](#synchronization-scope) — *metadata only*, *metadata + thumbnails*, or *metadata + thumbnails + original*. Anything above the configured tier is fetched lazily, on demand. The original is never fetched speculatively unless the device was its uploader, in which case it already holds the plaintext locally and downloads nothing. + +Because every blob is content-addressed, a fetch is skipped entirely when the blob is already in the local cache — the client looks up its cache by hash before issuing any request, so a representation shared between assets (an identical thumbnail, a merged original) is only ever fetched once. + +**When an above-tier fetch cannot succeed.** A lazily-fetched representation may be temporarily or permanently unavailable. The client distinguishes the two: a **transient** failure (network drop, `5xx`) retries with backoff and resumes via `Range`; a **permanent** failure (`410 Gone`, `403`, a purged origin, or an unreachable [federated home server](/design/federation/#robustness-against-connectivity-loss)) **degrades gracefully** to the best representation already in hand — preview → thumbnail → LQIP, down to the always-present LQIP — and surfaces a non-destructive "full resolution unavailable" state on the asset. It never thrashes the fetch, and it never removes the asset's metadata or local index entry over a missing derivative. The asset stays listed and re-fetches automatically once the representation becomes reachable again. 
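The transient/permanent split and the degradation ladder described above might look like the sketch below. The type and function names (`Tier`, `FetchFailure`, `classify_status`, `best_available`) are illustrative assumptions, and only the HTTP statuses named in the text are classified; network drops, purged origins, and unreachable home servers would be detected by other means.

```rust
/// Representations of an asset, cheapest first. The derived ordering
/// follows declaration order, so `Lqip < Thumbnail < Preview < Original`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Lqip,
    Thumbnail,
    Preview,
    Original,
}

#[derive(Debug, PartialEq)]
pub enum FetchFailure {
    /// Retry with backoff and resume via `Range`.
    Transient,
    /// Degrade to the best representation already in hand.
    Permanent,
}

/// Classify an HTTP status. Statuses the text does not mention return
/// `None`; how they are handled is left open in this sketch.
pub fn classify_status(status: u16) -> Option<FetchFailure> {
    match status {
        403 | 410 => Some(FetchFailure::Permanent),
        500..=599 => Some(FetchFailure::Transient),
        _ => None,
    }
}

/// After a permanent failure fetching `requested`, pick the best locally
/// held tier below it. The LQIP travels inside the metadata blob, so it
/// is the floor even when `held` is empty.
pub fn best_available(requested: Tier, held: &[Tier]) -> Tier {
    held.iter()
        .copied()
        .filter(|t| *t < requested)
        .max()
        .unwrap_or(Tier::Lqip)
}
```

Note that nothing here touches the asset's metadata or index entry: degradation only changes which representation is displayed.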
+ +## Resumption and Verification + +- Large originals are fetched with HTTP `Range` requests; an interrupted download resumes from the last persisted byte instead of restarting, mirroring the [upload protocol's](/design/import/upload-protocol/) resumability. +- The client verifies integrity itself. Since the server can only attest to ciphertext, the client recomputes the [ciphertext content hash](/design/cryptography/primitives/) against the requested content address, then decrypts and relies on the [STREAM construction](/design/cryptography/encryption/#stream-construction)'s authentication tags to detect truncation, reordering, or chunk deletion. Any failure discards the blob and re-fetches it. + +## Prefetch and Frugality + +- Prefetch is bounded and predictive — thumbnails for assets just beyond the viewport, the preview for the likely-next asset in a sequence — and is cancelled as soon as the user's focus moves. +- Prefetch and any above-tier fetch obey the same connection rules as [Auto Syncing](#auto-syncing): on a metered connection the client fetches only what the user explicitly opens, and defers the rest. +- Fetched-but-unpinned blobs are ordinary cache citizens, subject to [Space Recovery](/design/filesystem/client/#space-recovery); the client transparently re-fetches them on demand if they are evicted. + +## Auto Syncing + +On mobile clients, auto syncing keeps new assets backed up (not to be confused with [encrypted backups](/design/backup-recovery/)) to the server and pulls assets from other devices onto the device. + +### Synchronization Criteria + +Sync is checked conservatively. When a check fires, the client reconciles everything that needs syncing — uploads and downloads — and proceeds as long as the criteria below hold throughout the transfer. If conditions change mid-transfer (e.g. 
the connection becomes metered), it re-evaluates and pauses gracefully; the server never assumes a transfer runs to completion in one session (see [Upload Protocol — Robustness](/design/import/upload-protocol/#robustness)). + +The actual synchronization criteria are strict and scale with the reconciliation amount (i.e. total upload + download transfer): + +- **Small reconciliation** — a handful of new assets, or metadata-only deltas: synced proactively whenever the device has any non-metered connection. +- **Large reconciliation** — bulk uploads, or original-tier downloads: deferred until the device is connected to unmetered Wi-Fi. + +### Platform Limitations + +Auto sync is implemented **only** if it can be guaranteed to behave appropriately under all scenarios. It is explicitly not implemented on platforms that lack the APIs we need (e.g., detecting metered connections), to avoid surprises. + +### Notifications + +When the auto sync criteria have not been met for a prolonged period — **two weeks** specifically — the library falls silently out of date, which defeats the purpose of a backup. The client surfaces this rather than letting it pass unnoticed: + +- After two weeks without a completed sync *while changes remain un-synced*, the user is notified that the library is behind and offered a one-tap **force sync now**, which proceeds regardless of the metered/Wi-Fi criteria with their explicit consent. +- The notification can be **snoozed** until a later date (e.g. another two weeks) or **disabled** outright. Snoozing only suppresses the warning; disabling opts out of the warning entirely and does not affect auto sync itself. + +## Synchronization Scope + +- **Uploadable new content:** the source (original) asset is uploaded along with all associated metadata and derivatives. +- **Modified/deleted content:** associated metadata is updated. 
+- **Fetch new content:** depending on setting, metadata only / metadata + thumbnails / metadata + thumbnails + original is fetched for all new assets. Unless the original already exists locally (e.g., if the device was the original uploader), the original is only fetched on demand (e.g. the user explicitly views the original or shares the original with others). This is to save bandwidth and storage on client devices. Metadata includes LQIP which can be used as a preview before even thumbnails are fetched. + +## Validation + +- **Sync feed monotonicity (unit).** Server-side unit tests assert that every `sync_seq` advance over a given album is strictly increasing; concurrent writes are linearised by the same Postgres transaction that mints the new `sync_seq`. +- **Sync feed forward-version rejection (unit).** Client-side unit test that a feed entry whose `protocol_version` is above the client's max known is rejected without partial application. +- **Sync feed rewind rejection (unit).** Client-side unit test that a page whose `sync_seq` regresses against the locally-seen high-water mark is surfaced, not applied. +- **Sync cursor authenticity (unit).** Server-side: present a cursor with a tampered or forged MAC; assert boundary rejection. Client-side: present a validly-MAC'd but *older* cursor; assert the monotonic `sync_seq` high-water-mark check still refuses the rewind. +- **Above-tier permanent unavailability (unit).** With scope set so the original is on-demand, make `/blob/{hash}` return `410`; assert the client degrades to the next-lower locally-held representation, surfaces "full resolution unavailable", and leaves the asset's metadata + index entry intact; restore availability; assert automatic re-fetch. +- **Tiered fetch correctness (unit).** Per-tier policy is unit-testable: configure scope = *metadata + thumbnails*, present a sync entry with original + thumbnails + LQIP, assert only metadata + thumbnails are fetched. 
+- **Resume after interrupt (smoke).** Start a large original fetch; interrupt mid-Range; resume; assert byte-identical result with no re-fetched bytes. +- **Auto-sync state machine (smoke).** Simulate connectivity changes (Wi-Fi → metered → offline → Wi-Fi); assert the scheduler pauses, resumes, and respects the small/large threshold. +- **Cross-asset dedup hit (unit).** Two assets with the same thumbnail hash; the second viewing must not refetch. + +The cross-module case — server emits a sync entry → client applies and fetches blob — is bounded E2E surface listed in [Module Map](/design/module-map/#e2e-test-surface). diff --git a/capsule-docs/src/content/docs/design/import/index.md b/capsule-docs/src/content/docs/design/import/index.md new file mode 100644 index 0000000..726f93f --- /dev/null +++ b/capsule-docs/src/content/docs/design/import/index.md @@ -0,0 +1,33 @@ +--- +title: Import and Synchronization +description: Overview of how Capsule imports assets and synchronizes them across devices +--- + +We define **import** as the process of taking assets from an external source (a camera, a directory on the filesystem) and bringing them into Capsule's management. Once imported, assets travel between devices via the **upload protocol** (client → server) and the **sync feed** (server → client). 
+ +The three concerns live in separate sub-docs because they correspond to distinct modules that can be implemented and validated independently: + +| Sub-doc | Concern | Primary crate(s) | +| -------------------------------------------------- | -------------------------------------------------------------------------------------------------- | -------------------------------------------------------------- | +| [Pipeline](/design/import/pipeline/) | Local scan, plan, execute — the import workflow on a single device | `capsule-core::import` | +| [Upload Protocol](/design/import/upload-protocol/) | The TUS-like wire protocol between client and server, session lifecycle, finalization, reliability | `capsule-sdk::upload` (client) + `capsule-api-upload` (server) | +| [Download & Sync](/design/import/download-sync/) | Sync feed, tiered fetch, stale-revival defense, auto-sync | `capsule-sdk` (client) + `capsule-api-sync` (server) | + +[Encrypted backups](/design/backup-recovery/) are a separate artifact format; [peering](/design/peering/) reuses the backup artifact for device-to-device sync rather than the upload/sync protocols. + +## End-to-End Flow + +```text +[Local source] + │ + ▼ scan, extract metadata +[Pipeline] ── plan ──▶ user confirms + │ + ▼ encrypt + sign + generate derivatives +[Upload Protocol] ── session → chunks → finalize ──▶ server blob store + Postgres + │ + ▼ sync feed advances +[Download & Sync] ── /sync (metadata) → /blob/{hash} (lazy original) ──▶ peer devices +``` + +Every stage is content-addressed, idempotent, and resumable. Session state in the upload path and cursor state in the sync feed are the two pieces of mutable cross-module state; both are owned by their respective sub-docs. 
diff --git a/capsule-docs/src/content/docs/design/import/pipeline.md b/capsule-docs/src/content/docs/design/import/pipeline.md new file mode 100644 index 0000000..5c72bf0 --- /dev/null +++ b/capsule-docs/src/content/docs/design/import/pipeline.md @@ -0,0 +1,78 @@ +--- +title: Import Pipeline +description: How Capsule scans, plans, and executes a local import on a single device +--- + +The import pipeline is the workflow a client runs to bring assets from an external source (a camera, a filesystem directory) into Capsule's management. It is implemented in `capsule-core::import` and runs entirely client-side — no server is contacted until the [upload protocol](/design/import/upload-protocol/) is invoked at the tail of the pipeline. + +Every import is **deterministic** and **idempotent**. Imports can be partially completed; each is identified by an *import ID* and resumable. The planner is pure (given the same inputs it produces the same plan), which makes the bulk of the pipeline unit-testable without any I/O. + +## Pipeline Stages + +```text +Initiate ──▶ Scan & Extract ──▶ Plan & Confirm ──▶ Execute ──▶ (Upload) +``` + +### Initiate + +Imports begin in one of two ways: + +- **Manual.** The user selects files or directories through the UI. The selection can point to a flat structure or a standardized directory structure (e.g. DCIM). +- **Automated.** Platforms (primarily mobile) detect new media in watched directories and trigger imports automatically. + +### Scan & Extract + +Files are walked, parsed, and their metadata extracted — see [Metadata](/design/metadata/) for the canonical schema. Format support is strictly gated: a file whose format is not in the supported set is rejected here rather than later, so the failure surfaces before the user is asked to confirm. 
+ +The server independently enforces a closed-enum `content_type` allow-list at session creation (see [Threat Model — Server-Side Validation Invariants](/design/threat-model/validation/#server-side-validation-invariants)), so even a malicious or buggy client declaring an unsupported format is rejected before any bytes are uploaded. Bytes received over the wire on the receiving side are decoded only inside the [sandboxed decoder](/design/clients/#sandboxed-decoder), so a format-mismatch attack cannot reach the host process. + +### Plan & Confirm + +The planner is **pure**: given the scanned files and their extracted metadata, it produces an `ImportPlan { added: [..], skipped: [..], conflicts: [..], total_size }` deterministically. The plan is shown to the user (summary of what will be imported, total size, any issues), and the user confirms or adjusts. + +- If an asset is already uploaded *locally* in the library, import refuses it — no merge needed. +- If an asset already exists *remotely* under a different ciphertext (e.g. re-encrypted under a newer album key), import still admits it; the [upload protocol](/design/import/upload-protocol/#deduplication-and-merge) then resolves it as a merge (the existing blob is linked rather than re-uploaded). + +**Destination resolution.** Each added asset is assigned a destination [container album](/design/organization/#container-albums). If the user picks one explicitly the planner uses it; otherwise it calls `resolve_default_album(context)` — the active scope's override, else the owner [default-album](/design/organization/#the-default-album) pointer, else the derived de facto album. To keep the planner pure, the active context and a snapshot of the pointer/overrides are planner *inputs*, so the resolved `album_id` is deterministic and recorded in the `ImportPlan` rather than discovered later at upload time. 
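As a rough sketch of the resolution order just described, assuming album IDs are plain strings and that the planner receives the pointer and overrides as an immutable snapshot: the struct `DefaultAlbumSnapshot`, its fields, and the helper `resolve_destination` are hypothetical, and the real `resolve_default_album(context)` signature is not given in this document.

```rust
/// Hypothetical snapshot handed to the planner as an input, so resolution
/// stays pure: the same snapshot always yields the same album.
pub struct DefaultAlbumSnapshot {
    /// Override set on the active scope, if any.
    pub scope_override: Option<String>,
    /// The owner's default-album pointer, if set.
    pub owner_pointer: Option<String>,
    /// The derived de facto album, assumed to always exist.
    pub de_facto: String,
}

/// Scope override, else owner pointer, else the de facto album.
pub fn resolve_default_album(snapshot: &DefaultAlbumSnapshot) -> &str {
    snapshot
        .scope_override
        .as_deref()
        .or(snapshot.owner_pointer.as_deref())
        .unwrap_or(snapshot.de_facto.as_str())
}

/// An explicit user choice wins; otherwise fall back to the default chain.
pub fn resolve_destination<'a>(
    explicit: Option<&'a str>,
    snapshot: &'a DefaultAlbumSnapshot,
) -> &'a str {
    explicit.unwrap_or_else(|| resolve_default_album(snapshot))
}
```

Because neither function reads anything beyond its arguments, each row of a planner test table can carry its own snapshot and expected `album_id`.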
+ +The planner's purity is what lets it be unit-tested exhaustively without filesystem fixtures: every edge case (overlapping selections, mixed formats, sidecar pairing, partial state from a prior interrupted import) becomes a table of `(scan_input → expected_plan)` pairs. + +### Execute + +For each file in the plan, in [upload prioritization](#upload-prioritization) order: + +1. **Move into the detected space.** The planner determined which library directory each asset belongs in; execution moves files into place. +2. **Compute cryptographic metadata.** Encrypt under the resolved destination album's AMK (see [Asset Encryption](/design/cryptography/encryption/#authenticated-asset-encryption)), produce the [signed manifest](/design/cryptography/provenance/#asset-manifest). +3. **Generate thumbnails and previews.** Per [Thumbnails](/design/thumbnails/). +4. **Hand off to the upload protocol.** Each blob (original + derivatives + metadata) becomes its own upload session — see [Upload Protocol](/design/import/upload-protocol/). + +Steps 1–3 can be parallelized across files. The executor is cancellation-aware: a partially-executed plan can be aborted cleanly and resumed (re-running the import re-derives the plan and skips already-completed work via the deterministic planner). + +## Upload Prioritization + +When many files are processed simultaneously, the order in which they are *started* is decided by these heuristics: + +- **Last Modified Times.** Newer or recently modified files are likely more relevant to the user. Filesystem mtime is the cross-platform signal, with fallbacks where a platform reports it unreliably. +- **Directory Depth.** Files closer to the root of the specified paths are processed first. +- **File Size.** A useful secondary heuristic, but ordering of in-flight uploads is left to the OS / TCP stack and the [adaptive chunk sizing](/design/import/upload-protocol/#adaptive-chunk-sizing) the upload protocol exposes; we do not micro-manage it here.
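One way to combine the three heuristics above is a single composite sort key. The text does not say how the heuristics are weighted against each other, so the precedence used here (modification time, then depth, then size ascending) is purely an assumption for illustration, as are the `Candidate` type and the convention that paths are relative to the selected import root.

```rust
use std::cmp::Reverse;

/// Hypothetical scan result for one file, with its path relative to the
/// import root and `/` as the separator.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub path: String,
    pub mtime_secs: u64,
    pub size: u64,
}

/// Number of path components; a file directly under the root has depth 1.
fn depth(path: &str) -> usize {
    path.split('/').filter(|part| !part.is_empty()).count()
}

/// Order files for starting: newest first, then shallowest, then smallest.
/// The sort is stable, so ties keep their scan order.
pub fn prioritize(mut files: Vec<Candidate>) -> Vec<Candidate> {
    files.sort_by_key(|f| (Reverse(f.mtime_secs), depth(&f.path), f.size));
    files
}
```

This only fixes the order in which files are started; it says nothing about how in-flight uploads share bandwidth.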
+ +File **type/extension** is deliberately *not* a prioritization criterion — prioritizing purely by type may produce anomalies. Instead we have exceptions for sidecar files (e.g. `.xmp` associated with an image, `.wav` associated with a video) that travel with their parent asset. + +The pipeline decides which assets to *start*; the [upload protocol](/design/import/upload-protocol/#adaptive-chunk-sizing) decides how they stream. + +## Contracts the Pipeline Exposes + +What the rest of the system depends on this module for: + +- `ImportPlan` — the deterministic output of the planner; rendered to the UI for confirmation. Schema fields: `added` (each entry carrying its resolved destination `album_id`), `skipped`, `conflicts`, `total_size`, `import_id` (UUIDv7). +- `execute(plan, cancel_token) → ImportExecutionReport` — the executor entry-point. Honors the cancel token at every file boundary. Returns per-file status. +- A stable progress event stream so the UI can report per-asset state (queued / encrypting / uploading / done / failed). + +## Validation + +- **Planner determinism (unit).** Table-driven tests over `(scan_input, library_state) → expected_plan`. Every conflict-resolution and dedup-detection path is its own row. Default-album resolution is part of the input snapshot, so a given `(context, pointer/overrides)` yields a deterministic destination `album_id`. +- **Scanner format-rejection (unit).** Every unsupported extension and every malformed-header case produces a structured rejection, never a panic. +- **Executor cancellation (smoke).** Run a real executor against a temp library, cancel mid-flight, assert no partial bundle is left on disk and a re-run produces the same plan minus already-completed files. +- **Resume after interruption (smoke).** Plan → execute partially → kill the process → re-run. The deterministic planner re-derives the same plan; already-completed assets are skipped. 
+ +The cross-module case — pipeline → upload protocol → server finalization → assets visible in `/sync` — is bounded E2E surface listed in [Module Map](/design/module-map/#e2e-test-surface). diff --git a/capsule-docs/src/content/docs/design/import/upload-protocol.md b/capsule-docs/src/content/docs/design/import/upload-protocol.md new file mode 100644 index 0000000..f7fdc2f --- /dev/null +++ b/capsule-docs/src/content/docs/design/import/upload-protocol.md @@ -0,0 +1,156 @@ +--- +title: Upload Protocol +description: The wire protocol between Capsule clients and the server for resumable, content-addressed uploads +--- + +The upload protocol is a custom resumable-upload protocol modeled on [TUS](https://tus.io/) but trimmed to Capsule's needs: no per-request capability negotiation, no metadata smuggled in headers, ciphertext-only payloads. Compatibility is gated once, up front, via the universal [protocol handshake](/design/threat-model/validation/#protocol-and-capability-negotiation). + +This protocol is the most fragile contract between client and server: a client that misunderstands chunk alignment, offset semantics, or finalization can silently corrupt or orphan data. The endpoint table, the chunk rules, the session state machine, and the finalization steps below **are the contract** — every implementation MUST conform exactly. The client implementation lives in `capsule-sdk::upload`; the server in `capsule-api-upload`. The two are tested independently against the protocol surface. + +Every upload is **idempotent** but stateful. Uploads can complete partially and are identified by an *upload ID*. + +## What Gets Uploaded + +An asset is never uploaded as a single plaintext file. Because Capsule is end-to-end encrypted, the client **encrypts and signs** everything *before* transmission, and the server only ever stores opaque, content-addressed ciphertext blobs. 
A single imported asset produces a **bundle** of blobs: + +- The **original blob** — the source asset, encrypted under the [bulk AEAD](/design/cryptography/primitives/#bulk-aead) with the [STREAM construction](/design/cryptography/encryption/#stream-construction). +- **Derivative blobs** — thumbnails and previews, generated client-side during import (see [Thumbnails](/design/thumbnails/)), each encrypted independently. +- The **metadata blob** — the CBOR metadata document (capture date, dimensions, EXIF-derived fields, the [LQIP](/design/thumbnails/#lqip), provenance), encrypted under the [bulk AEAD](/design/cryptography/primitives/#bulk-aead) (see [Metadata](/design/metadata/)). + +Each blob is its own upload with its own upload ID; the protocol does not couple them and imposes **no wire ordering**. The client may transfer the bundle in any order — decoupling lets small derivatives land while a large original is still in flight — but the server **gates visibility** on the pending-asset row: the asset becomes visible to other devices only once its **required members, the original and the metadata blob, are both finalized**. This is enforced without reading plaintext — each blob's role is recorded on its pending row at session creation, and the visibility flip simply checks that the original and metadata roles are present and `uploaded`. Every blob in the bundle — original, derivatives, metadata, provenance — counts toward the uploader's storage [quota](/design/quota/#accounting-model). + +"Blob" is defined once, in [Filesystem — Server: Uniform, Opaque Blobs](/design/filesystem/server/#uniform-opaque-blobs); this protocol is its transport, not its definition. 
Every asset and derivative blob carries a signed [manifest envelope](/design/cryptography/provenance/#asset-manifest): at `POST /upload` the server validates the envelope's `created_by_device` against the uploader's [device directory](/design/cryptography/keys/#device-directory) (invariant 7), and the client verifies the full write-tier signature on download via [`verify_asset`](/design/cryptography/keys/#write-authorization). **Backup artifacts are the one exception** — they carry no per-asset provenance of their own (the exporting device is not the original author); their integrity rides the library-level backup MANIFEST instead (see [Backup and Recovery](/design/backup-recovery/)). + +The server performs no decoding, no metadata extraction, and no thumbnail generation — it cannot, since it never holds a decryption key. All such work happens client-side during [import](/design/import/pipeline/). + +## Design Invariants + +The upload protocol guarantees the following, and every endpoint upholds them: + +- **Content-addressed.** Every blob is identified by its [ciphertext content hash](/design/cryptography/primitives/). The plaintext hash is never transmitted to the server. +- **Idempotent.** Re-creating a session for a blob already stored is a no-op that resolves to the existing asset. Re-sending a chunk at an already-acknowledged offset is accepted and simply returns the current offset. +- **Resumable.** A session survives connection loss for the lifetime of its TTL. A client resumes by querying the authoritative offset and continuing from there — no bytes are re-sent unnecessarily. +- **Strictly bounded.** The total ciphertext size is declared at session creation and immutable thereafter. The cumulative received bytes may never exceed it, nor exceed the server's per-file limit. +- **Verified.** No upload is marked complete until the server has recomputed the ciphertext hash and confirmed it matches the declared value. 
+- **Recoverable.** Every session is either driven to a terminal state or garbage-collected. There are no permanently orphaned chunks or pending asset rows. + +## Endpoints + +All endpoints are authenticated with a bearer JWT. + +| Method | Path | Purpose | +| -------- | ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `POST` | `/upload` | Create a session. Body declares ciphertext `size`, `hash` (the [content hash](/design/cryptography/primitives/) digest bytes; algorithm fixed by `crypto_suite_id`), `content_type` (closed enum), `crypto_suite_id`, `protocol_version`, `manifest_envelope` (the unencrypted manifest fields the server validates per [Threat Model — Server-Side Validation Invariants](/design/threat-model/validation/#server-side-validation-invariants)), optional `album_id`, optional `owner_id`, optional `intent_id` (required only during an [album upgrade](/design/versioning/#album-upgrade-ceremony)). Returns `201` with `Location: /upload/{id}` and `X-Capsule-Suggested-Chunk-Size`. Rejects with `400` / `403` / `426` per the validation invariants. | +| `HEAD` | `/upload/{id}` | Query progress. Returns `X-Capsule-Offset` (next expected byte), `X-Capsule-Content-Length`, and session status. This is the resumption primitive. 
| +| `PATCH` | `/upload/{id}` | Append a chunk at `X-Capsule-Offset`, with an optional per-chunk `X-Capsule-Checksum`. Returns `204` and the new offset. | +| `DELETE` | `/upload/{id}` | Cancel the session — removes chunks, the session record, and the pending asset row. | +| `GET` | `/upload/sessions` | List the caller's active sessions, so a client can resume across app restarts or devices. | + +Creating a session writes a *pending* asset row to Postgres (`uploaded = false`) and a session record to the configured **session-state store** (see [Filesystem — Server: Deployment Profiles](/design/filesystem/server/#deployment-profiles): Postgres by default, Valkey in the high-concurrency profile). The pending row reserves the asset ID that derivative and metadata blobs reference. + +## Chunk Rules + +Enforced strictly; a violation fails the request rather than being silently corrected: + +- Every chunk except the final one MUST be a multiple of 4 KiB (4096 bytes). This keeps server-side writes block-aligned, which is what makes the [reflink assembly path](#server-side-storage-and-assembly) work. A non-aligned, non-final chunk is rejected with `400`. +- Offsets are strictly sequential. A `PATCH` must arrive at exactly the current received-byte count; an out-of-order or gapped write is rejected with `409`, and the client recovers by issuing `HEAD` to learn the authoritative offset. +- **Idempotency tuple.** The server keys each accepted PATCH by `(upload_id, offset, chunk_hash)` where `chunk_hash` is the SHA-256 of the chunk bytes (carried in the `X-Capsule-Checksum` header). A duplicate PATCH with the same tuple returns the same response — a re-send after a lost ACK is a no-op. A PATCH at an already-acknowledged offset *with a different `chunk_hash`* is rejected with `409` + a corruption error: this is the structural defense against a faulty client that retries with garbage. 
The complete idempotency contract is owned by [Threat Model — Idempotency Invariants](/design/threat-model/validation/#idempotency-invariants). +- Cumulative size may never exceed the declared `size` nor the server's `max_file_size`. The server checks the cumulative count **at every chunk arrival**, not only at finalization — a buggy client that streams past the declared size is cut off before more bytes are persisted. Either ceiling is rejected (`400` / `413`) and the session is moved to a failed state. +- The upload completes exactly when received bytes equal the declared size; finalization then runs automatically. + +## Protocol Versioning + +The upload session is gated by Capsule's universal [protocol handshake](/design/threat-model/validation/#protocol-and-capability-negotiation), so a client never begins a transfer against a server it is not known to be compatible with. This section names the upload-specific specializations. + +Versioning is **date-based** (`YYYY-MM-DD` — the day a protocol revision is frozen), not integer or semver. An integer version conveys nothing about ordering granularity and invites a bump for every change; semver implies a minor/patch backward-compatibility contract finer than we are willing to maintain on a hot path. A date is unambiguously ordered, human-readable, and maps directly onto a release. + +- Every client sends `X-Capsule-Protocol: ` on every request (the upload-specific alias `X-Capsule-Upload-Protocol` remains accepted but is deprecated). The server advertises the inclusive range it accepts via `X-Capsule-Protocol-Min` and `-Max` on every response, errors included. +- A `POST /upload` whose version falls outside the accepted range is rejected with `426 Upgrade Required` *before* any session or pending asset row is created. The response names the supported range so the client can show an actionable message ("update Capsule to keep uploading"). 
Per [Threat Model](/design/threat-model/validation/#protocol-and-capability-negotiation), the same rule applies to every other write surface. +- This is a one-shot **compatibility gate**, not negotiation: there is no back-and-forth to settle on a shared version, and the protocol carries no capability flags. A client either speaks a version the server accepts, or it does not upload. +- The server supports a *window* of past protocol versions, not only the newest, so a staggered client rollout keeps working. A version leaves the window only after the deprecation period defined in [Threat Model — Min-Supported-Client Deprecation Policy](/design/threat-model/schema-rules/#min-supported-client-deprecation-policy); dropping one is a breaking change announced ahead of time. +- The date is bumped only for an **incompatible** wire change — offset semantics, alignment rules, finalization, the state machine. Purely additive, safely-ignorable changes do not bump it, and server-tunable parameters such as suggested chunk sizes and adaptive-sizing tiers are not protocol surface at all. + +## Session Lifecycle + +A session moves through a strict state machine: + +```plaintext +Pending ─▶ Uploading ─▶ WaitingForProcessing ─▶ Completed + └─▶ FailedProcessing +``` + +- **Pending** — session created, no bytes received. +- **Uploading** — at least one chunk received, transfer in progress. +- **WaitingForProcessing** — all declared bytes received; finalization (assembly + hash verification) is running. +- **Completed** — hash verified, asset marked uploaded, now visible to other devices. Terminal. +- **FailedProcessing** — terminal failure (hash mismatch, assembly error). Chunks and the pending asset row are removed. Terminal. + +Session records live in the [session-state store](/design/filesystem/server/#deployment-profiles) with a 24-hour TTL and a per-owner index for listing. 
This split is intentional: the session store holds only volatile transfer state, so the hot path — offset increments and status transitions — never touches the durable Postgres asset row. (In the default Postgres-only profile, sessions live in an `upload_sessions` table with an `expires_at` column and a periodic sweep; in the high-concurrency profile, they live in Valkey under keys `upload:session:{id}` with atomic `HINCRBY`/`HSET` and native TTL.) Postgres's durable asset record is written exactly twice per upload regardless of profile: once at session creation (the pending row) and once at finalization (mark uploaded). A session that reaches its TTL before completing is garbage-collected — chunks deleted, pending asset row removed — and the client treats an expired session as gone and re-imports, retrying with backoff and halting after a bounded number of attempts. + +**TTL applies only to in-progress transfer.** A session is eligible for TTL eviction only while in `Pending` or `Uploading`. Once it reaches `WaitingForProcessing`, finalization is running and the session is **not** evicted out from under it — finalization either drives it to `Completed` or fails it cleanly to `FailedProcessing`. Both terminal states are **retained for the remainder of the TTL** rather than deleted on transition, so a client whose finalization ACK was lost re-queries (`HEAD`) and observes the terminal outcome — learning the upload already succeeded or failed — instead of seeing a vanished session and blindly re-uploading. (The `FailedProcessing` cleanup of chunks and the pending row happens at the transition; only the session *status record* is what lingers for the TTL.) + +## Server-Side Storage and Assembly + +Each chunk is written to disk as `{upload_id}_{n}.part`; the assembled blob is `{upload_id}.bin`. 
Because this is a hot path, the storage layer is aggressively optimized: + +- **Streaming writes.** Chunk bytes are streamed from the request body straight to disk; large transfers must never accumulate in hot memory. On Linux, the write path uses `io_uring`. +- **Reflink assembly.** Finalization concatenates chunks into the final blob with a copy-on-write reflink wherever the filesystem supports one — `FICLONERANGE` on Linux (Btrfs, XFS), `clonefile` on macOS (APFS), the equivalent on ReFS. The 4 KiB chunk alignment is precisely what allows each chunk to be reflinked at its destination offset; only the final (possibly unaligned) chunk needs a plain copy. Reflink turns assembly into a near-instant metadata operation instead of an O(file size) copy. On filesystems without reflink support, the code falls back to a sequential copy. +- **Offloaded blocking work.** Chunk assembly and hashing run on a blocking thread pool, never on the async reactor. +- **Backpressure.** `max_cache_size` bounds the total in-flight upload bytes held on disk; `max_file_size` bounds any single blob. The configuration asserts `max_file_size < max_cache_size` and warns if fewer than ~10 concurrent maximum-size uploads would fit. The distinct task pools — network I/O, file I/O, and hashing — are sized and load-tested independently against realistic hardware limits. + +## Finalization and Integrity + +When received bytes reach the declared size, the server finalizes: + +1. Session transitions to **WaitingForProcessing**. +2. Chunks are assembled into the final blob. +3. The server recomputes the [content hash](/design/cryptography/primitives/) over the assembled ciphertext on the blocking pool and compares it to the declared `hash`. +4. **On match** — the pending asset is marked uploaded inside a Postgres transaction and the session transitions to **Completed**. +5. 
**On mismatch** — the blob and the pending asset row are deleted, the session transitions to **FailedProcessing**, and a checksum-mismatch error is returned. A mismatch is always treated as corruption or tampering and is never silently retried server-side. + +The server verifies only the *ciphertext* hash — it has no other option. The client independently verifies the *plaintext* on download via the [STREAM construction](/design/cryptography/encryption/#stream-construction)'s per-chunk authentication tags, which detect truncation, reordering, and chunk deletion. The two checks are complementary: the server guarantees "the bytes I stored are the bytes you declared," and the AEAD guarantees "the plaintext I decrypted is authentic." + +## Robustness + +- An upload is not expected to run to completion in a single connection. The server tolerates arbitrarily long pauses within the session TTL, and clients resume via `HEAD`. [Auto syncing](/design/import/download-sync/#auto-syncing) explicitly assumes interrupted transfers are normal. +- A chunk re-sent at an already-acknowledged offset is idempotent. A chunk at a stale offset receives `409` together with the authoritative offset so the client can re-align. +- Concurrent finalization attempts on a single session are guarded — a second attempt observes a non-`Pending`/`Uploading` status and returns a conflict rather than double-processing. +- Every critical step — session creation, each chunk, assembly, hash verification, finalization — is logged with the upload ID so an interrupted or failed upload can be reconstructed and recovered after the fact. + +## Adaptive Chunk Sizing + +The server suggests an initial chunk size by file-size tier — `< 10 MB` → 256 KiB, `< 100 MB` → 1 MiB, `≥ 100 MB` → 4 MiB. 
The client may then adapt *within a tier-bounded range* based on throughput measured over a sliding 30-second window: doubling the chunk size when sustained throughput is high (`> 5 MB/s`), halving it when low (`< 1 MB/s`), and always staying 4 KiB-aligned. The rationale is a direct trade-off — chunks that are too small waste round-trips, while chunks that are too large waste re-transmission on a flaky link and pin more memory per in-flight request. + +Adaptation is purely a client concern; the server only enforces alignment and bounds. The client must never let adaptation regress effective throughput — if a tier's range is mis-tuned, the conservative choice is the tier minimum. + +We deliberately do **not** expose per-blob upload *ordering* as a protocol concern. Concurrent sessions plus the OS and TCP stack settle ordering naturally; see [Pipeline — Upload Prioritization](/design/import/pipeline/#upload-prioritization) for the client-side heuristics that decide which assets to *start*. + +## Deduplication and Merge + +Because blobs are addressed by their [ciphertext content hash](/design/cryptography/primitives/), the protocol avoids redundant transfers: + +- At session creation, the server checks for an asset with the same content hash already owned by the user. An exact duplicate that exists both locally and remotely is rejected up front — nothing is re-uploaded. The dedup check and the pending-row insert run inside a single PostgreSQL transaction (a `SELECT ... FOR UPDATE` followed by `INSERT ... ON CONFLICT`), so two concurrent uploaders cannot both observe "no existing row" and each insert their own — the TOCTOU race is closed at the database layer. +- The [import pipeline](/design/import/pipeline/#plan--confirm) treats already-uploaded *local* assets as non-importable. But because encryption and hashing are deferred until upload, an asset may already exist remotely under a *different* ciphertext (for example, re-encrypted under a newer album key). 
Import still admits such an asset, and the upload then resolves to a **merge**: the server links the existing stored blob to the new asset and album reference rather than storing a second copy. The original blob's upload short-circuits, and only the new metadata blob is transferred. +- **Merge is strictly additive on the server.** A merge **never** deletes an existing blob or rewrites an existing manifest — it only adds a new reference. The blob's reference count goes up, never down, on merge. Reference removal happens only through an explicit `delete` lifecycle action signed by a current writer (see [Authorization](/design/authorization/)), and the underlying blob is hard-purged only after every reference is provably gone. + +These checks deduplicate at upload time. Byte-identical assets that still slip into a client library — for example through overlapping folder imports or a restore over an existing library — are collapsed separately by client-side [intra-library deduplication](/design/filesystem/maintenance/#deduplication). + +## Quota and Permissions + +- An upload is attributed to `upload_user_id` (the authenticated uploader) for storage-quota accounting, which is distinct from `owner_id` (the asset's owner). Uploading on behalf of a different owner requires a verified relationship and is permission-checked at session creation. The quota accounting model is owned by [Quota](/design/quota/). +- Adding an asset to an album requires write-tier album access (`AMK_write`; see [Cryptography — Keys](/design/cryptography/keys/#album-master-keys-amks)); the server validates album write permission before creating the session. +- For an ordinary asset bundle the client resolves a concrete container `album_id` — the user's choice or the [default album](/design/organization/#the-default-album) — **before** encryption, since the bytes are encrypted under that album's AMK. 
So `album_id` is effectively always present for asset uploads; the `optional` marking on `POST /upload` covers only non-asset/owner-scoped kinds and the `intent_id`-bearing [album upgrade](/design/versioning/#album-upgrade-ceremony). This is why [invariant 6](/design/threat-model/validation/#server-side-validation-invariants) can require `album_id` to exist and be writable. +- Only the uploader may append chunks. The uploader or the owner may query (`HEAD`) or cancel (`DELETE`) a session. + +## Validation + +The wire protocol is the boundary across two modules, so both sides have rich isolated test surfaces: + +- **Server protocol conformance (smoke).** Exercise the full state machine against the real server backed by a testcontainer Postgres (and Valkey for the high-concurrency profile): create session → PATCH chunks → finalize → verify Completed. Mock the client at the HTTP layer using a generated request fixture set. +- **Server chunk-rule rejection (unit).** Each rule (non-aligned non-final chunk, gapped offset, duplicate offset with different hash, cumulative-over-size, oversize file) has a unit test asserting the exact rejection code. +- **Server idempotency (unit).** Replay each idempotent endpoint with identical input; assert byte-identical response. +- **Server finalization integrity (smoke).** Concatenate chunks; recompute hash; assert match. Inject a corrupted chunk; assert FailedProcessing and full cleanup of the pending row + chunks. +- **Client protocol conformance (smoke).** The client `capsule-sdk::upload` runs against a mocked HTTP layer that replays the rejection codes the server's unit tests exercise; assert the client handles each correctly (re-align on 409, abort-and-reimport on 426, etc.). +- **Client resume semantics (smoke).** Start an upload, interrupt at a random offset, resume; assert no bytes re-sent that the server already has.
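As a rough illustration of what the chunk-rule unit tier pins down, the checks can be modeled as a pure function over session state. This is a sketch only: `Session`, `Verdict`, and `check_chunk` are hypothetical names rather than real `capsule-api` types, and pairing the declared-size overrun with `400` and the `max_file_size` overrun with `413` is an assumption, since the chunk rules list both codes without saying which applies to which ceiling. The replay case returns the current offset, following the Idempotent invariant.

```rust
use std::collections::HashMap;

/// Every non-final chunk must be a multiple of this many bytes.
const ALIGN: u64 = 4096;

/// Decision for one PATCH, made before any bytes are persisted.
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Append the chunk and report the new offset.
    Accept { new_offset: u64 },
    /// Same (offset, chunk_hash) as an acknowledged chunk: a no-op.
    Replay { offset: u64 },
    /// Reject with this HTTP status code.
    Reject(u16),
}

/// Hypothetical in-memory view of one upload session.
struct Session {
    declared_size: u64,
    received: u64,
    /// chunk_hash recorded for each acknowledged offset.
    acked: HashMap<u64, [u8; 32]>,
}

fn check_chunk(
    s: &Session,
    max_file_size: u64,
    offset: u64,
    len: u64,
    chunk_hash: [u8; 32],
) -> Verdict {
    // Idempotency tuple: a known offset must carry the same hash.
    if let Some(seen) = s.acked.get(&offset) {
        return if *seen == chunk_hash {
            Verdict::Replay { offset: s.received }
        } else {
            Verdict::Reject(409) // same offset, different bytes: corruption
        };
    }
    // Offsets are strictly sequential; gaps and stale writes get 409.
    if offset != s.received {
        return Verdict::Reject(409);
    }
    let end = match offset.checked_add(len) {
        Some(end) => end,
        None => return Verdict::Reject(400),
    };
    // Both ceilings are checked on every chunk (status pairing assumed).
    if end > max_file_size {
        return Verdict::Reject(413);
    }
    if end > s.declared_size {
        return Verdict::Reject(400);
    }
    // Only the chunk that completes the upload may be unaligned.
    let is_final = end == s.declared_size;
    if !is_final && len % ALIGN != 0 {
        return Verdict::Reject(400);
    }
    Verdict::Accept { new_offset: end }
}
```

Keeping the decision free of I/O means each rejection code can be asserted without a database or a filesystem, which is what makes the unit tier cheap to run.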
+ +The cross-module case — real client → real server full upload — is bounded E2E surface listed in [Module Map](/design/module-map/#e2e-test-surface). Because both sides have rich smoke coverage, the E2E case can be a single happy-path round-trip rather than the full rejection matrix. diff --git a/capsule-docs/src/content/docs/design/index.md b/capsule-docs/src/content/docs/design/index.md new file mode 100644 index 0000000..57146a8 --- /dev/null +++ b/capsule-docs/src/content/docs/design/index.md @@ -0,0 +1,25 @@ +--- +title: Design Overview +description: How Capsule's design docs are organized and where to start +--- + +Capsule is an end-to-end-encrypted personal photo and media store with optional federation. These design docs are its normative specification: every primitive, schema, and protocol is declared in exactly one **owner doc** and referenced by anchor everywhere else (the [Single Source of Truth rule](/design/principles/#single-source-of-truth)). + +## The shape of the system + +The design stacks in layers, each building on the one below — the sidebar groups follow this order: + +- **Foundations** — the [core principles](/design/principles/) every component obeys, and the [module map](/design/module-map/) from code module to owning doc. +- **Cryptography** — the [primitives](/design/cryptography/primitives/) inventory, the [key hierarchy](/design/cryptography/keys/), [MLS](/design/cryptography/mls/) group membership, asset/metadata [encryption](/design/cryptography/encryption/), and signed [provenance](/design/cryptography/provenance/). The server holds only opaque ciphertext — never a key. +- **Identity & access** — [authentication](/design/authentication/), [authorization](/design/authorization/), and [device enrollment](/design/device-enrollment/). +- **Storage** — the [server](/design/filesystem/server/) and [client](/design/filesystem/client/) filesystems, the [metadata](/design/metadata/) sidecar schema, and [thumbnails](/design/thumbnails/). 
+- **Import & sync** — the [import pipeline](/design/import/pipeline/), [upload protocol](/design/import/upload-protocol/), [download & sync](/design/import/download-sync/), [backup](/design/backup-recovery/), and [versioning](/design/versioning/). +- **Sharing & federation** — server-to-server [federation](/design/federation/), device-to-device [peering](/design/peering/), [share links](/design/share-links/), and [moderation](/design/moderation/). +- **Organization & clients** — [albums and stacks](/design/organization/), native [client duties](/design/clients/), and on-device [AI/ML](/design/ai/). +- **Threat model** — the cross-cutting [damage-scenario map](/design/threat-model/scenarios/), [validation invariants](/design/threat-model/validation/), and [schema rules](/design/threat-model/schema-rules/) that bound what a faulty or hostile client can do. + +## Where to start + +- **New to the project?** Read [Core Principles](/design/principles/), then the [Cryptography overview](/design/cryptography/). +- **Implementing a feature?** Find your code module in the [Module Map](/design/module-map/) — it names the owning design doc and the validation tier. +- **Reviewing security?** Start at the [Threat Model](/design/threat-model/) and follow each damage scenario to the owner doc that defeats it. diff --git a/capsule-docs/src/content/docs/design/metadata.md b/capsule-docs/src/content/docs/design/metadata.md index 14389cd..5f9d24f 100644 --- a/capsule-docs/src/content/docs/design/metadata.md +++ b/capsule-docs/src/content/docs/design/metadata.md @@ -1,35 +1,30 @@ --- title: Metadata -description: How Capsule extracts and utilizes metadata from assets +description: The CBOR sidecar schema v1, the CRDT semantics for collaborative metadata, identifiers, and geolocation --- -## Design Philosophy +The CBOR sidecar is the canonical, plaintext-local-only metadata record for every asset (see [Filesystem — Client](/design/filesystem/client/)). 
It is **self-describing**: field 0 carries the schema version so any reader can detect a schema it does not implement *before* parsing the rest. Versioning the schema in-band is what prevents a faulty or old client from corrupting state with a partial parse. -All metadata processing in Capsule is handled by `capsule-core`, which is implemented in Rust and exposed to all languages via FFI. It handles the I/O natively and is generally opaque to minimize FFI surface. +This doc is the **single source of truth** for the CBOR sidecar schema. The schema below — every field, type, and ordering rule — is the contract every implementation must conform to byte-for-byte (else cross-peer signatures break). Per the [SSoT rule](/design/principles/#single-source-of-truth), other docs reference fields here by name and never re-declare them. -This doc is the **single source of truth** for the CBOR sidecar schema. Per the [single-source-of-truth rule](/design/principles/#single-source-of-truth), other docs reference fields here by name and never re-declare them. - -## Metadata Capabilities - -We minimize the logic involved in repository and leverage dependencies where useful. This is the rough breakdown (subject to being outdated): - -- `capsule-core`: Extracts the filesystem metadata for verification and indexing. +All metadata processing lives in `capsule-core::metadata` (extraction, filtering, querying) and `capsule-core::sidecar` (encoding, signing, schema versioning). Implementation is in Rust and exposed to all native clients via FFI from `capsule-core` — the I/O is handled natively to minimize FFI surface. ## Sidecar Schema v1 -The CBOR sidecar is the client's canonical, plaintext-local-only metadata record (see [Filesystem — Client Filesystem](/design/filesystem/#client-filesystem)). It is **self-describing**: field 0 carries the schema version so any reader can detect a schema it does not implement *before* parsing the rest. 
Versioning the schema in-band is what prevents a faulty or old client from corrupting state with a partial parse (see [Threat Model — Schema Evolution](/design/threat-model/)). - ```rust SidecarV1 { sidecar_schema: u16, // FIELD 0 — readable before parsing the rest. Currently 1. crypto_suite_id: u16, // matches the asset's manifest; see Cryptography uuid: UUIDv7, - hash: { algo: String, value: bytes }, // canonical plaintext hash + hash: bytes, // canonical plaintext digest; algorithm + length fixed by crypto_suite_id (see Primitives) capture_timestamp: RFC3339, import_timestamp: RFC3339, content_type: String, // closed enum per protocol_version dimensions: Option<{ width: u32, height: u32 }>, + // display placeholder — image-derived, lives inside this encrypted sidecar (see Thumbnails — LQIP) + lqip: Option<{ chromahash: bytes, format_version: u16, dominant_color: [u8; 3] }>, + // collaborative metadata (see Collaborative Metadata below) tags_user: OR_set<(tag: String, add_id)>, tags_ai: OR_set<(tag: String, add_id, model_id: String, model_version: String)>, @@ -37,6 +32,9 @@ SidecarV1 { superseded_captions: Vec<{ value: String, written_by: device_id, ts: RFC3339 }>, // bounded ≤ 16 rating_lww: Option<{ value: u8, ts: RFC3339, by: device_id }>, + // organization — stack grouping; StackMembership shape owned by Asset Organization + stack_membership: Option<StackMembership>, + // identifiers (see Identifiers below; privacy-on-export rules apply) camera_id: Option<{ model: String, serial: String }>, device_id: UUIDv4, @@ -63,9 +61,23 @@ SidecarV1 { - The signature covers every byte including `_unknown`, so stripping unknown fields invalidates the signature and is detectable. - A schema bump is a coordinated change; per [Versioning — Album Protocol Version Pinning](/design/versioning/#album-protocol-version-pinning), an album's pinned protocol version constrains which sidecar schemas may be written into it.
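A reader can apply the field-0 rule before decoding anything else. The sketch below peeks at the first entry of a sidecar map and refuses any schema version it does not implement. It rests on assumptions the spec does not spell out: that sidecar fields are keyed by small unsigned integers, so key `0` is the first entry of a definite-length map. `peek_schema`, `read_head`, and `SchemaError` are illustrative names, not the `capsule-core::sidecar` API.

```rust
/// Why a sidecar was refused before full parsing.
#[derive(Debug, PartialEq)]
enum SchemaError {
    Malformed,
    Unsupported(u64),
}

/// The only schema version this hypothetical reader implements.
const SUPPORTED: u64 = 1;

/// Read one CBOR head: (major type, argument, bytes consumed).
fn read_head(buf: &[u8]) -> Option<(u8, u64, usize)> {
    let first = *buf.first()?;
    let (major, ai) = (first >> 5, first & 0x1f);
    let extra = match ai {
        0..=23 => return Some((major, ai as u64, 1)),
        24 => 1,
        25 => 2,
        26 => 4,
        27 => 8,
        _ => return None, // indefinite length or reserved: refuse
    };
    let bytes = buf.get(1..1 + extra)?;
    // Arguments are big-endian.
    let arg = bytes.iter().fold(0u64, |acc, b| (acc << 8) | *b as u64);
    Some((major, arg, 1 + extra))
}

/// Return the schema version, or refuse without touching later fields.
fn peek_schema(sidecar: &[u8]) -> Result<u64, SchemaError> {
    let (major, pairs, n) = read_head(sidecar).ok_or(SchemaError::Malformed)?;
    if major != 5 || pairs == 0 {
        return Err(SchemaError::Malformed); // not a non-empty map
    }
    let rest = &sidecar[n..];
    let (key_major, key, kn) = read_head(rest).ok_or(SchemaError::Malformed)?;
    if key_major != 0 || key != 0 {
        return Err(SchemaError::Malformed); // field 0 must come first
    }
    let (val_major, version, _) = read_head(&rest[kn..]).ok_or(SchemaError::Malformed)?;
    if val_major != 0 {
        return Err(SchemaError::Malformed);
    }
    if version != SUPPORTED {
        return Err(SchemaError::Unsupported(version));
    }
    Ok(version)
}
```

Returning `Unsupported` as a distinct error lets a client tell a newer schema it should leave untouched apart from bytes that are simply corrupt.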
+### Canonical CBOR Encoding + +The sidecar — and the [encrypted metadata blob](/design/cryptography/encryption/#metadata-encryption) whose plaintext is this same CBOR document — must serialize **byte-identically across every implementation and language**: the bytes are what the [signed manifest](/design/cryptography/provenance/#asset-manifest) and content hash commit to, so one divergent byte makes an honest sidecar look forged to another platform or [federated](/design/federation/) peer. The canonical rules are RFC 8949 §4.2 deterministic encoding, normative here: + +- **Definite-length encoding only** — no indefinite-length maps, arrays, text strings, or byte strings. +- **Shortest-form integers** — the smallest of the 1/2/4/8-byte encodings that represents the value. +- **Map keys sorted by the bytewise lexicographic order of their *encoded* form, with no duplicate keys.** This ordering governs *every* map, including `_unknown` — unknown keys are re-sorted into the same canonical order on write, so a round-trip through any conformant client is byte-stable and the signature (which covers `_unknown`) still verifies. +- **Floats** in the shortest IEEE-754 form (16/32/64-bit) that round-trips the value exactly; the canonical quiet NaN for NaN. Capsule avoids floats in signed structures where an integer or string suffices. +- **Field 0** (`sidecar_schema`) sorts first under the rule above, so a reader reads the schema version before parsing the rest. + +Every implementation — the Rust `capsule-core::sidecar` encoder and any FFI consumer — MUST emit identical bytes for the same document, enforced as a **blocking cross-language conformance gate** against shared **known-answer vectors** committed in `capsule-core::sidecar` (the same fixtures [Encryption](/design/cryptography/encryption/#metadata-blob-wire-format) tests against): a consumer that drifts cannot ship, because its signatures would not verify across peers. 
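To make the integer and map-key rules concrete, here is a minimal deterministic encoder covering only unsigned integers, text strings, and maps. It is a sketch under stated assumptions, not the `capsule-core::sidecar` encoder: `Value`, `head`, and `encode` are hypothetical names, and byte strings, arrays, floats, and negative integers are left out for brevity.

```rust
/// A tiny subset of the CBOR data model (illustrative only).
enum Value {
    Uint(u64),
    Text(String),
    Map(Vec<(Value, Value)>),
}

/// Write a CBOR head (major type + argument) in its shortest form.
fn head(major: u8, n: u64, out: &mut Vec<u8>) {
    let m = major << 5;
    if n < 24 {
        out.push(m | n as u8);
    } else if n <= u8::MAX as u64 {
        out.push(m | 24);
        out.push(n as u8);
    } else if n <= u16::MAX as u64 {
        out.push(m | 25);
        out.extend_from_slice(&(n as u16).to_be_bytes());
    } else if n <= u32::MAX as u64 {
        out.push(m | 26);
        out.extend_from_slice(&(n as u32).to_be_bytes());
    } else {
        out.push(m | 27);
        out.extend_from_slice(&n.to_be_bytes());
    }
}

/// Deterministically encode `v`, rejecting maps with duplicate keys.
fn encode(v: &Value, out: &mut Vec<u8>) -> Result<(), &'static str> {
    match v {
        Value::Uint(n) => head(0, *n, out),
        Value::Text(s) => {
            head(3, s.len() as u64, out);
            out.extend_from_slice(s.as_bytes());
        }
        Value::Map(pairs) => {
            // Encode every entry first: ordering is defined on the
            // *encoded* key bytes, not on the logical key values.
            let mut entries: Vec<(Vec<u8>, Vec<u8>)> = Vec::with_capacity(pairs.len());
            for (k, val) in pairs {
                let (mut kb, mut vb) = (Vec::new(), Vec::new());
                encode(k, &mut kb)?;
                encode(val, &mut vb)?;
                entries.push((kb, vb));
            }
            // Vec<u8> compares bytewise lexicographically.
            entries.sort_by(|a, b| a.0.cmp(&b.0));
            if entries.windows(2).any(|w| w[0].0 == w[1].0) {
                return Err("duplicate map key");
            }
            head(5, entries.len() as u64, out);
            for (kb, vb) in entries {
                out.extend_from_slice(&kb);
                out.extend_from_slice(&vb);
            }
        }
    }
    Ok(())
}
```

With this sketch, the map `{"b": 1, 0: 2}` encodes to the bytes `a2 00 02 61 62 01` whichever order the entries were inserted in: the integer key `0` encodes as `00` and therefore sorts ahead of the text key, which is the same mechanism that puts field 0 first.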
+ ### Add-id Binding -`add_id` is the tuple `(device_id: UUIDv4, monotonic_counter: u64)`, where `monotonic_counter` is incremented per-device per-(asset, OR-set) pair. Every OR-set add carries an `add_id`; every OR-set remove targets a specific `add_id`. A remove that names an `add_id` the receiver has never observed an add for is **rejected**, not silently no-op — preventing the "remove an element you never added" attack noted in the [Threat Model](/design/threat-model/). +`add_id` is the tuple `(device_id: UUIDv4, monotonic_counter: u64)`, where `monotonic_counter` is incremented per-device per-(asset, OR-set) pair. Every OR-set add carries an `add_id`; every OR-set remove targets a specific `add_id`. A remove that names an `add_id` the receiver has never observed an add for is **rejected**, not silently no-op — preventing the "remove an element you never added" attack noted in the [Threat Model](/design/threat-model/scenarios/). + +**Counter durability across restarts.** A `monotonic_counter` must never repeat for a given `(device_id, asset, OR-set)`: a reused `add_id` would alias two distinct adds, so removing one would silently delete the other and break OR-set convergence. The counter is persisted in the local [index](/design/filesystem/client/#desktop-library-layout), and on client restart or reinstall it is **reseeded to one past the maximum `add_id.counter` this device has ever issued**, recovered from the signed sidecars themselves (a device's own past `add_id`s are durably recorded in the sidecars it wrote). An add lost to a crash *before* its sidecar was persisted was never observed by any peer, so its counter may be safely reused — correctness depends only on never reusing a counter that ever reached a written sidecar. A counter is reset to zero only when the device can prove it has issued nothing — i.e. no sidecar bears its `device_id`. This makes the counter monotonic over the lifetime of a `device_id`, not merely within one process. 
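The binding and reseeding rules can be modeled in a few lines. The sketch below is an assumption-laden illustration, not the real sidecar code: `OrSet`, `AddId`, and `reseed` are hypothetical names, a `u128` stands in for the UUIDv4 `device_id`, and one `OrSet` value represents a single (asset, OR-set) pair.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Stand-in for a UUIDv4 device identifier.
type DeviceId = u128;

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct AddId {
    device: DeviceId,
    counter: u64,
}

/// One observed-remove set for a single (asset, OR-set) pair.
#[derive(Default)]
struct OrSet {
    /// Every add ever observed, including ones later removed.
    adds: BTreeMap<AddId, String>,
    /// Tombstones: add_ids that a remove has targeted.
    removed: BTreeSet<AddId>,
}

impl OrSet {
    fn add(&mut self, id: AddId, tag: &str) {
        self.adds.entry(id).or_insert_with(|| tag.to_string());
    }

    /// A remove naming an add_id never observed is rejected, not ignored.
    fn remove(&mut self, id: AddId) -> Result<(), &'static str> {
        if !self.adds.contains_key(&id) {
            return Err("remove targets an unobserved add_id");
        }
        self.removed.insert(id);
        Ok(())
    }

    /// Tags whose add has not been tombstoned.
    fn tags(&self) -> BTreeSet<&str> {
        self.adds
            .iter()
            .filter(|(id, _)| !self.removed.contains(*id))
            .map(|(_, tag)| tag.as_str())
            .collect()
    }

    /// Next counter for `device`: one past the highest it has issued,
    /// or zero when no recorded add bears its id.
    fn reseed(&self, device: DeviceId) -> u64 {
        self.adds
            .keys()
            .filter(|id| id.device == device)
            .map(|id| id.counter + 1)
            .max()
            .unwrap_or(0)
    }
}
```

Note that `reseed` scans every recorded add, tombstoned or not. Reseeding from live elements only would let a removed add's counter be issued again, which is exactly the aliasing the durability rule forbids.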
## Identifiers @@ -81,7 +93,7 @@ The identifiers above and several other metadata fields are **fingerprinting sur A boundary crossing is any of: -- A **share link** is generated for a non-member of the album. +- A **[share link](/design/share-links/)** is generated for a non-member of the album. - An **external backup** is exported to media the user will hand off (e.g. cloud storage shared with someone else, a physical drive given to a friend). - A **federated peer** outside the owning user's home server fetches the asset (see [Federation](/design/federation/)). @@ -92,7 +104,7 @@ When the boundary is crossed, the following fields are stripped from the exporte | Camera serial number | Stripped | Full value | | Device identifier (UUIDv4) | Stripped | Full value | | Session ID | Stripped | Full value | -| GPS coordinates | Truncated to city-level precision (~1 km) | Full precision | +| GPS coordinates | Rounded to 2 decimal places (≈1 km) | Full precision | | Personal contact tags (faces matched to a known person) | Stripped | Retained | Stripping happens at the moment of export — the encrypted sidecar inside the user's library is untouched, so the user does not lose the data locally. Retention opt-in is per-export, not a sticky account setting, to prevent foot-guns where a user opts in once and forgets. @@ -113,15 +125,17 @@ A plain LWW register loses one side of a tied edit silently — a real problem w - The losing value of every concurrent caption edit lands in `superseded_captions`, capped at 16 entries (oldest evicted). Each entry carries who wrote it and when, so the UI can surface a "this caption replaced another" hint and let the user restore the earlier value. - Ratings are unambiguous numerically; they do not need a superseded log. -This converts a silent-data-loss damage vector (a buggy client clobbering another device's edit) into an explicit, recoverable surface. 
See [Threat Model — Forbidden Client Behaviors](/design/threat-model/) for the corresponding rule that clients must never strip `superseded_captions`. +This converts a silent-data-loss damage vector (a buggy client clobbering another device's edit) into an explicit, recoverable surface. See [Threat Model — Forbidden Client Behaviors](/design/threat-model/schema-rules/#forbidden-client-behaviors) for the corresponding rule that clients must never strip `superseded_captions`. ### How Operations Travel We encrypt the **operations**, not the resulting state. Merges are then commutative and associative, so order of arrival does not matter and a peer replaying a stale operation cannot corrupt current state. The operation log reconciles into the canonical CBOR sidecar, which remains the source of truth (see [Core Principles](/design/principles/) — recovery-first). -Each operation carries the same `prior_provenance_hash` chain link as any [lifecycle action](/design/authorization/#asset-lifecycle), so a metadata-update is provenance-tracked exactly like a create or delete. +Each operation carries the same `prior_provenance_hash` chain link as any [lifecycle action](/design/authorization/#the-closed-action-set), so a metadata-update is provenance-tracked exactly like a create or delete. + +Album *membership* is deliberately **not** a CRDT here — it is driven by MLS proposals and commits (see [Cryptography — MLS](/design/cryptography/mls/)), which already resolve concurrent changes. -Album *membership* is deliberately **not** a CRDT here — it is driven by MLS proposals and commits (see [Group Membership](/design/cryptography/#group-membership)), which already resolve concurrent changes. 
+The same encrypted-operation path also carries the per-owner **library-settings document** — [smart-album](/design/organization/#system--smart-albums-views) definitions (predicate + display name) and similar client-authored organizational state — synced and merged across devices like any other collaborative metadata, and never legible to the server. (The [default-album](/design/organization/#the-default-album) *designation* is separate: a non-secret server-side owner pointer, not part of this encrypted document.) This LWW/OR-set approach is intentionally simpler than a full event-graph with state resolution: photo metadata does not need it, and the extra machinery would not be functionally justified. @@ -130,27 +144,33 @@ This LWW/OR-set approach is intentionally simpler than a full event-graph with s User tags and AI-suggested tags live in **structurally separate OR-sets** (`tags_user` and `tags_ai` in the [sidecar schema](#sidecar-schema-v1)). The separation is structural, not policy: - An AI tag can never overwrite a user tag and vice versa — they are different fields, so the question does not arise. A hallucinating model cannot pollute user intent. -- Every `tags_ai` entry carries `model_id` and `model_version` (see [ML Models](/design/ml-models/)). When the canonical model for that slot changes, AI tags from the old model are flagged as stale; cross-model semantic comparison is forbidden (see [Threat Model — Client-Side Validation Invariants](/design/threat-model/)). +- Every `tags_ai` entry carries `model_id` and `model_version` (see [AI — Embedding Provenance](/design/ai/#embedding-provenance)). When the canonical model for that slot changes, AI tags from the old model are flagged as stale; cross-model semantic comparison is forbidden (see [Threat Model — Client-Side Validation Invariants](/design/threat-model/validation/#client-side-validation-invariants)). 
- A user can **promote** an AI tag — explicit user action copies the entry to `tags_user` (with a fresh user-scoped `add_id`) and may optionally remove it from `tags_ai`. Promotion is a signed lifecycle operation; never automatic. - A user can **dismiss** an AI tag — an OR-set remove on `tags_ai` keyed by the original `add_id`. -The same dual-namespace structure applies to any future ML-derived metadata field that overlays a user-editable one (face labels, location guesses, etc.). The owner doc for the model is [ML Models](/design/ml-models/); the storage shape is owned here. +The same dual-namespace structure applies to any future ML-derived metadata field that overlays a user-editable one (face labels, location guesses, etc.). The owner doc for the model is [AI/ML Integrations](/design/ai/); the storage shape is owned here. ## Geolocation -Most modern camera devices record geolocation data. This is almost universally in **WGS-84 (Earth Coordinates)**. However, mapping data in China (perhaps there are also other countries) use obfuscated coordinates, namely: +GPS is stored canonically in **WGS-84** (`gps.lat` / `gps.lon`), the near-universal camera format. Some jurisdictions mandate obfuscated coordinates for display — notably China's **GCJ-02**, and Baidu's **BD-09** (a second obfuscation layer over GCJ-02). Capsule always stores WGS-84 and converts to the required system **deterministically and client-side** (in `capsule-core`) at plot time; the stored coordinate is never the obfuscated one. Per-platform map-provider selection is a client/deployment concern, not part of this schema. + +## Validation -- GCJ-02 (Mars Coordinates): The obfuscated coordinate system mandated by the Chinese government for national security. All authorized maps inside mainland China (AMap/Gaode, Tencent Maps, Apple Maps via AMap) use this. -- BD-09 (Baidu Coordinates): Baidu Maps takes GCJ-02 and applies a second layer of obfuscation. 
You only need to worry about this if you specifically use the Baidu Maps SDK. +The sidecar schema is the contract; validation focuses on serde determinism + CRDT correctness. -While annoying, we can translate WGS-84 coordinates into the obfuscated coordinates with a deterministic algorithm before plotting on maps. Capsule does this strictly on the client-side with the capability found in `capsule-core`. +- **Canonical CBOR conformance (unit + cross-language).** Encode a fixture sidecar (including a populated `_unknown` map); assert byte-identical output across runs, platforms, and every FFI consumer, matching the shared known-answer vectors for the [canonical ruleset](#canonical-cbor-encoding) — key sort including `_unknown`, shortest-form integers, definite-length only. Re-decode; assert structural equality. This is a **blocking conformance gate**, not advisory. +- **Add-id counter durability (unit).** Issue adds advancing the counter; drop the in-memory counter to simulate a restart/reinstall; reseed from the device's existing sidecars; assert the next `add_id.counter` is strictly greater than every counter the device previously issued — never a reuse. +- **Schema versioning enforcement (unit).** Construct a sidecar with `sidecar_schema = N+1`; load on a reader whose `max_known = N`; assert write-refusal. Construct with `sidecar_schema = N`; assert acceptance. +- **OR-set merge convergence (unit).** Generate add/remove operations from N devices in random order; merge in every permutation; assert byte-identical final state across permutations. +- **Add-id rejection (unit).** Issue a remove with an `add_id` never observed locally; assert rejection (not silent no-op). +- **LWW with superseded capture (unit).** Two devices write captions within milliseconds; merge; assert the winner is the lexicographic-tiebreak chosen, and the loser appears in `superseded_captions`. 
+- **Privacy-on-export stripping (unit).** Each row of the privacy table is a fixture test: assert the field is stripped by default, retained when opt-in is set, and that the local sidecar is unchanged either way. +- **Concurrent-edit reconciliation (smoke).** Two test clients edit the same album offline; merge over MLS; assert convergence with no manual conflict resolution needed. -### Mapping Providers +Cross-module case: metadata edited on device A → synced via server → applied on device B with correct CRDT merge. Bounded E2E surface in [Module Map](/design/module-map/#e2e-test-surface). -These are the recommended mapping providers for all scenarios: +## Related -- All Apple devices: Apple Maps (uses AMap data in China so it works globally) -- Web clients in China: AMap (Gaode) JavaScript API -- Web clients outside of China: Google Maps JavaScript API -- All non-Apple devices in China: AMap/Gaode (Tencent Maps is also fine but AMap has better support for geolocation and POI search) -- All non-Apple devices outside China: Google Maps (this is the most robust and developer-friendly provider). +- [Asset Organization](/design/organization/) — albums and stacks that consume the `stack_membership` field. +- [AI/ML Integrations](/design/ai/) — owner of the models behind `tags_ai` and the reserved AI-facet fields. +- [Thumbnails and Previews](/design/thumbnails/) — owner of the LQIP scheme carried in the `lqip` field. diff --git a/capsule-docs/src/content/docs/design/ml-models.md b/capsule-docs/src/content/docs/design/ml-models.md deleted file mode 100644 index 1d551d4..0000000 --- a/capsule-docs/src/content/docs/design/ml-models.md +++ /dev/null @@ -1,106 +0,0 @@ ---- -title: ML Models and Algorithms -description: The model inventory and key algorithmic implementations behind Capsule's AI features ---- - -This is the reference companion to [AI/ML Integrations](/design/ai/): the -concrete model chosen for each task, and the key algorithms that combine them. 
- -**This doc is the canonical model inventory.** Per the [single-source-of-truth rule](/design/principles/#single-source-of-truth), every ML model identity Capsule uses is declared here and referenced from other docs by link. Swapping a model is a one-row edit in the table below. - -> **Status:** The table below is **provisional** pending experimentation and field testing on Capsule's target devices in 2026. The doc *structure* — one canonical row per task with an explicit `model_id`/`model_version` — is the stable contract; the specific row choices are subject to revision and individual rows may be marked WIP or alt as the inventory matures. - -**E2EE constraint on embedding models.** Capsule's server never holds plaintext, so embeddings are generated client-side. Every device that ingests assets must therefore run the *same* embedding model — otherwise vectors aren't comparable across devices. The model size floor is set by the lowest-end device Capsule supports, not by what runs comfortably on a desktop. - -## Embedding Provenance - -Every embedding stored in Capsule — locally in the SQLite vector index, in an encrypted backup, or inside a [`DerivativeManifest`](/design/cryptography/#derivative-provenance) for an embedding-class derivative — carries the tuple `(model_id, model_version)` identifying which row of the table below produced it. Embeddings are not comparable across `(model_id, model_version)` pairs: the vector spaces are different. The invariant: - -- The vector index **refuses inserts** whose `model_id` is not the current canonical row for its task (the row marked `WIP (high priority)` or its successor). A buggy or new client uploading embeddings from an unrecognized model is rejected at the insert API, never silently mixed in. -- A model swap (a new row replacing an old one) increments `model_version` for that task. Old embeddings are **flagged as stale** and excluded from queries until they are regenerated from the originals. 
Cross-version semantic comparison is forbidden — see [Threat Model — Client-Side Validation Invariants](/design/threat-model/#client-side-validation-invariants). -- Regenerating embeddings after a model swap is a background task that walks the library and produces fresh embeddings at the new `model_version`. The old entries are removed only after the new ones are persisted (atomicity: per-asset replace, not a global truncate-and-rebuild). -- The mapping from `model_id` to a row in this table is what gives a swap its *single-doc-edit* property: changing the canonical model is a one-row edit here, the `model_id` string changes, and every downstream consumer follows. - -This invariant lives in [Threat Model — § Damage Scenario Map](/design/threat-model/#damage-scenario--invariant-map) row #14 and is what defeats the "silent invalidation of the vector index" damage class identified in the audit. - -## Specific ML Tasks & Models - - - - - - -| Task | Category | Model(s) | Dataset(s) | Function | Implementation Status | -| --------------------------------- | ---------------- | ------------------------------------------------------------------------------------------- | --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------- | -| **Semantic Search** | Natural Language | **MobileCLIP-B** (ONNX, INT8) — canonical; quantized SigLIP-tiny as fallback[^semantic-alt] | | Generates global image embeddings for natural language search. Sized for the lowest-end device Capsule supports (see the E2EE constraint above). | WIP (high priority) | -| **Dense Tagging & OCR** | Dense Tagging | Florence-2 | | Unified vision-language model for bounding boxes, dense captions, and reading text. 
| -| **VLM / Image Chat** | Natural Language | Qwen2.5-VL or LLaVA-1.6 | | Quantized models for on-demand conversational queries about an image. | -| **Image Captioning** | Natural Language | BLIP-2 | | Generates a natural language description of the image content. | -| **Face Detection** | People | SCRFD | | Highly efficient face bounding box and landmark detection. | WIP (high priority) | -| **Face Recognition** | People | InsightFace (AdaFace) | | Generates face embeddings. AdaFace excels at handling low-quality/dark images. | WIP (high priority) | -| **Person Detection** | People | YOLOv10 | | Object detection for identifying "person" bounding boxes. | -| **Person Re-ID** | People | OSNet or TorReID | | Generates embeddings based on clothing and body shape when faces are hidden. | -| **Expression Analysis** | People | EmotioNet | | Detects facial action units to infer emotions. | -| **Quality Scoring** | People | LIQE / TOPIQ | | Blind image quality assessment for noise, blur, and lighting without a reference image. | -| **Object Detection** | Scene | **YOLOv10**[^objdet-alt] | | Detects objects and background elements for dense tagging. | WIP (high priority) | -| **Scene Classification** | Scene | VIT-L, ConvNeXt-L | Places365, SUN397 | Classifies the overall scene (e.g., "beach", "wedding", "cityscape"). | -| **Landmark Detection** | Scene | DINOv2 + GeM pooling | Google Landmarks v2 | Detects key landmarks (e.g., Eiffel Tower, Golden Gate Bridge) for geotagging. | -| **Bird/plant Detection** | Scene | BioCLIP | iNaturalist 2021 | Identifies and classifies birds and plants within images. | -| **General Animal Detection** | Scene | YOLOv8 finetuned on Open Images Animals | Open Images Animals | Detects common animals (dogs, cats, horses) for tagging and search. | -| **OCR** | Text | TrOCR | SynthText, IIIT-5K | Extracts text from images, including handwriting and signage. 
| -| **Screenshot Detection** | Scene | Custom CNN classifier | | Identifies screenshots to help culling. | -| **Voice Transcription** | Audio | **Distil-Whisper-large**[^asr-alt] | | Speech recognition for generating transcripts from video audio tracks. ~6× faster than Whisper-large-v3 at ~1% WER cost — the trade is the right one for on-device transcription. | -| **Aesthetic Scoring** | Quality | NIMA (Efficientnet head) | AVA Dataset | Rates the aesthetic quality of images to help users find their best shots. | -| **Blur detection** | Quality | Laplacian variance + CNN regressor | DefocusNet, CUHK | Detect blurry images. | -| **Exposure Assessment** | Quality | Custom CNN regressor | Custom | Evaluates the exposure level of images to ensure optimal lighting conditions. | -| **Noise Estimation** | Quality | Custom CNN regressor | Custom | Estimates the noise level in images to help users identify and filter out noisy shots. | -| **Near-duplicate / burst** | Similarity | pHash/dHash + CNN | Custom | Same moment, slightly different | -| **Semantic new-duplicate** | Similarity | Embeddings from the canonical Semantic Search row + ANN | Custom | Same subject, different angle/day | -| **Best-shot selection** | Similarity | Quality models combined? | Custom | Select sharpest/best-exposed from burst | -| **Shot/scene boundary detection** | Video | TransNet v2, PyScene Detect | BBC Planet Earth, ClipShots | Segment video for thumbnail/highlights | -| **Highlight extraction** | Video | Temporal attention + quality scroe | SumMe, TVSum | Extract best moments from videos for highlights and thumbnails. | -| **Action/activity recognition** | Video | VideoMAE, TimeSformer | Kinetics-700, ActivityNet | Sports, cooking, playing, travel | -| **NSFW Detection** | Categorization | OpenCLIP or custom CNN | NSFW datasets | Detects explicit content to help users filter and manage sensitive media. 
| -| **Violence / Graphic Content** | Categorization | ViT classifier | Custom | Detects and flags sensitive content (e.g. in shared albums) | - -[^semantic-alt]: Considered and rejected: SigLIP-so400m (~400M params, impractical on the lowest-end mobile we support — the E2EE constraint forces every device to run the same model), full CLIP ViT-L/14 (similar size class), OpenCLIP ViT-G (much larger). MobileCLIP-B is the size sweet spot; quantized SigLIP-tiny stays as a fallback if MobileCLIP semantic quality is insufficient in field tests. -[^objdet-alt]: Considered and rejected for the *committed* slot: Grounding DINO (open-vocabulary; heavier; revisit if dense-tagging breadth becomes the bottleneck), RT-DETR (transformer-based; comparable accuracy, slower on mobile). YOLOv10 is the committed choice; alternatives may run as additional specialized passes later. -[^asr-alt]: Considered and rejected: Whisper-large-v3 (best accuracy but too slow on mobile for opportunistic background transcription), Whisper-medium (similar speed to Distil-Whisper-large but worse accuracy), faster-whisper CT2 ports (a runtime optimization layer; can be applied on top of Distil-Whisper). - -## Key Algorithmic Implementations - - - -### Video-as-Sparse-Photos Algorithm - -Processing every frame of a video through heavy ML models is computationally prohibitive. This algorithm treats video as a sparse collection of keyframes. - -1. **Cut Detection:** Use PySceneDetect (Content-Aware routing) to chunk the video into visually distinct scenes. -2. **Temporal Sampling:** Extract frames at the 10%, 50%, and 90% timestamps of each scene. -3. **Blur Rejection:** Calculate the variance of the Laplacian for each extracted frame: - - $$V = \text{var}(\nabla^2 I)$$ - -. If $V$ is below a defined threshold, the frame is too blurry and is discarded. -4. **Audio Processing:** Run the canonical ASR model (see the **Voice Transcription** row above) concurrently to generate a timestamped transcript. -5. 
**Integration:** The surviving keyframes are pushed into the standard image-processing queue. Database records map the keyframe embeddings to the parent `video_id` and specific timestamp. - -### The Re-ID & Pseudo-Labeling Loop - -This algorithm identifies individuals even when they turn away from the camera during an event. - -1. **The Anchor Pass:** When an image contains a high-confidence frontal face, run InsightFace. If the embedding matches a known profile (e.g., "Bride"), record the bounding box. -2. **The Body Pass:** Run a standard object detector (YOLOv10) to find all "person" bounding boxes. Pass these crops through OSNet to get a 512-dimensional body embedding. -3. **The Linking Phase:** Calculate the Intersection over Union (IoU) of the Face bounding box and the Body bounding box. If $\text{IoU} > 0.7$, link the OSNet body embedding to the "Bride" profile for the duration of this specific album/event. -4. **Pseudo-Labeling:** When an image features a person facing away (no face detected), compare the OSNet body embedding against the temporary event-specific body embeddings using cosine similarity: - - $$\text{sim}(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \|\mathbf{v}\|}$$ - -. If the similarity exceeds the threshold, tag the individual as the "Bride." - -### High-Dimensional Vector Search in Postgres - -To maintain high throughput in Postgres, exact K-Nearest Neighbors (KNN) is too slow for millions of rows. - -1. Implement **HNSW (Hierarchical Navigable Small World)** indexes on the `pgvector` columns. -2. Use the inner product operator (`<#>`) for normalized embeddings, as it is computationally cheaper than calculating $L_2$ distance (`<->`) or cosine distance (`<=>`) at scale. 
diff --git a/capsule-docs/src/content/docs/design/mls-resilience.md b/capsule-docs/src/content/docs/design/mls-resilience.md new file mode 100644 index 0000000..b377d78 --- /dev/null +++ b/capsule-docs/src/content/docs/design/mls-resilience.md @@ -0,0 +1,70 @@ +--- +title: MLS Resilience +description: How Capsule's MLS layer recovers from lost commits, state divergence, and group corruption +--- + +OpenMLS handles MLS (RFC 9420) correctly under normal operation — commits ordered by the group's chain, duplicates rejected, ratchet advanced atomically. But MLS can still hit scenarios the base protocol does not resolve on its own: a commit lost in transit, two devices proposing concurrently with the wrong ordering, a member whose local state has diverged from the server's. This doc owns Capsule's recovery contracts for those edge cases. + +It is kept **separate** from [Cryptography — MLS](/design/cryptography/mls/) (which owns the ciphersuite binding and the four standard ceremonies) because recovery is a distinct, cross-cutting concern — it reaches into the [OGK](/design/cryptography/keys/#owner-group-keys-ogks), backup, and quarantine UX, not the steady-state membership protocol. The recovery surfaces here are exercised in `capsule-core::crypto::mls` (the OpenMLS wrapper) and surface to users through quarantine + reconciliation UX in the native clients. + +## Failure Modes + +The MLS-layer scenarios that need defined recovery contracts. Each is a candidate damage scenario that the existing [scenario map](/design/threat-model/scenarios/#damage-scenario--invariant-map) does not currently address head-on: + +### Lost commit + +A device sends an MLS commit (e.g. an `Add` or AMK rotation) and the server never receives or persists it. The sending device believes it succeeded; other devices never see the new epoch. + +**Recovery direction:** the server's MLS commit chain is the source of truth. 
A device that doesn't see its committed epoch reflected in the chain within a detection timeout (default 30 s) treats the commit as lost and re-submits, backing off on each attempt (default 30 s → 2 min → 10 min). The commit chain provides idempotency — OpenMLS rejects a duplicate, so a retry that *did* land is harmless. After the backoff budget is exhausted (default 3 attempts) the membership change is surfaced to the user ("couldn't sync — will retry when connectivity returns"), never silently abandoned. + +### State divergence + +Two devices' local MLS state has diverged — different views of the group's current epoch, different write-tier key, different member list. This can happen after a buggy commit, an incomplete sync, or a long offline period. + +**Recovery direction:** the device with the older epoch reconciles by replaying every commit it missed from the server's chain. A device whose local state is *ahead* of the server — it holds a commit whose hash is **absent from the server's authoritative chain** (a local state-mutation bug, or a commit the server never persisted) — declares itself unreconcilable, discards its local group state, and **re-bootstraps in full**. Partial re-bootstrap is deliberately not attempted: MLS group state is small, so a clean full re-fetch is simpler to reason about than splicing suspect local state, and is the only path taken. + +### Concurrent commits with the wrong ordering + +OpenMLS handles ordinary concurrent commits — one wins, the other re-proposes. But a *concurrent AMK rotation* where two admins both rotate at the same epoch needs care: the second commit must observe the first's new write-tier key in its proposal envelope, or the resulting epoch carries two write-tier keys. + +**Recovery direction:** MLS's commit ordering serializes the two rotations; the losing rotation is **automatically re-proposed** against the resulting epoch — no user confirmation. 
The replay is deterministic and idempotent (it re-runs against fresh state and converges on one write-tier key per epoch), so prompting an admin on every concurrent rotation would add friction without adding safety. + +### Group re-keying ceremony + +A scheduled or admin-triggered re-keying of the entire album group (every member's leaf rotates; fresh AMK; fresh write-tier key). This is more invasive than a single member add/remove and may be needed periodically for long-lived albums or after a suspected compromise. + +**Recovery direction:** re-keying runs as a quiesce → commit-chain → resume ceremony, modeled on the [album upgrade ceremony](/design/versioning/#album-upgrade-ceremony) and sharing its crash-resume machinery (an `intent_id`-keyed, idempotent, resumable sequence). Every member's client processes the leaf-update chain as one logical operation; until it completes the album stays on the prior epoch, so a partial run never leaves two live write-tier keys. **Triggers:** admin-initiated, automatic after a suspected compromise, and optional scheduled rotation for long-lived albums (deployment policy). **The [OGK](/design/cryptography/keys/#owner-group-keys-ogks) is the recovery anchor:** if a re-keying stalls partway, any current owner-set member recovers the album's AMK lineage from the OGK escrow and re-drives a fresh, clean epoch out-of-band — the ceremony can always be completed or restarted without data loss. + +## Recovery Posture + +Across the failure modes above, Capsule's recovery posture is consistent: + +- **Server chain is authoritative.** Any local state inconsistency is reconciled by replaying the server's chain. The server cannot *forge* MLS state (it holds no MLS group secrets) but it can *order* commits. 
+- **Re-bootstrap is always available.** A device whose MLS state is unrecoverable can be removed and re-added by another device (the standard "Add new device" flow from [Cryptography — MLS](/design/cryptography/mls/#add-new-device-for-existing-member)). This is the bottom-of-stack recovery — losing local MLS state never loses access to the data, just to the in-flight ratchet. +- **Quarantine, not silent acceptance.** A device that detects local-vs-server state divergence surfaces it to the user (not silently re-bootstraps), so a divergence caused by a bug is visible and investigable. + +## Contract Skeleton + +Reconciliation is a **single entry-point**, not per-failure-mode calls: the caller asks "bring me current" and the outcome enum reports what happened, including the two cases that escalate to user action or re-bootstrap. + +```rust +// in capsule-core::crypto::mls +enum ReconcileOutcome { + UpToDate, + Reconciled { applied_commits: Vec }, + Diverged { local_epoch: u64, server_epoch: u64 }, // requires user action + Unrecoverable, // requires re-bootstrap +} + +fn reconcile_with_server(group: GroupId) -> Result; +fn rekey_group(group: GroupId, reason: RekeyReason) -> Result<(), MlsError>; +``` + +## Validation + +- **Lost-commit recovery (smoke).** Inject a network failure during a commit; assert the sending device's retry succeeds; assert idempotency (no duplicate epoch). +- **State-divergence detection (unit).** Construct a local MLS state that disagrees with a mocked server chain; assert detection; assert reconciliation produces the server-authoritative state. +- **Concurrent rotation (smoke).** Two admins rotate the same epoch; assert serialization; assert one rotation replays against the other's result. +- **Re-keying atomicity (smoke).** Inject a crash mid-rekey; assert the ceremony resumes on restart (similar to the [album upgrade ceremony](/design/versioning/#album-upgrade-ceremony) idempotency). 
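
As an illustration of the state-divergence rule under Failure Modes, the comparison between a device's commit history and the server's authoritative chain might look like the sketch below. It is a simplification under assumptions: `ChainCheck` and `classify` are hypothetical names (deliberately distinct from `ReconcileOutcome`), commits are represented only by 32-byte hashes in epoch order, and nothing here calls OpenMLS.

```rust
/// Hypothetical result of comparing local commit history with the server's
/// authoritative chain. Names are illustrative only.
#[derive(Debug, PartialEq, Eq)]
enum ChainCheck {
    /// Local history equals the server chain.
    UpToDate,
    /// Device is behind: number of server commits it still has to replay.
    Behind { missed: usize },
    /// Device holds a commit absent from the server chain: discard local
    /// group state and re-bootstrap in full.
    Rebootstrap,
}

/// `local` and `server` are commit hashes in epoch order, oldest first.
fn classify(local: &[[u8; 32]], server: &[[u8; 32]]) -> ChainCheck {
    // A longer local history, or any position where the hashes disagree,
    // means the device holds a commit the server chain does not contain.
    if local.len() > server.len() || local.iter().zip(server).any(|(l, s)| l != s) {
        return ChainCheck::Rebootstrap;
    }
    // Otherwise local is a prefix of the server chain.
    match server.len() - local.len() {
        0 => ChainCheck::UpToDate,
        missed => ChainCheck::Behind { missed },
    }
}
```

A pure function of this shape is convenient for the state-divergence unit test: the mocked server chain and the constructed local state are both just slices of hashes, so detection can be asserted without a network or a real group.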
+ +The relationship to [Threat Model](/design/threat-model/) is that several scenarios in the existing map (e.g. row #16 "attacker with all current keys") are upstream of this doc — MLS resilience is about recovering from honest failure, not adversarial attack. The two combine cleanly because both routes ultimately reduce to "re-bootstrap from a higher recovery path." diff --git a/capsule-docs/src/content/docs/design/moderation.md b/capsule-docs/src/content/docs/design/moderation.md new file mode 100644 index 0000000..8e884de --- /dev/null +++ b/capsule-docs/src/content/docs/design/moderation.md @@ -0,0 +1,76 @@ +--- +title: Moderation +description: Server moderation policy — reports, suspensions, takedowns, blocklists, federated reporting +--- + +Capsule is end-to-end encrypted, so a server **cannot** scan content it holds — server-side content or CSAM scanning is impossible by design, and no content scanner will be built. Moderation operates entirely on what *is* available: user reports, account-level signals, and federated peer reputation. + +Implementation will live in `capsule-api::moderation` (a new sub-crate or service inside `capsule-api`). The boundary surfaces — report submission, federated report exchange, blocklist publication — are the eventual contract; this doc captures what they will need to do. + +## What Moderation Cannot Do (Structural) + +Naming this up front is load-bearing: + +- **No content inspection.** The server holds opaque ciphertext. There is no algorithm that can act on the content of an asset without a key. +- **No retroactive content takedown.** Once a peer has fetched ciphertext, the home server cannot un-fetch it. Takedown is about *future* serving, not deletion-from-everywhere. +- **No silent operations.** Every moderation action that affects user data must produce a [provenance record](/design/cryptography/provenance/#provenance-of-library-modifications) the user can see in their audit log. 
+ +## What Moderation Can Do (Operational Hooks) + +The actual policy surfaces that need design: + +### Federated Reporting + +A report against `alice@other.tld`'s asset is routed to her home server's administrators, since they are the only party that can act on her account. Three mechanics are fixed: + +- **Authentication.** A federated report MUST be signed by the reporting server's [signing key](/design/federation/#server-identity-and-key-rotation) and is verified before it reaches the admin queue; an unsigned or invalid-signature report is dropped, never surfaced. This makes every report attributable — a server that submits false reports is itself identifiable and blockable. +- **Rate-limiting.** Reports are bounded per `(reporting_server, reported_user)`; exceeding the limit applies backpressure rather than amplifying. Together with signing, this defeats the false-flag / mass-report abuse vector (a flood of forged or spoofed reports against one user). +- **Content.** A report carries the alleged asset's **content hash and album pointer — never plaintext or decryption material**. This is the privacy-preserving, operable middle: the home-server admin can locate the asset and, *if* they already hold album access, fetch and view it to act; an admin without album access sees only opaque identifiers, exactly as the E2EE model requires. A report never widens who can read content. + +### Blocklists + +Server-level blocklists, plus per-user blocks that federate: + +- **Server-level blocklist.** A server admin publishes a list of peer servers that this server refuses to accept federated requests from. Operates at the [federation capability](/design/federation/#federation-capabilities) layer. +- **Per-user block.** A user can block another user; the block is enforced by the blocker's home server — the blocked user is removed from albums shared with the blocker and cannot share new albums with them. 
Removal is an ordinary MLS `Remove` + AMK epoch bump applied at the blocked user's next sync; the prior epochs' keys they already hold are not retroactively clawed back (consistent with [removal semantics](/design/cryptography/mls/#remove-user-charlie)). A per-user block is **scoped to that user**: it does **not** propagate as a server-wide federation block, so one user (or a coordinated group) cannot weaponize blocks to sever an entire peer server from the federation. Each home server enforces only its own users' blocks. +- **Blocklist exchange (v2).** A peer-level mechanism for sharing *server-level* blocklists across federated servers (so a malicious server isn't pure whack-a-mole) is **deferred to v2**, but its shape is fixed now: signed, versioned blocklist documents an admin **opts into** consuming from peers they already trust — never auto-applied, and deliberately distinct from per-user blocks (which never propagate). v1 ships only the manual server-level blocklist above. + +### Untrusted-Server Whitelist + +[Federation — Security Against Malicious Files](/design/federation/#security-against-malicious-files) names this as the front-line abuse control for content from servers Capsule does not trust. Moderation policy decides what "trusted" means and how trust is established/revoked. + +### Account Suspension + +A server admin can suspend a user account on their home server. Suspended accounts: + +- Cannot upload — `POST /upload` session creation is refused with a structured `403 AccountSuspended` code (distinct from quota and permission rejections, so the client surfaces the right remediation). +- Cannot share new albums (existing shares remain valid for the share-link TTL; revocation lists can revoke them). +- Cannot revoke other devices' sessions (a suspended account's `revoke_all_sessions` is refused — defends against compromised-account-as-DoS). + +The user's *data* is untouched — suspension is an access-level action, not a data-level one. 
Reversibility (a suspension can be lifted) is the default; permanent termination is a separate policy. + +### Takedown + +When a moderation action requires the *home server* to stop serving a specific asset (e.g. legal request, CSAM report verified by admin viewing in their album): + +- The asset is marked unservable on the home server (`served = false` in the index). +- Federated peers fetching the asset receive `410 Gone`. +- The asset's underlying blob is **not** deleted — the user owns the data, and a takedown is a serving constraint, not a destruction; the user can still restore from their own backup. A takedown is therefore **reversible by default** (an admin can lift it). A **legal-hold** variant marks the asset indefinitely unservable where law requires it — lifted only when the legal obligation ends, not at admin discretion — but even then never destroys the user's bytes: the constraint is on the *home server's serving*, not on the data the user holds. +- The takedown emits a **server-visible moderation provenance record** the user sees in their audit log — what was taken down, when, and (where policy permits) why — honoring the "[No silent operations](#what-moderation-cannot-do-structural)" rule. A user whose asset stops serving is never left to guess why, and the moderation action is itself auditable after the fact. + +## Federation Boundary + +Moderation crosses the federation boundary cleanly because [federation](/design/federation/) is pull-only and capability-gated. A blocked peer cannot pull; a takedown asset returns `410` to every pull. The moderation policy decisions don't require new federation primitives — they reuse the capability and revocation surfaces already there. + +## Appeals + +A suspended or taken-down user can appeal. 
The appeal is authenticated by **master-key proof** (the same mechanism as [global session revoke](/design/authentication/#explicit-revocation)) rather than a session token — the session may be the thing under dispute — and lands in the home-server admin queue. The admin's decision is itself a [moderation provenance record](/design/cryptography/provenance/#provenance-of-library-modifications) the user can see. Because suspension and takedown are reversible by default, a granted appeal simply lifts the constraint. + +## Validation + +- Federated report transport (smoke): send report from server A to server B; assert it reaches the admin queue with structured metadata. +- Blocklist enforcement (smoke): add a peer to the server-level blocklist; assert federation pulls from that peer are refused. +- Suspension enforcement (unit): a suspended account's upload session creation is rejected with the structured `403 AccountSuspended` code. +- Takedown serving (smoke): take down an asset; assert subsequent fetches return `410`; assert the underlying blob is preserved; assert a moderation provenance record is appended and visible in the user's audit log. +- Federated-report authentication (unit): submit a report signed by a valid peer key; assert it reaches the admin queue. Submit an unsigned / invalid-signature report; assert it is dropped. Exceed the per-`(reporting_server, reported_user)` rate limit; assert backpressure. +- Block scoping (unit): a per-user block removes the blocked user from the blocker's shared albums; assert it does **not** appear as a server-level federation block against the blocked user's home server. 
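The federated-report checks described under Federated Reporting (authenticate first, then rate-limit per `(reporting_server, reported_user)`) can be sketched roughly as follows. This is a minimal illustration, not the planned `capsule-api::moderation` code: `ReportGate`, `Admission`, `FederatedReport`, and `demo_verify` are hypothetical names, the verifier is a placeholder rather than real signature verification, and expiry of the rate-limit window is left out.

```rust
use std::collections::HashMap;

/// Outcome of admitting one federated report (illustrative names, not the real API).
#[derive(Debug, PartialEq, Eq)]
enum Admission {
    Queued,       // verified and within the limit: reaches the admin queue
    Dropped,      // unsigned or invalid signature: never surfaced
    Backpressure, // per-(reporting_server, reported_user) limit exceeded
}

/// A report carries identifiers only: content hash and album pointer,
/// never plaintext or decryption material.
#[allow(dead_code)]
struct FederatedReport {
    reporting_server: String,
    reported_user: String,
    content_hash: String,
    album_pointer: String,
    signature: Option<Vec<u8>>,
}

/// Placeholder verifier: NOT real cryptography. A real gate would verify the
/// signature against the reporting server's published signing key.
fn demo_verify(_report: &FederatedReport, signature: &[u8]) -> bool {
    signature == b"valid".as_slice()
}

struct ReportGate {
    verify: fn(&FederatedReport, &[u8]) -> bool,
    limit: u32, // max admitted reports per (reporting_server, reported_user)
    counts: HashMap<(String, String), u32>,
}

impl ReportGate {
    fn new(verify: fn(&FederatedReport, &[u8]) -> bool, limit: u32) -> Self {
        ReportGate { verify, limit, counts: HashMap::new() }
    }

    fn admit(&mut self, report: &FederatedReport) -> Admission {
        // 1. Authentication comes first, so unauthenticated traffic never
        //    consumes rate-limit budget or reaches the admin queue.
        let signature = match &report.signature {
            Some(bytes) => bytes,
            None => return Admission::Dropped,
        };
        if !(self.verify)(report, signature) {
            return Admission::Dropped;
        }
        // 2. Rate limit keyed by the (reporting_server, reported_user) pair.
        let key = (report.reporting_server.clone(), report.reported_user.clone());
        let count = self.counts.entry(key).or_insert(0);
        if *count >= self.limit {
            return Admission::Backpressure;
        }
        *count += 1;
        Admission::Queued
    }
}
```

Checking the signature before touching the counter means forged or unsigned reports can neither fill the admin queue nor exhaust an honest reporter's budget. A real gate would also reset or decay the counters per time window, which this sketch omits.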
diff --git a/capsule-docs/src/content/docs/design/module-map.md b/capsule-docs/src/content/docs/design/module-map.md new file mode 100644 index 0000000..f91b65e --- /dev/null +++ b/capsule-docs/src/content/docs/design/module-map.md @@ -0,0 +1,157 @@ +--- +title: Module Map +description: Index of every code module to its owning design doc and validation tier +--- + +This is the developer's first stop. It maps every Capsule workspace crate and module to the design doc(s) that govern its behavior, and to the validation tier (Unit / Smoke / E2E — see [Validation Tiers](/design/principles/#validation-tiers)) it ships with. The E2E test surface at the bottom is **bounded**: adding a test there means adding the test to the relevant doc's Validation section and justifying why the cross-module surface is irreducible. + +The mapping reflects the *design intent*. Some modules listed below are currently planned (annotated `(planned)`) rather than already implemented in the codebase — the doc structure already accounts for them so the boundary is set before code lands. 
+ +## Crate Roster + +| Crate | Purpose | +| ------------------------- | ----------------------------------------------------------------------------------------------------------------- | +| `capsule-core` | Shared logic across server and clients: cryptography, library layout, import pipeline, metadata, ML orchestration | +| `capsule-sdk` | Client SDK: auto-generated OpenAPI client, upload protocol, per-platform hardware-key + peering glue | +| `capsule-api` | Server entry-point + routing | +| `capsule-api-auth` | Authentication, sessions, OIDC, device directory | +| `capsule-api-library` | GraphQL API for UI queries (assets, albums, search) | +| `capsule-api-upload` | TUS-like resumable upload protocol server | +| `capsule-api-media` | Media serving (ciphertext blobs, public shares) | +| `capsule-api-sync` | gRPC sync API + federation | +| `capsule-api-service` | Higher-level service layer over the entity model (album, asset, friendship, passkey, stack, user, quota) | +| `capsule-api-entity` | Sea-ORM entities (Postgres schema) | +| `capsule-api-model` | Business-logic models on top of entities | +| `capsule-api-migration` | Sea-ORM migrations | +| `capsule-api-environment` | Configuration, env vars, feature flags | +| `capsule-api-testing` | Shared test utilities (testcontainer setup, schema fixtures) | +| `capsule-cli` | Command-line client | +| `capsule-media` | Standalone media utility crate | + +## Module → Design Doc + +### `capsule-core` + +| Module | Owning design doc | Validation tier | +| ----------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------- | --------------------------------------------- | +| `crypto::primitives` (planned) | [Cryptography — Primitives](/design/cryptography/primitives/) | Unit (RFC vectors) | +| `crypto::keys` (planned) | [Cryptography — Keys](/design/cryptography/keys/), [Device 
Enrollment](/design/device-enrollment/) | Unit + Smoke (hardware per-platform) | +| `crypto::mls` (planned) | [Cryptography — MLS](/design/cryptography/mls/), [MLS Resilience](/design/mls-resilience/) | Unit + Smoke (protocol round-trip) | +| `crypto::encryption` (planned) | [Cryptography — Encryption](/design/cryptography/encryption/) | Unit (KAT, round-trip) | +| `crypto::provenance` (planned) | [Cryptography — Provenance](/design/cryptography/provenance/) | Unit (exhaustive negative cases) + Smoke | +| `crypto::verify_asset` (planned) | [Cryptography — Write Authorization](/design/cryptography/keys/#write-authorization) | Unit (the single chokepoint; exhaustive) | +| `backup` (planned) | [Backup and Recovery](/design/backup-recovery/) | Unit + Smoke | +| `library::{init,open,rebuild,lock,paths,scrub,trash}` | [Filesystem — Client](/design/filesystem/client/), [Filesystem — Maintenance](/design/filesystem/maintenance/) | Unit + Smoke | +| `import::{scanner,planner,executor,plan,upload,group,progress,special}` | [Import — Pipeline](/design/import/pipeline/) | Unit (planner determinism) + Smoke (executor) | +| `metadata::{file,filter,types}` | [Metadata](/design/metadata/) | Unit (filtering) | +| `sidecar::*` | [Metadata — Sidecar Schema](/design/metadata/#sidecar-schema-v1) | Unit (serde determinism) | +| `exif::{extract,timezone}` | [Metadata](/design/metadata/) | Unit | +| `db::{driver,schema,rows}` | [Filesystem — Client](/design/filesystem/client/) | Unit (SQLite ops) | +| `domain::*` (enums) | [Organization](/design/organization/), [Authorization](/design/authorization/), [Metadata](/design/metadata/) | Unit (closed-enum rejection) | +| `models::*` | [Metadata](/design/metadata/), [Import — Pipeline](/design/import/pipeline/) | Unit | +| `ml` (planned) | [AI/ML Integrations](/design/ai/) | Unit + Smoke (inference parity per-platform) | +| `sharing` (planned) | [Share Links](/design/share-links/) | Unit | + +### `capsule-sdk` + +| Module | Owning design doc | 
Validation tier | +| ------------------------- | -------------------------------------------------------------------------------------------------- | ------------------------------------- | +| (auto-generated client) | [Clients](/design/clients/) | Smoke (re-generated; not unit-tested) | +| `upload` | [Import — Upload Protocol](/design/import/upload-protocol/) | Unit + Smoke (client side) | +| `peering` (planned) | [Peering](/design/peering/) | Unit + Smoke per platform | +| `hardware-keys` (planned) | [Cryptography — Keys](/design/cryptography/keys/), [Device Enrollment](/design/device-enrollment/) | Smoke per platform | + +### `capsule-api` (root + sub-crates) + +| Module | Owning design doc | Validation tier | +| ---------------------------------------------------- | ------------------------------------------------------------------------------------ | ------------------------------------------- | +| `capsule-api` (routing) | [Filesystem — Server](/design/filesystem/server/) | Smoke | +| `capsule-api-auth::{oidc,session,claims,roles}` | [Authentication](/design/authentication/), [Authorization](/design/authorization/) | Unit + Smoke (testcontainer Postgres/Redis) | +| `capsule-api-auth::devices` (planned for enrollment) | [Device Enrollment](/design/device-enrollment/) | Smoke | +| `capsule-api-library::schema::*` | [Metadata](/design/metadata/), [Organization](/design/organization/) | Smoke (GraphQL) | +| `capsule-api-library::loaders` | [Filesystem — Server](/design/filesystem/server/) | Unit (DataLoader) | +| `capsule-api-upload` | [Import — Upload Protocol](/design/import/upload-protocol/) | Unit + Smoke + 1 E2E | +| `capsule-api-media::routes` | [Filesystem — Server](/design/filesystem/server/), [Thumbnails](/design/thumbnails/) | Smoke | +| `capsule-api-media::shares` (planned) | [Share Links](/design/share-links/) | Unit + Smoke | +| `capsule-api-sync` (sync feed) | [Import — Download & Sync](/design/import/download-sync/) | Unit + Smoke + 1 E2E | +| 
`capsule-api-sync::federation` | [Federation](/design/federation/) | Unit + Smoke + 1 E2E | +| `capsule-api-service::album` | [Organization](/design/organization/) | Unit | +| `capsule-api-service::asset` | [Authorization](/design/authorization/), [Organization](/design/organization/) | Unit + Smoke | +| `capsule-api-service::quota` (planned) | [Quota](/design/quota/) | Unit | +| `capsule-api::moderation` (planned) | [Moderation](/design/moderation/) | Unit + Smoke | +| `capsule-api-entity::*` (Sea-ORM) | [Filesystem — Server](/design/filesystem/server/) | Unit (Sea-ORM CRUD) | +| `capsule-api-migration` | [Versioning](/design/versioning/) (forward-only migrations) | Smoke (migration run) | +| `capsule-api-environment` | (configuration; no design owner) | Unit | +| `capsule-api-testing` | (test utilities; no design owner) | n/a | + +### `capsule-cli`, `capsule-media` + +| Crate | Owning design doc | Validation tier | +| --------------- | ---------------------------------------------------- | --------------- | +| `capsule-cli` | [Clients](/design/clients/) (treats CLI as a client) | Smoke | +| `capsule-media` | (small utility crate; no specific design owner) | Unit | + +## Design Doc → Module (Reverse Lookup) + +Navigation from a design doc back to where the code lives. 
+ +| Design doc | Implementing modules | +| ------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- | +| [Principles](/design/principles/) | (meta — no specific code module) | +| [Cryptography — Primitives](/design/cryptography/primitives/) | `capsule-core::crypto::primitives` (planned) | +| [Cryptography — Keys](/design/cryptography/keys/) | `capsule-core::crypto::keys`, `capsule-sdk::hardware-keys` (both planned) | +| [Cryptography — MLS](/design/cryptography/mls/) | `capsule-core::crypto::mls` (planned, wraps OpenMLS) | +| [Cryptography — Encryption](/design/cryptography/encryption/) | `capsule-core::crypto::encryption` (planned) | +| [Cryptography — Provenance](/design/cryptography/provenance/) | `capsule-core::crypto::provenance` + `verify_asset` chokepoint (planned) | +| [Cryptography — Failure Modes](/design/cryptography/failure-modes/) | Cross-cutting: `capsule-core::backup`, `capsule-core::library`, `capsule-core::crypto::*` | +| [MLS Resilience](/design/mls-resilience/) | `capsule-core::crypto::mls` (extends main MLS module) | +| [Device Enrollment](/design/device-enrollment/) | `capsule-core::crypto::keys`, `capsule-api-auth::devices` | +| [Authentication](/design/authentication/) | `capsule-api-auth::{oidc,session,claims}` | +| [Authorization](/design/authorization/) | `capsule-api-auth::roles`, `capsule-core::crypto::provenance` (verify_asset) | +| [Clients](/design/clients/) | `capsule-sdk` + per-platform native code | +| [Versioning](/design/versioning/) | Cross-cutting: `capsule-api` (header enforcement), `capsule-core::crypto::mls` (upgrade ceremony), `capsule-api-migration` | +| [Backup and Recovery](/design/backup-recovery/) | `capsule-core::backup` (planned), `capsule-api-auth` (escrow surface) | +| [Metadata](/design/metadata/) | `capsule-core::{metadata,sidecar,exif}`, `capsule-api-library::schema` | +| 
[Filesystem — Server](/design/filesystem/server/) | `capsule-api`, `capsule-api-entity`, blob store glue | +| [Filesystem — Client](/design/filesystem/client/) | `capsule-core::{library,db}`, per-platform native code | +| [Filesystem — Maintenance](/design/filesystem/maintenance/) | `capsule-core::library::{scrub,rebuild,trash}`, server-side scrub in `capsule-api-upload` | +| [Import — Pipeline](/design/import/pipeline/) | `capsule-core::import::*` | +| [Import — Upload Protocol](/design/import/upload-protocol/) | `capsule-sdk::upload` (client) + `capsule-api-upload` (server) | +| [Import — Download & Sync](/design/import/download-sync/) | `capsule-sdk` (client) + `capsule-api-sync` (server) | +| [Federation](/design/federation/) | `capsule-api-sync::federation` | +| [Peering](/design/peering/) | `capsule-sdk::peering` (planned) + `capsule-core::backup` (artifact format) | +| [Organization](/design/organization/) | `capsule-core::domain::stack_type`, `capsule-api-service::{album,stack}` | +| [AI/ML Integrations](/design/ai/) | `capsule-core::ml` (planned), model registry + per-platform inference runners | +| [Thumbnails](/design/thumbnails/) | Client-side gen in `capsule-sdk` + serving in `capsule-api-media` | +| [Share Links](/design/share-links/) | `capsule-core::sharing` (planned), `capsule-api-media::shares` (planned) | +| [Moderation](/design/moderation/) | `capsule-api::moderation` (planned) | +| [Quota](/design/quota/) | `capsule-api-service::quota` (planned) | +| [Threat Model](/design/threat-model/) | Enforced across every validation chokepoint: `capsule-core::crypto::verify_asset` (client), `capsule-api` validators (server) | +| [Threat Model — Scenarios](/design/threat-model/scenarios/) | (catalog; each row maps to the owner doc's module) | +| [Threat Model — Validation](/design/threat-model/validation/) | `capsule-api` envelope checks (server-side), `capsule-core::crypto::verify_asset` (client-side) | +| [Threat Model — Schema 
Rules](/design/threat-model/schema-rules/) | `capsule-core::crypto` decoders + `capsule-api` validators (closed-enum + Postel asymmetry) | + +## E2E Test Surface + +The bounded global list of cross-module integration tests. Editing this list requires updating the relevant doc's Validation section. **Adding an E2E case past this list is a signal the design has unwanted coupling worth examining** before adding the test. + +Target count: ≤ 12 cases. Each one is named by what it proves — not "test X" but "X works through Y and Z." + +1. **Auth → Library query.** Log in via OIDC → access-token → GraphQL query for own albums returns expected list. Covers `capsule-api-auth::oidc + session` × `capsule-api-library::schema`. +2. **Full import + upload + finalize.** Local scan → plan → execute → upload session → finalize → blob present at `blobs/{hash}` + index row marked uploaded. Covers `capsule-core::import` × `capsule-sdk::upload` × `capsule-api-upload` × `capsule-api-entity`. +3. **Sync feed pickup.** Upload from device A → device B's `/sync` advances → device B fetches metadata blob and (per scope) the original. Covers `capsule-api-sync` × `capsule-sdk` download path × `capsule-core::library` write. +4. **Federation cross-server pull.** Alice on `home.tld` shares to Bob on `other.tld` → capability token → Bob's server pulls metadata + blobs → Bob's client renders. Covers `capsule-api-sync::federation` (both sides) × `capsule-api-auth` (capability issue). +5. **LAN peering A→B.** Two devices on the same LAN; mDNS discovery → TLS handshake → delta-scoped artifact → restore on receiver → byte-equal libraries. Covers `capsule-sdk::peering` × `capsule-core::backup` × `capsule-core::library`. +6. **Backup → restore on a fresh device.** Export full backup → bootstrap new device via passphrase + escrow → import backup → assert every asset present and verifiable. Covers `capsule-core::backup` × `capsule-core::crypto::keys` × `capsule-core::library`. +7. 
**Full lifecycle.** Create → metadata-update → trash → restore → re-delete → hard-purge after retention. Provenance chain advances through every transition; server refuses purge before `retention_until`. Covers `capsule-api-auth::roles` × `capsule-core::crypto::provenance` × server purge worker. +8. **Album upgrade ceremony.** Multi-member album; admin initiates upgrade → quiesce → drain → tombstone → fork → queued writes replay. Includes one resume-from-crash mid-ceremony. Covers `capsule-core::crypto::mls` × `capsule-api` × client UI. +9. **Cross-version protocol gate.** Client with `protocol_version` outside server's range attempts upload; receives `426`; UI surfaces actionable error. Covers `capsule-api` handshake × `capsule-sdk` error handling. +10. **Model regen after version bump.** Bump canonical model version; assert stale embeddings excluded from queries; background regen produces fresh embeddings; queries return correct results post-regen. Covers `capsule-core::ml` × `capsule-core::db` vector index. +11. **Server crash mid-finalization.** Inject crash between blob rename and Postgres transaction commit; restart; assert session moves to `FailedProcessing` cleanly, no orphaned blob, no zombie pending row. Covers `capsule-api-upload` × `capsule-api-entity` × `capsule-api`'s startup scrub. +12. **Cross-device enrollment.** Existing device A authorizes new device B over a verified channel (enrollment code + safety-code check) → B generates hardware keys → A cross-signs B into the device directory → B joins each album's MLS group → B's library matches A's. Includes one MITM-on-relay abort. Covers `capsule-api-auth::devices` × `capsule-core::crypto::keys` × `capsule-sdk::hardware-keys`. + +## Using This Map + +- **When implementing a module:** find it in [Module → Design Doc](#module--design-doc), open the owning doc, read the contracts and the validation tier expectations. 
The unit + smoke surface defined in that doc should be authorable without leaving the module. +- **When adding a feature:** find the relevant design doc via the [reverse lookup](#design-doc--module-reverse-lookup); confirm the feature fits within an existing module's scope or warrants a new one. If new, add a row here. +- **When considering an E2E test:** check this list first. If your proposed test isn't here, either it's an existing case in disguise (use that), or the design has cross-module coupling worth surfacing — discuss before adding. diff --git a/capsule-docs/src/content/docs/design/organization.md b/capsule-docs/src/content/docs/design/organization.md index 160bb15..edace36 100644 --- a/capsule-docs/src/content/docs/design/organization.md +++ b/capsule-docs/src/content/docs/design/organization.md @@ -1,56 +1,113 @@ --- title: Asset Organization -description: Details on how assets are organized and grouped in Capsule +description: Albums (container and view), default-album resolution, asset stacks, and trash retention --- -## Keywords +**Albums** are Capsule's organizational backbone: [container albums](#container-albums) are the cryptographic unit every asset belongs to, while [view albums](#system--smart-albums-views) are derived, key-free presentations. On top of albums, **stacks** group related files (RAW+JPEG pairs, bursts, live photos) so a library stays tidy, and **trash** stages every destructive operation behind a signed retention window so a buggy or hostile actor cannot silently destroy data. Stacks and trash are metadata-only — they never touch the underlying asset bytes. -- [Albums and Collections](#albums-and-collections): Organize your media into albums and collections for easy browsing and sharing. -- [Asset Stacking](#asset-stacking): Group related files (e.g., RAW+JPEG pairs, burst photos, video chapters) into a single "stack" to keep your library organized. 
+Implemented across `capsule-core::domain::stack_type` (stack-type enums), `capsule-core::library` (default-album resolution and client-side view evaluation), the metadata sidecar layer for `stack_membership` (see [Metadata](/design/metadata/)), the signed `delete`-manifest envelope for `retention_until`, and the service layer in `capsule-api-service::album`/`stack` for server-side enforcement. The retention contract — the `retention_until` field signed into the `delete` manifest — is the load-bearing piece that prevents a hostile server from accelerating purges. -## Albums and Collections +## Albums + +The UI calls two different things "albums," and the design keeps them strictly separate: + +- **[Container albums](#container-albums)** — the real cryptographic unit. Every asset belongs to exactly one. +- **[View albums](#system--smart-albums-views)** — derived, key-free presentations computed client-side. They hold no keys and own no assets. + +### Container Albums + +A container album is Capsule's primary organizational unit and its primary **sharing and access-control boundary**. An album *is* an MLS group: its cryptographic identity (the per-epoch [AMK](/design/cryptography/keys/#album-master-keys-amks)) and membership operations are owned by [Cryptography — Keys](/design/cryptography/keys/) and [MLS](/design/cryptography/mls/), and its server-side storage shape (rows, blob references, `protocol_version` pin) lives in the [Filesystem — Server](/design/filesystem/server/) Postgres schema. This section owns the *interaction surface* over that machinery. + +- **Membership and roles.** Each member holds one of the album's three capabilities — read (AMK only), write (AMK + write-tier key), or admin (also the admin-tier key) — delivered over MLS to that member's devices ([Keys — Album Master Keys](/design/cryptography/keys/#album-master-keys-amks)). A role change is an MLS commit and bumps the AMK epoch. 
+- **Invitation and join.** An admin invites a user by fetching and verifying their [device directory](/design/cryptography/keys/#device-directory) and issuing an MLS `Add` for all their devices; the `Welcome` delivers the AMK range set by the album's `history_policy` ([MLS — History Delivery](/design/cryptography/mls/#history-delivery-for-new-joiners)). Inviting a user on another home server also issues a [federation capability](/design/federation/#federation-capabilities); inviting a non-account recipient uses a [share link](/design/share-links/). Joining is acceptance of the `Welcome`; leaving or removal is an MLS `Remove` + epoch bump. +- **Album-level policy** — `history_policy`, the `protocol_version` pin, and the default `retention_until` — is fixed at creation and changed only through an [album upgrade ceremony](/design/versioning/#album-upgrade-ceremony), never ad hoc. + +Dialog copy and on-screen presentation remain a client-UX detail. + +### The Default Album + +A container album must be explicitly created, but a brand-new account has none — so an import would have nowhere to land. Capsule guarantees a **default album**: a de facto, nameless container that exists for every owner from [first-device enrollment](/design/device-enrollment/#first-device-enrollment) onward and receives any import the user does not file elsewhere. + +- **De facto and nameless.** It is an ordinary container album in every cryptographic and lifecycle respect — its own MLS group, random per-epoch AMK, `history_policy`, `protocol_version` pin, retention — but carries no user-assigned name; a client typically surfaces it as the library's primary view. +- **Specially identified.** Its album ID is **derived deterministically from the account master key** (the master key derives the *identifier*, not any key — see [Keys — Key Chain](/design/cryptography/keys/#key-chain)). 
The ID is therefore unique per user, unguessable before creation, and recomputable on any of the user's devices and after recovery — so a device can locate the default album from the master key alone, without waiting on a synced pointer. +- **Designation is a server-side owner pointer.** Which container is *currently* the default is a non-secret `default_album_id` on the owner record ([Filesystem — Server](/design/filesystem/server/#ownership-partitioning-and-quota)), defaulting to the derived de facto album. The pointer is not security-bearing — a write still requires real album write capability ([server-side invariants](/design/threat-model/validation/#server-side-validation-invariants), invariant 6). +- **One or more defaults, context-driven.** A client may register **scope overrides** — `(scope → album)` mappings that re-point the default for a context (a per-source auto-import mapping; "while viewing album X, new photos default to X"). The resolution rule, `resolve_default_album(context)`, returns the active scope's override if set, else the owner pointer, else the derived de facto album. It **always** resolves to a container — a [view](#system--smart-albums-views) can never be an import destination. The [import planner](/design/import/pipeline/#plan--confirm) consumes this when the user picks no album. +- **Stable.** Re-designating the default just moves the pointer. The current default **cannot be deleted while designated** — the user must repoint first, or the client recreates the derived de facto album — so import always has a home. + +### System & Smart Albums (Views) + +View albums are organizational surfaces computed entirely client-side over the assets the user can already decrypt (the union of their container-album memberships), materialized by querying the [local index](/design/filesystem/client/#local-index-staleness). 
A view is **not** an MLS group, holds **no** AMK, **owns no assets**, and is **not** a sharing or access-control boundary — sharing happens only at the container tier. Two kinds: + +- **System albums** — built-in and implicit. The canonical one is **All** — every asset the user can see; because that is the union over their containers, every asset appears in it (which is exactly why the [default album](#the-default-album) matters: an import always enters *some* container and so shows up in All). [Trash](#recycling) is another system view, over lifecycle state. +- **Smart / dynamic albums** — user-defined filtered views whose membership is a predicate over sidecar fields and AI-derived attributes ([Metadata](/design/metadata/#sidecar-schema-v1), [AI](/design/ai/)). Membership is **computed**, never stored: editing a smart album, or an asset's attributes, never moves or re-encrypts an asset. A definition (predicate + display name) is user content — stored in a client-side, E2E-encrypted document synced across the user's devices with the same [CRDT semantics](/design/metadata/#collaborative-metadata) as other collaborative metadata, so the server never learns it. ## Asset Stacking -In large media collections, it’s common for related files to belong together. Instead of cluttering your library with dozens of nearly identical files, Capsule "stacks" them into a single unit. +Related files often belong together — RAW+JPEG pairs, bursts, a video and its external audio track. Rather than clutter the library with near-identical entries, Capsule groups them into one stack via best-effort auto-detection. + +**Stacking is metadata-only.** A stack edit modifies the `stack_membership` field of each member asset's sidecar and emits a `metadata-update` provenance record per affected asset. It **never** deletes, rewrites, or merges the underlying asset bytes — even a "best photo" choice within a burst is just the `role = primary` pointer in metadata, not a destructive operation. 
A buggy or malicious stack edit therefore cannot lose original bytes. The full atomicity rule (stage all `.tmp` files, rename together, discard on any rename failure) lives in [Filesystem — Atomic Writes](/design/filesystem/maintenance/#atomic-writes-and-crash-recovery) and [Threat Model — Atomicity Invariants](/design/threat-model/validation/#atomicity-invariants). + +### Stack Membership Schema + +The `stack_membership` field on each member sidecar carries: -You’ve likely seen this in action before—think of how photo apps group RAW+JPG pairs or how video editors sync external audio with camera footage. Capsule uses a "best-effort" auto-detection system to identify these relationships and keep your workspace clean. +```rust +StackMembership { + stack_id: UUIDv7, + stack_type: StackType, // closed enum, below + role: StackRole, // primary | member | proxy + member_index: Option, // ordering within the stack (burst sequence, video chapter index) +} +``` -**Stacking is metadata-only.** A stack edit modifies the `stack_membership` field of each member asset's sidecar and emits a `metadata-update` provenance record per affected asset. It **never** deletes, rewrites, or merges the underlying asset bytes — even a "best photo" choice within a burst is a pointer in metadata, not a destructive operation. A buggy or malicious stack edit therefore cannot lose original bytes. The full atomicity rule (stage all `.tmp` files, rename together, discard on any rename failure) lives in [Filesystem — Atomic Writes and Crash Recovery](/design/filesystem/#atomic-writes-and-crash-recovery) and [Threat Model — Atomicity Invariants](/design/threat-model/#atomicity-invariants). +`stack_type` is a closed enum per `protocol_version` — adding a new stack type bumps the version. Old albums never see the new type. -### Photography & Mobile Stacks +### Stack Types -* **RAW + JPEG Pairs:** The classic "prosumer" stack. 
We treat the uncompressed RAW file and the processed JPEG as one asset to keep your grid tidy. +**Photography & Mobile Stacks** + +* **RAW + JPEG Pairs:** The classic "prosumer" stack. The uncompressed RAW and the processed JPEG are treated as one asset to keep the grid tidy. * **Burst Stacks:** A sequence of high-speed stills (e.g., 10–30 fps). The app identifies a "Best Photo" and tucks the rest behind it. * **Live Photos:** A JPEG or HEIC paired with a 1.5–3 second video clip, managed as a single interactive unit. -* **Portrait/Depth Stacks:** An image paired with its depth map. This allows you to adjust the bokeh (background blur) after the shot is taken. -* **Smart Selection:** AI-driven grouping of visually similar images taken within seconds of each other to reduce "clutter." +* **Portrait/Depth Stacks:** An image paired with its depth map. Enables adjusting bokeh after the shot is taken. +* **Smart Selection:** AI-driven grouping of visually similar images taken within seconds of each other. -### Technical & Creative Stacks +**Technical & Creative Stacks** -* **Exposure Bracketing (HDR):** Multiple shots of the same scene at different exposure levels (e.g., -2, 0, +2 EV) to be merged into a single High Dynamic Range image. +* **Exposure Bracketing (HDR):** Multiple shots of the same scene at different exposure levels (e.g., -2, 0, +2 EV) to be merged into a single HDR image. * **Focus Stacks:** A series of shots with shifting focus points. Often used in macro photography to create "infinite" depth of field. -* **Pixel Shift Stacks:** Found in high-end mirrorless cameras. The sensor moves slightly to capture multiple shots, which are stacked for ultra-high resolution and perfect color. +* **Pixel Shift Stacks:** Found in high-end mirrorless cameras. The sensor moves slightly to capture multiple shots, stacked for ultra-high resolution and perfect color. 
* **Panorama (Stitched):** A sequence of horizontal or vertical shots intended to be merged into a single wide-field image. -### Video & Audio Stacks +**Video & Audio Stacks** * **Proxy/Optimized Stacks:** Pairs a heavy "Master" file (like 8K RAW) with a lightweight "Proxy" (like 1080p ProRes) for smoother editing performance. -* **Chaptered Video:** Action cameras (like GoPro) often split long recordings into 4GB chunks. We stack files like `GOPR001.mp4` and `GOPR002.mp4` so they appear as one continuous video. +* **Chaptered Video:** Action cameras (like GoPro) often split long recordings into 4GB chunks. Files like `GOPR001.mp4` and `GOPR002.mp4` are stacked so they appear as one continuous video. * **Dual-System Audio:** Groups video files with high-quality external audio (WAV/AIFF) using timecode or waveform matching. ## Recycling When you delete an asset, it defaults to trash (i.e. soft delete). On sync, new items in trash are essentially a metadata update rather than removal. A true "delete" operation is only performed when the user explicitly empties the trash, the asset has been in the trash for its full retention period, or the user requests immediate deletion. -For consistency, deletion of assets is functionally similar to addition and modification of assets. See [Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications) and [Authorization — The Closed Action Set](/design/authorization/#the-closed-action-set). +For consistency, deletion of assets is functionally similar to addition and modification of assets. See [Cryptography — Provenance](/design/cryptography/provenance/#provenance-of-library-modifications) and [Authorization — The Closed Action Set](/design/authorization/#the-closed-action-set). ### Retention Window -The trash retention window is **signed into the `delete` manifest at delete time** as the `retention_until` field — not server-configured at purge time. 
The default is 30 days; the user can extend it per delete or per album-policy. Because the retention is part of the signed manifest: +The trash retention window is **signed into the `delete` manifest at delete time** as the `retention_until` field — not server-configured at purge time. It lives in the manifest's **server-visible envelope** (like `action` and `prior_provenance_hash`), so the keyless purge worker reads and enforces it without any decryption key, comparing it against the server's own [trusted clock](/design/filesystem/server/#postgresql-what-the-server-knows). The default is 30 days; the user can extend it per delete or per album policy. Because retention is part of the signed manifest: - The server **cannot accelerate** a purge by changing a server-side config — the cryptographic floor on retention is the signed manifest's `retention_until`. A hard purge before that timestamp is rejected (the server's purge worker reads `retention_until` from the manifest, not from a local policy). - The server **cannot delay** a purge beyond an order issued by a `trash-restore` or a signed shorter-retention re-issue — the user remains in control. - A `trash-restore` action issued before `retention_until` recovers the asset, appends a new provenance record, and rewinds the local lifecycle state. The original delete manifest is **not removed** from the provenance chain — it remains as a record of "this was deleted on date X and restored on date Y." This addresses the damage scenario where a hostile server unilaterally accelerates a purge to delete an asset the user expected to be recoverable, as well as the scenario where a buggy server retains data past the user's chosen window. + +## Validation + +- **Stack edit metadata-only (unit).** Build a stack edit; assert no asset bytes are touched on disk; only sidecars and provenance records are modified. 
+- **Stack edit atomicity (unit).** Inject a rename failure mid-bundle; assert all staged `.tmp` files are discarded and on-disk state reflects no partial stack. +- **Closed stack-type enum rejection (unit).** Set `stack_type = "future-stack-type"`; assert structural rejection at the sidecar validator. +- **Retention-window honor (smoke).** Issue a `delete` with `retention_until = now + 30d`. Mock the server clock to `now + 15d`; assert purge worker refuses. Move to `now + 31d`; assert purge proceeds. +- **Trash-restore round-trip (smoke).** Delete → restore → assert asset reappears in live set, provenance chain has delete + restore records, original delete record is preserved. +- **Hostile-server purge defense (smoke).** Mock a server that attempts purge before `retention_until`; assert the purge worker (running the no-key envelope check) refuses. + +The cross-module case — full lifecycle including stack creation, member edit, soft delete, restore, and final hard purge — is one bounded E2E case in [Module Map](/design/module-map/#e2e-test-surface). diff --git a/capsule-docs/src/content/docs/design/peering.md b/capsule-docs/src/content/docs/design/peering.md index 649f620..d7e3951 100644 --- a/capsule-docs/src/content/docs/design/peering.md +++ b/capsule-docs/src/content/docs/design/peering.md @@ -1,164 +1,88 @@ --- title: Peering -description: How Capsule implements peering for direct device-to-device communication and synchronization +description: Direct LAN device-to-device sync within a single user's own devices --- -Peering is **device-to-device** sync within a single user's own devices. It is -distinct from [Federation](/design/federation/), which is server-to-server -sharing across *different* users. +Peering is **device-to-device** sync within a single user's own devices. It is distinct from [Federation](/design/federation/), which is server-to-server sharing across *different* users. 
-Peering exists as an **accelerator, never a replacement** for normal -[server synchronization](/design/import-synchronization/#synchronization). It -earns its place in three situations: +Peering exists as an **accelerator, never a replacement** for normal [server synchronization](/design/import/). It earns its place in three situations: -- **LAN-speed transfer.** Two of a user's devices on the same network can move - a freshly imported asset directly, instead of round-tripping every byte - through the server and the internet. -- **Offline operation.** When the server is unreachable, devices on a shared - LAN still converge. This satisfies the - [offline/online divide](/design/principles/) — peering works fully offline. -- **Best-effort opportunism.** If no peer is found, peering simply does nothing - and the device falls back to server sync. Nothing depends on it succeeding. +- **LAN-speed transfer.** Two of a user's devices on the same network can move a freshly imported asset directly, instead of round-tripping every byte through the server and the internet. +- **Offline operation.** When the server is unreachable, devices on a shared LAN still converge. This satisfies the [offline/online divide](/design/principles/) — peering works fully offline. +- **Best-effort opportunism.** If no peer is found, peering simply does nothing and the device falls back to server sync. Nothing depends on it succeeding. + +Peering is the one module here that lives entirely on the client. It is implemented in `capsule-sdk::peering` (discovery, channel, transfer) over `capsule-core::backup` (the artifact format it ingests). The three contract surfaces — mDNS descriptor, TLS handshake parameters, delta-fetch protocol — are the only new primitives peering introduces; everything else is borrowed. 
## Peering Reuses, Not Reinvents -Peering deliberately introduces **no new payload format and no new sync -engine** — the same discipline [Federation](/design/federation/#federation-reuses-existing-primitives) -applies. The unit of transfer is a delta-scoped -[backup artifact](/design/backup-recovery/#backup-artifact): a self-describing, -versioned, encrypted, content-addressed blob that already exists for -[Backup and Recovery](/design/backup-recovery/). +Peering deliberately introduces **no new payload format and no new sync engine** — the same discipline [Federation](/design/federation/#federation-reuses-existing-primitives) applies. The unit of transfer is a delta-scoped [backup artifact](/design/backup-recovery/#backup-artifact): a self-describing, versioned, encrypted, content-addressed blob that already exists for [Backup and Recovery](/design/backup-recovery/). -The receiving device ingests that artifact through the **same restore path** it -would use for any backup. Peering therefore owns only two things of its own — a -LAN **discovery** mechanism and a **transport**. Everything else (what an asset -is, how it is encrypted, how it is verified, what "changed" means) is borrowed -from designs that already exist and are already audited. Fewer moving parts -means a smaller blast radius and far less code unique to peering. +The receiving device ingests that artifact through the **same restore path** it would use for any backup. Peering therefore owns only two things of its own — a LAN **discovery** mechanism and a **transport**. Everything else (what an asset is, how it is encrypted, how it is verified, what "changed" means) is borrowed from designs that already exist and are already audited. Fewer moving parts means a smaller blast radius and far less code unique to peering. ## Trust Model -Federation assumes [a remote server is hostile](/design/federation/#threat-model). 
-Peering does not: both endpoints are the *same user's* devices, each holding a -hardware-bound DSK cross-signed into that user's -[device directory](/design/cryptography/#per-user-device-coordination). A peer -is accepted only after a mutual hybrid-signature check confirms both devices -chain to the same User IK. - -Identity-trusted is **not** content-trusted, however. A device can still be -buggy, or compromised at the application layer above its hardware keys. So -peering keeps Federation's posture toward *data*: every received asset is -re-verified — its [ciphertext content hash](/design/cryptography/#primitives-inventory) -recomputed, its [STREAM tags](/design/cryptography/#stream-construction) checked, -its [asset manifest](/design/cryptography/#provenance-and-signed-manifest) -run through the single [`verify_asset`](/design/cryptography/#write-authorization) -chokepoint. The channel authenticates *who* you are talking to; it never -exempts *what* they send from validation. +Federation assumes [a remote server is hostile](/design/federation/#threat-model). Peering does not: both endpoints are the *same user's* devices, each holding a hardware-bound DSK cross-signed into that user's [device directory](/design/cryptography/keys/#device-directory). A peer is accepted only after a mutual hybrid-signature check confirms both devices chain to the same User IK. + +Identity-trusted is **not** content-trusted, however. A device can still be buggy, or compromised at the application layer above its hardware keys. So peering keeps Federation's posture toward *data*: every received asset is re-verified — its [ciphertext content hash](/design/cryptography/primitives/) recomputed, its [STREAM tags](/design/cryptography/encryption/#stream-construction) checked, its [asset manifest](/design/cryptography/provenance/#asset-manifest) run through the single [`verify_asset`](/design/cryptography/keys/#write-authorization) chokepoint. 
The channel authenticates *who* you are talking to; it never exempts *what* they send from validation. ### Peer-Class Containment -Even two of the same user's devices are separate failure-containment boundaries -([Threat Model — Damage Containment Layers](/design/threat-model/#damage-containment-layers)). -A buggy $v_k$ device cannot overwrite a $v_{k+1}$ device's state via a stale-but-valid -backup artifact, and a v_{k+1} device's writes are not retroactively applied to -a v_k device's view of an older album. Specifically: - -- Every received manifest is checked against the receiver's local - `latest_provenance_hash` for that asset (see [Applying Received Data](#applying-received-data)) - — a stale manifest is quarantined, not silently applied. -- Every received structure that announces a `sidecar_schema`, `crypto_suite_id`, - or `protocol_version` above the receiver's max known is rejected at decode — - the receiver refuses to interpret bytes it cannot validate. This is the - client-side counterpart of the [server-side schema lockdown](/design/threat-model/#schema-evolution-and-field-grammar). -- Device-directory revocations are honored immediately: a device that has been - removed from the user's directory cannot complete the TLS handshake (its - certificate no longer chains to a current IK signature), and any prior cached - state from that device is treated as suspect. +Even two of the same user's devices are separate failure-containment boundaries ([Threat Model — Damage Containment Layers](/design/threat-model/#damage-containment-layers)). A buggy $v_k$ device cannot overwrite a $v_{k+1}$ device's state via a stale-but-valid backup artifact, and a $v_{k+1}$ device's writes are not retroactively applied to a $v_k$ device's view of an older album.
Specifically: + +- Every received manifest is checked against the receiver's local `latest_provenance_hash` for that asset (see [Applying Received Data](#applying-received-data)) — a stale manifest is quarantined, not silently applied. +- Every received structure that announces a `sidecar_schema`, `crypto_suite_id`, or `protocol_version` above the receiver's max known is rejected at decode — the receiver refuses to interpret bytes it cannot validate. This is the client-side counterpart of the [server-side schema lockdown](/design/threat-model/schema-rules/#schema-evolution-and-field-grammar). +- Device-directory revocations are honored immediately: a device that has been removed from the user's directory cannot complete the TLS handshake (its certificate no longer chains to a current IK signature), and any prior cached state from that device is treated as suspect. ## Discovery -Discovery is the one genuinely new mechanism. Devices advertise a peering -service over **mDNS** on the local network and accept connections over **TCP**. +Discovery is the one genuinely new mechanism. Devices advertise a peering service over **mDNS** on the local network and accept connections over **TCP**. -Discovery is **LAN-only** — there is no relay, no internet-wide rendezvous. mDNS -broadcasts are visible to every host on the segment, so the advertisement must -not leak identity: a device advertises an **opaque, rotating service instance**, -not `user@server.tld` or a device name. Whether two advertisements belong to the -same user is established *inside* the encrypted channel (below), never from the -broadcast itself. +Discovery is **LAN-only** — there is no relay, no internet-wide rendezvous. mDNS broadcasts are visible to every host on the segment, so the advertisement must not leak identity: a device advertises an **opaque, rotating service instance**, not `user@server.tld` or a device name. 
Whether two advertisements belong to the same user is established *inside* the encrypted channel (below), never from the broadcast itself. -If no peer answers, discovery fails silently and the device proceeds with -ordinary server sync. +If no peer answers, discovery fails silently and the device proceeds with ordinary server sync. ## Establishing the Channel -A peer connection is HTTP over a **mutually authenticated TLS 1.3** channel. The -certificates presented are the **device keys themselves** — there is no CA. -Each side verifies that the other's device certificate carries a valid hybrid -signature chaining to the shared User IK, exactly as published in the -[device directory](/design/cryptography/#per-user-device-coordination). The -directory *is* the trust anchor; a device not in it cannot complete the -handshake. +A peer connection is HTTP over a **mutually authenticated TLS 1.3** channel. The certificates presented are the **device keys themselves** — there is no CA. Each side verifies that the other's device certificate carries a valid hybrid signature chaining to the shared User IK, exactly as published in the [device directory](/design/cryptography/keys/#device-directory). The directory *is* the trust anchor; a device not in it cannot complete the handshake. -This doc covers sync between devices that are **already provisioned** — both -already hold the account master key. Bootstrapping a brand-new device (handing -it the master key for the first time) is **cross-device recovery** and is -specified in [Backup and Recovery](/design/backup-recovery/#recovery-mechanisms); -peering does not re-document it. +This doc covers sync between devices that are **already provisioned** — both already hold the account master key. Bootstrapping a brand-new device (handing it the master key for the first time) is **cross-device recovery** and is specified in [Device Enrollment](/design/device-enrollment/); peering does not re-document it. 
## Determining the Delta -Before building an artifact, the two devices must agree on what is missing. -Peering reuses the [sync cursor](/design/import-synchronization/#discovering-what-changed) -model rather than inventing a diff: each side offers its set of held -[ciphertext content addresses](/design/cryptography/#primitives-inventory) and its cursor, and the delta is the -complement. "What changed" is already defined by the `/sync` feed — peering -borrows that definition wholesale. +Before building an artifact, the two devices must agree on what is missing. Peering reuses the [sync cursor](/design/import/download-sync/#discovering-what-changed) model rather than inventing a diff: each side offers its set of held [ciphertext content addresses](/design/cryptography/primitives/) and its cursor, and the delta is the complement. "What changed" is already defined by the `/sync` feed — peering borrows that definition wholesale. ## What Moves Over the Wire -The transfer payload is a [backup artifact](/design/backup-recovery/#backup-artifact) -scoped to the delta — backup artifacts are explicitly *"constructed from a list -of assets, albums, and so on,"* so a delta-scoped one needs no special -construction path. +The transfer payload is a [backup artifact](/design/backup-recovery/#backup-artifact) scoped to the delta — backup artifacts are explicitly *"constructed from a list of assets, albums, and so on,"* so a delta-scoped one needs no special construction path. -Its contents honor the receiver's existing per-library -[Synchronization Scope](/design/import-synchronization/#synchronization-scope) -setting — there is no peering-specific knob: +Its contents honor the receiver's existing per-library [Synchronization Scope](/design/import/download-sync/#synchronization-scope) setting — there is no peering-specific knob: -- **Always included:** the encrypted metadata blobs and the AMK versions needed - to decrypt the transferred assets. 
Without these the receiver cannot - interpret anything. -- **Per scope:** original and derivative blobs are included only up to the - receiver's configured tier (*metadata only* / *+ thumbnails* / - *+ original*). Tiers above the setting are fetched lazily later, just as with - server download. +- **Always included:** the encrypted metadata blobs and the AMK versions needed to decrypt the transferred assets. Without these the receiver cannot interpret anything. +- **Per scope:** original and derivative blobs are included only up to the receiver's configured tier (*metadata only* / *+ thumbnails* / *+ original*). Tiers above the setting are fetched lazily later, just as with server download. -Because every blob is content-addressed, dedup is free: the receiver skips any -blob whose [content hash](/design/cryptography/#primitives-inventory) it already holds — the same lookup the -`/blob/{hash}` download path performs against its local cache. +Because every blob is content-addressed, dedup is free: the receiver skips any blob whose [content hash](/design/cryptography/primitives/) it already holds — the same lookup the `/blob/{hash}` download path performs against its local cache. ## Transfer Protocol Peering is **pull-only**, mirroring [Federation](/design/federation/#pull-only-federation): the device that is behind initiates the pull and applies the result only after it verifies. A peer that has new content may send a lightweight **notification hint** — "new content exists" — over a separate low-trust channel to prompt a pull sooner; that hint never feeds the validation pipeline directly and carries no authority. -The artifact is fetched with HTTP `GET` and `Range` requests, which makes a transfer **resumable** across the flaky-by-nature LAN and **idempotent** — content-addressing turns a re-fetch of an already-held blob into a no-op. 
This is the same resumability the [upload](/design/import-synchronization/#protocol--mechanics) and [download](/design/import-synchronization/#resumption-and-verification) paths rely on. +The artifact is fetched with HTTP `GET` and `Range` requests, which makes a transfer **resumable** across the flaky-by-nature LAN and **idempotent** — content-addressing turns a re-fetch of an already-held blob into a no-op. This is the same resumability the [upload](/design/import/upload-protocol/) and [download](/design/import/download-sync/#resumption-and-verification) paths rely on. ## Applying Received Data -A received artifact is ingested through the **backup restore path** — peering adds no separate deserialization. Restore already re-verifies every blob's [ciphertext content hash](/design/cryptography/#primitives-inventory), checks [STREAM tags](/design/cryptography/#stream-construction) on decrypt, and runs each asset manifest through [`verify_asset`](/design/cryptography/#write-authorization). +A received artifact is ingested through the **backup restore path** — peering adds no separate deserialization. Restore already re-verifies every blob's [ciphertext content hash](/design/cryptography/primitives/), checks [STREAM tags](/design/cryptography/encryption/#stream-construction) on decrypt, and runs each asset manifest through [`verify_asset`](/design/cryptography/keys/#write-authorization). -Additionally, every received manifest's `prior_provenance_hash` is checked against the receiver's local `latest_provenance_hash` for that asset (see [Import & Sync — Stale-Revival Detection](/design/import-synchronization/#stale-revival-detection)). A peering pull cannot resurrect an asset the local device has tombstoned at a later provenance step — even if the artifact was honestly produced from an older state of the sending device. The stale entry is **quarantined and surfaced** as "peer sent stale state." 
+Additionally, every received manifest's `prior_provenance_hash` is checked against the receiver's local `latest_provenance_hash` for that asset (see [Import — Stale-Revival Detection](/design/import/download-sync/#stale-revival-detection)). A peering pull cannot resurrect an asset the local device has tombstoned at a later provenance step — even if the artifact was honestly produced from an older state of the sending device. The stale entry is **quarantined and surfaced** as "peer sent stale state." -Failures follow Federation's [soft-fail semantics](/design/federation/#soft-fail-semantics): an asset that fails verification is **quarantined and surfaced** in the [provenance/audit trail](/design/cryptography/#provenance-of-library-modifications), never silently dropped and never silently accepted — so a bug can be told apart from an attack after the fact. +Failures follow Federation's [soft-fail semantics](/design/federation/#soft-fail-semantics): an asset that fails verification is **quarantined and surfaced** in the [provenance/audit trail](/design/cryptography/provenance/#provenance-of-library-modifications), never silently dropped and never silently accepted — so a bug can be told apart from an attack after the fact. ## Reconciliation with the Server -Peering does not fork a device's state away from the server. A peering-received asset arrives with its signed manifest intact, so when the server later sees the same asset — uploaded by whichever device the [upload policy](/design/import-synchronization/#synchronization-scope) assigns — it resolves through the existing [deduplication and merge](/design/import-synchronization/#deduplication-and-merge) path on the [content hash](/design/cryptography/#primitives-inventory). A device never re-uploads a blob the server already holds, and the two devices remain convergent with the server's view. +Peering does not fork a device's state away from the server. 
A peering-received asset arrives with its signed manifest intact, so when the server later sees the same asset — uploaded by whichever device the [upload policy](/design/import/download-sync/#synchronization-scope) assigns — it resolves through the existing [deduplication and merge](/design/import/upload-protocol/#deduplication-and-merge) path on the [content hash](/design/cryptography/primitives/). A device never re-uploads a blob the server already holds, and the two devices remain convergent with the server's view. ## Versioning -Peering has two independently versioned surfaces, both checked **once, up front**, crashing early on mismatch per [Principles](/design/principles/) and the universal [protocol handshake](/design/threat-model/#protocol-and-capability-negotiation): +Peering has two independently versioned surfaces, both checked **once, up front**, crashing early on mismatch per [Principles](/design/principles/) and the universal [protocol handshake](/design/threat-model/validation/#protocol-and-capability-negotiation): - The peering **transport protocol** — date-based (`YYYY-MM-DD`), exchanged via `X-Capsule-Protocol` at channel establishment. Mismatch terminates the TLS connection **before any payload byte is sent** — `426 Upgrade Required` in the channel's framing layer. There is no degraded-mode fallback; peering simply fails and the device proceeds to ordinary server sync. - The **artifact format** — versioned by [Backup and Recovery](/design/backup-recovery/#backup-artifact), so a newer device can still ingest an artifact built by an older one. The artifact's `crypto_suite_id` and album `protocol_version` are validated against the receiver's max known on ingest; a forward-jumping value is rejected (refuse-by-default), never best-effort-parsed. @@ -173,3 +97,14 @@ Peering's failure posture falls out of the designs it reuses: - **Peer disappears.** A vanished peer is indistinguishable from "no peer found" — the device falls back to server sync. 
Peering is best-effort. - **Offline.** With no server reachable, devices on a shared LAN still converge; the feature works solely offline. - **No order trust.** Content-addressed, immutable blobs and signed manifests mean a peer cannot influence state by reordering a transfer — the same guarantee Federation states in [Reconstructing State Without Trusting Peers](/design/federation/#reconstructing-state-without-trusting-peers). + +## Validation + +- **mDNS opaque identifier (unit).** Generate an advertisement; assert it carries no user handle, no device name. Re-generate after the rotation interval; assert a new opaque identifier. +- **TLS mutual-auth handshake (unit).** Two device certificates chaining to the same IK — assert handshake succeeds. Replace one cert with a revoked-device cert — assert handshake fails. Replace one cert with a foreign-user IK cert — assert handshake fails. +- **Delta calculation (unit).** Two devices with overlapping but distinct content-address sets; assert the delta is the symmetric difference. +- **Artifact ingest (smoke).** Build a delta-scoped backup artifact on device A; feed to device B; assert restore path applies every asset; assert byte-equal `library.sqlite` rebuild on both sides. +- **Stale-revival quarantine on peer pull (smoke).** Device A holds an old manifest; device B holds a newer chain head; A pulls from B successfully; B pulls from A — assert quarantine, not silent overwrite. +- **Resume across LAN drop (smoke).** Start a large artifact transfer; sever the LAN; reconnect; assert Range-resumed transfer with no re-fetched bytes. + +The cross-module case — full A→B LAN sync with both devices then reconciling with the server — is one bounded E2E case in [Module Map](/design/module-map/#e2e-test-surface). 
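As a rough illustration of the delta step described under "Determining the Delta", the pull side can be sketched as a set difference over content addresses. This is a minimal sketch under stated assumptions, not Capsule's implementation: `ContentAddress`, `pull_delta`, and `set_of` are hypothetical names invented for the example, the address is modelled as a plain hex string, and the sync cursor is ignored entirely.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a ciphertext content address (hex digest).
type ContentAddress = String;

/// Returns the addresses the peer holds that the local device lacks.
/// `BTreeSet::difference` iterates in ascending order, so the result
/// is deterministic for a given pair of sets.
fn pull_delta(
    local: &BTreeSet<ContentAddress>,
    peer: &BTreeSet<ContentAddress>,
) -> Vec<ContentAddress> {
    peer.difference(local).cloned().collect()
}

/// Small helper (assumption, for illustration only) to build a set
/// of addresses from string literals.
fn set_of(items: &[&str]) -> BTreeSet<ContentAddress> {
    items.iter().map(|s| s.to_string()).collect()
}
```

Because peering is pull-only, each device would compute only its own side. Calling `pull_delta` once in each direction yields the two halves of the symmetric difference that the delta-calculation validation case refers to.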
diff --git a/capsule-docs/src/content/docs/design/principles.md b/capsule-docs/src/content/docs/design/principles.md index 57bb795..e820e3c 100644 --- a/capsule-docs/src/content/docs/design/principles.md +++ b/capsule-docs/src/content/docs/design/principles.md @@ -3,24 +3,24 @@ title: Core Principles description: The core principles that guide the design and development of Capsule --- -These principles apply universally to all components of Capsule from clients to server. +These principles apply universally to all components of Capsule, from clients to server. The owner-doc rules and structural guidance below apply to every doc in `design/`. -Determinism and idempotent processes. Raw and original data is the source of truth -All data is processed aligned to 4KiB (matches memory and disks). Just verify no edge cases require a smaller or bigger multiple though. -Forward and backwards compatibility: old clients ignore new fields and new clients tolerate missing ones gracefully +## Principles -Data integrity: We can NEVER delete data unexpectedly. We act under strict scenarios and crash early otherwise. We implement multiple layers of safeguards to avoid current and future bugs. We trust data in the server will be safe (and in robust hardware) and data in the clients as potentially lost. -Treat most data as ephemeral. If it wasn’t original data, it can be rebuilt. -Encryption, security, and isolation: Keep sensitive code that require auditing and storage of data separate. Encrypt metadata besides data. Compartmentalize every boundary as a failure-containment boundary — per-album, per-peer, per-event, per-user, per-version — so a bug or compromise on one side of a boundary cannot cross it. -Divide between offline and online functionalities: a feature should work either solely online or offline. It should not exhibit different behaviours depending on resource connectivity. This simplifies business logic and risk of state shifts. 
+- **Determinism + idempotency.** Raw original data is the source of truth; every process is repeatable from its inputs. +- **4 KiB alignment.** Data is processed and written 4 KiB-aligned where it touches disk or memory boundaries. Smaller or larger multiples are introduced only when a concrete edge case demands it. +- **Forward and backward compatibility.** Old clients ignore new fields; new clients tolerate missing ones gracefully — subject to the [Postel's Law asymmetry](#postels-law-asymmetric) below. +- **Data integrity.** Capsule never deletes data unexpectedly. Act under strict scenarios; otherwise crash early. Multiple layers of safeguards guard against current and future bugs. Server data is assumed durable (robust hardware); client data is assumed potentially lost. +- **Ephemeral derived data.** Anything that isn't an original asset can be rebuilt and is treated as rebuildable. +- **Encryption + compartmentalization.** Sensitive code and storage stay separated. Metadata is encrypted alongside data. Every boundary — per-album, per-peer, per-event, per-user, per-version — is a failure-containment boundary; a bug or compromise on one side cannot cross. +- **Offline/online divide.** A feature works either solely online or solely offline, not differently by connectivity. This simplifies business logic and limits state-shift risk. +- **Recovery-first.** The filesystem must be reconstructible from partial corruption. No database is required to interpret critical data — sidecar files are the canonical metadata store; the database is a rebuildable query cache. +- **Self-describing.** Each media file is paired with a CBOR sidecar containing all user-editable and stable metadata. Files are independently interpretable without a running database. +- **Atomic writes.** Use temp-file + rename throughout. Direct overwrites risk corruption on power loss. -**Recovery-First**: The filesystem must be reconstructible from partial corruption. 
No database is required to interpret critical data — sidecar files are the canonical metadata store; the database is a rebuildable query cache. +### Postel's Law (asymmetric) -**Self-Describing**: Each media file is paired with a CBOR sidecar containing all user-editable and stable metadata. Files are independently interpretable without a running database. - -**Atomic Writes**: Use temp-file + rename throughout. Direct overwrites risk corruption on power loss. - -**Postel's Law**: Liberal in what we accept *within a known schema version* — unknown sidecar fields are preserved verbatim and missing optional fields are tolerated. **Cross-version is closed**: a structure announcing a schema version (`sidecar_schema`, `crypto_suite_id`, `protocol_version`) above the receiver's max known is rejected, never best-effort-parsed. The asymmetry is what prevents a faulty or new client from silently corrupting state — see [Threat Model — Schema Evolution and Field Grammar](/design/threat-model/#schema-evolution-and-field-grammar). +Liberal in what we accept *within a known schema version* — unknown sidecar fields are preserved verbatim and missing optional fields are tolerated. **Cross-version is closed**: a structure announcing a schema version (`sidecar_schema`, `crypto_suite_id`, `protocol_version`) above the receiver's max known is rejected, never best-effort-parsed. The asymmetry is what prevents a faulty or new client from silently corrupting state — see [Threat Model — Schema Rules](/design/threat-model/schema-rules/). 
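The asymmetry can be sketched in a few lines. This is an illustrative sketch only, assuming a sidecar has already been decoded into string key/value pairs; `MAX_KNOWN_SIDECAR_SCHEMA`, `accept_sidecar`, the `caption` field, and the error type are hypothetical and do not reflect the real sidecar schema or CBOR decoding.

```rust
use std::collections::BTreeMap;

/// Highest sidecar schema this receiver understands (hypothetical value).
pub const MAX_KNOWN_SIDECAR_SCHEMA: u32 = 3;

#[derive(Debug, PartialEq, Eq)]
pub enum AcceptError {
    /// Cross-version is closed: a newer schema is rejected outright.
    UnsupportedSchema { announced: u32, max_known: u32 },
}

#[derive(Debug, PartialEq, Eq)]
pub struct Sidecar {
    pub schema: u32,
    /// Illustrative optional field; its absence is tolerated.
    pub caption: Option<String>,
    /// Fields this receiver does not recognise, preserved verbatim.
    pub unknown: BTreeMap<String, String>,
}

pub fn accept_sidecar(
    schema: u32,
    mut fields: BTreeMap<String, String>,
) -> Result<Sidecar, AcceptError> {
    // Closed across versions: never best-effort-parse a newer schema.
    if schema > MAX_KNOWN_SIDECAR_SCHEMA {
        return Err(AcceptError::UnsupportedSchema {
            announced: schema,
            max_known: MAX_KNOWN_SIDECAR_SCHEMA,
        });
    }
    // Liberal within a known version: optional fields may be missing,
    // and whatever remains is carried along untouched.
    let caption = fields.remove("caption");
    Ok(Sidecar { schema, caption, unknown: fields })
}
```

The version check runs before any field is read, so a too-new structure is refused without touching state, while unknown fields inside a known version survive a read-modify-write cycle.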
## Single Source of Truth @@ -28,25 +28,68 @@ Every primitive, construction, format, or component identity Capsule depends on The owner docs are: -| Domain | Owner doc | -| ----------------------------------------------------------- | ------------------------------------------------------------- | -| All cryptographic primitives + constructions | [Cryptography](/design/cryptography/#primitives-inventory) | -| ML model identities | [ML Models and Algorithms](/design/ml-models/) | -| LQIP scheme + thumbnail/preview formats | [Thumbnails and Previews](/design/thumbnails/) | -| Server storage stack + topology | [Filesystem](/design/filesystem/) | -| Session/access tokens + auth flow | [Authentication](/design/authentication/) | -| Backup artifact container + escrow | [Backup and Recovery](/design/backup-recovery/) | -| CRDT scheme, identifiers, geolocation | [Metadata](/design/metadata/) | -| Upload/download protocol semantics | [Import and Synchronization](/design/import-synchronization/) | -| Federation trust model, capability tokens, soft-fail policy | [Federation](/design/federation/) | -| LAN discovery + peer channel | [Peering](/design/peering/) | -| Album protocol version pinning | [Versioning](/design/versioning/) | -| Stacking taxonomy + trash semantics | [Asset Organization](/design/organization/) | -| Lifecycle action set | [Authorization](/design/authorization/) | -| Damage containment, client-class taxonomy, server-side validation duties | [Threat Model](/design/threat-model/) | +| Domain | Owner doc | +| --------------------------------------------------------------------- | ------------------------------------------------------------------- | +| All cryptographic primitives + constructions + versioning identifiers | [Cryptography — Primitives](/design/cryptography/primitives/) | +| Cryptographic key hierarchy + device coordination | [Cryptography — Keys](/design/cryptography/keys/) | +| MLS group membership + ciphersuite binding | [Cryptography — 
MLS](/design/cryptography/mls/) | +| Asset + metadata encryption | [Cryptography — Encryption](/design/cryptography/encryption/) | +| Provenance chains + signed manifests + derivative provenance | [Cryptography — Provenance](/design/cryptography/provenance/) | +| Recovery paths + failure-mode catalog + transport security | [Cryptography — Failure Modes](/design/cryptography/failure-modes/) | +| MLS resilience (state divergence, lost commits, re-keying) | [MLS Resilience](/design/mls-resilience/) | +| Device enrollment + cross-device add ceremony | [Device Enrollment](/design/device-enrollment/) | +| ML model identities + embedding provenance | [AI/ML Integrations](/design/ai/) | +| LQIP scheme + thumbnail/preview formats | [Thumbnails and Previews](/design/thumbnails/) | +| Server filesystem (blob store, Postgres index, deployment profiles) | [Filesystem — Server](/design/filesystem/server/) | +| Client filesystem (library layout, local index, space recovery) | [Filesystem — Client](/design/filesystem/client/) | +| Library self-maintenance + atomic-write granularity | [Filesystem — Maintenance](/design/filesystem/maintenance/) | +| Session/access tokens + identity binding + auth flow | [Authentication](/design/authentication/) | +| Backup artifact container + escrow + recovery mechanisms | [Backup and Recovery](/design/backup-recovery/) | +| CRDT scheme, identifiers, geolocation, sidecar schema | [Metadata](/design/metadata/) | +| Import pipeline (scan, plan, execute) | [Import — Pipeline](/design/import/pipeline/) | +| Upload protocol (wire, sessions, finalization) | [Import — Upload Protocol](/design/import/upload-protocol/) | +| Download, sync feed, tiered fetch, auto-sync | [Import — Download & Sync](/design/import/download-sync/) | +| Federation trust model, capability tokens, soft-fail | [Federation](/design/federation/) | +| LAN discovery + peer channel + delta transfer | [Peering](/design/peering/) | +| Album protocol version pinning + upgrade ceremony | 
[Versioning](/design/versioning/) | +| Stacking taxonomy + trash retention semantics | [Asset Organization](/design/organization/) | +| Lifecycle action set | [Authorization](/design/authorization/) | +| Damage scenarios + client class taxonomy + containment shells | [Threat Model](/design/threat-model/) | +| Server- + client-side refuse-by-default validation invariants | [Threat Model — Validation](/design/threat-model/validation/) | +| Schema evolution rules, forbidden behaviors, deprecation policy | [Threat Model — Schema Rules](/design/threat-model/schema-rules/) | +| Share links + public-share serving | [Share Links](/design/share-links/) | +| Moderation policy + federated reporting + blocklists | [Moderation](/design/moderation/) | +| Quota accounting + enforcement points | [Quota](/design/quota/) | +| Client validation duties + sandboxed decoder | [Clients](/design/clients/) | +| Code module → design doc mapping + bounded E2E test surface | [Module Map](/design/module-map/) | **Permitted secondary mentions.** Mechanism-explanatory phrasing inside a non-owner doc is fine — for example, "STREAM tags catch chunk reordering" inside [Peering](/design/peering/) is explaining a *behavior*, not declaring a *choice*. What the rule forbids is restating the choice itself ("we use SHA-256") outside the owner doc. When in doubt, link. +## Doc Structure + +Design docs are **not templated**. Each doc's structure is chosen to fit its content — a wire-protocol doc is naturally state-machine-shaped, a primitives inventory is naturally a table, the threat-model scenario doc is naturally a matrix. What stays consistent is *what every doc must make legible*, not *how it must look*. + +Regardless of shape, every design doc must let a reader answer four questions: + +1. **Where does this live?** — Which crate(s) / module(s) implement this. Surface this however reads best: a header callout, an intro sentence, or per-section "implemented in…" notes. 
Don't bolt on a labeled "Module Boundary" section if it doesn't add clarity. +2. **What is its public surface?** — The contract other modules depend on: schema, wire format, trait shape, manifest envelope, closed enum, error domain. For some docs the entire doc *is* the surface (a CBOR schema doc, a primitives inventory); for others the surface is one focused subsection. Promote it visually only if a reader couldn't otherwise locate it quickly. +3. **What does it own vs. defer to?** — Owner-anchor links to the SSoT for upstream primitives. This is the [SSoT rule](#single-source-of-truth) applied per doc. +4. **How is it validated?** — Brief tier notes (see [Validation Tiers](#validation-tiers) below) where the answer is non-obvious or where the cross-module test surface needs bounding. For docs whose validation collapses entirely to "the threat-model scenario map enforces this," a one-line pointer is enough. + +These are goals, not required headings. Some docs hit all four in a single intro paragraph; others (especially wire-protocol and schema docs) dedicate focused sections. The choice is per doc. + +The [Module Map](/design/module-map/) is the cross-cutting index: every code module → owning design doc → validation tier. It is the developer's first stop. + +## Validation Tiers + +The three test tiers a design doc may reference: + +- **Unit** — In-module tests against the doc's contract surface, with peer modules and external dependencies mocked. Deterministic, fast, run on every change. Example: signing and verifying a manifest against fixed test vectors inside `capsule-core::crypto`. +- **Smoke** — Single-module end-to-end with the module's real implementation but its peers mocked. Uses real I/O and real backing services (e.g. testcontainers for Postgres or Valkey). Example: the upload-server full session lifecycle (`POST /upload` → `PATCH` → finalization) against a real Postgres, with no client process — the client side is mocked at the HTTP boundary. 
+- **E2E** — Multiple modules wired together against real infrastructure. The list is **bounded** in [Module Map — E2E Test Surface](/design/module-map/#e2e-test-surface). Any addition requires updating that list — E2E surface growing past the bound is a signal the design has unwanted coupling worth examining. + +The split is enforced by **what is mocked, not by location in the source tree**. A test under `crate/tests/integration/` that mocks every peer is still a unit test for the purposes of this taxonomy. + ## Damage Containment A faulty, malicious, or version-mismatched client must not be able to inflict irreparable damage on user data. The principles above (data integrity, atomic writes, recovery-first, self-describing, Postel's Law, encryption + compartmentalization) name the *posture*; the [Threat Model](/design/threat-model/) names the *defenses*. @@ -55,7 +98,7 @@ In particular, the threat model owns: - The **client class taxonomy** (honest, faulty, malicious, old, new) — how each is authenticated and what stops each from doing harm. - The **damage scenario → invariant map** — for every concrete attack or bug class, the single owner doc that defeats it. -- **Server-side validation invariants** — the refuse-by-default structural checks a key-less server runs on every write. +- **Server- and client-side validation invariants** — the refuse-by-default structural checks a key-less server and every client run on every write. - **Protocol and capability negotiation** — the universal fail-closed handshake that rejects version mismatches before any state is written. - **Idempotency, atomicity, and quarantine** rules that span owner docs. 
diff --git a/capsule-docs/src/content/docs/design/quota.md b/capsule-docs/src/content/docs/design/quota.md new file mode 100644 index 0000000..bc85706 --- /dev/null +++ b/capsule-docs/src/content/docs/design/quota.md @@ -0,0 +1,82 @@ +--- +title: Quota +description: Storage quota accounting, thresholds, and enforcement points +--- + +Storage quota in Capsule is accounted to `upload_user_id` (the authenticated uploader), which is distinct from `owner_id` (the asset's owner). This separation lets a user upload on behalf of a different owner (with verified permission) while keeping storage cost attributed correctly. The accounting model is enforced at the [server filesystem](/design/filesystem/server/#ownership-partitioning-and-quota) and at [upload session creation](/design/import/upload-protocol/#quota-and-permissions); this doc owns the threshold model and what happens when limits are hit. + +Implementation will live in `capsule-api-service::quota`. Accounting reads from the Postgres asset index (size sums per `upload_user_id`); enforcement runs at session creation, before any chunks are accepted. + +## Accounting Model + +```text +quota_used(user) = SUM(ciphertext_size) for all blobs where upload_user_id = user + + SUM(metadata_blob_size) + + SUM(derivative_blob_size for derivatives the user generated) +``` + +Notable: + +- **Content-addressed dedup is global.** A blob shared between two uploaders counts against *only the first uploader* — the second is a merge (see [Upload Protocol — Deduplication and Merge](/design/import/upload-protocol/#deduplication-and-merge)). This is what stops a malicious user from racking up another user's quota by re-uploading their public assets. +- **Derivatives count.** Thumbnails and previews are real storage, attributed to whichever device generated them. +- **Provenance blobs count.** Each per-asset `.provenance.cbor` (server-side encrypted blob) is small but accumulates. 
+- **Federated-received blobs count against the receiver.** When a user's home server caches a blob pulled from a [federated](/design/federation/) peer on that user's behalf, the cached bytes count against the **receiving** user's quota, deduped by content hash so a blob the server already holds is never counted twice. A per-`(receiving_user, source_peer)` caching budget (deployment-configurable; default 25% of the receiver's hard quota per source peer) bounds how much one user can pull from any single peer, so a user receiving from many peers cannot push the home server's storage past their own quota. This is the storage-side counterpart of [Federation's per-peer compartmentalization](/design/federation/#per-peer-compartmentalization) and is the resolution of the federated-receive DoS. +- **Trash-retained assets count fully.** An asset in trash still occupies storage until its [retention window](/design/organization/#retention-window) expires and it is hard-purged, so it counts against quota at full size. This is deliberate: it keeps accounting honest and gives users a concrete reason to empty trash rather than treating it as free overflow. +- **Derivatives are reclaimed on hard-purge.** When an asset is hard-purged, its derivative and metadata blob references drop alongside the original's; any blob whose reference count reaches zero is [garbage-collected](/design/filesystem/server/#deletion-and-garbage-collection) and the freed bytes are credited back to whichever user they were attributed to. A purged asset never leaves orphaned derivatives silently inflating a quota. 
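The first-uploader attribution and credit-back rules in the list above can be sketched with an in-memory ledger. This is a simplified illustration, assuming one reference per blob (the real design reference-counts blobs and reads sizes from the Postgres index); `QuotaLedger` and its methods are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical user identifier for illustration.
pub type UserId = u32;

/// In-memory stand-in for the asset index: content hash -> (attributed user, size).
#[derive(Default)]
pub struct QuotaLedger {
    blobs: HashMap<String, (UserId, u64)>,
}

impl QuotaLedger {
    /// Record an upload. Returns true if bytes were newly debited, false if
    /// the blob already existed (a merge, so the second uploader pays nothing).
    pub fn record_upload(&mut self, user: UserId, content_hash: &str, size: u64) -> bool {
        if self.blobs.contains_key(content_hash) {
            return false;
        }
        self.blobs.insert(content_hash.to_string(), (user, size));
        true
    }

    /// Hard-purge a blob and report who is credited back and by how much.
    /// Simplification: assumes the purged reference was the last one.
    pub fn hard_purge(&mut self, content_hash: &str) -> Option<(UserId, u64)> {
        self.blobs.remove(content_hash)
    }

    /// Sum of sizes attributed to `user`.
    pub fn quota_used(&self, user: UserId) -> u64 {
        self.blobs
            .values()
            .filter(|(owner, _)| *owner == user)
            .map(|(_, size)| *size)
            .sum()
    }
}
```

Keying the ledger by content hash is what makes dedup global: a second upload of the same bytes finds an existing entry and leaves attribution unchanged.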
+ +## Thresholds and States + +A user account exists in one of these quota states: + +| State | Threshold | Behavior | +| ----------------- | ----------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **OK** | quota_used < soft_limit | All uploads succeed normally. | +| **Soft warning** | soft_limit ≤ quota_used < hard_limit | Uploads succeed, but the UI surfaces a warning. | +| **Hard exceeded** | quota_used ≥ hard_limit | New uploads rejected at session creation with a structured error. Existing assets remain accessible. | +| **Grace expired** | quota_used ≥ hard_limit for > `grace_window` (default 14 days) | Read-only mode: reads, deletes, and restore-from-trash still work; only new uploads and metadata-growth writes are refused. Freeing space (emptying trash) lifts it. | +| **Suspended** | (admin or billing action — see [Moderation](/design/moderation/)) | Server-defined; possibly upload refusal, possibly full lockout. | + +Defaults for `soft_limit`, `hard_limit`, and `grace_window` are deployment-configurable. Self-hosted servers might run with no quota (`hard_limit = ∞`); hosted services set per-tier limits. + +## Enforcement Points + +Where the quota check actually runs: + +- **At [`POST /upload`](/design/import/upload-protocol/#endpoints) session creation.** The server computes `quota_used(upload_user_id) + declared_size` and rejects with `403 Quota Exceeded` (or similar structural code) if it crosses the hard limit. This is the *only* hard enforcement point — once a session is open, the declared size is the cap, and the session is allowed to complete. +- **At session cancellation.** When a session is cancelled or expires, the reserved-but-uncommitted bytes are released; the next quota check sees the new (lower) usage. 
+- **At [finalization](/design/import/upload-protocol/#finalization-and-integrity).** Cumulative size is bounded by the declared size at chunk acceptance; no separate quota check at finalization is needed because the declared size was already approved at session creation. +- **At metadata-update writes.** A metadata-update creates a new encrypted metadata blob; the size delta is checked against quota. Tiny but non-zero. + +## Scope Decisions + +- **Sponsored-account attribution.** A sponsoree's uploads count against the **sponsor's** quota — the sponsoree's `upload_user_id` derives from the sponsor ([Keys — Delegated/Sponsored](/design/cryptography/keys/#delegatedsponsored-accounts)), so storage rolls up to the sponsoring (billing) account. There is no separate sponsoree quota. +- **Per-album quotas.** Out of scope for v1 — quota is per `upload_user_id` only. A deployment that later wants per-album caps adds them as a second, independent check at the same enforcement point; the accounting model above does not change. +- **Grace-window UX.** The structural rule is "upload session creation refused" in read-only mode; the client surfaces this as a discoverable, remediable state (what is full, what to delete) rather than an opaque mid-import error. Concrete copy is a client-UX detail. +- **Billing integration.** Out of scope and deliberately decoupled: this doc owns *accounting and enforcement* (what `quota_used` is, where the check runs); a billing/tier system, where present, only *sets* `soft_limit` / `hard_limit` / `grace_window`. Self-hosted deployments run with no billing and `hard_limit = ∞`. 
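The threshold table and the session-creation check described earlier can be sketched as two small functions. This is a sketch under stated assumptions: `classify` and `admit_session` are hypothetical names, the admin-driven Suspended state is left out, "crosses the hard limit" is read as `used + declared_size > hard_limit`, and an unlimited quota is modelled as `u64::MAX`.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum QuotaState {
    Ok,
    SoftWarning,
    HardExceeded,
    GraceExpired,
}

/// Deployment-configurable limits (field names follow the doc's terms).
pub struct Limits {
    pub soft_limit: u64,
    pub hard_limit: u64,
    pub grace_window: Duration,
}

/// Map current usage onto the state table. `over_hard_for` is how long the
/// account has been at or above the hard limit, if that duration is known.
pub fn classify(used: u64, limits: &Limits, over_hard_for: Option<Duration>) -> QuotaState {
    if used >= limits.hard_limit {
        match over_hard_for {
            Some(d) if d > limits.grace_window => QuotaState::GraceExpired,
            _ => QuotaState::HardExceeded,
        }
    } else if used >= limits.soft_limit {
        QuotaState::SoftWarning
    } else {
        QuotaState::Ok
    }
}

/// Session-creation check: admit only if the declared size still fits.
/// Assumption: landing exactly on the hard limit is allowed.
pub fn admit_session(used: u64, declared_size: u64, limits: &Limits) -> bool {
    // checked_add guards against overflow when limits are near u64::MAX.
    used.checked_add(declared_size)
        .map_or(false, |total| total <= limits.hard_limit)
}
```

Keeping classification separate from admission mirrors the split in the doc: the state drives what the UI surfaces, while the only hard refusal happens at session creation.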
+
+## Contract Skeleton
+
+```rust
+// in capsule-api-service::quota
+struct QuotaStatus {
+    used: u64,
+    soft_limit: u64,
+    hard_limit: u64,
+    state: QuotaState, // OK | SoftWarning | HardExceeded | GraceExpired | Suspended
+}
+
+fn check_quota(user: UserId, additional_bytes: u64) -> Result<(), QuotaError>;
+fn current_status(user: UserId) -> QuotaStatus;
+```
+
+Concrete error types, the `GET /quota` response shape, and admin controls are an implementation detail; the accounting model and enforcement points above are the contract.
+
+## Validation
+
+- **Hard-limit enforcement (unit).** A session creation that would cross the hard limit is rejected with the right code; no pending row is written.
+- **Dedup attribution (unit).** Two users upload the same content; assert only the first user's quota is debited.
+- **Trash-retention accounting (unit).** Soft-delete an asset; assert it still counts at full size until hard-purge; hard-purge it; assert the bytes are released.
+- **Federated-receive accounting (unit).** Cache a federated blob for a receiving user; assert it debits the receiver, deduped (a blob the server already holds is not double-counted); exhaust a `(receiving_user, source_peer)` caching budget; assert further pulls from that peer are refused.
+- **Derivative reclaim on purge (unit).** Hard-purge an asset; assert its derivative + metadata blob references drop and any zero-reference blob is GC'd, with bytes credited back — no orphaned derivative left counting.
+- **Grace expiry (smoke).** Mock the grace window past; assert read-only mode behavior.
+- **Quota status reporting (unit).** `GET /quota` returns accurate `used` + `state` for a fixture user.
diff --git a/capsule-docs/src/content/docs/design/share-links.md b/capsule-docs/src/content/docs/design/share-links.md new file mode 100644 index 0000000..064d8a7 --- /dev/null +++ b/capsule-docs/src/content/docs/design/share-links.md @@ -0,0 +1,71 @@ +--- +title: Share Links +description: Non-registered-account share link generation, permission model, and public-share serving +--- + +Share links let a Capsule user grant view (and possibly limited write) access to an album or a specific asset *without* requiring the recipient to have a Capsule account. The recipient is the [non-registered account](/design/authentication/#account-types) class — no master key, no User IK, no MLS membership. The cryptographic shape (the link secret carries the decryption material; an optional passphrase wraps it with the [password-based KDF](/design/cryptography/primitives/#password-based-kdf)) is owned by [Cryptography — Keys: Non-registered accounts](/design/cryptography/keys/#non-registered-accounts); this doc owns everything else. + +Implementation will live in `capsule-api-media::shares` (public-share serving endpoints) and `capsule-core::sharing` (link generation, capability validation). + +## Scope (v1) + +In scope: + +- View-only links to a single asset. +- View-only links to a whole album. +- Optional passphrase protection (the link secret + a user-chosen passphrase, both required to decrypt). +- Optional expiry (link valid until a specific timestamp). +- Revocation (publish a revocation record; the serving endpoint refuses revoked links). + +Out of scope for v1 (deliberate non-goals): + +- **Writable share links.** Writing requires a write-tier key + a place in the MLS group; a non-registered user has neither. Supporting writes would require an ephemeral link-scoped key hierarchy — a substantial new design that is not justified for v1. +- **Per-recipient analytics.** Link views are not tracked per-recipient. 
The link is the credential; the server knows it was used, not by whom. + +## Security Contract + +These are **normative** — the security-relevant decisions are committed; only UX presentation remains open. + +- **URL format.** `https://server.tld/s/{opaque-id}#{secret}` — the secret lives in the URL **fragment**, which browsers never transmit, so the server holds only `{opaque-id}` and never the decryption secret. `{opaque-id}` is **fully opaque and carries no scope**; the asset/album scope is resolved server-side from the link record, so the URL itself leaks nothing about what it points to. +- **Opaque-id entropy.** `{opaque-id}` is a **random 128-bit value** drawn from the CSPRNG — a full 128 bits of entropy, *not* a UUIDv7 or other structured id whose embedded timestamp would cut real entropy to ~62 bits. No shorter or sequential identifier is permitted — this is the structural defense against link enumeration, independent of rate limiting. +- **Serving-endpoint rate limits.** The public serve path is rate-limited **per source IP and per `{opaque-id}`** (two independent limiters) and returns an **indistinguishable `404`** — never `410 Gone`, which would confirm a link once existed — for a not-found, revoked, or expired link alike, so probing reveals nothing and fast enumeration is throttled. +- **Passphrase unwrap is client-side.** When a passphrase protects a link, the server stores only the **wrapped** secret and never receives the passphrase: the client fetches the wrapped material and unwraps it locally via the [password-based KDF](/design/cryptography/primitives/#password-based-kdf). The server is never in the password-trust path, so a server compromise cannot brute-force passphrases beyond the [Argon2id](/design/cryptography/primitives/#password-based-kdf) cost already imposed. 
Because unwrap is client-side the server cannot observe a *failed* attempt, so the endpoint that returns the wrapped material is rate-limited per source IP and per `{opaque-id}` (the same limiter as the serve path); the Argon2id cost is the real brute-force backstop. +- **Privacy strip on serve is mandatory.** The serve path **always** applies the boundary-crossing strip from [Metadata — Privacy on Export](/design/metadata/#privacy-on-export) (camera serial, device/session ids, GPS truncated to city level, contact tags). There is **no per-share opt-out** that could leak fingerprinting fields — a public share is, by definition, a boundary crossing. +- **Home-server-only serving.** A share link is served **only by the album's [home server](/design/federation/#album-ownership-v1-single-home-server)**. A federated peer never serves a share; a share-scoped request at a peer returns a **structured `{ home_server }` JSON pointer** the client resolves — explicitly *not* an HTTP redirect, to avoid an open-redirect surface — never content. This keeps revocation and rate-limiting at a single authoritative point. +- **Revocation cache.** Per-link revocation is checked against a **short-TTL cache (default 60 s)** with the same fail-closed posture as the [federation revocation list](/design/federation/#token-lifecycle-and-chain-of-trust): a serve path that cannot confirm a link is still live past the TTL refuses rather than serving on stale-allowed state. + +## Contract Skeleton + +The surfaces consuming code needs; the security policies they enforce are fixed by the [Security Contract](#security-contract) above. 
+
+```rust
+// in capsule-core::sharing
+trait ShareLinkIssuer {
+    fn create_link(scope: ShareScope, expiry: Option, passphrase: Option<&str>) -> Result;
+    fn revoke(link_id: ShareLinkId) -> Result<(), Error>;
+}
+
+// in capsule-api-media::shares
+// GET /s/{opaque-id} → metadata blob + LQIP (mandatory server-side strip — see Security Contract)
+// GET /s/{opaque-id}/blob/{hash} → ciphertext blob; client decrypts using link-derived key
+// POST /s/{opaque-id}/passphrase → if passphrase-wrapped, exchange passphrase for unwrap material
+```
+
+Concrete error variants are an implementation detail; the rate-limit, opaque-id entropy, privacy-strip, and revocation policies are fixed by the [Security Contract](#security-contract) above.
+
+## Failure Modes
+
+- **Link enumeration.** Defeated structurally by the ≥128-bit opaque-id and operationally by per-IP/per-link rate limits with indistinguishable `404`s (see [Security Contract](#security-contract)).
+- **Revoked link still served.** Home-server-only serving means a single authoritative revocation point — no peer caches a share to serve stale — and the 60 s revocation cache fails closed past its TTL.
+- **Passphrase brute force.** The [Argon2id](/design/cryptography/primitives/#password-based-kdf) wrap makes weak passphrases survivable; client-side unwrap keeps the server out of the trust path; the rate-limited serve endpoint is the operational backstop.
+
+## Validation
+
+- **Opaque-id entropy (unit).** Assert generated ids are ≥128-bit and non-sequential; a generator producing shorter or guessable ids fails the test.
+- **Enumeration resistance (smoke).** Probe the serve endpoint with random ids; assert per-IP/per-link rate limiting, and that not-found, revoked, and expired all return an indistinguishable `404`.
+- **Passphrase unwrap locality (unit).** Assert the passphrase never crosses the wire — the server stores and returns only the wrapped secret; unwrap happens client-side.
+- **Revocation honored (smoke).** Revoke a link; assert the serve endpoint refuses within the 60 s cache window, and fails closed past TTL when revocation state is unreachable. +- **Privacy-strip on serve (unit).** Assert the boundary-crossing field set is always stripped from the served metadata blob, with no opt-out path. +- **Home-server-only (unit).** Assert a federated peer refuses to serve a share and returns a home-server pointer. + +(The validation surface grows with the client UX, but the security checks above are committed.) diff --git a/capsule-docs/src/content/docs/design/threat-model.md b/capsule-docs/src/content/docs/design/threat-model.md deleted file mode 100644 index 1366ee3..0000000 --- a/capsule-docs/src/content/docs/design/threat-model.md +++ /dev/null @@ -1,313 +0,0 @@ ---- -title: Threat Model -description: How Capsule contains damage from faulty, malicious, or version-mismatched clients ---- - -This doc catalogues the ways a client can damage user data, the invariant in each owner doc that defeats each scenario, and the universal rules that bind them — protocol negotiation, server-side validation duties, idempotency, atomicity, and provenance immutability. - -It is **not** a primitives doc. Every primitive Capsule uses is declared in its [owner doc](/design/principles/#single-source-of-truth); this doc references those declarations rather than re-stating them. Where a specific invariant lives, the relevant owner doc enforces it; where a *defense* spans multiple docs, the canonical statement lives here. - -## Purpose and Scope - -E2EE shifts most of the trust to the client. The server holds no keys; clients write the canonical state. That makes the question "what damage can a client cause?" load-bearing for the design — a single buggy implementation, a hostile keyholder inside an album, a stranded old build, or a too-new prototype all have to fail safely. 
- -A faulty, malicious, or version-mismatched client must not be able to cause **irreparable** damage (loss of original bytes, loss of audit trail, undetected silent overwrite of user intent) and should not be able to cause more than **transient** damage (a quarantined asset surfaces to the user; a rejected write returns a clear error; a divergence is detected and reconciled). The recovery paths in [Cryptography — Failure Modes and Recovery](/design/cryptography/#failure-modes-and-recovery) cover key loss; this doc covers the *write-path* harm a wrong-but-signed client can attempt. - -## Client Class Taxonomy - -Every client request can be classified by one of these models. The defenses listed below apply to **all** of them — none of them are trusted to enforce their own correctness: - -| Class | Description | What authenticates them | What stops them | -| ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| **Honest** | Conforming implementation, correct keys, correct version. | Session token + access token + device DSK + epoch write-tier signature. | Nothing to stop. This is the baseline. | -| **Faulty** | Conforming intent, buggy implementation. Writes structurally invalid or semantically wrong manifests under real keys. | Same as honest — the keys are correct. | Server-side [structural validation](#server-side-validation-invariants) + client-side [`verify_asset`](/design/cryptography/#write-authorization) chokepoint + quarantine surfaces. | -| **Malicious** | Adversary in possession of a current device's DSK and the album's epoch write-tier key. 
Writes deliberately malformed or destructive operations. | Same as honest — the keys are real, because the adversary owns them. | Provenance chain immutability + soft-delete window + per-album/per-event compartmentalization + audit trail for after-the-fact attribution. | -| **Old** | A signed-in client that predates a feature, schema, or suite the server now considers minimum. Cannot produce structurally valid writes for albums pinned above its version. | Authenticated, but `X-Capsule-Protocol` is below the server's accepted range. | [Protocol handshake](#protocol-and-capability-negotiation) rejects writes with `426 Upgrade Required` before any state is written. | -| **New** | A prototype or staging build that writes a `protocol_version`/`crypto_suite_id`/`sidecar_schema` ahead of what the receiver knows. | Authenticated, but the version is higher than the receiver's max known. | Receiver's refuse-by-default rule on unknown enum values, unknown schemas, and forward-jumping protocol versions; closed schema evolution boundary (see below). | - -The deliberate choice in the matrix above: a *malicious* client with real keys is the hardest to stop, because confidentiality and authentication don't help when the adversary already holds the keys. Capsule's response is to ensure such an adversary can do nothing **silently** — every write produces a signed provenance record, soft-delete is the default, and history is append-only. The audit trail is the recovery surface. 
- -## Damage Containment Layers - -Restating the boundary hierarchy from [Core Principles](/design/principles/) as concentric containment shells, with the owner doc that enforces each: - -| Shell | Boundary | Owner doc | -| ------------------------- | ---------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------- | -| **Per-version** | Album protocol pinning isolates a buggy v_k from v_{k-1} albums. | [Versioning](/design/versioning/#album-protocol-version-pinning) | -| **Per-album** | MLS group + per-epoch AMK + per-epoch write-tier key. | [Cryptography — Group Membership](/design/cryptography/#group-membership) | -| **Per-event** (manifest) | Each lifecycle action is its own signed, chained record. | [Cryptography — Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications) | -| **Per-user** | Owner Group Key, sponsored-account isolation. | [Cryptography — Owner Group Keys](/design/cryptography/#owner-group-keys-ogks) | -| **Per-peer** (federation) | Capability tokens, error budgets, quarantine for new peers. | [Federation](/design/federation/) | -| **Per-device** (peering) | Device directory enforced via the TLS handshake. | [Peering — Establishing the Channel](/design/peering/#establishing-the-channel) | - -A bug or compromise on one side of any shell cannot cross it. - -## Damage Scenario → Invariant Map - -The lookup table for "what damage X is prevented by which invariant Y in which doc Z." Each row names a concrete vector found during the audit and the single owner-doc anchor that defeats it. 
- -| # | Damage scenario | Defense | Owner doc | -| --- | ------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| 1 | Old client writes a sidecar after stripping unknown fields | Sidecar signature covers `_unknown`; old client refuses to write when `sidecar_schema` > its max known | [Metadata — Schema Versioning Rules](/design/metadata/#schema-versioning-rules) | -| 2 | Faulty client uploads bytes that don't match the declared content type | Server's `content_type` allow-list per protocol version (no-key check) + receiving client decoder sandbox | [Threat Model §5](#server-side-validation-invariants), [Clients — Sandboxed Decoder](/design/clients/#sandboxed-decoder) | -| 3 | Buggy client uploads chunk with wrong offset and re-tries | Idempotency tuple `(upload_id, offset, chunk_hash)`; duplicate at offset with different hash → reject | [Import & Sync — Upload Protocol](/design/import-synchronization/#upload-protocol) | -| 4 | Hostile peer sends an old-but-validly-signed manifest to revive a deleted asset | `prior_provenance_hash` chain advance check on both client and server | [Cryptography — Provenance](/design/cryptography/#provenance-of-library-modifications), [§ Server-Side Validation Invariants](#server-side-validation-invariants) | -| 5 | Malicious client re-signs an existing manifest under a weaker `crypto_suite_id` | Signatures cover `crypto_suite_id` and `protocol_version` | [Cryptography — Write Authorization](/design/cryptography/#write-authorization) | -| 6 | Two devices concurrently caption the same photo | Caption LWW + `superseded_captions` array surfaces the loser | [Metadata — Surfacing 
Concurrent Edits](/design/metadata/#surfacing-concurrent-edits) | -| 7 | Client issues an OR-set remove for an element it never observed an add for | Add-id binding: removes target a specific `add_id`; unknown `add_id` is rejected | [Metadata — Add-id Binding](/design/metadata/#add-id-binding) | -| 8 | Buggy client overwrites a good thumbnail with a corrupt one | Every derivative carries a signed `DerivativeManifest` on its own chain; overwrite is a `derivative-replace` lifecycle action | [Cryptography — Derivative Provenance](/design/cryptography/#derivative-provenance) | -| 9 | A client declares `timestamp = 2099-01-01` to distort the audit | Server rejects timestamp outside ±30 days of server clock at accept | [Cryptography — Write Authorization](/design/cryptography/#write-authorization) | -| 10 | Server-side TOCTOU on blob dedup creates a duplicate | Dedup-check and pending-row insert are atomic on a single Postgres transaction | [Filesystem — Content-Addressing and Deduplication](/design/filesystem/#content-addressing-and-deduplication) | -| 11 | A faulty client uploads bytes that exceed its declared size | Server bounds cumulative received at every chunk, not only at finalization | [Import & Sync — Chunk rules](/design/import-synchronization/#upload-protocol) | -| 12 | A new client writes a manifest with a `crypto_suite_id` the server does not recognize | Refuse-by-default at handshake: 400 before any session is created | [§ Protocol and Capability Negotiation](#protocol-and-capability-negotiation) | -| 13 | A federated peer floods the rejected-hash table to exhaust memory | Per-peer quota; bounded LRU memory | [Federation — Soft-Fail Semantics](/design/federation/#soft-fail-semantics) | -| 14 | A model swap silently invalidates the AI tag namespace | Every `tags_ai` entry carries `model_id`+`model_version`; cross-model comparison is forbidden | [Metadata — Tag Provenance and Namespacing](/design/metadata/#tag-provenance-and-namespacing) | -| 15 | A leaked 
session token revokes all of a user's other sessions to lock them out | `revoke_all_sessions` requires master-key proof, not session auth | [Authentication — Explicit revocation](/design/authentication/#explicit-revocation) | -| 16 | An attacker holding every current key tries to rewrite the asset's history | Provenance chain references each predecessor's hash; rewriting any past record requires forging an earlier (possibly retired) device's hybrid signature | [Cryptography — Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications) | -| 17 | A client picks a random `amk_version` to skip MLS | Server's no-key check: `amk_version` must be monotonic per album and known to the server | [§ Server-Side Validation Invariants](#server-side-validation-invariants) | -| 18 | A v_old client tries to write into an album that has been upgraded to v_new | Album pinning + upgrade ceremony quiescence: server returns `409` for writes carrying a stale `intent_id` | [Versioning — Album Upgrade Ceremony](/design/versioning/#album-upgrade-ceremony) | -| 19 | A malformed CBOR sidecar lands on disk after a crash mid-write | Malformed sidecar → quarantined to `.library/quarantine/`; never silent-skipped | [Filesystem — Repair](/design/filesystem/#repair) | -| 20 | A federation pull returns a manifest claiming a device that's not in the user's directory | Server's no-key check: `created_by_device` must be in the user's published device directory | [§ Server-Side Validation Invariants](#server-side-validation-invariants) | -| 21 | A buggy client uploads a metadata blob with a hand-crafted wire format | Metadata blob wire format is byte-exact; mismatched envelope rejected at decode | [Cryptography — Metadata Blob Wire Format](/design/cryptography/#metadata-blob-wire-format) | -| 22 | A retry of a delete manifest decrements blob refcount twice | Manifest idempotency keyed by `prior_provenance_hash`: a duplicate manifest is a no-op | [§ Idempotency 
Invariants](#idempotency-invariants) | -| 23 | A backup restore from 6 months ago silently overwrites current state | Restore-as-chain-fork: every restored manifest with a stale `prior_provenance_hash` is quarantined and surfaced for explicit merge | [Backup & Recovery — Backup Verification](/design/backup-recovery/#backup-verification) ([open question](#open-questions)) | -| 24 | A new device claims its key is older than the account itself | Device entry in the device directory is signed by the IK and carries `added_at`; a server rejects an upload from a device whose `added_at` postdates the manifest | [Cryptography — Device Keys](/design/cryptography/#device-keys), [§ Server-Side Validation Invariants](#server-side-validation-invariants) | -| 25 | A peer floods notifications to make Capsule pull garbage | Notifications are advisory; pull is on Capsule's schedule and goes through full validation | [Federation — Pull-Only Federation](/design/federation/#pull-only-federation) | -| 26 | A federated server's TLS endpoint silently changes its public key | Servers cache each other's keys; rotation requires a notary endpoint co-sign | [Federation — Server Identity and Key Rotation](/design/federation/#server-identity-and-key-rotation) | -| 27 | A buggy client writes a stack edit that updates one member's sidecar and not the others | Stack edits are bundle-atomic: all `.tmp` files staged first, all renamed together; any rename failure discards the bundle | [Filesystem — Atomic Writes and Crash Recovery](/design/filesystem/#atomic-writes-and-crash-recovery) | -| 28 | A federated peer serves a stale capability token after revocation | Capability TTL ≤ 24h + published revocation list polled ≤ 15 min | [Federation — Federation Capabilities](/design/federation/#federation-capabilities) | -| 29 | A faulty client uploads embeddings derived from a model the receiver does not run | Vector index refuses inserts whose `model_id` is unknown | [ML Models — Embedding 
Provenance](/design/ml-models/#embedding-provenance) | -| 30 | A client tries to write directly to a server-derived field (e.g. computed ciphertext hash) | Server recomputes ciphertext hash at finalization and rejects mismatch | [Import & Sync — Finalization and Integrity](/design/import-synchronization/#finalization-and-integrity) | - -When a scenario surfaces during implementation that does not match any of the above, the rule is: add a row here, then declare the defense in exactly one owner doc. Never restate a defense in multiple docs. - -## Server-Side Validation Invariants - -The server holds no keys — it cannot verify any signature against a key it owns. But it **does** validate the *structure* of every write before persisting state. These checks are refuse-by-default and intentionally exhaustive; a buggy server that skips one of them silently widens the blast radius for the entire client class taxonomy above. - -This list is the canonical statement; [Filesystem](/design/filesystem/), [Import & Synchronization](/design/import-synchronization/), [Federation](/design/federation/), [Authorization](/design/authorization/), and [Authentication](/design/authentication/) reference it without restating. - -### On `POST /upload` (session creation) - -1. `X-Capsule-Protocol` is within the server's `[Min, Max]` range. Otherwise `426 Upgrade Required`, no session created. -2. `crypto_suite_id` is a row of the [Primitives Inventory](/design/cryptography/#primitives-inventory). Otherwise `400`. -3. `hash.algo` matches the algorithm declared by `crypto_suite_id`. Otherwise `400`. -4. `size` ∈ (0, `max_file_size`]. Otherwise `400` / `413`. -5. `content_type` ∈ closed enum for this protocol version. Otherwise `400`. -6. `album_id` exists; authenticated user has server-visible write capability on it; album's pinned `protocol_version` equals the request's. Otherwise `403`. -7. 
`created_by_device` is in the user's published device directory, and the directory entry's `added_at` precedes the request's `timestamp`. Otherwise `403`. -8. `timestamp` is within ±30 days of server clock. Otherwise `400`. - -### On each `PATCH /upload/{id}` chunk - -9. Offset is exactly the current received-byte count. Otherwise `409`, with `X-Capsule-Offset` returned. -10. Non-final chunk size is a multiple of 4 KiB. Otherwise `400`. -11. Cumulative received ≤ declared `size`. Otherwise `400` / `413`, session moves to `FailedProcessing`. -12. The `(upload_id, offset, chunk_hash)` idempotency tuple is new OR matches an exact prior PATCH. Otherwise (same offset, different hash) `409` + corruption error. - -### At finalization - -13. Total received == declared `size`. Otherwise `FailedProcessing`. -14. Recomputed ciphertext hash == declared `hash.value`. Otherwise `FailedProcessing` + corruption error. -15. Manifest envelope re-validated (rerun 1–8) inside the finalization transaction. - -### On non-upload writes (lifecycle action manifest, metadata-update, derivative-add/replace, trash-restore) - -16. `action` is in the closed enum. Otherwise `400`. -17. `prior_provenance_hash` equals the last accepted manifest's content hash for this `asset_id`. Otherwise `409` (stale-revival). -18. `amk_version` is monotonic per album (never regresses). Otherwise `400`. - -### On federation pull (server-to-server) - -19. Capability token verifies under home server's signing key; `exp` in future; `jti` not in revocation list (cached ≤ 15 min). Otherwise `401` / `403`. -20. All checks (1)–(18) re-applied — federation does not unlock looser rules. -21. Per-peer rate budgets unbroken (events/hour, bytes/hour, CPU/hour). Otherwise `429`. 
- -Every rejection is logged with a structured reason code; the rejected hash is remembered (bounded, see [Federation — Soft-Fail Semantics](/design/federation/#soft-fail-semantics)) so divergence between Capsule's view and a permissive peer's view is detectable. - -## Client-Side Validation Invariants - -Mirror checklist that every client implements before applying any received data — local or remote. A client that skips one of these is in the *faulty* class. - -- Run [`verify_asset`](/design/cryptography/#write-authorization) on every received `AssetManifest`. Quarantine on failure; never silent-drop, never silent-accept. -- Reject an incoming `sidecar_schema` greater than the client's `max_known_sidecar_schema`. Refuse to write that sidecar; refuse to read in normal mode (read-only opt-in is allowed). -- Reject an incoming `protocol_version` outside `[Min, Max]` known to the client. The same handshake the server runs. -- Reject an unknown enum value for any field whose enum is closed at the current schema (notably `action`, `content_type`, `gps.source`, `DerivativeManifest.role`). Unknown CBOR map keys are preserved per [Postel's Law](/design/principles/) and never executed. -- Maintain a local `latest_provenance_hash` per `asset_id`. Refuse to apply any manifest whose `prior_provenance_hash` is behind the local value. Surface it. -- Reject an OR-set remove whose `add_id` was never observed locally as an add. -- Refuse to follow a `revoke_all_sessions` confirmation that did not include a master-key proof. -- Decode remote-origin asset bytes only in the [sandboxed decoder](/design/clients/#sandboxed-decoder). - -## Protocol and Capability Negotiation - -Every versioned API surface — client-to-server uploads, sync feed, federation pull, peering — runs the same compatibility gate. The gate is **fail-closed**: a mismatch is a hard reject before any state is written, never a silent degrade. 
- -### Universal Headers - -| Header | Sent by | Meaning | -| ---------------------------- | ------------------------- | ------------------------------------------------------------------------------------------ | -| `X-Capsule-Protocol` | client / peer | `YYYY-MM-DD` protocol version the request is written against | -| `X-Capsule-Crypto-Suite` | client / peer on writes | `u16` suite id from the [Primitives Inventory](/design/cryptography/#primitives-inventory) | -| `X-Capsule-Sidecar-Schema` | client on metadata-update | `u16` schema version declared at `sidecar_schema` field 0 | -| `X-Capsule-Protocol-Min` | server on every response | the lowest protocol version this server accepts | -| `X-Capsule-Protocol-Max` | server on every response | the highest protocol version this server accepts | -| `X-Capsule-Min-Client-Build` | server on responses | semver deprecation cutoff; advisory unless the path is hard-deprecated | - -### Fail-Closed Rules - -- `X-Capsule-Protocol` outside `[Min, Max]` on a **write**: `426 Upgrade Required`. No session created, no row written. -- `X-Capsule-Crypto-Suite` not in the inventory: `400 Bad Request`. -- `X-Capsule-Sidecar-Schema` above the server's max known: `400 Bad Request`. (The server does not parse sidecars itself, but it refuses to acknowledge writes whose schema number it does not index.) -- **Reads of any past version succeed.** Read invariants are deliberately stable per [Versioning](/design/versioning/), so a current server still serves v_{k-N} blobs from years ago. -- Federation capability is an additional `401` / `403` layer on top of the protocol gate. A valid token never substitutes for a valid protocol header. - -The handshake is **one-shot per request**, not a negotiation. Either both sides agree by inspection, or the request fails. There is no back-and-forth that could leak partial state. - -## Idempotency Invariants - -Every write surface has a single idempotency key. 
Duplicates are no-ops; conflicts (same key, different content) are corruption errors. - -| Surface | Idempotency key | Duplicate behavior | -| ----------------------------------- | ---------------------------------------------------------------------------------- | ------------------------------------------------- | -| Upload chunk (`PATCH /upload/{id}`) | `(upload_id, offset, chunk_hash)` | Returns current offset; no double-write | -| Session creation (`POST /upload`) | `(owner_id, hash.value, album_id)` — server's existing dedup check | Returns the existing session; no second session | -| Lifecycle manifest write | `(asset_id, prior_provenance_hash, manifest_hash)` | No-op append; chain advances exactly once | -| Metadata-update operation | Operation id (UUIDv7) + `(asset_id, prior_provenance_hash)` | Re-applying the same op is structurally identical | -| Federation capability proof | `(peer_id, jti)` | Refresh with same `jti` returns the same response | -| Federation pull | `(peer_id, sync_cursor)` — the sync cursor itself is the key | Re-pull returns the same page | -| MLS commit | Handled by OpenMLS; commits are ordered by the group's commit chain | OpenMLS rejects duplicates | -| Album upgrade ceremony | `intent_id` (UUIDv7); see [Versioning](/design/versioning/#album-upgrade-ceremony) | Same intent never produces two forks | - -A write surface that does not appear here is, by default, **not** idempotent and must be designed before it ships. - -## Atomicity Invariants - -Multi-write operations that must succeed-as-one or not at all. A partial success on any of these is itself a damage scenario. - -- **Asset bundle finalization.** The manifest, ciphertext blob, metadata blob, and provenance blob commit together in a single Postgres transaction. Server failure between any pair leaves the entire bundle un-finalized; the session moves to `FailedProcessing` and the partial blobs are GC'd. 
([Filesystem — Atomic Writes](/design/filesystem/#atomic-writes-and-crash-recovery)) -- **Stack edits.** All affected sidecars stage as `.tmp` files first; renames happen together. Any rename failure discards every `.tmp` in the bundle. ([Filesystem — Atomic Writes](/design/filesystem/#atomic-writes-and-crash-recovery)) -- **AMK epoch bump + write-tier key rotation.** A new AMK and a new write-tier key are minted as a single MLS commit. The two cannot exist out of sync. -- **Album upgrade ceremony.** The cutover is one MLS commit, the `AlbumTombstone`. Until applied, the client is in v_old; after, in v_new. ([Versioning — Album Upgrade Ceremony](/design/versioning/#album-upgrade-ceremony)) -- **Lifecycle manifest + provenance record.** Writing a lifecycle manifest and appending its provenance entry are the same act, because the provenance entry **is** the manifest plus the chain link. There is no separate "now record provenance" step that can race. - -## Quarantine Surfaces - -Every "don't apply, surface it" code path. The union exists so the UI surface and operator audit have a single inventory of "things that need a human to look at." 
- -| Surface | Where it lives on disk (client) | Source of truth doc | -| ------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------------------------------------------------------- | -| `verify_asset` reject (any signature or chain failure) | Quarantine area surfaced via the audit log | [Cryptography — Write Authorization](/design/cryptography/#write-authorization) | -| Federation soft-fail | Rejected-hash table, bounded LRU | [Federation — Soft-Fail Semantics](/design/federation/#soft-fail-semantics) | -| Orphaned original (no sidecar) | `.library/quarantine/` after a failed recovery | [Filesystem — Repair](/design/filesystem/#repair) | -| Malformed CBOR sidecar | `.library/quarantine/` (the unparseable bytes are preserved) | [Filesystem — Repair](/design/filesystem/#repair) | -| Stale-revival (peer or restore sends old manifest) | Audit log + UI surface "peer sent stale state" | [Cryptography — Provenance](/design/cryptography/#provenance-of-library-modifications) | -| Album upgrade stranded write | Local `pending_until_upgrade` queue | [Versioning — Album Upgrade Ceremony](/design/versioning/#album-upgrade-ceremony) | -| Backup restore chain conflict | Audit log + UI surface "restore conflicts" | [Backup & Recovery — Backup Verification](/design/backup-recovery/#backup-verification) | - -A quarantined item is **never silently dropped and never silently applied**. The user (or operator) can inspect, repair, or discard explicitly. - -## Provenance Immutability Rules - -The append-only hash-chained record per asset is defined in [Cryptography — Provenance of Library Modifications](/design/cryptography/#provenance-of-library-modifications). This section is the policy layer. 
- -- **No path exists to overwrite or delete an existing provenance entry.** Not via the API, not via the local filesystem (the client treats `.provenance.cbor` as append-only), not via federation. The constraint is structural, not enforced by a permission check. -- **Even a hard-delete preserves provenance.** When an asset is purged, its `media/{YYYY}/{YYYY-MM}/{uuid}.provenance.cbor` remains as a tombstone-with-history. The bytes that go away are the ciphertext blob and the encrypted metadata; the audit trail does not. -- **Export and backup carry the chain.** A backup artifact includes every asset's full provenance chain. On restore, the chain re-enters the local index — see the [open question on restore conflicts](#open-questions). -- **What a key-holding attacker still cannot do.** A complete current-key compromise lets the attacker append forward. It does not let them rewrite the past — every prior record is bound by a signature from a (possibly retired) device whose public half is still in the device directory. - -## Schema Evolution and Field Grammar - -The owner of "what a Capsule schema looks like" is each individual schema's owner doc; the owner of "what evolution is allowed" is this doc. - -### Deny-by-Default for Unknown Request Fields - -[Postel's Law](/design/principles/) — as tightened in principles — applies asymmetrically: - -- **In requests (client → server, or peer → server):** unknown fields at known positions in a known schema are accepted and preserved verbatim. Unknown fields at the **top level** that the receiver does not declare are **rejected**. Schema-bearing requests that announce a `sidecar_schema` or `crypto_suite_id` the receiver does not implement are rejected. The asymmetry is deliberate: liberal acceptance in requests is what lets new clients write extensions, but only *inside* a known schema envelope. -- **In responses (server → client):** unknown fields are preserved verbatim. 
A new server sending an old client a response with a new field does not break the old client. - -### Closed Enums - -The following enums are closed per `protocol_version` — a value outside the enum is a structural error, never a "future value to ignore": - -- `AssetManifest.action` -- `Sidecar.content_type` -- `Sidecar.gps.source` -- `DerivativeManifest.role` - -Adding a value to a closed enum bumps `protocol_version`. Old albums never see the new value because they are pinned. - -### Timestamp Grammar - -All `timestamp` and `ts` fields are RFC 3339 strings. Server-accepted values are bounded to **±30 days** of server wall-clock at the moment of accept (configurable per deployment). The bound applies to writes; reads serve whatever timestamp was historically accepted. - -A client whose system clock drifts more than 30 days from the server is rejected at handshake. This is the *honest* class's protection from a faulty NTP — the bound surfaces the drift instead of silently distorting audit timestamps. - -### Bounded String and Collection Sizes - -Every field has a maximum length declared in the schema (e.g. `caption_lww.value ≤ 4096 bytes`; `superseded_captions ≤ 16 entries`). The receiver rejects an oversized value. No field is unbounded. - -## Forbidden Client Behaviors - -A correct Capsule client implementation must never: - -- Back-date or post-date a `timestamp` outside the ±30-day window. -- Re-sign or re-issue a manifest under a `crypto_suite_id` lower than the original. -- Sign for an album epoch the client does not currently hold the write-tier key for. -- Issue an OR-set remove for an `add_id` it has not locally observed an add for. -- Strip `_unknown` fields from a sidecar it intends to write back. Round-trips must preserve everything the schema allows. -- Strip `superseded_captions` entries. -- Overwrite an existing `.provenance.cbor` file (the file is append-only). -- Submit a `revoke_all_sessions` without proof of master-key possession. 
-- Decode bytes received from a non-home peer outside the [sandboxed decoder](/design/clients/#sandboxed-decoder). -- Promote an AI tag to a user tag silently — promotion is an explicit, signed lifecycle operation. -- Treat a `429`, `409`, or `426` as a retry-with-the-same-payload. Each one requires a fix on the client (back off, re-align offset, upgrade) before retry. - -A client implementation that does any of the above is **buggy by definition**. The check belongs in the client implementation's own correctness tests; the network layers above protect against the consequences. - -## Min-Supported-Client Deprecation Policy - -Dropping a `protocol_version` from the server's accepted window is a breaking change. The policy: - -1. **Announcement.** A deprecation cutoff date is published at `/.well-known/capsule/deprecation` ahead of the cutoff by at least the announcement window (default 90 days, deployment-configurable). The announcement names the cutoff date and the minimum `protocol_version` that will remain accepted. -2. **Server response.** Below the cutoff, every response carries `X-Capsule-Min-Client-Build` and a `Warning:` header pointing to the deprecation URL. -3. **Hard cutoff.** On the cutoff date, the dropped version moves outside `[Min, Max]`. Writes from clients pinned to that version receive `426`. Reads still succeed. -4. **Stranded user.** A user whose only client is below the cutoff still has every recovery path from [Cryptography — Failure Modes and Recovery](/design/cryptography/#failure-modes-and-recovery): master key, cross-device, OGK, backup artifact. The deprecation does not strand data; it strands a specific old binary. - -The deprecation surface is **never** retroactive against historical state. Old albums pinned to a dropped version remain readable forever — they just cannot be written to from a current client. - -## Open Questions - -These survive the current design and should be resolved before the docs are considered final. - -1. 
**Restore-vs-stale-revival.** A restore from a 6-month-old backup hands the system manifests whose `prior_provenance_hash` is older than the local `latest_provenance_hash`. The naive defense quarantines every entry, which is a foot-gun. Two candidate resolutions: (a) restore enters a `restore_from_backup` chain branch the user explicitly merges, or (b) restore resets `latest_provenance_hash` from the backup contents under additional authentication. Resolution lives in [Backup & Recovery](/design/backup-recovery/). -2. **Sync cursor authenticity.** A malicious server could hand a client an older `sync_cursor` to rewind its view. The cursor is currently opaque; making it MAC'd by the server and validated as monotonic by the client is the leading fix. -3. **Cross-server album replication (v2).** v1 pins each album to a single home server; v2 will need a story for cross-server MLS state and federated commit ordering. -4. **Sponsored-account write damage.** A compromised registered account holds its sponsorees' KEKs and can manipulate their histories without their device keys. Enumerate the damage and bound it. -5. **AMK epoch monotonicity bootstrap.** A brand-new client cannot know the previous max `amk_version` without trusting the server. The fix bootstraps monotonicity from the MLS commit chain rather than the server's stored counter. -6. **Cross-language deterministic CBOR.** FFI consumers re-serializing may drift; no byte-identical cross-language test surface is documented. -7. **Federated quota DoS via honest user.** Per-peer quotas protect Capsule from a peer, but a single user receiving from many peers can exhaust the home server's storage. Needs a peer-attribution dimension. -8. **"New client" UI surface.** A client speaking a `protocol_version` ahead of an album's pin is rejected on writes but may *read* state a future client wrote. The unknown-extension UI surface needs definition in [Clients](/design/clients/). 
- -## Cross-References - -Each owner doc gains an invariant section or two that links back to this doc. The mapping: - -| Owner doc | Threat-model section(s) it ties into | -| ----------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [Principles](/design/principles/) | [§ Damage Containment Layers](#damage-containment-layers) | -| [Versioning](/design/versioning/) | [§ Protocol and Capability Negotiation](#protocol-and-capability-negotiation), [§ Atomicity Invariants](#atomicity-invariants) | -| [Filesystem](/design/filesystem/) | [§ Server-Side Validation Invariants](#server-side-validation-invariants), [§ Atomicity Invariants](#atomicity-invariants), [§ Quarantine Surfaces](#quarantine-surfaces) | -| [Cryptography](/design/cryptography/) | [§ Provenance Immutability Rules](#provenance-immutability-rules), [§ Damage Scenario Map](#damage-scenario--invariant-map) (signature/chain rows) | -| [Metadata](/design/metadata/) | [§ Schema Evolution and Field Grammar](#schema-evolution-and-field-grammar), [§ Damage Scenario Map](#damage-scenario--invariant-map) (CRDT rows) | -| [Import & Synchronization](/design/import-synchronization/) | [§ Server-Side Validation Invariants](#server-side-validation-invariants), [§ Idempotency Invariants](#idempotency-invariants) | -| [Federation](/design/federation/) | [§ Server-Side Validation Invariants](#server-side-validation-invariants), [§ Quarantine Surfaces](#quarantine-surfaces) | -| [Peering](/design/peering/) | [§ Client-Side Validation Invariants](#client-side-validation-invariants), [§ Damage Scenario Map](#damage-scenario--invariant-map) (peer rows) | -| [Authentication](/design/authentication/) | [§ Forbidden Client Behaviors](#forbidden-client-behaviors), [§ Damage Scenario Map](#damage-scenario--invariant-map) (revoke-all row) | -| 
[Authorization](/design/authorization/) | [§ Server-Side Validation Invariants](#server-side-validation-invariants) | -| [Backup & Recovery](/design/backup-recovery/) | [§ Quarantine Surfaces](#quarantine-surfaces), [§ Open Questions](#open-questions) | -| [Thumbnails](/design/thumbnails/) | [§ Damage Scenario Map](#damage-scenario--invariant-map) (derivative row) | -| [ML Models](/design/ml-models/) | [§ Damage Scenario Map](#damage-scenario--invariant-map) (embedding model row) | -| [AI](/design/ai/) | [§ Forbidden Client Behaviors](#forbidden-client-behaviors) (AI tag namespace) | -| [Organization](/design/organization/) | [§ Atomicity Invariants](#atomicity-invariants), [§ Forbidden Client Behaviors](#forbidden-client-behaviors) | -| [Clients](/design/clients/) | [§ Client-Side Validation Invariants](#client-side-validation-invariants), [§ Min-Supported-Client Deprecation Policy](#min-supported-client-deprecation-policy) | diff --git a/capsule-docs/src/content/docs/design/threat-model/index.md b/capsule-docs/src/content/docs/design/threat-model/index.md new file mode 100644 index 0000000..4cc4fee --- /dev/null +++ b/capsule-docs/src/content/docs/design/threat-model/index.md @@ -0,0 +1,71 @@ +--- +title: Threat Model +description: How Capsule contains damage from faulty, malicious, or version-mismatched clients +--- + +E2EE shifts most of the trust to the client. The server holds no keys; clients write the canonical state. That makes the question "what damage can a client cause?" load-bearing for the design — a single buggy implementation, a hostile keyholder inside an album, a stranded old build, or a too-new prototype all have to fail safely. 
+ +A faulty, malicious, or version-mismatched client must not be able to cause **irreparable** damage (loss of original bytes, loss of audit trail, undetected silent overwrite of user intent) and should not be able to cause more than **transient** damage (a quarantined asset surfaces to the user; a rejected write returns a clear error; a divergence is detected and reconciled). The recovery paths in [Cryptography — Failure Modes](/design/cryptography/failure-modes/) cover key loss; this doc covers the *write-path* harm a wrong-but-signed client can attempt. + +The threat model is not a primitives doc. Every primitive Capsule uses is declared in its [owner doc](/design/principles/#single-source-of-truth); this doc references those declarations rather than re-stating them. Where a specific invariant lives, the relevant owner doc enforces it; where a *defense* spans multiple docs, the canonical statement lives in one of the sub-docs below. + +The cross-cutting invariants here are enforced by code that lives across many crates: `capsule-core::crypto::verify_asset` (client-side validation chokepoint), `capsule-api` (server-side envelope checks at every write path), and the [validation](/design/threat-model/validation/) sub-doc's invariants directly map to acceptance tests in the corresponding API crates. 
+ +## Sub-docs + +| Sub-doc | Concern | +| -------------------------------------------------- | ------------------------------------------------------------------------------------------------ | +| [Scenarios](/design/threat-model/scenarios/) | Damage scenario → invariant map, the quarantine surface inventory, provenance immutability rules | +| [Validation](/design/threat-model/validation/) | Server- + client-side refuse-by-default checklists; protocol handshake; idempotency; atomicity | +| [Schema Rules](/design/threat-model/schema-rules/) | Schema evolution rules, forbidden client behaviors, deprecation policy, open questions | + +## Client Class Taxonomy + +Every client request can be classified by one of these models. The defenses listed below apply to **all** of them — none of them are trusted to enforce their own correctness: + +| Class | Description | What authenticates them | What stops them | +| ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| **Honest** | Conforming implementation, correct keys, correct version. | Session token + access token + device DSK + epoch write-tier signature. | Nothing to stop. This is the baseline. | +| **Faulty** | Conforming intent, buggy implementation. Writes structurally invalid or semantically wrong manifests under real keys. | Same as honest — the keys are correct. | Server-side [structural validation](/design/threat-model/validation/#server-side-validation-invariants) + client-side [`verify_asset`](/design/cryptography/keys/#write-authorization) chokepoint + quarantine surfaces. 
| +| **Malicious** | Adversary in possession of a current device's DSK and the album's epoch write-tier key. Writes deliberately malformed or destructive operations. | Same as honest — the keys are real, because the adversary owns them. | Provenance chain immutability + soft-delete window + per-album/per-event compartmentalization + audit trail for after-the-fact attribution. | +| **Old** | A signed-in client that predates a feature, schema, or suite the server now considers minimum. Cannot produce structurally valid writes for albums pinned above its version. | Authenticated, but `X-Capsule-Protocol` is below the server's accepted range. | [Protocol handshake](/design/threat-model/validation/#protocol-and-capability-negotiation) rejects writes with `426 Upgrade Required` before any state is written. | +| **New** | A prototype or staging build that writes a `protocol_version`/`crypto_suite_id`/`sidecar_schema` ahead of what the receiver knows. | Authenticated, but the version is higher than the receiver's max known. | Receiver's refuse-by-default rule on unknown enum values, unknown schemas, and forward-jumping protocol versions; closed schema evolution boundary (see [Schema Rules](/design/threat-model/schema-rules/)). | + +The deliberate choice in the matrix above: a *malicious* client with real keys is the hardest to stop, because confidentiality and authentication don't help when the adversary already holds the keys. Capsule's response is to ensure such an adversary can do nothing **silently** — every write produces a signed provenance record, soft-delete is the default, and history is append-only. The audit trail is the recovery surface. 
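
The **Old** and **New** rows of the taxonomy can be sketched as a single pre-flight check. This is a minimal illustration, not the `capsule-api` implementation: it assumes the announced protocol version parses to an integer, and `AcceptedWindow`, `Handshake`, and `check_protocol` are hypothetical names. Only the `426` for below-window writes comes from the table above; the sketch does not assign a status code to the ahead-of-window case.

```rust
/// Inclusive window of protocol versions a receiver accepts.
/// Hypothetical type; assumes versions are plain integers.
#[derive(Debug, Clone, Copy)]
pub struct AcceptedWindow {
    pub min: u32,
    pub max: u32,
}

/// Outcome of the pre-flight check on the version a client announces.
#[derive(Debug, PartialEq, Eq)]
pub enum Handshake {
    /// Inside the window: the write proceeds to structural validation.
    Proceed,
    /// "Old" client: below the window, answered with 426 Upgrade Required.
    UpgradeRequired { status: u16, min_supported: u32 },
    /// "New" client: ahead of what the receiver knows, refused by default.
    RefuseUnknown { max_known: u32 },
}

/// Classify an announced version before any state is written.
pub fn check_protocol(announced: u32, window: AcceptedWindow) -> Handshake {
    if announced < window.min {
        Handshake::UpgradeRequired {
            status: 426,
            min_supported: window.min,
        }
    } else if announced > window.max {
        Handshake::RefuseUnknown {
            max_known: window.max,
        }
    } else {
        Handshake::Proceed
    }
}
```

Keeping the two rejections as distinct variants mirrors the table: an old client is told to upgrade, while a too-new client is simply not understood by this receiver.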
+ +## Damage Containment Layers + +Restating the boundary hierarchy from [Core Principles](/design/principles/) as concentric containment shells, with the owner doc that enforces each: + +| Shell | Boundary | Owner doc | +| ------------------------- | ---------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------- | +| **Per-version** | Album protocol pinning isolates a buggy v_k from v_{k-1} albums. | [Versioning](/design/versioning/#album-protocol-version-pinning) | +| **Per-album** | MLS group + per-epoch AMK + per-epoch write-tier key. | [Cryptography — MLS](/design/cryptography/mls/) + [Cryptography — Keys](/design/cryptography/keys/#album-master-keys-amks) | +| **Per-event** (manifest) | Each lifecycle action is its own signed, chained record. | [Cryptography — Provenance](/design/cryptography/provenance/#provenance-of-library-modifications) | +| **Per-user** | Owner Group Key, sponsored-account isolation. | [Cryptography — Keys](/design/cryptography/keys/#owner-group-keys-ogks) | +| **Per-peer** (federation) | Capability tokens, error budgets, quarantine for new peers. | [Federation](/design/federation/) | +| **Per-device** (peering) | Device directory enforced via the TLS handshake. | [Peering — Establishing the Channel](/design/peering/#establishing-the-channel) | + +A bug or compromise on one side of any shell cannot cross it. + +## Owner Doc Cross-Reference + +Each owner doc gains a short section linking back to the relevant threat-model invariant. 
The mapping (for navigation): + +| Owner doc | Threat-model section(s) it ties into | +| --------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| [Principles](/design/principles/) | [§ Damage Containment Layers](#damage-containment-layers) | +| [Versioning](/design/versioning/) | [Protocol Negotiation](/design/threat-model/validation/#protocol-and-capability-negotiation), [Atomicity](/design/threat-model/validation/#atomicity-invariants) | +| [Filesystem](/design/filesystem/) | [Server Validation](/design/threat-model/validation/#server-side-validation-invariants), [Atomicity](/design/threat-model/validation/#atomicity-invariants), [Quarantine Surfaces](/design/threat-model/scenarios/#quarantine-surfaces) | +| [Cryptography](/design/cryptography/) | [Provenance Immutability](/design/threat-model/scenarios/#provenance-immutability-rules), [Scenario Map](/design/threat-model/scenarios/#damage-scenario--invariant-map) (signature/chain rows) | +| [Metadata](/design/metadata/) | [Schema Evolution](/design/threat-model/schema-rules/#schema-evolution-and-field-grammar), [Scenario Map](/design/threat-model/scenarios/#damage-scenario--invariant-map) (CRDT rows) | +| [Import & Synchronization](/design/import/) | [Server Validation](/design/threat-model/validation/#server-side-validation-invariants), [Idempotency](/design/threat-model/validation/#idempotency-invariants) | +| [Federation](/design/federation/) | [Server Validation](/design/threat-model/validation/#server-side-validation-invariants), [Quarantine Surfaces](/design/threat-model/scenarios/#quarantine-surfaces) | +| [Peering](/design/peering/) | [Client Validation](/design/threat-model/validation/#client-side-validation-invariants), [Scenario 
Map](/design/threat-model/scenarios/#damage-scenario--invariant-map) (peer rows) | +| [Authentication](/design/authentication/) | [Forbidden Behaviors](/design/threat-model/schema-rules/#forbidden-client-behaviors), [Scenario Map](/design/threat-model/scenarios/#damage-scenario--invariant-map) (revoke-all row) | +| [Authorization](/design/authorization/) | [Server Validation](/design/threat-model/validation/#server-side-validation-invariants) | +| [Backup & Recovery](/design/backup-recovery/) | [Quarantine Surfaces](/design/threat-model/scenarios/#quarantine-surfaces) | +| [Thumbnails](/design/thumbnails/) | [Scenario Map](/design/threat-model/scenarios/#damage-scenario--invariant-map) (derivative row) | +| [AI/ML Integrations](/design/ai/) | [Forbidden Behaviors](/design/threat-model/schema-rules/#forbidden-client-behaviors) (AI tag namespace); [Scenario Map](/design/threat-model/scenarios/#damage-scenario--invariant-map) (embedding model row) | +| [Organization](/design/organization/) | [Atomicity](/design/threat-model/validation/#atomicity-invariants), [Forbidden Behaviors](/design/threat-model/schema-rules/#forbidden-client-behaviors) | +| [Clients](/design/clients/) | [Client Validation](/design/threat-model/validation/#client-side-validation-invariants), [Deprecation Policy](/design/threat-model/schema-rules/#min-supported-client-deprecation-policy) | diff --git a/capsule-docs/src/content/docs/design/threat-model/scenarios.md b/capsule-docs/src/content/docs/design/threat-model/scenarios.md new file mode 100644 index 0000000..aa2eb43 --- /dev/null +++ b/capsule-docs/src/content/docs/design/threat-model/scenarios.md @@ -0,0 +1,73 @@ +--- +title: Damage Scenarios and Quarantine +description: The damage-scenario → invariant map, quarantine surface inventory, and provenance immutability rules +--- + +The lookup table for "what damage X is prevented by which invariant Y in which doc Z." 
Each row names a concrete vector found during the audit and the single owner-doc anchor that defeats it. The table itself is the operational core of the threat model — adding a row obliges declaring the defense in exactly one owner doc. + +## Damage Scenario → Invariant Map + +| # | Damage scenario | Defense | Owner doc | +| --- | --------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| 1 | Old client writes a sidecar after stripping unknown fields | Sidecar signature covers `_unknown`; old client refuses to write when `sidecar_schema` > its max known | [Metadata — Schema Versioning Rules](/design/metadata/#schema-versioning-rules) | +| 2 | Faulty client uploads bytes that don't match the declared content type | Server's `content_type` allow-list per protocol version (no-key check) + receiving client decoder sandbox | [Validation §](/design/threat-model/validation/#server-side-validation-invariants), [Clients — Sandboxed Decoder](/design/clients/#sandboxed-decoder) | +| 3 | Buggy client uploads chunk with wrong offset and re-tries | Idempotency tuple `(upload_id, offset, chunk_hash)`; duplicate at offset with different hash → reject | [Import — Upload Protocol](/design/import/upload-protocol/#chunk-rules) | +| 4 | Hostile peer sends an old-but-validly-signed manifest to revive a deleted asset | `prior_provenance_hash` chain advance check on both client and server | [Cryptography — Provenance](/design/cryptography/provenance/#provenance-of-library-modifications), [Validation 
§](/design/threat-model/validation/#server-side-validation-invariants) | +| 5 | Malicious client re-signs an existing manifest under a weaker `crypto_suite_id` | Signatures cover `crypto_suite_id` and `protocol_version` | [Cryptography — Write Authorization](/design/cryptography/keys/#write-authorization) | +| 6 | Two devices concurrently caption the same photo | Caption LWW + `superseded_captions` array surfaces the loser | [Metadata — Surfacing Concurrent Edits](/design/metadata/#surfacing-concurrent-edits) | +| 7 | Client issues an OR-set remove for an element it never observed an add for | Add-id binding: removes target a specific `add_id`; unknown `add_id` is rejected | [Metadata — Add-id Binding](/design/metadata/#add-id-binding) | +| 8 | Buggy client overwrites a good thumbnail with a corrupt one | Every derivative carries a signed `DerivativeManifest` on its own chain; overwrite is a `derivative-replace` lifecycle action | [Cryptography — Derivative Provenance](/design/cryptography/provenance/#derivative-provenance) | +| 9 | A client declares `timestamp = 2099-01-01` to distort the audit | `timestamp` is self-asserted and audit-only — never load-bearing for ordering or authorization (those ride the chain + epoch); a gross-drift sanity bound surfaces it, and the server's own `received_at` is the authoritative clock | [Cryptography — Write Authorization](/design/cryptography/keys/#write-authorization) | +| 10 | Server-side TOCTOU on blob dedup creates a duplicate | Dedup-check and pending-row insert are atomic on a single Postgres transaction | [Filesystem — Content-Addressing and Deduplication](/design/filesystem/server/#content-addressing-and-deduplication) | +| 11 | A faulty client uploads bytes that exceed its declared size | Server bounds cumulative received at every chunk, not only at finalization | [Import — Chunk Rules](/design/import/upload-protocol/#chunk-rules) | +| 12 | A new client writes a manifest with a `crypto_suite_id` the server does not 
recognize | Refuse-by-default at handshake: 400 before any session is created | [Validation — Protocol Negotiation](/design/threat-model/validation/#protocol-and-capability-negotiation) | +| 13 | A federated peer floods the rejected-hash table to exhaust memory | Per-peer quota; bounded LRU memory | [Federation — Soft-Fail Semantics](/design/federation/#soft-fail-semantics) | +| 14 | A model swap silently invalidates the AI tag namespace | Every `tags_ai` entry carries `model_id`+`model_version`; the vector-index insert API and query layer (`capsule-core::db`) reject unknown-model inserts and exclude stale entries — cross-model comparison is forbidden | [Metadata — Tag Provenance and Namespacing](/design/metadata/#tag-provenance-and-namespacing) | +| 15 | A leaked session token revokes all of a user's other sessions to lock them out | `revoke_all_sessions` requires master-key proof, not session auth | [Authentication — Explicit revocation](/design/authentication/#explicit-revocation) | +| 16 | An attacker holding every current key tries to rewrite the asset's history | Provenance chain references each predecessor's hash; rewriting any past record requires forging an earlier (possibly retired) device's hybrid signature | [Cryptography — Provenance](/design/cryptography/provenance/#provenance-of-library-modifications) | +| 17 | A client picks a random `amk_version` to skip MLS | Server's no-key check: `amk_version` must be monotonic per album and known to the server | [Validation §](/design/threat-model/validation/#server-side-validation-invariants) | +| 18 | A v_old client tries to write into an album that has been upgraded to v_new | Album pinning + upgrade ceremony quiescence: server returns `409` for writes carrying a stale `intent_id` | [Versioning — Album Upgrade Ceremony](/design/versioning/#album-upgrade-ceremony) | +| 19 | A malformed CBOR sidecar lands on disk after a crash mid-write | Malformed sidecar → quarantined to `.library/quarantine/`; never 
silent-skipped | [Filesystem — Repair](/design/filesystem/maintenance/#repair) | +| 20 | A federation pull returns a manifest claiming a device that's not in the user's directory | Server's no-key check: `created_by_device` must be in the user's published device directory | [Validation §](/design/threat-model/validation/#server-side-validation-invariants) | +| 21 | A buggy client uploads a metadata blob with a hand-crafted wire format | Metadata blob wire format is byte-exact; mismatched envelope rejected at decode | [Cryptography — Metadata Blob Wire Format](/design/cryptography/encryption/#metadata-blob-wire-format) | +| 22 | A retry of a delete manifest decrements blob refcount twice | Manifest idempotency keyed by `prior_provenance_hash`: a duplicate manifest is a no-op | [Validation — Idempotency](/design/threat-model/validation/#idempotency-invariants) | +| 23 | A backup restore from 6 months ago silently overwrites current state | Restore is a chain-reconciliation, never a blind overwrite: newer local state always wins; an older or divergent restored manifest is quarantined for explicit merge, never auto-applied | [Backup & Recovery — Backup Verification](/design/backup-recovery/#backup-verification) | +| 24 | A new device claims its key is older than the account itself | Device entry in the device directory is signed by the IK and carries `added_at`; a server rejects an upload from a device whose `added_at` postdates the manifest | [Cryptography — Device Keys](/design/cryptography/keys/#device-keys), [Validation §](/design/threat-model/validation/#server-side-validation-invariants) | +| 25 | A peer floods notifications to make Capsule pull garbage | Notifications are advisory; pull is on Capsule's schedule and goes through full validation | [Federation — Pull-Only Federation](/design/federation/#pull-only-federation) | +| 26 | A federated server's TLS endpoint silently changes its public key | Servers TOFU-pin each other's keys; a rotation is accepted only 
after a multi-vantage perspective check corroborates it, else surfaced | [Federation — Server Identity and Key Rotation](/design/federation/#server-identity-and-key-rotation) | +| 27 | A buggy client writes a stack edit that updates one member's sidecar and not the others | Stack edits are bundle-atomic: all `.tmp` files staged first, all renamed together; any rename failure discards the bundle | [Filesystem — Atomic Writes](/design/filesystem/maintenance/#atomic-writes-and-crash-recovery) | +| 28 | A federated peer serves a stale capability token after revocation | Capability TTL ≤ 24h + published revocation list polled ≤ 15 min | [Federation — Federation Capabilities](/design/federation/#federation-capabilities) | +| 29 | A faulty client uploads embeddings derived from a model the receiver does not run | Vector index refuses inserts whose `model_id` is unknown | [AI — Embedding Provenance](/design/ai/#embedding-provenance) | +| 30 | A client tries to forge a server-derived field (computed ciphertext hash, `received_at`, `sync_seq`, blob reference counts) | The server assigns/recomputes these itself and ignores client-supplied values; the ciphertext hash is recomputed at finalization and a mismatch is rejected | [Import — Finalization and Integrity](/design/import/upload-protocol/#finalization-and-integrity) | +| 31 | A server serves a *stale* device directory to undo a revocation or hide a new device | Master-signed monotonic `directory_version`; every reader refuses a directory below its per-user high-water mark | [Cryptography — Device Directory](/design/cryptography/keys/#device-directory) | +| 32 | A server fabricates or rewinds an album epoch, or a manifest cites an epoch whose key is mid-distribution | `amk_version` ceiling anchored to the admin-signed MLS commit chain, not the server's counter; a beyond-attested epoch is rejected, an in-flight key yields `verify_asset` *pending*/retry (not a forgery) | [Cryptography — Write 
Authorization](/design/cryptography/keys/#write-authorization) | +| 33 | A share link is enumerated or brute-forced | ≥128-bit opaque-id + per-IP/per-link rate limits + indistinguishable `404` + home-server-only serving | [Share Links — Security Contract](/design/share-links/#security-contract) | +| 34 | A peer mass-reports or false-flags a user to force a takedown | Federated reports are signed by the reporting server and rate-limited per `(reporting_server, reported_user)`; per-user blocks never propagate as server-wide blocks | [Moderation — Federated Reporting](/design/moderation/#federated-reporting) | +| 35 | A user pulls from many federated peers to exhaust the home server's storage | Federated-received blobs charged to the receiving user's quota (deduped) under a per-`(receiving_user, source_peer)` cap | [Quota — Accounting Model](/design/quota/#accounting-model) | + +When a scenario surfaces during implementation that does not match any of the above, the rule is: add a row here, then declare the defense in exactly one owner doc. Never restate a defense in multiple docs. + +## Quarantine Surfaces + +Every "don't apply, surface it" code path. The union exists so the UI surface and operator audit have a single inventory of "things that need a human to look at." 
+ +| Surface | Where it lives on disk (client) | Source of truth doc | +| ------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------------------------------------------- | +| `verify_asset` reject (any signature or chain failure) | Quarantine area surfaced via the audit log | [Cryptography — Write Authorization](/design/cryptography/keys/#write-authorization) | +| Federation soft-fail | Rejected-hash table, bounded LRU | [Federation — Soft-Fail Semantics](/design/federation/#soft-fail-semantics) | +| Orphaned original (no sidecar) | `.library/quarantine/` after a failed recovery | [Filesystem — Repair](/design/filesystem/maintenance/#repair) | +| Malformed CBOR sidecar | `.library/quarantine/` (the unparseable bytes are preserved) | [Filesystem — Repair](/design/filesystem/maintenance/#repair) | +| Stale-revival (peer or restore sends old manifest) | Audit log + UI surface "peer sent stale state" | [Cryptography — Provenance](/design/cryptography/provenance/#provenance-of-library-modifications) | +| Album upgrade stranded write | Local `pending_until_upgrade` queue | [Versioning — Album Upgrade Ceremony](/design/versioning/#album-upgrade-ceremony) | +| Backup restore chain conflict | Audit log + UI surface "restore conflicts" | [Backup & Recovery — Backup Verification](/design/backup-recovery/#backup-verification) | + +A quarantined item is **never silently dropped and never silently applied**. The user (or operator) can inspect, repair, or discard explicitly. + +## Provenance Immutability Rules + +The append-only hash-chained record per asset is defined in [Cryptography — Provenance](/design/cryptography/provenance/#provenance-of-library-modifications). This section is the policy layer. 
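
As a rough sketch of what "append-only and hash-chained" means for a receiver, the chain-advance check can be modeled as below. It is illustrative only: `Chain`, `ProvenanceEntry`, `entry_hash`, and `AppendOutcome` are hypothetical names, hashes are treated as opaque 32-byte values, the first record is assumed to carry no prior hash, and signature verification (owned by the Cryptography doc) is omitted entirely.

```rust
/// Opaque 32-byte digest; how it is computed is out of scope for this sketch.
pub type ProvenanceHash = [u8; 32];

/// Stand-in for one provenance record. Only `prior_provenance_hash` is a
/// name from the design docs; `entry_hash` is a hypothetical field.
#[derive(Debug, Clone)]
pub struct ProvenanceEntry {
    pub prior_provenance_hash: Option<ProvenanceHash>,
    pub entry_hash: ProvenanceHash,
}

#[derive(Debug, PartialEq, Eq)]
pub enum AppendOutcome {
    /// The entry extends the current head and was appended.
    Appended,
    /// The entry does not extend the head: surfaced, never applied.
    Quarantined,
}

/// A per-asset chain whose only mutation is `append`.
#[derive(Debug, Default)]
pub struct Chain {
    entries: Vec<ProvenanceEntry>,
}

impl Chain {
    /// Hash of the newest record, or `None` for an empty chain.
    pub fn latest_provenance_hash(&self) -> Option<ProvenanceHash> {
        self.entries.last().map(|e| e.entry_hash)
    }

    /// Append only if the entry names the current head as its predecessor.
    /// There is deliberately no method to overwrite or remove a record.
    pub fn append(&mut self, entry: ProvenanceEntry) -> AppendOutcome {
        if entry.prior_provenance_hash == self.latest_provenance_hash() {
            self.entries.push(entry);
            AppendOutcome::Appended
        } else {
            AppendOutcome::Quarantined
        }
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

The point of the shape is that immutability is structural: the type exposes no overwrite path, so there is nothing for a permission check to guard. A replayed older record fails the predecessor comparison and lands in quarantine.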
+ +- **No path exists to overwrite or delete an existing provenance entry.** Not via the API, not via the local filesystem (the client treats `.provenance.cbor` as append-only), not via federation. The constraint is structural, not enforced by a permission check. +- **Even a hard-delete preserves provenance.** When an asset is purged, its `media/{YYYY}/{YYYY-MM}/{uuid}.provenance.cbor` remains as a tombstone-with-history. The bytes that go away are the ciphertext blob and the encrypted metadata; the audit trail does not. +- **Export and backup carry the chain.** A backup artifact includes every asset's full provenance chain. On restore, the chain re-enters the local index — reconciled per [Backup & Recovery — Backup Verification](/design/backup-recovery/#backup-verification), which never silently overwrites newer local state. +- **What a key-holding attacker still cannot do.** A complete current-key compromise lets the attacker append forward. It does not let them rewrite the past — every prior record is bound by a signature from a (possibly retired) device whose public half is **retained in the [device directory](/design/cryptography/keys/#device-directory) and never pruned** (marked `revoked_at`, not deleted), so a retired device's signature stays verifiable forever. diff --git a/capsule-docs/src/content/docs/design/threat-model/schema-rules.md b/capsule-docs/src/content/docs/design/threat-model/schema-rules.md new file mode 100644 index 0000000..53d31c5 --- /dev/null +++ b/capsule-docs/src/content/docs/design/threat-model/schema-rules.md @@ -0,0 +1,74 @@ +--- +title: Schema Rules and Open Questions +description: Schema evolution, forbidden client behaviors, deprecation policy, and unresolved design questions +--- + +Capsule schemas evolve over time, but the rules of evolution are fixed — what fields a writer may add, what a receiver may safely ignore, what fields are closed enums, and what timing/grammar rules apply. 
Each schema's owner doc defines its fields; this doc defines what evolution is allowed across them. Schema-rule enforcement lives in `capsule-core::crypto` (sidecar/manifest decode) and the validation layers of every API crate. + +## Schema Evolution and Field Grammar + +### Deny-by-Default for Unknown Request Fields + +[Postel's Law](/design/principles/#postels-law-asymmetric) — as tightened in principles — applies asymmetrically: + +- **In requests (client → server, or peer → server):** unknown fields at known positions in a known schema are accepted and preserved verbatim (an unknown CBOR key *inside* a known manifest). Unknown fields at the **top level** that the receiver does not declare are **rejected** (a stray key at the request root). Schema-bearing requests that announce a `sidecar_schema` or `crypto_suite_id` the receiver does not implement are rejected outright. The asymmetry is deliberate: liberal acceptance in requests is what lets new clients write extensions, but only *inside* a known schema envelope. +- **In responses (server → client):** unknown fields are preserved verbatim. A new server sending an old client a response with a new field does not break the old client. + +### Closed Enums + +**Every enum in a signed or validated structure is closed per `protocol_version`** — a value outside the set known at that version is a structural error, never a "future value to ignore." This is a blanket rule, not a curated list, so it cannot rot: adding a value to *any* such enum bumps `protocol_version` (see [Versioning — Album Protocol Version Pinning](/design/versioning/#album-protocol-version-pinning)), and a pinned old album never sees the new value. It is enforced on **both sides** — the server's structural envelope check (invariant 16) and the client's `verify_asset`/decode path (see [Validation](/design/threat-model/validation/)). 
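
The closed-enum rule can be sketched as a lookup keyed by `protocol_version`. This is a minimal illustration under loud assumptions: the values `example-a`, `example-b`, and `example-c` are placeholders rather than real Capsule enum members, versions are modeled as integers, and `known_values`, `decode_closed_enum`, and `DecodeError` are hypothetical names.

```rust
/// Placeholder value sets per protocol version. These are NOT real Capsule
/// enum members; they only show that each version has a fixed, closed set.
pub fn known_values(protocol_version: u32) -> &'static [&'static str] {
    match protocol_version {
        1 => &["example-a", "example-b"],
        2 => &["example-a", "example-b", "example-c"],
        _ => &[],
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    /// The receiver does not implement this protocol version at all.
    UnknownProtocolVersion(u32),
    /// The value is outside the set known at this version: a structural
    /// error, never a "future value to ignore".
    UnknownEnumValue { protocol_version: u32, value: String },
}

/// Accept a value only if it belongs to the set closed at `protocol_version`.
pub fn decode_closed_enum(
    protocol_version: u32,
    value: &str,
) -> Result<&'static str, DecodeError> {
    let known = known_values(protocol_version);
    if known.is_empty() {
        return Err(DecodeError::UnknownProtocolVersion(protocol_version));
    }
    known
        .iter()
        .copied()
        .find(|k| *k == value)
        .ok_or_else(|| DecodeError::UnknownEnumValue {
            protocol_version,
            value: value.to_string(),
        })
}
```

In this sketch a value added in version 2 is rejected when it shows up under version 1, which is the behavior a pinned old album relies on.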
+ +The authoritative value set for each enum lives in its owner doc — `AssetManifest.action` in [Authorization](/design/authorization/#the-closed-action-set), `content_type` and `gps.source` in [Metadata](/design/metadata/#sidecar-schema-v1), `DerivativeManifest.role` in [Provenance](/design/cryptography/provenance/#derivative-provenance) — never duplicated here. + +### Timestamp Grammar + +All `timestamp` and `ts` fields are RFC 3339 strings, **self-asserted and audit-only** — never load-bearing for authorization or ordering, which ride the provenance chain and the MLS epoch ([Keys — Write Authorization](/design/cryptography/keys/#write-authorization)). The server records its own trusted `received_at` for any time-based policy. + +A server-side **sanity bound** (default ±30 days of server wall-clock, deployment-configurable) is applied to writes only: a gross-drift guard that surfaces an honest client with a faulty NTP rather than silently distorting its audit trail. It is explicitly *not* a security control. Reads serve whatever timestamp was historically recorded. + +### Bounded String and Collection Sizes + +Every field has a maximum length declared in the schema (e.g. `caption_lww.value ≤ 4096 bytes`; `superseded_captions ≤ 16 entries`). The receiver rejects an oversized value. No field is unbounded. + +## Forbidden Client Behaviors + +This is **not a standalone contract** — each entry is the negative of a rule owned by another doc, consolidated here only as an index. The defense never depends on clients honoring the list: the receiving server and the client-side `verify_asset` chokepoint reject the *consequence* structurally regardless (that is the entire point of [refuse-by-default validation](/design/threat-model/validation/#server-side-validation-invariants)). 
A client that does any of these is **buggy by definition**, and the prohibition is enforced where its rule lives: + +| A correct client never… | Enforced / owned by | +| ---------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------ | +| Re-signs a manifest under a lower `crypto_suite_id` (downgrade) | [Primitives — Versioning](/design/cryptography/primitives/#versioning-identifiers) | +| Signs for an album epoch it does not hold the write-tier key for | [Keys — Write Authorization](/design/cryptography/keys/#write-authorization) | +| Issues an OR-set remove for an `add_id` it never observed | [Metadata — Add-id Binding](/design/metadata/#add-id-binding) | +| Strips `_unknown` or `superseded_captions` on write-back | [Metadata](/design/metadata/#collaborative-metadata) — the signature covers them | +| Overwrites or truncates a provenance chain | [Provenance — Append-Only](/design/cryptography/provenance/#chained-append-only-structure) | +| Submits `revoke_all_sessions` without master-key proof | [Authentication](/design/authentication/) | +| Decodes non-home-peer bytes outside the sandbox | [Clients — Sandboxed Decoder](/design/clients/#sandboxed-decoder) | +| Silently promotes an AI tag to a user tag (must be a signed op) | [Metadata — Tag Provenance](/design/metadata/#tag-provenance-and-namespacing) | +| Retries a `429` / `409` / `426` with the same payload | back off / re-align / upgrade first — [Validation](/design/threat-model/validation/#protocol-and-capability-negotiation) | + +## Min-Supported-Client Deprecation Policy + +Dropping a `protocol_version` from the server's accepted window is a breaking change. The policy: + +1. **Announcement.** A deprecation cutoff date is published at `/.well-known/capsule/deprecation` ahead of the cutoff by at least the announcement window (default 90 days, deployment-configurable). 
The announcement names the cutoff date and the minimum `protocol_version` that will remain accepted. +2. **Server response.** Below the cutoff, every response carries `X-Capsule-Min-Client-Build` and a `Warning:` header pointing to the deprecation URL. +3. **Hard cutoff.** On the cutoff date, the dropped version moves outside `[Min, Max]`. Writes from clients pinned to that version receive `426`. Reads still succeed. +4. **Stranded user.** A user whose only client is below the cutoff still has every recovery path from [Cryptography — Failure Modes](/design/cryptography/failure-modes/): master key, cross-device, OGK, backup artifact. The deprecation does not strand data; it strands a specific old binary. + +The deprecation surface is **never** retroactive against historical state. Old albums pinned to a dropped version remain readable forever — they just cannot be written to from a current client. + +## Open Questions + +One design question remains open — and it is **deliberately deferred to v2**, not a v1 blocker: + +1. **Cross-server album replication (v2).** v1 pins each album to a single home server; v2 will need a story for cross-server MLS state and federated commit ordering. + +The following questions have since been **resolved** and now live in their owner docs, not here: + +- *Restore-vs-stale-revival* → [Backup & Recovery — Backup Verification](/design/backup-recovery/#backup-verification) (restore is a chain-reconciliation; newer local state always wins). +- *Sync cursor authenticity* → [Download & Sync](/design/import/download-sync/#discovering-what-changed) (server-MAC'd cursor + client monotonic check) and [Validation invariant 22](/design/threat-model/validation/#on-the-sync-feed-directory-publish-and-federated-reports). +- *Sponsored-account write damage* → [Cryptography — Keys: Damage bound under sponsor compromise](/design/cryptography/keys/#damage-bound-under-sponsor-compromise). 
+- *AMK epoch monotonicity bootstrap* → [Cryptography — Write Authorization](/design/cryptography/keys/#write-authorization) (ceiling anchored to the MLS commit chain). +- *Cross-language deterministic CBOR* → [Metadata — Canonical CBOR Encoding](/design/metadata/#canonical-cbor-encoding) (normative ruleset + blocking conformance gate). +- *Federated quota DoS via honest user* → [Quota — Accounting Model](/design/quota/#accounting-model) (receiving-user attribution + per-peer cache cap). +- *"New client" read surface* → [Clients — Reading State From a Newer Client](/design/clients/#reading-state-from-a-newer-client). diff --git a/capsule-docs/src/content/docs/design/threat-model/validation.md b/capsule-docs/src/content/docs/design/threat-model/validation.md new file mode 100644 index 0000000..31423f4 --- /dev/null +++ b/capsule-docs/src/content/docs/design/threat-model/validation.md @@ -0,0 +1,126 @@ +--- +title: Validation Invariants +description: Server and client refuse-by-default checklists; protocol handshake; idempotency; atomicity +--- + +The cross-cutting refuse-by-default rules every Capsule receiver runs before persisting any incoming write. These are the operational core of the threat model — a server or client that skips one of them silently widens the blast radius for the entire client class taxonomy. + +The server-side invariants are enforced in `capsule-api` (every write path passes through them); the client-side invariants are enforced via the single `verify_asset` chokepoint in `capsule-core::crypto` plus the per-receiver decoder paths. The protocol handshake is a one-shot pre-flight check on every request; idempotency and atomicity invariants are properties of specific write surfaces, each cross-linked to the doc that owns the surface. + +## Server-Side Validation Invariants + +The server holds no keys — it cannot verify any signature against a key it owns. But it **does** validate the *structure* of every write before persisting state. 
These checks are refuse-by-default and intentionally exhaustive; a buggy server that skips one of them silently widens the blast radius for the entire [client class taxonomy](/design/threat-model/#client-class-taxonomy). + +This list is the canonical statement; [Filesystem](/design/filesystem/), [Import](/design/import/), [Federation](/design/federation/), [Authorization](/design/authorization/), and [Authentication](/design/authentication/) reference it without restating. + +Invariants carry **stable numbers** (referenced across docs as "invariant 17", "items 1–18", etc.); they are grouped by write phase but the numbering is continuous. + +### On `POST /upload` (session creation) + +- **1.** `X-Capsule-Protocol` is within the server's `[Min, Max]` range. Otherwise `426 Upgrade Required`, no session created. +- **2.** `crypto_suite_id` is a row of the [Primitives Inventory](/design/cryptography/primitives/#primitives-inventory). Otherwise `400`. +- **3.** `hash` length matches the digest size for `crypto_suite_id` (32 bytes for SHA-256). Otherwise `400`. +- **4.** `size` ∈ (0, `max_file_size`]. Otherwise `400` / `413`. +- **5.** `content_type` ∈ closed enum for this protocol version. Otherwise `400`. +- **6.** `album_id` exists; authenticated user has server-visible write capability on it; album's pinned `protocol_version` equals the request's. Otherwise `403`. +- **7.** `created_by_device` is in the user's published device directory, and the directory entry's `added_at` precedes the request's `timestamp`. Otherwise `403`. +- **8.** `timestamp` passes a gross-drift **sanity** bound (default ±30 days of server clock, configurable). This is a non-security guard that surfaces a wildly-wrong honest client, **not** an authorization control — authorization and ordering ride the epoch and chain, and the server records its own trusted `received_at` as the authoritative time for time-based policy. The client `timestamp` is stored verbatim for audit. 
See [Keys — Write Authorization](/design/cryptography/keys/#write-authorization). Otherwise `400`. + +### On each `PATCH /upload/{id}` chunk + +- **9.** Offset is exactly the current received-byte count. Otherwise `409`, with `X-Capsule-Offset` returned. +- **10.** Non-final chunk size is a multiple of 4 KiB. Otherwise `400`. +- **11.** Cumulative received ≤ declared `size`. Otherwise `400` / `413`, session moves to `FailedProcessing`. +- **12.** The `(upload_id, offset, chunk_hash)` idempotency tuple is new OR matches an exact prior PATCH. Otherwise (same offset, different hash) `409` + corruption error. + +### At finalization + +- **13.** Total received == declared `size`. Otherwise `FailedProcessing`. +- **14.** Recomputed ciphertext hash == declared `hash`. Otherwise `FailedProcessing` + corruption error. +- **15.** Manifest envelope re-validated (rerun 1–8) inside the finalization transaction. + +### On non-upload writes (lifecycle action manifest, metadata-update, derivative-add/replace, trash-restore) + +- **16.** `action` is in the closed enum. Otherwise `400`. +- **17.** `prior_provenance_hash` equals the last accepted manifest's content hash for this `asset_id`. Otherwise `409` (stale-revival). +- **18.** `amk_version` is monotonic per album (never regresses) **and within the range the album's admin-signed MLS commit chain attests**. The server's stored counter is a structural backstop; the authoritative ceiling is MLS, so a server cannot fabricate a future epoch a client will honor — see [Write Authorization](/design/cryptography/keys/#write-authorization). Otherwise `400`. + +### On federation pull (server-to-server) + +- **19.** Capability token verifies under home server's signing key; `exp` in future; `jti` not in revocation list (cached ≤ 15 min). Otherwise `401` / `403`. +- **20.** All checks (1)–(18) re-applied — federation does not unlock looser rules. +- **21.** Per-peer rate budgets unbroken (events/hour, bytes/hour, CPU/hour). Otherwise `429`. 
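Because invariant 20 re-applies checks 1 to 18 on the federation path as well, it may help to see one group of them written out. The following is a minimal sketch of the per-chunk checks (invariants 9 to 12), in Python for brevity rather than in the project's own language. The `UploadSession` class, the `check_chunk` function, the `204` success status, the `X-Capsule-Error` header, and the way the final chunk is detected are all assumptions made for illustration; only the rejection statuses, `X-Capsule-Offset`, the 4 KiB alignment, and `FailedProcessing` come from the lists above. The replay test runs before the offset test so that an exact retry stays a no-op, which is one reading of how invariants 9 and 12 interact.

```python
from dataclasses import dataclass, field

CHUNK_ALIGNMENT = 4 * 1024  # invariant 10: non-final chunks are multiples of 4 KiB


@dataclass
class UploadSession:
    """Hypothetical in-memory stand-in for a server-side upload session."""
    declared_size: int
    received: int = 0
    state: str = "Uploading"  # assumed name; only FailedProcessing is from the doc
    seen: dict = field(default_factory=dict)  # offset -> chunk_hash of accepted PATCHes


def check_chunk(session, offset, chunk_len, chunk_hash):
    """Return (status, headers) for one PATCH, applying invariants 9 to 12."""
    # Invariant 12: an exact replay is a no-op; same offset with a different
    # hash is a corruption error.
    if offset in session.seen:
        if session.seen[offset] == chunk_hash:
            return 204, {"X-Capsule-Offset": str(session.received)}
        return 409, {"X-Capsule-Error": "corruption"}  # header name is assumed

    # Invariant 9: the offset must equal the current received-byte count.
    if offset != session.received:
        return 409, {"X-Capsule-Offset": str(session.received)}

    # Invariant 11: cumulative bytes may never exceed the declared size.
    if session.received + chunk_len > session.declared_size:
        session.state = "FailedProcessing"
        return 413, {}

    # Invariant 10: only the final chunk may be unaligned. "Final" is assumed
    # to mean "this chunk completes the declared size".
    is_final = session.received + chunk_len == session.declared_size
    if not is_final and chunk_len % CHUNK_ALIGNMENT != 0:
        return 400, {}

    session.seen[offset] = chunk_hash
    session.received += chunk_len
    return 204, {"X-Capsule-Offset": str(session.received)}
```

In this sketch a rejected chunk never mutates `received` or `seen`, so a client that re-aligns to the returned `X-Capsule-Offset` can resume without the server having recorded any partial state.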
+ +### On the `/sync` feed, directory publish, and federated reports + +- **22.** The `sync_cursor` carries a server MAC under a server-only key; a forged or mutated cursor is rejected (`400`). This is the authenticity layer; the client independently enforces per-album `sync_seq` monotonicity (client-side invariants below). Owner: [Import — Download & Sync](/design/import/download-sync/#discovering-what-changed). +- **23.** A published `DeviceDirectory` has `directory_version` **strictly greater** than the version currently stored for that user, and the master signature covers it. A non-advancing or regressing publish is rejected (`409`). Owner: [Cryptography — Device Directory](/design/cryptography/keys/#device-directory). +- **24.** A federated **report** (an out-of-band moderation message, not a state write) carries a valid signature from the reporting server and is within that peer's report rate budget; otherwise it is dropped before reaching the admin queue. Owner: [Moderation — Federated Reporting](/design/moderation/#federated-reporting). + +Every rejection is logged with a structured reason code; the rejected hash is remembered (bounded, see [Federation — Soft-Fail Semantics](/design/federation/#soft-fail-semantics)) so divergence between Capsule's view and a permissive peer's view is detectable. + +## Client-Side Validation Invariants + +Mirror checklist that every client implements before applying any received data — local or remote. A client that skips one of these is in the *faulty* class. + +- Run [`verify_asset`](/design/cryptography/keys/#write-authorization) on every received `AssetManifest`. Quarantine on failure; never silent-drop, never silent-accept. +- Reject an incoming `sidecar_schema` greater than the client's `max_known_sidecar_schema`. Refuse to write that sidecar; refuse to read in normal mode (read-only opt-in is allowed). +- Reject an incoming `protocol_version` outside `[Min, Max]` known to the client. 
The same handshake the server runs. +- Reject an unknown enum value for any field whose enum is closed at the current schema (notably `action`, `content_type`, `gps.source`, `DerivativeManifest.role`). Unknown CBOR map keys are preserved per [Postel's Law](/design/principles/#postels-law-asymmetric) and never executed. +- Maintain a local `latest_provenance_hash` per `asset_id`. Refuse to apply any manifest whose `prior_provenance_hash` is behind the local value. Surface it. +- Maintain a per-user `directory_version` high-water mark. Refuse a `DeviceDirectory` whose `directory_version` is below it (a server attempting to roll back a revocation or hide a device); pin and surface the regression. +- Reject an OR-set remove whose `add_id` was never observed locally as an add. +- Refuse to follow a `revoke_all_sessions` confirmation that did not include a master-key proof. +- Decode remote-origin asset bytes only in the [sandboxed decoder](/design/clients/#sandboxed-decoder). + +## Protocol and Capability Negotiation + +Every versioned API surface — client-to-server uploads, sync feed, federation pull, peering — runs the same compatibility gate. The gate is **fail-closed**: a mismatch is a hard reject before any state is written, never a silent degrade. 
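As a rough illustration of what "fail-closed" means for this gate, the sketch below checks a request against the two handshake values already named in invariants 1 and 2: the `YYYY-MM-DD` protocol version and the `crypto_suite_id`. It is written in Python for brevity and is not the project's code. The function name `protocol_gate`, its parameters, the example inventory contents, and the `400` returned for a malformed date are assumptions; the `426` for an out-of-range write and the `400` for an unknown suite follow the invariants above.

```python
from datetime import date


def protocol_gate(protocol, crypto_suite_id, *, is_write,
                  server_min, server_max, suite_inventory):
    """Return None if the request may proceed, else the HTTP status to reject with.

    One-shot check: nothing is negotiated and nothing is written before it passes.
    """
    try:
        requested = date.fromisoformat(protocol)  # protocol versions are YYYY-MM-DD
    except ValueError:
        return 400  # assumed status for a malformed version string

    if is_write:
        # Invariant 1: a write outside [Min, Max] is rejected with 426.
        if not (server_min <= requested <= server_max):
            return 426
        # Invariant 2: the suite must be a row of the primitives inventory.
        if crypto_suite_id not in suite_inventory:
            return 400

    # Reads are not gated on the version window in this sketch.
    return None
```

A caller would run this before opening a session or touching any row, for example `protocol_gate("2026-03-01", 1, is_write=True, server_min=date(2026, 1, 1), server_max=date(2026, 6, 1), suite_inventory={1})`, where the dates and the suite id `1` are made-up values.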
+ +### Universal Headers + +| Header | Sent by | Meaning | +| ---------------------------- | ------------------------- | ----------------------------------------------------------------------------------------------------- | +| `X-Capsule-Protocol` | client / peer | `YYYY-MM-DD` protocol version the request is written against | +| `X-Capsule-Crypto-Suite` | client / peer on writes | `u16` suite id from the [Primitives Inventory](/design/cryptography/primitives/#primitives-inventory) | +| `X-Capsule-Sidecar-Schema` | client on metadata-update | `u16` schema version declared at `sidecar_schema` field 0 | +| `X-Capsule-Protocol-Min` | server on every response | the lowest protocol version this server accepts | +| `X-Capsule-Protocol-Max` | server on every response | the highest protocol version this server accepts | +| `X-Capsule-Min-Client-Build` | server on responses | semver deprecation cutoff; advisory unless the path is hard-deprecated | + +### Fail-Closed Rules + +- `X-Capsule-Protocol` outside `[Min, Max]` on a **write**: `426 Upgrade Required`. No session created, no row written. +- `X-Capsule-Crypto-Suite` not in the inventory: `400 Bad Request`. +- `X-Capsule-Sidecar-Schema` above the server's max known: `400 Bad Request`. (The server does not parse sidecars itself, but it refuses to acknowledge writes whose schema number it does not index.) +- **Reads of any past version succeed.** Read invariants are deliberately stable per [Versioning](/design/versioning/), so a current server still serves v_{k-N} blobs from years ago. +- Federation capability is an additional `401` / `403` layer on top of the protocol gate. A valid token never substitutes for a valid protocol header. + +The handshake is **one-shot per request**, not a negotiation. Either both sides agree by inspection, or the request fails. There is no back-and-forth that could leak partial state. + +## Idempotency Invariants + +Every write surface has a single idempotency key. 
Duplicates are no-ops; conflicts (same key, different content) are corruption errors. + +| Surface | Idempotency key | Duplicate behavior | +| ----------------------------------- | ---------------------------------------------------------------------------------- | ------------------------------------------------- | +| Upload chunk (`PATCH /upload/{id}`) | `(upload_id, offset, chunk_hash)` | Returns current offset; no double-write | +| Session creation (`POST /upload`) | `(owner_id, hash, album_id)` — server's existing dedup check | Returns the existing session; no second session | +| Lifecycle manifest write | `(asset_id, prior_provenance_hash, manifest_hash)` | No-op append; chain advances exactly once | +| Metadata-update operation | Operation id (UUIDv7) + `(asset_id, prior_provenance_hash)` | Re-applying the same op is structurally identical | +| Federation capability proof | `(peer_id, jti)` | Refresh with same `jti` returns the same response | +| Federation pull | `(peer_id, sync_cursor)` — the sync cursor itself is the key | Re-pull returns the same page | +| MLS commit | Handled by OpenMLS; commits are ordered by the group's commit chain | OpenMLS rejects duplicates | +| Album upgrade ceremony | `intent_id` (UUIDv7); see [Versioning](/design/versioning/#album-upgrade-ceremony) | Same intent never produces two forks | + +A write surface that does not appear here is, by default, **not** idempotent and must be designed before it ships. + +## Atomicity Invariants + +Multi-write operations that must succeed-as-one or not at all. A partial success on any of these is itself a damage scenario. + +- **Asset bundle finalization.** The manifest, ciphertext blob, metadata blob, and provenance blob commit together in a single Postgres transaction. Server failure between any pair leaves the entire bundle un-finalized; the session moves to `FailedProcessing` and the partial blobs are GC'd. 
([Filesystem — Atomic Writes](/design/filesystem/maintenance/#atomic-writes-and-crash-recovery)) +- **Stack edits.** All affected sidecars stage as `.tmp` files first; renames happen together. Any rename failure discards every `.tmp` in the bundle. ([Filesystem — Atomic Writes](/design/filesystem/maintenance/#atomic-writes-and-crash-recovery)) +- **AMK epoch bump + write-tier key rotation.** A new AMK and a new write-tier key are minted as a single MLS commit. The two cannot exist out of sync. +- **Album upgrade ceremony.** The cutover is one MLS commit, the `AlbumTombstone`. Until applied, the client is in v_old; after, in v_new. ([Versioning — Album Upgrade Ceremony](/design/versioning/#album-upgrade-ceremony)) +- **Lifecycle manifest + provenance record.** Writing a lifecycle manifest and appending its provenance entry are the same act, because the provenance entry **is** the manifest plus the chain link. There is no separate "now record provenance" step that can race. diff --git a/capsule-docs/src/content/docs/design/thumbnails.md b/capsule-docs/src/content/docs/design/thumbnails.md index 701ee1f..f822faf 100644 --- a/capsule-docs/src/content/docs/design/thumbnails.md +++ b/capsule-docs/src/content/docs/design/thumbnails.md @@ -1,33 +1,38 @@ --- title: Thumbnails and Previews -description: How we generate and manage thumbnails and previews for media assets in Capsule +description: Format inventory, LQIP scheme, and derivative provenance for photo and video derivatives --- -We generate thumbnails and previews for all photos and videos. This doc is the **single source of truth** for the LQIP scheme and the thumbnail/preview formats — per the [single-source-of-truth rule](/design/principles/#single-source-of-truth), other docs reference these by link rather than restating the choice. +We generate thumbnails and previews for all photos and videos. 
This doc is the **single source of truth** for the LQIP scheme and the thumbnail/preview formats — per the [SSoT rule](/design/principles/#single-source-of-truth), other docs reference these by link rather than restating the choice. The format table is itself the contract: every receiver (and every federated peer) compares the `DerivativeManifest.format` value against this list, and an unknown value is a structural rejection. + +Derivative generation runs client-side in `capsule-sdk` (per-platform encoder libraries) over the shared format-detection and manifest-building logic in `capsule-core`. Server-side serving is `capsule-api-media` (it serves opaque ciphertext — never decodes). ## Thumbnail and Preview Formats -> **Status:** The format table below is **provisional**. The choice between AVIF and JXL as the primary still-image codec is pending field testing of decoder availability and quality-per-byte across Capsule's target devices in 2026. The single-source-of-truth structure means any later swap is a one-row edit here, propagated nowhere else — see [Single Source of Truth](/design/principles/#single-source-of-truth). - +**JPEG XL (JXL) is the committed primary** still-image codec — the highest-quality-per-byte master derivative. Because JXL decoder coverage is still uneven in 2026, every still tier is *also* generated in **AVIF** (with **WebP** as the last-resort fallback): a client that can decode JXL fetches it, and any other client is served the AVIF→WebP delivery variant and still renders. Because this doc is the SSoT, the codec choice is a one-row edit here that propagates nowhere else (see [SSoT](/design/principles/#single-source-of-truth)). + +:::note[JXL-primary is provisional] +The JXL-primary commitment is pending external validation of decoder availability and quality-per-byte across target devices — tracked in the [image-delivery-format demo](https://github.com/justin13888/image-delivery-format-demo). 
If that validation shows JXL coverage is insufficient, the primary reverts to AVIF — a one-row edit here that propagates nowhere else. +::: + -Three derivative tiers per photo asset and one preview tier for video assets: +Two derivative tiers per photo asset and one preview tier for video assets: -| Tier | Photo format | Video format | Notes | -| ------------------------------------------ | ----------------------------------------------------------- | ------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| **Thumbnail** (grid display) | **AVIF** (primary), WebP fallback for browsers without AVIF | First-frame AVIF still | AVIF q=50, 4:2:0 chroma, ~256 px long edge. | -| **Preview** (lightbox / single-asset view) | **AVIF** (primary), WebP fallback | **H.264 baseline** transcode at original resolution capped to 1080p | AVIF q=70 for stills; H.264 CRF 23 for video, 30 fps cap, AAC audio. | -| **Desktop-only optional cache** | **JXL** | (n/a) | JXL is generated only when the client is a desktop and the user opts in — best quality-per-byte but decoder support is still uneven in 2026. Never produced for shared/server-side derivatives. | +| Tier | Photo format | Video format | Notes | +| ------------------------------------------ | ------------------------------------------ | ------------------------------------------------------------------- | -------------------------------------------------- | +| **Thumbnail** (grid display) | **JXL** master; **AVIF**→**WebP** delivery | First-frame JXL/AVIF still | q=50, 4:2:0 chroma, ~256 px long edge. | +| **Preview** (lightbox / single-asset view) | **JXL** master; **AVIF**→**WebP** delivery | **H.264 baseline** transcode at original resolution capped to 1080p | Stills q=70; H.264 CRF 23, 30 fps cap, AAC audio. 
| -- **AVIF** is the primary because in 2026 it ships in every major browser and on every major OS (iOS 16+, Android 12+, Chrome/Firefox/Safari current). Hardware decode is widespread. -- **WebP** is the fallback for the rare client that lacks AVIF. We deliberately do not fall back to JPEG — WebP covers everything JPEG would. -- **JXL** is kept as a *desktop-only optional* tier rather than the primary because cross-platform decoder coverage is still patchy. It is purely a local-cache choice; remote/sharing paths never use JXL. -- **H.264 baseline** for video previews — universally decodable, cheap CPU/GPU cost on every platform. AV1 was considered but encode cost is still high on mobile in 2026. +- **JXL** is the committed primary: best quality-per-byte and an excellent archival master. Its only gap is decoder ubiquity, which the AVIF/WebP delivery variants cover until JXL coverage is universal. +- **AVIF** is the universal delivery format — in 2026 it ships in every major browser and OS (iOS 16+, Android 12+, current Chrome/Firefox/Safari) with widespread hardware decode — served to any client that cannot yet decode JXL. +- **WebP** is the last-resort fallback for the rare client lacking AVIF. We deliberately do not fall back to JPEG — WebP covers everything JPEG would. +- **H.264 baseline** for video previews — universally decodable, cheap to decode on every platform. AV1 was considered but mobile encode cost is still high in 2026. -If an original asset is lower-resolution than the highest thumbnail tier, the affected tier simply references the original instead of generating a redundant derivative. This is **distinct** from a missing derivative (an unintentional failure during generation) — the recovery-first principle treats missing derivatives as rebuildable from the original. +If an original asset is lower-resolution than the highest thumbnail tier, that tier references the original instead of generating a redundant derivative. 
This is **distinct** from a missing derivative (an unintentional generation failure): the tier's [`DerivativeManifest`](/design/cryptography/provenance/#derivative-provenance) carries the recognized sentinel `format = "original"` — an explicit, signed marker the receiver trusts — whereas a simply-absent derivative is treated as rebuildable from the original (recovery-first). ## LQIP -We use [chromahash](https://github.com/justin13888/chromahash) as a perceptual hash that decodes into a low-quality image placeholder. Chromahash was chosen for its color accuracy across color spaces and it was precisely developed for Capsule's particular needs. The hash is inlined into the encrypted CBOR metadata blob (see [Metadata Encryption](/design/cryptography/#metadata-encryption)), so it is available the instant metadata syncs, before any thumbnail fetch. +We use [chromahash](https://github.com/justin13888/chromahash) as a perceptual hash that decodes into a low-quality image placeholder, chosen for its color accuracy across color spaces and developed for Capsule's needs. The chromahash, its format version, and a `dominant_color` fallback are the [`lqip` field of the sidecar](/design/metadata/#sidecar-schema-v1) — inside the [encrypted metadata blob](/design/cryptography/encryption/#metadata-encryption), so the placeholder is available the instant metadata syncs, before any thumbnail fetch, and never leaks to the server. A decoder that does not recognize the chromahash format version falls back to the solid `dominant_color` fill rather than misrendering, so a future chromahash revision is a versioned change, never a silent break. Considered and rejected: ThumbHash (smaller wire size but worse color fidelity for the wide-gamut and HDR sources Capsule expects), BlurHash (older, blurrier, less color-accurate). The single-LQIP choice avoids exactly the kind of "chromahash/ThumbHash" hedge that previously caused doc drift. 
@@ -35,6 +40,17 @@ Considered and rejected: ThumbHash (smaller wire size but worse color fidelity f Thumbnails and previews are *ephemeral by recovery posture* (they can always be regenerated from the original) but not *unowned*. A buggy or hostile client could otherwise quietly replace a good thumbnail with a corrupted one, and the receiving side would have no way to tell. To prevent this, every thumbnail and preview is uploaded as a derivative whose addition or replacement is an authorized, signed lifecycle action. -The full derivative manifest structure and the `derivative-add` / `derivative-replace` action set are owned by [Cryptography — Derivative Provenance](/design/cryptography/#derivative-provenance) and [Authorization — The Closed Action Set](/design/authorization/#the-closed-action-set); this doc owns only the *format* of the derivative bytes. The two interact at exactly one point: the `DerivativeManifest.format` field names the codec/format from the table above, and the verifying side rejects a manifest whose `format` is not currently recognized (the closed-enum rule from [Threat Model — Schema Evolution](/design/threat-model/#schema-evolution-and-field-grammar)). +The full derivative manifest structure and the `derivative-add` / `derivative-replace` action set are owned by [Cryptography — Derivative Provenance](/design/cryptography/provenance/#derivative-provenance) and [Authorization — The Closed Action Set](/design/authorization/#the-closed-action-set); this doc owns only the *format* of the derivative bytes. The two interact at exactly one point: the `DerivativeManifest.format` field names the codec/format from the table above, and the verifying side rejects a manifest whose `format` is not currently recognized (the closed-enum rule from [Threat Model — Schema Rules](/design/threat-model/schema-rules/#schema-evolution-and-field-grammar)). 
A thumbnail whose `DerivativeManifest` fails verification is **regenerated locally from the original** rather than trusted — the [recovery-first principle](/design/principles/) means a derivative is always rebuildable, so refusal-and-regenerate is the safe default. The corrupt copy is discarded (not quarantined — it carries no irreplaceable bytes), and the corresponding regeneration appends a new `derivative-replace` provenance record. + +## Validation + +- **Format detection (unit).** Encode a derivative under each row of the format table; assert the format is correctly identified by the consumer (browser tier, native client tier). Negative: provide a malformed AVIF; assert structural rejection. +- **Closed-format enum (unit).** Submit a `DerivativeManifest` with `format = "image/future-codec"`; assert rejection at the envelope check. +- **JXL-to-AVIF delivery fallback (unit).** Simulate a consumer without a JXL decoder; assert it selects the AVIF variant (and a consumer without AVIF selects WebP), never failing to render a tier that exists. +- **LQIP round-trip (unit).** Generate chromahash for a fixture image; assert decoded LQIP matches expected pixel buffer within quality tolerance, and that an unrecognized chromahash format version falls back to `dominant_color`. +- **Derivative-manifest verification (smoke).** Upload a derivative; corrupt the bytes; refetch; assert the receiver discards and regenerates from the original; assert a new `derivative-replace` provenance record is appended. +- **Original-fallback (unit).** Provide an original smaller than the highest thumbnail tier; assert that tier's manifest carries `format = "original"` rather than generating a redundant derivative. + +The cross-module case — derivative generation → upload → fetch → display — is covered by the upload+sync E2E case in [Module Map](/design/module-map/#e2e-test-surface). 
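The closed-format rule and the JXL to AVIF to WebP delivery fallback described in this doc can be sketched in a few lines. This is an illustrative Python sketch, not the `capsule-sdk` or `capsule-core` implementation. The MIME-style strings for the three still codecs are an assumption about how `DerivativeManifest.format` spells them; only `"original"` and the rejected `"image/future-codec"` value appear in the text above. The function names are hypothetical, and video formats are left out.

```python
# Preference order for still-image delivery: JXL master first, then AVIF, then WebP.
DELIVERY_ORDER = ("image/jxl", "image/avif", "image/webp")  # assumed spellings
ORIGINAL_SENTINEL = "original"  # tier references the original asset

# Closed enum for still derivatives in this sketch.
RECOGNIZED_STILL_FORMATS = frozenset(DELIVERY_ORDER) | {ORIGINAL_SENTINEL}


def check_format(fmt):
    """Closed-enum rule: an unrecognized format is a structural rejection."""
    if fmt not in RECOGNIZED_STILL_FORMATS:
        raise ValueError(f"unrecognized derivative format: {fmt!r}")
    return fmt


def select_variant(available, decodable):
    """Pick the best variant that exists for the tier and that the client can decode.

    Returns None when nothing matches, in which case the caller would fall
    back to regenerating from the original.
    """
    for fmt in DELIVERY_ORDER:
        if fmt in available and fmt in decodable:
            return fmt
    return None
```

For a client without a JXL decoder, `select_variant({"image/jxl", "image/avif", "image/webp"}, {"image/avif", "image/webp"})` picks the AVIF variant, which is the behavior the JXL-to-AVIF fallback test case asserts.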
diff --git a/capsule-docs/src/content/docs/design/versioning.md b/capsule-docs/src/content/docs/design/versioning.md index 4ad3772..bc0566f 100644 --- a/capsule-docs/src/content/docs/design/versioning.md +++ b/capsule-docs/src/content/docs/design/versioning.md @@ -1,21 +1,38 @@ --- title: Versioning -description: Handling versioning gracefully +description: How Capsule pins each album to a protocol version, upgrades safely, and bounds client deprecation --- -Changes are inevitable. Capsule minimizes breaking changes but generously accepts compatible ones. The aim is backward-compatible reads forever and a deliberately fail-closed write path — a [version-mismatched client](/design/threat-model/) never silently corrupts state, it is rejected at the handshake. +Changes are inevitable. Capsule minimizes breaking changes but generously accepts compatible ones. The aim is backward-compatible reads forever and a deliberately fail-closed write path — a [version-mismatched client](/design/threat-model/) never silently corrupts state; it is rejected at the handshake. -Versioning happens on multiple layers: +The enforcement is cross-cutting: every wire request, every album commit, and every sidecar carries a version identifier. The header set below is the **contract** that lets two implementations agree (or fail-closed) without negotiating. Album pinning is implemented in the album metadata model (`capsule-api` + `capsule-core`); the upgrade ceremony is an MLS application-layer flow in `capsule-core::crypto::mls` driven by client UI. The min-supported-client window is enforced server-side in `capsule-api`. + +## Versioned Surfaces + +Versioning happens on multiple layers, each owned by the doc that defines it: - **Metadata CBOR schema** — `sidecar_schema` field 0 of every sidecar (see [Metadata — Schema Versioning Rules](/design/metadata/#schema-versioning-rules)). 
-- **Cryptographic primitive bundle** — `crypto_suite_id` on every manifest and metadata blob (see [Cryptography — Versioning Identifiers](/design/cryptography/#versioning-identifiers)). -- **Wire protocol** — `protocol_version` (date-based, `YYYY-MM-DD`) on every API request and album pin. See [Threat Model — Protocol and Capability Negotiation](/design/threat-model/#protocol-and-capability-negotiation) for the universal handshake. +- **Cryptographic primitive bundle** — `crypto_suite_id` on every manifest and metadata blob (see [Cryptography — Versioning Identifiers](/design/cryptography/primitives/#versioning-identifiers)). +- **Wire protocol** — `protocol_version` (date-based, `YYYY-MM-DD`) on every API request and album pin. See [Threat Model — Protocol Negotiation](/design/threat-model/validation/#protocol-and-capability-negotiation) for the universal handshake. - **Client cache** — internal and rebuildable; cache schema changes drop and rebuild rather than migrate. -- **Server data structures** — PostgreSQL schema migrations forward-only. The session-state store is a deployment choice, not a versioned API surface: by default `upload_sessions` lives in PostgreSQL, and high-concurrency deployments may relocate it to Valkey for hot-path performance only. The wire protocol is identical in both cases (see [Filesystem — Stores by Deployment Profile](/design/filesystem/#stores-by-deployment-profile)). +- **Server data structures** — PostgreSQL schema migrations forward-only. The session-state store is a deployment choice, not a versioned API surface (see [Filesystem — Server: Deployment Profiles](/design/filesystem/server/#deployment-profiles)). + +## Negotiation Headers + +The contract for version compatibility — every API request and response carries these. The full fail-closed rule set is owned by [Threat Model — Protocol and Capability Negotiation](/design/threat-model/validation/#protocol-and-capability-negotiation). 
+ +| Header | Sent by | Meaning | +| ---------------------------- | ------------------------- | ----------------------------------------------------------------------------------------------------- | +| `X-Capsule-Protocol` | client / peer | `YYYY-MM-DD` protocol version the request is written against | +| `X-Capsule-Crypto-Suite` | client / peer on writes | `u16` suite id from the [Primitives Inventory](/design/cryptography/primitives/#primitives-inventory) | +| `X-Capsule-Sidecar-Schema` | client on metadata-update | `u16` schema version declared at `sidecar_schema` field 0 | +| `X-Capsule-Protocol-Min` | server on every response | the lowest protocol version this server accepts | +| `X-Capsule-Protocol-Max` | server on every response | the highest protocol version this server accepts | +| `X-Capsule-Min-Client-Build` | server on responses | semver deprecation cutoff; advisory unless the path is hard-deprecated | ## Compatibility Verification -Initial startups of a client and server always strictly check for version compatibility and **crash early** rather than soft-degrade. The single handshake in [Threat Model — Protocol and Capability Negotiation](/design/threat-model/#protocol-and-capability-negotiation) is the only point at which compatibility is determined; once an operation is past the handshake, both sides know they agree on `protocol_version`, `crypto_suite_id`, and `sidecar_schema`. +Initial startups of a client and server always strictly check for version compatibility and **crash early** rather than soft-degrade. The single handshake in [Threat Model — Protocol and Capability Negotiation](/design/threat-model/validation/#protocol-and-capability-negotiation) is the only point at which compatibility is determined; once an operation is past the handshake, both sides know they agree on `protocol_version`, `crypto_suite_id`, and `sidecar_schema`. Capsule does **not** support backwards migrations or version downgrades. 
Server-side schema migrations are forward-only; if a migration fails, the server refuses to start and the operator restores from backup. There is no "rollback then continue" — that path is what corrupts data. @@ -42,7 +59,7 @@ A version-pinned album is upgraded by a **tombstone-plus-fork** ceremony: the ol ### Steps -1. **Freeze proposal.** An album admin issues an MLS application message `UpgradeIntent { from_version, to_version, intent_id, proposer_device, deadline }`, hybrid-signed by the admin's [DSK](/design/cryptography/#device-keys). The proposal carries a deadline (default 7 days). Any member's client receiving an `UpgradeIntent` for an album that is already in upgrade quiescence under a *different* `intent_id` rejects the new proposal — only one upgrade can be in flight per album. +1. **Freeze proposal.** An album admin issues an MLS application message `UpgradeIntent { from_version, to_version, intent_id, proposer_device, deadline }`, hybrid-signed by the admin's [DSK](/design/cryptography/keys/#device-keys). The proposal carries a deadline (default 7 days). Any member's client receiving an `UpgradeIntent` for an album that is already in upgrade quiescence under a *different* `intent_id` rejects the new proposal — only one upgrade can be in flight per album. 2. **Quiesce writes.** Members enter upgrade quiescence on receipt of `UpgradeIntent`: - In-flight uploads against the album are allowed to reach a terminal state. - New writes are queued **locally** with a `pending_until_upgrade` flag and the `intent_id`; they are not sent to the server. 
@@ -56,18 +73,28 @@ A version-pinned album is upgraded by a **tombstone-plus-fork** ceremony: the ol ### What This Defends Against -- **Version-mismatched-client damage.** A v_old client cannot write into a v_new album because every write carries `protocol_version`, which is rejected by the [protocol handshake](/design/threat-model/#protocol-and-capability-negotiation) and the [server-side validation invariants](/design/threat-model/#server-side-validation-invariants). +- **Version-mismatched-client damage.** A v_old client cannot write into a v_new album because every write carries `protocol_version`, which is rejected by the [protocol handshake](/design/threat-model/validation/#protocol-and-capability-negotiation) and the [server-side validation invariants](/design/threat-model/validation/#server-side-validation-invariants). - **Partial-upgrade corruption.** Quiescence + drain ensures no v_old write is mid-flight at the moment of cutover. The `intent_id` keys every step so a retried, duplicated, or contradictory proposal cannot produce two divergent v_new albums. - **Hostile member sabotage.** A member whose computed `frozen_state_hash` differs from the proposer's rejects the tombstone, aborting the upgrade. A malicious member cannot trick the rest into a forged "post-upgrade" state. -The full atomicity rule lives in [Threat Model — Atomicity Invariants](/design/threat-model/#atomicity-invariants); stranded `pending_until_upgrade` writes are a [quarantine surface](/design/threat-model/#quarantine-surfaces). +The full atomicity rule lives in [Threat Model — Atomicity Invariants](/design/threat-model/validation/#atomicity-invariants); stranded `pending_until_upgrade` writes are a [quarantine surface](/design/threat-model/scenarios/#quarantine-surfaces). ## Min-Supported-Client Window -The server accepts a *window* of past `protocol_version` values, not only the newest, so a staggered client rollout keeps working. 
A version leaves the window only after a deprecation period; the policy is owned by [Threat Model — Min-Supported-Client Deprecation Policy](/design/threat-model/#min-supported-client-deprecation-policy). +The server accepts a *window* of past `protocol_version` values, not only the newest, so a staggered client rollout keeps working. A version leaves the window only after a deprecation period; the policy is owned by [Threat Model — Min-Supported-Client Deprecation Policy](/design/threat-model/schema-rules/#min-supported-client-deprecation-policy). The interaction with album pinning: - A client whose `protocol_version` falls below the server's `Min` is rejected at the handshake for *any* write — it cannot upload into any album, including ones pinned to the version it can still parse. - A client whose `protocol_version` falls below an album's pin is rejected for writes to *that album* — the album's pin is a per-album minimum, often higher than the server's minimum (e.g., a v_2024-09-01 album rejects v_2024-06-01 clients even on a server that still accepts v_2024-06-01 for other albums). - **Reads are unaffected.** A v_old client can always *read* an album it cannot write to. The deprecation policy never makes historical state unreadable. + +## Validation + +- **Handshake fail-closed (unit, both sides).** Client-side: send a request with `X-Capsule-Protocol` outside the server-advertised range; assert refusal and structured error surfacing in the UI. Server-side: receive such a request; assert `426` response with the supported range in headers. +- **Album pin immutability (unit).** Attempt to write into an album with a `protocol_version` other than the pin; assert rejection at the server envelope. +- **Upgrade ceremony idempotency (smoke).** Run the 8-step ceremony against a multi-member testcontainer setup. Inject a crash after step 4 (the tombstone commit); resume; assert the same `intent_id` produces no second fork. 
Inject a divergent member state before step 4; assert the abort path triggers cleanly. +- **Stranded write queue (smoke).** During quiescence, a member writes; the write is queued locally; the upgrade completes; the queued write is re-encoded against v_new and replayed. Assert no write is lost. +- **Deprecation cutoff (unit).** Mock the clock past the cutoff date; assert a request from a now-deprecated client returns `426` and the well-known announcement is served. + +The cross-module case (the full upgrade ceremony exercised through a real client UI + server + MLS group) is one bounded E2E test in [Module Map](/design/module-map/#e2e-test-surface).
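The interaction between the server window and the album pin described under "Min-Supported-Client Window" can be sketched as a single decision function. This is a minimal illustration, not code from `capsule-api`: the names `Decision` and `check_request` are hypothetical, and it assumes that zero-padded `YYYY-MM-DD` strings can be compared lexicographically, that reads are always allowed, and that a client at or above the pin may write. The pin-immutability bullet in the validation list suggests the real rule may require an exact match with the pin.

```rust
/// Outcome of the version check for one request (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    /// A write from a client outside the server's advertised Min..=Max window.
    RejectOutsideServerWindow,
    /// A write from a client inside the server window but below the album's pin.
    RejectBelowAlbumPin,
    /// Some version string was not shaped like zero-padded `YYYY-MM-DD`.
    RejectMalformed,
}

/// Shape check only: ten bytes, digits with `-` at positions 4 and 7.
/// It does not verify that the date exists in the calendar.
fn is_date_shaped(v: &str) -> bool {
    let b = v.as_bytes();
    b.len() == 10
        && b.iter().enumerate().all(|(i, c)| match i {
            4 | 7 => *c == b'-',
            _ => c.is_ascii_digit(),
        })
}

/// Decide whether a request may proceed.
///
/// `client` is the value of `X-Capsule-Protocol`; `server_min` and
/// `server_max` mirror `X-Capsule-Protocol-Min` / `-Max`; `album_pin` is the
/// target album's pinned version, if the request targets an album.
pub fn check_request(
    client: &str,
    server_min: &str,
    server_max: &str,
    album_pin: Option<&str>,
    is_write: bool,
) -> Decision {
    let pin_ok = album_pin.map_or(true, is_date_shaped);
    if !(is_date_shaped(client)
        && is_date_shaped(server_min)
        && is_date_shaped(server_max)
        && pin_ok)
    {
        return Decision::RejectMalformed;
    }
    // Assumption: well-formed reads are never rejected on version grounds.
    if !is_write {
        return Decision::Allow;
    }
    // Zero-padded ISO dates sort lexicographically in chronological order.
    if client < server_min || client > server_max {
        return Decision::RejectOutsideServerWindow;
    }
    match album_pin {
        // The pin acts as a per-album minimum on top of the server minimum.
        Some(pin) if client < pin => Decision::RejectBelowAlbumPin,
        _ => Decision::Allow,
    }
}
```

With a server window of `2024-06-01` to `2024-12-01`, a `2024-06-01` client writing into an album pinned to `2024-09-01` gets `RejectBelowAlbumPin`, while the same client reading that album gets `Allow`, which matches the example in the text.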