Section: Core Specification Version: 0.1
Codex documents form a cryptographic chain through content-addressable hashing and lineage pointers. This enables:
- Tamper-evident document history
- Verifiable ancestry ("this document descended from that one")
- Block-level provenance proofs
- Partial disclosure with integrity guarantees
- Decentralized verification without central authority
Each Codex document's identity IS its content hash. When a document references its parent by hash, it creates an immutable link:
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Document v1 │ │ Document v2 │ │ Document v3 │
│ │ │ │ │ │
│ id: sha256:aaa │◄────│ id: sha256:bbb │◄────│ id: sha256:ccc │
│ parent: null │ │ parent: aaa │ │ parent: bbb │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Properties:
- Each document commits to its entire ancestry
- Inserting a forged intermediate document is computationally infeasible
- No external infrastructure required — documents ARE the chain
- Verification requires only the documents themselves
Beyond document-level hashing, individual content blocks have their own hashes. This enables:
- Proving a specific block existed in a document
- Selective disclosure (reveal one section, prove it's authentic)
- Efficient change detection (which blocks changed?)
- Redaction proofs (prove what was removed)
The provenance lineage extends the manifest's summary lineage (which contains parent, version, branch, note) with additional fields for the full ancestor chain, depth, and merge history. The manifest provides quick access to the immediate parent and version; the provenance record provides the complete lineage for verification and auditing.
The provenance lineage:
{
"lineage": {
"parent": "sha256:abc123...",
"ancestors": [
"sha256:abc123...",
"sha256:def456...",
"sha256:789ghi..."
],
"version": 4,
"depth": 4
}
}| Field | Type | Required | Description |
|---|---|---|---|
parent |
string | No | Immediate parent document hash |
ancestors |
array | No | Chain of ancestor hashes (nearest first) |
version |
integer | No | Sequential version number |
depth |
integer | No | Distance from root document |
The ancestors array provides redundancy and fast chain verification:
{
"ancestors": [
"sha256:parent...",
"sha256:grandparent...",
"sha256:greatgrandparent...",
"sha256:root..."
]
}Rules:
- First element MUST equal
parent - Array SHOULD include all ancestors up to reasonable depth (recommended: 10)
- For chains longer than stored depth, final element represents oldest known
- Empty array or omitted field indicates root document
To verify a document's lineage:
1. Compute document's content hash
2. Verify it matches claimed ID
3. If parent exists:
a. Retrieve parent document
b. Verify parent's hash matches lineage.parent
c. Verify this document's hash appears in parent's future chain (if tracked)
d. Recursively verify parent's lineage
4. Chain is valid if all links verify
Multiple documents may share the same parent (branching):
┌─── sha256:branch-a (parent: root)
sha256:root┤
└─── sha256:branch-b (parent: root)
The lineage can track branch information:
{
"lineage": {
"parent": "sha256:root...",
"branch": "feature-x",
"mergedFrom": ["sha256:branch-b..."]
}
}| Field | Type | Description |
|---|---|---|
branch |
string | Branch identifier |
mergedFrom |
array | Document hashes merged into this version |
Content blocks form a Merkle tree, enabling efficient proofs:
┌─────────────────┐
│ Root Hash │
│ (in manifest) │
└────────┬────────┘
│
┌──────────────┴──────────────┐
│ │
┌──────┴──────┐ ┌──────┴──────┐
│ Hash(L+R) │ │ Hash(L+R) │
└──────┬──────┘ └──────┬──────┘
│ │
┌──────┴──────┐ ┌──────┴──────┐
│ │ │ │
┌───┴───┐ ┌───┴───┐ ┌───┴───┐ ┌───┴───┐
│Block 1│ │Block 2│ │Block 3│ │Block 4│
└───────┘ └───────┘ └───────┘ └───────┘
Each block's hash is computed from its canonical JSON representation:
BlockHash = SHA256(CanonicalJSON(block))
Where canonical JSON follows RFC 8785 (JCS):
- Keys sorted alphabetically
- No whitespace
- Deterministic number formatting
1. Compute hash of each content block
2. If odd number of blocks, duplicate last hash
3. Pair hashes and compute parent: Hash(left + right)
4. Repeat until single root hash remains
5. Store root hash in manifest
{
"content": {
"path": "content/document.json",
"hash": "sha256:contenthash...",
"merkleRoot": "sha256:merkleroot...",
"blockCount": 42
}
}For efficient proof generation, store block hashes:
Location: content/block-index.json
{
"version": "0.1",
"algorithm": "sha256",
"root": "sha256:merkleroot...",
"blocks": [
{
"id": "block-1",
"hash": "sha256:blockhash1...",
"index": 0
},
{
"id": "block-2",
"hash": "sha256:blockhash2...",
"index": 1
}
]
}Prove a block exists in a document without revealing other blocks:
{
"proof": {
"type": "inclusion",
"documentId": "sha256:docid...",
"merkleRoot": "sha256:root...",
"block": {
"id": "block-5",
"hash": "sha256:blockhash...",
"index": 4
},
"path": [
{ "position": "right", "hash": "sha256:sibling1..." },
{ "position": "left", "hash": "sha256:sibling2..." },
{ "position": "right", "hash": "sha256:sibling3..." }
]
}
}1. Start with block hash
2. For each path element:
a. If position is "left": hash = Hash(element.hash + hash)
b. If position is "right": hash = Hash(hash + element.hash)
3. Final hash must equal merkleRoot
4. merkleRoot must match document's manifest
Prove a block does NOT exist (useful for redaction verification):
{
"proof": {
"type": "exclusion",
"documentId": "sha256:docid...",
"merkleRoot": "sha256:root...",
"removedBlockHash": "sha256:removed...",
"adjacentBlocks": [
{ "index": 3, "hash": "sha256:before..." },
{ "index": 5, "hash": "sha256:after..." }
],
"path": [...]
}
}When content is redacted, prove the relationship between original and redacted:
{
"redaction": {
"originalDocument": "sha256:original...",
"redactedDocument": "sha256:redacted...",
"removedBlocks": [
{
"id": "confidential-section",
"originalIndex": 5,
"proof": { /* inclusion proof in original */ }
}
],
"retainedBlocks": [
{
"id": "public-section",
"proof": { /* inclusion proof in both */ }
}
]
}
}While the hash chain proves relative ordering (A before B), timestamp anchoring proves absolute time ("hash H existed at time T").
Traditional trusted timestamp authorities:
{
"timestamps": [
{
"type": "rfc3161",
"authority": "https://timestamp.digicert.com",
"time": "2025-01-15T10:00:00Z",
"hash": "sha256:docid...",
"token": "MIIEpgYJKoZI..."
}
]
}Decentralized timestamping via public blockchains:
{
"timestamps": [
{
"type": "blockchain",
"chain": "bitcoin",
"blockHeight": 850000,
"blockHash": "00000000000000000002a7c4...",
"txId": "abc123...",
"merkleProof": [...],
"time": "2025-01-15T10:05:00Z",
"hash": "sha256:docid..."
}
]
}Supported chains:
bitcoin— Most secure, ~10 minute blocksethereum— ~12 second blocks
Note: This only stores a hash on-chain, not the document. The blockchain provides timestamping, not storage.
For efficiency, multiple document hashes can be combined:
{
"timestamps": [
{
"type": "aggregated",
"provider": "opentimestamps.org",
"calendar": "https://alice.btc.calendar.opentimestamps.org",
"merkleRoot": "sha256:aggregateroot...",
"proof": {
"documentHash": "sha256:docid...",
"path": [...]
},
"anchor": {
"chain": "bitcoin",
"blockHeight": 850000,
"txId": "abc123..."
}
}
]
}This allows timestamping many documents in one blockchain transaction.
Documents can reference other documents by hash:
{
"type": "reference",
"documentId": "sha256:otherdoc...",
"blockId": "section-3",
"description": "See related analysis"
}This creates a verifiable link — the referenced document cannot change without breaking the reference.
Academic/legal citations can form their own provenance chain:
{
"citations": [
{
"id": "cite-1",
"target": {
"documentId": "sha256:sourcedoc...",
"blockId": "theorem-1"
},
"context": "As proven in [1]..."
}
]
}Location: provenance/record.json
{
"version": "0.1",
"documentId": "sha256:current...",
"created": "2025-01-15T10:00:00Z",
"creator": {
"name": "Jane Doe",
"identifier": "did:web:example.com:jane"
},
"lineage": {
"parent": "sha256:parent...",
"ancestors": ["sha256:parent...", "sha256:grandparent..."],
"depth": 3
},
"merkle": {
"root": "sha256:merkleroot...",
"blockCount": 42,
"algorithm": "sha256"
},
"timestamps": [...],
"derivedFrom": [
{
"documentId": "sha256:source1...",
"relationship": "excerpt",
"blocks": ["block-5", "block-6"]
}
]
}The provenance record enables queries like:
- "Show me all ancestors of this document"
- "When was this document first timestamped?"
- "Prove block X existed in document Y"
- "What documents cite this one?" (requires external index)
- Merkle tree construction: O(n) where n = number of blocks
- Proof generation: O(log n)
- Proof verification: O(log n)
- Chain traversal: O(depth) — store ancestors to avoid repeated fetches
- Block index adds ~50-100 bytes per block
- Merkle proofs are O(log n) hashes (~32 bytes each for SHA-256)
- Ancestor chain capped at reasonable depth (10-20) to limit growth
- Merkle tree can be computed on-demand
- Block index can be generated when needed
- Only root hash is required in manifest
The chain's security depends on hash collision resistance:
- SHA-256: ~128 bits security (quantum), ~256 bits (classical)
- Recommend migration path if hash weaknesses discovered
- RFC 3161: Trust depends on TSA integrity
- Blockchain: Trust depends on chain security (Bitcoin = very high)
- Aggregated: Trust depends on aggregator + anchor chain
If intermediate documents are unavailable:
- Chain verification stops at gap
- Document is valid but ancestry is partial
- Consider archiving full chains for important documents
{
"codex": "0.1",
"id": "sha256:abc123...",
"state": "frozen",
"lineage": {
"parent": "sha256:parent...",
"version": 3
}
}{
"version": "0.1",
"documentId": "sha256:3a7bd3e2360a3d29eea436fcfb7e44c735d117c42d1c1835420b6b9942dd4f1b",
"created": "2025-01-15T10:00:00Z",
"creator": {
"name": "Research Team",
"identifier": "did:web:university.edu:research"
},
"lineage": {
"parent": "sha256:v2hash...",
"ancestors": [
"sha256:v2hash...",
"sha256:v1hash...",
"sha256:originalhash..."
],
"depth": 4,
"branch": "main"
},
"merkle": {
"root": "sha256:merkleroot...",
"blockCount": 127,
"algorithm": "sha256"
},
"timestamps": [
{
"type": "rfc3161",
"authority": "https://timestamp.university.edu",
"time": "2025-01-15T10:00:05Z",
"token": "..."
},
{
"type": "blockchain",
"chain": "bitcoin",
"blockHeight": 880000,
"time": "2025-01-15T10:15:00Z",
"txId": "..."
}
]
}{
"proof": {
"type": "inclusion",
"documentId": "sha256:3a7bd3e2...",
"merkleRoot": "sha256:merkleroot...",
"block": {
"id": "conclusion",
"hash": "sha256:blockhash...",
"index": 42
},
"path": [
{ "position": "right", "hash": "sha256:h1..." },
{ "position": "left", "hash": "sha256:h2..." },
{ "position": "right", "hash": "sha256:h3..." },
{ "position": "left", "hash": "sha256:h4..." },
{ "position": "right", "hash": "sha256:h5..." },
{ "position": "left", "hash": "sha256:h6..." },
{ "position": "right", "hash": "sha256:h7..." }
]
}
}Verification: Starting with block hash, apply path to reach root.