feat(contract): ContentStore — content-addressed cold text store (D-CC-ARI-3) #581

Merged
AdaWorldAPI merged 3 commits into main from claude/content-store-contract-draft
Jun 21, 2026

@AdaWorldAPI

Owner

What

The content-addressed cold text/blob store contract — the gating dependency for the AriGraph/OSINT episodic arc (D-CC-ARI-3). Zero-dep typed surface in lance-graph-contract:

  • ContentId(u64) = hash::fnv1a of the bytes — stable across versions/platforms (the correct content address; DefaultHasher must never key one; 0 = sentinel). Identical bytes ⇒ identical id ⇒ dedup.
  • SourceSpan{ContentId,u32,u32} = the fixed-size, Copy typed form of template-equivalence's (source_id,start,end) provenance. is_cited() = the gate's "no source span → no claim" predicate.
  • ContentStore (cold read: resolve(id) -> Option<&[u8]> zero-copy slice into the mmap/backing store; resolve_span/contains defaulted) + ContentSink (idempotent put -> ContentId, dedup by content-address — many episodes → one source row).
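The first two bullets can be sketched roughly as follows. This is an illustration, not the code from `content_store.rs`: it assumes `hash::fnv1a` is the standard 64-bit FNV-1a, and the names `ContentId::of`, `ContentId::SENTINEL`, the `source` field, and the exact clamping done by `new()` are hypothetical. Only the `start` and `end` fields, the `0` sentinel, and the existence of `new()`, `is_empty()`, and `is_cited()` come from the PR text.

```rust
/// 64-bit FNV-1a with the standard published offset basis and prime.
/// Assumption: the contract's `hash::fnv1a` is equivalent to this.
pub fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Fixed-size content address: identical bytes give an identical id.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ContentId(pub u64);

impl ContentId {
    /// 0 is reserved as "no content" (hypothetical constant name).
    pub const SENTINEL: ContentId = ContentId(0);

    /// Hypothetical constructor: address the given bytes by their hash.
    pub fn of(bytes: &[u8]) -> Self {
        ContentId(fnv1a(bytes))
    }
}

/// Typed (source_id, start, end) provenance; 16 bytes and `Copy`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SourceSpan {
    pub source: ContentId, // field name is an assumption
    pub start: u32,
    pub end: u32,
}

impl SourceSpan {
    /// Assumed clamp: never construct a span with `end < start`.
    pub fn new(source: ContentId, start: u32, end: u32) -> Self {
        SourceSpan { source, start, end: end.max(start) }
    }

    pub fn is_empty(self) -> bool {
        self.end <= self.start
    }

    /// "No source span, no claim": needs a real source and a non-empty range.
    pub fn is_cited(self) -> bool {
        self.source != ContentId::SENTINEL && !self.is_empty()
    }
}
```

Because the id is a pure function of the bytes, two episodes that quote the same source compute the same `ContentId` without coordinating, which is what makes dedup by content address possible.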

Why this shape

Encodes the three rules from the design discussion:

  1. The join key IS the identity — nothing variable-length enters the 512 B node; it carries only a fixed-size ContentId (a value tenant), the text lives in a columnar table and joins by id.
  2. Content-address, not raw GUID — shared OSINT sources dedup.
  3. Hot/cold firewall (ADR-022) — the hot path (SIMD sweep, AriGraph edge traversal) touches only ContentId/SourceSpan; bytes hydrate cold at the membrane (the fingerprint is the hot-path stand-in for text). resolve is never called during computation.
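A minimal sketch of how the three rules could look at the trait level, assuming an in-memory backing map in place of the mmap/columnar store. `MemStore` and `row_count` are hypothetical, and the real `resolve_span` may return a `Result` rather than an `Option` (the PR mentions OOB/missing errors); only the method names `resolve`, `resolve_span`, `contains`, and `put` come from the description.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ContentId(pub u64);

/// Standard 64-bit FNV-1a (assumed equivalent to the contract's hash).
fn fnv1a(bytes: &[u8]) -> u64 {
    bytes.iter().fold(0xcbf2_9ce4_8422_2325u64, |h, &b| {
        (h ^ u64::from(b)).wrapping_mul(0x0000_0100_0000_01b3)
    })
}

/// Cold read side: borrows bytes out of the backing store, no copy.
pub trait ContentStore {
    fn resolve(&self, id: ContentId) -> Option<&[u8]>;

    /// Defaulted in terms of `resolve`.
    fn contains(&self, id: ContentId) -> bool {
        self.resolve(id).is_some()
    }

    /// Defaulted: `None` if the id is missing or the range is out of bounds.
    fn resolve_span(&self, id: ContentId, start: u32, end: u32) -> Option<&[u8]> {
        self.resolve(id)?.get(start as usize..end as usize)
    }
}

/// Write side: putting the same bytes twice yields the same id and one row.
pub trait ContentSink {
    fn put(&mut self, bytes: &[u8]) -> ContentId;
}

/// Hypothetical in-memory store, standing in for the cold table.
#[derive(Default)]
pub struct MemStore {
    rows: HashMap<ContentId, Vec<u8>>,
}

impl MemStore {
    pub fn row_count(&self) -> usize {
        self.rows.len()
    }
}

impl ContentStore for MemStore {
    fn resolve(&self, id: ContentId) -> Option<&[u8]> {
        self.rows.get(&id).map(Vec::as_slice)
    }
}

impl ContentSink for MemStore {
    fn put(&mut self, bytes: &[u8]) -> ContentId {
        let id = ContentId(fnv1a(bytes));
        // Idempotent: an existing row for this address is left untouched.
        self.rows.entry(id).or_insert_with(|| bytes.to_vec());
        id
    }
}
```

In this shape the hot path would only ever copy `ContentId` values around; `resolve` is the single place where variable-length bytes appear, so it can be kept out of the compute loop by construction.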

Scope

Additive, zero-dep; 6 tests (stable/dedup, idempotent put, resolve_span slice, OOB/missing errors, uncited-rejected); clippy clean. Board hygiene: LATEST_STATE.md Contract Inventory entry in the same commit.

Consumers: rs-graph-llm/episodic-arc-task (replaces its local fnv1a stand-in with ContentId/SourceSpan), template-equivalence (typed provenance). Also fixes the flagged WitnessEntry::tie_break_hash DefaultHasher correctness issue by giving content-addressing a stable canonical hash.

Coordination: if the other session is independently authoring content_store, this can be superseded/closed — content_store was absent from main and the active jirak session is on the supervisor surface (#578/#579/#580), so this fills the gap.

Plan: .claude/plans/arigraph-osint-episodic-v1.md.

🤖 Generated with Claude Code


Generated by Claude Code

claude added 2 commits June 21, 2026 15:39
Draft reference for the AriGraph/OSINT episodic-arc wiring (D-CC-ARI-3),
parked on its own branch off merged main (content_store does not yet exist
on main). Zero-dep typed surface in lance-graph-contract:

- ContentId(u64) = fnv1a of the bytes (canon hash, stable across versions —
  the correct content address; DefaultHasher must never key one).
- SourceSpan{ContentId,u32,u32} = the typed (source_id,start,end) form of
  template-equivalence's provenance; is_cited() = "no source span -> no claim".
- ContentStore (cold read, resolve -> Option<&[u8]> zero-copy slice) +
  ContentSink (idempotent put -> dedup by content-address). Hot path touches
  only ContentId/SourceSpan; bytes hydrate cold at the membrane (ADR-022).

Logic-complete + self-reviewed; cargo verification deferred (worktree was
disk/sibling-constrained). Run `cargo test -p lance-graph-contract content_store`
in a full checkout before merge. Author canonically or supersede as the other
session's content_store work lands.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VGXeWN4XfVjteBVcVeuLo4

The content-addressed cold text/blob store for the AriGraph/OSINT episodic
arc (D-CC-ARI-3). Zero-dep typed surface in lance-graph-contract:

- ContentId(u64) = hash::fnv1a of the bytes (stable across versions — the
  correct content address; DefaultHasher must never key one; 0 = sentinel).
- SourceSpan{ContentId,u32,u32} = the fixed-size Copy typed form of
  template-equivalence's (source_id,start,end); is_cited() = "no source
  span -> no claim".
- ContentStore (cold resolve -> Option<&[u8]> zero-copy slice) + ContentSink
  (idempotent put -> dedup by content-address: many episodes -> one source).

Hot/cold firewall (ADR-022): the hot path touches only the fixed-size
ContentId/SourceSpan; bytes hydrate cold at the membrane. Nothing
variable-length enters the 512 B node.

Additive, zero-dep; +6 tests, clippy clean. Board: LATEST_STATE Contract
Inventory. Consumers: rs-graph-llm/episodic-arc-task, template-equivalence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VGXeWN4XfVjteBVcVeuLo4
@coderabbitai

coderabbitai Bot commented Jun 21, 2026


Warning

Review limit reached

@AdaWorldAPI, we couldn't start this review because you've reached your PR review rate limit.

📥 Commits

Reviewing files that changed from the base of the PR and between 98d5d2f and 6103438.

📒 Files selected for processing (3)
  • .claude/board/LATEST_STATE.md
  • crates/lance-graph-contract/src/content_store.rs
  • crates/lance-graph-contract/src/lib.rs


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 10b9bb5dc0


/// Span length in bytes.
#[must_use]
pub fn len(self) -> u32 {
    self.end - self.start


P2: Use a saturating length for public SourceSpan values

Because SourceSpan's fields are public, consumers can deserialize or build typed provenance with end < start without going through new(); in that case this subtraction panics in debug builds and wraps to a huge u32 in release, even though is_empty() treats the same span as empty. Any downstream code using len() to size or copy a malformed span can therefore mis-handle provenance; make the invariant unrepresentable or compute the length with saturating_sub.
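The difference between the two subtractions can be seen on a reduced stand-in type. `Span` and `len_wrapping` below are hypothetical names used only for this illustration; `len_wrapping` reproduces what plain `end - start` evaluates to when overflow checks are off (the usual release profile), while `len` shows the saturating variant the comment suggests.

```rust
/// Hypothetical two-field stand-in for the start/end part of SourceSpan.
#[derive(Clone, Copy, Debug)]
pub struct Span {
    pub start: u32,
    pub end: u32,
}

impl Span {
    /// Equivalent to `end - start` with overflow checks disabled:
    /// a reversed span wraps around to a very large length.
    pub fn len_wrapping(self) -> u32 {
        self.end.wrapping_sub(self.start)
    }

    /// Saturating variant: a reversed span has length 0.
    pub fn len(self) -> u32 {
        self.end.saturating_sub(self.start)
    }

    pub fn is_empty(self) -> bool {
        self.end <= self.start
    }
}
```

For a span built directly as `Span { start: 10, end: 4 }`, `len_wrapping()` yields `u32::MAX - 5`, whereas `len()` yields `0` and agrees with `is_empty()`. With overflow checks enabled (the default in debug builds), the unguarded `end - start` would panic instead of wrapping.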


SourceSpan's fields are public, so a consumer can build end < start
(bypassing new()'s clamp); the old `end - start` panicked in debug and
wrapped to a huge u32 in release, inconsistent with is_empty(). Use
saturating_sub so len() reports 0 for a malformed span, matching
is_empty()/is_cited(). +1 test (malformed_span_len_saturates_not_panics).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VGXeWN4XfVjteBVcVeuLo4

Owner Author

codex P2 (saturating SourceSpan::len) — ✅ fixed in 6103438.

SourceSpan's fields are public, so a consumer can build end < start (bypassing new()'s clamp). len() now uses saturating_sub, returning 0 for a malformed span — consistent with is_empty()/is_cited(), never panicking (debug) or wrapping (release). +1 test (malformed_span_len_saturates_not_panics). Now 7 tests, clippy clean.


AdaWorldAPI merged commit 96c1249 into main on Jun 21, 2026
6 checks passed