feat(contract): ContentStore — content-addressed cold text store (D-CC-ARI-3)#581
Draft reference for the AriGraph/OSINT episodic-arc wiring (D-CC-ARI-3),
parked on its own branch off merged main (content_store does not yet exist
on main). Zero-dep typed surface in lance-graph-contract:
- ContentId(u64) = fnv1a of the bytes (canon hash, stable across versions —
the correct content address; DefaultHasher must never key one).
- SourceSpan{ContentId,u32,u32} = the typed (source_id,start,end) form of
template-equivalence's provenance; is_cited() = "no source span -> no claim".
- ContentStore (cold read, resolve -> Option<&[u8]> zero-copy slice) +
ContentSink (idempotent put -> dedup by content-address). Hot path touches
only ContentId/SourceSpan; bytes hydrate cold at the membrane (ADR-022).
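A minimal sketch of why the content address is keyed on FNV-1a rather than `DefaultHasher`: the FNV-1a constants and byte loop are fixed by the algorithm, so the same bytes give the same id on every build, whereas the standard library does not specify `DefaultHasher`'s algorithm across releases. The free function `fnv1a` here stands in for the crate's `hash::fnv1a`, and `ContentId::of` is a hypothetical constructor name used only for illustration.

```rust
/// Content address: the 64-bit FNV-1a hash of the raw bytes.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ContentId(pub u64);

const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Standard 64-bit FNV-1a: XOR each byte in, then multiply by the prime.
pub fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

impl ContentId {
    /// Hypothetical constructor: derive the id from the content itself.
    pub fn of(bytes: &[u8]) -> Self {
        ContentId(fnv1a(bytes))
    }
}

fn main() {
    // Identical bytes give an identical id, which is what makes dedup possible.
    let a = ContentId::of(b"the same source text");
    let b = ContentId::of(b"the same source text");
    assert_eq!(a, b);

    // Published FNV-1a test vectors: the hash does not depend on the build.
    assert_eq!(fnv1a(b""), 0xcbf2_9ce4_8422_2325);
    assert_eq!(fnv1a(b"a"), 0xaf63_dc4c_8601_ec8c);
    println!("{a:?}");
}
```

How the real crate treats a hash that happens to equal the `0` sentinel is not stated here, so the sketch leaves that case alone.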
Logic-complete + self-reviewed; cargo verification deferred (worktree was
disk/sibling-constrained). Run `cargo test -p lance-graph-contract content_store`
in a full checkout before merge. Author canonically or supersede as the other
session's content_store work lands.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VGXeWN4XfVjteBVcVeuLo4
The content-addressed cold text/blob store for the AriGraph/OSINT episodic
arc (D-CC-ARI-3). Zero-dep typed surface in lance-graph-contract:
- ContentId(u64) = hash::fnv1a of the bytes (stable across versions — the
correct content address; DefaultHasher must never key one; 0 = sentinel).
- SourceSpan{ContentId,u32,u32} = the fixed-size Copy typed form of
template-equivalence's (source_id,start,end); is_cited() = "no source
span -> no claim".
- ContentStore (cold resolve -> Option<&[u8]> zero-copy slice) + ContentSink
(idempotent put -> dedup by content-address: many episodes -> one source).
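The read/write split described above could look roughly like the following in-memory sketch. The trait names and the `resolve`/`put` shapes follow the description; `MemStore`, `rows`, the `Option`-returning `resolve_span`, and the local `fnv1a` helper are assumptions made for the example (the real `resolve_span` may well return a typed error instead).

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ContentId(pub u64);

/// Stand-in for the crate's `hash::fnv1a` (64-bit FNV-1a).
fn fnv1a(bytes: &[u8]) -> u64 {
    bytes.iter().fold(0xcbf2_9ce4_8422_2325u64, |h, &b| {
        (h ^ u64::from(b)).wrapping_mul(0x0000_0100_0000_01b3)
    })
}

/// Cold read side: hands out borrowed slices, never copies.
pub trait ContentStore {
    fn resolve(&self, id: ContentId) -> Option<&[u8]>;

    /// Defaulted in terms of `resolve`.
    fn contains(&self, id: ContentId) -> bool {
        self.resolve(id).is_some()
    }

    /// Defaulted; `None` for a missing id or an out-of-bounds range.
    fn resolve_span(&self, id: ContentId, start: u32, end: u32) -> Option<&[u8]> {
        self.resolve(id)?.get(start as usize..end as usize)
    }
}

/// Write side: putting the same bytes twice yields the same id and one row.
pub trait ContentSink {
    fn put(&mut self, bytes: &[u8]) -> ContentId;
}

/// Hypothetical in-memory backing store for the example.
#[derive(Default)]
pub struct MemStore {
    blobs: HashMap<ContentId, Vec<u8>>,
}

impl MemStore {
    pub fn rows(&self) -> usize {
        self.blobs.len()
    }
}

impl ContentStore for MemStore {
    fn resolve(&self, id: ContentId) -> Option<&[u8]> {
        self.blobs.get(&id).map(Vec::as_slice)
    }
}

impl ContentSink for MemStore {
    fn put(&mut self, bytes: &[u8]) -> ContentId {
        let id = ContentId(fnv1a(bytes));
        // Only the first put for a given id stores anything.
        self.blobs.entry(id).or_insert_with(|| bytes.to_vec());
        id
    }
}

fn main() {
    let mut store = MemStore::default();
    let first = store.put(b"shared source document");
    let second = store.put(b"shared source document");
    assert_eq!(first, second); // idempotent put
    assert_eq!(store.rows(), 1); // many episodes -> one source row
    assert_eq!(store.resolve_span(first, 0, 6), Some(&b"shared"[..]));
    assert_eq!(store.resolve_span(first, 0, 999), None); // out of bounds
    assert!(!store.contains(ContentId(first.0 ^ 1))); // unknown id
}
```

Keeping `put` on a separate trait means a read-only consumer can depend on `ContentStore` alone.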
Hot/cold firewall (ADR-022): the hot path touches only the fixed-size
ContentId/SourceSpan; bytes hydrate cold at the membrane. Nothing
variable-length enters the 512 B node.
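To illustrate the fixed-size claim, here is a small sketch of the two hot-path types and the citation gate. The field name `source`, the constant `ContentId::NONE`, the exact `is_cited` rule (non-sentinel source and non-empty range), and the `accept_claim` helper are all assumptions for illustration; only the `{ContentId,u32,u32}` layout, the `0` sentinel, and the "no source span -> no claim" rule come from the text above.

```rust
use std::mem::size_of;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ContentId(pub u64);

impl ContentId {
    /// Hypothetical name for the `0` sentinel.
    pub const NONE: ContentId = ContentId(0);
}

/// Fixed-size, `Copy` provenance: (source id, start, end).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SourceSpan {
    pub source: ContentId,
    pub start: u32,
    pub end: u32,
}

impl SourceSpan {
    pub fn is_empty(self) -> bool {
        self.end <= self.start
    }

    /// Assumed rule: a span cites something only if it names a real
    /// source and covers at least one byte.
    pub fn is_cited(self) -> bool {
        self.source != ContentId::NONE && !self.is_empty()
    }
}

/// Hypothetical gate: "no source span -> no claim".
pub fn accept_claim(span: SourceSpan) -> Result<SourceSpan, &'static str> {
    if span.is_cited() {
        Ok(span)
    } else {
        Err("no source span -> no claim")
    }
}

fn main() {
    // Both types have a size known at compile time; no bytes travel with them.
    assert_eq!(size_of::<ContentId>(), 8);
    assert_eq!(size_of::<SourceSpan>(), 16);

    let cited = SourceSpan { source: ContentId(42), start: 0, end: 10 };
    let uncited = SourceSpan { source: ContentId::NONE, start: 0, end: 10 };
    assert!(accept_claim(cited).is_ok());
    assert!(accept_claim(uncited).is_err());
}
```

The text itself is fetched only later, through the cold store, by whoever needs to display or verify the cited bytes.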
Additive, zero-dep; +6 tests, clippy clean. Board: LATEST_STATE Contract
Inventory. Consumers: rs-graph-llm/episodic-arc-task, template-equivalence.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VGXeWN4XfVjteBVcVeuLo4
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 10b9bb5dc0
/// Span length in bytes.
#[must_use]
pub fn len(self) -> u32 {
    self.end - self.start
Use a saturating length for public SourceSpan values
Because SourceSpan's fields are public, consumers can deserialize or build typed provenance with end < start without going through new(); in that case this subtraction panics in debug builds and wraps to a huge u32 in release, even though is_empty() treats the same span as empty. Any downstream code using len() to size or copy a malformed span can therefore mis-handle provenance; make the invariant unrepresentable or compute the length with saturating_sub.
SourceSpan's fields are public, so a consumer can build end < start (bypassing new()'s clamp); the old `end - start` panicked in debug and wrapped to a huge u32 in release, inconsistent with is_empty(). Use saturating_sub so len() reports 0 for a malformed span, matching is_empty()/is_cited(). +1 test (malformed_span_len_saturates_not_panics).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VGXeWN4XfVjteBVcVeuLo4
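The difference between the two subtractions can be sketched as follows. This is a simplified stand-in, not the crate's code: the `source` field is a plain `u64` here, and `is_empty` is assumed to be `end <= start`.

```rust
#[derive(Clone, Copy, Debug)]
pub struct SourceSpan {
    pub source: u64, // simplified stand-in for ContentId
    pub start: u32,
    pub end: u32,
}

impl SourceSpan {
    /// Span length in bytes; 0 for a malformed span (end < start).
    #[must_use]
    pub fn len(self) -> u32 {
        self.end.saturating_sub(self.start)
    }

    /// Assumed definition, chosen to agree with `len() == 0`.
    #[must_use]
    pub fn is_empty(self) -> bool {
        self.end <= self.start
    }
}

fn main() {
    // Public fields let a caller build this without going through new().
    let malformed = SourceSpan { source: 1, start: 10, end: 3 };
    assert_eq!(malformed.len(), 0);
    assert!(malformed.is_empty());

    // What plain subtraction yields when overflow checks are off:
    assert_eq!(3u32.wrapping_sub(10), 4_294_967_289);

    // Well-formed spans are unaffected.
    let ok = SourceSpan { source: 1, start: 3, end: 10 };
    assert_eq!(ok.len(), 7);
}
```

With `saturating_sub`, `len()` and `is_empty()` give the same answer for every possible field combination, so downstream code that sizes a buffer from `len()` cannot be handed a huge wrapped value.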
codex P2 (saturating
Generated by Claude Code
What
The content-addressed cold text/blob store contract, the gating dependency for the AriGraph/OSINT episodic arc (`D-CC-ARI-3`). Zero-dep typed surface in `lance-graph-contract`:
- `ContentId(u64)` = `hash::fnv1a` of the bytes: stable across versions/platforms (the correct content address; `DefaultHasher` must never key one; `0` = sentinel). Identical bytes ⇒ identical id ⇒ dedup.
- `SourceSpan{ContentId,u32,u32}` = the fixed-size, `Copy` typed form of `template-equivalence`'s `(source_id,start,end)` provenance. `is_cited()` = the gate's "no source span → no claim" predicate.
- `ContentStore` (cold read: `resolve(id) -> Option<&[u8]>` zero-copy slice into the mmap/backing store; `resolve_span`/`contains` defaulted) + `ContentSink` (idempotent `put -> ContentId`, dedup by content-address: many episodes → one source row).

Why this shape
Encodes the three rules from the design discussion:
- `ContentId` (a value tenant), the text lives in a columnar table and joins by id.
- `ContentId`/`SourceSpan`; bytes hydrate cold at the membrane (the fingerprint is the hot-path stand-in for text).
- `resolve` is never called during computation.

Scope
Additive, zero-dep; 6 tests (stable/dedup, idempotent put, `resolve_span` slice, OOB/missing errors, uncited-rejected); clippy clean. Board hygiene: `LATEST_STATE.md` Contract Inventory entry in the same commit.

Consumers: `rs-graph-llm/episodic-arc-task` (replaces its local `fnv1a` stand-in with `ContentId`/`SourceSpan`), `template-equivalence` (typed provenance). Also fixes the flagged `WitnessEntry::tie_break_hash` `DefaultHasher` correctness issue by giving content-addressing a stable canonical hash.

Coordination: if the other session is independently authoring `content_store`, this can be superseded/closed. `content_store` was absent from `main` and the active jirak session is on the supervisor surface (#578/#579/#580), so this fills the gap.

Plan: `.claude/plans/arigraph-osint-episodic-v1.md`.

🤖 Generated with Claude Code