
docs: post-merge corrections on PR #452 + #453 architecture docs#454

Merged: AdaWorldAPI merged 2 commits into main from docs/post-merge-corrections-architecture on Jun 3, 2026


@AdaWorldAPI (Owner) commented Jun 3, 2026

Summary

Post-merge corrections on the two architecture docs from PR #452 + #453. Both PRs merged with content that the reviewer and a peer session flagged only AFTER merge; this PR brings main in line with the post-review state. Doc-only, zero code, paired files.

What's corrected

docs/CLUSTER_ASYMMETRY.md (PR #453's vart drift)

The bullet about HHTL-prefix dedup cited the vort/vart adaptive radix trie as if it were a shipped crate. A workspace scan across all Cargo.toml files confirms vart is not a dependency in this repository; the radix-shaped trie at the cognitive layer is a proposed consumer pattern with no shipped crate behind it. The corrected bullet cites the two shipped surfaces:

  • lance-graph-contract::hhtl::NiblePath (shipped) — the identity primitive (16-fan-out nibble path packed into a u64)
  • Lance versions() (shipped) — the time-axis

Adopters can derive an adaptive radix-trie index over NiblePath addresses themselves; that data structure is consumer code, not a lance-graph dep.
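For readers unfamiliar with the packing, here is a minimal sketch of what a "16-fan-out nibble path packed into a u64" can look like, and why prefix checks on it reduce to one mask and one compare. `PackedPath`, its left-aligned layout and the separate depth byte are assumptions made for illustration; this is not the shipped `lance-graph-contract::hhtl::NiblePath` API, whose layout may differ.

```rust
/// Hypothetical packed nibble path (NOT the shipped NiblePath API).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PackedPath {
    /// Nibbles stored most-significant first: nibble 0 occupies bits 60..64.
    bits: u64,
    /// Number of nibbles in use, 0..=16 (a u64 holds at most 16 nibbles).
    depth: u8,
}

impl PackedPath {
    pub const MAX_DEPTH: u8 = 16;

    /// Packs a slice of nibbles; returns None if too deep or a value is >= 16.
    pub fn from_nibbles(nibbles: &[u8]) -> Option<Self> {
        if nibbles.len() > Self::MAX_DEPTH as usize {
            return None;
        }
        let mut bits = 0u64;
        for (i, &n) in nibbles.iter().enumerate() {
            if n > 0xF {
                return None;
            }
            // Shift each nibble into its left-aligned slot.
            bits |= u64::from(n) << (60 - 4 * i);
        }
        Some(Self { bits, depth: nibbles.len() as u8 })
    }

    /// Descends one level to fan-out slot `n` (0..16), if there is room.
    pub fn child(&self, n: u8) -> Option<Self> {
        if self.depth >= Self::MAX_DEPTH || n > 0xF {
            return None;
        }
        let shift = 60 - 4 * u32::from(self.depth);
        Some(Self { bits: self.bits | (u64::from(n) << shift), depth: self.depth + 1 })
    }

    /// Reads nibble `i`, if the path is that deep.
    pub fn nibble(&self, i: u8) -> Option<u8> {
        if i >= self.depth {
            return None;
        }
        Some(((self.bits >> (60 - 4 * u32::from(i))) & 0xF) as u8)
    }

    /// True if `self` is a (non-strict) prefix of `other`.
    pub fn is_prefix_of(&self, other: &Self) -> bool {
        if self.depth > other.depth {
            return false;
        }
        if self.depth == 0 {
            return true; // the root is a prefix of every path
        }
        // Keep only the top `depth` nibbles of both paths and compare.
        let mask = u64::MAX << (64 - 4 * u32::from(self.depth));
        (self.bits & mask) == (other.bits & mask)
    }
}
```

Because the nibbles are left-aligned, a parent and all of its descendants agree on their top bits, which is the property any radix-shaped consumer index over these addresses would exploit.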

docs/APPEND_ONLY_RAFT_DOVETAIL.md (PR #452's same critique class as PR #453)

Three fixes parallel to the ones PR #453 received:

  1. Scope caveat immediately after the TL;DR: peer-Raft + Lance-local is an EXTERNAL pattern (bardioc B1 substrate-b), NOT a built-in lance-graph feature. Adopters provide the Raft layer (openraft / surreal-cluster / external TiKV).
  2. Compaction honesty in Section 1: Lance does have compaction, via DatasetOptimizer.compact_files for fragment layout. It is qualitatively different from LSM tombstone reclamation + run merging, but it is not absent. Operators still need to plan for it; it is just not coordinated with replication.
  3. Section 5 (consensus tax lands once): acknowledges that Lance file compaction runs INDEPENDENTLY of consensus. The storage-commit tax and the consensus tax are the same write, but the layout-optimization cycle is a separate concern. The corresponding comparison-table row is annotated as well.

Why follow-up rather than amend the original PRs

PRs #452 and #453 had already merged, so a new PR is the cleanest path. Both files are bundled into one follow-up because they fix the SAME class of critique (overclaim + missing scope caveat), and reviewers can diff them together.

What this PR does NOT change

  • No structural change to either doc's outline
  • No removed sections
  • No new claims beyond the corrections
  • The architectural property (append-only storage + append-only Raft log share the same write shape) remains the central thesis — just bracketed by an honest scope statement

Provenance

Summary by CodeRabbit

  • Documentation
    • Clarified that the peer-Raft + per-node local layout approach is an external deployment pattern and not built into the core product, with updated operational guidance.
    • Explained that file/layout compaction is a local layout optimization whose outputs are replicated via consensus, and distinguished this from LSM-style tombstone reclamation.
    • Corrected and refined descriptions of available components and their deduplication/versioning behaviors.

Two follow-up doc fixes after PR #452 and #453 merged. Reviewers
flagged real bugs the doc-only PRs landed with; these corrections
bring main in line with the post-review state.

== CLUSTER_ASYMMETRY.md ==

Drop the vort/vart adaptive-radix-trie citation; cite shipped
NiblePath + Lance versions() instead. A workspace scan across all
Cargo.toml files confirms vart is NOT a dependency in this
repository; le-domino's own seam-map documents it as 'doc-prose
only'. Citing a non-existent crate in a public architecture doc
misleads adopters into looking for a crate that they cannot find.
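A workspace scan like the one described can be reproduced with a short walk over every Cargo.toml. This is a sketch using only std and a naive line-based check rather than a TOML parser, so it can miss unusual forms (for example a renamed dependency declared as `foo = { package = "vart" }`, or `vart.workspace = true`). `find_manifests` and `declares_dep` are hypothetical helper names, not tooling from this repository.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects every Cargo.toml under `root`,
/// skipping `target/` and hidden directories such as `.git/`.
pub fn find_manifests(root: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let path = entry?.path();
        let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
        if path.is_dir() {
            if name != "target" && !name.starts_with('.') {
                find_manifests(&path, out)?;
            }
        } else if name == "Cargo.toml" {
            out.push(path);
        }
    }
    Ok(())
}

/// Naive text check: does any line declare `crate_name` as a key
/// (`vart = "..."`) or as a table header (`[dependencies.vart]`)?
pub fn declares_dep(manifest: &str, crate_name: &str) -> bool {
    let table_suffix = format!(".{crate_name}]");
    manifest.lines().map(str::trim).any(|line| {
        let key = line.split('=').next().unwrap_or("").trim();
        key == crate_name || line.ends_with(&table_suffix)
    })
}
```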

The role the bullet described (HHTL-prefix dedup) is actually:

- lance-graph-contract::hhtl::NiblePath (shipped) for the identity
  primitive (16-fan-out nibble path packed into a u64)
- Lance versions() (shipped) for the time-axis (cross-session index
  of which identity positions changed when)

Adopters can derive an adaptive radix-trie index over NiblePath
addresses themselves; that data structure is consumer code, not a
lance-graph dep. The corrected bullet cites the two shipped surfaces
and flags the radix-shaped consumer pattern as proposed.
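One way an adopter might build that consumer-side index is sketched below, assuming nibble paths are packed left-aligned (first nibble in the top four bits) with a separate depth. It is deliberately a simpler swap-in for an adaptive radix trie, not an ART: an ordered `BTreeMap` keyed by (bits, depth) answers the same "everything under this prefix" query as a single range scan. `PrefixIndex`, `mask_for` and `under_prefix` are hypothetical names, not lance-graph API.

```rust
use std::collections::btree_map::Range;
use std::collections::BTreeMap;

/// Mask that keeps the top `depth` nibbles of a left-aligned packed path.
fn mask_for(depth: u8) -> u64 {
    if depth == 0 {
        0
    } else {
        u64::MAX << (64 - 4 * u32::from(depth))
    }
}

/// Hypothetical consumer-side index over packed nibble-path addresses.
pub struct PrefixIndex<V> {
    map: BTreeMap<(u64, u8), V>,
}

impl<V> PrefixIndex<V> {
    pub fn new() -> Self {
        Self { map: BTreeMap::new() }
    }

    /// Inserts a value; an existing entry at the exact same path is replaced
    /// and returned (exact-path dedup).
    pub fn insert(&mut self, bits: u64, depth: u8, value: V) -> Option<V> {
        assert!(depth <= 16, "a u64 holds at most 16 nibbles");
        // Canonicalize: clear any stray bits below the path's depth.
        self.map.insert((bits & mask_for(depth), depth), value)
    }

    /// Every entry whose path starts with the given prefix, in key order.
    pub fn under_prefix(&self, bits: u64, depth: u8) -> Range<'_, (u64, u8), V> {
        assert!(depth <= 16, "a u64 holds at most 16 nibbles");
        let lo = bits & mask_for(depth);
        let hi = lo | !mask_for(depth);
        // Descendants have bits in [lo, hi]; starting at (lo, depth) skips
        // shallower ancestors, which can share `lo` but have smaller depth.
        self.map.range((lo, depth)..=(hi, 16))
    }
}
```

The range works because every descendant of a depth-d prefix shares its top d nibbles, so its packed bits lie between the prefix with all lower bits cleared and the prefix with all lower bits set.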

== APPEND_ONLY_RAFT_DOVETAIL.md ==

Apply the same critique class Codex raised on PR #453's companion doc:

- Scope caveat: peer-Raft + Lance-local is an EXTERNAL architecture
  pattern (bardioc B1 substrate-b), NOT a built-in lance-graph
  feature. Adopters provide the Raft layer themselves (openraft /
  surreal-cluster / external TiKV). Lance-graph contributes the
  storage-append/consensus-append dovetail property that MAKES the
  pattern cheap; not the pattern itself. Added a scope banner
  immediately after the TL;DR.
- Compaction honesty: Lance has compaction TOO via
  DatasetOptimizer.compact_files for fragment layout. The doc
  previously said 'Lance has no compaction' which is wrong for
  append-heavy deployments. Rewrote section 1 to distinguish LSM
  tombstone-reclaim+run-merge compaction from Lance file-layout
  compaction. Both exist; only LSM coordinates with replication.
- Consensus-tax-lands-once section also updated to acknowledge that
  Lance file compaction runs INDEPENDENTLY of consensus (the storage
  commit tax and the consensus tax are the same write; the LAYOUT
  OPTIMIZATION cycle is a separate concern that does not couple).

Both files were merged earlier (PR #452, #453); these corrections
land in main as a follow-up so adopters reading the docs today see
the honest scope + correct citations. No code changes; doc-only.

@coderabbitai Bot commented Jun 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: d2ab7c3c-a312-42d5-84ad-b9f30641c837

📥 Commits

Reviewing files that changed from the base of the PR and between d854974 and 4bd38f4.

📒 Files selected for processing (2)
  • docs/APPEND_ONLY_RAFT_DOVETAIL.md
  • docs/CLUSTER_ASYMMETRY.md

📝 Walkthrough


Updates two architecture docs: clarifies peer-Raft + per-node Lance is an external deployment pattern, rewrites compaction semantics to describe local layout optimization with Raft-replicated manifest outputs, unifies consensus/storage cost under append-only semantics, and corrects shipped vs. consumer surfaces for identity/time-axis.

Changes

Documentation Clarifications for Raft and Consensus Patterns

  • External architecture pattern scope clarification (docs/APPEND_ONLY_RAFT_DOVETAIL.md): Adds explicit clarification that the peer-Raft + Lance-local-per-node deployment shape is an external architecture approach, not built-in lance-graph behavior, and updates the operational lead-in for compaction.
  • Compaction behavior and failure modes (docs/APPEND_ONLY_RAFT_DOVETAIL.md): Substantially rewrites compaction operational consequences: describes DatasetOptimizer.compact_files as a local layout optimization that may materialize deletes/remove dropped columns, contrasts it with Cassandra-style LSM tombstone reclaim, and documents local execution and failure-mode characteristics while noting compaction outputs (manifest + fragments) replicate via Raft.
  • Append-only write and consensus cost unification (docs/APPEND_ONLY_RAFT_DOVETAIL.md): Reworks the “consensus tax” explanation to state that append-only storage + Raft collapse consensus protocol cost and storage commit into the same append work; clarifies compaction generates a new manifest version that replicates through normal Raft/anti-entropy paths without a second per-replica LSM-style storage tax.
  • Shipped surfaces and consumer-side dedup correction (docs/CLUSTER_ASYMMETRY.md): Replaces the prior vort/vart claim with corrected attribution: lance-graph-contract::hhtl::NiblePath provides packed nibble-path identity, Lance versions() provides the snapshot log/time-axis, and prefix-dedup/adaptive-radix-trie behavior is consumer-implemented rather than a shipped crate.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • AdaWorldAPI/lance-graph#453: Related edits to CLUSTER_ASYMMETRY.md refining the “fit on one node” narrative and shipped-surface corrections.
  • AdaWorldAPI/lance-graph#452: Prior edits to APPEND_ONLY_RAFT_DOVETAIL.md introducing the append-only + Raft dovetail narrative that these clarifications build on.

Poem

🐰 With nibble-paths snug and manifest bright,

Raft appends once and keeps the log tight.
Compaction hums local, leaves the cluster sane,
Docs now say clearly how the pieces remain.
Hoppity hops — clarity gained! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately captures the main change: post-merge corrections to architecture documentation in two files based on prior PR reviews.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d854974c64


Comment thread docs/APPEND_ONLY_RAFT_DOVETAIL.md Outdated
Comment on lines +89 to +93
- Compaction runs INDEPENDENTLY of consensus replication — file
compaction does not block writes, does not affect Raft log shipping,
does not coordinate across nodes
- Per-node compaction is local-only; each node compacts its own
Lance dataset on its own schedule without affecting peers

P2: Do not describe compaction as local-only

When one replica runs compact_files, Lance commits a new manifest/version with a different fragment set, so it cannot be invisible to peers under the architecture described here: this document says the manifest update is the commit acknowledgement and later uses manifest hashes plus missing fragments for anti-entropy. Scheduling compaction independently on each node will therefore make logically equivalent replicas appear divergent and can cause compaction manifests or fragments to be overwritten or unnecessarily shipped; the pattern needs to replicate the compaction commit or define a different logical-state synchronization scheme.


Comment thread docs/APPEND_ONLY_RAFT_DOVETAIL.md Outdated
Comment on lines +81 to +85
fragments which slow scans). It is NOT a tombstone-reclaim cycle —
Lance is append-only at the version level, so there are no tombstones
to reclaim in the LSM sense. File compaction takes existing
append-only fragments and produces new append-only fragments at a
better layout.

P2: Account for deleted-row reclamation during compaction

For datasets that use Lance deletes, updates, or dropped columns, DatasetOptimizer.compact_files is not only a small-fragment layout optimization: the pinned Lance 7.0.0 implementation documents that it removes deleted rows and dropped columns by default. Although these are not LSM tombstones, describing compaction as having no reclamation role understates its space-management purpose and can lead operators to omit it from retention and storage planning.


Comment thread docs/CLUSTER_ASYMMETRY.md Outdated
Comment on lines +106 to +108
a built-in lance-graph crate.** Lance's own `versions()` log is the
time-axis (cross-session index of which identity positions changed
when). An earlier version of this doc cited `vort/vart` as if it

P2: Stop calling versions() a changed-position index

For adopters building the proposed time-axis index, Lance versions() does not identify which identity positions changed: VersionedGraph::versions() in crates/lance-graph/src/graph/versioned.rs simply returns Vec<lance::dataset::Version> metadata for the nodes dataset. Determining changed identities requires comparing snapshots or maintaining a separate change index, so this parenthetical overstates the shipped surface.


…sions-is-not-a-change-index

Three P2 findings on PR #454 walked back, all real overclaims:

P2 1 (APPEND_ONLY_RAFT_DOVETAIL.md):
   Walked back 'compaction runs INDEPENDENTLY of consensus
   replication'. Under peer-Raft, compaction COMMITS a new manifest
   version which IS part of the consensus log; the compaction OPERATION
   runs locally but its OUTPUT (new fragments + new manifest delta)
   flows through Raft like any other write. So peers see it. The
   independence claim should have been narrower: the SCHEDULING of
   the operation is local; the OUTCOME replicates.
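A toy model of that narrower claim: the compaction operation can be scheduled locally, but unless its output commits through the replicated log, replicas holding identical rows end up with different manifests, which is exactly the anti-entropy problem the review describes. `Entry`, `Replica` and `manifest_hash` are hypothetical stand-ins, not Lance or openraft types, and `Compact` here simply merges all fragments into one.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical replicated-log entry.
#[derive(Clone, Debug)]
pub enum Entry {
    /// Append one new fragment holding these row ids.
    Append(Vec<u64>),
    /// Rewrite all fragments into one (a stand-in for file compaction).
    Compact,
}

/// Hypothetical per-node dataset state.
#[derive(Clone, Debug, Default, PartialEq, Eq, Hash)]
pub struct Replica {
    /// Manifest version; every applied entry commits a new one.
    pub version: u64,
    /// Fragment layout: each inner Vec is one fragment's row ids.
    pub fragments: Vec<Vec<u64>>,
}

impl Replica {
    /// Deterministically applies one entry and bumps the manifest version.
    pub fn apply(&mut self, entry: &Entry) {
        match entry {
            Entry::Append(rows) => self.fragments.push(rows.clone()),
            Entry::Compact => {
                let merged: Vec<u64> = self.fragments.concat();
                self.fragments = vec![merged];
            }
        }
        self.version += 1;
    }

    /// Stand-in for the manifest hash that anti-entropy would compare.
    pub fn manifest_hash(&self) -> u64 {
        let mut h = DefaultHasher::new();
        self.hash(&mut h);
        h.finish()
    }

    /// Logical content, independent of fragment layout.
    pub fn rows(&self) -> Vec<u64> {
        let mut r = self.fragments.concat();
        r.sort();
        r
    }
}
```

When `Compact` travels through the log, every replica applies it at the same position and the manifests stay identical; when one node compacts on its own, the rows match but the manifests do not.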

P2 2 (APPEND_ONLY_RAFT_DOVETAIL.md):
   Walked back 'no tombstones to reclaim in the LSM sense; layout
   optimization only'. Lance compact_files DOES reclaim deleted rows
   and dropped columns by default. Different mechanism from LSM
   tombstones (Lance has no tombstones at the version level; rows
   are deleted via deletion vectors which compaction materializes
   away) but the functional role of reclamation IS present for
   datasets that use deletes, updates, or dropped columns. The doc
   now distinguishes 'no LSM-style tombstone reclaim' from 'has
   deletion-vector reclaim and dropped-column reclaim'.
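A sketch of the deletion-vector side of that reclamation: rows are marked deleted in a per-fragment deletion vector, and compaction rewrites only the live rows into new fragments whose deletion vectors are empty, so the physical row count shrinks. `Fragment` and `compact` are hypothetical and ignore dropped columns; this illustrates the idea only, not Lance's fragment format or the behavior of `compact_files`.

```rust
use std::collections::BTreeSet;

/// Hypothetical fragment: stored rows plus a deletion vector of row offsets.
#[derive(Clone, Debug, PartialEq)]
pub struct Fragment {
    pub rows: Vec<u64>,
    pub deleted: BTreeSet<usize>,
}

impl Fragment {
    /// Rows not masked by the deletion vector.
    pub fn live_rows(&self) -> impl Iterator<Item = u64> + '_ {
        self.rows
            .iter()
            .enumerate()
            .filter(move |(i, _)| !self.deleted.contains(i))
            .map(|(_, r)| *r)
    }
}

/// Rewrites fragments into new ones of at most `target_rows` rows each.
/// Deleted rows are materialized away, so outputs carry empty deletion vectors.
pub fn compact(fragments: &[Fragment], target_rows: usize) -> Vec<Fragment> {
    assert!(target_rows > 0, "target_rows must be positive");
    let live: Vec<u64> = fragments.iter().flat_map(|f| f.live_rows()).collect();
    live.chunks(target_rows)
        .map(|chunk| Fragment { rows: chunk.to_vec(), deleted: BTreeSet::new() })
        .collect()
}
```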

P2 3 (CLUSTER_ASYMMETRY.md):
   Walked back the claim that Lance versions() is 'the time-axis
   index of which identity positions changed when'. versions()
   returns Vec<lance::dataset::Version> metadata: snapshot tags +
   timestamps, not a change-set index. To find which identities
   changed between versions, adopters compare snapshots OR maintain
   a separate index. The corrected bullet describes versions() as
   the version-snapshot log (which it is) and notes that the
   change-set derivation is consumer code.

Provenance: Codex P2 review on PR #454 commit 16f879b.