docs: post-merge corrections on PR #452 + #453 architecture docs#454
Conversation
Two follow-up doc fixes after PR #452 and #453 merged. Reviewers flagged real bugs the doc-only PRs landed with; these corrections bring main in line with the post-review state. == CLUSTER_ASYMMETRY.md == Drop the vort/vart adaptive-radix-trie citation; cite shipped NiblePath + Lance versions() instead. A workspace scan across all Cargo.toml files confirms vart is NOT a dependency in this repository; le-domino's own seam-map documents it as 'doc-prose only'. Citing a non-existent crate in a public architecture doc misleads adopters into looking for a crate that they cannot find. The role the bullet described (HHTL-prefix dedup) is actually: - lance-graph-contract::hhtl::NiblePath (shipped) for the identity primitive (16-fan-out nibble path packed into a u64) - Lance versions() (shipped) for the time-axis (cross-session index of which identity positions changed when) Adopters can derive an adaptive radix-trie index over NiblePath addresses themselves; that data structure is consumer code, not a lance-graph dep. The corrected bullet cites the two shipped surfaces and flags the radix-shaped consumer pattern as proposed. == APPEND_ONLY_RAFT_DOVETAIL.md == Apply the same critique class codex raised on PR #453's companion doc: - Scope caveat: peer-Raft + Lance-local is an EXTERNAL architecture pattern (bardioc B1 substrate-b), NOT a built-in lance-graph feature. Adopters provide the Raft layer themselves (openraft / surreal-cluster / external TiKV). Lance-graph contributes the storage-append/consensus-append dovetail property that MAKES the pattern cheap; not the pattern itself. Added a scope banner immediately after the TL;DR. - Compaction honesty: Lance has compaction TOO via DatasetOptimizer.compact_files for fragment layout. The doc previously said 'Lance has no compaction' which is wrong for append-heavy deployments. Rewrote section 1 to distinguish LSM tombstone-reclaim+run-merge compaction from Lance file-layout compaction. Both exist; only LSM coordinates with replication. - Consensus-tax-lands-once section also updated to acknowledge that Lance file compaction runs INDEPENDENTLY of consensus (the storage commit tax and the consensus tax are the same write; the LAYOUT OPTIMIZATION cycle is a separate concern that does not couple). Both files were merged earlier (PR #452, #453); these corrections land in main as a follow-up so adopters reading the docs today see the honest scope + correct citations. No code changes; doc-only.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughUpdates two architecture docs: clarifies peer-Raft + per-node Lance is an external deployment pattern, rewrites compaction semantics to describe local layout optimization with Raft-replicated manifest outputs, unifies consensus/storage cost under append-only semantics, and corrects shipped vs. consumer surfaces for identity/time-axis. ChangesDocumentation Clarifications for Raft and Consensus Patterns
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d854974c64
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| - Compaction runs INDEPENDENTLY of consensus replication — file | ||
| compaction does not block writes, does not affect Raft log shipping, | ||
| does not coordinate across nodes | ||
| - Per-node compaction is local-only; each node compacts its own | ||
| Lance dataset on its own schedule without affecting peers |
There was a problem hiding this comment.
Do not describe compaction as local-only
When one replica runs compact_files, Lance commits a new manifest/version with a different fragment set, so it cannot be invisible to peers under the architecture described here: this document says the manifest update is the commit acknowledgement and later uses manifest hashes plus missing fragments for anti-entropy. Scheduling compaction independently on each node will therefore make logically equivalent replicas appear divergent and can cause compaction manifests or fragments to be overwritten or unnecessarily shipped; the pattern needs to replicate the compaction commit or define a different logical-state synchronization scheme.
Useful? React with 👍 / 👎.
| fragments which slow scans). It is NOT a tombstone-reclaim cycle — | ||
| Lance is append-only at the version level, so there are no tombstones | ||
| to reclaim in the LSM sense. File compaction takes existing | ||
| append-only fragments and produces new append-only fragments at a | ||
| better layout. |
There was a problem hiding this comment.
Account for deleted-row reclamation during compaction
For datasets that use Lance deletes, updates, or dropped columns, DatasetOptimizer.compact_files is not only a small-fragment layout optimization: the pinned Lance 7.0.0 implementation documents that it removes deleted rows and dropped columns by default. Although these are not LSM tombstones, describing compaction as having no reclamation role understates its space-management purpose and can lead operators to omit it from retention and storage planning.
Useful? React with 👍 / 👎.
| a built-in lance-graph crate.** Lance's own `versions()` log is the | ||
| time-axis (cross-session index of which identity positions changed | ||
| when). An earlier version of this doc cited `vort/vart` as if it |
There was a problem hiding this comment.
Stop calling versions() a changed-position index
For adopters building the proposed time-axis index, Lance versions() does not identify which identity positions changed: VersionedGraph::versions() in crates/lance-graph/src/graph/versioned.rs simply returns Vec<lance::dataset::Version> metadata for the nodes dataset. Determining changed identities requires comparing snapshots or maintaining a separate change index, so this parenthetical overstates the shipped surface.
Useful? React with 👍 / 👎.
…sions-is-not-a-change-index Three P2 findings on PR #454 walked back, all real overclaims: P2 1 (APPEND_ONLY_RAFT_DOVETAIL.md): Walked back 'compaction runs INDEPENDENTLY of consensus replication'. Under peer-Raft, compaction COMMITS a new manifest version which IS part of the consensus log; the compaction OPERATION runs locally but its OUTPUT (new fragments + new manifest delta) flows through Raft like any other write. So peers see it. The independence claim should have been narrower: the SCHEDULING of the operation is local; the OUTCOME replicates. P2 2 (APPEND_ONLY_RAFT_DOVETAIL.md): Walked back 'no tombstones to reclaim in the LSM sense; layout optimization only'. Lance compact_files DOES reclaim deleted rows and dropped columns by default. Different mechanism from LSM tombstones (Lance has no tombstones at the version level; rows are deleted via deletion vectors which compaction materializes away) but the functional role of reclamation IS present for datasets that use deletes, updates, or dropped columns. The doc now distinguishes 'no LSM-style tombstone reclaim' from 'has deletion-vector reclaim and dropped-column reclaim'. P2 3 (CLUSTER_ASYMMETRY.md): Walked back the claim that Lance versions() is 'the time-axis index of which identity positions changed when'. versions() returns Vec<lance::dataset::Version> metadata: snapshot tags + timestamps, not a change-set index. To find which identities changed between versions, adopters compare snapshots OR maintain a separate index. The corrected bullet describes versions() as the version-snapshot log (which it is) and notes that the change-set derivation is consumer code. Provenance: Codex P2 review on PR #454 commit 16f879b.
Summary
Post-merge corrections on the two architecture docs from PR #452 + #453. Both PRs merged with content the reviewer + peer-session flagged AFTER merge; this PR brings main in line with the post-review state. Doc-only, zero code, paired files.
What's corrected
docs/CLUSTER_ASYMMETRY.md(PR #453's vart drift)The bullet about HHTL-prefix dedup cited
vort/vart adaptive radix trieas if it were a shipped crate. A workspace scan across allCargo.tomlfiles confirms vart is not a dependency in this repository; the radix-shaped trie at the cognitive layer is a proposed consumer pattern (no shipped crate name). The corrected bullet cites the two shipped surfaces:lance-graph-contract::hhtl::NiblePath(shipped) — the identity primitive (16-fan-out nibble path packed into a u64)versions()(shipped) — the time-axisAdopters can derive an adaptive radix-trie index over
NiblePathaddresses themselves; that data structure is consumer code, not a lance-graph dep.docs/APPEND_ONLY_RAFT_DOVETAIL.md(PR #452's same critique class as PR #453)Three parallel fixes to the ones PR #453 received:
openraft/surreal-cluster/ external TiKV).DatasetOptimizer.compact_filesfor fragment layout; just qualitatively different from LSM tombstone-reclaim + run-merge. Not absent. Operators still plan for it; just not coordinated with replication.Why follow-up rather than amend the original PRs
PR #452 and #453 already merged. New PR is the cleanest path. Bundling both files into one follow-up because both fix the SAME class of critique (overclaim + missing scope caveat) and reviewers can diff them together.
What this PR does NOT change
Provenance
AdaWorldAPI/bardiocPR docs: add hot/cold path architecture and documentation drift audit #15 conversation thread, 2026-06-03.agent-logs/upstream-contributions/lance-graph-{05,06}-*.mdSummary by CodeRabbit