docs: cluster asymmetry — capacity-forced vs availability-chosen clustering#453

Merged
AdaWorldAPI merged 3 commits into main from docs/cluster-asymmetry on Jun 3, 2026

@AdaWorldAPI AdaWorldAPI commented Jun 3, 2026

Owner

Summary

Adopters of lance-graph routinely import the Cassandra/JanusGraph operations playbook when planning a replicated deployment. They expect to need a large cluster (5-10+ nodes), to spread the data via consistent hashing, to schedule compactions, to budget for cross-node coordinator queries, and to monitor anti-entropy. As a result, they overprovision and over-operate.

This doc proposes capturing the architectural asymmetry that makes lance-graph cluster operations qualitatively different. OLD stacks (Cassandra+JanusGraph, Elasticsearch, etc.) cluster because they have to: the data does not fit on one node, so they shard. lance-graph consumers cluster because they choose to: the data does fit on one node, and clustering serves availability, geography, and load distribution.

Same word ("cluster"), opposite cost structure. Naming the asymmetry upstream prevents costly mis-deployments.

Doc location

This PR places the doc at docs/CLUSTER_ASYMMETRY.md to match the existing ALL_CAPS convention in docs/. Happy to relocate to .claude/knowledge/ or docs/architecture/<lowercase-kebab>.md per maintainer preference — see Asks below.

Provenance

  • Cluster-asymmetry framing: AdaWorldAPI/bardioc PR docs: add hot/cold path architecture and documentation drift audit #15 conversation thread, 2026-06-03. User insight: "clustering in Cassandra / Janusgraph is most probably due to poor performance and data; if you compare that we hold Wikidata locally and Cassandra Janusgraph would need multiple instances".
  • Sister doc: lance-graph-05-lance-append-raft-dovetail.md (same conversation thread) captures the storage-vs-consensus dovetail property; both docs target the same audience and are best read together.
  • bardioc roadmap subsection: ROADMAP_RUST_PRIMARY_HEADSTONE.md § "Why OLD stack clusters vs why NEW stack clusters (cluster ≠ cluster)" — the in-bardioc canonical statement of this asymmetry.
  • Compression stack reference: lance-graph/crates/{bgz17, highheelbgz, bgz-tensor/src/hhtl_d.rs} — the encoding cascade that the doc cites. The numerical compression claims are derived from the existing per-row footprints documented in those crates' moduledocs.
  • Wikidata locality probe: AdaWorldAPI/Lance-graph PR docs+probe: the agnostic lazy world-spine — addressing vision, locality probe (PASS), markov_soa→AriGraph, EW64-as-AriGraph #444 — the 98.6% intra-family locality finding used in the HHTL-banked CAM-PQ paragraph.
  • vort/vart prior framing: AdaWorldAPI/bardioc .claude/TECH_DEBT.md collapse Module 6: #[track_caller] error macros for zero-cost location capture #2, where the radix-trie was first established as the time-axis above HHTL identity space.

Asks

  1. Confirm contribution shape welcome. Same shape as the sister doc lance-graph-05-lance-append-raft-dovetail.md. Both ship independently; either can be accepted without the other.
  2. Pick the doc home: docs/architecture/cluster-asymmetry.md (project-public) or .claude/knowledge/cluster-asymmetry.md (Claude-Code knowledge layer) — same recommendation as the sister doc (project-public so non-Claude-Code adopters see it).
  3. Decide on the specific numerical compression claims. The "1-3 orders of magnitude vs LSM wide-column" claim is derived from operational experience with the bardioc reference consumer. If upstream maintainers want a controlled benchmark before the claim lands, the doc can be reworded as "significantly smaller (we observe ~1-3 orders of magnitude with the encoding cascade, pending benchmark PR)".
  4. Bundle the sister doc: prefer landing append-only-raft-dovetail.md (companion PR) before this one because the consensus-tax discussion here references that doc. Or land both in the same review pass.
  5. Decide on the "When you actually DO need capacity-forced sharding" section. Brief; honest scope; but it surfaces an architectural question (application-level sharding pattern) that may deserve its own doc. Keep it brief here OR strip it and surface as a separate proposal later.

This PR is doc-only. Zero code changes, zero tests, no API impact. Review focus is on (a) whether the architectural property described is welcome content for Lance-graph, (b) the doc location, and (c) the honest-scope sections.

Summary by CodeRabbit

  • Documentation
    • Added comprehensive guidance on cluster architecture and deployment for lance-graph: detailed comparison of availability-chosen vs capacity-sharded designs, effects on data placement, read/write latency, replication and anti-entropy, compaction/rehash operational trade-offs, recommended peer‑Raft full-dataset deployment pattern, guidance for when to use application-level sharding, and explicit limitations/non-claims.

Names why lance-graph consumers cluster qualitatively differently than
Cassandra-era stacks: data fits per-node (compression cascade +
vort/vart trie); peer-Raft replicates the FULL dataset to each node;
three-node clusters are production, not toy; no compaction; no
rebalancing; no cross-node fan-out for reads. Pairs with the
append-only-raft-dovetail PR.
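
The three-node claim in this commit message rests on standard Raft majority arithmetic. A minimal sketch, with hypothetical function names, of how many voters must acknowledge a write and how many node losses a cluster can absorb:

```python
def quorum_size(voters: int) -> int:
    """Smallest majority of `voters` (the standard Raft majority rule)."""
    if voters < 1:
        raise ValueError("a cluster needs at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Voters that can be lost while a majority is still reachable."""
    return voters - quorum_size(voters)


# Print quorum size and failure tolerance for a few cluster sizes.
for n in (1, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

With three voters a write needs two acknowledgements and the cluster survives the loss of one node. A single node tolerates no failures, which matches the doc's non-claim that one node is insufficient for HA.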
@coderabbitai

coderabbitai Bot commented Jun 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: d61b95b9-e824-45b7-86bc-99901580b815

📥 Commits

Reviewing files that changed from the base of the PR and between 0007c83 and 8d8ad00.

📒 Files selected for processing (1)
  • docs/CLUSTER_ASYMMETRY.md
✅ Files skipped from review due to trivial changes (1)
  • docs/CLUSTER_ASYMMETRY.md

📝 Walkthrough

This PR adds docs/CLUSTER_ASYMMETRY.md, documenting lance-graph’s availability-chosen (replicated) clustering versus Cassandra-style capacity-forced (sharded) clustering, justifying full-dataset-per-node deployments, describing operational consequences, Raft implications, occasional sharding alternatives, and a reference three-node deployment pattern.

Changes

Cluster Design Documentation

| Layer / File(s) | Summary |
| --- | --- |
| Cluster model contrast and technical justification (docs/CLUSTER_ASYMMETRY.md) | Document framing contrasts availability-chosen clustering with capacity-forced models via a side-by-side behavioral comparison table and storage-amplification clarification. |
| Why consumers fit on one node (docs/CLUSTER_ASYMMETRY.md) | Lists typical encoding/layout components and provides a concrete Wikidata-scale example concluding each peer replica holds the full dataset. |
| Operational consequences, anti-entropy, and rebalance (docs/CLUSTER_ASYMMETRY.md) | Enumerates operational knock-on effects: three-node norm, no coordinator fan-out, different compaction model, manifest-hash anti-entropy (O(1) identification), and lack of token-range rebalancing/data movement. |
| Raft consensus tax and sharding alternatives (docs/CLUSTER_ASYMMETRY.md) | Explains Raft write and linearizable-read implications under availability-chosen clustering and defines rare scenarios needing capacity-forced sharding, recommending application-level sharding with separate peer-Raft + Lance clusters per shard. |
| Non-claims and reference deployment pattern (docs/CLUSTER_ASYMMETRY.md) | Lists explicit non-claims/limitations (e.g., single-node insufficient for HA) and provides a recommended peer-Raft replicated reference deployment pattern validated at Wikidata scale. |

Estimated code review effort:
🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs:

  • AdaWorldAPI/lance-graph#452: Related documentation about append-only write shapes, replication/compaction trade-offs, and deployment recommendations.

Poem:

🐰 A chart of clusters, so clear and bright,
One node holds wonders, three guard the night.
No fan-out, no shuffle, compaction less fraught—
Raft keeps the ledger, manifest-hash spots what's not.
ᓚᘏᗢ

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main architectural change documented in the PR: contrasting capacity-forced vs availability-chosen clustering models, which is the core subject of the new CLUSTER_ASYMMETRY.md file. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef93e287a7

Comment thread docs/CLUSTER_ASYMMETRY.md
Comment on lines +215 to +216
- Writes serve via Raft quorum to the local leader; replicated as Lance
fragment appends

P1 Badge Do not document an unimplemented Raft deployment

For users of this repository, this replication guarantee is not implementable: the current Lance integration only resolves and opens datasets in crates/lance-graph/src/query.rs:779-852, and a repo-wide search finds no Raft dependency, substrate-b binary, voter configuration, or write-replication path. Presenting quorum-replicated writes as the recommended lance-graph deployment can cause adopters to assume their data is highly available when lance-graph itself provides no such capability; mark this as an external/proposed bardioc architecture rather than a supported deployment pattern.

Comment thread docs/CLUSTER_ASYMMETRY.md Outdated
too many nodes compact simultaneously, the cluster's effective
replication factor temporarily drops.

Lance has no compaction. The version log IS the truth. Garbage collection

P2 Badge Do not tell operators that Lance has no compaction

For append-heavy deployments—especially the fragment appends recommended later in this document—this is false. Lance provides DatasetOptimizer.compact_files because many small appends create small fragments and a poor layout that slows queries; the official table-maintenance documentation explicitly describes this operation. Telling operators that compaction does not exist can leave production datasets unoptimized, so distinguish Lance file compaction from LSM-tree compaction instead of saying there is none.
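
To illustrate why many small appends call for file compaction, the toy model below counts fragments before and after merging small fragments up to a target size. It is plain Python with hypothetical names and workload numbers. It does not call Lance's API and does not attempt to reproduce Lance's actual compaction algorithm.

```python
def compact(fragment_rows: list[int], target_rows: int) -> list[int]:
    """Greedily merge adjacent fragments without exceeding target_rows per
    output fragment. Inputs already larger than target_rows pass through."""
    if target_rows < 1:
        raise ValueError("target_rows must be positive")
    merged: list[int] = []
    current = 0
    for rows in fragment_rows:
        # Close the current output fragment if this input would overflow it.
        if current and current + rows > target_rows:
            merged.append(current)
            current = 0
        current += rows
    if current:
        merged.append(current)
    return merged


# Hypothetical workload: 1,000 appends of 500 rows, one fragment per append.
appends = [500] * 1000
compacted = compact(appends, target_rows=100_000)
print(len(appends), "fragments before;", len(compacted), "after")
```

The row count is unchanged by the merge; only the number of fragments a scan has to open drops. That is the sense in which this maintenance differs from LSM compaction, which also reclaims tombstones and merges sorted runs.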

Comment thread docs/CLUSTER_ASYMMETRY.md Outdated
| Reason to cluster | Data does not fit on one node | Data fits on one node; replicate for HA + geo |
| Each node holds | 1/N of the data (consistent-hash shards) | 100% of the data |
| Hot-path reads | Coordinator pattern; fan-out to shard owners | Local; no cross-node hop |
| Replication factor | 3-5 (storage AMPLIFICATION on top of sharding) | 3 (just for HA; no amplification) |

P2 Badge Account for full-replica storage amplification

When each of three nodes holds 100% of the dataset, the cluster consumes roughly three times the single-dataset storage, so describing replication factor 3 as having “no amplification” gives readers an incorrect disk-capacity model. The availability-chosen design may avoid additional shard-management overhead, but it still has replication storage amplification and should be budgeted as such.
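
The disk-capacity model described here is simple arithmetic. A minimal sketch with hypothetical function names and illustrative sizes, assuming even shard distribution and the same encoded dataset size in both layouts:

```python
def full_replica_footprint(dataset_gb: float, replicas: int) -> dict[str, float]:
    """Every node stores the whole dataset (availability-chosen clustering)."""
    return {"per_node_gb": dataset_gb, "cluster_gb": dataset_gb * replicas}


def sharded_footprint(dataset_gb: float, nodes: int, rf: int) -> dict[str, float]:
    """Data is split across nodes and each shard is stored rf times."""
    if rf > nodes:
        raise ValueError("replication factor cannot exceed node count")
    cluster_gb = dataset_gb * rf
    return {"per_node_gb": cluster_gb / nodes, "cluster_gb": cluster_gb}


# Illustrative numbers only: a 5 GB dataset, 3 full replicas vs 9 sharded nodes.
print(full_replica_footprint(5.0, replicas=3))
print(sharded_footprint(5.0, nodes=9, rf=3))
```

In both layouts total disk scales with the replication factor, so a three-replica deployment should budget roughly three times the single-copy size. What differs is the per-node share: the full dataset on every node versus a fraction of it on each shard owner.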

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
docs/CLUSTER_ASYMMETRY.md (2)

130-133: ⚡ Quick win

Clarify the O(1) anti-entropy claim.

The claim that anti-entropy identification is "O(1)" depends on the manifest hash comparison being constant-time. This is accurate if the manifest itself is a bounded-size data structure (which it should be), but it may be worth clarifying that the O(1) refers to the identification decision, not necessarily the streaming cost of catching up (which is mentioned separately as "bounded by the actual divergence").

Consider adding a brief clarification that O(1) refers specifically to the "is-in-sync?" decision, independent of dataset size.

📝 Proposed clarification
 Lance peer-Raft anti-entropy: compare the manifest hash between nodes.
 If equal, sync. If not, ship missing fragments + the new manifest. The
-IDENTIFICATION step is O(1). The streaming step is bounded by the actual
-divergence, not by the dataset size.
+IDENTIFICATION step is O(1) — a single hash comparison, independent of 
+dataset size. The streaming step is bounded by the actual divergence, 
+not by the total dataset size.
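
The split between the identification step and the streaming step can be sketched as follows. This is a hedged illustration: the manifest is modelled as an ordered list of fragment IDs, and the function names and SHA-256 digest are assumptions made for the example, not Lance's manifest format or any existing lance-graph API.

```python
import hashlib


def manifest_hash(fragment_ids: list[str]) -> str:
    """Digest of a manifest, modelled here as an ordered list of fragment IDs."""
    h = hashlib.sha256()
    for frag in fragment_ids:
        h.update(frag.encode("utf-8"))
        h.update(b"\x00")  # separator so ["ab", "c"] and ["a", "bc"] differ
    return h.hexdigest()


def in_sync(local_digest: str, peer_digest: str) -> bool:
    """Identification step: compares two fixed-size digests."""
    return local_digest == peer_digest


def missing_fragments(local: list[str], peer: list[str]) -> list[str]:
    """Streaming step: fragments the peer has that the local node lacks."""
    have = set(local)
    return [f for f in peer if f not in have]


leader = ["frag-0", "frag-1", "frag-2", "frag-3"]
follower = ["frag-0", "frag-1"]
if not in_sync(manifest_hash(follower), manifest_hash(leader)):
    print(missing_fragments(follower, leader))
```

Comparing the two digests costs the same regardless of dataset size, provided each node caches its digest when its manifest changes. Computing the digest in this sketch is linear in the manifest length, and the catch-up work is proportional to the number of missing fragments.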
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/CLUSTER_ASYMMETRY.md` around lines 130 - 133, Clarify that the "O(1)"
remark refers specifically to the IDENTIFICATION step (the constant-time
manifest hash comparison) rather than the overall anti-entropy process; update
the paragraph mentioning "manifest hash" and "IDENTIFICATION" to state
explicitly that O(1) describes the is-in-sync decision cost assuming a
bounded-size manifest, and that streaming/catch-up costs remain proportional to
the actual divergence (not the dataset size).

193-197: ⚡ Quick win

Consider strengthening the caveat about varying compression ratios.

This non-claim appropriately acknowledges that "specific numbers vary" across different consumers. However, the earlier sections make very specific claims about byte sizes and compression ratios that appear universal. Consider whether the doc should more explicitly caveat the specific numbers throughout (e.g., "in the bardioc B1 reference implementation" or "typical values") or strengthen this non-claim to be a more prominent disclaimer earlier in the document.

This is consistent with the PR objectives question about "whether to keep specific numerical compression claims or soften them pending benchmarks."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/CLUSTER_ASYMMETRY.md` around lines 193 - 197, Strengthen the caveat
about compression variability by updating the paragraph that currently starts
"All Lance + Raft stacks ship the same compression" to explicitly flag that the
byte-size and compression-ratio examples are from the bardioc B1 reference
implementation and may not be representative; reword to something like "In the
bardioc B1 reference implementation (typical values)" or prepend a prominent
disclaimer earlier in the doc (intro/summary) noting that specific numbers are
implementation-dependent and subject to benchmark variation—ensure the existing
mention of "bardioc B1 reference consumer" remains and is made more prominent so
readers understand the numbers are not universal.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 5c87ec78-b09b-40c7-b6af-d9e64a922253

📥 Commits

Reviewing files that changed from the base of the PR and between ec1f7d2 and ef93e28.

📒 Files selected for processing (1)
  • docs/CLUSTER_ASYMMETRY.md

…honesty

Three codex findings on PR #453:

P1: scope caveat - peer-Raft + Lance-local is an EXTERNAL architecture
pattern (bardioc B1 substrate-b), NOT a built-in lance-graph feature.
Adopters provide the Raft layer themselves (openraft / surreal-cluster
/ external TiKV); lance-graph provides the columnar storage +
DataFusion + encoding crates that MAKE the pattern cheap, not the
pattern itself. Added a scope banner after the TL;DR and a per-line
reminder on the Recommended deployment pattern section.

P2 compaction: Lance has compaction TOO
(DatasetOptimizer.compact_files for fragment layout optimization),
just qualitatively different from LSM tombstone-reclaim+run-merge.
Rewrote section 3 to distinguish LSM compaction from Lance file
compaction; operators should plan for the latter.

P2 replication amplification: 3 replicas of a 5GB dataset is 15GB
of total disk regardless of distribution shape. Replication
amplification does NOT disappear in availability-chosen clustering;
it does not compound with shard count (the architectural advantage).
Fixed the comparison table row and added an honesty paragraph.
…on table

The previous codex-review fix updated section 3 of operational
consequences but missed the parallel overclaim in the side-by-side
Capacity-forced vs Availability-chosen table row. Now both
occurrences distinguish LSM compaction from Lance file compaction
consistently.
@AdaWorldAPI AdaWorldAPI merged commit 16f879b into main Jun 3, 2026
1 check passed
AdaWorldAPI added a commit that referenced this pull request Jun 3, 2026
…architecture

docs: post-merge corrections on PR #452 + #453 architecture docs