fix(data-branch): unify diff/merge collect-range over arbitrary DAG depth#24371
Merged
mergify[bot] merged 10 commits intoMay 15, 2026
Merged
Conversation
`data branch diff` previously walked only the two endpoint relations when computing change streams. Mutations performed on intermediate ancestors (e.g. t1 in a chain t0->t1->t2) that were inherited into the endpoint via cloning never reached the diff, causing rows to silently disappear from `diff t2 against t0`. Replace the old case-by-case branches (LCA == base / LCA == tar / LCA elsewhere with three sub-conditions each) with a single ancestor walk driven by the LCA->endpoint paths captured from the branch DAG. For each ancestor A on path[LCA, endpoint] we collect A's own mutations within (A.CTS, nextChildOnPath.CloneTS] (or (A.CTS, endpointSP] when A is the endpoint). The LCA's prefix that the other side inherits identically is pruned to skip redundant IO. The model generalizes to any tree depth and any pair of nodes. Add `DataBranchDAG.PathFromAncestor` and the corresponding chain fields on `branchMetaInfo` to thread the path information from the DAG resolver to the collect-range builder. The new `buildSideCollectRange` helper materializes one side of the diff uniformly; the historical case-2/3/4 logic is gone. BVT additions exercise the new code path: * test/distributed/cases/git4data/branch/diff/diff_12.sql: 12 cases covering linear chains, mid-chain writes, snapshot mixes, no-PK chains, and post-fork ancestor mutations. * test/distributed/cases/git4data/branch/diff/diff_13.sql: one giant tree case (19 nodes, depth 4, 10 leaves) with 10 diff pairs across non-adjacent levels. Whole `git4data/branch/diff/` BVT now passes 1317/1317. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…esolver
Follow-up to the unified collect-range fix. No semantic change; BVT
still passes 1317/1317.
* Drop stale comments left over from the pre-unified design:
- "1. has no lca / 2. ..." ASCII case list in
decideLCABranchTSFromBranchDAG that described the old branch_t1_ts
/ branch_t2_ts model.
- Lone "Boundary semantics:" header in decideCollectRange whose
body was already absorbed into the new running example block.
- "// has no lca" trailing comment in diffOnBase that contradicted
the lca-handling block right above it.
* Refactor decideLCABranchTSFromBranchDAG so the LCA path information
is computed in a single linear flow (FindLCA → PathFromAncestor →
switch on lcaTableID) instead of a `var (...)` block + defer that
re-wired branchInfo from outside. Adds a small `buildTSs` helper
so the int64-to-types.TS lift no longer repeats four times. The
function-level doc comment now states what each output field is
for and who consumes it downstream.
* Drop an unused error-wrap path: the previous code re-wrapped
`dag.GetCloneTS` not-found into an internal error even though the
DAG always populates CloneTS for any node it knows about (FindLCA
has already validated descent). The simpler `_, _ := dag.GetCloneTS`
matches the intent.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Eliminate the last vestiges of the pre-unified design where
branchMetaInfo carried three pieces of information that all derive
from the LCA->endpoint paths:
* lcaType (lcaEmpty / lcaLeft / lcaRight / lcaOther) — was set in
five places but only ever read at one decision point ("is the
side empty?"); the other reads forwarded it through diffDataHelper
and buildHashmapForTable as a *dead parameter*. Replace the gate
with branchMetaInfo.hasLCA() and remove the dead parameter
entirely.
* tarBranchTS / baseBranchTS — per-side LCA snapshot used by
findDeleteAndUpdateBat and diffOnBase. Both values are exactly
pathFromLCAToOther[1].cloneTS (or otherSP when the other endpoint
is the LCA). Expose three small helpers on branchMetaInfo —
tarLCASnapshot, baseLCASnapshot, lcaProbeSnapshot — and let the
consumers compute on demand.
Other touch-ups in the same spirit:
* decideLCABranchTSFromBranchDAG now also handles the
same-table-id-no-DAG case directly with synthetic single-node
paths, instead of leaking the special-case setup into
decideCollectRange. The function is once again single-purpose.
* The lcaEmpty / lcaLeft / lcaRight / lcaOther constants are gone.
* data_branch_hashdiff_test.go updated for the new signatures.
Whole `git4data/branch/diff/` BVT still passes 1317/1317. Frontend
unit tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous refactor swapped the meaning of pathFromLCAToTar vs
pathFromLCAToBase in the LCA-snapshot helpers, which broke
no-PK / fake-PK merge cases (merge_6, merge_7) where tombstones on
the descendant side were resolved against the wrong moment of the
LCA. The right semantics:
tarLCASnapshot returns the moment tar's view of the LCA was
anchored — tar's first-child CloneTS when tar forked off the LCA,
otherwise base's first-child CloneTS, otherwise baseSP. The
baseLCASnapshot mirror works the same way.
This matches the original `tarBranchTS = CloneTS(tarFirstChild)`
behavior. After the fix the whole merge/ BVT (merge_1..7) goes back
to 100%, and the chain merge_8.sql (172 statements over 9 cases)
also passes 100%.
Includes:
* test/distributed/cases/git4data/branch/merge/merge_8.{sql,result}
— chain merge BVT covering cascade up the chain, direct grandchild
-> grandparent merge, conflict skip/accept across multiple chain
depths, snapshot-scoped, no-PK chain, alternating-round merges,
and order-of-merges regression.
* Regenerated diff_12.result / diff_13.result against the corrected
helpers (no semantic change, only the LCA probe TS that drives
tombstone resolution).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
mergify Bot
pushed a commit
that referenced
this pull request
May 14, 2026
…ry DAG depth (cherry-pick #24371 to 4.0-dev) (#24378) Cherry-picks the 5 substantive commits from #24371 (skips the intermediate `Merge branch 'main'` merge commit, which only resolves context against `main`). All commits applied cleanly with no conflicts. | commit | summary | | --- | --- | | `0b566bbe75` | fix(data-branch): unify diff collect-range over arbitrary DAG depth | | `b0f44f14d5` | chore(data-branch): clean up dead comments and tangled defer in DAG resolver | | `e844aaf7ff` | refactor(data-branch): drop redundant lcaType / branchTS fields | | `b63cdac409` | fix(data-branch): correct LCA-snapshot helpers for chain merge | | `e193420425` | fix(data-branch): address review comments and fix gofmt | The final tree state of all 11 modified files (`pkg/frontend/data_branch*.go`, `pkg/frontend/databranchutils/branch_dag.go`, and the new BVT cases under `test/distributed/cases/git4data/branch/{diff,merge}/`) is byte-identical to the head of #24371 (verified via per-file shasum diff). ### TL;DR (from #24371) `data branch diff` and `data branch merge` previously dropped rows silently whenever the LCA → endpoint chain in the branch DAG had any intermediate node that performed inserts/updates/deletes before the endpoint was forked. Symptom: rows that the endpoint inherits via multi-level cloning were never compared on its side of the diff, so downstream merges failed to migrate them too. The fix replaces the historical case-by-case logic with a single ancestor-walking model that works uniformly for any DAG depth and shape. Refer to the original PR for the full bug analysis, fix design, and running 19-node tree example used throughout the new code comments. ### Verification on 4.0-dev * `pkg/frontend` build + `go vet` clean. * `pkg/frontend` unit tests pass (including `TestHashDiff_NoLCAWithStubHandles`, `TestDiffDataHelper_*`, `TestRunLCAProbeWithReaderFallback_*`, `TestHandleDelsOnLCA_*`, `TestLCAProbeJoinCastType`). * `gofmt -l` clean on all five modified Go files. * New BVT cases (`diff_12`, `diff_13`, `merge_8`) included with their corresponding `.result` files. Approved by: @XuPeng-SH, @heni02
7 tasks
Contributor
Merge Queue Status
This pull request spent 47 seconds in the queue, including 7 seconds running CI. Required conditions to merge
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What type of PR is this?
Which issue(s) this PR fixes:
issue #21979
What this PR does / why we need it:
TL;DR
data branch diffanddata branch mergepreviously dropped rowssilently whenever the LCA → endpoint chain in the branch DAG had any
intermediate node that performed inserts/updates/deletes before the
endpoint was forked. Symptom: rows that the endpoint inherits via
multi-level cloning were never compared on its side of the diff, so
downstream merges failed to migrate them too. The fix replaces the
historical case-by-case logic with a single ancestor-walking model
that works uniformly for any DAG depth and shape.
The bug
Concretely, with a chain
t0 → t1 → t2where t1 inserts(4, 40)before t2 is branched:
Expected (state-diff):
Before this PR:
Root cause:
decideCollectRangeonly walked the two endpointrelations. Mutations performed on intermediate ancestors (here, t1's
insert (4,40)) live in those intermediates' change streams, not ineither endpoint's. The previous case 2 / case 3 / case 4 sub-branches
each handled at most one extra LCA relation, never the full chain.
The fix — unified DAG-walk model
For each endpoint X we know the path
[LCA, …, X]from the DAG. Thenew mutations distinguishing the two endpoints live entirely on
path[LCA, endpoint]of each side; everything strictly above LCA isshared and reconciles to no diff. For each ancestor A on that path:
Plus one prune at the LCA: clamp
windowFrom_LCAup tootherFork.Next()so the prefix shared with the other side via cloneis skipped (would otherwise reconcile to nothing).
The single
buildSideCollectRangehelper applies this to eitherside. The historical
case 2 / case 3 / case 4branching is gone.Implementation
DataBranchDAG.PathFromAncestor(new): top-down chain from LCA toany descendant.
branchMetaInfo(refactored): now carriespathFromLCAToTar / pathFromLCAToBaseas the single source oftruth.
lcaType / lcaLeft / lcaRight / lcaOther / tarBranchTS / baseBranchTSare gone — their information is derived on demandvia
hasLCA() / tarLCASnapshot() / baseLCASnapshot() / lcaProbeSnapshot().decideCollectRange(rewritten): three branches kept (case 0same-table, case 1 unrelated tables, case 2 unified
has-LCA chain walk) — the entire pre-existing case 2/3/4 logic
collapses into one loop driven by the path.
lcaType intparameters removed fromdiffDataHelper,hashDiffIfNoLCA, andbuildHashmapForTable(they were forwardedthrough three layers and never read).
The running example used throughout the new comments is a 19-node
multi-fork tree of depth 4 with 10 leaves (
t0root, fork to t1/t2/t3,each forks again twice, leaves t9..t18). Every diff/merge example in
the doc traces against this tree so reviewers can sanity-check the
window arithmetic against a concrete shape.
BVT additions
git4data/branch/diff/diff_12.sql— 12 cases covering linear chainswith IUD on every chain level, mid-chain post-branch writes,
snapshot mixes, no-PK chains, double-INSERT-same-PK reconciliation,
alternating multi-round IUD, and 1000-row scale.
git4data/branch/diff/diff_13.sql— one giant tree case (19 nodes,depth 4, 10 leaves) with 14 diff pairs spanning leaf↔root,
cross-subtree, sibling, cousin, intermediate↔intermediate variants.
git4data/branch/merge/merge_8.sql— 9 cases for chain merge:cascade up the chain, direct grandchild → grandparent merge,
conflict skip/accept across chain depths, snapshot-scoped, no-PK,
alternating-round merges, order-of-merges regression.
Verification
test/distributed/cases/git4data/branch/diff/BVT —1317/1317 PASS, 100%.
test/distributed/cases/git4data/branch/merge/BVT —466/466 PASS, 100%.
pkg/frontendunit tests pass.Special notes for your reviewer:
reproduction. The same shape underlies every chain failure
(deeper trees just compound it).
e4dc4d3099)initially regressed
merge_6/merge_7(no-PK cases). Rootcause was a swapped path lookup in the LCA-snapshot helpers,
fixed in the final commit
dd0205ebba. The unified designmade the bug surface as one helper to fix, not five
scattered branches — which is the point of the refactor.
etc/launch/cn.tomlcontains a local port override on myworkspace and is intentionally not in this PR.