Skip to content

to 4.0: fix(data-branch): unify diff/merge collect-range over arbitrary DAG depth (cherry-pick #24371 to 4.0-dev)#24378

Merged
mergify[bot] merged 6 commits into
matrixorigin:4.0-devfrom
gouhongshen:fix/data-branch-diff-chain-walk-4.0-dev
May 14, 2026
Merged

to 4.0: fix(data-branch): unify diff/merge collect-range over arbitrary DAG depth (cherry-pick #24371 to 4.0-dev)#24378
mergify[bot] merged 6 commits into
matrixorigin:4.0-devfrom
gouhongshen:fix/data-branch-diff-chain-walk-4.0-dev

Conversation

@gouhongshen
Copy link
Copy Markdown
Contributor

Backport of #24371 to 4.0-dev.

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #21979 (cherry-picked from #24371)

What this PR does / why we need it:

Cherry-picks the 5 substantive commits from #24371 (skips the
intermediate Merge branch 'main' merge commit, which only resolves
context against main). All commits applied cleanly with no conflicts.

commit summary
0b566bbe75 fix(data-branch): unify diff collect-range over arbitrary DAG depth
b0f44f14d5 chore(data-branch): clean up dead comments and tangled defer in DAG resolver
e844aaf7ff refactor(data-branch): drop redundant lcaType / branchTS fields
b63cdac409 fix(data-branch): correct LCA-snapshot helpers for chain merge
e193420425 fix(data-branch): address review comments and fix gofmt

The final tree state of all 11 modified files (pkg/frontend/data_branch*.go,
pkg/frontend/databranchutils/branch_dag.go, and the new BVT cases under
test/distributed/cases/git4data/branch/{diff,merge}/) is byte-identical
to the head of #24371 (verified via per-file shasum diff).

TL;DR (from #24371)

data branch diff and data branch merge previously dropped rows
silently whenever the LCA → endpoint chain in the branch DAG had any
intermediate node that performed inserts/updates/deletes before the
endpoint was forked. Symptom: rows that the endpoint inherits via
multi-level cloning were never compared on its side of the diff, so
downstream merges failed to migrate them too. The fix replaces the
historical case-by-case logic with a single ancestor-walking model
that works uniformly for any DAG depth and shape.

Refer to the original PR for the full bug analysis, fix design, and
running 19-node tree example used throughout the new code comments.

Verification on 4.0-dev

  • pkg/frontend build + go vet clean.
  • pkg/frontend unit tests pass (including
    TestHashDiff_NoLCAWithStubHandles, TestDiffDataHelper_*,
    TestRunLCAProbeWithReaderFallback_*, TestHandleDelsOnLCA_*,
    TestLCAProbeJoinCastType).
  • gofmt -l clean on all five modified Go files.
  • New BVT cases (diff_12, diff_13, merge_8) included with their
    corresponding .result files.

Special notes for your reviewer:

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

gouhongshen and others added 5 commits May 13, 2026 16:38
`data branch diff` previously walked only the two endpoint relations
when computing change streams. Mutations performed on intermediate
ancestors (e.g. t1 in a chain t0->t1->t2) that were inherited into the
endpoint via cloning never reached the diff, causing rows to silently
disappear from `diff t2 against t0`.

Replace the old case-by-case branches (LCA == base / LCA == tar / LCA
elsewhere with three sub-conditions each) with a single ancestor walk
driven by the LCA->endpoint paths captured from the branch DAG. For
each ancestor A on path[LCA, endpoint] we collect A's own mutations
within (A.CTS, nextChildOnPath.CloneTS] (or (A.CTS, endpointSP] when
A is the endpoint). The LCA's prefix that the other side inherits
identically is pruned to skip redundant IO. The model generalizes to
any tree depth and any pair of nodes.

Add `DataBranchDAG.PathFromAncestor` and the corresponding chain
fields on `branchMetaInfo` to thread the path information from the
DAG resolver to the collect-range builder. The new
`buildSideCollectRange` helper materializes one side of the diff
uniformly; the historical case-2/3/4 logic is gone.

BVT additions exercise the new code path:
* test/distributed/cases/git4data/branch/diff/diff_12.sql:
  12 cases covering linear chains, mid-chain writes, snapshot mixes,
  no-PK chains, and post-fork ancestor mutations.
* test/distributed/cases/git4data/branch/diff/diff_13.sql:
  one giant tree case (19 nodes, depth 4, 10 leaves) with 10 diff
  pairs across non-adjacent levels.

Whole `git4data/branch/diff/` BVT now passes 1317/1317.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…esolver

Follow-up to the unified collect-range fix. No semantic change; BVT
still passes 1317/1317.

* Drop stale comments left over from the pre-unified design:
  - "1. has no lca / 2. ..." ASCII case list in
    decideLCABranchTSFromBranchDAG that described the old branch_t1_ts
    / branch_t2_ts model.
  - Lone "Boundary semantics:" header in decideCollectRange whose
    body was already absorbed into the new running example block.
  - "// has no lca" trailing comment in diffOnBase that contradicted
    the lca-handling block right above it.
* Refactor decideLCABranchTSFromBranchDAG so the LCA path information
  is computed in a single linear flow (FindLCA → PathFromAncestor →
  switch on lcaTableID) instead of a `var (...)` block + defer that
  re-wired branchInfo from outside. Adds a small `buildTSs` helper
  so the int64-to-types.TS lift no longer repeats four times. The
  function-level doc comment now states what each output field is
  for and who consumes it downstream.
* Drop an unused error-wrap path: the previous code re-wrapped
  `dag.GetCloneTS` not-found into an internal error even though the
  DAG always populates CloneTS for any node it knows about (FindLCA
  has already validated descent). The simpler `_, _ := dag.GetCloneTS`
  matches the intent.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Eliminate the last vestiges of the pre-unified design where
branchMetaInfo carried three pieces of information that all derive
from the LCA->endpoint paths:

  * lcaType (lcaEmpty / lcaLeft / lcaRight / lcaOther) — was set in
    five places but only ever read at one decision point ("is the
    side empty?"); the other reads forwarded it through diffDataHelper
    and buildHashmapForTable as a *dead parameter*. Replace the gate
    with branchMetaInfo.hasLCA() and remove the dead parameter
    entirely.
  * tarBranchTS / baseBranchTS — per-side LCA snapshot used by
    findDeleteAndUpdateBat and diffOnBase. Both values are exactly
    pathFromLCAToOther[1].cloneTS (or otherSP when the other endpoint
    is the LCA). Expose three small helpers on branchMetaInfo —
    tarLCASnapshot, baseLCASnapshot, lcaProbeSnapshot — and let the
    consumers compute on demand.

Other touch-ups in the same spirit:

  * decideLCABranchTSFromBranchDAG now also handles the
    same-table-id-no-DAG case directly with synthetic single-node
    paths, instead of leaking the special-case setup into
    decideCollectRange. The function is once again single-purpose.
  * The lcaEmpty / lcaLeft / lcaRight / lcaOther constants are gone.
  * data_branch_hashdiff_test.go updated for the new signatures.

Whole `git4data/branch/diff/` BVT still passes 1317/1317. Frontend
unit tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous refactor swapped the meaning of pathFromLCAToTar vs
pathFromLCAToBase in the LCA-snapshot helpers, which broke
no-PK / fake-PK merge cases (merge_6, merge_7) where tombstones on
the descendant side were resolved against the wrong moment of the
LCA. The right semantics:

  tarLCASnapshot returns the moment tar's view of the LCA was
  anchored — tar's first-child CloneTS when tar forked off the LCA,
  otherwise base's first-child CloneTS, otherwise baseSP. The
  baseLCASnapshot mirror works the same way.

This matches the original `tarBranchTS = CloneTS(tarFirstChild)`
behavior. After the fix the whole merge/ BVT (merge_1..7) goes back
to 100%, and the chain merge_8.sql (172 statements over 9 cases)
also passes 100%.

Includes:
* test/distributed/cases/git4data/branch/merge/merge_8.{sql,result}
  — chain merge BVT covering cascade up the chain, direct grandchild
  -> grandparent merge, conflict skip/accept across multiple chain
  depths, snapshot-scoped, no-PK chain, alternating-round merges,
  and order-of-merges regression.
* Regenerated diff_12.result / diff_13.result against the corrected
  helpers (no semantic change, only the LCA probe TS that drives
  tombstone resolution).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* SCA fail (gofmt): the ASCII tree art in three doc comments needed
  tab-indented continuation lines. Run gofmt -w; no semantic change.
* Must Fix (Qodo / 7-jury review): PathFromAncestor's ok=false
  return was discarded with `_`, which would leak nil slices into
  buildSideCollectRange and panic at `selfPath[len-1]`. Propagate
  it as an internal error so a corrupted DAG fails fast and loud
  instead of crashing the diff.
* Should Fix: add defensive length check before ctsList[0] in
  buildSideCollectRange — getTablesCreationCommitTS theoretically
  could return an empty slice on RPC corner cases.
* Should Fix: bound PathFromAncestor's walk by maxBranchDAGDepth
  (=1024) so a Parent-pointer cycle in the DAG cannot cause an
  infinite loop. Real chains never approach this; deep observed
  trees sit around 16 levels.
* Should Fix: extract tableStuff.resolvedSnapshots so the per-side
  snapshot derivation lives in one place. Replaces three
  duplicated copies in decideCollectRange, diffOnBase, and
  hashDiffIfHasLCA — the same triple-duplication that previously
  hid the LCA-snapshot mismatch bug fixed in dd0205e.

Whole `git4data/branch/diff/` BVT still 1317/1317. Whole
`git4data/branch/merge/` BVT still 466/466. Frontend unit tests
pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@matrix-meow matrix-meow added the size/XXL Denotes a PR that changes 2000+ lines label May 13, 2026
@mergify mergify Bot added kind/bug Something isn't working kind/test-ci kind/refactor Code refactor labels May 13, 2026
@gouhongshen gouhongshen changed the title fix(data-branch): unify diff/merge collect-range over arbitrary DAG depth (cherry-pick #24371 to 4.0-dev) to 4.0: fix(data-branch): unify diff/merge collect-range over arbitrary DAG depth (cherry-pick #24371 to 4.0-dev) May 13, 2026
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented May 14, 2026

Merge Queue Status

  • Entered queue2026-05-14 03:06 UTC · Rule: release-4.0
  • Checks skipped · PR is already up-to-date
  • Merged2026-05-14 03:06 UTC · at 20e00b9e60120a21eeb86d9de5379d401b80317b · squash

This pull request spent 19 seconds in the queue, including 2 seconds running CI.

Required conditions to merge

@mergify mergify Bot merged commit 1682cf5 into matrixorigin:4.0-dev May 14, 2026
57 of 64 checks passed
@mergify mergify Bot removed the queued label May 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Something isn't working kind/refactor Code refactor kind/test-ci size/XXL Denotes a PR that changes 2000+ lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants