Add explicit next-row-id monotonic increase check (#589) #1054

Open
hectar-glitches wants to merge 8 commits into apache:main from hectar-glitches:main

Conversation

@hectar-glitches

Contributor

Summary

Add explicit post-update validation to ensure next-row-id monotonically increases across snapshots. Previously, MetadataBuilder.validateAndUpdateRowLineage only checked that the snapshot's first-row-id wasn't behind the table's cursor, but didn't verify the cursor actually advances after applying AddedRows.

This fix rejects buggy producers that set AddedRows to 0 or negative values, which would silently leave the cursor unchanged or move it backwards.

Changes

Implementation

table/metadata.go: Added an explicit post-update check in validateAndUpdateRowLineage.
Rejects the snapshot if newNextRowID <= previousNextRowID.
The error message names both the previous and new values for clarity.
Catches both AddedRows = 0 (stasis) and negative AddedRows (backward movement).

Tests

Added three tests in table/metadata_builder_internal_test.go covering the happy path, rejection when AddedRows is 0, and rejection of negative AddedRows.

Fixes #589

@laskoviymishka laskoviymishka left a comment

Contributor

I think this still needs a bit more work.

The <= nextRowID guard rejects a case that the V3 spec explicitly allows. The spec requires first-row-id even when a commit does not assign any ID space, and Java’s TableMetadata adds addedRows unconditionally without a > 0 guard.

The concrete failure case is already in this repo: manifest.go:1518 only advances nextRowID for ManifestContentData manifests. That means any merge-on-read deletion via performMergeOnReadDeletion lands here with addedRows = 0 and will fail this guard.

The earlier checks already cover the real invariant we care about: added-rows >= 0 in ValidateRowLineage, plus first-row-id >= next-row-id, which prevents the cursor from moving backwards. That’s what the spec requires.

So I think this check should be < rather than <=, or possibly dropped entirely.

The helper change in rebuild_manifest_test.go — skipping the snapshot when nextRowID == 0 — is the strongest signal that something is off. It papers over the over-restriction instead of fixing the logic.

Also, the negative-rows test’s own comment says it is hitting the pre-existing ValidateRowLineage path, not the new branch, so it doesn’t really validate this new guard.

Comment thread table/metadata.go Outdated
Comment on lines +508 to +511
	if newNextRowID <= nextRowID {
		return fmt.Errorf("%w: next-row-id must advance from previous %d to new %d (added-rows %d)",
			ErrInvalidRowLineage, nextRowID, newNextRowID, *snapshot.AddedRows)
	}

Contributor

The v3 spec permits added-rows=0 explicitly — first-row-id is required "even if a commit does not assign any ID space" — and Java's TableMetadata applies addedRows unconditionally with no >0 guard. More concretely, manifest.go:1518 in this repo only advances nextRowID for ManifestContentData manifests, so any merge-on-read delete via performMergeOnReadDeletion will land here with addedRows=0 and fail. The existing checks (ValidateRowLineage rejecting added-rows<0 plus first-row-id≥next-row-id) already cover what the spec requires. Suggest dropping this block, or at minimum changing <= to < so the check only fires on genuine regression.

Comment thread table/metadata.go Outdated

	newNextRowID := nextRowID + *snapshot.AddedRows
	if newNextRowID <= nextRowID {
		return fmt.Errorf("%w: next-row-id must advance from previous %d to new %d (added-rows %d)",

Contributor

When AddedRows is 0 this prints from previous 50 to new 50 — same value twice, reads like a formatting bug. If the check stays in some form, consider: next-row-id did not advance: added-rows is %d (table next-row-id %d).
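
A quick way to compare the two wordings is to format both with the values from this comment. The sketch below uses the format strings quoted in this thread, without the %w sentinel prefix; the function names currentMessage and suggestedMessage are hypothetical.

```go
package main

import "fmt"

// currentMessage formats the message as written in the PR hunk
// (sentinel prefix omitted for brevity).
func currentMessage(nextRowID, addedRows int64) string {
	newNextRowID := nextRowID + addedRows
	return fmt.Sprintf("next-row-id must advance from previous %d to new %d (added-rows %d)",
		nextRowID, newNextRowID, addedRows)
}

// suggestedMessage formats the alternative wording proposed in this comment.
func suggestedMessage(nextRowID, addedRows int64) string {
	return fmt.Sprintf("next-row-id did not advance: added-rows is %d (table next-row-id %d)",
		addedRows, nextRowID)
}

func main() {
	fmt.Println(currentMessage(50, 0))
	// prints: next-row-id must advance from previous 50 to new 50 (added-rows 0)
	fmt.Println(suggestedMessage(50, 0))
	// prints: next-row-id did not advance: added-rows is 0 (table next-row-id 50)
}
```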

Comment thread table/metadata.go
	newNextRowID := nextRowID + *snapshot.AddedRows
	if newNextRowID <= nextRowID {
		return fmt.Errorf("%w: next-row-id must advance from previous %d to new %d (added-rows %d)",
			ErrInvalidRowLineage, nextRowID, newNextRowID, *snapshot.AddedRows)

Contributor

ErrInvalidRowLineage signals a lineage-chain break — cursor backwards, missing first-row-id. AddedRows=0 doesn't break the chain, it just doesn't extend it. If the check is kept, a different sentinel matches the failure mode better.

Comment thread table/metadata_builder_internal_test.go Outdated
Comment on lines +1672 to +1693
func TestAddSnapshotV3NextRowIDMustAdvance(t *testing.T) {
	// Test that next-row-id must advance after applying a snapshot
	builder := builderWithoutChanges(3)
	schemaID := 0
	firstRowID := int64(0)
	addedRows := int64(50)

	snapshot := Snapshot{
		SnapshotID:       1,
		ParentSnapshotID: nil,
		SequenceNumber:   0,
		TimestampMs:      builder.base.LastUpdatedMillis() + 1,
		ManifestList:     "/snap-1.avro",
		Summary:          &Summary{Operation: OpAppend},
		SchemaID:         &schemaID,
		FirstRowID:       &firstRowID,
		AddedRows:        &addedRows,
	}

	require.NoError(t, builder.AddSnapshot(&snapshot))
	require.Equal(t, int64(50), *builder.nextRowID)
}

Contributor

The name implies this verifies the must-advance invariant, but the body only adds 50 rows and asserts NoError + the resulting nextRowID — that behavior was already in place before this PR. Either rename to TestAddSnapshotV3AcceptsPositiveAddedRows (mirroring TestAddSnapshotV3AcceptsFirstRowIDEqualToNextRowID just above) or fold it with the zero-rows case into a single table-driven test once the underlying check is fixed.
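
As a rough illustration of the table-driven shape, the sketch below folds the positive and zero-rows cases into one table. It is deliberately independent of the repo: applyAddedRows is a hypothetical stand-in for the builder, it assumes the corrected behavior accepts addedRows == 0, and it collects failures in a slice rather than using t.Run subtests so that it runs as a plain program.

```go
package main

import "fmt"

// addedRowsCase is one row of the table.
type addedRowsCase struct {
	name      string
	nextRowID int64
	addedRows int64
	wantNext  int64
	wantErr   bool
}

// applyAddedRows is a hypothetical stand-in for the cursor update. It assumes
// the corrected rule: negative values are rejected, zero is accepted.
func applyAddedRows(nextRowID, addedRows int64) (int64, error) {
	if addedRows < 0 {
		return nextRowID, fmt.Errorf("added-rows cannot be negative: %d", addedRows)
	}
	return nextRowID + addedRows, nil
}

// defaultCases lists the scenarios discussed in this review.
func defaultCases() []addedRowsCase {
	return []addedRowsCase{
		{name: "positive added-rows", nextRowID: 0, addedRows: 50, wantNext: 50},
		{name: "zero added-rows", nextRowID: 50, addedRows: 0, wantNext: 50},
		{name: "negative added-rows", nextRowID: 50, addedRows: -10, wantErr: true},
	}
}

// runCases returns one message per case whose outcome differs from the table.
func runCases(cases []addedRowsCase) []string {
	var failures []string
	for _, tc := range cases {
		got, err := applyAddedRows(tc.nextRowID, tc.addedRows)
		if (err != nil) != tc.wantErr {
			failures = append(failures, fmt.Sprintf("%s: unexpected error state: %v", tc.name, err))
			continue
		}
		if err == nil && got != tc.wantNext {
			failures = append(failures, fmt.Sprintf("%s: got %d, want %d", tc.name, got, tc.wantNext))
		}
	}
	return failures
}

func main() {
	fmt.Println("failures:", len(runCases(defaultCases())))
}
```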

Comment thread table/metadata_builder_internal_test.go Outdated
Comment on lines +1720 to +1742
func TestAddSnapshotV3RejectsNegativeAddedRowsAtUpdate(t *testing.T) {
	// Test that negative AddedRows would not advance next-row-id and is rejected
	builder := builderWithoutChanges(3)
	schemaID := 0
	firstRowID := int64(50)
	negativeAddedRows := int64(-10)

	snapshot := Snapshot{
		SnapshotID:       1,
		ParentSnapshotID: nil,
		SequenceNumber:   0,
		TimestampMs:      builder.base.LastUpdatedMillis() + 1,
		ManifestList:     "/snap-1.avro",
		Summary:          &Summary{Operation: OpAppend},
		SchemaID:         &schemaID,
		FirstRowID:       &firstRowID,
		AddedRows:        &negativeAddedRows,
	}

	err := builder.AddSnapshot(&snapshot)
	// First rejection should be from ValidateRowLineage (added-rows cannot be negative)
	require.ErrorIs(t, err, ErrInvalidRowLineage)
	require.ErrorContains(t, err, "added-rows cannot be negative")

Contributor

The function name says this covers the new monotonic-advance branch, but the comment on line 1740 admits the rejection comes from ValidateRowLineage. The new branch never runs for negative AddedRows — ValidateRowLineage fires first. So this test gives no coverage for the change being made. Drop it or rename to reflect that it's re-verifying the upstream gate.

Comment thread table/rebuild_manifest_test.go Outdated
Comment on lines 67 to 82
	// Only add a snapshot if nextRowID > 0. A brand-new table has NextRowID = 0
	// without any snapshots.
	if nextRowID > 0 {
		firstRowID := int64(0)
		addedRows := nextRowID
		snap := Snapshot{
			SnapshotID:     1000,
			SequenceNumber: 1,
			TimestampMs:    txn.meta.base.LastUpdatedMillis() + 1,
			Summary:        &Summary{Operation: OpAppend},
			FirstRowID:     &firstRowID,
			AddedRows:      &addedRows,
		}
		require.NoError(t, txn.meta.AddSnapshot(&snap))
		require.NoError(t, txn.meta.SetSnapshotRef(MainBranch, 1000, BranchRef))
	}

Contributor

The if nextRowID > 0 guard is the tell — the helper had to be modified to dodge the new check because adding a snapshot with addedRows=0 now fails. That's bending the test around an over-restriction rather than fixing the validation. Once the metadata.go check is corrected, this helper should revert to its previous shape.

@tanmayrauth tanmayrauth left a comment

Contributor

Thanks for the PR, LGTM!

Development

Successfully merging this pull request may close these issues.

feat: v3 support tracking issue

3 participants