Add explicit next-row-id monotonic increase check (#589)#1054
Add explicit next-row-id monotonic increase check (#589)#1054hectar-glitches wants to merge 8 commits into
Conversation
…e#830 Signed-off-by: hectar-glitches <hectar@uni.minerva.edu>
Signed-off-by: hectar-glitches <hectar@uni.minerva.edu>
laskoviymishka
left a comment
There was a problem hiding this comment.
I think this still needs a bit more work.
The <= nextRowID guard rejects a case that the V3 spec explicitly allows. The spec requires first-row-id even when a commit does not assign any ID space, and Java’s TableMetadata adds addedRows unconditionally without a > 0 guard.
The concrete failure case is already in this repo: manifest.go:1518 only advances nextRowID for ManifestContentData manifests. That means any merge-on-read deletion via performMergeOnReadDeletion lands here with addedRows = 0 and will fail this guard.
The earlier checks already cover the real invariant we care about: added-rows >= 0 in ValidateRowLineage, plus first-row-id >= next-row-id, which prevents the cursor from moving backwards. That’s what the spec requires.
So I think this check should be < rather than <=, or possibly dropped entirely.
The helper change in rebuild_manifest_test.go — skipping the snapshot when nextRowID == 0 — is the strongest signal that something is off. It papers over the over-restriction instead of fixing the logic.
Also, the negative-rows test’s own comment says it is hitting the pre-existing ValidateRowLineage path, not the new branch, so it doesn’t really validate this new guard.
| if newNextRowID <= nextRowID { | ||
| return fmt.Errorf("%w: next-row-id must advance from previous %d to new %d (added-rows %d)", | ||
| ErrInvalidRowLineage, nextRowID, newNextRowID, *snapshot.AddedRows) | ||
| } |
There was a problem hiding this comment.
The v3 spec permits added-rows=0 explicitly — first-row-id is required "even if a commit does not assign any ID space" — and Java's TableMetadata applies addedRows unconditionally with no >0 guard. More concretely, manifest.go:1518 in this repo only advances nextRowID for ManifestContentData manifests, so any merge-on-read delete via performMergeOnReadDeletion will land here with addedRows=0 and fail. The existing checks (ValidateRowLineage rejecting added-rows<0 plus first-row-id≥next-row-id) already cover what the spec requires. Suggest dropping this block, or at minimum changing <= to < so the check only fires on genuine regression.
|
|
||
| newNextRowID := nextRowID + *snapshot.AddedRows | ||
| if newNextRowID <= nextRowID { | ||
| return fmt.Errorf("%w: next-row-id must advance from previous %d to new %d (added-rows %d)", |
There was a problem hiding this comment.
When AddedRows is 0 this prints from previous 50 to new 50 — same value twice, reads like a formatting bug. If the check stays in some form, consider: next-row-id did not advance: added-rows is %d (table next-row-id %d).
| newNextRowID := nextRowID + *snapshot.AddedRows | ||
| if newNextRowID <= nextRowID { | ||
| return fmt.Errorf("%w: next-row-id must advance from previous %d to new %d (added-rows %d)", | ||
| ErrInvalidRowLineage, nextRowID, newNextRowID, *snapshot.AddedRows) |
There was a problem hiding this comment.
ErrInvalidRowLineage signals a lineage-chain break — cursor backwards, missing first-row-id. AddedRows=0 doesn't break the chain, it just doesn't extend it. If the check is kept, a different sentinel matches the failure mode better.
| func TestAddSnapshotV3NextRowIDMustAdvance(t *testing.T) { | ||
| // Test that next-row-id must advance after applying a snapshot | ||
| builder := builderWithoutChanges(3) | ||
| schemaID := 0 | ||
| firstRowID := int64(0) | ||
| addedRows := int64(50) | ||
|
|
||
| snapshot := Snapshot{ | ||
| SnapshotID: 1, | ||
| ParentSnapshotID: nil, | ||
| SequenceNumber: 0, | ||
| TimestampMs: builder.base.LastUpdatedMillis() + 1, | ||
| ManifestList: "/snap-1.avro", | ||
| Summary: &Summary{Operation: OpAppend}, | ||
| SchemaID: &schemaID, | ||
| FirstRowID: &firstRowID, | ||
| AddedRows: &addedRows, | ||
| } | ||
|
|
||
| require.NoError(t, builder.AddSnapshot(&snapshot)) | ||
| require.Equal(t, int64(50), *builder.nextRowID) | ||
| } |
There was a problem hiding this comment.
The name implies this verifies the must-advance invariant, but the body only adds 50 rows and asserts NoError + the resulting nextRowID — that behavior was already in place before this PR. Either rename to TestAddSnapshotV3AcceptsPositiveAddedRows (mirroring TestAddSnapshotV3AcceptsFirstRowIDEqualToNextRowID just above) or fold it with the zero-rows case into a single table-driven test once the underlying check is fixed.
| func TestAddSnapshotV3RejectsNegativeAddedRowsAtUpdate(t *testing.T) { | ||
| // Test that negative AddedRows would not advance next-row-id and is rejected | ||
| builder := builderWithoutChanges(3) | ||
| schemaID := 0 | ||
| firstRowID := int64(50) | ||
| negativeAddedRows := int64(-10) | ||
|
|
||
| snapshot := Snapshot{ | ||
| SnapshotID: 1, | ||
| ParentSnapshotID: nil, | ||
| SequenceNumber: 0, | ||
| TimestampMs: builder.base.LastUpdatedMillis() + 1, | ||
| ManifestList: "/snap-1.avro", | ||
| Summary: &Summary{Operation: OpAppend}, | ||
| SchemaID: &schemaID, | ||
| FirstRowID: &firstRowID, | ||
| AddedRows: &negativeAddedRows, | ||
| } | ||
|
|
||
| err := builder.AddSnapshot(&snapshot) | ||
| // First rejection should be from ValidateRowLineage (added-rows cannot be negative) | ||
| require.ErrorIs(t, err, ErrInvalidRowLineage) | ||
| require.ErrorContains(t, err, "added-rows cannot be negative") |
There was a problem hiding this comment.
The function name says this covers the new monotonic-advance branch, but the comment on line 1740 admits the rejection comes from ValidateRowLineage. The new branch never runs for negative AddedRows — ValidateRowLineage fires first. So this test gives no coverage for the change being made. Drop it or rename to reflect that it's re-verifying the upstream gate.
| // Only add a snapshot if nextRowID > 0. A brand-new table has NextRowID = 0 | ||
| // without any snapshots. | ||
| if nextRowID > 0 { | ||
| firstRowID := int64(0) | ||
| addedRows := nextRowID | ||
| snap := Snapshot{ | ||
| SnapshotID: 1000, | ||
| SequenceNumber: 1, | ||
| TimestampMs: txn.meta.base.LastUpdatedMillis() + 1, | ||
| Summary: &Summary{Operation: OpAppend}, | ||
| FirstRowID: &firstRowID, | ||
| AddedRows: &addedRows, | ||
| } | ||
| require.NoError(t, txn.meta.AddSnapshot(&snap)) | ||
| require.NoError(t, txn.meta.SetSnapshotRef(MainBranch, 1000, BranchRef)) | ||
| } |
There was a problem hiding this comment.
The if nextRowID > 0 guard is the tell — the helper had to be modified to dodge the new check because adding a snapshot with addedRows=0 now fails. That's bending the test around an over-restriction rather than fixing the validation. Once the metadata.go check is corrected, this helper should revert to its previous shape.
tanmayrauth
left a comment
There was a problem hiding this comment.
Thanks for the PR, LGTM!
Summary
Add explicit post-update validation to ensure
next-row-idmonotonically increases across snapshots. Previously,MetadataBuilder.validateAndUpdateRowLineageonly checked that the snapshot's first-row-id wasn't behind the table's cursor, but didn't verify the cursor actually advances after applying AddedRows.This fix rejects buggy producers that set
AddedRowsto 0 or negative values, which would silently leave the cursor unchanged or move it backwards.Changes
Implementation
table/metadata.go: Added explicit post-update check in
validateAndUpdateRowLineage: Rejects ifnewNextRowID <= previousNextRowIDError message names both the previous and new values for clarity: Catches both
AddedRows = 0(stasis) and negativeAddedRows(backward movement)Tests
Added three comprehensive tests in table/metadata_builder_internal_test.go to check happy cases, rejected added rows if number of rows is 0, and rejecting negative added rows.
Fixes #589