Skip to content

Commit 8c7c8b8

Browse files
cbi42meta-codesync[bot]
authored andcommitted
Fix a bug where compaction with range deletion can persist kTypeMaxValid in file metadata (facebook#14122)
Summary: Range deletion start keys are considered during compaction for cutting output files. Due to some ordering requirement (see comment above InsertNextValidRangeTombstoneAtLevel()) between truncated range deletion start key and a file's point keys, there was logic in https://github.com/facebook/rocksdb/blob/f6c9c3bf1cf05096e8ff8c03ded60c1e199edbb7/db/range_del_aggregator.cc#L39 that changes the value type to be kTypeMaxValid. However, kTypeMaxValid is not supposed to be persisted per https://github.com/facebook/rocksdb/blob/f6c9c3bf1cf05096e8ff8c03ded60c1e199edbb7/db/dbformat.h#L75-L76. This can cause forward compatibility issues reported in facebook#14101. This PR fixes this issue by removing the logic that sets kTypeMaxValid and always skip truncated range deletion start key in CompactionMergingIterator. For existing SST files, we want to avoid using this kTypeMaxValid, so this PR also introduces a new placeholder value type. This allows us to re-strengthen the relevant value type checks (IsExtendedValueType()) that was loosen for kTypeMaxValid. Pull Request resolved: facebook#14122 Test Plan: - a unit test that persists kTypeMaxValid before this fix - crash test with frequent range deletion: `python3 ./tools/db_crashtest.py blackbox --delrangepercent=11 --readpercent=35` - Generate SST files with 0x1A as value type (kTypeMaxValid before this change) in file metadata. Run ldb with the strengthened check in IsExtendedValueType() to dump the MANIFEST. It failed to parse MANIFEST as expected before this PR and succeeds after this PR. ``` Error in processing file /tmp/rocksdbtest-543376/db_range_del_test_2549357_6547198162080866792/MANIFEST-000005 Corruption: VersionEdit: new-file4 entry The file /tmp/rocksdbtest-543376/db_range_del_test_2549357_6547198162080866792/MANIFEST-000005 may be corrupted. ``` Reviewed By: pdillinger Differential Revision: D87016541 Pulled By: cbi42 fbshipit-source-id: 9957a095db2cd9947463b403f352bd9a1fd70a76
1 parent 2f583ae commit 8c7c8b8

9 files changed

Lines changed: 185 additions & 39 deletions

File tree

db/db_range_del_test.cc

Lines changed: 128 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3825,6 +3825,134 @@ TEST_F(DBRangeDelTest, RowCache) {
38253825
// and should not turn db into read-only mdoe.
38263826
ASSERT_OK(Put(Key(5), "foo"));
38273827
}
3828+
3829+
TEST_F(DBRangeDelTest, FileCutWithTruncatedRangeDelKey) {
3830+
// Test for a bug that used to generate files with meta.smallest
3831+
// containing kMaxValid.
3832+
//
3833+
// Setup:
3834+
// - Write Key(2), Key(3) and DeleteRange(Key(1), Key(4))
3835+
// - Flush to L0
3836+
// - Use SingleKeySstPartitioner to force each user key into its own file
3837+
// - Compact files from L0 to L1 will generate files
3838+
// File[0]:
3839+
// smallest=[user_key=key000001, seq=4, type=15],
3840+
// largest= [user_key=key000002, seq=72057594037927935, type=15]
3841+
// File[1]:
3842+
// smallest=[user_key=key000002, seq=2, type=1],
3843+
// largest= [user_key=key000003, seq=72057594037927935, type=15]
3844+
// File[2]:
3845+
// smallest=[user_key=key000003, seq=3, type=1],
3846+
// largest= [user_key=key000004, seq=72057594037927935, type=15]
3847+
// With range deletions truncated to each files key range.
3848+
//
3849+
// - Compact these files again into L2. RocksDB usede to set truncated
3850+
// range deletion start key to have value type kMaxValid. The range deletion
3851+
// start key is used in compaction file cutting decision.
3852+
// - Verify the file boundary keys after compaction have valid boundary keys
3853+
//
3854+
// Before the fix:
3855+
// File[0]:
3856+
// smallest=[user_key=key000001, seq=4, type=15],
3857+
// largest= [user_key=key000002, seq=72057594037927935, type=15]
3858+
// File[1]:
3859+
// smallest=[user_key=key000002, seq=2, type=26],
3860+
// largest= [user_key=key000003, seq=72057594037927935, type=15]
3861+
// File[2]:
3862+
// smallest=[user_key=key000003, seq=3, type=26],
3863+
// largest= [user_key=key000004, seq=72057594037927935, type=15]
3864+
//
3865+
// After the fix:
3866+
// File[0]:
3867+
// smallest=[user_key=key000001, seq=4, type=15],
3868+
// largest= [user_key=key000002, seq=72057594037927935, type=15]
3869+
// File[1]:
3870+
// smallest=[user_key=key000002, seq=2, type=1],
3871+
// largest= [user_key=key000003, seq=72057594037927935, type=15]
3872+
// File[2]:
3873+
// smallest=[user_key=key000003, seq=3, type=1],
3874+
// largest= [user_key=key000004, seq=72057594037927935, type=15]
3875+
3876+
Options options = CurrentOptions();
3877+
options.disable_auto_compactions = true;
3878+
3879+
// Use partitioner that cuts before every new user key.
3880+
// Key(x) generates keys of length 9.
3881+
auto factory = std::shared_ptr<SstPartitionerFactory>(
3882+
NewSstPartitionerFixedPrefixFactory(10));
3883+
options.sst_partitioner_factory = factory;
3884+
3885+
DestroyAndReopen(options);
3886+
3887+
Random rnd(301);
3888+
3889+
// Create a file in a lower level so the compactions below are not
3890+
// bottommost compactions. Range deletion start keys are not considered
3891+
// in bottommost compaction.
3892+
ASSERT_OK(Put(Key(3), rnd.RandomBinaryString(100)));
3893+
ASSERT_OK(Flush());
3894+
MoveFilesToLevel(6);
3895+
ASSERT_EQ(1, NumTableFilesAtLevel(6));
3896+
3897+
ASSERT_OK(Put(Key(2), rnd.RandomString(100)));
3898+
// Snapshots keep point keys alive.
3899+
ManagedSnapshot snapshot1(db_);
3900+
ASSERT_OK(Put(Key(3), rnd.RandomString(100)));
3901+
ManagedSnapshot snapshot2(db_);
3902+
ASSERT_OK(db_->DeleteRange(WriteOptions(), db_->DefaultColumnFamily(), Key(1),
3903+
Key(4)));
3904+
ASSERT_OK(Flush());
3905+
ASSERT_EQ(1, NumTableFilesAtLevel(0));
3906+
3907+
ColumnFamilyMetaData cf_meta_l0;
3908+
db_->GetColumnFamilyMetaData(db_->DefaultColumnFamily(), &cf_meta_l0);
3909+
ASSERT_EQ(1, cf_meta_l0.levels[0].files.size());
3910+
std::vector<std::string> l0_filenames;
3911+
for (const auto& sst_file : cf_meta_l0.levels[0].files) {
3912+
l0_filenames.push_back(sst_file.name);
3913+
}
3914+
3915+
// Compact L0 files to L1
3916+
CompactionOptions compact_options_l0;
3917+
ASSERT_OK(db_->CompactFiles(compact_options_l0, l0_filenames, 1));
3918+
ASSERT_EQ(3, NumTableFilesAtLevel(1));
3919+
3920+
// Check L1 file metadata
3921+
std::vector<std::vector<FileMetaData>> files_l1;
3922+
dbfull()->TEST_GetFilesMetaData(db_->DefaultColumnFamily(), &files_l1);
3923+
3924+
for (const auto& file : files_l1[1]) {
3925+
ASSERT_LT(ExtractValueType(file.smallest.Encode()), kTypeMaxValid);
3926+
ASSERT_LT(ExtractValueType(file.largest.Encode()), kTypeMaxValid);
3927+
}
3928+
3929+
// Get file names from level 1
3930+
ColumnFamilyMetaData cf_meta;
3931+
db_->GetColumnFamilyMetaData(db_->DefaultColumnFamily(), &cf_meta);
3932+
std::vector<std::string> input_filenames;
3933+
for (const auto& sst_file : cf_meta.levels[1].files) {
3934+
input_filenames.push_back(sst_file.name);
3935+
}
3936+
3937+
// Compact files from L1 to L2
3938+
CompactionOptions compact_options;
3939+
ASSERT_OK(db_->CompactFiles(compact_options, input_filenames, 2));
3940+
3941+
// Check L2 file metadata
3942+
std::vector<std::vector<FileMetaData>> files;
3943+
dbfull()->TEST_GetFilesMetaData(db_->DefaultColumnFamily(), &files);
3944+
3945+
for (const auto& file : files[2]) {
3946+
ASSERT_LT(ExtractValueType(file.smallest.Encode()), kTypeMaxValid);
3947+
ASSERT_LT(ExtractValueType(file.largest.Encode()), kTypeMaxValid);
3948+
}
3949+
3950+
// // Verify iteration works correctly
3951+
std::unique_ptr<Iterator> iter{db_->NewIterator(ReadOptions())};
3952+
iter->SeekToFirst();
3953+
ASSERT_OK(iter->status());
3954+
ASSERT_FALSE(iter->Valid());
3955+
}
38283956
} // namespace ROCKSDB_NAMESPACE
38293957

38303958
int main(int argc, char** argv) {

db/dbformat.h

Lines changed: 11 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -72,6 +72,12 @@ enum ValueType : unsigned char {
7272
kTypeColumnFamilyWideColumnEntity = 0x17, // WAL only
7373
kTypeValuePreferredSeqno = 0x18, // Value with a unix write time
7474
kTypeColumnFamilyValuePreferredSeqno = 0x19, // WAL only
75+
// Placeholder value type for legacy SST files with incorrectly persisted
76+
// file boundaries. Prior to the fix, TruncatedRangeDelIterator assigned
77+
// kTypeMaxValid to truncated range deletion keys, which was then
78+
// incorrectly persisted to SST file metadata. This placeholder value allows
79+
// reading such legacy files for without using kTypeMaxValid.
80+
kTypeTruncatedRangeDeletionSentinel = 0x1A,
7581
kTypeMaxValid, // Should be after the last valid type, only used for
7682
// validation
7783
kMaxValue = 0x7F // Not used for storing records.
@@ -118,10 +124,11 @@ inline bool IsValueType(ValueType t) {
118124

119125
// Checks whether a type is from user operation
120126
// kTypeRangeDeletion is in meta block so this API is separated from above
121-
// kTypeMaxValid can be from keys generated by
122-
// TruncatedRangeDelIterator::start_key()
127+
// kTypeTruncatedRangeDeletionSentinel is for legacy files with incorrectly
128+
// persisted file boundaries.
123129
inline bool IsExtendedValueType(ValueType t) {
124-
return IsValueType(t) || t == kTypeRangeDeletion || t == kTypeMaxValid;
130+
return IsValueType(t) || t == kTypeRangeDeletion ||
131+
t == kTypeTruncatedRangeDeletionSentinel;
125132
}
126133

127134
// We leave eight bits empty at the bottom so a type and sequence#
@@ -180,8 +187,7 @@ inline size_t InternalKeyEncodingLength(const ParsedInternalKey& key) {
180187
// Pack a sequence number and a ValueType into a uint64_t
181188
inline uint64_t PackSequenceAndType(uint64_t seq, ValueType t) {
182189
assert(seq <= kMaxSequenceNumber);
183-
// kTypeMaxValid is used in TruncatedRangeDelIterator, see its constructor.
184-
assert(IsExtendedValueType(t) || t == kTypeMaxValid);
190+
assert(IsExtendedValueType(t));
185191
return (seq << 8) | t;
186192
}
187193

db/range_del_aggregator.cc

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -36,7 +36,6 @@ TruncatedRangeDelIterator::TruncatedRangeDelIterator(
3636
Status pik_status = ParseInternalKey(smallest->Encode(), &parsed_smallest,
3737
false /* log_err_key */); // TODO
3838
pik_status.PermitUncheckedError();
39-
parsed_smallest.type = kTypeMaxValid;
4039
assert(pik_status.ok());
4140
smallest_ = &parsed_smallest;
4241
}
@@ -71,9 +70,6 @@ TruncatedRangeDelIterator::TruncatedRangeDelIterator(
7170
// the truncated end key can cover the largest key in this sstable, reduce
7271
// its sequence number by 1.
7372
parsed_largest.sequence -= 1;
74-
// This line is not needed for correctness, but it ensures that the
75-
// truncated end key is not covering keys from the next SST file.
76-
parsed_largest.type = kTypeMaxValid;
7773
}
7874
largest_ = &parsed_largest;
7975
}

db/range_del_aggregator_test.cc

Lines changed: 19 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -89,7 +89,9 @@ void VerifyIterator(
8989
for (size_t i = 0; i < expected_range_dels.size(); i++, iter->Next()) {
9090
ASSERT_TRUE(iter->Valid());
9191
EXPECT_EQ(0, icmp.Compare(iter->start_key(), expected_range_dels[i].start));
92-
EXPECT_EQ(0, icmp.Compare(iter->end_key(), expected_range_dels[i].end));
92+
EXPECT_EQ(0, icmp.Compare(iter->end_key(), expected_range_dels[i].end))
93+
<< iter->end_key().DebugString(false, false) << " "
94+
<< expected_range_dels[i].end.DebugString(false, false);
9395
EXPECT_EQ(expected_range_dels[i].seq, iter->seq());
9496
}
9597
EXPECT_FALSE(iter->Valid());
@@ -305,28 +307,28 @@ TEST_F(RangeDelAggregatorTest, TruncatedIterPartiallyCutTombstones) {
305307

306308
VerifyIterator(
307309
&iter, bytewise_icmp,
308-
{{InternalValue("d", 7, kTypeMaxValid), UncutEndpoint("e"), 10},
310+
{{InternalValue("d", 7, kTypeValue), UncutEndpoint("e"), 10},
309311
{InternalValue("e", 8, kTypeRangeDeletion), UncutEndpoint("g"), 8},
310312
{InternalValue("j", 4, kTypeRangeDeletion),
311-
InternalValue("m", 8, kTypeMaxValid), 4}});
313+
InternalValue("m", 8, kTypeValue), 4}});
312314

313315
VerifySeek(
314316
&iter, bytewise_icmp,
315-
{{"d", InternalValue("d", 7, kTypeMaxValid), UncutEndpoint("e"), 10},
317+
{{"d", InternalValue("d", 7, kTypeValue), UncutEndpoint("e"), 10},
316318
{"e", InternalValue("e", 8, kTypeRangeDeletion), UncutEndpoint("g"), 8},
317319
{"ia", InternalValue("j", 4, kTypeRangeDeletion),
318-
InternalValue("m", 8, kTypeMaxValid), 4, false /* invalid */},
320+
InternalValue("m", 8, kTypeValue), 4, false /* invalid */},
319321
{"n", InternalValue("", 0, kTypeRangeDeletion), UncutEndpoint(""), 0,
320322
true /* invalid */},
321-
{"", InternalValue("d", 7, kTypeMaxValid), UncutEndpoint("e"), 10}});
323+
{"", InternalValue("d", 7, kTypeValue), UncutEndpoint("e"), 10}});
322324

323325
VerifySeekForPrev(
324326
&iter, bytewise_icmp,
325-
{{"d", InternalValue("d", 7, kTypeMaxValid), UncutEndpoint("e"), 10},
327+
{{"d", InternalValue("d", 7, kTypeValue), UncutEndpoint("e"), 10},
326328
{"e", InternalValue("e", 8, kTypeRangeDeletion), UncutEndpoint("g"), 8},
327329
{"ia", InternalValue("e", 8, kTypeRangeDeletion), UncutEndpoint("g"), 8},
328330
{"n", InternalValue("j", 4, kTypeRangeDeletion),
329-
InternalValue("m", 8, kTypeMaxValid), 4, false /* invalid */},
331+
InternalValue("m", 8, kTypeValue), 4, false /* invalid */},
330332
{"", InternalValue("", 0, kTypeRangeDeletion), UncutEndpoint(""), 0,
331333
true /* invalid */}});
332334
}
@@ -345,23 +347,21 @@ TEST_F(RangeDelAggregatorTest, TruncatedIterFullyCutTombstones) {
345347
TruncatedRangeDelIterator iter(std::move(input_iter), &bytewise_icmp,
346348
&smallest, &largest);
347349

348-
VerifyIterator(
349-
&iter, bytewise_icmp,
350-
{{InternalValue("f", 7, kTypeMaxValid), UncutEndpoint("g"), 8}});
350+
VerifyIterator(&iter, bytewise_icmp,
351+
{{InternalValue("f", 7, kTypeValue), UncutEndpoint("g"), 8}});
351352

352-
VerifySeek(
353-
&iter, bytewise_icmp,
354-
{{"d", InternalValue("f", 7, kTypeMaxValid), UncutEndpoint("g"), 8},
355-
{"f", InternalValue("f", 7, kTypeMaxValid), UncutEndpoint("g"), 8},
356-
{"j", InternalValue("", 0, kTypeRangeDeletion), UncutEndpoint(""), 0,
357-
true /* invalid */}});
353+
VerifySeek(&iter, bytewise_icmp,
354+
{{"d", InternalValue("f", 7, kTypeValue), UncutEndpoint("g"), 8},
355+
{"f", InternalValue("f", 7, kTypeValue), UncutEndpoint("g"), 8},
356+
{"j", InternalValue("", 0, kTypeRangeDeletion), UncutEndpoint(""),
357+
0, true /* invalid */}});
358358

359359
VerifySeekForPrev(
360360
&iter, bytewise_icmp,
361361
{{"d", InternalValue("", 0, kTypeRangeDeletion), UncutEndpoint(""), 0,
362362
true /* invalid */},
363-
{"f", InternalValue("f", 7, kTypeMaxValid), UncutEndpoint("g"), 8},
364-
{"j", InternalValue("f", 7, kTypeMaxValid), UncutEndpoint("g"), 8}});
363+
{"f", InternalValue("f", 7, kTypeValue), UncutEndpoint("g"), 8},
364+
{"j", InternalValue("f", 7, kTypeValue), UncutEndpoint("g"), 8}});
365365
}
366366

367367
TEST_F(RangeDelAggregatorTest, SingleIterInAggregator) {

db/version_edit.cc

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,7 @@ uint64_t PackFileNumberAndPathId(uint64_t number, uint64_t path_id) {
3030
Status FileMetaData::UpdateBoundaries(const Slice& key, const Slice& value,
3131
SequenceNumber seqno,
3232
ValueType value_type) {
33+
assert(value_type < kTypeMaxValid);
3334
if (value_type == kTypeBlobIndex) {
3435
BlobIndex blob_index;
3536
const Status s = blob_index.DecodeFrom(value);

db/version_edit.h

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -317,6 +317,9 @@ struct FileMetaData {
317317
void UpdateBoundariesForRange(const InternalKey& start,
318318
const InternalKey& end, SequenceNumber seqno,
319319
const InternalKeyComparator& icmp) {
320+
assert(ExtractValueType(start.Encode()) < kTypeMaxValid);
321+
assert(ExtractValueType(end.Encode()) < kTypeMaxValid);
322+
320323
if (smallest.size() == 0 || icmp.Compare(start, smallest) < 0) {
321324
smallest = start;
322325
}

table/compaction_merging_iterator.cc

Lines changed: 20 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -191,13 +191,6 @@ class CompactionMergingIterator : public InternalIterator {
191191

192192
bool operator()(HeapItem* a, HeapItem* b) const {
193193
int r = comparator_->Compare(a->key(), b->key());
194-
// For each file, we assume all range tombstone start keys come before
195-
// its file boundary sentinel key (file's meta.largest key).
196-
// In the case when meta.smallest = meta.largest and range tombstone start
197-
// key is truncated at meta.smallest, the start key will have op_type =
198-
// kMaxValid to make it smaller (see TruncatedRangeDelIterator
199-
// constructor). The following assertion validates this assumption.
200-
assert(a->type == b->type || r != 0);
201194
return r > 0;
202195
}
203196

@@ -242,8 +235,24 @@ class CompactionMergingIterator : public InternalIterator {
242235
return !minHeap_.empty() ? minHeap_.top() : nullptr;
243236
}
244237

245-
void InsertRangeTombstoneAtLevel(size_t level) {
238+
// For each file under a LevelIterator, the lifetime of range tombstone
239+
// iterator is tied to the point key iterator. So we want scan through
240+
// all range tombstone start keys before the file boundary sentinel key
241+
// (file's meta.largest). When meta.smallest == meta.largest, the truncated
242+
// range del start key may be ordered after meta.largest.
243+
// Here we skip the first range deletion start key if it's truncated.
244+
// This range deletion start key is redundant for compaction file cutting
245+
// decision anyway, since the same point key will be considered for file
246+
// cutting too.
247+
void InsertNextValidRangeTombstoneAtLevel(size_t level) {
246248
if (range_tombstone_iters_[level]->Valid()) {
249+
if (range_tombstone_iters_[level]->start_key().type !=
250+
kTypeRangeDeletion) {
251+
range_tombstone_iters_[level]->Next();
252+
if (!range_tombstone_iters_[level]->Valid()) {
253+
return;
254+
}
255+
}
247256
pinned_heap_item_[level].SetTombstoneForCompaction(
248257
range_tombstone_iters_[level]->start_key());
249258
minHeap_.push(&pinned_heap_item_[level]);
@@ -262,7 +271,7 @@ void CompactionMergingIterator::SeekToFirst() {
262271
for (size_t i = 0; i < range_tombstone_iters_.size(); ++i) {
263272
if (range_tombstone_iters_[i]) {
264273
range_tombstone_iters_[i]->SeekToFirst();
265-
InsertRangeTombstoneAtLevel(i);
274+
InsertNextValidRangeTombstoneAtLevel(i);
266275
}
267276
}
268277

@@ -290,7 +299,7 @@ void CompactionMergingIterator::Seek(const Slice& target) {
290299
0) {
291300
range_tombstone_iters_[i]->Next();
292301
}
293-
InsertRangeTombstoneAtLevel(i);
302+
InsertNextValidRangeTombstoneAtLevel(i);
294303
}
295304
}
296305

@@ -357,7 +366,7 @@ void CompactionMergingIterator::FindNextVisibleKey() {
357366
minHeap_.pop();
358367
}
359368
if (range_tombstone_iters_[current->level]) {
360-
InsertRangeTombstoneAtLevel(current->level);
369+
InsertNextValidRangeTombstoneAtLevel(current->level);
361370
}
362371
}
363372
}
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
* Fix a bug where compaction with range deletion can persist kTypeMaxValid in MANIFEST as file metadata. kTypeMaxValid is not supposed to be persisted and can change as new value types are introduced. This can cause a forward compatibility issue where older versions of RocksDB don't recognize kTypeMaxValid from newer versions. A new placeholder value type kTypeTruncatedRangeDeletionSentinel is also introduced to replace kTypeMaxValid when reading existing SST files' metadata from MANIFEST. This allows us to strengthen some checks to avoid using kTypeMaxValid in the future.

utilities/debug.cc

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -41,6 +41,8 @@ static std::unordered_map<std::string, ValueType> value_type_string_map = {
4141
{"TypeValuePreferredSeqno", ValueType::kTypeValuePreferredSeqno},
4242
{"TypeColumnFamilyValuePreferredSeqno",
4343
ValueType::kTypeColumnFamilyValuePreferredSeqno},
44+
{"kTypeTruncatedRangeDeletionSentinel",
45+
ValueType::kTypeTruncatedRangeDeletionSentinel},
4446
};
4547

4648
std::string KeyVersion::GetTypeName() const {

0 commit comments

Comments
 (0)