[core] introduce Placeholder for Blob File Format #7889
 * The placeholder blob, mainly for blob update in data-evolution. It should never be exposed to
 * users.
 */
Blob PLACE_HOLDER =
This is strange, maybe just use NULL as the placeholder?
Thanks for your advice! But in #7125 we added support for storing NULLs in blob files. If NULL were the placeholder, I'm not clear how to distinguish placeholders from native NULLs.
Semantically, NULLs are exposed to users: users know they have stored some NULLs. Placeholders, by contrast, are used purely internally, and users should never be aware of them. If NULL doubled as the placeholder, rows that users explicitly set to NULL could fall back to earlier versions, which is not what our design intends.
Could you please give me some advice?
Perhaps you could use the row number in the blob to determine how to merge? You could return only the valid blobs, each with its row number.
The row number is actually the primary key.
I understand that you need this class not only for reading but also for writing. If you skip these elements, the changes will be significant.
I think you can just introduce a BlobPlaceHolder implements Blob, Serializable for this; checking with instanceof is better.
Thanks! I'll modify my code!
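A minimal sketch of the suggestion above, to make the NULL-versus-placeholder distinction concrete. The Blob interface here is a simplified, hypothetical stand-in (a single toData() method), not the project's real interface, and BytesBlob and BlobKinds are illustrative names that do not come from this PR. The point is only that a dedicated marker class can be recognised with instanceof, while a user-visible NULL stays a plain null.

```java
import java.io.Serializable;

/** Hypothetical, simplified stand-in for the real Blob interface. */
interface Blob {
    byte[] toData();
}

/** Ordinary blob backed by a byte array (illustrative only). */
final class BytesBlob implements Blob, Serializable {
    private static final long serialVersionUID = 1L;
    private final byte[] data;

    BytesBlob(byte[] data) {
        this.data = data;
    }

    @Override
    public byte[] toData() {
        return data;
    }
}

/** Internal marker meaning "this file holds no value for this row". */
final class BlobPlaceHolder implements Blob, Serializable {
    private static final long serialVersionUID = 1L;
    static final BlobPlaceHolder INSTANCE = new BlobPlaceHolder();

    private BlobPlaceHolder() {}

    @Override
    public byte[] toData() {
        // A placeholder carries no payload and should never reach user code.
        throw new UnsupportedOperationException("placeholder blob carries no data");
    }

    /** Keeps the singleton unique across Java deserialization. */
    private Object readResolve() {
        return INSTANCE;
    }
}

final class BlobKinds {
    /** Classifies a value read from a blob column. */
    static String describe(Blob blob) {
        if (blob == null) {
            return "NULL"; // stored by the user, must be preserved
        }
        if (blob instanceof BlobPlaceHolder) {
            return "PLACEHOLDER"; // internal gap, fall back to an older version
        }
        return "VALUE";
    }
}
```

With this shape, a row that a user explicitly set to NULL is never mistaken for a gap, because only BlobPlaceHolder triggers the fallback path.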
Purpose
This is the first part of #7881
It includes:
a. First, all data files are grouped by max_seq_num.
b. Within each group, create a sequential reader that logically concatenates the files and fills the missing gaps. For example, if the full row range of the normal files is [0, 100] but some group has only one file with range [20, 80], the output is: [0, 19] -> filled with placeholders; [20, 80] -> records from the file; [81, 100] -> filled with placeholders.
c. Create a reader for each group, and for each row read the blob from the group with the largest max_seq_num whose value is NOT a placeholder.
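Steps b and c can be sketched in plain Java as follows. This is an in-memory illustration under stated assumptions, not the reader code from this PR: PlaceholderMergeSketch, FileRange, fillGaps and merge are hypothetical names, a sentinel object stands in for the placeholder blob, and the groups are assumed to be already split by max_seq_num (step a) and passed in ascending order.

```java
import java.util.Arrays;
import java.util.List;

final class PlaceholderMergeSketch {
    /** Sentinel standing in for the internal placeholder blob; compared by identity. */
    static final Object PLACEHOLDER = new Object();

    /** One data file covering rows [firstRow, firstRow + values.length - 1]. */
    static final class FileRange {
        final long firstRow;
        final Object[] values; // a null element is a user-visible NULL

        FileRange(long firstRow, Object[] values) {
            this.firstRow = firstRow;
            this.values = values;
        }
    }

    /** Step b: lay the files of one group over [start, end], gaps become placeholders. */
    static Object[] fillGaps(long start, long end, List<FileRange> files) {
        Object[] out = new Object[(int) (end - start + 1)];
        Arrays.fill(out, PLACEHOLDER);
        for (FileRange f : files) {
            for (int i = 0; i < f.values.length; i++) {
                out[(int) (f.firstRow - start) + i] = f.values[i];
            }
        }
        return out;
    }

    /**
     * Step c: for each row, take the value from the newest group that is not a
     * placeholder. Assumes groups are ordered by ascending max_seq_num and all
     * cover the same row range.
     */
    static Object[] merge(List<Object[]> groups) {
        int rows = groups.get(0).length;
        Object[] result = new Object[rows];
        for (int row = 0; row < rows; row++) {
            Object picked = PLACEHOLDER;
            for (int g = groups.size() - 1; g >= 0; g--) {
                Object candidate = groups.get(g)[row];
                if (candidate != PLACEHOLDER) {
                    picked = candidate; // may be null: a user NULL still wins
                    break;
                }
            }
            result[row] = picked;
        }
        return result;
    }
}
```

For example, with an older group covering rows [0, 4] and a newer group holding a single file for rows [1, 2], rows 1 and 2 come from the newer group (even if the newer value is NULL) and the remaining rows fall back to the older group.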
The mechanism can be illustrated as below:

Tests
ITCase and Unit tests