[flink][spark] supports adding blob columns through ALTER TABLE statements #7921
steFaiz wants to merge 2 commits

This PR requires users to first set the blob-field option and then add the column. It is a two-step operation, and the order cannot be reversed. I recommend giving clearer instructions for this in the error message.

Lines 1134-1138 of FlinkCatalog.java (the original blobTypeFields method) omit blobExternalStoreField, whereas the blobTypeFields newly added on the Spark side in this PR and the blobTypeFields in the alterTable path both include it. As a result, Flink determines blob fields differently in its table creation path (from CatalogTable) and in its ALTER TABLE path. The original method should be fixed so that the two paths are consistent.
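
One way to keep the two paths from drifting apart is to route both through a single helper that unions every blob-related option. The following is only a minimal sketch under stated assumptions: the class `BlobFieldNames` and its `resolve` method are hypothetical names and not Paimon's API, and it assumes each option value is a comma-separated list of column names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration; this is not Paimon's actual API. */
final class BlobFieldNames {

    private BlobFieldNames() {}

    /**
     * Unions the column names listed under each of the given option keys.
     * Assumption: every option value is a comma-separated list of column names.
     */
    static Set<String> resolve(Map<String, String> options, List<String> optionKeys) {
        Set<String> names = new LinkedHashSet<>();
        for (String key : optionKeys) {
            String value = options.get(key);
            if (value == null) {
                continue; // option not set on this table
            }
            for (String name : value.split(",")) {
                String trimmed = name.trim();
                if (!trimmed.isEmpty()) {
                    names.add(trimmed);
                }
            }
        }
        return names;
    }
}
```

If the table creation path and the ALTER TABLE path both called such a helper with the same key list, the option behind blobExternalStoreField could not be left out of only one of them.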

Thanks for working on this. I think this still has a blocking issue: the newly added engine scenarios do not pass locally. I ran the added Flink/Spark tests on the PR head (fe99faa) with Java 8:

    mvn -pl paimon-flink/paimon-flink-common -DskipITs=false -Dcheckstyle.skip -Drat.skip=true -Dspotless.check.skip=true -Dtest=SchemaChangeITCase#testAlterAddBlobColumn test
    mvn -pl paimon-spark/paimon-spark-ut -DskipITs=false -Dcheckstyle.skip -Drat.skip=true -Dspotless.check.skip=true -Dtest=SparkSchemaEvolutionITCase#testAlterAddBlobColumn test

Both fail with the same validation error:

So the intended workflow in the tests:

    ALTER TABLE ... SET ('blob-field' = 'picture');
    ALTER TABLE ... ADD picture BYTES/BINARY;

still ends up committing a normal binary column while the table option says that

Please fix this before merge. The engine-side ALTER ADD COLUMN path needs to resolve the new column type using the effective table options after the previous/current option changes, not only the options carried by the ADD COLUMN request. In particular, Flink's

After the fix, please make sure the added Flink and Spark tests pass for both
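
The idea of resolving against the effective options can be sketched as follows. This is a simplified model under stated assumptions, not the PR's implementation: `EffectiveOptionsSketch`, `Change`, and `resolveAddedColumns` are hypothetical names, the type names `BYTES` and `BLOB` are plain strings, and `blob-field` is assumed to hold a comma-separated list of column names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative model only; these types are hypothetical, not Paimon's schema change classes. */
final class EffectiveOptionsSketch {

    /** One step of an ALTER TABLE request: an option update or a column addition. */
    static final class Change {
        final String optionKey;   // non-null only for SET
        final String optionValue;
        final String columnName;  // non-null only for ADD COLUMN
        final String columnType;

        private Change(String optionKey, String optionValue, String columnName, String columnType) {
            this.optionKey = optionKey;
            this.optionValue = optionValue;
            this.columnName = columnName;
            this.columnType = columnType;
        }

        static Change setOption(String key, String value) {
            return new Change(key, value, null, null);
        }

        static Change addColumn(String name, String type) {
            return new Change(null, null, name, type);
        }
    }

    /** Returns "name TYPE" for each added column, typed against the options in effect at that step. */
    static List<String> resolveAddedColumns(Map<String, String> committedOptions, List<Change> changes) {
        // Start from the options already committed by earlier statements.
        Map<String, String> effective = new HashMap<>(committedOptions);
        List<String> resolved = new ArrayList<>();
        for (Change change : changes) {
            if (change.optionKey != null) {
                // Option changes in the same request take effect for later steps.
                effective.put(change.optionKey, change.optionValue);
            } else {
                boolean blob = "BYTES".equals(change.columnType)
                        && blobFields(effective).contains(change.columnName);
                resolved.add(change.columnName + " " + (blob ? "BLOB" : change.columnType));
            }
        }
        return resolved;
    }

    /** Assumption: 'blob-field' is a comma-separated list of column names. */
    private static Set<String> blobFields(Map<String, String> options) {
        Set<String> names = new HashSet<>();
        String value = options.get("blob-field");
        if (value != null) {
            for (String name : value.split(",")) {
                if (!name.trim().isEmpty()) {
                    names.add(name.trim());
                }
            }
        }
        return names;
    }
}
```

In this model, an ADD COLUMN of `picture BYTES` resolves to a blob column both when `blob-field` was committed by an earlier statement and when it is set earlier in the same request, and it stays a plain bytes column when the option is absent.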
Purpose
This is part 0 of #7881.
Currently, we can't add new blob columns through Flink/Spark SQL. In this PR, I slightly change the restriction on altering the blob-fields configuration, as below:
- blob-field and blob-descriptor-field are mutable now

Then, when altering a table, new bytes fields that are configured in blob-fields will be converted to blob fields, as below:
Tests
See Unit Tests and ITCases