
[flink][spark] supports adding blob columns through ALTER TABLE statements #7921

Open
steFaiz wants to merge 2 commits into apache:master from steFaiz:alter_table_blob


@steFaiz
Contributor

@steFaiz steFaiz commented May 21, 2026

Purpose

This is part 0 of #7881.
Currently, new blob columns cannot be added through Flink/Spark SQL. In this PR, I slightly relax the restrictions on altering the blob-field configuration, as follows:

  1. blob-field and blob-descriptor-field are now mutable.
  2. An existing field cannot be altered to BLOB.
  3. An existing field cannot be removed from the blob fields.
  4. A field that does not exist yet may be configured as blob; this reserves it for a blob field added later.

Note that, due to their complexity, blob-external-fields and blob-view-fields remain immutable.
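A minimal sketch of how the four rules above could be checked when 'blob-field' or 'blob-descriptor-field' changes. The class and method names are hypothetical and this is not Paimon's actual validation code; it also assumes that a reserved name with no column yet may be dropped from the option again, which the description does not state.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of the relaxed ALTER rules; names are illustrative only.
final class BlobOptionChangeRules {

    // Options that stay immutable in this PR (names taken from the description).
    static final Set<String> IMMUTABLE_KEYS =
            new HashSet<>(Arrays.asList("blob-external-fields", "blob-view-fields"));

    // Splits a comma-separated option value such as "video,picture".
    static Set<String> parseFields(String value) {
        Set<String> fields = new LinkedHashSet<>();
        if (value == null) {
            return fields;
        }
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                fields.add(name);
            }
        }
        return fields;
    }

    // Rule 1: 'blob-field' / 'blob-descriptor-field' may change, subject to rules 2-4.
    static void checkMutableChange(String oldValue, String newValue, Set<String> existingColumns) {
        Set<String> oldFields = parseFields(oldValue);
        Set<String> newFields = parseFields(newValue);
        for (String field : oldFields) {
            // Rule 3: an existing column cannot be dropped from the blob fields.
            // Assumption: a reserved name with no column yet may still be removed.
            if (existingColumns.contains(field) && !newFields.contains(field)) {
                throw new IllegalArgumentException(
                        "Cannot remove existing field '" + field + "' from blob fields.");
            }
        }
        for (String field : newFields) {
            // Rule 2: an existing (non-blob) column cannot be turned into a BLOB.
            if (!oldFields.contains(field) && existingColumns.contains(field)) {
                throw new IllegalArgumentException(
                        "Cannot alter existing field '" + field + "' to BLOB.");
            }
            // Rule 4: a name with no column yet is accepted and reserved for a later ADD.
        }
    }

    // Rejects any change to the options that remain immutable.
    static void checkImmutableChange(String key, String oldValue, String newValue) {
        if (IMMUTABLE_KEYS.contains(key) && !Objects.equals(oldValue, newValue)) {
            throw new IllegalArgumentException("Option '" + key + "' cannot be altered.");
        }
    }

    public static void main(String[] args) {
        Set<String> columns = new HashSet<>(Arrays.asList("id", "video"));
        // Accepted: 'picture' has no column yet, so it is reserved.
        checkMutableChange("video", "video,picture", columns);
        try {
            // Rejected: 'id' already exists as a non-blob column.
            checkMutableChange("video", "video,id", columns);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```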

Then, when altering a table, a newly added BYTES field that is listed in blob-field is converted to a BLOB field, for example:

-- current setting is only 'blob-field'='video'
ALTER TABLE T_BLOB SET ('blob-field'='video,picture');
ALTER TABLE T_BLOB ADD picture BYTES;
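A minimal sketch of the conversion in the example above, assuming the added column's declared type and the table options are available as plain strings. The names are hypothetical (not Paimon's DataType API), and only 'blob-field' is consulted here; how blob-descriptor-field participates is left out.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a BYTES/BINARY column listed in 'blob-field' is committed as BLOB.
final class AddBlobColumnSketch {

    // Splits a comma-separated option value such as "video,picture".
    static Set<String> parseFields(String value) {
        Set<String> fields = new LinkedHashSet<>();
        if (value == null) {
            return fields;
        }
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                fields.add(name);
            }
        }
        return fields;
    }

    // Returns the type to commit for a newly added column.
    static String resolveAddedColumnType(
            String column, String declaredType, Map<String, String> tableOptions) {
        // Flink spells the type BYTES, Spark spells it BINARY.
        boolean bytesLike =
                declaredType.equalsIgnoreCase("BYTES") || declaredType.equalsIgnoreCase("BINARY");
        boolean listedAsBlob = parseFields(tableOptions.get("blob-field")).contains(column);
        return bytesLike && listedAsBlob ? "BLOB" : declaredType;
    }

    public static void main(String[] args) {
        // Table options after: ALTER TABLE T_BLOB SET ('blob-field'='video,picture');
        Map<String, String> options = new HashMap<>();
        options.put("blob-field", "video,picture");
        System.out.println(resolveAddedColumnType("picture", "BYTES", options)); // BLOB
        System.out.println(resolveAddedColumnType("title", "STRING", options)); // STRING
    }
}
```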

Tests

See the unit tests and ITCases.

@steFaiz steFaiz closed and reopened this four times on May 21, 2026
@JingsongLi
Contributor

This PR requires users to first set the blob field and then add the column. This is a two-step operation, and the order cannot be reversed. It is recommended to give clearer instructions in the error message.
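As an illustration of the suggestion, here is a hypothetical error message for the reversed order (the BYTES column is added first and only then listed in 'blob-field'). The class, method and wording are assumptions, not existing Paimon code; the point is only that the message names both steps in the required order.

```java
// Hypothetical sketch of a more instructive error for the reversed SET/ADD order.
final class BlobOrderErrorMessage {

    static String existingFieldToBlob(String table, String field) {
        return String.format(
                "Field '%s' already exists in table %s and cannot be converted to BLOB. "
                        + "To add a new blob column, first list it in 'blob-field' "
                        + "(ALTER TABLE %s SET ('blob-field' = '<existing blob fields>,%s')), "
                        + "and only then add it (ALTER TABLE %s ADD %s BYTES).",
                field, table, table, field, table, field);
    }

    public static void main(String[] args) {
        System.out.println(existingFieldToBlob("T_BLOB", "picture"));
    }
}
```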

@JingsongLi
Contributor

Lines 1134-1138 of FlinkCatalog.java (the original blobTypeFields method) do not include blobExternalStoreField, whereas both the Spark-side blobTypeFields added in this PR and the blobTypeFields in the alterTable path do. As a result, Flink's table-creation path (from CatalogTable) and its ALTER TABLE path disagree on which fields are blob fields. The original method should be fixed to be consistent.
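One way to rule out this kind of drift is to have both paths call a single helper. A minimal sketch, assuming blob-typed fields are the union of a fixed list of options: the option key standing in for blobExternalStoreField is a hypothetical placeholder (the thread does not give its real key), and which keys belong in the list is itself an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: one helper shared by the CREATE and ALTER TABLE paths.
final class BlobTypeFieldsSketch {

    // HYPOTHETICAL key: the thread names blobExternalStoreField but not its option key.
    static final String EXTERNAL_STORE_FIELD_KEY = "blob-external-store-field";

    // Assumed list of options whose listed fields are BLOB-typed.
    static final List<String> BLOB_TYPE_KEYS =
            Arrays.asList("blob-field", EXTERNAL_STORE_FIELD_KEY);

    static Set<String> blobTypeFields(Map<String, String> options) {
        Set<String> fields = new TreeSet<>();
        for (String key : BLOB_TYPE_KEYS) {
            String value = options.get(key);
            if (value == null) {
                continue;
            }
            for (String part : value.split(",")) {
                String name = part.trim();
                if (!name.isEmpty()) {
                    fields.add(name);
                }
            }
        }
        return fields;
    }

    // Both call sites delegate to the same helper, so they cannot disagree.
    static Set<String> forCreateTable(Map<String, String> catalogTableOptions) {
        return blobTypeFields(catalogTableOptions);
    }

    static Set<String> forAlterTable(Map<String, String> effectiveOptions) {
        return blobTypeFields(effectiveOptions);
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("blob-field", "video");
        options.put(EXTERNAL_STORE_FIELD_KEY, "raw");
        System.out.println(forCreateTable(options)); // [raw, video]
        System.out.println(forAlterTable(options).equals(forCreateTable(options))); // true
    }
}
```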

@leaves12138
Contributor

Thanks for working on this. I think this still has a blocking issue: the newly added engine-side test scenarios do not pass locally.

I ran the added Flink/Spark tests on the PR head (fe99faa) with Java 8:

mvn -pl paimon-flink/paimon-flink-common -DskipITs=false -Dcheckstyle.skip -Drat.skip=true -Dspotless.check.skip=true -Dtest=SchemaChangeITCase#testAlterAddBlobColumn test

mvn -pl paimon-spark/paimon-spark-ut -DskipITs=false -Dcheckstyle.skip -Drat.skip=true -Dspotless.check.skip=true -Dtest=SparkSchemaEvolutionITCase#testAlterAddBlobColumn test

Both fail with the same validation error:

java.lang.IllegalArgumentException: Field 'picture' in 'blob-field' must be a BLOB field in table schema.
    at org.apache.paimon.schema.SchemaValidation.validateBlobFields(SchemaValidation.java:807)

So the intended workflow in the tests:

ALTER TABLE ... SET ('blob-field' = 'picture');
ALTER TABLE ... ADD picture BYTES/BINARY;

still ends up committing a plain binary column while the table options declare picture as a blob field, so schema validation rejects the table.
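A minimal reconstruction of why the commit is rejected. Only the message text is copied from the stack trace; the check itself is an assumption about what SchemaValidation.validateBlobFields does, written with hypothetical names and string types rather than Paimon's schema classes.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical reconstruction of the check behind the reported error.
final class BlobSchemaCheckSketch {

    static void validateBlobFields(Map<String, String> columnTypes, Set<String> blobFields) {
        for (String field : blobFields) {
            String type = columnTypes.get(field);
            // Assumption: a name with no column yet is skipped (reserved for a later ADD).
            if (type != null && !"BLOB".equals(type)) {
                throw new IllegalArgumentException(
                        "Field '" + field + "' in 'blob-field' must be a BLOB field in table schema.");
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("id", "INT");
        // What the failing tests commit: 'picture' is still a plain binary column.
        columns.put("picture", "BINARY");
        try {
            validateBlobFields(columns, Collections.singleton("picture"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```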

Please fix this before merging. The engine-side ALTER ADD COLUMN path needs to resolve the new column type using the effective table options, that is, after applying the previous and current option changes, not only the options carried by the ADD COLUMN request. In particular, Flink's alterTable(..., newTable, tableChanges, ...) should probably compute the blob type fields from the old table options merged with the incoming option changes, similar to what the Spark path is trying to do. The Spark path also needs to make sure that the separate SET-then-ADD workflow sees the updated blob-field option before it resolves the added column type.
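A minimal sketch of the suggested resolution order, under stated assumptions: option changes are modeled by a hypothetical OptionChange class (a null value standing for RESET), types are plain strings, and only 'blob-field' is consulted. It contrasts resolving against the ADD request's own options with resolving against the old options merged with the incoming changes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: resolve ADD COLUMN types against the effective table options.
final class EffectiveOptionsSketch {

    // Simplified stand-in for an option change; a null value models RESET.
    static final class OptionChange {
        final String key;
        final String value;

        OptionChange(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Old table options with the incoming SET/RESET changes applied in order.
    static Map<String, String> effectiveOptions(
            Map<String, String> oldOptions, List<OptionChange> changes) {
        Map<String, String> merged = new HashMap<>(oldOptions);
        for (OptionChange change : changes) {
            if (change.value == null) {
                merged.remove(change.key);
            } else {
                merged.put(change.key, change.value);
            }
        }
        return merged;
    }

    // BYTES/BINARY columns listed in 'blob-field' become BLOB; others keep their type.
    static String resolveAddedColumnType(
            String column, String declaredType, Map<String, String> options) {
        Set<String> blobFields = new LinkedHashSet<>();
        String value = options.get("blob-field");
        if (value != null) {
            for (String part : value.split(",")) {
                blobFields.add(part.trim());
            }
        }
        boolean bytesLike =
                declaredType.equalsIgnoreCase("BYTES") || declaredType.equalsIgnoreCase("BINARY");
        return bytesLike && blobFields.contains(column) ? "BLOB" : declaredType;
    }

    public static void main(String[] args) {
        // Table state after an earlier ALTER TABLE ... SET ('blob-field' = 'picture');
        Map<String, String> oldOptions = new HashMap<>();
        oldOptions.put("blob-field", "picture");
        // The later ALTER TABLE ... ADD picture BINARY carries no option changes.
        List<OptionChange> addOnly = Collections.emptyList();

        // Request-only resolution misses the earlier SET (the reported bug).
        System.out.println(resolveAddedColumnType("picture", "BINARY", new HashMap<>())); // BINARY
        // Effective options see the earlier SET.
        System.out.println(resolveAddedColumnType(
                "picture", "BINARY", effectiveOptions(oldOptions, addOnly))); // BLOB

        // SET and ADD arriving in the same alterTable call also resolve to BLOB.
        List<OptionChange> setInSameCall = Arrays.asList(new OptionChange("blob-field", "picture"));
        System.out.println(resolveAddedColumnType(
                "picture", "BINARY", effectiveOptions(new HashMap<>(), setInSameCall))); // BLOB
    }
}
```

Merging before resolving keeps both workflows (separate SET then ADD, and SET plus ADD in one call) on the same code path.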

After the fix, please make sure the added Flink and Spark tests pass for both blob-field and blob-descriptor-field.

