feat: Support multiple external table locations #22695

Open
kumarUjjawal wants to merge 1 commit into apache:main from kumarUjjawal:feature/multi-location-external-table

@kumarUjjawal
Contributor

Which issue does this PR close?

Part of #16303.

Rationale for this change

CREATE EXTERNAL TABLE can reference only one location today. This adds support for listing multiple explicit locations and reading them as one table.

What changes are included in this PR?

  • Adds LOCATION ('a.parquet', 'b.parquet') syntax.
  • Keeps LOCATION 'a,b.parquet' as a single path, so literal commas still work.
  • Carries multiple locations through the logical plan and proto.
  • Updates listing table creation to scan all listed locations.
  • Requires all listed locations to use the same object store and matching fields.
  • Keeps stream tables limited to exactly one location.
  • Updates docs and upgrade notes.

Are these changes tested?

Yes

Are there any user-facing changes?

Yes. CREATE EXTERNAL TABLE now accepts a parenthesized list of locations.

There is also a public API change: CreateExternalTable.location is replaced by CreateExternalTable.locations.
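
For downstream code that still expects exactly one location (as stream tables do), the migration might look roughly like the sketch below. It uses a reduced, hypothetical stand-in for `CreateExternalTable` and assumes `locations` is a `Vec<String>`; `single_location` is an illustrative helper name, not something this PR adds.

```rust
/// Hypothetical, heavily reduced stand-in for `CreateExternalTable`
/// after this change. Assumes `locations` is a `Vec<String>`.
struct CreateExternalTable {
    locations: Vec<String>,
}

/// Hypothetical adapter for callers that previously read `cmd.location`
/// and can only handle exactly one location.
fn single_location(cmd: &CreateExternalTable) -> Result<&str, String> {
    match cmd.locations.as_slice() {
        // Exactly one entry: behaves like the old single `location` field.
        [only] => Ok(only.as_str()),
        // Zero or several entries: report the count instead of guessing.
        other => Err(format!(
            "expected exactly one location, found {}",
            other.len()
        )),
    }
}
```

Matching on the slice makes the zero-location and multi-location cases explicit instead of silently picking the first entry.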

@github-actions github-actions Bot added the labels documentation (Improvements or additions to documentation), sql (SQL Planner), logical-expr (Logical plan and expressions), core (Core DataFusion crate), sqllogictest (SQL Logic Tests (.slt)), catalog (Related to the catalog crate), proto (Related to proto crate), ffi (Changes to the ffi crate) Jun 1, 2026
@github-actions

github-actions Bot commented Jun 1, 2026

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion v53.1.0 (current)
       Built [  98.482s] (current)
     Parsing datafusion v53.1.0 (current)
      Parsed [   0.038s] (current)
    Building datafusion v53.1.0 (baseline)
       Built [  98.981s] (baseline)
     Parsing datafusion v53.1.0 (baseline)
      Parsed [   0.039s] (baseline)
    Checking datafusion v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.889s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 200.491s] datafusion
    Building datafusion-catalog v53.1.0 (current)
       Built [  37.382s] (current)
     Parsing datafusion-catalog v53.1.0 (current)
      Parsed [   0.027s] (current)
    Building datafusion-catalog v53.1.0 (baseline)
       Built [  36.873s] (baseline)
     Parsing datafusion-catalog v53.1.0 (baseline)
      Parsed [   0.029s] (baseline)
    Checking datafusion-catalog v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.176s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [  76.043s] datafusion-catalog
    Building datafusion-cli v53.1.0 (current)
       Built [ 170.023s] (current)
     Parsing datafusion-cli v53.1.0 (current)
      Parsed [   0.035s] (current)
    Building datafusion-cli v53.1.0 (baseline)
       Built [ 168.758s] (baseline)
     Parsing datafusion-cli v53.1.0 (baseline)
      Parsed [   0.041s] (baseline)
    Checking datafusion-cli v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.144s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 343.230s] datafusion-cli
    Building datafusion-expr v53.1.0 (current)
       Built [  25.454s] (current)
     Parsing datafusion-expr v53.1.0 (current)
      Parsed [   0.078s] (current)
    Building datafusion-expr v53.1.0 (baseline)
       Built [  25.382s] (baseline)
     Parsing datafusion-expr v53.1.0 (baseline)
      Parsed [   0.079s] (baseline)
    Checking datafusion-expr v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   1.697s] 222 checks: 220 pass, 2 fail, 0 warn, 30 skip

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field CreateExternalTable.locations in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/ddl.rs:219
  field CreateExternalTable.locations in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/ddl.rs:219

--- failure struct_pub_field_missing: pub struct's pub field removed or renamed ---

Description:
A publicly-visible struct has at least one public field that is no longer available under its prior name. It may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/struct_pub_field_missing.ron

Failed in:
  field location of struct CreateExternalTable, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/06dd7115d01d285840987edcd0dd32a5798a198a/datafusion/expr/src/logical_plan/ddl.rs:215
  field location of struct CreateExternalTable, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/06dd7115d01d285840987edcd0dd32a5798a198a/datafusion/expr/src/logical_plan/ddl.rs:215

     Summary semver requires new major version: 2 major and 0 minor checks failed
    Finished [  53.750s] datafusion-expr
    Building datafusion-ffi v53.1.0 (current)
       Built [  57.910s] (current)
     Parsing datafusion-ffi v53.1.0 (current)
      Parsed [   0.061s] (current)
    Building datafusion-ffi v53.1.0 (baseline)
       Built [  57.839s] (baseline)
     Parsing datafusion-ffi v53.1.0 (baseline)
      Parsed [   0.064s] (baseline)
    Checking datafusion-ffi v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.336s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 117.963s] datafusion-ffi
    Building datafusion-proto v53.1.0 (current)
       Built [  56.273s] (current)
     Parsing datafusion-proto v53.1.0 (current)
      Parsed [   0.020s] (current)
    Building datafusion-proto v53.1.0 (baseline)
       Built [  56.793s] (baseline)
     Parsing datafusion-proto v53.1.0 (baseline)
      Parsed [   0.020s] (baseline)
    Checking datafusion-proto v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.357s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 115.785s] datafusion-proto
    Building datafusion-proto-models v53.1.0 (current)
       Built [  23.632s] (current)
     Parsing datafusion-proto-models v53.1.0 (current)
      Parsed [   0.135s] (current)
    Building datafusion-proto-models v53.1.0 (baseline)
       Built [  23.313s] (baseline)
     Parsing datafusion-proto-models v53.1.0 (baseline)
      Parsed [   0.133s] (baseline)
    Checking datafusion-proto-models v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   2.272s] 222 checks: 220 pass, 2 fail, 0 warn, 30 skip

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field CreateExternalTableNode.locations in /home/runner/work/datafusion/datafusion/datafusion/proto-models/src/generated/prost.rs:247
  field CreateExternalTableNode.locations in /home/runner/work/datafusion/datafusion/datafusion/proto-models/src/generated/prost.rs:247

--- failure struct_pub_field_missing: pub struct's pub field removed or renamed ---

Description:
A publicly-visible struct has at least one public field that is no longer available under its prior name. It may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/struct_pub_field_missing.ron

Failed in:
  field location of struct CreateExternalTableNode, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/06dd7115d01d285840987edcd0dd32a5798a198a/datafusion/proto-models/src/generated/prost.rs:247
  field location of struct CreateExternalTableNode, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/06dd7115d01d285840987edcd0dd32a5798a198a/datafusion/proto-models/src/generated/prost.rs:247

     Summary semver requires new major version: 2 major and 0 minor checks failed
    Finished [  50.723s] datafusion-proto-models
    Building datafusion-sql v53.1.0 (current)
       Built [  39.938s] (current)
     Parsing datafusion-sql v53.1.0 (current)
      Parsed [   0.034s] (current)
    Building datafusion-sql v53.1.0 (baseline)
       Built [  40.236s] (baseline)
     Parsing datafusion-sql v53.1.0 (baseline)
      Parsed [   0.035s] (baseline)
    Checking datafusion-sql v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.320s] 222 checks: 221 pass, 1 fail, 0 warn, 30 skip

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field CreateExternalTable.locations in /home/runner/work/datafusion/datafusion/datafusion/sql/src/parser.rs:255

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  82.361s] datafusion-sql
    Building datafusion-sqllogictest v53.1.0 (current)
       Built [ 169.182s] (current)
     Parsing datafusion-sqllogictest v53.1.0 (current)
      Parsed [   0.024s] (current)
    Building datafusion-sqllogictest v53.1.0 (baseline)
       Built [ 169.428s] (baseline)
     Parsing datafusion-sqllogictest v53.1.0 (baseline)
      Parsed [   0.025s] (baseline)
    Checking datafusion-sqllogictest v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.108s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 343.592s] datafusion-sqllogictest

@github-actions github-actions Bot added the auto detected api change label Jun 1, 2026
}

fn schemas_have_same_fields(left: &SchemaRef, right: &SchemaRef) -> bool {
left.fields() == right.fields()
Member

https://github.com/apache/datafusion/pull/22695/changes#diff-8ce463e0d56c67a237fce2942f0759984d56764b13b62156472132ab84c78192R226-R227 says "Schema metadata may differ between files ...", but the equality check here also compares each field's metadata: https://docs.rs/arrow-schema/58.3.0/src/arrow_schema/field.rs.html#113
Is this intentional?
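
To make the concern concrete, here is a minimal sketch of the difference between a strict field comparison and one that ignores per-field metadata. `Field` below is a hypothetical std-only stand-in, not `arrow_schema::Field`, and both helper names are illustrative; whether the relaxed behavior is what the PR wants is exactly the open question.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `arrow_schema::Field`; not the real type.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: HashMap<String, String>,
}

/// Hypothetical helper that builds a nullable field for the examples.
fn field(name: &str, data_type: &str, metadata: &[(&str, &str)]) -> Field {
    Field {
        name: name.to_string(),
        data_type: data_type.to_string(),
        nullable: true,
        metadata: metadata
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    }
}

/// Strict comparison: the derived `PartialEq` also compares `metadata`.
fn same_fields_strict(left: &[Field], right: &[Field]) -> bool {
    left == right
}

/// Relaxed comparison: name, data type and nullability only.
fn same_fields_ignoring_metadata(left: &[Field], right: &[Field]) -> bool {
    left.len() == right.len()
        && left.iter().zip(right).all(|(l, r)| {
            l.name == r.name && l.data_type == r.data_type && l.nullable == r.nullable
        })
}
```

With these definitions, two files whose `id` columns differ only in a metadata entry pass the relaxed check but fail the strict one.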

#[prost(message, optional, tag = "9")]
pub name: ::core::option::Option<TableReference>,
#[prost(string, tag = "2")]
pub location: ::prost::alloc::string::String,
Member

What is the lifetime of a serialized CreateExternalTableNode?
If there is a use case where a CreateExternalTableNode is serialized with DF vX and deserialized with vX+1, it would be better to keep location around until vX+2 (assuming users don't upgrade directly from X to X+2).
That is, deserialization should:

  1. use locations if it is not empty
  2. otherwise, if location is not an empty string, insert it into locations

Serialization should write the non-empty locations and set location="".
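
The fallback described above could be sketched as follows. `NodeLocations` is a hypothetical stand-in that keeps only the two fields under discussion, and `decode_locations` / `encode_locations` are illustrative names rather than functions in datafusion-proto.

```rust
/// Hypothetical stand-in for the generated `CreateExternalTableNode`,
/// reduced to the legacy `location` field and the new `locations` field.
#[derive(Debug, Clone, Default, PartialEq)]
struct NodeLocations {
    location: String,
    locations: Vec<String>,
}

/// Deserialization side: prefer `locations`, fall back to the legacy field.
fn decode_locations(node: &NodeLocations) -> Vec<String> {
    if !node.locations.is_empty() {
        // Written by a newer version: use the list as is.
        node.locations.clone()
    } else if !node.location.is_empty() {
        // Written by an older version: promote the single location.
        vec![node.location.clone()]
    } else {
        // Neither field is set; the caller decides whether this is an error.
        Vec::new()
    }
}

/// Serialization side: write `locations` and leave `location` empty.
fn encode_locations(locations: &[String]) -> NodeLocations {
    NodeLocations {
        location: String::new(),
        locations: locations.to_vec(),
    }
}
```

Keeping the legacy field readable for one release lets a plan serialized by the previous version decode into a one-element list.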

"CreateExternalTable",
)?,
create_extern_table.location.clone(),
create_extern_table
Member

If backward compatibility is needed, i.e. if a CreateExternalTableNode serialized with vX has to be read with vX+1, then location needs to stay around for at least one major version.
If such compatibility is not needed, you can just use "" here; it is overwritten by .with_locations(...) below.

{
return plan_err!(
"All locations of a CREATE EXTERNAL TABLE must have the \
same schema, but the provided locations have differing schemas"
Member

This error message does not help the user find the problematic schema or location.
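
One possible shape for a more helpful message is sketched below. It is an assumption-laden illustration: `check_same_schema` is a hypothetical helper, schemas are represented as plain strings instead of `SchemaRef`, and the error is a `String` rather than a DataFusion error.

```rust
/// Hypothetical check: `schemas` pairs each location with a textual
/// rendering of its schema. Returns an error naming the first location
/// whose schema differs from the schema of the first location.
fn check_same_schema(schemas: &[(String, String)]) -> Result<(), String> {
    let Some((first_location, first_schema)) = schemas.first() else {
        // No locations at all: nothing to compare.
        return Ok(());
    };
    for (location, schema) in &schemas[1..] {
        if schema != first_schema {
            return Err(format!(
                "All locations of a CREATE EXTERNAL TABLE must have the same schema, \
                 but '{location}' has schema {schema} while \
                 '{first_location}' has schema {first_schema}"
            ));
        }
    }
    Ok(())
}
```

Naming both the offending location and the reference location lets the user see which file to fix without re-inspecting every path.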

) -> Result<Arc<dyn TableProvider>> {
Ok(Arc::new(TestTableProvider {
-    url: cmd.location.to_string(),
+    url: cmd.locations.first().cloned().unwrap_or_default(),
Member

This could silently use "" if no locations are provided. It would be better to return an error.
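
A minimal sketch of the suggested behavior, assuming `locations` is a `Vec<String>`; `first_location` is a hypothetical helper and the `String` error stands in for whatever error type the test provider would actually return.

```rust
/// Hypothetical helper: return the first location, or an error when the
/// list is empty, instead of falling back to an empty string.
fn first_location(locations: &[String]) -> Result<String, String> {
    locations
        .first()
        .cloned()
        .ok_or_else(|| "CREATE EXTERNAL TABLE requires at least one location".to_string())
}
```

In the factory this would replace the `unwrap_or_default()` call, with `?` propagating the error to the caller.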

}
write!(f, ")")
} else {
write!(f, "LOCATION {}", self.location)
Member

I see that this is how it worked before, but shouldn't the locations be quoted?
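
For illustration, a quoted rendering might look like the sketch below. `LocationClause` and `quote_literal` are hypothetical names, and doubling embedded single quotes is the usual SQL string-literal convention; whether that round-trips through the DataFusion parser is an assumption here.

```rust
use std::fmt;

/// Hypothetical stand-in for the part of the statement being displayed.
struct LocationClause {
    locations: Vec<String>,
}

/// Wrap a value in single quotes, doubling any embedded single quote.
fn quote_literal(value: &str) -> String {
    format!("'{}'", value.replace('\'', "''"))
}

impl fmt::Display for LocationClause {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.locations.as_slice() {
            // One location: keep the original unparenthesized form.
            [single] => write!(f, "LOCATION {}", quote_literal(single)),
            // Several locations (or none): emit a parenthesized list.
            many => {
                write!(f, "LOCATION (")?;
                for (i, location) in many.iter().enumerate() {
                    if i > 0 {
                        write!(f, ", ")?;
                    }
                    write!(f, "{}", quote_literal(location))?;
                }
                write!(f, ")")
            }
        }
    }
}
```

Quoting also keeps a single path containing a comma, such as `a,b.parquet`, distinguishable from a two-element list when the statement is printed.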
