Summary
When PR #247 (staging + COPY INTO bulk ingest) lands, geometry columns sent via `adbc_insert` with `geoarrow.wkb` Arrow metadata will need special handling. Databricks doesn't support direct ingestion of geometry types via COPY INTO — the data must arrive as BINARY (WKB) and be converted server-side with `ST_GeomFromWKB`. This is the same pattern already implemented in the Snowflake ADBC driver (adbc-drivers/snowflake#99) and proposed for Redshift (adbc-drivers/redshift#3).
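As a concrete illustration of the BINARY (WKB) payload the warehouse expects, here is a minimal sketch that hand-encodes a 2-D point in well-known-binary form. The encoding itself is the standard WKB layout; the helper name is ours:

```python
import struct

def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2-D point as little-endian WKB.

    Layout: 1-byte byte-order flag (1 = little-endian),
    uint32 geometry type (1 = Point), then two float64 coordinates.
    """
    return struct.pack("<BIdd", 1, 1, x, y)

# 21 bytes total: 1 (order) + 4 (type) + 8 + 8 (coords)
wkb = point_to_wkb(-122.4, 37.8)
```

These are the bytes that land in the staging table's BINARY column before `ST_GeomFromWKB` turns them into a geometry value.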
Proposed Solution
When the driver detects `geoarrow.wkb` or `geoarrow.wkt` in Arrow field extension metadata during ingest:

- Stage the column as `BINARY` (for WKB) or `STRING` (for WKT) in the staging table
- Bulk load through `BulkIngestManager` (#247's path — works fine for BINARY)
- Convert server-side in a CTAS step (`ST_GeomFromWKB`, or `ST_GeogFromWKB(geom_col)` for GEOGRAPHY), controlled via a Statement option

The full path: geoarrow.wkb → BINARY via Parquet → Volume → COPY INTO → CTAS `ST_GeomFromWKB`.
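The detection and conversion steps above can be sketched as follows. This is illustrative only — the function names and the `_wkb` staging-column convention are ours, not the driver's actual API:

```python
def staging_type(ext_name: str) -> str:
    """Column type used in the staging table for a geo extension type."""
    if ext_name == "geoarrow.wkb":
        return "BINARY"
    if ext_name == "geoarrow.wkt":
        return "STRING"
    raise ValueError(f"not a geo extension type: {ext_name}")

def ctas_sql(target: str, staging: str, geom_col: str,
             geography: bool = False) -> str:
    """Server-side conversion CTAS for WKB; GEOGRAPHY is toggled by a
    statement option in the proposal."""
    func = "ST_GeogFromWKB" if geography else "ST_GeomFromWKB"
    return (f"CREATE TABLE {target} AS SELECT *, "
            f"{func}({geom_col}_wkb) AS {geom_col} FROM {staging}")
```

The point of the sketch is that only the final CTAS string differs per warehouse; the staging and bulk-load steps are shared with #247.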
For GEOGRAPHY: `ST_GeogFromWKB(geom_col)` instead of `ST_GeomFromWKB`.

SRID from CRS metadata
The `geoarrow.wkb` field may carry CRS metadata (PROJJSON or `EPSG:NNNN`). This connects with PR #350 (geoarrow.wkb export), which already handles CRS on the export side — the import side should mirror that.
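A best-effort SRID extraction could look like the sketch below. It assumes the geoarrow convention of a JSON extension-metadata blob with a `"crs"` key holding either an `EPSG:NNNN` string or a PROJJSON object; the helper name is ours:

```python
import json
from typing import Optional

def srid_from_crs(ext_metadata_json: str) -> Optional[int]:
    """Extract an EPSG SRID from geoarrow extension metadata, if present.

    Handles the two shapes mentioned above: an "EPSG:NNNN" string, or a
    PROJJSON object whose "id" carries {"authority": "EPSG", "code": NNNN}.
    """
    crs = json.loads(ext_metadata_json).get("crs")
    if isinstance(crs, str) and crs.upper().startswith("EPSG:"):
        return int(crs.split(":", 1)[1])
    if isinstance(crs, dict):
        ident = crs.get("id", {})
        if ident.get("authority") == "EPSG":
            return int(ident["code"])
    return None
```

When no SRID can be recovered, the import side would simply skip SRID assignment, matching the export side's best-effort behavior.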
Prior Art

All three drivers (Databricks, Snowflake, Redshift) follow the same three-step pattern: staging as binary → bulk load → server-side conversion. The details differ only in the SQL dialect and staging mechanism.
Current Workaround
Users must manually convert geometry to WKB before calling `adbc_insert`, then run CTAS on Databricks:

```sql
-- In DuckDB: stage a copy with the geometry column converted to WKB
CREATE TABLE _import AS SELECT * EXCLUDE (geom), ST_AsWKB(geom) AS geom_wkb FROM source;

-- adbc_insert sends geom_wkb as BINARY (works with PR #247).
-- Then on Databricks:
CREATE TABLE final AS SELECT *, ST_GeomFromWKB(geom_wkb) AS geom FROM staging;
```

This is what our benchmark scripts do today, and it works at ~15-23K rows/sec. Making it transparent in the driver would enable a unified `adbc_insert` API for geometry across all warehouses.

Relationship to other PRs
- `BulkIngestManager` #247 — Staging + COPY INTO bulk ingest (prerequisite — provides the transport layer)