BigQuery Sink

A BigQuery sink requires the following variables to be set, in addition to the Generic ones.

`SINK_BIGQUERY_GOOGLE_CLOUD_PROJECT_ID`

The ID of the Google Cloud project that contains the BigQuery table into which records are inserted. See the Google Cloud documentation on project IDs for details.

Example value: gcp-project-id
Type: required

`SINK_BIGQUERY_TABLE_NAME`

The name of the BigQuery table. See the BigQuery documentation on table naming for details.

Example value: user_profile
Type: required

`SINK_BIGQUERY_DATASET_NAME`

The name of the dataset that contains the BigQuery table. See the BigQuery documentation on dataset naming for details.

Example value: customer
Type: required
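
The three settings above together identify the destination table. As a minimal sketch (the helper name `bigquery_table_id` is hypothetical and not part of the sink), they can be combined into the `project.dataset.table` form that BigQuery uses for table references:

```python
import os


def bigquery_table_id(env=os.environ):
    """Build a `project.dataset.table` identifier from the sink settings.

    Hypothetical helper for illustration only; raises KeyError if any of
    the three required variables is missing.
    """
    project = env["SINK_BIGQUERY_GOOGLE_CLOUD_PROJECT_ID"]
    dataset = env["SINK_BIGQUERY_DATASET_NAME"]
    table = env["SINK_BIGQUERY_TABLE_NAME"]
    return f"{project}.{dataset}.{table}"


# Example values taken from this page.
example_env = {
    "SINK_BIGQUERY_GOOGLE_CLOUD_PROJECT_ID": "gcp-project-id",
    "SINK_BIGQUERY_DATASET_NAME": "customer",
    "SINK_BIGQUERY_TABLE_NAME": "user_profile",
}
print(bigquery_table_id(example_env))  # gcp-project-id.customer.user_profile
```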

`SINK_BIGQUERY_DATASET_LABELS`

Labels to set on the BigQuery dataset, given as comma-separated key=value pairs. See the BigQuery documentation on labels for details.

Example value: owner=data-engineering,granularity=daily
Type: optional

`SINK_BIGQUERY_TABLE_LABELS`

Labels to set on the BigQuery table, given as comma-separated key=value pairs. See the BigQuery documentation on labels for details.

Example value: owner=data-engineering,granularity=daily
Type: optional
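
To illustrate the label format used by both label settings, here is a minimal sketch of turning the comma-separated string into key-value pairs. The function `parse_labels` is hypothetical, and its handling of blank or malformed entries is an assumption, not the sink's documented behavior:

```python
def parse_labels(raw):
    """Parse 'key=value,key=value' into a dict.

    Assumption for this sketch: blank input means "no labels", and an
    entry without '=' or with an empty key is rejected.
    """
    labels = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty input and trailing commas
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed label: {pair!r}")
        labels[key.strip()] = value.strip()
    return labels


print(parse_labels("owner=data-engineering,granularity=daily"))
# {'owner': 'data-engineering', 'granularity': 'daily'}
```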

`SINK_BIGQUERY_TABLE_PARTITIONING_ENABLE`

Enables table partitioning. This setting supplies the partitioning configuration when the BigQuery table is created. Partitioning can only be configured once, at table creation, and cannot be disabled afterwards. Changing this value later causes an error when the application tries to update the BigQuery table. See the BigQuery documentation on table partitioning for details.

Example value: true
Type: required
Default value: false

`SINK_BIGQUERY_TABLE_PARTITION_KEY`

The name of the BigQuery field used for table partitioning. Only a BigQuery TIMESTAMP column is supported as the partitioning key, and the sink currently supports only the DAY time-partitioning type. See the BigQuery documentation on column time partitioning for details.

Example value: event_timestamp
Type: required

`SINK_BIGQUERY_TABLE_CLUSTERING_ENABLE`

Enables table clustering. This setting supplies the clustering configuration when the BigQuery table is created or modified. Changing this value later for an existing table modifies that table's clustering configuration. See the BigQuery documentation on table clustering for details.

Example value: true
Type: required
Default value: false

`SINK_BIGQUERY_TABLE_CLUSTERING_KEYS`

The names of the BigQuery fields used for table clustering, separated by commas. You can specify up to four clustering columns. See the BigQuery documentation on table clustering columns for details.

Example value: id,name,age,city
Type: required
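
The four-column limit can be checked before the sink starts. The following is a minimal sketch; `parse_clustering_keys` is a hypothetical helper, and rejecting an empty list is an assumption made for illustration:

```python
MAX_CLUSTERING_COLUMNS = 4  # limit stated in the description above


def parse_clustering_keys(raw):
    """Split a comma-separated key list and enforce the column limit.

    The order of the keys is preserved, since BigQuery sorts clustered
    data by the columns in the order they are listed.
    """
    keys = [key.strip() for key in raw.split(",") if key.strip()]
    if not keys:
        raise ValueError("at least one clustering key is required")
    if len(keys) > MAX_CLUSTERING_COLUMNS:
        raise ValueError(
            f"got {len(keys)} clustering keys, at most "
            f"{MAX_CLUSTERING_COLUMNS} are allowed"
        )
    return keys


print(parse_clustering_keys("id,name,age,city"))
# ['id', 'name', 'age', 'city']
```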

`SINK_BIGQUERY_ROW_INSERT_ID_ENABLE`

Enables adding an insert ID to each row, intended for deduplication when inserting new records into BigQuery. See the BigQuery documentation on streaming insert deduplication for details.

Example value: false
Type: required
Default value: true

`SINK_BIGQUERY_CREDENTIAL_PATH`

The full path of the Google Cloud credentials file. See the Google Cloud documentation on authentication and credentials for details.

Example value: /.secret/google-cloud-credentials.json
Type: required

`SINK_BIGQUERY_METADATA_NAMESPACE`

The name of a column added alongside the existing BigQuery columns. This column contains a struct holding the metadata of the inserted record. When this setting is not configured, the metadata column is not added to the table.

Example value: kafka_metadata
Type: optional

`SINK_BIGQUERY_DATASET_LOCATION`

The geographic region in which the BigQuery dataset is located. See the BigQuery documentation on dataset locations for details.

Example value: us-central1
Type: optional
Default value: asia-southeast1

`SINK_BIGQUERY_TABLE_PARTITION_EXPIRY_MS`

The partition expiration time of the BigQuery table, in milliseconds. Setting this to -1 disables partition expiration. See the BigQuery documentation on table partition expiration for details.

Example value: 2592000000
Type: optional
Default value: -1
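
Because the value is given in milliseconds, it helps to derive it from a human-readable retention period. A minimal sketch of that arithmetic, using hypothetical helper names, shows that the example value above corresponds to 30 days:

```python
from datetime import timedelta

NO_EXPIRY = -1  # the documented value that disables partition expiration


def partition_expiry_ms(retention):
    """Convert a retention period into whole milliseconds."""
    return retention // timedelta(milliseconds=1)


def expiry_as_timedelta(expiry_ms):
    """Return the retention period, or None when expiration is disabled."""
    if expiry_ms == NO_EXPIRY:
        return None
    return timedelta(milliseconds=expiry_ms)


# 30 days * 24 h * 60 min * 60 s * 1000 ms
print(partition_expiry_ms(timedelta(days=30)))  # 2592000000
```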

`SINK_BIGQUERY_CLIENT_READ_TIMEOUT_MS`

The BigQuery client HTTP read timeout, in milliseconds. Use 0 for an infinite timeout, or a negative number to use the default value (20000).

Example value: 20000
Type: optional
Default value: -1

`SINK_BIGQUERY_CLIENT_CONNECT_TIMEOUT_MS`

The BigQuery client HTTP connection timeout, in milliseconds. Use 0 for an infinite timeout, or a negative number to use the default value (20000).

Example value: 20000
Type: optional
Default value: -1
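
Both timeout settings follow the same three-way rule: negative, zero, or positive. A minimal sketch of that rule, with `effective_timeout_ms` as a hypothetical helper that represents an infinite timeout as `None`:

```python
DEFAULT_TIMEOUT_MS = 20000  # default stated in the descriptions above


def effective_timeout_ms(configured):
    """Resolve a configured timeout to the value that would apply.

    Returns None for an infinite timeout (a representation chosen for
    this sketch only).
    """
    if configured < 0:
        return DEFAULT_TIMEOUT_MS  # negative: fall back to the default
    if configured == 0:
        return None  # zero: wait indefinitely
    return configured


for value in (-1, 0, 5000):
    print(value, "->", effective_timeout_ms(value))
# -1 -> 20000
# 0 -> None
# 5000 -> 5000
```

With the documented default of -1, both timeouts therefore resolve to 20000 ms.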

`SINK_BIGQUERY_ADD_METADATA_ENABLED`

A boolean value to enable adding metadata columns to output.

Example value: true
Type: optional
Default value: true

`SINK_BIGQUERY_METADATA_COLUMNS_TYPES`

Metadata columns and their types to be added.

Example value: message_offset=integer,message_topic=string,load_time=timestamp,message_timestamp=timestamp,message_partition=integer
Type: optional
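
The example value above maps each metadata column name to a type. As a minimal sketch of reading that format (the function `parse_metadata_columns` is hypothetical, and no check is made against the set of types the sink accepts, which this page does not list):

```python
def parse_metadata_columns(raw):
    """Parse 'column=type,column=type' into an ordered name-to-type dict."""
    columns = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, sep, column_type = entry.partition("=")
        if not sep or not name.strip() or not column_type.strip():
            raise ValueError(f"malformed column definition: {entry!r}")
        columns[name.strip()] = column_type.strip()
    return columns


example = (
    "message_offset=integer,message_topic=string,load_time=timestamp,"
    "message_timestamp=timestamp,message_partition=integer"
)
for name, column_type in parse_metadata_columns(example).items():
    print(f"{name}: {column_type}")
```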

Configs for JSON input data type:

`SINK_BIGQUERY_DEFAULT_COLUMNS`

Default list of columns to be added when creating the table.

Example value: event_timestamp=timestamp,first_name=string

`SINK_BIGQUERY_ADD_EVENT_TIMESTAMP_ENABLE`

A boolean value to enable injecting event_timestamp, with the ingestion time as its value.

Example value: true
Type: optional boolean
Default value: false

`SINK_BIGQUERY_DYNAMIC_SCHEMA_ENABLE`

A boolean value to enable inferring the schema from incoming data.

Example value: true
Type: optional boolean
Default value: true

`SINK_BIGQUERY_DEFAULT_DATATYPE_STRING_ENABLE`

A boolean value to enable converting all incoming JSON values to strings.

Example value: true
Type: optional boolean
Default value: true
