A BigQuery sink requires the following variables to be set, in addition to the generic ones.
The Google Cloud project ID of the BigQuery table into which records are inserted. See the Google Cloud documentation on project IDs for details.
- Example value: gcp-project-id
- Type: required
The name of the BigQuery table. See the BigQuery documentation on table naming for details.
- Example value: user_profile
- Type: required
The name of the dataset that contains the BigQuery table. See the BigQuery documentation on dataset naming for details.
- Example value: customer
- Type: required
Labels to set on the BigQuery dataset, given as comma-separated key=value pairs. See the BigQuery documentation on labels for details.
- Example value: owner=data-engineering,granularity=daily
- Type: optional
Labels to set on the BigQuery table, given as comma-separated key=value pairs. See the BigQuery documentation on labels for details.
- Example value: owner=data-engineering,granularity=daily
- Type: optional
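The label format described above (comma-separated key=value pairs) can be illustrated with a small parser. This is only a sketch: `parse_labels` is a hypothetical helper, not part of the sink, and the sink's own parsing rules (whitespace handling, error behaviour) may differ.

```python
from typing import Dict


def parse_labels(raw: str) -> Dict[str, str]:
    """Parse 'key=value,key=value' into a dict (hypothetical helper)."""
    labels: Dict[str, str] = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            # Ignore empty segments such as a trailing comma (assumption).
            continue
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed label: {pair!r}")
        labels[key.strip()] = value.strip()
    return labels


if __name__ == "__main__":
    print(parse_labels("owner=data-engineering,granularity=daily"))
```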
Enables table partitioning. This setting supplies the partitioning configuration when the BigQuery table is created. Partitioning can only be set once, at table creation, and cannot be disabled afterwards. Changing this value later causes an error when the application tries to update the BigQuery table. See the BigQuery documentation on table partitioning for details.
- Example value: true
- Type: required
- Default value: false
Defines the BigQuery field name used for table partitioning. Only a BigQuery TIMESTAMP column is supported as the partitioning key. Currently, this sink supports only the DAY time-partitioning type. See the BigQuery documentation on column time partitioning for details.
- Example value: event_timestamp
- Type: required
Enables table clustering. This setting supplies the clustering configuration when the BigQuery table is created or modified. Changing this value later for an existing table modifies that table's clustering configuration. See the BigQuery documentation on table clustering for details.
- Example value: true
- Type: required
- Default value: false
Defines the BigQuery field names used for table clustering, as a comma-separated list. You can specify up to four clustering columns. See the BigQuery documentation on table clustering columns for details.
- Example value: id,name,age,city
- Type: required
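As a rough illustration of the four-column limit, the check below splits a comma-separated value and rejects lists that are empty or too long. `parse_clustering_keys` is a hypothetical helper written for this example; how the sink itself reports an invalid value is not specified here.

```python
from typing import List

MAX_CLUSTERING_COLUMNS = 4  # limit stated in the description above


def parse_clustering_keys(raw: str) -> List[str]:
    """Split a comma-separated column list and enforce the four-column limit."""
    keys = [key.strip() for key in raw.split(",") if key.strip()]
    if not keys:
        raise ValueError("at least one clustering column is required")
    if len(keys) > MAX_CLUSTERING_COLUMNS:
        raise ValueError(
            f"got {len(keys)} clustering columns, at most {MAX_CLUSTERING_COLUMNS} allowed"
        )
    return keys


if __name__ == "__main__":
    print(parse_clustering_keys("id,name,age,city"))
```

The order of the returned list matches the order in the configured value, which matters because BigQuery sorts clustered data by the columns in the order given.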
Enables adding a row ID, intended for deduplication, when inserting new records into BigQuery. See the BigQuery documentation on streaming insert deduplication for details.
- Example value: false
- Type: required
- Default value: true
The full path of the Google Cloud credentials file. See the Google Cloud documentation on authentication and credentials for details.
- Example value: /.secret/google-cloud-credentials.json
- Type: required
The name of a column that will be added alongside the existing BigQuery columns. This column contains a struct holding the metadata of the inserted record. When this setting is not configured, the metadata column is not added to the table.
- Example value: kafka_metadata
- Type: optional
The geographic region in which the BigQuery dataset is located. See the BigQuery documentation on dataset locations for details.
- Example value: us-central1
- Type: optional
- Default value: asia-southeast1
The expiration time of BigQuery table partitions, in milliseconds. Setting this to -1 disables table partition expiration. See the BigQuery documentation on table partition expiration for details.
- Example value: 2592000000
- Type: optional
- Default value: -1
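The example value 2592000000 corresponds to 30 days expressed in milliseconds. The sketch below shows the conversion in both directions; the function names are hypothetical, and rejecting negative values other than -1 is an assumption made for this example, not documented sink behaviour.

```python
from datetime import timedelta
from typing import Optional


def days_to_expiry_ms(days: int) -> int:
    """Convert a number of days to milliseconds using exact integer arithmetic."""
    return timedelta(days=days) // timedelta(milliseconds=1)


def expiry_from_config(value_ms: int) -> Optional[timedelta]:
    """Return the expiration as a timedelta, or None when -1 disables it."""
    if value_ms == -1:
        return None
    if value_ms < 0:
        # Assumption: other negative values are treated as invalid here.
        raise ValueError(f"invalid partition expiry: {value_ms}")
    return timedelta(milliseconds=value_ms)


if __name__ == "__main__":
    print(days_to_expiry_ms(30))           # 2592000000
    print(expiry_from_config(2592000000))  # 30 days, 0:00:00
    print(expiry_from_config(-1))          # None
```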
The HTTP read timeout of the BigQuery client, in milliseconds. Use 0 for an infinite timeout, or a negative number for the default value (20000).
- Example value: 20000
- Type: optional
- Default value: -1
The HTTP connection timeout of the BigQuery client, in milliseconds. Use 0 for an infinite timeout, or a negative number for the default value (20000).
- Example value: 20000
- Type: optional
- Default value: -1
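The three cases for the read and connection timeouts (positive, zero, negative) can be summarised in a few lines. This is a sketch of the rule as described above, with a hypothetical function name; representing "infinite" as `None` is a choice made for this example only.

```python
from typing import Optional

DEFAULT_TIMEOUT_MS = 20000  # default named in the descriptions above


def effective_timeout_ms(configured: int) -> Optional[int]:
    """Resolve a configured timeout to the value actually applied.

    Returns None for an infinite timeout (an illustration-only convention).
    """
    if configured < 0:
        return DEFAULT_TIMEOUT_MS  # negative: fall back to the default
    if configured == 0:
        return None                # zero: wait indefinitely
    return configured              # positive: use as given


if __name__ == "__main__":
    for value in (-1, 0, 5000):
        print(value, "->", effective_timeout_ms(value))
```

With the default configuration value of -1, the effective timeout is therefore 20000 milliseconds.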
A boolean value that enables adding metadata columns to the output.
- Example value: true
- Type: optional
- Default value: true
The metadata columns to be added and their types, given as comma-separated name=type pairs.
- Example value: message_offset=integer,message_topic=string,load_time=timestamp,message_timestamp=timestamp,message_partition=integer
- Type: optional
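To make the name=type format concrete, the sketch below turns the example value into an ordered list of (column, type) pairs. `parse_column_types` is a hypothetical helper; it does not check the type names against the set the sink accepts, since that set is not listed here.

```python
from typing import List, Tuple


def parse_column_types(raw: str) -> List[Tuple[str, str]]:
    """Parse 'name=type,name=type' into ordered (name, type) pairs."""
    columns: List[Tuple[str, str]] = []
    for pair in raw.split(","):
        name, sep, type_name = pair.strip().partition("=")
        if not sep or not name or not type_name:
            raise ValueError(f"malformed column definition: {pair!r}")
        # Lower-casing the type name is an assumption made for this example.
        columns.append((name.strip(), type_name.strip().lower()))
    return columns


if __name__ == "__main__":
    example = (
        "message_offset=integer,message_topic=string,load_time=timestamp,"
        "message_timestamp=timestamp,message_partition=integer"
    )
    for name, type_name in parse_column_types(example):
        print(f"{name}: {type_name}")
```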
The default list of columns to be added when the table is created, given as comma-separated name=type pairs.
- Example value: event_timestamp=timestamp,first_name=string
A boolean value that enables injecting event_timestamp with the ingestion time as its value.
- Example value: true
- Type: optional boolean
- Default value: false
A boolean value that enables inferring the schema from incoming data.
- Example value: true
- Type: optional boolean
- Default value: true
A boolean value that enables converting all incoming JSON values to strings.
- Example value: true
- Type: optional boolean
- Default value: true