diff --git a/docs/integrations/data-ingestion/etl-tools/estuary.md b/docs/integrations/data-ingestion/etl-tools/estuary/clickpipes.md similarity index 80% rename from docs/integrations/data-ingestion/etl-tools/estuary.md rename to docs/integrations/data-ingestion/etl-tools/estuary/clickpipes.md index c92862447c9..66b4ce4769f 100644 --- a/docs/integrations/data-ingestion/etl-tools/estuary.md +++ b/docs/integrations/data-ingestion/etl-tools/estuary/clickpipes.md @@ -1,8 +1,8 @@ --- -sidebar_label: 'Estuary' -slug: /integrations/estuary -description: 'Stream a variety of sources into ClickHouse with an Estuary integration' -title: 'Connect Estuary with ClickHouse' +sidebar_label: 'Connect with ClickPipes' +slug: /integrations/estuary/clickpipes +description: 'Set up an integration between Estuary and ClickHouse via ClickPipes' +title: 'Ingest Estuary data using ClickPipes' doc_type: 'guide' integration: - support_level: 'partner' @@ -15,14 +15,18 @@ import PartnerBadge from '@theme/badges/PartnerBadge'; -[Estuary](https://estuary.dev/) is a right-time data platform that flexibly combines real-time and batch data in simple-to-setup ETL pipelines. With enterprise-grade security and deployment options, Estuary unlocks durable data flows from SaaS, database, and streaming sources to a variety of destinations, including ClickHouse. +Estuary can connect with ClickHouse via the Kafka ClickPipe. -Estuary connects with ClickHouse via the Kafka ClickPipe. You don't need to maintain your own Kafka ecosystem with this integration. +You don't need to maintain your own Kafka ecosystem with this integration. Instead, Estuary emits new data as Kafka messages. You can configure a Kafka ClickPipe to use Estuary's broker and schema registry information to consume these messages. + +See [Estuary's direct ClickHouse integration](/integrations/estuary/native) for an alternative.
## Setup guide {#setup-guide} **Prerequisites** +You will need: + * An [Estuary account](https://dashboard.estuary.dev/register) * One or more [**captures**](https://docs.estuary.dev/concepts/captures/) in Estuary that pull data from your desired sources * A ClickHouse Cloud account with ClickPipe permissions @@ -37,7 +41,7 @@ To move data from your source collections in Estuary to ClickHouse, you will fir 2. Click **+ New Materialization**. -3. Select the **ClickHouse** connector. +3. Select the **ClickHouse Kafka API** connector. 4. Fill out details in the Materialization, Endpoint, and Source Collections sections: @@ -99,10 +103,8 @@ ClickHouse will provision your new data source and start consuming messages from ## Additional resources {#additional-resources} -For more on setting up an integration with Estuary, see Estuary's documentation: +For more on setting up a ClickPipe integration with Estuary, see Estuary's documentation: -* Reference Estuary's [ClickHouse materialization docs](https://docs.estuary.dev/reference/Connectors/materialization-connectors/Dekaf/clickhouse/). +* Reference Estuary's [ClickHouse materialization docs](https://docs.estuary.dev/reference/Connectors/materialization-connectors/Dekaf/clickhouse/) for the ClickPipes integration. * Estuary exposes data as Kafka messages using **Dekaf**. You can learn more about Dekaf [here](https://docs.estuary.dev/guides/dekaf_reading_collections_from_kafka/). - -* To see a list of sources that you can stream into ClickHouse with Estuary, check out [Estuary's capture connectors](https://docs.estuary.dev/reference/Connectors/capture-connectors/). 
diff --git a/docs/integrations/data-ingestion/etl-tools/estuary/index.md b/docs/integrations/data-ingestion/etl-tools/estuary/index.md new file mode 100644 index 00000000000..53c3cadf506 --- /dev/null +++ b/docs/integrations/data-ingestion/etl-tools/estuary/index.md @@ -0,0 +1,39 @@ +--- +sidebar_label: 'Estuary' +slug: /integrations/estuary +description: 'Stream SaaS, database, and other sources into ClickHouse with an Estuary integration' +title: 'Connect Estuary with ClickHouse' +doc_type: 'guide' +integration: + - support_level: 'partner' + - category: 'data_ingestion' + - website: 'https://estuary.dev' +keywords: ['estuary', 'data ingestion', 'etl', 'pipeline', 'data integration'] +--- + +import PartnerBadge from '@theme/badges/PartnerBadge'; + + + +[Estuary](https://estuary.dev/) is a right-time data platform that flexibly combines real-time and batch data in simple-to-setup ETL pipelines. With enterprise-grade security and deployment options, Estuary unlocks durable data flows from SaaS, database, and streaming sources to a variety of destinations, including ClickHouse. + +Estuary provides two main ways to integrate with ClickHouse: +* [Directly connect to your ClickHouse database](/integrations/estuary/native). +* [Connect via Kafka ClickPipes](/integrations/estuary/clickpipes). + +In both cases, Estuary handles data capture and movement. You don't need to maintain your own Kafka ecosystem or other infrastructure. + +## When to choose each integration {#choose-integration-type} + +Estuary's [direct ClickHouse materialization](/integrations/estuary/native) is recommended for most use cases. It is specifically designed to integrate with ClickHouse's native protocol and supports self-hosted deployments as well as ClickHouse Cloud instances. + +Opt for the [ClickPipe integration](/integrations/estuary/clickpipes) instead if you specifically want to manage your pipelines via ClickPipes. This lets you handle incoming data as Kafka messages.
+ +## Additional resources {#additional-resources} + +For more on setting up an integration with Estuary, see Estuary's documentation: + +* [Explore Estuary's capabilities](https://docs.estuary.dev/). +* See reference documentation for Estuary's [direct ClickHouse materialization connector](https://docs.estuary.dev/reference/Connectors/materialization-connectors/ClickHouse/). +* See reference documentation for Estuary's [Kafka ClickPipe integration](https://docs.estuary.dev/reference/Connectors/materialization-connectors/Dekaf/clickhouse/). +* To see a list of sources that you can stream into ClickHouse with Estuary, check out [Estuary's capture connectors](https://docs.estuary.dev/reference/Connectors/capture-connectors/). diff --git a/docs/integrations/data-ingestion/etl-tools/estuary/native-protocol.md b/docs/integrations/data-ingestion/etl-tools/estuary/native-protocol.md new file mode 100644 index 00000000000..d7a837fdb2b --- /dev/null +++ b/docs/integrations/data-ingestion/etl-tools/estuary/native-protocol.md @@ -0,0 +1,125 @@ +--- +sidebar_label: 'Direct materialization connector' +slug: /integrations/estuary/native +description: 'Integrate Estuary with ClickHouse using a connector built on the native protocol' +title: 'Direct materialization from Estuary to ClickHouse' +doc_type: 'guide' +integration: + - support_level: 'partner' + - category: 'data_ingestion' + - website: 'https://estuary.dev' +keywords: ['estuary', 'data ingestion', 'etl', 'pipeline', 'data integration'] +--- + +import PartnerBadge from '@theme/badges/PartnerBadge'; + + + +Estuary provides a direct materialization connector for ClickHouse that uses ClickHouse's [native protocol](/interfaces/tcp) and [native format](/interfaces/formats/Native).
+ +This allows Estuary to: +* Materialize data to both self-hosted and ClickHouse Cloud instances +* Automatically handle tasks like table creation and schema evolution +* Support soft or hard deletes +* Use `ReplacingMergeTree` for standard merge updates or `MergeTree` for delta updates +* Provide exactly-once delivery + +See [Estuary's Kafka ClickPipe integration](/integrations/estuary/clickpipes) for a ClickPipes workflow. + +## Setup guide {#setup-guide} + +**Prerequisites** + +You will need: + +* An [Estuary account](https://dashboard.estuary.dev/register) +* One or more [**captures**](https://docs.estuary.dev/concepts/captures/) in Estuary that pull data from your desired sources +* A ClickHouse instance, either self-hosted or ClickHouse Cloud +* A ClickHouse database user with credentials + + + +### Configure ClickHouse for integration {#1-configure-clickhouse} + +To set up Estuary's ClickHouse connector, you will need to gather some information from your ClickHouse instance and configure user permissions. + +1. Copy your database's host endpoint. + + For the port, use **9440** if TLS is enabled or **9000** if TLS is disabled. + + Together, the host and port will form the **address** you need to provide to Estuary. + +2. Grant permissions to the database user that Estuary will use. + + To create and manage tables automatically, Estuary needs permissions such as `CREATE`, `SELECT`, and `INSERT` on your target database, as well as read access to system tables for metadata discovery and partition management. + + You can grant all required permissions by running these SQL commands, replacing `` and `` with your own information: + + ```sql + -- Target database access: CREATE TABLE, DROP TABLE, SELECT, INSERT, TRUNCATE, etc. + GRANT ALL ON .* TO ; + + -- System table access for metadata discovery and partition management. + -- These are NOT covered by the database grant above.
+ GRANT SELECT ON system.columns TO ; + GRANT SELECT ON system.parts TO ; + GRANT SELECT ON system.tables TO ; + ``` + +3. Optionally, restrict the user's system table access to the target database only. + + You can do so with row-level policies. For example: + + ```sql + CREATE ROW POLICY estuary_columns ON system.columns FOR SELECT USING database = '' TO ; + CREATE ROW POLICY estuary_parts ON system.parts FOR SELECT USING database = '' TO ; + CREATE ROW POLICY estuary_tables ON system.tables FOR SELECT USING database = '' TO ; + ``` + +You can then move to Estuary to finish setup. + +### Create an Estuary materialization {#2-create-an-estuary-materialization} + +1. In Estuary's dashboard, go to the [Destinations](https://dashboard.estuary.dev/materializations) page. + +2. Click **+ New Materialization**. + +3. Select the **ClickHouse** connector. + +4. Fill out the **Materialization Details** section. + + * Provide a unique name for your materialization + * Choose a data plane (cloud provider and region) + +5. Fill out **Endpoint Config** details so Estuary can connect to your ClickHouse instance. + + * **Address:** the host and port of your instance + * **Database:** target database name + * **Authentication:** username and password for the database user + + You can also configure optional settings, such as whether to use hard deletes and which SSL mode to use. + +### Configure source collections {#3-configure-source-collections} + +Choose which sources you'd like to materialize into ClickHouse in the **Source Collections** section. + +1. Link an existing **capture** or add individual data collections to materialize to ClickHouse. + +2. Select a data collection from the list to configure further if necessary. Customization options include: + + * Choose a different table name for the collection + * Select merge behavior for the collection (whether to use delta updates mode) + * Customize field selection behavior to control which fields are materialized + +3.
Once you're happy with how data will be materialized to ClickHouse, click **Next** and **Save and Publish**. + +Estuary will start backfilling data from the selected collections to ClickHouse and then stream updates as they occur. + + + +## Additional resources {#additional-resources} + +For more on setting up a ClickHouse connector with Estuary, see Estuary's documentation: + +* Reference Estuary's [ClickHouse materialization docs](https://docs.estuary.dev/reference/Connectors/materialization-connectors/ClickHouse/). +* Besides the UI-based workflow provided in these instructions, you can also manage pipeline setup with Estuary via CLI. See Estuary's [guides on `flowctl`](https://docs.estuary.dev/guides/flowctl/ci-cd/) for more on working with Estuary programmatically. diff --git a/sidebars.js b/sidebars.js index 499e428dc5b..9d138616093 100644 --- a/sidebars.js +++ b/sidebars.js @@ -1191,7 +1191,21 @@ const sidebars = { ], }, 'integrations/data-ingestion/etl-tools/dlt-and-clickhouse', - 'integrations/data-ingestion/etl-tools/estuary', + { + type: 'category', + label: 'Estuary', + className: 'top-nav-item', + collapsed: true, + collapsible: true, + link: { + type: 'doc', + id: 'integrations/data-ingestion/etl-tools/estuary/index', + }, + items: [ + 'integrations/data-ingestion/etl-tools/estuary/native-protocol', + 'integrations/data-ingestion/etl-tools/estuary/clickpipes', + ], + }, { type: 'category', label: 'Fivetran',
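
Reviewer sketch (not part of the patch): the address rule and grants described in `native-protocol.md` can be illustrated with a small helper. This is a minimal sketch, assuming the address takes a `host:port` form and that the placeholders in the `GRANT` statements stand for the target database name and the database user. The function names, the host `ch.example.com`, the database `analytics`, and the user `estuary_user` are hypothetical and do not come from Estuary or ClickHouse.

```python
from typing import List


def clickhouse_address(host: str, tls: bool = True) -> str:
    """Build the connector address: port 9440 with TLS, 9000 without.

    The host:port form is an assumption about how the address is written.
    """
    port = 9440 if tls else 9000
    return f"{host}:{port}"


def grant_statements(database: str, user: str) -> List[str]:
    """Render the GRANT statements from the guide for one database and user.

    Assumes both names are plain identifiers that need no quoting.
    """
    statements = [f"GRANT ALL ON {database}.* TO {user};"]
    # System tables are not covered by the database-level grant.
    for table in ("columns", "parts", "tables"):
        statements.append(f"GRANT SELECT ON system.{table} TO {user};")
    return statements


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(clickhouse_address("ch.example.com", tls=True))
    for statement in grant_statements("analytics", "estuary_user"):
        print(statement)
```

Generating the statements from one place keeps the database-level grant and the three system-table grants in sync when the target database or user changes.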