Commit 0037741

Edit integrations/dbt (#606)
1 parent f0295bd commit 0037741

1 file changed

Lines changed: 62 additions & 70 deletions

docs/integrations/dbt.md

@@ -1,19 +1,23 @@
11
# dbt
22

3-
SQLMesh has native support for running dbt projects. This featuring is currently under development. You can view the development backlog [here](https://github.com/orgs/TobikoData/projects/1/views/3). If you are interested in this feature, we encourage you to try it with your dbt projects and submit issues here (https://github.com/TobikoData/sqlmesh/issues), so we can make it more robust.
3+
SQLMesh has native support for reading dbt projects.
44

5-
## Getting Started
6-
### Importing a dbt project
5+
**Note:** This feature is currently under development. You can view the [development backlog](https://github.com/orgs/TobikoData/projects/1/views/3) to see what improvements are already planned. If you are interested in this feature, we encourage you to try it with your dbt projects and [submit issues](https://github.com/TobikoData/sqlmesh/issues) so we can make it more robust.
76

8-
A SQLMesh project can be configured during initialization to read from a dbt formatted project. To do so, run the following command within the dbt project root:
7+
## Getting started
8+
### Reading a dbt project
9+
10+
Create a SQLMesh project from an existing dbt project by running the `init` command *within the dbt project root directory* and with the `dbt` template option:
911

1012
```bash
1113
$ sqlmesh init -t dbt
1214
```
1315

14-
The target specified in your `profiles.yml` file will be used by default. The target can be changed at anytime.
16+
SQLMesh will use the data warehouse connection target in your dbt project `profiles.yml` file. The target can be changed at any time.
17+
18+
### Setting model backfill start dates
1519

16-
**Note:** Models require a start date for backfilling data through use of the `start` configuration parameter. Start can be defined for each model, or globally in the `dbt_project.yml` file as follows:
20+
Models **require** a start date for backfilling data through use of the `start` configuration parameter. `start` can be defined individually for each model, or globally in the `dbt_project.yml` file as follows:
1721

1822
```
1923
> models:
@@ -22,36 +26,44 @@ The target specified in your `profiles.yml` file will be used by default. The ta
2226

2327
### Running SQLMesh
2428

25-
Link to how to normally run sqlmesh here (plan, run). Continue to use your dbt format.
29+
Run SQLMesh as with any SQLMesh project, generating and applying [plans](../concepts/overview.md#make-a-plan), running [tests](../concepts/overview.md#tests) or [audits](../concepts/overview.md#audits), and executing models with a [scheduler](../guides/scheduling.md) if desired.
2630

27-
### Workflow differences between SQLMesh and dbt
31+
You continue to use your dbt file and project format.
2832

29-
The following are considerations when importing a dbt project:
33+
## Workflow differences between SQLMesh and dbt
3034

31-
* SQLMesh will detect and deploy new or modified seeds as part of running the `plan` command and applying changes. There is no separate seed command. Refer to [seed models](/concepts/models/seed_models) for more information.
32-
* The `plan` command dynamically creates environments, and therefore environments do not need to be hardcoded into your `profiles.yml` file as targets. To get the most out of SQLMesh, point your profile target at the production target, and let SQLMesh handle the rest for you.
33-
* dbt tests are considered [audits](/concepts/audits) in SQLMesh. SQLMesh tests are [unit tests](/concepts/tests), which test query logic before applying a plan.
34-
* SQLMesh's incremental models track which intervals have been filled and automatically detects and fills interval gaps. dbt does not support intervals and their recommended incremental logic is not compatible, requiring small tweaks to the models (don't worry dbt compatibility is maintained).
35+
Consider the following when using a dbt project:
3536

36-
## How to use SQLMesh incremental models within dbt
37+
* SQLMesh will detect and deploy new or modified seeds as part of running the `plan` command and applying changes - there is no separate seed command. Refer to [seed models](../concepts/models/seed_models.md) for more information.
38+
* The `plan` command dynamically creates environments, so environments do not need to be hardcoded into your `profiles.yml` file as targets. To get the most out of SQLMesh, point your dbt profile target at the production target, and let SQLMesh handle the rest for you.
39+
* The term "test" has a different meaning in dbt than in SQLMesh:
40+
- dbt "tests" are [audits](../concepts/audits.md) in SQLMesh.
41+
- SQLMesh "tests" are [unit tests](../concepts/tests.md), which test query logic before applying a SQLMesh plan.
42+
* dbt's recommended incremental logic is not compatible with SQLMesh, so small tweaks to the models are required (don't worry - dbt can still use the models!).
3743

38-
SQLMesh's incremental models track uses true incremental models, which are capable of detecting and backfilling any missing intervals. dbt's incremental logic does not support intervals, and is not compatible with SQLMesh.
44+
## How to use SQLMesh incremental models with dbt projects
3945

40-
### Mapping dbt incremental to SQLMesh incremental
41-
SQLMesh supports [idempotent](/concepts/glossary#idempotency) incremental loads through the use of merge (sqlmesh calls this `incremental_by_unique_key`) and insert-overwrite/delete+insert (sqlmesh calls this `incremental_by_time`) incremental strategies. Append is not currently supported and not recommended due to not being idempotent.
46+
Incremental loading is a powerful technique when datasets are large and recomputing tables is expensive. SQLMesh offers first-class support for incremental models, and its approach differs from dbt's.
4247

48+
SQLMesh automatically detects and offers to backfill missing time intervals for incremental models. dbt's incremental logic does not support intervals and is not compatible with SQLMesh.
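
The idea of detecting missing intervals can be illustrated with a small, self-contained Python sketch. This is only a conceptual illustration, not SQLMesh's implementation: the function name `missing_days`, the daily interval size, and the example dates are all assumptions made for the example.

```python
from datetime import date, timedelta


def missing_days(start, end, filled):
    """Return every day in [start, end] that has not been loaded yet.

    Hypothetical helper for illustration only; SQLMesh tracks intervals
    internally and does not expose this function.
    """
    days = []
    current = start
    while current <= end:
        if current not in filled:
            days.append(current)
        current += timedelta(days=1)
    return days


# Assumed state: the model's `start` is 2023-01-01 and three days were loaded.
filled = {date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 4)}
gaps = missing_days(date(2023, 1, 1), date(2023, 1, 5), filled)
print(gaps)  # the two days that still need a backfill
```

A tool that records which intervals are filled can compute gaps like these and offer to backfill only them, instead of relying on each model's own `is_incremental()` logic.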
4349

50+
This section describes how to implement SQLMesh incremental models in a dbt-formatted project.
4451

45-
#### Merge modifications
52+
### dbt's incremental logic
53+
dbt's incremental logic is implemented with jinja blocks gated by `{% if is_incremental() %}`.
4654

55+
Existing uses of these blocks do not need to be removed from the dbt project's models, but SQLMesh will ignore them.
4756

57+
### SQLMesh's incremental logic
58+
SQLMesh's incremental logic is implemented in dbt projects with jinja blocks gated by `{% if sqlmesh is defined %}`.
4859

49-
#### Insert-overwrite and delete+insert modifications
50-
1. For insert-overwrite, add a `time_column` configuration field with the value of the name of the model's time column to use.
60+
SQLMesh supports two approaches to implement [idempotent](../concepts/glossary.md#idempotency) incremental loads:
61+
- Using merge (with the SQLMesh [`incremental_by_unique_key` model kind](../concepts/models/model_kinds.md#incremental_by_unique_key))
62+
- Using insert-overwrite/delete+insert (with the SQLMesh [`incremental_by_time_range` model kind](../concepts/models/model_kinds.md#incremental_by_time_range))
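
To see why these two strategies are idempotent while a plain append is not, consider the following minimal Python sketch. It models a table as a list of dictionaries. The function names, the `id` key, and the sample rows are hypothetical and only illustrate the concept, not how SQLMesh or any warehouse executes the load.

```python
def append_load(table, rows):
    """Append: rerunning the same load duplicates rows (not idempotent)."""
    return table + rows


def insert_overwrite_load(table, rows, start_ds, end_ds):
    """Delete the interval's existing rows, then insert the new ones."""
    kept = [r for r in table if not (start_ds <= r["ds"] <= end_ds)]
    return kept + rows


def merge_load(table, rows, key):
    """Merge on a unique key: existing keys are replaced, new keys added."""
    merged = {r[key]: r for r in table}
    merged.update({r[key]: r for r in rows})
    return list(merged.values())


# Assumed sample data: one existing row and a two-row batch for 2023-01-02.
existing = [{"id": 0, "ds": "2023-01-01"}]
batch = [{"id": 1, "ds": "2023-01-02"}, {"id": 2, "ds": "2023-01-02"}]

# Running the same load twice leaves the table unchanged for both strategies.
once = insert_overwrite_load(existing, batch, "2023-01-02", "2023-01-02")
twice = insert_overwrite_load(once, batch, "2023-01-02", "2023-01-02")

merged_once = merge_load(existing, batch, "id")
merged_twice = merge_load(merged_once, batch, "id")

# Appending twice produces duplicate rows.
appended_twice = append_load(append_load(existing, batch), batch)
print(len(once), len(twice), len(appended_twice))
```

Because rerunning an interval yields the same table, a failed or repeated backfill can be retried safely under either strategy.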
5163

52-
As mentioned in the workflow changes, a small model tweak is required. In order to maintain backwards compatibility with dbt, SQLMesh will ignore any jinja blocks using `{% if is_incremental() %}`, and will instead ask you define a new jinja block gated by `{% if sqlmesh is defined %}`.
64+
A model using the insert-overwrite approach must specify the model's time column. The following example jinja block is for an `INCREMENTAL_BY_TIME_RANGE` model kind with a `time_column` named "ds".
5365

54-
For example, for incremental by time using a ds `time_column`:
66+
The SQL `WHERE` clause selecting a time interval with the "ds" column goes in a jinja block gated by `{% if sqlmesh is defined %}`:
5567

5668
```bash
5769
> {% if sqlmesh is defined %}
@@ -60,71 +72,51 @@ For example, for incremental by time using a ds `time_column`:
6072
> {% endif %}
6173
```
6274
63-
For more information about how to use different time types or unique keys, refer to [incremental model kinds](/concepts/models/model_kinds).
75+
Note that you must use standard jinja macro notation rather than the special SQLMesh interval macros (e.g., `{{ start_ds }}` instead of `@start_ds`).
6476
65-
### Unit Tests
77+
For more information about how to use different time types or unique keys with incremental loads, refer to [incremental model kinds](../concepts/models/model_kinds.md).
78+
79+
## Unit Tests
6680
SQLMesh [unit tests](../concepts/tests.md) work the same way in dbt projects as in native SQLMesh projects. They go in the same folder as dbt tests (audits).
6781
68-
## Using airflow
69-
Setup airflow following the airflow docs section
82+
## Using Airflow
83+
To use SQLMesh and dbt projects with Airflow, first configure SQLMesh to use Airflow as described in the [Airflow integrations documentation](./airflow.md).
7084
71-
In config.py within the project root dir, add:
85+
Then, add the following to `config.py` within the project root directory:
7286
7387
```bash
7488
> airflow_config = sqlmesh_config(Path(__file__).parent, scheduler=AirflowSchedulerConfig())
7589
```
7690
77-
See airflow docs for AirflowSchedulerConfig configuration options.
78-
91+
See the [Airflow configuration documentation](https://airflow.apache.org/docs/apache-airflow/2.1.0/configurations-ref.html) for a list of all AirflowSchedulerConfig configuration options.
7992
8093
## Supported dbt jinja methods
8194
82-
The majority of dbt jinja methods are supported. Here is a list (it'd be nice if the list was multiple columns so it wasn't so long):
83-
84-
- adapter
85-
- as_bool
86-
- as_native
87-
- as_number
88-
- as_text
89-
- api
90-
- builtins
91-
- config
92-
- env_var
93-
- exceptions
94-
- from_yaml
95-
- is_incremental (always returns false, see incremental section)
96-
- load_result
97-
- log
98-
- modules
99-
- print
100-
- project_name
101-
- ref
102-
- return
103-
- run_query
104-
- schema
105-
- set
106-
- source
107-
- statement
108-
- target
109-
- this
110-
- to_yaml
111-
- var
112-
- zip
95+
The majority of dbt jinja methods are supported, including:
11396
114-
## Unsupported dbt features
115-
116-
SQLMesh is continuously adding more dbt features
97+
| Method | Method | Method | Method
98+
| ------ | ------ | ------ | ------
99+
| adapter | env_var | project_name | target
100+
| as_bool | exceptions | ref | this
101+
| as_native | from_yaml | return | to_yaml
102+
| as_number | is_incremental (ignored, see [above](#insert-overwrite-and-deleteinsert-modifications)) | run_query | var
103+
| as_text | load_result | schema | zip
104+
| api | log | set |
105+
| builtins | modules | source |
106+
| config | print | statement |
117107
118-
Not an exhaustive list, but trying to catch the major features
119-
120-
dbt deps
121-
- While SQLMesh can read dbt packages, it does not currently support managing those packages. Continue to use dbt deps and dbt clean to update, add, or remove packages. For more information, refer to [dbt deps](https://docs.getdbt.com/reference/commands/deps).
108+
## Unsupported dbt features
122109
123-
dbt test not currently supported, but in development
110+
SQLMesh is continuously adding more dbt features. This is a list of major features that are currently unsupported, but it is not exhaustive:
124111
125-
dbt docs is not supported, snapshots not supported
112+
* dbt deps
113+
- While SQLMesh can read dbt packages, it does not currently support managing those packages.
114+
- Continue to use dbt deps and dbt clean to update, add, or remove packages. For more information, refer to the [dbt deps](https://docs.getdbt.com/reference/commands/deps) documentation.
115+
* dbt test (in development)
116+
* dbt docs
117+
* dbt snapshots
126118
127119
## Missing something you need?
128120
129-
Submit an issue here (https://github.com/TobikoData/sqlmesh/issues) and we'll look into it
121+
Submit an [issue](https://github.com/TobikoData/sqlmesh/issues), and we'll look into it!
130122
