The experience of developing and deploying data pipelines is more uncertain and dangerous than developing software.
Here are some challenges that data teams run into, especially when data sizes increase or the number of data users expands:
1. **Data pipelines are fragmented and fragile**

    Data pipelines generally consist of Python or SQL scripts that implicitly depend upon each other through tables. Changes to upstream scripts that break downstream dependencies are usually only detected at run time.
1. **Data quality checks are not sufficient**

    The data community has settled on data quality checks as the "solution" for testing data pipelines. Although data quality checks are great for detecting large unexpected data changes, they are expensive to run, and they have trouble validating exact logic.
1. **It's too hard and too costly to build staging environments for data**

    Validating changes to data pipelines before deploying to production is an uncertain and sometimes expensive process. Although branches can be deployed to environments, when merged to production, the code is re-run. This is wasteful and generates uncertainty because the data is regenerated.
1. **Silos transform data lakes into data swamps**

    The difficulty and cost of making changes to core pipelines can lead to duplicate pipelines with minor customizations. The inability to easily make and validate changes causes contributors to follow the "path of least resistance". The proliferation of similar tables leads to additional costs, inconsistencies, and maintenance burden.
## What is SQLMesh?
SQLMesh consists of a CLI, a Python API, and a Web UI to make data pipeline development and deployment easy, efficient, and safe.
### Core principles
SQLMesh was built on three core principles:
1. **Correctness is non-negotiable**

    Bad data is worse than no data. SQLMesh guarantees that your data will be consistent even in heavily collaborative environments.
1. **Change with confidence**

    SQLMesh summarizes the impact of changes and provides automated guardrails, empowering everyone to contribute safely and quickly.
1. **Efficiency without complexity**

    SQLMesh automatically optimizes your workloads by reusing tables and minimizing computation, saving you time and money.
### Key features
* **Efficient dev/staging environments**

    SQLMesh builds a virtual data mart using views, which allows you to seamlessly roll back or roll forward your changes. Any data computation you run for validation purposes is not wasted — with a cheap pointer swap, you re-use your “staging” data in production. This means you get unlimited copy-on-write environments that make data exploration and preview of changes fun and safe.
* **Automatic DAG generation by semantically parsing and understanding SQL or Python scripts**

    No need to manually tag dependencies — SQLMesh was built with the ability to understand your entire data warehouse’s dependency graph.
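As a toy illustration of the idea (not SQLMesh's implementation — SQLMesh uses a full SQL parser rather than pattern matching, and the model names below are made up), dependencies can be derived from the queries themselves and used to compute a safe execution order:

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models: name -> the SQL that produces it.
models = {
    "staging.orders": "SELECT * FROM raw.orders",
    "staging.customers": "SELECT * FROM raw.customers",
    "marts.daily_revenue": (
        "SELECT o.ds, SUM(o.amount) AS revenue "
        "FROM staging.orders AS o "
        "JOIN staging.customers AS c ON o.customer_id = c.id "
        "GROUP BY o.ds"
    ),
}

def referenced_tables(sql: str) -> set[str]:
    """Naively pull table names that follow FROM or JOIN."""
    return set(re.findall(r"(?:FROM|JOIN)\s+([\w.]+)", sql, re.IGNORECASE))

# Dependency graph: each model maps to the upstream models it reads from.
graph = {
    name: {dep for dep in referenced_tables(sql) if dep in models}
    for name, sql in models.items()
}

# A topological order is a valid execution sequence for the pipeline.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

Because the graph is derived from the queries, renaming or rewiring an upstream model is reflected in the execution order automatically, with no manually maintained dependency tags to drift out of date.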
* **Informative change summaries**

    Before making changes, SQLMesh will determine what has changed and show the entire graph of affected jobs.
* **CI-runnable unit and integration tests**

    Tests can be easily defined in YAML and run in CI. SQLMesh can optionally transpile your queries to DuckDB so that your tests can be self-contained.
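As a sketch of the YAML format (the model and column names here are hypothetical — see the SQLMesh testing documentation for the exact schema), a unit test declares input rows for upstream models and the rows the model's query is expected to produce:

```yaml
test_example_full_model:
  model: sqlmesh_example.full_model
  inputs:
    sqlmesh_example.incremental_model:
      rows:
        - id: 1
          item_id: 1
        - id: 2
          item_id: 1
  outputs:
    query:
      rows:
        - item_id: 1
          num_orders: 2
```

Because the fixture data is inlined, such a test needs no warehouse access and can run against DuckDB in CI.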
* **Smart change categorization**

    Column-level lineage automatically determines whether changes are “breaking” or “non-breaking”, allowing you to correctly categorize changes and to skip expensive backfills.
* **Easy incremental loads**

    Loading tables incrementally is as easy as a full refresh. SQLMesh transparently handles the complexity of tracking which intervals need loading, so all you have to do is specify a date filter.
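As a sketch, an incremental model can look like the following (the model, table, and column names are hypothetical; `@start_ds` and `@end_ds` are macro variables that SQLMesh substitutes with the bounds of each interval being loaded):

```sql
MODEL (
  name db.events_daily,
  kind INCREMENTAL_BY_TIME_RANGE (
    time_column ds
  )
);

SELECT
  ds,
  user_id,
  COUNT(*) AS num_events
FROM db.raw_events
-- SQLMesh fills in the interval bounds and tracks which intervals are loaded.
WHERE ds BETWEEN @start_ds AND @end_ds
GROUP BY ds, user_id
```

The same query serves both backfills and scheduled runs — SQLMesh simply invokes it once per missing interval.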
* **Integrated with Airflow**

    You can schedule jobs with our simple built-in scheduler or use your existing Airflow cluster. SQLMesh can dynamically generate and push Airflow DAGs. We aim to support other schedulers like Dagster and Prefect in the future.
* **Notebook / CLI**

    Interact with SQLMesh with whatever tool you’re comfortable with.
* **Web-based IDE (in development)**

    Edit, run, and visualize queries in your browser.
* **GitHub CI/CD bot (in development)**

    A bot to tie your code directly to your data.
* **Table/column-level lineage visualizations (in development)**

    Quickly understand the full lineage and sequence of transformations of any column.
## Next steps
* [Jump right in with the quickstart](quick_start.md)