Commit 6cf03a1

update docs
1 parent 9a3b24d commit 6cf03a1

1 file changed

Lines changed: 49 additions & 20 deletions

File tree

docs/index.md

Here are some challenges that data teams run into, especially when data sizes increase or the number of data users expands:

### Data pipelines are fragmented and fragile
Data pipelines generally consist of Python or SQL scripts that implicitly depend upon each other through tables. Changes to upstream scripts that break downstream dependencies are usually only detected at run time.

### Data quality checks are not sufficient
The data community has settled on data quality checks as the "solution" for testing data pipelines. Although data quality checks are great for detecting large, unexpected data changes, they are expensive to run and have trouble validating exact logic.

### It's too hard and too costly to build staging environments for data
Validating changes to data pipelines before deploying to production is an uncertain and sometimes expensive process. Although branches can be deployed to environments, the code is re-run when merged to production. This is wasteful and generates uncertainty because the data is regenerated.

### Silos transform data lakes into data swamps
The difficulty and cost of making changes to core pipelines can lead to duplicate pipelines with minor customizations. The inability to easily make and validate changes causes contributors to follow the "path of least resistance". The proliferation of similar tables leads to additional costs, inconsistencies, and maintenance burden.

## What is SQLMesh?
SQLMesh consists of a CLI, a Python API, and a Web UI that make data pipeline development and deployment easy, efficient, and safe.

### Core principles
SQLMesh was built on three core principles:

#### Correctness is non-negotiable
Bad data is worse than no data. SQLMesh guarantees that your data will be consistent, even in heavily collaborative environments.

#### Change with confidence
SQLMesh summarizes the impact of changes and provides automated guardrails, empowering everyone to contribute safely and quickly.

#### Efficiency without complexity
SQLMesh automatically optimizes your workloads by reusing tables and minimizing computation, saving you time and money.

### Key features

#### Automatic DAG generation by semantically parsing and understanding SQL or Python scripts
No need to manually tag dependencies — SQLMesh was built with the ability to understand your entire data warehouse’s dependency graph.
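
For example, a downstream model only needs to reference an upstream model in its query for SQLMesh to pick up the dependency. A rough sketch with made-up model names:

```sql
-- models/stg_events.sql
MODEL (name example.stg_events);

SELECT ds, user_id, event_type
FROM raw.events;

-- models/event_counts.sql
MODEL (name example.event_counts);

SELECT ds, COUNT(*) AS num_events
FROM example.stg_events  -- this reference alone is enough to place stg_events upstream in the DAG
GROUP BY ds;
```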

#### Informative change summaries
Before making changes, SQLMesh will determine what has changed and show the entire graph of affected jobs.

#### Easy incremental loads
Loading tables incrementally is as easy as a full refresh. SQLMesh transparently handles the complexity of tracking which intervals need loading, so all you have to do is specify a date filter.
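
As a rough sketch (model and table names are made up for illustration), an incremental model declares a time column and filters on the date range that SQLMesh passes in for each run:

```sql
MODEL (
  name example.daily_user_events,
  kind INCREMENTAL_BY_TIME_RANGE (
    time_column ds
  )
);

SELECT
  ds,
  user_id,
  COUNT(*) AS num_events
FROM example.stg_events
WHERE ds BETWEEN @start_ds AND @end_ds  -- SQLMesh substitutes the interval being loaded or backfilled
GROUP BY ds, user_id;
```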

#### CI-runnable unit and integration tests
Tests can be defined in YAML and run in CI. SQLMesh can optionally transpile your queries to DuckDB so that your tests are self-contained.
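
A unit test for the incremental model sketched above might take roughly this shape (names and values are illustrative, not a definitive schema):

```yaml
test_daily_user_events:
  model: example.daily_user_events
  vars:
    start: 2023-01-01
    end: 2023-01-01
  inputs:
    example.stg_events:
      rows:
        - ds: 2023-01-01
          user_id: 1
          event_type: click
        - ds: 2023-01-01
          user_id: 1
          event_type: view
  outputs:
    query:
      rows:
        - ds: 2023-01-01
          user_id: 1
          num_events: 2
```

Running `sqlmesh test` executes these fixtures in CI; with the DuckDB transpilation mentioned above, they stay self-contained.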

#### Efficient dev/staging environments
SQLMesh builds a virtual data mart using views, which allows you to seamlessly roll back or roll forward your changes. Any data computation you run for validation purposes is not wasted — with a cheap pointer swap, you reuse your “staging” data in production. This means you get unlimited copy-on-write environments that make data exploration and previewing changes fun and safe.
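
In practice this maps to the plan workflow. A sketch of the typical commands (the environment name is arbitrary):

```bash
# Build an isolated "dev" environment; only models affected by your change are computed there.
sqlmesh plan dev

# Promote to production once validated; unchanged models are reused via view pointer swaps
# instead of being recomputed.
sqlmesh plan
```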

#### Smart change categorization
Column-level lineage automatically determines whether changes are “breaking” or “non-breaking”, allowing you to correctly categorize changes and skip expensive backfills.

#### Integrated with Airflow
You can schedule jobs with our simple built-in scheduler or use your existing Airflow cluster. SQLMesh can dynamically generate and push Airflow DAGs. We aim to support other schedulers, such as Dagster and Prefect, in the future.

#### Notebook / CLI
Interact with SQLMesh from whichever tool you’re comfortable with.

#### Web-based IDE (in development)
Edit, run, and visualize queries in your browser.

#### GitHub CI/CD bot (in development)
A bot that ties your code directly to your data.

#### Table/column-level lineage visualizations (in development)
Quickly understand the full lineage and sequence of transformations of any column.

## Next steps
* [Jump right in with the quickstart](quick_start.md)
