docs/index.md: 4 additions & 7 deletions
@@ -10,7 +10,6 @@ Data-as-Code (DaC) `dac` is a tool that supports the distribution of data as (py
<img src="img/logo.jpg" alt="drawing" width="250"/>
</div>
-
## How will the Data Scientists use a DaC package?
Say that the Data Engineers prepared the `demo-data` as code for you. Then you will install the code in your environment
@@ -36,12 +35,11 @@ With the schema you can, for example
* access the column names (e.g. `Schema.my_column`)
* unit test your functions by [synthesizing data](https://pandera.readthedocs.io/en/stable/data_synthesis_strategies.html)
-
## How can a Data Engineer provide a DaC python package?
Install this library
```sh
- python -m pip install edg-dac
+ python -m pip install dac
```
and use the command `dac pack` (run `dac pack --help` for detailed instructions).
@@ -51,7 +49,6 @@ On a high level, the most important elements you must provide are:
* a [pandera SchemaModel](https://pandera.readthedocs.io/en/stable/schema_models.html) fitting the data that can be loaded
* python dependencies
-
## What are the advantages of distributing data in this way?
* The code needed to load the data, the data source, and locations are abstracted away from the user.
@@ -63,9 +60,9 @@ On a high level, the most important elements you must provide are:
* Semantic versioning can be used to communicate significant changes:
- * a patch update corresponds to a fix in the data: its intended content is unchanged
- * a minor update corresponds to a change in the data that does not break the schema
- * a major update corresponds to a change in the schema, or any other breaking change
+ * a patch update corresponds to a fix in the data: its intended content is unchanged
+ * a minor update corresponds to a change in the data that does not break the schema
+ * a major update corresponds to a change in the schema, or any other breaking change
In this way, data pipelines can subscribe to the appropriate updates. Furthermore, it remains easy to keep releasing data updates that maintain backward compatibility (one can keep deploying `1.X.Y` updates even after version `2` has been rolled out).
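The versioning convention above can be sketched in a few lines of Python. This is only an illustration of the idea, not part of `dac`: the helper names (`parse_version`, `classify_update`, `accepts`) and the two policy labels are hypothetical, and the sketch assumes plain `MAJOR.MINOR.PATCH` version strings without pre-release suffixes.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Split a plain 'MAJOR.MINOR.PATCH' string into integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def classify_update(old: str, new: str) -> str:
    """Name the most significant component that differs between two versions."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v[0] != old_v[0]:
        return "major"  # schema change or other breaking change
    if new_v[1] != old_v[1]:
        return "minor"  # data changed, schema still valid
    if new_v[2] != old_v[2]:
        return "patch"  # data fix, intended content unchanged
    return "none"


def accepts(installed: str, candidate: str, policy: str) -> bool:
    """Decide whether a pipeline following `policy` should pick up `candidate`.

    Hypothetical policies: 'patch' takes only data fixes,
    'minor' also takes data changes that keep the schema intact.
    """
    if parse_version(candidate) <= parse_version(installed):
        return False  # never move to an older or identical release
    allowed = {"patch": {"patch"}, "minor": {"patch", "minor"}}[policy]
    return classify_update(installed, candidate) in allowed


print(classify_update("1.2.3", "1.3.0"))
print(accepts("1.2.3", "2.0.0", "minor"))
```

In practice the same subscription is usually expressed with a pip version specifier rather than custom code: for example, `demo-data~=1.2` allows any `1.X.Y` release from `1.2` onwards but never `2.0.0`, while `demo-data~=1.2.3` allows only `1.2.Y` data fixes.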