Commit 713e5b1

update readme
1 parent fb612ae commit 713e5b1

1 file changed

Lines changed: 37 additions & 4 deletions

README.md

@@ -5,9 +5,12 @@
 <img alt="Pasteur Logo with text. Tagline reads: 'Sanitize Your Data'" src="./res/logo/logo_text_light.svg" width="90%">
 </picture>
 </h1>
-
-Pasteur is a system for data synthesis.
-This readme is under construction.
+Pasteur is a library for performing end-to-end data synthesis.
+Gather your raw data and preprocess, synthesize, and evaluate it within a single
+project.
+Use the tools you're familiar with: numpy, pandas, scikit-learn, scipy or any other.
+When your dataset grows, scale to out-of-core data by using Pasteur's parallelization
+and partitioning primitives, without code changes or using different libraries.
 
 ## Reproducibility
 You can find the experiment files that can be used to reproduce the paper
@@ -30,4 +33,34 @@ PASTEUR_MODULES = get_recommended_modules()
 Currently, there does not exist a template project from which to start upon.
 This repository is a working Pasteur project and is what was used to develop it.
 The module `./src/project` is a kedro project with configs in `./conf` and it is
-the one that was used to develop pasteur.
+the one that was used to develop pasteur.
+
+## Contributing
+To contribute, clone this repository and install the frozen requirements.
+```bash
+git clone github.com/pasteur-dev/pasteur pasteur
+
+cd pasteur
+python3.11 -m venv venv
+pip install -r requirements.txt
+```
+
+The requirements file installs Pasteur from this repository in an editable
+state, so you can begin modifying files.
+The requirements file can be regenerated with the following commands, which
+will pull the latest version of packages.
+To ensure interoperability with other packages, Pasteur does not specify narrow
+ranges for supported package versions, which might cause issues for certain version
+combinations.
+```bash
+rm requirements.txt
+pip-compile --resolver=backtracking
+```
+
+This repository is a Pasteur project used for testing.
+You can start testing Pasteur by running commands.
+```bash
+pasteur download --accept adult
+pasteur p adult.ingest
+pasteur p tab_adult.privbayes --synth
+```
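
Note on the Contributing steps added in this commit: the virtual environment is created but never activated, so `pip install` would target whichever Python is already active, and the clone URL has no scheme, so `git clone` will likely treat it as a local path. Below is a minimal sketch of the same sequence with both points addressed. It assumes the repository is reachable at `https://github.com/pasteur-dev/pasteur` and that `python3.11` is on the `PATH`. The helper names `run` and `setup_pasteur_dev` are hypothetical and not part of Pasteur.

```shell
# Hypothetical helpers, not part of Pasteur. They mirror the README steps,
# assuming an https:// clone URL and activating the venv before installing.

# With DRY_RUN=1 each command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_pasteur_dev() {
    run git clone https://github.com/pasteur-dev/pasteur pasteur &&
    run cd pasteur &&
    run python3.11 -m venv venv &&
    # Activate so that pip installs into ./venv rather than the active Python.
    run . venv/bin/activate &&
    run pip install -r requirements.txt
}
```

Running `DRY_RUN=1 setup_pasteur_dev` prints the five commands without touching the network or the filesystem. Calling `setup_pasteur_dev` on its own performs them in the current shell, which leaves the virtual environment activated afterwards.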
