Commit c6780d4

add template
1 parent b9a02df commit c6780d4

33 files changed

Lines changed: 985 additions & 3 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 <img alt="Pasteur Logo with text. Tagline reads: 'Sanitize Your Data'" src="./res/logo/logo_text_light.svg" width="90%">
 </picture>
 </h1>
-Pasteur is a library for performing end-to-end data synthesis.
+Pasteur is a library for performing privacy-aware end-to-end data synthesis.
 Gather your raw data and preprocess, synthesize, and evaluate it within a single
 project.
 Use the tools you're familiar with: numpy, pandas, scikit-learn, scipy or any other.

pyproject.toml

Lines changed: 4 additions & 1 deletion
@@ -70,6 +70,9 @@ pasteur_mlflow = "pasteur.kedro.hooks:mlflow"
 [project.entry-points."kedro.project_commands"]
 pasteur = "pasteur.cli:cli"
 
+[project.entry-points."kedro.starters"]
+pasteur = "pasteur.kedro.starters:starters"
+
 [build-system]
 requires = ["setuptools>=61.0", "wheel", "numpy>=1.15"]
 build-backend = "setuptools.build_meta"
@@ -81,7 +84,7 @@ include = ["pasteur*"] # package names should match these glob patterns (["*"]
 [tool.kedro]
 package_name = "project"
 project_name = "Pasteur Testing Project"
-project_version = "0.18.3"
+kedro_init_version = "0.18.5"
 
 [tool.isort]
 multi_line_output = 3

src/pasteur/cli.py

Lines changed: 10 additions & 1 deletion
@@ -5,4 +5,13 @@
 else:
     cli = None
 
-__all__ = ["cli"]
+    import logging
+
+    logger = logging.getLogger(__name__)
+    logger.warning(
+        "Pasteur project not found in the current directory "
+        + "(settings.py file doesn't contain `PASTEUR_MODULES = ...`). "
+        + "Disabling Pasteur commands."
+    )
+
+__all__ = ["cli"]

src/pasteur/kedro/starters.py

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+from pathlib import Path
+
+from kedro.framework.cli.starters import KedroStarterSpec
+
+import pasteur
+
+PASTEUR_PATH = Path(pasteur.__file__).parent
+TEMPLATE_PATH = PASTEUR_PATH / "templates" / "project"
+
+# plugin.py
+starters = [
+    KedroStarterSpec(
+        alias="pasteur",
+        template_path=str(TEMPLATE_PATH),
+    )
+]
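
Kedro finds starters through the `kedro.starters` entry-point group registered in `pyproject.toml`. As a rough way to check that the registration is visible after installing the package, the sketch below lists the entry points in that group using only the standard library. The helper name `starter_entry_points` is illustrative and is not part of Pasteur or Kedro.

```python
from importlib.metadata import entry_points


def starter_entry_points(group: str = "kedro.starters") -> list:
    """Return the entry points registered under `group` (empty if none)."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry-point collection.
        return list(eps.select(group=group))
    # Python 3.8/3.9: plain mapping from group name to entry points.
    return list(eps.get(group, []))


if __name__ == "__main__":
    for ep in starter_entry_points():
        # With Pasteur installed, one entry should point at
        # pasteur.kedro.starters:starters.
        print(f"{ep.name} -> {ep.value}")
```

With the starter registered, `kedro new --starter=pasteur` should create a project from the bundled template.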
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+{
+    "project_name": "New Pasteur Project",
+    "repo_name": "{{ cookiecutter.project_name.strip().replace(' ', '-').replace('_', '-').lower() }}",
+    "python_package": "{{ cookiecutter.project_name.strip().replace(' ', '_').replace('-', '_').lower() }}",
+    "kedro_version": "{{ cookiecutter.kedro_version }}"
+}
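
The two derived fields in this cookiecutter config apply the same normalization with different separators. A minimal Python sketch of what the Jinja expressions evaluate to follows; the function names `repo_name` and `python_package` are chosen here for illustration and only mirror the config keys.

```python
def repo_name(project_name: str) -> str:
    # Mirrors the "repo_name" expression: spaces and underscores become hyphens.
    return project_name.strip().replace(" ", "-").replace("_", "-").lower()


def python_package(project_name: str) -> str:
    # Mirrors the "python_package" expression: spaces and hyphens become underscores.
    return project_name.strip().replace(" ", "_").replace("-", "_").lower()


if __name__ == "__main__":
    print(repo_name("New Pasteur Project"))       # new-pasteur-project
    print(python_package("New Pasteur Project"))  # new_pasteur_project
```

The hyphenated form suits a repository directory name, while the underscored form is needed because hyphens are not valid in Python package names.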
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+import pasteur
+
+# Inject pasteur version
+with open('src/requirements.txt', "r") as f:
+    reqs = f.read()
+
+reqs = reqs.replace("pasteur[opt,test,docs]", f"pasteur[opt,test,docs]~={pasteur.version}")
+
+with open('src/requirements.txt', "w") as f:
+    f.write(reqs)
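
The hook above pins the generated project's Pasteur requirement to the version that created it. The sketch below isolates that string substitution so it can be tried without touching files. The function name `pin_pasteur` and the version `0.1.0` are hypothetical, used only for illustration.

```python
def pin_pasteur(reqs: str, version: str) -> str:
    """Append a compatible-release pin to the unpinned pasteur requirement."""
    target = "pasteur[opt,test,docs]"
    # "~=X.Y.Z" is the compatible-release operator: it allows >=X.Y.Z, ==X.Y.*
    return reqs.replace(target, f"{target}~={version}")


if __name__ == "__main__":
    sample = "kedro\npasteur[opt,test,docs]\n"
    print(pin_pasteur(sample, "0.1.0"))
```

Because this is a plain `str.replace`, running it a second time on already pinned text would append a second specifier, so the substitution is only safe as a one-shot step on a freshly generated project.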
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+project_name:
+    title: "Project Name"
+    text: |
+        Please enter a human readable name for your new project.
+        Spaces, hyphens, and underscores are allowed.
+    regex_validator: "^[\\w -]{2,}$"
+    error_message: |
+        It must contain only alphanumeric symbols, spaces, underscores and hyphens and
+        be at least 2 characters long.
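
To see which names the `regex_validator` accepts, the pattern can be exercised directly in Python. In the YAML double-quoted string, `\\w` decodes to `\w`, so the effective pattern is `^[\w -]{2,}$`. This sketch assumes the validator is applied with Python's `re.match`; the helper name `is_valid_project_name` is illustrative.

```python
import re

# Effective pattern after YAML unescaping of "^[\\w -]{2,}$".
PROJECT_NAME_RE = re.compile(r"^[\w -]{2,}$")


def is_valid_project_name(name: str) -> bool:
    """True if `name` has 2+ characters drawn from word chars, spaces, hyphens."""
    return PROJECT_NAME_RE.match(name) is not None


if __name__ == "__main__":
    for candidate in ["New Pasteur Project", "my_project-2", "a", "bad/name"]:
        print(f"{candidate!r}: {is_valid_project_name(candidate)}")
```

Note that in Python 3 `\w` also matches non-ASCII letters and digits, so the pattern is slightly more permissive than the "alphanumeric" wording of the error message suggests.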
Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
+##########################
+# KEDRO PROJECT
+
+# ignore all local configuration
+conf/local/**
+!conf/local/.gitkeep
+.telemetry
+
+# ignore potentially sensitive credentials files
+conf/**/*credentials*
+
+# ignore everything in the following folders
+data
+external
+data/**
+raw/**
+logs/**
+
+# except their sub-folders
+!data/**/
+!raw/**/
+!logs/**/
+!data/**/readme.md
+!raw/**/readme.md
+
+# also keep all .gitkeep files
+!.gitkeep
+
+# also keep the example dataset
+!data/01_raw/iris.csv
+
+
+##########################
+# Common files
+
+# IntelliJ
+.idea/
+*.iml
+out/
+.idea_modules/
+
+### macOS
+*.DS_Store
+.AppleDouble
+.LSOverride
+.Trashes
+
+# Vim
+*~
+.*.swo
+.*.swp
+
+# emacs
+*~
+\#*\#
+/.emacs.desktop
+/.emacs.desktop.lock
+*.elc
+
+# JIRA plugin
+atlassian-ide-plugin.xml
+
+# C extensions
+*.so
+
+### Python template
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+.static_storage/
+.media/
+local_settings.py
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+.ipython/profile_default/history.sqlite
+.ipython/profile_default/startup/README
+
+# pyenv
+.python-version
+
+# celery beat schedule file
+celerybeat-schedule
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.envrc
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+
+# vscode
+.vscode
+
+# ShelveStore sessions
+sessions/
+
+# MLflow dir
+mlruns/
Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
+# {{ cookiecutter.project_name }}
+
+## Overview
+
+This is your new Kedro project, which was generated using `Kedro {{ cookiecutter.kedro_version }}`.
+
+Take a look at the [Kedro documentation](https://kedro.readthedocs.io) to get started.
+
+## Rules and guidelines
+
+In order to get the best out of the template:
+
+* Don't remove any lines from the `.gitignore` file we provide
+* Make sure your results can be reproduced by following a [data engineering convention](https://kedro.readthedocs.io/en/stable/faq/faq.html#what-is-data-engineering-convention)
+* Don't commit data to your repository
+* Don't commit any credentials or your local configuration to your repository. Keep all your credentials and local configuration in `conf/local/`
+
+## How to install dependencies
+
+Declare any dependencies in `src/requirements.txt` for `pip` installation and `src/environment.yml` for `conda` installation.
+
+To install them, run:
+
+```
+pip install -r src/requirements.txt
+```
+
+## How to run your Kedro pipeline
+
+You can run your Kedro project with:
+
+```
+kedro run
+```
+
+## How to test your Kedro project
+
+Have a look at the file `src/tests/test_run.py` for instructions on how to write your tests. You can run your tests as follows:
+
+```
+kedro test
+```
+
+To configure the coverage threshold, go to the `.coveragerc` file.
+
+## Project dependencies
+
+To generate or update the dependency requirements for your project:
+
+```
+kedro build-reqs
+```
+
+This will `pip-compile` the contents of `src/requirements.txt` into a new file `src/requirements.lock`. You can see the output of the resolution by opening `src/requirements.lock`.
+
+After this, if you'd like to update your project requirements, please update `src/requirements.txt` and re-run `kedro build-reqs`.
+
+[Further information about project dependencies](https://kedro.readthedocs.io/en/stable/kedro_project_setup/dependencies.html#project-specific-dependencies)
+
+## How to work with Kedro and notebooks
+
+> Note: Using `kedro jupyter` or `kedro ipython` to run your notebook provides these variables in scope: `context`, `catalog`, and `startup_error`.
+>
+> Jupyter, JupyterLab, and IPython are already included in the project requirements by default, so once you have run `pip install -r src/requirements.txt` you will not need to take any extra steps before you use them.
+
+### Jupyter
+To use Jupyter notebooks in your Kedro project, you need to install Jupyter:
+
+```
+pip install jupyter
+```
+
+After installing Jupyter, you can start a local notebook server:
+
+```
+kedro jupyter notebook
+```
+
+### JupyterLab
+To use JupyterLab, you need to install it:
+
+```
+pip install jupyterlab
+```
+
+You can also start JupyterLab:
+
+```
+kedro jupyter lab
+```
+
+### IPython
+And if you want to run an IPython session:
+
+```
+kedro ipython
+```
+
+### How to convert notebook cells to nodes in a Kedro project
+You can move notebook code over into a Kedro project structure using a mixture of [cell tagging](https://jupyter-notebook.readthedocs.io/en/stable/changelog.html#release-5-0-0) and Kedro CLI commands.
+
+By adding the `node` tag to a cell and running the command below, the cell's source code will be copied over to a Python file within `src/<package_name>/nodes/`:
+
+```
+kedro jupyter convert <filepath_to_my_notebook>
+```
+> *Note:* The name of the Python file matches the name of the original notebook.
+
+Alternatively, you may want to transform all your notebooks in one go. Run the following command to convert all notebook files found in the project root directory and under any of its sub-folders:
+
+```
+kedro jupyter convert --all
+```
+
+### How to ignore notebook output cells in `git`
+To automatically strip out all output cell contents before committing to `git`, you can run `kedro activate-nbstripout`. This will add a hook in `.git/config` which will run `nbstripout` before anything is committed to `git`.
+
+> *Note:* Your output cells will be retained locally.
+
+## Package your Kedro project
+
+[Further information about building project documentation and packaging your project](https://kedro.readthedocs.io/en/stable/tutorial/package_a_project.html)
