README.md: 20 additions & 4 deletions
@@ -9,7 +9,7 @@ This repository serves as a personal template for data science projects.
 - Analysis scripts and notebooks are located in [`analysis/`](analysis/).
 - Reusable functions and modules are stored in the local package [`src/`](src/).
 - The package can then be installed in development mode with `pip install -e .` for easy prototyping.
-- [`src/config.py`](src/config.py) can be used to store variables, constants and configurations.
+- [`src/config.py`](src/config.py) is used to store variables, constants and configurations.
 - Tests for functions in [`src/`](src/) should go to [`tests/`](tests/) and follow the convention `test_*.py`.

 Moreover, I use the following directories that are (usually) ignored by Git:
@@ -58,15 +58,31 @@ In particular, I use:

 ### Possible extensions

+The `src/` package could contain the following modules or sub-packages depending on the project:
+
+- `utils` for utility functions.
+- `data_processing` for data processing functions (this could be imported as `dp`).
+- `features`: for extracting features.
+- `models`: for defining models.
+- `evaluation`: for evaluating performance.
+- `plots`: for plotting functions.
+
 The repository structure could be extended with:

 - `docs/` to store documentation (e.g. of the `src` package). For example, documentation could be generated using [mkdocs](https://www.mkdocs.org/) or [quartodoc](https://machow.github.io/quartodoc/get-started/overview.html).
-- `data-raw/` to store data processing functions whose output is stored in [`data/`](data/). For example when data is scraped from the web and cleaned before saving it.
+- subfolders in `data/` such as `data/raw/` for storing raw data.
 - `models/` to store model files.

 ### Related

-This template is inspired by the concept of a [research compendium](https://doi.org/10.1080/00031305.2017.1375986).
+This template is inspired by the concept of a [research compendium](https://doi.org/10.1080/00031305.2017.1375986) and similar projects I created for R projects (e.g. [reproducible-workflow](https://github.com/ghurault/reproducible-workflow)).

 This template is relatively simple and tailored to my needs.
-More sophisticated templates are available elsewhere, such as the [Cookiecutter Data Science](https://github.com/drivendataorg/cookiecutter-data-science/) template.
+More sophisticated templates are available elsewhere, such as:
+
+- [Cookiecutter Data Science](https://github.com/drivendataorg/cookiecutter-data-science/).
+- [Data Science for Social Good's hitchhiker's guide template](https://github.com/dssg/hitchhikers-guide/tree/master/sources/curriculum/0_before_you_start/pipelines-and-project-workflow)
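
For readers unfamiliar with the layout this README describes, here is a minimal sketch of what a `src/config.py` module and a matching `test_*.py` test might contain. The diff does not show the actual file contents, so every name below (`Config`, `CONFIG`, `data_path`, the field names and the seed value) is an assumption made for illustration. Both pieces are shown in one block so the example runs on its own.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Config:
    """Hypothetical project settings; every field name here is an assumption."""

    data_dir: Path = Path("data")      # directory for raw and processed data
    models_dir: Path = Path("models")  # directory for saved model files
    random_seed: int = 42              # shared seed for reproducibility


# A single shared instance that analysis scripts could import.
CONFIG = Config()


def data_path(filename: str, config: Config = CONFIG) -> Path:
    """Return the location of a file inside the data directory."""
    return config.data_dir / filename


def test_data_path():
    """Example test, named test_* so that pytest's default discovery collects it."""
    assert data_path("raw.csv") == Path("data") / "raw.csv"
```

In a real project the test would presumably live in `tests/test_config.py` and import `data_path` from the installed `src` package rather than sit next to it. The frozen dataclass is just one way to keep constants read-only; plain module-level variables would work as well.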