Commit f17f314

Update README
1 parent 591a337 commit f17f314

1 file changed

Lines changed: 20 additions & 4 deletions

README.md

@@ -9,7 +9,7 @@ This repository serves as a personal template for data science projects.
 - Analysis scripts and notebooks are located in [`analysis/`](analysis/).
 - Reusable functions and modules are stored in the local package [`src/`](src/).
 - The package can then be installed in development mode with `pip install -e .` for easy prototyping.
-- [`src/config.py`](src/config.py) can be used to store variables, constants and configurations.
+- [`src/config.py`](src/config.py) is used to store variables, constants and configurations.
 - Tests for functions in [`src/`](src/) should go to [`tests/`](tests/) and follow the convention `test_*.py`.
 
 Moreover, I use the following the directories that are (usually) ignored by Git:
@@ -58,15 +58,31 @@ In particular, I use:
 
 ### Possible extensions
 
+The `src/` package could contain the following modules or sub-packages depending on the project:
+
+- `utils` for utility functions.
+- `data_processing` for data processing functions (this could be imported as `dp`).
+- `features`: for extracting features.
+- `models`: for defining models.
+- `evaluation`: for evaluating performance.
+- `plots`: for plotting functions.
+
 The repository structure could be extended with:
 
 - `docs/` to store documentation (e.g. of the `src` package). For example, a documentation could be generated using [mkdocs](https://www.mkdocs.org/) or [quartodoc](https://machow.github.io/quartodoc/get-started/overview.html).
-- `data-raw/` to store data processing functions whose output is stored in [`data/`](data/). For example when data is scrapped from the web and cleaned before saving it.
+- subfolders in `data/` such as `data/raw/` for storing raw data.
 - `models/` to store model files.
 
 ### Related
 
-This template is inspired by the concept of a [research compendium](https://doi.org/10.1080/00031305.2017.1375986).
+This template is inspired by the concept of a [research compendium](https://doi.org/10.1080/00031305.2017.1375986) and similar projects I created for R projects (e.g. [reproducible-workflow](https://github.com/ghurault/reproducible-workflow)).
 
 This template is relatively simple and tailored to my needs.
-More sophisticated templates are available elsewhere, such as the [Cookiecutter Data Science](https://github.com/drivendataorg/cookiecutter-data-science/) template.
+More sophisticated templates are available elsewhere, such as:
+
+- [Cookiecutter Data Science](https://github.com/drivendataorg/cookiecutter-data-science/).
+- [https://joserzapata.github.io/data-science-project-template/](https://joserzapata.github.io/data-science-project-template/)
+- [Data Science for Social Good's hitchikers guide template](https://github.com/dssg/hitchhikers-guide/tree/master/sources/curriculum/0_before_you_start/pipelines-and-project-workflow)
+- [https://github.com/khuyentran1401/data-science-template](https://github.com/khuyentran1401/data-science-template)
+
+As opposed to other templates, this template is more focused on experimentation rather than sharing a single final product.
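
As a rough illustration of the layout the README describes, the sketch below shows what a `src/config.py` and a matching test file might contain. The names `project_paths`, `RANDOM_SEED` and `test_raw_data_lives_under_data` are hypothetical and not taken from the repository; only the directory names (`data/`, `data/raw/`, `models/`, `docs/`) and the `test_*.py` convention come from the README itself.

```python
from pathlib import Path

# Hypothetical contents of src/config.py: constants and path helpers.
RANDOM_SEED = 42  # assumed example constant, not from the repository


def project_paths(root):
    """Return the directories named in the README, resolved under `root`."""
    root = Path(root)
    return {
        "data": root / "data",
        "data_raw": root / "data" / "raw",
        "models": root / "models",
        "docs": root / "docs",
    }


# Hypothetical contents of tests/test_config.py (follows `test_*.py`).
def test_raw_data_lives_under_data():
    paths = project_paths("my-project")
    assert paths["data_raw"].parent == paths["data"]
```

Keeping paths and constants in one module means analysis scripts and notebooks can share them instead of hard-coding strings. With pytest's default discovery rules, a file named `test_*.py` under `tests/` would be collected automatically; the exact import path for the config module depends on how the package is set up, which the README does not show.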
