This repository was archived by the owner on Dec 31, 2025. It is now read-only.

Commit cea4317

clean up datalad section, add datalad run
1 parent e19ae38 commit cea4317

2 files changed

Lines changed: 21 additions & 5 deletions

book/data_management.md

Lines changed: 11 additions & 5 deletions
@@ -966,16 +966,19 @@ Let's say that we want to create a new dataset on our local computer that will b
 
 ```bash
 ➤ datalad create -d . my_datalad_repo
+
 add(ok): my_datalad_repo (dataset)
 add(ok): .gitmodules (file)
 save(ok): . (dataset)
 create(ok): my_datalad_repo (dataset)
+
 ```
 
 This creates a new directory, called `my_datalad_repo`, and sets it up as a DataLad subdataset within our main git repo. We then download some data files from another project using the `datalad download-url` command, which will both download the data and save them into the DataLad dataset:
 
 ```bash
 ➤ datalad download-url -d . -O my_datalad_repo/data/ https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/demographics.csv
+
 [INFO ] Downloading 'https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/demographics.csv' into '/Users/poldrack/Dropbox/code/BetterCodeBetterScience/my_datalad_repo/data/'
 download_url(ok): /Users/poldrack/Dropbox/code/BetterCodeBetterScience/my_datalad_repo/data/demographics.csv (file)
 add(ok): data/demographics.csv (file)
@@ -984,7 +987,9 @@ add(ok): my_datalad_repo (dataset)
 add(ok): .gitmodules (file)
 save(ok): . (dataset)
 
+
 ➤ datalad download-url -d . -O my_datalad_repo/data/ https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/meaningful_variables_clean.csv
+
 [INFO ] Downloading 'https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/meaningful_variables_clean.csv' into '/Users/poldrack/Dropbox/code/BetterCodeBetterScience/my_datalad_repo/data/'
 download_url(ok): /Users/poldrack/Dropbox/code/BetterCodeBetterScience/my_datalad_repo/data/meaningful_variables_clean.csv (file)
 add(ok): data/meaningful_variables_clean.csv (file)
@@ -1026,6 +1031,7 @@ Now let's say that we want to make a change to one of the files and save the cha
 
 ```bash
 ➤ datalad unlock my_datalad_repo/data/demographics.csv
+
 unlock(ok): my_datalad_repo/data/demographics.csv (file)
 ```
 
@@ -1040,19 +1046,19 @@ We can now use `datalad status` to see that the file has been modified:
 
 ```bash
 ➤ datalad status
-modified: book/data_management.md (file)
+
 modified: my_datalad_repo (dataset)
 ```
 
 And we can then save it using `datalad save`:
 ```bash
-➤ datalad save -m "removed Motivation variables from demographics.csv"
+➤ datalad save -d . -m "Modified demographics.csv" my_datalad_repo/data/demographics.csv
 
 add(ok): data/demographics.csv (file)
+save(ok): my_datalad_repo (dataset)
+add(ok): my_datalad_repo (dataset)
+add(ok): .gitmodules (file)
 save(ok): . (dataset)
-action summary:
-add (ok: 1)
-save (ok: 1)
 ```
 
 DataLad doesn't have a staging area like `git` does, so there is no need to first add and then commit the file; `datalad save` is equivalent to adding and then committing the changes. If we then check the status we see that there are no changes waiting to be saved:
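
For comparison with the last paragraph of the hunk above, here is a minimal sketch of the two-step workflow that plain `git` requires and that a single `datalad save` call replaces. It uses a throwaway repository created with `mktemp`; the file contents and the committer identity are placeholders and are not taken from the book. It covers only the plain-git case, not annexed files.

```shell
#!/usr/bin/env bash
# Sketch only: plain git, run in a temporary repository.
set -euo pipefail

# Throwaway repository so nothing in the real project is touched.
repo="$(mktemp -d)"
git -C "$repo" init --quiet

# Placeholder content standing in for a modified data file.
printf 'placeholder\n' > "$repo/demographics.csv"

# Step 1: stage the change (the step that `datalad save` makes unnecessary).
git -C "$repo" add demographics.csv

# Step 2: commit what was staged. The identity is supplied inline so the
# sketch does not depend on a global git configuration.
git -C "$repo" -c user.name="Example" -c user.email="example@example.com" \
    commit --quiet -m "Modified demographics.csv"

# A clean working tree prints nothing here, which corresponds to
# `datalad status` reporting no pending changes.
git -C "$repo" status --porcelain
```

With DataLad, the staging and commit steps collapse into the single `datalad save -d . -m "Modified demographics.csv" my_datalad_repo/data/demographics.csv` call shown in the diff.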

scripts/datalad_test.sh

Lines changed: 10 additions & 0 deletions
@@ -1,7 +1,17 @@
+
+
 datalad create -d . my_datalad_repo
+
 datalad download-url -d . -O my_datalad_repo/data/ https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/demographics.csv
+
 datalad download-url -d . -O my_datalad_repo/data/ https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/meaningful_variables_clean.csv
+
 datalad unlock my_datalad_repo/data/demographics.csv
+
 python src/BetterCodeBetterScience/modify_data.py my_datalad_repo/data/demographics.csv
+
+datalad status
+
 datalad save -d . -m "Modified demographics.csv" my_datalad_repo/data/demographics.csv
+
 datalad status
