# Uploading data in bulk
+
This tutorial will follow a
[Flattened Data Layout](../../explanations/structuring_your_project.md#flattened-data-layout-example).
With a project that has this example layout:
@@ -19,10 +20,11 @@ With a project that has this example layout:
1920```

## Tutorial Purpose
+
In this tutorial you will:

1. Find the synapse ID of your project
- 1. Create a manifest TSV file to upload data in bulk
+ 1. Create a manifest CSV file to upload data in bulk
1. Upload all of the files for our project
1. Add an annotation to all of our files
1. Add a provenance/activity record to one of our files
@@ -40,56 +42,59 @@ In this tutorial you will:


## Prerequisites
+
* Make sure that you have completed the following tutorials:
    * [Project](./project.md)
* This tutorial is set up to upload the data from `~/my_ad_project`; make sure that this or
another desired directory exists.
* Pandas is used in this tutorial. Refer to our
[installation guide](../installation.md#pypi) to install it. Feel free to skip this
portion of the tutorial if you do not wish to use Pandas. You may also use external
- tools to open and manipulate Tab Separated Value (TSV) files.
+ tools to open and manipulate CSV files.


## 1. Find the synapse ID of your project

First let's set up some constants we'll use in this script, and find the ID of our project
```python
- {!docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py!lines=5-21}
+ {!docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py!lines=5-22}
```

- ## 2. Create a manifest TSV file to upload data in bulk
+ ## 2. Create a manifest CSV file to upload data in bulk

- Let's "walk" our directory on disk to create a manifest file for upload
+ Let's walk our local directory and build a CSV manifest with the required `path` and
+ `parentId` columns. In a future release `Project.sync_from_synapse` will support
+ writing a manifest CSV directly; for now we build one with pandas.
```python
- {!docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py!lines=23-33}
+ {!docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py!lines=23-44}
```

<details class="example">
- <summary>After this has been run if you inspect the TSV file created you'll see it will look
+ <summary>After this has been run if you inspect the CSV file created you'll see it will look
similar to this:</summary>
```
- path parent
- /home/user_name/my_ad_project/single_cell_RNAseq_batch_2/SRR12345678_R2.fastq.gz syn60109537
- /home/user_name/my_ad_project/single_cell_RNAseq_batch_2/SRR12345678_R1.fastq.gz syn60109537
- /home/user_name/my_ad_project/biospecimen_experiment_2/fileD.txt syn60109543
- /home/user_name/my_ad_project/biospecimen_experiment_2/fileC.txt syn60109543
- /home/user_name/my_ad_project/single_cell_RNAseq_batch_1/SRR12345678_R2.fastq.gz syn60109534
- /home/user_name/my_ad_project/single_cell_RNAseq_batch_1/SRR12345678_R1.fastq.gz syn60109534
- /home/user_name/my_ad_project/biospecimen_experiment_1/fileA.txt syn60109540
- /home/user_name/my_ad_project/biospecimen_experiment_1/fileB.txt syn60109540
+ path,parentId
+ /home/user_name/my_ad_project/single_cell_RNAseq_batch_2/SRR12345678_R2.fastq.gz,syn60109500
+ /home/user_name/my_ad_project/single_cell_RNAseq_batch_2/SRR12345678_R1.fastq.gz,syn60109500
+ /home/user_name/my_ad_project/biospecimen_experiment_2/fileD.txt,syn60109500
+ /home/user_name/my_ad_project/biospecimen_experiment_2/fileC.txt,syn60109500
+ /home/user_name/my_ad_project/single_cell_RNAseq_batch_1/SRR12345678_R2.fastq.gz,syn60109500
+ /home/user_name/my_ad_project/single_cell_RNAseq_batch_1/SRR12345678_R1.fastq.gz,syn60109500
+ /home/user_name/my_ad_project/biospecimen_experiment_1/fileA.txt,syn60109500
+ /home/user_name/my_ad_project/biospecimen_experiment_1/fileB.txt,syn60109500
```
</details>
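
For readers who skip pandas, a manifest like the one above can also be built with only the Python standard library. This is a minimal sketch, not the tutorial script itself: the helper names `build_manifest_rows` and `write_manifest` are hypothetical, and it assumes every file is placed directly under a single parent ID, as in the sample CSV output.

```python
import csv
import os


def build_manifest_rows(root_dir, parent_id):
    """Walk root_dir and return one manifest row (a dict) per file found.

    Assumption: every file gets the same parent_id, mirroring the sample
    CSV output in this tutorial.
    """
    rows = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()  # sort in place so the traversal order is deterministic
        for name in sorted(filenames):
            rows.append(
                {"path": os.path.join(dirpath, name), "parentId": parent_id}
            )
    return rows


def write_manifest(rows, manifest_path):
    """Write rows to a CSV file with the `path` and `parentId` columns."""
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["path", "parentId"])
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_manifest(build_manifest_rows(os.path.expanduser("~/my_ad_project"), "syn60109500"), "manifest-for-upload.csv")` would write one row per file; the ID here is the one from the sample output and should be replaced with your own project ID.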

## 3. Upload the data in bulk
```python
- {!docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py!lines=35-37}
+ {!docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py!lines=46-48}
```


<details class="example">
  <summary>While this is running you'll see output in your console similar to:</summary>
9196```
- Validating manifest: /home/user_name/manifest-for-upload.tsv
+ Validating manifest: /home/user_name/manifest-for-upload.csv
Validating that all paths exist...
Validating that all files are unique...
Validating that all the files are not empty...
@@ -103,12 +108,12 @@ Uploading 8 files: 100%|██████████████████


## 4. Add an annotation to our manifest file
- At this point in the tutorial we will start to use pandas to manipulate a TSV file. If
+ At this point in the tutorial we will use pandas to manipulate the CSV manifest. If
you are not comfortable with pandas you may use any tool that can open and manipulate
- TSV such as excel or google sheets.
+ CSV files such as Excel or Google Sheets.

```python
- {!docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py!lines=39-55}
+ {!docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py!lines=50-63}
```
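
If pandas is not available, the same kind of edit can be sketched with the standard library `csv` module. The helper name `add_annotation_column` is hypothetical, the `species` key used in the usage note is only an illustration, and the sketch assumes that extra manifest columns are read as annotations; check the manifest CSV format documentation before relying on that.

```python
import csv


def add_annotation_column(manifest_path, key, value):
    """Set column `key` to `value` on every row of the manifest, in place.

    Assumption: columns beyond `path` and `parentId` are treated as
    annotations when the manifest is uploaded.
    Returns the number of rows that were updated.
    """
    with open(manifest_path, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    # Append the new column only if the manifest does not already have it.
    if key not in fieldnames:
        fieldnames.append(key)
    for row in rows:
        row[key] = value

    with open(manifest_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `add_annotation_column("manifest-for-upload.csv", "species", "Homo sapiens")` would add a `species` column with the same value on every row.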

Now that you have uploaded and annotated your files you'll be able to inspect your data
@@ -123,14 +128,14 @@ Let's create an [Activity/Provenance](../../explanations/domain_models_of_synaps
record for one of our files. In other words, we will record the steps taken to generate
the file.

- In this code we are finding a row in our TSV file and pointing to the file path of
+ In this code we are finding a row in our CSV file and pointing to the file path of
another file within our manifest. By doing this we are creating a relationship between
the two files. This is a simple example of how you can create a provenance record in
Synapse. Additionally we'll link off to a sample URL that describes a process that we
may have executed to generate the file.

```python
- {!docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py!lines=57-83}
+ {!docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py!lines=68-92}
```
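
The row lookup described above can be sketched without pandas as follows. The helper name `set_provenance` is hypothetical, and the `used` and `executed` column names are an assumption carried over from the older manifest format; the CSV manifest may name these columns differently, so treat this as an illustration of the technique only.

```python
import csv


def set_provenance(manifest_path, target_path, used, executed):
    """Record provenance on the manifest row whose `path` equals target_path.

    Assumption: provenance lives in columns named `used` and `executed`.
    `used` is the path of another file in the manifest; `executed` is a URL
    describing the process that produced the target file.
    Returns the number of rows that matched.
    """
    with open(manifest_path, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    for column in ("used", "executed"):
        if column not in fieldnames:
            fieldnames.append(column)

    matched = 0
    for row in rows:
        if row["path"] == target_path:
            row["used"] = used
            row["executed"] = executed
            matched += 1

    with open(manifest_path, "w", newline="") as handle:
        # Rows that did not match are written with empty provenance cells.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return matched
```

A return value of `0` means no row had that path, which usually points to a mismatch between the path in the manifest and the path passed in.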

After running this code we may again inspect the Synapse web UI. In this screenshot I've
@@ -157,7 +162,6 @@ navigated to the Files tab and selected the file that we added a Provenance reco

- [syn.login][synapseclient.Synapse.login]
- [syn.findEntityId][synapseclient.Synapse.findEntityId]
- - [synapseutils.generate_sync_manifest][]
- [Project.sync_to_synapse][synapseclient.models.mixins.StorableContainer.sync_to_synapse]
- - [synapseutils.syncToSynapse][] *(deprecated)*
+ - [Manifest CSV format](../../explanations/manifest_csv.md)
- [Activity/Provenance](../../explanations/domain_models_of_synapse.md#activityprovenance)