Synapse files can be created by uploading content from your local computer or linking to digital files on the web.
Files in Synapse always have a “parent”, which could be a project or a folder. You can organize collections of files into folders and sub-folders, just as you would on your local computer.
Note: You may optionally follow the Uploading data in bulk tutorial instead. The bulk tutorial may fit your needs better, as it limits the amount of code that you are required to write and maintain.
This tutorial follows a flattened data layout, using this example structure:
.
├── biospecimen_experiment_1
│   ├── fileA.txt
│   └── fileB.txt
├── biospecimen_experiment_2
│   ├── fileC.txt
│   └── fileD.txt
├── single_cell_RNAseq_batch_1
│   ├── SRR12345678_R1.fastq.gz
│   └── SRR12345678_R2.fastq.gz
└── single_cell_RNAseq_batch_2
    ├── SRR12345678_R1.fastq.gz
    └── SRR12345678_R2.fastq.gz
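In a flattened layout, every local subfolder corresponds to exactly one Synapse folder, so each file's parent can be decided from the name of the subfolder it sits in. The following is a minimal, standard-library sketch of that mapping, not the tutorial's own script; `plan_uploads` is a hypothetical helper, and it assumes you already know the Synapse ID of each folder.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def plan_uploads(root: Path, folder_ids: Dict[str, str]) -> List[Tuple[Path, str]]:
    """Pair each file in a flattened local layout with its Synapse parent ID.

    folder_ids maps a local subfolder name (for example
    "biospecimen_experiment_1") to the Synapse ID of the matching folder.
    This is a hypothetical helper, not part of synapseclient.
    """
    plan: List[Tuple[Path, str]] = []
    for folder_name, parent_id in sorted(folder_ids.items()):
        # Flattened layout: files sit directly inside each subfolder, one level deep.
        for path in sorted((root / folder_name).iterdir()):
            if path.is_file():
                plan.append((path, parent_id))
    return plan
```

Each `(path, parent_id)` pair could then become `File(path=str(path), parent_id=parent_id)` (assuming `File` is imported from `synapseclient.models`), and the folder IDs themselves could be looked up with `syn.findEntityId(name=folder_name, parent=my_project_id)`, where `my_project_id` is a placeholder for your project's Synapse ID.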
In this tutorial you will:
- Upload several files to Synapse
- Print stored attributes about your files
- List all Folders and Files within your project

Before you begin:

- Make sure that you have completed the Folder tutorial.
- The tutorial assumes you have a number of files ready to upload. If you do not, create test or dummy files. You may also use the dummy files that were used while creating these tutorials: text files with example file extensions that a researcher may be using.
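If you need dummy files, a short script can recreate the flattened layout on your computer. This is a minimal standard-library sketch; `create_dummy_files` and `LAYOUT` are hypothetical names, and the file contents are arbitrary placeholder text rather than real sequencing data.

```python
from pathlib import Path
from typing import Dict, List

# The flattened layout from this tutorial: one level of folders, files inside each.
LAYOUT: Dict[str, List[str]] = {
    "biospecimen_experiment_1": ["fileA.txt", "fileB.txt"],
    "biospecimen_experiment_2": ["fileC.txt", "fileD.txt"],
    "single_cell_RNAseq_batch_1": ["SRR12345678_R1.fastq.gz", "SRR12345678_R2.fastq.gz"],
    "single_cell_RNAseq_batch_2": ["SRR12345678_R1.fastq.gz", "SRR12345678_R2.fastq.gz"],
}


def create_dummy_files(root: Path, layout: Dict[str, List[str]] = LAYOUT) -> List[Path]:
    """Create small placeholder files under `root` that mirror `layout`."""
    created: List[Path] = []
    for folder_name, file_names in layout.items():
        folder = root / folder_name
        folder.mkdir(parents=True, exist_ok=True)
        for file_name in file_names:
            path = folder / file_name
            # Plain-text content regardless of the extension, like the tutorial's dummy files.
            path.write_text(f"{folder_name}/{file_name}\n")
            created.append(path)
    return created
```

For example, `create_dummy_files(Path.home() / "my_ad_project")` would build the tree under a hypothetical `~/my_ad_project` directory.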
!!! warning "Uploading Large Files"
    If you are uploading very large files (>100 GB each), consider using sequential uploads with the async API instead.

    For large file uploads, see the `execute_walk_file_sequential()` function in [uploadBenchmark.py](https://github.com/Sage-Bionetworks/synapsePythonClient/blob/develop/docs/scripts/uploadBenchmark.py#L286) as a reference implementation. This approach uses `asyncio.run(file.store_async())` with the newer async API, which has been optimized for handling very large files efficiently. In benchmarks, this pattern uploaded 45 files of 100 GB each (4.5 TB total) in approximately 20.6 hours.
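The following is not `execute_walk_file_sequential()` itself, only a minimal sketch of the same idea: call `asyncio.run(file.store_async())` for one file at a time so that uploads never overlap. It assumes `File` from `synapseclient.models` accepts `path` and `parent_id` and that a Synapse client has already logged in; `upload_sequentially` and `upload_large_files` are hypothetical helper names.

```python
import asyncio
from typing import Any, Iterable, List


def upload_sequentially(files: Iterable[Any]) -> List[Any]:
    """Store each file with its async API, one file at a time.

    Each asyncio.run() call blocks until that file's upload has finished,
    so at most one upload is in progress at any moment.
    """
    stored = []
    for file in files:
        stored.append(asyncio.run(file.store_async()))
    return stored


def upload_large_files(paths: Iterable[str], parent_id: str) -> List[Any]:
    """Hypothetical wrapper: build File objects and upload them in order.

    Assumes synapseclient is installed and a client has already logged in
    (for example with syn.login()).
    """
    # Imported here so upload_sequentially() works without synapseclient installed.
    from synapseclient.models import File

    files = [File(path=path, parent_id=parent_id) for path in paths]
    return upload_sequentially(files)
```

Keeping the loop synchronous trades throughput for predictability: only one large transfer competes for bandwidth and memory at a time.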
{!docs/tutorials/python/tutorial_scripts/file.py!lines=5-30}

{!docs/tutorials/python/tutorial_scripts/file.py!lines=32-75}

{!docs/tutorials/python/tutorial_scripts/file.py!lines=77-85}

Each file being uploaded has an upload progress bar:
##################################################
Uploading file to Synapse storage
##################################################
Uploading [####################]100.00% 2.0bytes/2.0bytes (1.8bytes/s) SRR12345678_R1.fastq.gz Done...
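The tutorial script then prints a few attributes of a stored file. Those values are ordinary fields on the stored object, so formatting them needs no extra API calls. A minimal sketch, assuming the object exposes `id`, `parent_id`, `created_on`, `created_by` and `modified_on` as a stored `synapseclient.models.File` does; `describe_file` is a hypothetical helper.

```python
from typing import Any


def describe_file(file: Any) -> str:
    """Format the stored attributes of a File, one per line.

    Assumes `file` exposes id, parent_id, created_on, created_by and
    modified_on, as a stored synapseclient.models.File does.
    """
    lines = [
        f"My file ID is: {file.id}",
        f"The parent ID of my file is: {file.parent_id}",
        f"I created my file on: {file.created_on}",
        f"The ID of the user that created my file is: {file.created_by}",
        f"My file was last modified on: {file.modified_on}",
    ]
    return "\n".join(lines)
```

After a store call returns, `print(describe_file(stored_file))` would produce output in the same shape as the tutorial's.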
{!docs/tutorials/python/tutorial_scripts/file.py!lines=87-99}

You'll notice the output looks like:
```
My file ID is: syn53205687
The parent ID of my file is: syn53205629
I created my file on: 2023-12-28T21:55:17.971Z
The ID of the user that created my file is: 3481671
My file was last modified on: 2023-12-28T21:55:17.971Z
```

Now that your project has a number of Folders and Files, let's explore how to traverse the content stored within the Project.
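`synapseutils.walk` behaves like `os.walk`: it yields one `(dirpath, dirnames, filenames)` tuple per container, where `dirpath` is a `(path, synId)` pair and each entry in `dirnames` and `filenames` is a `(name, synId)` pair. A minimal sketch that turns those tuples into the `Directory (...)` and `File (...)` lines shown in the walk output, assuming that tuple shape; `format_walk` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple

# synapseutils.walk yields (dirpath, dirnames, filenames): dirpath is a
# (path, synId) pair and each dirnames/filenames entry is a (name, synId) pair.
NameAndId = Tuple[str, str]
WalkStep = Tuple[NameAndId, List[NameAndId], List[NameAndId]]


def format_walk(steps: Iterable[WalkStep]) -> List[str]:
    """Turn walk() results into 'Directory (...)' and 'File (...)' lines."""
    lines: List[str] = []
    for (dir_path, _dir_id), dir_names, file_names in steps:
        for name, syn_id in dir_names:
            lines.append(f"Directory ({syn_id}): {dir_path}/{name}")
        for name, syn_id in file_names:
            lines.append(f"File ({syn_id}): {dir_path}/{name}")
    return lines
```

With a logged-in client `syn`, `format_walk(synapseutils.walk(syn=syn, synId=my_project_id, includeTypes=["folder", "file"]))` would list folders and files, where `my_project_id` is a placeholder for your project's Synapse ID. Because the project itself is visited first, all of its folders are listed before any files.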
{!docs/tutorials/python/tutorial_scripts/file.py!lines=101-112}

The result of walking your project structure should look something like:
```
Directory (syn60109540): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_1
Directory (syn60109543): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_2
Directory (syn60109534): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_1
Directory (syn60109537): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_2
File (syn60115444): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_1/fileA.txt
File (syn60115457): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_1/fileB.txt
File (syn60115472): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_2/fileC.txt
File (syn60115485): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_2/fileD.txt
File (syn60115498): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_1/SRR12345678_R1.fastq.gz
File (syn60115513): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_1/SRR12345678_R2.fastq.gz
File (syn60115526): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_2/SRR12345678_R1.fastq.gz
File (syn60115539): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_2/SRR12345678_R2.fastq.gz
```

Now that you have created your files, you can inspect them on the Files tab of your project in the Synapse web UI. It should look similar to:
{!docs/tutorials/python/tutorial_scripts/file.py!}

References used in this tutorial:

- [File][file-reference-sync]
- [syn.login][synapseclient.Synapse.login]
- [syn.findEntityId][synapseclient.Synapse.findEntityId]
- [syn.store][synapseclient.Synapse.store]
- [synapseutils.walk][]
