Files in Synapse

Synapse files can be created by uploading content from your local computer or linking to digital files on the web.

Files in Synapse always have a “parent”, which could be a project or a folder. You can organize collections of files into folders and sub-folders, just as you would on your local computer.

Read more

Note: You may optionally follow the Uploading data in bulk tutorial instead. The bulk tutorial may fit your needs better as it limits the amount of code that you are required to write and maintain.

This tutorial follows a Flattened Data Layout, using this example layout:

```
.
├── biospecimen_experiment_1
│   ├── fileA.txt
│   └── fileB.txt
├── biospecimen_experiment_2
│   ├── fileC.txt
│   └── fileD.txt
├── single_cell_RNAseq_batch_1
│   ├── SRR12345678_R1.fastq.gz
│   └── SRR12345678_R2.fastq.gz
└── single_cell_RNAseq_batch_2
    ├── SRR12345678_R1.fastq.gz
    └── SRR12345678_R2.fastq.gz
```
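If you want to follow along without real data, a minimal sketch like the one below can create small placeholder files that match this layout. The `create_example_layout` helper and the root directory you pass to it are illustrative assumptions and are not part of the tutorial script.

```python
from pathlib import Path

# Folder name -> file names, mirroring the example layout above.
EXAMPLE_LAYOUT = {
    "biospecimen_experiment_1": ["fileA.txt", "fileB.txt"],
    "biospecimen_experiment_2": ["fileC.txt", "fileD.txt"],
    "single_cell_RNAseq_batch_1": [
        "SRR12345678_R1.fastq.gz",
        "SRR12345678_R2.fastq.gz",
    ],
    "single_cell_RNAseq_batch_2": [
        "SRR12345678_R1.fastq.gz",
        "SRR12345678_R2.fastq.gz",
    ],
}


def create_example_layout(root):
    """Create placeholder files under `root` (hypothetical helper).

    Returns the list of created file paths.
    """
    created = []
    for folder, names in EXAMPLE_LAYOUT.items():
        folder_path = Path(root) / folder
        folder_path.mkdir(parents=True, exist_ok=True)
        for name in names:
            file_path = folder_path / name
            # Placeholder text only; these are not real FASTQ or assay data.
            file_path.write_text(f"placeholder for {folder}/{name}\n")
            created.append(file_path)
    return created
```

Calling `create_example_layout("some/local/directory")` leaves four folders with two files each, which is enough to exercise the upload steps that follow.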

Tutorial Purpose

In this tutorial you will:

  1. Upload several files to Synapse
  2. Print stored attributes about your files
  3. List all Folders and Files within your project

Prerequisites

1. Upload several files to Synapse

!!! warning "Uploading Large Files"
    If you are uploading very large files (>100 GB each), consider using sequential uploads with the async API instead.

    For large file uploads, see the `execute_walk_file_sequential()` function in [uploadBenchmark.py](https://github.com/Sage-Bionetworks/synapsePythonClient/blob/develop/docs/scripts/uploadBenchmark.py#L286) as a reference implementation. This approach uses `asyncio.run(file.store_async())` with the newer async API, which has been optimized for handling very large files efficiently. In benchmarks, this pattern uploaded 45 files of 100 GB each (4.5 TB total) in approximately 20.6 hours.
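The sequential pattern described in the warning can be sketched roughly as follows. This is an illustration rather than the benchmark script itself: `store_files_sequentially` is a hypothetical helper, and it assumes only that each object exposes an awaitable `store_async()` method, as the `asyncio.run(file.store_async())` call above implies.

```python
import asyncio


def store_files_sequentially(files):
    """Upload one file at a time, finishing each before starting the next.

    `files` is any iterable of objects with an awaitable `store_async()`
    method (an assumption of this sketch; see the reference implementation
    for the actual setup).
    """
    stored = []
    for file in files:
        # asyncio.run blocks until this single upload has completed.
        stored.append(asyncio.run(file.store_async()))
    return stored
```

Running each upload to completion before starting the next trades throughput for predictability, which may be the safer choice when every file is very large.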

First let's retrieve all of the Synapse IDs we are going to use

{!docs/tutorials/python/tutorial_scripts/file.py!lines=5-30}
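As a rough sketch of this lookup step, the helper below resolves a project and its folders by name through `syn.findEntityId`. The `find_folder_ids` function is hypothetical, and the sketch assumes `findEntityId` returns `None` when nothing matches.

```python
def find_folder_ids(syn, project_name, folder_names):
    """Return (project_id, {folder_name: folder_id}) using syn.findEntityId.

    `syn` is a logged-in Synapse client, or any object with a compatible
    `findEntityId(name, parent=None)` method.
    """
    project_id = syn.findEntityId(name=project_name)
    if project_id is None:
        raise ValueError(f"No project named {project_name!r} was found")

    folder_ids = {}
    for folder_name in folder_names:
        # Folders are looked up by name within the project.
        folder_id = syn.findEntityId(name=folder_name, parent=project_id)
        if folder_id is None:
            raise ValueError(f"No folder named {folder_name!r} in {project_id}")
        folder_ids[folder_name] = folder_id
    return project_id, folder_ids


# Usage (needs the synapseclient package and stored credentials):
#   import synapseclient
#   syn = synapseclient.login()
#   project_id, folder_ids = find_folder_ids(
#       syn,
#       "My uniquely named project about Alzheimer's Disease",
#       ["biospecimen_experiment_1", "biospecimen_experiment_2"],
#   )
```

Failing early on a missing folder keeps a typo in a folder name from surfacing later as an upload with no parent.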

Next let's create all of the File objects to upload content

{!docs/tutorials/python/tutorial_scripts/file.py!lines=32-75}

Finally we'll store the files in Synapse

{!docs/tutorials/python/tutorial_scripts/file.py!lines=77-85}
Each file being uploaded has an upload progress bar:

```
##################################################
 Uploading file to Synapse storage
##################################################

Uploading [####################]100.00%   2.0bytes/2.0bytes (1.8bytes/s) SRR12345678_R1.fastq.gz Done...
```
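The two steps above, creating `File` objects and storing them, can be sketched together as shown below. The helper names `plan_uploads` and `upload_planned_files`, and the `folder_ids` mapping of folder name to Synapse ID, are assumptions made for illustration. The tutorial script builds each `File` explicitly instead.

```python
from pathlib import Path


def plan_uploads(root, folder_ids):
    """Pair each local file with the Synapse ID of its destination folder.

    `folder_ids` maps a local folder name under `root` to a Synapse folder ID.
    Returns a list of (local_path, parent_id) tuples in a stable order.
    """
    plan = []
    for folder_name, folder_id in sorted(folder_ids.items()):
        for path in sorted((Path(root) / folder_name).iterdir()):
            if path.is_file():
                plan.append((str(path), folder_id))
    return plan


def upload_planned_files(syn, plan):
    """Create a File per planned upload and store it with syn.store."""
    # Imported lazily so plan_uploads can be used without synapseclient installed.
    from synapseclient import File

    stored = []
    for path, parent_id in plan:
        # syn.store returns the stored entity, which carries its Synapse ID.
        stored.append(syn.store(obj=File(path=path, parent=parent_id)))
    return stored
```

Keeping the plan as plain tuples makes it easy to print and review what will be uploaded before any network call is made.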

2. Print stored attributes about your files

{!docs/tutorials/python/tutorial_scripts/file.py!lines=87-99}
You'll notice the output looks like:

```
My file ID is: syn53205687
The parent ID of my file is: syn53205629
I created my file on: 2023-12-28T21:55:17.971Z
The ID of the user that created my file is: 3481671
My file was last modified on: 2023-12-28T21:55:17.971Z
```
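A minimal sketch of how output in this shape could be produced is shown below. The `describe_file` helper is hypothetical, and the sketch assumes the stored entity exposes `id`, `parentId`, `createdOn`, `createdBy`, and `modifiedOn` attributes.

```python
def describe_file(entity):
    """Return human-readable lines describing a stored file entity."""
    return [
        f"My file ID is: {entity.id}",
        f"The parent ID of my file is: {entity.parentId}",
        f"I created my file on: {entity.createdOn}",
        f"The ID of the user that created my file is: {entity.createdBy}",
        f"My file was last modified on: {entity.modifiedOn}",
    ]


# Usage, given `stored_file` returned by syn.store(...):
#   print("\n".join(describe_file(stored_file)))
```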

3. List all Folders and Files within your project

Now that your project has a number of Folders and Files, let's explore how to traverse the content stored within the project.

{!docs/tutorials/python/tutorial_scripts/file.py!lines=101-112}
The result of walking your project structure should look something like:

```
Directory (syn60109540): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_1
Directory (syn60109543): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_2
Directory (syn60109534): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_1
Directory (syn60109537): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_2
File (syn60115444): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_1/fileA.txt
File (syn60115457): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_1/fileB.txt
File (syn60115472): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_2/fileC.txt
File (syn60115485): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_2/fileD.txt
File (syn60115498): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_1/SRR12345678_R1.fastq.gz
File (syn60115513): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_1/SRR12345678_R2.fastq.gz
File (syn60115526): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_2/SRR12345678_R1.fastq.gz
File (syn60115539): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_2/SRR12345678_R2.fastq.gz
```
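The formatting behind a listing like this can be sketched as follows. The `format_walk` helper is hypothetical. It assumes each item yielded by `synapseutils.walk` is a `(dirpath, dirnames, filenames)` triple, where `dirpath` is a `(path, Synapse ID)` tuple and the other two are lists of `(name, Synapse ID)` tuples.

```python
def format_walk(walk_results):
    """Turn walk-style triples into 'Directory (...)' and 'File (...)' lines."""
    lines = []
    for dir_path, dir_names, file_names in walk_results:
        parent_path = dir_path[0]
        for name, syn_id in dir_names:
            lines.append(f"Directory ({syn_id}): {parent_path}/{name}")
        for name, syn_id in file_names:
            lines.append(f"File ({syn_id}): {parent_path}/{name}")
    return lines


# Usage, given a logged-in client `syn` and a project ID `project_id`:
#   import synapseutils
#   for line in format_walk(synapseutils.walk(syn=syn, synId=project_id)):
#       print(line)
```

Separating formatting from the walk itself means the same function can be checked against hand-written tuples without contacting Synapse.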

Results

Now that you have created your files, you can inspect them on the Files tab of your project in the Synapse web UI. It should look similar to:

file

Source code for this tutorial

Click to show me
{!docs/tutorials/python/tutorial_scripts/file.py!}

References used in this tutorial

  • [File][file-reference-sync]
  • [syn.login][synapseclient.Synapse.login]
  • [syn.findEntityId][synapseclient.Synapse.findEntityId]
  • [syn.store][synapseclient.Synapse.store]
  • [synapseutils.walk][]