This tutorial covers how to work with manifest TSV files for bulk file operations in Synapse. Manifest files provide a way to track file metadata, download files with their annotations, and upload files with provenance information.
A manifest file is a tab-separated values (TSV) file that contains metadata about files in Synapse. The manifest includes:
- File paths and Synapse IDs
- Parent container IDs
- Annotations
- Provenance information (used/executed references)
When syncing files from Synapse, you can automatically generate a manifest file that captures all file metadata.
from synapseclient.models import Project
import synapseclient
synapseclient.login()
# Download a project with manifest generation at each directory level
project = Project(id="syn123456").sync_from_synapse(
path="/path/to/download",
generate_manifest="all"
)
# Or generate a single manifest at the root level only
project = Project(id="syn123456").sync_from_synapse(
path="/path/to/download",
generate_manifest="root"
)The generate_manifest parameter accepts three values:
| Value | Description |
|---|---|
"suppress" |
(Default) Do not create any manifest files |
"root" |
Create a single manifest at the root download path |
"all" |
Create a manifest in each directory level |
You can also generate a manifest after syncing:
from synapseclient.models import Project
import synapseclient
synapseclient.login()
# First sync without manifest
project = Project(id="syn123456").sync_from_synapse(
path="/path/to/download"
)
# Then generate manifest separately
manifest_path = project.generate_manifest(
path="/path/to/download",
manifest_scope="root"
)
print(f"Manifest created at: {manifest_path}")The generated manifest file (SYNAPSE_METADATA_MANIFEST.tsv) contains the following columns:
| Column | Description |
|---|---|
path |
Local file path |
parent |
Synapse ID of the parent container |
name |
File name in Synapse |
id |
Synapse file ID |
synapseStore |
Whether the file is stored in Synapse |
contentType |
MIME type of the file |
used |
Provenance - entities used to create this file |
executed |
Provenance - code/scripts executed |
activityName |
Name of the provenance activity |
activityDescription |
Description of the provenance activity |
| custom columns | Any annotations on the files |
path parent name id synapseStore contentType used executed activityName activityDescription study dataType
/data/file1.csv syn123 file1.csv syn456 True text/csv Data Processing Study1 RNA-seq
/data/file2.csv syn123 file2.csv syn789 True text/csv syn456 Analysis Processed from file1 Study1 RNA-seqYou can upload files to Synapse using a manifest file:
from synapseclient.models import Project
import synapseclient
synapseclient.login()
# Upload files from a manifest
files = Project.from_manifest(
manifest_path="/path/to/manifest.tsv",
parent_id="syn123456"
)
for file in files:
print(f"Uploaded: {file.name} ({file.id})")Before uploading, you can validate the manifest:
from synapseclient.models import Project
# Validate without uploading
is_valid, errors = Project.validate_manifest(
manifest_path="/path/to/manifest.tsv"
)
if is_valid:
print("Manifest is valid, ready for upload")
else:
for error in errors:
print(f"Error: {error}")Or use the dry_run option to validate the manifest and see what would be uploaded without making changes:
# Dry run - validates and returns what would be uploaded, but doesn't upload
files = Project.from_manifest(
manifest_path="/path/to/manifest.tsv",
parent_id="syn123456",
dry_run=True # Validate only, no actual upload
)
print(f"Would upload {len(files)} files")The dry_run parameter is useful for:
- Validating manifest format before committing to an upload
- Testing your manifest configuration
- Previewing which files will be affected
Annotations in the manifest are automatically handled:
When generating a manifest, all file annotations are included as additional columns:
project = Project(id="syn123456").sync_from_synapse(
path="/path/to/download",
generate_manifest="root"
)
# Annotations appear as columns in the manifestAny columns in the manifest that aren't standard fields become annotations:
path parent study dataType specimenType
/data/file1.csv syn123 Study1 RNA-seq tissuefiles = Project.from_manifest(
manifest_path="/path/to/manifest.tsv",
parent_id="syn123456",
merge_existing_annotations=True # Merge with existing annotations
)Provenance information is captured in the used, executed, activityName, and activityDescription columns:
project = Project(id="syn123456").sync_from_synapse(
path="/path/to/download",
include_activity=True, # Include provenance
generate_manifest="root"
)You can specify provenance in the manifest:
path parent used executed activityName activityDescription
/data/output.csv syn123 syn456;syn789 https://github.com/repo/script.py Analysis Generated from input files- Multiple references are separated by semicolons (
;) - References can be Synapse IDs, URLs, or local file paths
The manifest functionality integrates with Synapse's Download List feature. You can generate a manifest directly from your Synapse download list, which is useful for exporting metadata about files you've queued for download in the Synapse web interface.
from synapseclient.models import Project
import synapseclient
synapseclient.login()
# Generate a manifest from your Synapse download list
manifest_path = Project.generate_download_list_manifest(
download_path="/path/to/save/manifest"
)
print(f"Manifest downloaded to: {manifest_path}")You can customize the manifest format:
from synapseclient.models import Project
import synapseclient
synapseclient.login()
# Generate a tab-separated manifest
manifest_path = Project.generate_download_list_manifest(
download_path="/path/to/save/manifest",
csv_separator="\t", # Tab-separated
include_header=True
)For more control over the manifest generation process, use the DownloadListManifestRequest class directly:
from synapseclient.models import DownloadListManifestRequest, CsvTableDescriptor
import synapseclient
synapseclient.login()
# Create a request with custom CSV formatting
request = DownloadListManifestRequest(
csv_table_descriptor=CsvTableDescriptor(
separator="\t",
quote_character='"',
is_first_line_header=True
)
)
# Send the job and wait for completion
request.send_job_and_wait()
# Download the generated manifest
manifest_path = request.download_manifest(download_path="/path/to/download")
print(f"Manifest file handle: {request.result_file_handle_id}")-
Use
generate_manifest="root"for simple cases - Creates a single manifest at the root level, easier to manage. -
Use
generate_manifest="all"for complex hierarchies - Creates manifests at each directory level, useful for large projects with many subdirectories. -
Validate manifests before upload - Use
validate_manifest()ordry_run=Trueto catch errors early. -
Include provenance information - Set
include_activity=Truewhen syncing to capture provenance in the manifest. -
Backup your manifest - The manifest is a valuable record of your data and its metadata.
All manifest operations are available as async methods:
import asyncio
from synapseclient.models import Project
import synapseclient
async def main():
synapseclient.login()
# Async sync with manifest
project = Project(id="syn123456")
await project.sync_from_synapse_async(
path="/path/to/download",
generate_manifest="root"
)
# Async manifest generation
manifest_path = await project.generate_manifest_async(
path="/path/to/download",
manifest_scope="root"
)
# Async upload from manifest
files = await Project.from_manifest_async(
manifest_path="/path/to/manifest.tsv",
parent_id="syn123456"
)
asyncio.run(main())