Activity/Provenance

Provenance is a concept describing the origin of something. In Synapse, it is used to describe the connections between the workflow steps used to create a particular file or set of results. Data analysis often involves multiple steps to go from a raw data file to a finished analysis. Synapse's provenance tools allow users to keep track of each step involved in an analysis and share those steps with other users.

The model Synapse uses for provenance is based on the W3C provenance specification: an item is derived from an activity, and the activity records the components that were used and the components that were executed. Think of the used items as input files and the executed items as software or code. Both used and executed items can reside in Synapse or be referenced by URL, such as a link to a GitHub commit or to a specific version of a software tool.
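
To make the used/executed split concrete, here is a minimal, standard-library-only sketch of that data model. The class names `EntityRef`, `UrlRef`, and `ProvenanceRecord` are hypothetical stand-ins invented for illustration, and the `example.org` URL is a placeholder. This is not the synapseclient API; the real classes are listed in the references at the end of this tutorial.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass(frozen=True)
class EntityRef:
    """Hypothetical stand-in for an item that resides in Synapse."""
    target_id: str
    target_version_number: Optional[int] = None


@dataclass(frozen=True)
class UrlRef:
    """Hypothetical stand-in for an item that is referenced by URL."""
    name: str
    url: str


Reference = Union[EntityRef, UrlRef]


@dataclass
class ProvenanceRecord:
    """One activity: what was consumed (used) and what was run (executed)."""
    name: str
    used: List[Reference] = field(default_factory=list)
    executed: List[Reference] = field(default_factory=list)


record = ProvenanceRecord(
    name="Quality Control Analysis",
    # Input data: a specific version of a file stored in Synapse.
    used=[EntityRef(target_id="syn12345678", target_version_number=1)],
    # Code that was run: a script referenced by a placeholder URL.
    executed=[UrlRef(name="QC script", url="https://example.org/qc.py")],
)

print(f"{record.name}: {len(record.used)} used, {len(record.executed)} executed")
```

Either kind of reference can appear in either list, which mirrors the statement above that used and executed items may both live in Synapse or at a URL.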

Dive into Activity/Provenance further here

Tutorial Purpose

In this tutorial you will:

  1. Add a new Activity to your File
  2. Add a new Activity to a specific version of your File
  3. Print stored activities on your File
  4. Delete an activity

Prerequisites

  • To follow this tutorial, you need a Project that contains at least one File stored in a Folder named biospecimen_experiment_1.

1. Add a new Activity to your File

First, retrieve the project, folder, and file whose provenance we want to track:

{!docs/tutorials/python/tutorial_scripts/activity.py!lines=6-24}

Create an Activity and attach it to the file

An Activity captures what was used (input data and reference URLs) and executed (code and software) to produce a file. Here we record a QC pipeline run on the biospecimen data:

{!docs/tutorials/python/tutorial_scripts/activity.py!lines=26-53}
You'll notice the output looks like:
Stored file: fileA.txt (version 1) with activity: Quality Control Analysis

2. Add a new Activity to a specific version of your File

Each time you store an updated file, Synapse creates a new version. You can associate a distinct activity with each version to capture the full history of how the data evolved. Here we record a downstream analysis step that used the QC-passed data from version 1:

{!docs/tutorials/python/tutorial_scripts/activity.py!lines=55-88}
You'll notice the output looks like:
Stored activity 'Downstream Analysis' on file fileA.txt (version 1)

3. Print stored activities on your File

Use Activity.from_parent() to retrieve the provenance for any version of a file. Pass a parent_version_number to retrieve the activity for a specific older version:

{!docs/tutorials/python/tutorial_scripts/activity.py!lines=90-108}
You'll notice the output looks like:
Activity on latest version (v1):
  Name: Downstream Analysis
  Description: Downstream analysis of QC-passed biospecimen samples.
  Used: UsedURL(name='Seurat v5.0.0', url='https://github.com/satijalab/seurat/releases/tag/v5.0.0')
  Used: UsedEntity(target_id='syn12345678', target_version_number=1)
  Executed: UsedURL(name='Downstream Analysis Script', url='https://github.com/Sage-Bionetworks/analysis-scripts/blob/v1.0/downstream_analysis.py')

Activity on version 1:
  Name: Quality Control Analysis
  Description: Initial QC analysis of biospecimen data using the FastQC pipeline.
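
The relationship between file versions and activities in sections 2 and 3 can be pictured as a per-version lookup. The sketch below is a hypothetical in-memory model, not the Synapse client: `VersionedProvenance`, `store`, and `lookup` are names invented here, and it assumes for illustration that the second activity lands on a version numbered 2. Omitting the version falls back to the latest one, which is the behavior described for `parent_version_number` above.

```python
from typing import Dict, Optional


class VersionedProvenance:
    """Hypothetical in-memory model: one activity name per file version."""

    def __init__(self) -> None:
        self._by_version: Dict[int, str] = {}

    def store(self, version: int, activity_name: str) -> None:
        """Associate an activity with one specific file version."""
        self._by_version[version] = activity_name

    def lookup(self, version: Optional[int] = None) -> Optional[str]:
        """Return the activity for a version, or for the latest if omitted."""
        if not self._by_version:
            return None
        if version is None:
            version = max(self._by_version)
        return self._by_version.get(version)


history = VersionedProvenance()
history.store(1, "Quality Control Analysis")
history.store(2, "Downstream Analysis")  # assumed version number

print(history.lookup())           # activity on the latest version
print(history.lookup(version=1))  # activity on an older version
```

Keeping one activity per version means an older version's provenance stays intact when a newer version records a different analysis step.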

4. Delete an activity

Deleting an activity disassociates it from the entity. Once the activity is no longer referenced by any entity, Synapse removes it entirely. If the same activity is shared across multiple entities, you need to call Activity.delete() on each of them:

{!docs/tutorials/python/tutorial_scripts/activity.py!lines=110-118}
You'll notice the output looks like:
Deleted activity from: fileA.txt (version 1)
Activity after deletion: None
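
The deletion rule described in this section (disassociate first, remove only when nothing references the activity) behaves like reference tracking. The following is a rough, standard-library-only sketch of that rule. `ActivityRegistry` and its methods are hypothetical, as are the entity IDs `syn111` and `syn222`; how Synapse implements this server-side is not shown in this tutorial.

```python
from typing import Dict, Optional, Set


class ActivityRegistry:
    """Hypothetical model of activities shared across several entities."""

    def __init__(self) -> None:
        self._activity_of: Dict[str, str] = {}     # entity id -> activity name
        self._referrers: Dict[str, Set[str]] = {}  # activity name -> entity ids

    def attach(self, entity_id: str, activity: str) -> None:
        """Associate an activity with an entity, replacing any previous one."""
        self.delete(entity_id)
        self._activity_of[entity_id] = activity
        self._referrers.setdefault(activity, set()).add(entity_id)

    def delete(self, entity_id: str) -> None:
        """Disassociate the entity's activity; drop it once unreferenced."""
        activity = self._activity_of.pop(entity_id, None)
        if activity is None:
            return
        referrers = self._referrers[activity]
        referrers.discard(entity_id)
        if not referrers:
            del self._referrers[activity]

    def activity_for(self, entity_id: str) -> Optional[str]:
        return self._activity_of.get(entity_id)

    def exists(self, activity: str) -> bool:
        return activity in self._referrers


registry = ActivityRegistry()
registry.attach("syn111", "Quality Control Analysis")
registry.attach("syn222", "Quality Control Analysis")

registry.delete("syn111")
print(registry.activity_for("syn111"))              # no longer associated
print(registry.exists("Quality Control Analysis"))  # still held by syn222

registry.delete("syn222")
print(registry.exists("Quality Control Analysis"))  # last reference is gone
```

In this model, deleting from one entity leaves the shared activity in place for the others, which is why each entity needs its own delete call.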

Source code for this tutorial

Click to show me
{!docs/tutorials/python/tutorial_scripts/activity.py!}

References used in this tutorial

  • [Activity][synapseclient.models.Activity]
  • [UsedEntity][synapseclient.models.UsedEntity]
  • [UsedURL][synapseclient.models.UsedURL]
  • [File][file-reference-sync]
  • [syn.login][synapseclient.Synapse.login]