111 changes: 106 additions & 5 deletions docs/tutorials/python/activity.md
@@ -1,9 +1,6 @@
# Activity/Provenance
[See the current available tutorial](../python_client.md#provenance)

![Under Construction](../../assets/under_construction.png)

Provenance is a concept describing the origin of something. In Synapse, it is used to describe the connections between the workflow steps used to create a particular file or set of results. Data analysis often involves multiple steps to go from a raw data file to a finished analysis. Synapse’s provenance tools allow users to keep track of each step involved in an analysis and share those steps with other users.
Provenance is a concept describing the origin of something. In Synapse, it is used to describe the connections between the workflow steps used to create a particular file or set of results. Data analysis often involves multiple steps to go from a raw data file to a finished analysis. Synapse's provenance tools allow users to keep track of each step involved in an analysis and share those steps with other users.

The model Synapse uses for provenance is based on the [W3C provenance spec](https://www.w3.org/TR/prov-n/) where items are derived from an activity which has components that were **used** and components that were **executed**. Think of the **used** items as input files and **executed** items as software or code. Both **used** and **executed** items can reside in Synapse or in URLs such as a link to a GitHub commit or a link to a specific version of a software tool.
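
The split between **used** and **executed** items can be sketched with plain dataclasses. This is a toy model for illustration only: `Reference` and `ToyActivity` are hypothetical names, not `synapseclient` classes, and the ID and URL are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Reference:
    """Hypothetical stand-in for a provenance reference (not a synapseclient class)."""

    name: str
    url: Optional[str] = None  # external resource, e.g. a link to a GitHub commit
    synapse_id: Optional[str] = None  # item that resides in Synapse (placeholder ID)


@dataclass
class ToyActivity:
    """Hypothetical activity: inputs go in `used`, code and software in `executed`."""

    name: str
    used: List[Reference] = field(default_factory=list)
    executed: List[Reference] = field(default_factory=list)

    def summary(self) -> str:
        inputs = ", ".join(ref.name for ref in self.used)
        code = ", ".join(ref.name for ref in self.executed)
        return f"{self.name}: used [{inputs}]; executed [{code}]"


qc = ToyActivity(
    name="Quality Control Analysis",
    used=[Reference(name="raw_counts.csv", synapse_id="syn123")],
    executed=[Reference(name="qc_analysis.py", url="https://example.org/qc_analysis.py")],
)
print(qc.summary())
```

Either list can hold a Synapse item or a URL, which mirrors the description above: what distinguishes **used** from **executed** is the role the item played, not where it lives.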

@@ -18,4 +15,108 @@ In this tutorial you will:
1. Delete an activity

## Prerequisites
- In order to follow this tutorial you will need to have a [Project](./project.md) created with at least one [File](./file.md) with multiple [Versions](./versions.md).
- To follow this tutorial, you need a [Project](./project.md) containing at least one [File](./file.md) stored in a Folder named `biospecimen_experiment_1`.

## 1. Add a new Activity to your File

#### First retrieve the project, folder, and file we want to track provenance for

```python
{!docs/tutorials/python/tutorial_scripts/activity.py!lines=6-24}
```

#### Create an Activity and attach it to the file

An `Activity` captures what was **used** (input data and reference URLs) and **executed** (code and software) to produce a file. Here we record a QC pipeline run on the biospecimen data:

```python
{!docs/tutorials/python/tutorial_scripts/activity.py!lines=26-53}
```

<details class="example">
<summary>You'll notice the output looks like:</summary>

```
Stored file: fileA.txt (version 1) with activity: Quality Control Analysis
```
</details>


## 2. Add a new Activity to a specific version of your File

Each time you store an updated file, Synapse creates a new version. You can associate a distinct activity with each version to capture the full history of how the data evolved. Here we record a downstream analysis step that used the QC-passed data from version 1:

```python
{!docs/tutorials/python/tutorial_scripts/activity.py!lines=55-88}
```

<details class="example">
<summary>You'll notice the output looks like:</summary>

```
Stored activity 'Downstream Analysis' on file fileA.txt (version 1)
```
</details>
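
Because each activity can pin the exact version of the data it used, a chain of activities can be walked backwards to reconstruct how a result was produced. The following is a minimal sketch of that idea using an in-memory dictionary. It does not call Synapse, and the IDs, version numbers, and the `trace_lineage` helper are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

VersionedId = Tuple[str, int]  # (entity ID, version number)

# Hypothetical provenance records: each versioned entity maps to the
# versioned entities that its activity used as input.
used_by: Dict[VersionedId, List[VersionedId]] = {
    ("syn2", 1): [("syn1", 1)],  # downstream result used the QC output, version 1
    ("syn1", 1): [("syn0", 3)],  # QC output was computed from raw data, version 3
    ("syn0", 3): [],  # raw data has no recorded inputs
}


def trace_lineage(start: VersionedId) -> List[VersionedId]:
    """Return every upstream input of `start`, nearest first (depth-first)."""
    lineage: List[VersionedId] = []
    seen: Set[VersionedId] = set()
    stack = list(reversed(used_by.get(start, [])))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # skip inputs that were already visited
        seen.add(node)
        lineage.append(node)
        stack.extend(reversed(used_by.get(node, [])))
    return lineage


print(trace_lineage(("syn2", 1)))
```

Recording a version number alongside each ID is what makes the chain reproducible: a later upload of the input file does not change which version the analysis points to.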


## 3. Print stored activities on your File

Use `Activity.from_parent()` to retrieve the provenance for any version of a file. Pass a `parent_version_number` to retrieve the activity for a specific older version:

```python
{!docs/tutorials/python/tutorial_scripts/activity.py!lines=90-108}
```

<details class="example">
<summary>You'll notice the output looks like:</summary>

```
Activity on latest version (v1):
Name: Downstream Analysis
Description: Downstream analysis of QC-passed biospecimen samples.
Used: UsedURL(name='Seurat v5.0.0', url='https://github.com/satijalab/seurat/releases/tag/v5.0.0')
Used: UsedEntity(target_id='syn12345678', target_version_number=1)
Executed: UsedURL(name='Downstream Analysis Script', url='https://github.com/Sage-Bionetworks/analysis-scripts/blob/v1.0/downstream_analysis.py')

Activity on version 1:
Name: Quality Control Analysis
Description: Initial QC analysis of biospecimen data using the FastQC pipeline.
```
</details>
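
The lookup rule described above (no version number means the latest version, an explicit number means that version) can be illustrated with a small in-memory model. This is a sketch under assumptions, not the client's implementation: `ToyFile` and `activity_for` are hypothetical names, and the example file is assumed to have two versions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyFile:
    """Hypothetical file record that keeps one activity name per version."""

    latest_version: int
    activities: Dict[int, str] = field(default_factory=dict)

    def activity_for(self, version: Optional[int] = None) -> Optional[str]:
        """Return the activity for `version`, defaulting to the latest version."""
        if version is None:
            version = self.latest_version
        # A version with no recorded provenance yields None.
        return self.activities.get(version)


toy_file = ToyFile(
    latest_version=2,
    activities={1: "Quality Control Analysis", 2: "Downstream Analysis"},
)
print(toy_file.activity_for())   # latest version
print(toy_file.activity_for(1))  # explicit older version
```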


## 4. Delete an activity

Deleting an activity disassociates it from the entity. Once the activity is no longer referenced by any entity, Synapse removes it entirely. If the same activity is shared across multiple entities, you will need to call `Activity.delete()` on each of them:

```python
{!docs/tutorials/python/tutorial_scripts/activity.py!lines=110-118}
```

<details class="example">
<summary>You'll notice the output looks like:</summary>

```
Deleted activity from: fileA.txt (version 1)
Activity after deletion: None
```
</details>
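
The deletion behavior described in this step resembles reference counting: an activity survives while at least one entity still points to it. The sketch below models that rule in memory. It is an illustration of the described behavior, not how Synapse implements it, and `ToyProvenanceStore` with its methods is hypothetical.

```python
from typing import Dict, Set


class ToyProvenanceStore:
    """Hypothetical store that drops an activity once nothing references it."""

    def __init__(self) -> None:
        self.entity_to_activity: Dict[str, str] = {}
        self.activities: Set[str] = set()

    def attach(self, entity_id: str, activity_id: str) -> None:
        """Associate an activity with an entity."""
        self.activities.add(activity_id)
        self.entity_to_activity[entity_id] = activity_id

    def delete(self, entity_id: str) -> None:
        """Disassociate the entity's activity; remove it if now unreferenced."""
        activity_id = self.entity_to_activity.pop(entity_id, None)
        if activity_id is None:
            return  # the entity had no activity attached
        if activity_id not in self.entity_to_activity.values():
            self.activities.discard(activity_id)


store = ToyProvenanceStore()
store.attach("fileA", "qc-activity")
store.attach("fileB", "qc-activity")  # the same activity shared by two entities

store.delete("fileA")
print("qc-activity" in store.activities)  # still referenced by fileB

store.delete("fileB")
print("qc-activity" in store.activities)  # no references remain
```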


## Source code for this tutorial

<details class="quote">
<summary>Click to show me</summary>

```python
{!docs/tutorials/python/tutorial_scripts/activity.py!}
```
</details>

## References used in this tutorial

- [Activity][synapseclient.models.Activity]
- [UsedEntity][synapseclient.models.UsedEntity]
- [UsedURL][synapseclient.models.UsedURL]
- [File][file-reference-sync]
- [syn.login][synapseclient.Synapse.login]
118 changes: 118 additions & 0 deletions docs/tutorials/python/tutorial_scripts/activity.py
@@ -0,0 +1,118 @@
"""
Here is where you'll find the code for the Activity/Provenance tutorial.
"""

# Step 1: Add a new Activity to your File
import synapseclient
from synapseclient.models import Activity, File, Folder, Project, UsedEntity, UsedURL

syn = synapseclient.login()

# Retrieve the project and folder IDs
my_project_id = (
    Project(name="My uniquely named project about Alzheimer's Disease").get().id
)

biospecimen_experiment_1_folder = Folder(
    name="biospecimen_experiment_1", parent_id=my_project_id
).get()

# Retrieve an existing file from the project
my_file = File(
    name="fileA.txt",
    parent_id=biospecimen_experiment_1_folder.id,
).get()

# Create an Activity describing the analysis step that produced this file
analysis_activity = Activity(
    name="Quality Control Analysis",
    description="Initial QC analysis of biospecimen data using the FastQC pipeline.",
    used=[
        UsedURL(
            name="FastQC v0.12.1",
            url="https://github.com/s-andrews/FastQC/releases/tag/v0.12.1",
        ),
        UsedEntity(target_id=my_project_id),
    ],
    executed=[
        UsedURL(
            name="QC Analysis Script",
            url="https://github.com/Sage-Bionetworks/analysis-scripts/blob/v1.0/qc_analysis.py",
        ),
    ],
)

# Attach the activity to the file and store it
my_file.activity = analysis_activity
my_file = my_file.store()

first_version_number = my_file.version_number
print(
    f"Stored file: {my_file.name} (version {first_version_number}) "
    f"with activity: {my_file.activity.name}"
)

# Step 2: Add a new Activity to a specific version of your File
# Each time you store an updated file, Synapse creates a new version.
# You can track a different activity for each version to capture the
# full history of what was done to produce each version of the file.
downstream_activity = Activity(
    name="Downstream Analysis",
    description="Downstream analysis of QC-passed biospecimen samples.",
    used=[
        UsedURL(
            name="Seurat v5.0.0",
            url="https://github.com/satijalab/seurat/releases/tag/v5.0.0",
        ),
        UsedEntity(
            target_id=my_file.id,
            target_version_number=first_version_number,
        ),
    ],
    executed=[
        UsedURL(
            name="Downstream Analysis Script",
            url="https://github.com/Sage-Bionetworks/analysis-scripts/blob/v1.0/downstream_analysis.py",
        ),
    ],
)

# Store the activity directly on the file using Activity.store()
second_version_file = File(id=my_file.id).get()
downstream_activity.store(parent=second_version_file)

second_version_number = second_version_file.version_number
print(
    f"Stored activity '{downstream_activity.name}' on file "
    f"{second_version_file.name} (version {second_version_number})"
)

# Step 3: Print stored activities on your File
# Retrieve and print the activity on the latest version of the file
current_activity = Activity.from_parent(parent=my_file)
print(f"\nActivity on latest version (v{my_file.version_number}):")
print(f" Name: {current_activity.name}")
print(f" Description: {current_activity.description}")
for item in current_activity.used:
    print(f" Used: {item}")
for item in current_activity.executed:
    print(f" Executed: {item}")

# Retrieve and print the activity for the first version
first_activity = Activity.from_parent(
    parent=my_file,
    parent_version_number=first_version_number,
)
print(f"\nActivity on version {first_version_number}:")
print(f" Name: {first_activity.name}")
print(f" Description: {first_activity.description}")

# Step 4: Delete an activity
# Deleting an activity disassociates it from the entity and removes it from
# Synapse once it is no longer referenced by any entity.
Activity.delete(parent=my_file)
print(f"\nDeleted activity from: {my_file.name} (version {my_file.version_number})")

# Verify the activity was removed
deleted_activity = Activity.from_parent(parent=my_file)
print(f"Activity after deletion: {deleted_activity}")