
Commit 520760b

thomasyu888 and claude authored
[SYNPY-1375] Add Activity/Provenance tutorial (#1351)
* [SYNPY-1375]: Add Activity/Provenance tutorial

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Use --8<-- anchors to avoid having to maintain specific coding lines
* Add activity to mkdocs
* Clean up activity script
* Update activity
* Update tutorial to create files
* Update docs

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent bd5c718 commit 520760b

3 files changed

Lines changed: 266 additions & 7 deletions

docs/tutorials/python/activity.md

Lines changed: 106 additions & 6 deletions
@@ -1,13 +1,9 @@
 # Activity/Provenance
-[See the current available tutorial](../python_client.md#provenance)
 
-![Under Construction](../../assets/under_construction.png)
-
-Provenance is a concept describing the origin of something. In Synapse, it is used to describe the connections between the workflow steps used to create a particular file or set of results. Data analysis often involves multiple steps to go from a raw data file to a finished analysis. Synapse’s provenance tools allow users to keep track of each step involved in an analysis and share those steps with other users.
+Provenance is a concept describing the origin of something. In Synapse, it is used to describe the connections between the workflow steps used to create a particular file or set of results. Data analysis often involves multiple steps to go from a raw data file to a finished analysis. Synapse's provenance tools allow users to keep track of each step involved in an analysis and share those steps with other users.
 
 The model Synapse uses for provenance is based on the [W3C provenance spec](https://www.w3.org/TR/prov-n/) where items are derived from an activity which has components that were **used** and components that were **executed**. Think of the **used** items as input files and **executed** items as software or code. Both **used** and **executed** items can reside in Synapse or in URLs such as a link to a GitHub commit or a link to a specific version of a software tool.
 
-[Dive into Activity/Provenance further here](../../explanations/domain_models_of_synapse.md#activityprovenance)
 
 ## Tutorial Purpose
 In this tutorial you will:
@@ -18,4 +14,108 @@ In this tutorial you will:
 1. Delete an activity
 
 ## Prerequisites
-- In order to follow this tutorial you will need to have a [Project](./project.md) created with at least one [File](./file.md) with multiple [Versions](./versions.md).
+- In order to follow this tutorial you will need to have a [Project](./project.md) created with a Folder named `biospecimen_experiment_1` containing at least one [File](./file.md). You will also need the Synapse ID of that file (e.g. `synNNNNN`).
+
+## 1. Add a new Activity to your File
+
+#### First, retrieve the project and folder, then create a file in that folder to track provenance
+
+```python
+--8<-- "docs/tutorials/python/tutorial_scripts/activity.py:retrieve_project_folder_file"
+```
+
+#### Create an Activity and attach it to the file
+
+An `Activity` captures what was **used** (input data and reference URLs) and **executed** (code and software) to produce a file. Here we record a QC pipeline run on the biospecimen data:
+
+```python
+--8<-- "docs/tutorials/python/tutorial_scripts/activity.py:create_activity"
+```
+
+<details class="example">
+<summary>You'll notice the output looks like:</summary>
+
+```
+Stored file: biospecimen_data.txt (version 1) with activity: Quality Control Analysis
+```
+</details>
+
+
+## 2. Add a new Activity to a specific version of your File
+
+Each time you store an updated file, Synapse creates a new version. You can associate a distinct activity with each version to capture the full history of how the data evolved. Here we record a downstream analysis step that used the QC-passed data from version 1:
+
+```python
+--8<-- "docs/tutorials/python/tutorial_scripts/activity.py:add_activity_to_version"
+```
+
+<details class="example">
+<summary>You'll notice the output looks like:</summary>
+
+```
+Stored activity 'Downstream Analysis' on file biospecimen_data.txt (version 2)
+```
+</details>
+
+
+## 3. Print stored activities on your File
+
+Use `Activity.from_parent()` to retrieve the provenance for any version of a file. Pass a `parent_version_number` to retrieve the activity for a specific older version:
+
+```python
+--8<-- "docs/tutorials/python/tutorial_scripts/activity.py:print_activities"
+```
+
+<details class="example">
+<summary>You'll notice the output looks like:</summary>
+
+```
+Activity on latest version (v2):
+ Name: Downstream Analysis
+ Description: Downstream analysis of QC-passed biospecimen samples.
+ Used: UsedURL(name='Seurat v5.0.0', url='https://github.com/satijalab/seurat/releases/tag/v5.0.0')
+ Used: UsedEntity(target_id='syn12345678', target_version_number=1)
+ Executed: UsedURL(name='Downstream Analysis Script', url='https://github.com/Sage-Bionetworks/analysis-scripts/blob/v1.0/downstream_analysis.py')
+
+Activity on version 1:
+ Name: Quality Control Analysis
+ Description: Initial QC analysis of biospecimen data using the FastQC pipeline.
+```
+</details>
+
+
+## 4. Delete an activity
+
+Deleting an activity is a two-step process: first call `disassociate_from_entity()` to remove the link between the activity and the file version, then call `delete()` to remove the activity record from Synapse entirely:
+
+```python
+--8<-- "docs/tutorials/python/tutorial_scripts/activity.py:delete_activity"
+```
+
+<details class="example">
+<summary>You'll notice the output looks like:</summary>
+
+```
+Deleted activity from: biospecimen_data.txt (version 2)
+Activity after deletion: None
+```
+</details>
+
+
+## Source code for this tutorial
+
+<details class="quote">
+<summary>Click to show me</summary>
+
+```python
+--8<-- "docs/tutorials/python/tutorial_scripts/activity.py"
+```
+</details>
+
+## References used in this tutorial
+
+- [Activity][synapseclient.models.Activity]
+- [UsedEntity][synapseclient.models.UsedEntity]
+- [UsedURL][synapseclient.models.UsedURL]
+- [File][file-reference-sync]
+- [syn.login][synapseclient.Synapse.login]
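The used/executed model described in the tutorial text can be sketched without a Synapse connection. The block below is a minimal illustration in plain Python, assuming nothing about how `synapseclient` implements its models: `UrlRef`, `EntityRef`, and `ProvenanceRecord` are hypothetical stand-ins for `UsedURL`, `UsedEntity`, and `Activity`, and `syn12345678` is the placeholder ID from the tutorial's sample output.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


# Hypothetical stand-ins for illustration only; these are NOT synapseclient classes.
@dataclass(frozen=True)
class UrlRef:
    """A resource that lives at a URL, such as a tagged GitHub release."""

    name: str
    url: str


@dataclass(frozen=True)
class EntityRef:
    """A resource stored in Synapse, optionally pinned to one version."""

    target_id: str
    target_version_number: Optional[int] = None


Resource = Union[UrlRef, EntityRef]


@dataclass
class ProvenanceRecord:
    """One activity: the inputs it used and the code it executed."""

    name: str
    used: List[Resource] = field(default_factory=list)
    executed: List[Resource] = field(default_factory=list)

    def summary(self) -> str:
        return f"{self.name}: {len(self.used)} used, {len(self.executed)} executed"


record = ProvenanceRecord(
    name="Quality Control Analysis",
    used=[
        UrlRef("FastQC v0.12.1", "https://github.com/s-andrews/FastQC/releases/tag/v0.12.1"),
        EntityRef("syn12345678", target_version_number=1),  # placeholder Synapse ID
    ],
    executed=[
        UrlRef(
            "QC Analysis Script",
            "https://github.com/Sage-Bionetworks/analysis-scripts/blob/v1.0/qc_analysis.py",
        ),
    ],
)
print(record.summary())  # Quality Control Analysis: 2 used, 1 executed
```

Keeping inputs (`used`) separate from code (`executed`) is what lets a reader tell which data went into a result and which software produced it, and both lists can mix Synapse entities with external URLs.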

docs/tutorials/python/tutorial_scripts/activity.py

Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
+"""
+Here is where you'll find the code for the Activity/Provenance tutorial.
+"""
+
+# Step 1: Add a new Activity to your File
+# --8<-- [start:retrieve_project_folder_file]
+import os
+import tempfile
+
+import synapseclient
+from synapseclient.models import Activity, File, Folder, Project, UsedEntity, UsedURL
+
+syn = synapseclient.login()
+
+# Set project and folder name that exists within the project
+PROJECT_NAME = "Dark Side Of The Moon"
+FOLDER_NAME = "biospecimen_experiment_1"
+
+# Retrieve the project and folder IDs
+my_project_id = Project(name=PROJECT_NAME).get().id
+
+biospecimen_experiment_1_folder = Folder(
+    name=FOLDER_NAME, parent_id=my_project_id
+).get()
+
+with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as tmp:
+    tmp.write("First biospecimen data - post-QC analysis results")
+    tmp_path = tmp.name
+# Store a first version of the file in Synapse
+my_file = File(
+    path=tmp_path,
+    name="biospecimen_data.txt",
+    parent_id=biospecimen_experiment_1_folder.id,
+)
+my_file.store()
+
+# --8<-- [end:retrieve_project_folder_file]
+
+# --8<-- [start:create_activity]
+# Create an Activity describing the analysis step that produced this file
+analysis_activity = Activity(
+    name="Quality Control Analysis",
+    description="Initial QC analysis of biospecimen data using the FastQC pipeline.",
+    used=[
+        UsedURL(
+            name="FastQC v0.12.1",
+            url="https://github.com/s-andrews/FastQC/releases/tag/v0.12.1",
+        ),
+        UsedEntity(target_id=my_project_id),
+    ],
+    executed=[
+        UsedURL(
+            name="QC Analysis Script",
+            url="https://github.com/Sage-Bionetworks/analysis-scripts/blob/v1.0/qc_analysis.py",
+        ),
+    ],
+)
+
+# Attach the activity to the file and store it
+my_file.activity = analysis_activity
+my_file = my_file.store()
+
+first_version_number = my_file.version_number
+print(
+    f"Stored file: {my_file.name} (version {first_version_number}) "
+    f"with activity: {my_file.activity.name}"
+)
+# --8<-- [end:create_activity]
+
+# --8<-- [start:add_activity_to_version]
+# Step 2: Add a new Activity to a specific version of your File
+# Each time you store an updated file, Synapse creates a new version.
+# You can track a different activity for each version to capture the
+# full history of what was done to produce each version of the file.
+
+# Create a dummy file and upload it as a new version
+with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as tmp:
+    tmp.write("Updated biospecimen data - post-QC analysis results")
+    tmp_path = tmp.name
+
+updated_file = File(
+    path=tmp_path,
+    name="biospecimen_data.txt",
+    parent_id=biospecimen_experiment_1_folder.id,
+)
+updated_file.store()
+second_version_number = updated_file.version_number
+
+downstream_activity = Activity(
+    name="Downstream Analysis",
+    description="Downstream analysis of QC-passed biospecimen samples.",
+    used=[
+        UsedURL(
+            name="Seurat v5.0.0",
+            url="https://github.com/satijalab/seurat/releases/tag/v5.0.0",
+        ),
+        UsedEntity(
+            target_id=my_file.id,
+            target_version_number=first_version_number,
+        ),
+    ],
+    executed=[
+        UsedURL(
+            name="Downstream Analysis Script",
+            url="https://github.com/Sage-Bionetworks/analysis-scripts/blob/v1.0/downstream_analysis.py",
+        ),
+    ],
+)
+
+# Store the activity on the new version using Activity.store()
+downstream_activity.store(parent=updated_file)
+print(
+    f"Stored activity '{downstream_activity.name}' on file "
+    f"{updated_file.name} (version {second_version_number})"
+)
+# --8<-- [end:add_activity_to_version]
+
+# --8<-- [start:print_activities]
+# Step 3: Print stored activities on your File
+# Retrieve and print the activity on the latest version of the file.
+# `updated_file` carries the latest version number; `my_file` still refers to version 1.
+current_activity = Activity.from_parent(parent=updated_file)
+print(f"\nActivity on latest version (v{updated_file.version_number}):")
+print(f" Name: {current_activity.name}")
+print(f" Description: {current_activity.description}")
+for item in current_activity.used:
+    print(f" Used: {item}")
+for item in current_activity.executed:
+    print(f" Executed: {item}")
+
+# Retrieve and print the activity for the first version
+first_activity = Activity.from_parent(
+    parent=my_file,
+    parent_version_number=first_version_number,
+)
+print(f"\nActivity on version {first_version_number}:")
+print(f" Name: {first_activity.name}")
+print(f" Description: {first_activity.description}")
+# --8<-- [end:print_activities]
+
+# --8<-- [start:delete_activity]
+# Step 4: Delete an activity
+# Deleting an activity disassociates it from the entity and removes it from
+# Synapse once it is no longer referenced by any entity.
+
+current_activity.disassociate_from_entity(parent=updated_file)
+current_activity.delete(parent=updated_file)
+print(
+    f"\nDeleted activity from: {updated_file.name} (version {updated_file.version_number})"
+)
+
+# Verify the activity was removed
+deleted_activity = Activity.from_parent(
+    parent=updated_file, parent_version_number=updated_file.version_number
+)
+print(f"Activity after deletion: {deleted_activity}")
+# --8<-- [end:delete_activity]
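The script creates its input files with `NamedTemporaryFile(..., delete=False)` and never removes them, so each run leaves two `.txt` files in the temp directory. One possible follow-up, sketched here with the standard library only, is to remove each file once it is no longer needed. `write_temp_text` is a hypothetical helper, not part of the commit, and the sketch assumes the local copy can be deleted as soon as the upload step has finished.

```python
import os
import tempfile


def write_temp_text(content: str) -> str:
    """Write content to a named temporary .txt file and return its path."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as tmp:
        tmp.write(content)
        return tmp.name


tmp_path = write_temp_text("First biospecimen data - post-QC analysis results")
try:
    # In the tutorial script, File(path=tmp_path, ...).store() would run here.
    # Reading the file back stands in for that upload step in this sketch.
    with open(tmp_path) as handle:
        uploaded_text = handle.read()
finally:
    # delete=False leaves the file on disk, so remove it explicitly.
    os.remove(tmp_path)
```

The `try`/`finally` keeps the cleanup in place even if the upload step raises.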

mkdocs.yml

Lines changed: 4 additions & 1 deletion
@@ -31,7 +31,7 @@ nav:
 - Submission: tutorials/python/submission.md
 - Annotation: tutorials/python/annotation.md
 # - Versions: tutorials/python/versions.md
-# - Activity/Provenance: tutorials/python/activity.md
+- Activity/Provenance: tutorials/python/activity.md
 - Entity View: tutorials/python/entityview.md
 - Table: tutorials/python/table.md
 # - Using a Table: tutorials/python/table_crud.md
@@ -261,6 +261,9 @@ markdown_extensions:
 
 - markdown_include.include:
 
+- pymdownx.snippets:
+    base_path: ["."]
+
 - toc:
     permalink: true
 - attr_list
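The `pymdownx.snippets` extension enabled above is what resolves references such as `activity.py:create_activity` against the `--8<-- [start:...]` and `--8<-- [end:...]` markers in the script. The sketch below imitates that lookup in plain Python to show the idea. It is not the extension's implementation: `extract_section` and `SCRIPT` are hypothetical names, and only the marker syntax is taken from the commit.

```python
import re
from typing import List

# Marker syntax as it appears in the tutorial script, e.g. "# --8<-- [start:create_activity]".
START = re.compile(r"--8<-- \[start:(?P<name>[\w-]+)\]")
END = re.compile(r"--8<-- \[end:(?P<name>[\w-]+)\]")


def extract_section(source: str, name: str) -> str:
    """Return the lines between the start and end markers for `name`."""
    collected: List[str] = []
    inside = False
    for line in source.splitlines():
        start = START.search(line)
        end = END.search(line)
        if start and start.group("name") == name:
            inside = True
            continue
        if end and end.group("name") == name:
            break
        # Keep ordinary lines only; marker lines for other sections are skipped.
        if inside and not (start or end):
            collected.append(line)
    if not inside:
        raise KeyError(f"section {name!r} not found")
    return "\n".join(collected)


SCRIPT = """\
# --8<-- [start:create_activity]
analysis_activity = "Quality Control Analysis"
# --8<-- [end:create_activity]

# --8<-- [start:delete_activity]
deleted = True
# --8<-- [end:delete_activity]
"""

print(extract_section(SCRIPT, "create_activity"))
```

Because the docs refer to sections by name rather than by line range, the script can be edited freely without the tutorial page drifting out of sync, which is the motivation given in the commit message.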
