Commit abc48d5

committed
reformat
1 parent de20094 commit abc48d5

7 files changed

Lines changed: 391 additions & 202 deletions

docs/tutorials/python/migration.md

Lines changed: 100 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,100 @@
1+
# Migrating Files to a New Storage Location
2+
3+
Storage location migration lets you move files from one Synapse storage location
4+
to another — for example, from Synapse-managed S3 (`SYNAPSE_S3`) to your own
5+
S3 bucket (`EXTERNAL_S3`). The process is intentionally two-phase so you can
6+
review exactly what will be moved before committing to the transfer.
7+
8+
This tutorial demonstrates how to index a folder's files and then migrate them
9+
to a new storage location using the Python client.
10+
11+
[Read more about Custom Storage Locations](https://help.synapse.org/docs/Custom-Storage-Locations.2048327803.html)
12+
[Read more about setting up a storage location](./storage_location.md)
13+
14+
## Tutorial Purpose
15+
16+
In this tutorial you will:
17+
18+
1. Set up and get a project and folder
19+
2. Index files in a folder for migration to a destination storage location
20+
3. Review the index results CSV
21+
4. Migrate the indexed files
22+
5. Review the migration results CSV
23+
24+
## Prerequisites
25+
26+
* Make sure that you have completed the [Installation](../installation.md) and
27+
[Authentication](../authentication.md) setup.
28+
* You must have a [Project](./project.md) and a destination storage location
29+
already created. See the [Storage Locations tutorial](./storage_location.md).
30+
* Migration is currently supported **only** between S3 storage locations
31+
(`SYNAPSE_S3` and `EXTERNAL_S3`) that reside in the **same AWS region**.
32+
33+
## How Migration Works
34+
35+
Migration is a two-phase process:
36+
37+
1. **Index** — scan the project or folder and record every file that needs to
38+
move into a local SQLite database.
39+
2. **Migrate** — read the index database and copy each file to the destination
40+
storage location, updating the entity's file handle.
41+
42+
Separating the phases lets you inspect what will be migrated before committing
43+
to the move.
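
The two-phase pattern described above can be sketched in a few lines. This is a minimal illustration using only the standard library, not the Python client's implementation: the table layout, the status values (`INDEXED`, `MIGRATED`, `ERRORED`), and the helper names `index_files`, `migrate_indexed`, and `fake_copy` are all hypothetical.

```python
import sqlite3


def index_files(conn, files, dest_location_id):
    """Phase 1: record every file that is not already at the destination."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS migrations ("
        "file_id TEXT PRIMARY KEY, from_location INTEGER, "
        "to_location INTEGER, status TEXT, exception TEXT)"
    )
    rows = [
        (file_id, location, dest_location_id, "INDEXED", None)
        for file_id, location in files.items()
        if location != dest_location_id
    ]
    conn.executemany(
        "INSERT OR IGNORE INTO migrations VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


def migrate_indexed(conn, copy_file, continue_on_error=False):
    """Phase 2: copy each indexed file and record the outcome in the index."""
    pending = conn.execute(
        "SELECT file_id, to_location FROM migrations "
        "WHERE status = 'INDEXED' ORDER BY file_id"
    ).fetchall()
    try:
        for file_id, to_location in pending:
            try:
                copy_file(file_id, to_location)
            except Exception as exc:
                if not continue_on_error:
                    raise
                # Record the failure and keep going instead of aborting.
                conn.execute(
                    "UPDATE migrations SET status = 'ERRORED', exception = ? "
                    "WHERE file_id = ?",
                    (repr(exc), file_id),
                )
            else:
                conn.execute(
                    "UPDATE migrations SET status = 'MIGRATED' WHERE file_id = ?",
                    (file_id,),
                )
    finally:
        # Persist whatever progress was made, even if a copy raised.
        conn.commit()


# Hypothetical data: file ID -> current storage location ID.
conn = sqlite3.connect(":memory:")
files = {"syn1": 1, "syn2": 1, "syn3": 42}
indexed = index_files(conn, files, dest_location_id=42)


def fake_copy(file_id, to_location):
    """Stand-in for the real transfer; fails for one file on purpose."""
    if file_id == "syn2":
        raise RuntimeError("copy failed")


migrate_indexed(conn, fake_copy, continue_on_error=True)
statuses = dict(conn.execute("SELECT file_id, status FROM migrations"))
```

Because the index lives in a database between the two phases, it can be inspected, or the migration re-run, without scanning the folder again.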
44+
45+
> **Warning:** Migration modifies existing entities. Always run against a test
46+
> project first and review the index results before migrating production data.
47+
48+
## 1. Set up and get project
49+
50+
```python
51+
{!docs/tutorials/python/tutorial_scripts/migration.py!lines:"start:setup":"end:setup"}
52+
```
53+
54+
## 2. Index and migrate files
55+
56+
Phase 1 scans the folder and records all files that need to move. The result is
57+
a `MigrationResult` whose `db_path` points to the local SQLite database. Use
58+
`as_csv` to export the index for review before proceeding.
59+
60+
Phase 2 reads the index database and performs the actual migration, returning
61+
another `MigrationResult`. Set `continue_on_error=True` to record failures in
62+
the database rather than aborting. Set `force=True` to skip the interactive
63+
confirmation prompt.
64+
65+
```python
66+
{!docs/tutorials/python/tutorial_scripts/migration.py!lines:"start:index_and_migrate_files":"end:migrate_indexed_files"}
67+
```
68+
69+
Review the index CSV to confirm what was discovered before migration runs:
70+
71+
![indexresults](./tutorial_screenshots/index_results.png)
72+
73+
After migration, inspect the results CSV for status details and any errors.
74+
Detailed tracebacks are saved in the exception column of the CSV:
75+
76+
![migrationresults](./tutorial_screenshots/migration_results.png)
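
For larger migrations it can be easier to summarize the results CSV with a short script than to read it by eye. The sketch below uses only the standard library and assumes the CSV has `id`, `status`, and `exception` columns; only the exception column is named in this tutorial, so check the header of your own export and adjust the other names. The helper `summarize_results` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def summarize_results(csv_file):
    """Count rows per status and collect the last traceback line of each failure."""
    status_counts = Counter()
    failures = []
    for row in csv.DictReader(csv_file):
        status_counts[row.get("status", "")] += 1
        traceback_text = (row.get("exception") or "").strip()
        if traceback_text:
            # The final line of a traceback usually names the error.
            failures.append((row.get("id", ""), traceback_text.splitlines()[-1]))
    return status_counts, failures


# Made-up sample data standing in for an exported results CSV.
sample = io.StringIO(
    "id,status,exception\n"
    "syn1,MIGRATED,\n"
    'syn2,ERRORED,"Traceback (most recent call last):\n  ...\nValueError: bad region"\n'
)
counts, failures = summarize_results(sample)
for entity_id, error in failures:
    print(f"{entity_id}: {error}")
```

To run it against a real export, pass an open file instead of the sample, for example `open(path, newline="")`, where `path` is wherever you wrote the CSV.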
77+
78+
## Source code for this tutorial
79+
80+
<details class="quote">
81+
<summary>Click to show me</summary>
82+
83+
```python
84+
{!docs/tutorials/python/tutorial_scripts/migration.py!}
85+
```
86+
</details>
87+
88+
## References used in this tutorial
89+
90+
- [Folder][synapseclient.models.Folder]
91+
- [Project][synapseclient.models.Project]
92+
- [FailureStrategy][synapseclient.models.FailureStrategy]
93+
- [MigrationResult][synapseclient.models.services.MigrationResult]
94+
- [syn.login][synapseclient.Synapse.login]
95+
- [Custom Storage Locations Documentation](https://help.synapse.org/docs/Custom-Storage-Locations.2048327803.html)
96+
97+
## See also
98+
99+
- [Storage Location Tutorial](./storage_location.md) — How to create and manage storage locations
100+
- [Storage Location Architecture](../../explanations/storage_location_architecture.md) — In-depth architecture diagrams and design documentation
Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
1+
# Proxy Storage Locations in Synapse
2+
3+
A proxy storage location delegates file access to a proxy server that controls
4+
authentication and access to the underlying storage. Synapse stores only the
5+
metadata; the proxy server handles the actual file retrieval.
6+
7+
This tutorial demonstrates how to create a proxy storage location, register a
8+
file via a `ProxyFileHandle`, and associate it with a Synapse File entity.
9+
10+
[Read more about Custom Storage Locations](https://help.synapse.org/docs/Custom-Storage-Locations.2048327803.html)
11+
12+
## Tutorial Purpose
13+
14+
In this tutorial you will:
15+
16+
1. Set up and get a project
17+
2. Create a proxy storage location and assign it to a folder
18+
3. Register a file by creating a `ProxyFileHandle` via the REST API
19+
4. Associate the `ProxyFileHandle` with a Synapse File entity
20+
21+
## Prerequisites
22+
23+
* Make sure that you have completed the [Installation](../installation.md) and
24+
[Authentication](../authentication.md) setup.
25+
* You must have a [Project](./project.md) already created. Replace the project used in
26+
this tutorial with your own.
27+
* A running proxy server with a shared secret key. See the
28+
[Synapse Proxy Storage documentation](https://help.synapse.org/docs/Custom-Storage-Locations.2048327803.html)
29+
for proxy server requirements.
30+
31+
## 1. Set up and get project
32+
33+
```python
34+
{!docs/tutorials/python/tutorial_scripts/proxy_storage_location.py!lines:"start:setup":"end:setup"}
35+
```
36+
37+
## 2. Create a proxy storage location
38+
39+
Create a `StorageLocation` of type `PROXY`, providing your proxy server URL and
40+
the shared secret key. Setting `benefactor_id` to the project or folder ensures that
41+
access control is inherited from the project or folder. Assign it to a folder so that
42+
files uploaded there are served through the proxy.
43+
44+
```python
45+
{!docs/tutorials/python/tutorial_scripts/proxy_storage_location.py!lines:"start:create_proxy_storage_location":"end:create_proxy_storage_location"}
46+
```
47+
48+
<details class="example">
49+
<summary>You'll notice the output looks like:</summary>
50+
51+
```
52+
Created proxy storage location: 12345
53+
Proxy URL: https://my-proxy-server.example.com
54+
Benefactor ID: syn123456
55+
```
56+
</details>
57+
58+
## 3. Register a file via ProxyFileHandle
59+
60+
Files in proxy storage are **not** uploaded through the UI or Python client. Instead, you
61+
register a file that already exists on the proxy server by posting a
62+
`ProxyFileHandle` to the Synapse file service. You provide the file's MD5,
63+
size, and the relative path used by the proxy to serve it.
64+
65+
```python
66+
{!docs/tutorials/python/tutorial_scripts/proxy_storage_location.py!lines:"start:create_proxy_file_handle":"end:create_proxy_file_handle"}
67+
```
68+
69+
<details class="example">
70+
<summary>You'll notice the output looks like:</summary>
71+
72+
```
73+
{"id": ..., "etag":..., ..., "filePath":...}
74+
```
75+
</details>
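
If you still need the MD5 and size for a file that lives behind the proxy, they can be computed from a local copy with the standard library. The sketch below is an illustration under assumptions, not the tutorial script: the helper names are hypothetical, and the request-body field names (`concreteType`, `storageLocationId`, `filePath`, `fileName`, `contentMd5`, `contentSize`) follow the Synapse REST model as best understood here, so verify them against the REST documentation before posting.

```python
import hashlib
import os
import tempfile

# Assumed concrete type string for the REST model; verify before use.
PROXY_FILE_HANDLE_TYPE = "org.sagebionetworks.repo.model.file.ProxyFileHandle"


def describe_file(local_path, chunk_size=1024 * 1024):
    """Return (md5_hex, size_in_bytes), reading the file in chunks."""
    md5 = hashlib.md5()
    size = 0
    with open(local_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            size += len(chunk)
    return md5.hexdigest(), size


def build_proxy_file_handle(local_path, proxy_relative_path, storage_location_id):
    """Assemble a request body describing a file the proxy already serves."""
    md5_hex, size = describe_file(local_path)
    return {
        "concreteType": PROXY_FILE_HANDLE_TYPE,
        "storageLocationId": storage_location_id,
        "filePath": proxy_relative_path,
        "fileName": os.path.basename(proxy_relative_path),
        "contentMd5": md5_hex,
        "contentSize": size,
    }


# Demonstration with a throwaway file; the path and ID are placeholders.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
body = build_proxy_file_handle(tmp.name, "data/hello.txt", 12345)
os.remove(tmp.name)
```

Reading in chunks keeps memory use flat for large files, which matters because proxy-hosted data is often too big to load in one read.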
76+
77+
## 4. Associate the ProxyFileHandle with a File entity
78+
79+
Create a `File` entity using the `data_file_handle_id` returned above. Synapse
80+
stores the metadata and uses the `ProxyFileHandle` to serve downloads through
81+
your proxy server.
82+
83+
```python
84+
{!docs/tutorials/python/tutorial_scripts/proxy_storage_location.py!lines:"start:associate_proxy_file_handle":"end:associate_proxy_file_handle"}
85+
```
86+
87+
## Source code for this tutorial
88+
89+
<details class="quote">
90+
<summary>Click to show me</summary>
91+
92+
```python
93+
{!docs/tutorials/python/tutorial_scripts/proxy_storage_location.py!}
94+
```
95+
</details>
96+
97+
## References used in this tutorial
98+
99+
- [StorageLocation][synapseclient.models.StorageLocation]
100+
- [StorageLocationType][synapseclient.models.StorageLocationType]
101+
- [Folder][synapseclient.models.Folder]
102+
- [File][synapseclient.models.File]
103+
- [Project][synapseclient.models.Project]
104+
- [syn.login][synapseclient.Synapse.login]
105+
- [Custom Storage Locations Documentation](https://help.synapse.org/docs/Custom-Storage-Locations.2048327803.html)
106+
107+
## See also
108+
109+
- [Storage Locations Tutorial](./storage_location.md) — How to create and manage all storage location types
110+
- [Storage Location Architecture](../../explanations/storage_location_architecture.md) — In-depth architecture diagrams and design documentation

docs/tutorials/python/storage_location.md

Lines changed: 25 additions & 70 deletions
Original file line numberDiff line numberDiff line change
@@ -72,28 +72,10 @@ relevant to its type are populated:
7272

7373
Common attributes are: `concrete_type`, `storage_location_id`, `storage_type`, `upload_type`, `banner`, `description`, `etag`, `created_on`, `created_by`
7474

75-
## Data Migration Between Storage Locations
76-
77-
Files in a project or folder can be migrated from one storage location to another using
78-
`index_files_for_migration` followed by `migrate_indexed_files`. Migration is
79-
currently supported **only** between S3 storage locations (both Synapse-managed
80-
`SYNAPSE_S3` and external `EXTERNAL_S3`) that reside in the **same AWS
81-
region**.
82-
83-
Migration is a two-phase process:
84-
85-
1. **Index** — scan the project/folder and record every file that needs to move into a
86-
local SQLite database.
87-
2. **Migrate** — read the index database and move each file to the destination
88-
storage location.
89-
90-
Separating the phases lets you review what will be migrated before committing
91-
to the move.
92-
9375
## 1. Set up and get project
9476

9577
```python
96-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=4-18}
78+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines:"start:setup":"end:setup"}
9779
```
9880

9981
## 2. Create an external S3 storage location
@@ -103,7 +85,7 @@ properly configured with an `owner.txt` file. Synapse will transfer data
10385
directly to and from this bucket on the user's behalf.
10486

10587
```python
106-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=20-33}
88+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines:"start:create_s3_storage_location":"end:create_s3_storage_location"}
10789
```
10890

10991
<details class="example">
@@ -121,24 +103,32 @@ Create a folder and assign it the S3 storage location. All files uploaded into
121103
this folder will be stored in your S3 bucket.
122104

123105
```python
124-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=39-51}
106+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines:"start:create_folder_with_s3_storage_location":"end:create_folder_with_s3_storage_location"}
125107
```
126108

109+
<details class="example">
110+
<summary>You'll notice the output looks like:</summary>
111+
112+
```
113+
ProjectSetting(id=..., project_id=..., settings_type='upload', locations=[...], concrete_type='org.sagebionetworks.repo.model.project.UploadDestinationListSetting', etag='...')
114+
```
115+
</details>
116+
127117
## 4. Create a Google Cloud Storage location
128118

129119
Create a storage location backed by a Google Cloud Storage bucket and assign it
130120
to a folder.
131121

132122
```python
133-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=54-75}
123+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines:"start:create_gcs_storage_location":"end:create_gcs_storage_location"}
134124
```
135125

136126
## 5. Create an SFTP storage location
137127

138128
SFTP storage locations point to an external SFTP server, where files are stored outside of Synapse. Synapse only manages the metadata and does not handle the file transfer itself. This setup requires the pysftp package, and files must be uploaded separately through the **client** once configured.
139129

140130
```python
141-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=78-102}
131+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines:"start:create_sftp_storage_location":"end:create_sftp_storage_location"}
142132
```
143133

144134
## 6. Create an HTTPS storage location
@@ -148,7 +138,7 @@ used when the external server is accessed over HTTPS. Note that the Python
148138
client does NOT support uploading files to HTTPS storage locations directly yet. To add files, use the Synapse REST API directly.
149139

150140
```python
151-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=107-128}
141+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines:"start:create_https_storage_location":"end:create_https_storage_location"}
152142
```
153143

154144
## 7. Create an External Object Store storage location
@@ -158,23 +148,21 @@ accessed by Synapse. Unlike `EXTERNAL_S3`, the Python client transfers data
158148
directly to the object store using locally configured AWS credentials —
159149
Synapse is never involved in the data transfer, only in storing the metadata.
160150

161-
You can add a profile to work with s3 in `~/.synapseConfig`
151+
Configure your AWS credentials using any method supported by the AWS SDK
152+
(environment variables, `~/.aws/credentials`, IAM roles, etc.). See the
153+
[AWS documentation on credential configuration](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html)
154+
for details.
155+
156+
Once credentials are configured, add a matching profile section to `~/.synapseConfig`
157+
so the client knows which profile to use for a given endpoint and bucket:
162158

163-
Add a section matching your endpoint+bucket URL:
164159
```
165160
[https://s3.us-east-1.amazonaws.com/test-external-object-store]
166161
profile_name = my-s3-profile
167162
```
168-
Then ensure my-s3-profile exists in `~/.aws/config` with valid keys:
169-
170-
```
171-
[my-s3-profile]
172-
aws_access_key_id = ...
173-
aws_secret_access_key = ...
174-
```
175163

176164
```python
177-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=135-164}
165+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines:"start:create_object_store_storage_location":"end:create_object_store_storage_location"}
178166
```
179167

180168
## 8. Create a Proxy storage location
@@ -184,7 +172,7 @@ authentication and access to the underlying storage. Files are registered by
184172
creating a `ProxyFileHandle` via the REST API. Each file is then added to Synapse by storing a `File` entity with that handle's `data_file_handle_id`.
185173

186174
```python
187-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=168-226}
175+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines:"start:create_proxy_storage_location":"end:create_proxy_storage_location"}
188176
```
189177

190178
## 9. Retrieve and inspect storage location settings
@@ -193,7 +181,7 @@ You can retrieve a storage location by ID. Only fields relevant to the storage
193181
type are populated.
194182

195183
```python
196-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=230-236}
184+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines:"start:retrieve_storage_location":"end:retrieve_storage_location"}
197185
```
198186

199187
<details class="example">
@@ -214,42 +202,9 @@ creation. To "update" a storage location, create a new one with the desired
214202
settings and reassign it to the folder or project.
215203

216204
```python
217-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=245-280}
218-
```
219-
220-
## 11. Index and migrate files to a new storage location
221-
222-
> **Warning:** This will migrate files associated with the folder. Run against a
223-
> test project first and review the index result before migrating production data.
224-
225-
Phase 1. indexes all files that need to move into a local SQLite database. This will return a MigrationResults object. You can use the `as_csv` to check the details of indexing status.
226-
227-
```python
228-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=288-298}
205+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines:"start:update_storage_location":"end:update_storage_location"}
229206
```
230207

231-
Index results can be checked in the index results csv
232-
![indexresults](./tutorial_screenshots/index_results.png)
233-
234-
Phase 2. reads that database and performs the actual migration. This will return a MigrationResults object. You can use the `as_csv` to check the details of migration status and errors if any.
235-
236-
```python
237-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=300-310}
238-
```
239-
240-
Currently, detailed Traceback is saved in the exception columns of the csv.
241-
![migrationresults](./tutorial_screenshots/migration_results.png)
242-
243-
## Source code for this tutorial
244-
245-
<details class="quote">
246-
<summary>Click to show me</summary>
247-
248-
```python
249-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!}
250-
```
251-
</details>
252-
253208
## References used in this tutorial
254209

255210
- [StorageLocation][synapseclient.models.StorageLocation]
