Commit 72b5db5

committed
update docs
1 parent 030d812 commit 72b5db5

4 files changed

Lines changed: 59 additions & 41 deletions

docs/explanations/storage_location_architecture.md

Lines changed: 12 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -8,8 +8,6 @@ relationships, and data flows that enable flexible storage configuration.
88

99
## On This Page
1010

11-
<div class="grid cards" markdown>
12-
1311
- **[Domain Model](#domain-model)**
1412

1513
Core classes, enums, and their relationships
@@ -34,7 +32,6 @@ relationships, and data flows that enable flexible storage configuration.
3432

3533
Two-phase file migration process
3634

37-
</div>
3835

3936
---
4037

@@ -181,7 +178,7 @@ classDiagram
181178

182179
<br>
183180

184-
## Storage Type Mapping (TODO: double checking if EXTERNAL_HTTP works as expected)
181+
## Storage Type Mapping
185182

186183
Each `StorageLocationType` maps to a specific REST API `concreteType` and has a
187184
default `UploadType`. This mapping allows the system to parse
@@ -194,6 +191,7 @@ flowchart LR
194191
EXTERNAL_S3["EXTERNAL_S3"]
195192
EXTERNAL_GOOGLE_CLOUD["EXTERNAL_GOOGLE_CLOUD"]
196193
EXTERNAL_SFTP["EXTERNAL_SFTP"]
194+
EXTERNAL_HTTPS["EXTERNAL_HTTPS"]
197195
EXTERNAL_OBJECT_STORE["EXTERNAL_OBJECT_STORE"]
198196
PROXY["PROXY"]
199197
end
@@ -212,12 +210,14 @@ flowchart LR
212210
GCS["GOOGLECLOUDSTORAGE"]
213211
SFTP["SFTP"]
214212
HTTPS["HTTPS"]
213+
PROXYLOCAL["PROXYLOCAL"]
215214
end
216215
217216
SYNAPSE_S3 --> S3SLS --> S3
218217
EXTERNAL_S3 --> ExtS3SLS --> S3
219218
EXTERNAL_GOOGLE_CLOUD --> ExtGCSSLS --> GCS
220219
EXTERNAL_SFTP --> ExtSLS --> SFTP
220+
EXTERNAL_HTTPS --> ExtSLS --> HTTPS
221221
EXTERNAL_OBJECT_STORE --> ExtObjSLS --> S3
222222
PROXY --> ProxySLS --> HTTPS
223223
```
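
As a reading aid for the flowchart in this hunk, the mapping can be sketched as a plain Python lookup table. The setting class names are taken from the "Summary by type" table later in this file. Matching the diagram's abbreviated node IDs (`S3SLS`, `ExtSLS`, and so on) to those class names is an assumption, and `Mapping`, `STORAGE_TYPE_MAP`, and `describe` are hypothetical names used only for illustration, not part of the client library.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    """One row of the flowchart: setting class name and default upload type."""

    setting_class: str
    upload_type: str


# Hypothetical lookup table transcribed from the flowchart edges.
STORAGE_TYPE_MAP = {
    "SYNAPSE_S3": Mapping("S3StorageLocationSetting", "S3"),
    "EXTERNAL_S3": Mapping("ExternalS3StorageLocationSetting", "S3"),
    "EXTERNAL_GOOGLE_CLOUD": Mapping(
        "ExternalGoogleCloudStorageLocationSetting", "GOOGLECLOUDSTORAGE"
    ),
    "EXTERNAL_SFTP": Mapping("ExternalStorageLocationSetting", "SFTP"),
    "EXTERNAL_HTTPS": Mapping("ExternalStorageLocationSetting", "HTTPS"),
    "EXTERNAL_OBJECT_STORE": Mapping("ExternalObjectStorageLocationSetting", "S3"),
    "PROXY": Mapping("ProxyStorageLocationSettings", "HTTPS"),
}


def describe(storage_type: str) -> str:
    """Render one flowchart path as text."""
    mapping = STORAGE_TYPE_MAP[storage_type]
    return f"{storage_type} -> {mapping.setting_class} -> {mapping.upload_type}"


print(describe("EXTERNAL_HTTPS"))
```

Note that `EXTERNAL_SFTP` and `EXTERNAL_HTTPS` share one setting class and differ only in upload type, which is why the new `EXTERNAL_HTTPS` edge in this commit reuses `ExtSLS`.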
@@ -244,11 +244,11 @@ Different storage types support different configuration attributes:
244244
| `stsEnabled` | boolean |||||||
245245
| `bucket` | string || ✓ (required) | ✓ (required) || ✓ (required) ||
246246
| `endpointUrl` | string ||| ✓ (required) ||||
247-
| `url` | string |||||||
247+
| `url` | string ||||(required) |||
248248
| `supportsSubfolders` | boolean |||||||
249-
| `proxyUrl` | string |||||||
250-
| `secretKey` | string |||||||
251-
| `benefactorId` | string |||||||
249+
| `proxyUrl` | string ||||||(required) |
250+
| `secretKey` | string ||||||(required) |
251+
| `benefactorId` | string ||||||(required) |
252252

253253
## Summary by type
254254

@@ -257,9 +257,9 @@ Different storage types support different configuration attributes:
257257
| **S3StorageLocationSetting** | Default Synapse storage on Amazon S3. | `baseKey`, `stsEnabled` |
258258
| **ExternalS3StorageLocationSetting** | External S3 bucket connected with Synapse (Synapse-accessed). | `bucket` (required), `baseKey`, `stsEnabled`, `endpointUrl` |
259259
| **ExternalObjectStorageLocationSetting** | S3-compatible object storage **not** accessed by Synapse. | `bucket` (required), `endpointUrl` (required) |
260-
| **ExternalStorageLocationSetting** | SFTP or HTTPS upload destination. | `url`, `supportsSubfolders` |
260+
| **ExternalStorageLocationSetting** | SFTP or HTTPS upload destination. | `url` (required), `supportsSubfolders` |
261261
| **ExternalGoogleCloudStorageLocationSetting** | External Google Cloud Storage bucket connected with Synapse. | `bucket` (required), `baseKey` |
262-
| **ProxyStorageLocationSettings** | HTTPS proxy for all upload/download operations. | `proxyUrl`, `secretKey`, `benefactorId` |
262+
| **ProxyStorageLocationSettings** | HTTPS proxy for all upload/download operations. | `proxyUrl` (required), `secretKey` (required), `benefactorId` (required) |
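
The "(required)" markers added in this hunk can be read as a simple validation rule. The following is a minimal sketch of such a check, assuming the required fields are exactly those listed in the updated summary table. `REQUIRED_FIELDS` and `missing_required` are hypothetical names and do not reflect how the client validates settings.

```python
# Required fields per setting class, transcribed from the updated summary table.
REQUIRED_FIELDS = {
    "S3StorageLocationSetting": [],
    "ExternalS3StorageLocationSetting": ["bucket"],
    "ExternalObjectStorageLocationSetting": ["bucket", "endpointUrl"],
    "ExternalStorageLocationSetting": ["url"],
    "ExternalGoogleCloudStorageLocationSetting": ["bucket"],
    "ProxyStorageLocationSettings": ["proxyUrl", "secretKey", "benefactorId"],
}


def missing_required(setting_class: str, config: dict) -> list:
    """Return the required field names that are absent or empty in config."""
    return [
        name
        for name in REQUIRED_FIELDS[setting_class]
        if not config.get(name)
    ]


# Illustrative placeholder value only.
print(missing_required("ProxyStorageLocationSettings", {"proxyUrl": "https://proxy.example.org"}))
```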
263263

264264

265265
<br>
@@ -274,7 +274,7 @@ flowchart TB
274274
Start -->|No| DEFAULT[Use default Synapse storage]
275275
Start -->|Yes| Q1{Want Synapse to<br/>manage storage?}
276276
277-
Q1 -->|Yes| SYNAPSE_S3[Use SYNAPSE_S3]
277+
Q1 -->|Yes| DEFAULT[Use default Synapse storage]
278278
Q1 -->|No| Q2{What storage<br/>backend?}
279279
280280
Q2 -->|AWS S3| Q3{Synapse accesses<br/>bucket directly?}
@@ -785,6 +785,7 @@ sequenceDiagram
785785
MigrateFn->>DB: Update row status to MIGRATED/ERRORED
786786
end
787787
788+
end
788789
789790
MigrateFn-->>Entity: MigrationResult (migrated counts)
790791
deactivate MigrateFn

docs/tutorials/python/storage_location.md

Lines changed: 47 additions & 30 deletions
Original file line number | Diff line number | Diff line change
@@ -20,7 +20,8 @@ In this tutorial you will:
2020
5. Create an External Object Store location and assign it to a folder
2121
6. Create a Proxy storage location, register a proxy file handle, and assign it to a folder
2222
7. Retrieve and inspect storage location settings
23-
8. Index and migrate files to a new storage location
23+
8. Update a storage location (create a replacement and reassign)
24+
9. Index and migrate files to a new storage location
2425

2526
## Prerequisites
2627

@@ -46,7 +47,7 @@ Synapse supports several types of storage locations:
4647
an `owner.txt` file in the bucket to verify ownership.
4748
- **EXTERNAL_GOOGLE_CLOUD**: User-owned Google Cloud Storage bucket
4849
- **EXTERNAL_SFTP**: External SFTP server
49-
- **EXTERNAL_HTTPS**: External HTTPS server (uploading via client is not
50+
- **EXTERNAL_HTTPS**: External HTTPS server (uploading via client is **not**
5051
supported right now.)
5152
- **EXTERNAL_OBJECT_STORE**: An S3-compatible store (e.g., MinIO, OpenStack
5253
Swift) that Synapse does **not** access. The client transfers data directly
@@ -62,25 +63,26 @@ relevant to its type are populated:
6263

6364
| Type | Key fields |
6465
|------|-----------|
65-
| `EXTERNAL_S3` | `bucket`, `base_key` |
66+
| `SYNAPSE_S3` | `base_key`, `sts_enabled` |
67+
| `EXTERNAL_S3` | `bucket`, `base_key`, `sts_enabled`, `endpoint_url` |
6668
| `EXTERNAL_GOOGLE_CLOUD` | `bucket`, `base_key` |
6769
| `EXTERNAL_SFTP` / `EXTERNAL_HTTPS` | `url`, `supports_subfolders` |
6870
| `EXTERNAL_OBJECT_STORE` | `bucket`, `endpoint_url` |
6971
| `PROXY` | `proxy_url`, `secret_key`, `benefactor_id` |
7072

71-
Common attributes are: concrete_type, storage_location_id, storage_type, upload_type, banner, description, etag, created_on, created_by
73+
Common attributes are: `concrete_type`, `storage_location_id`, `storage_type`, `upload_type`, `banner`, `description`, `etag`, `created_on`, `created_by`
7274

7375
## Data Migration Between Storage Locations
7476

75-
Files in a folder can be migrated from one storage location to another using
77+
Files in a project or folder can be migrated from one storage location to another using
7678
`index_files_for_migration` followed by `migrate_indexed_files`. Migration is
77-
currently supported only between S3 storage locations (both Synapse-managed
79+
currently supported **only** between S3 storage locations (both Synapse-managed
7880
`SYNAPSE_S3` and external `EXTERNAL_S3`) that reside in the **same AWS
7981
region**.
8082

8183
Migration is a two-phase process:
8284

83-
1. **Index** — scan the folder and record every file that needs to move into a
85+
1. **Index** — scan the project/folder and record every file that needs to move into a
8486
local SQLite database.
8587
2. **Migrate** — read the index database and move each file to the destination
8688
storage location.
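
The two phases above can be illustrated with a self-contained toy that uses only the standard library. This is a sketch of the index-then-migrate pattern, not the implementation behind `index_files_for_migration` or `migrate_indexed_files`. The table layout, the `INDEXED` status value, and the helper names `index_files`, `migrate_indexed`, and `fake_move` are all assumptions made for illustration.

```python
import sqlite3


def index_files(conn, file_ids):
    """Phase 1: record every file that needs to move in a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS migrations ("
        "file_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO migrations VALUES (?, 'INDEXED')",
        [(file_id,) for file_id in file_ids],
    )
    conn.commit()


def migrate_indexed(conn, move_file):
    """Phase 2: read the index and move each file, recording the outcome."""
    rows = conn.execute(
        "SELECT file_id FROM migrations WHERE status = 'INDEXED'"
    ).fetchall()
    counts = {"MIGRATED": 0, "ERRORED": 0}
    for (file_id,) in rows:
        try:
            move_file(file_id)
            status = "MIGRATED"
        except Exception:
            status = "ERRORED"
        conn.execute(
            "UPDATE migrations SET status = ? WHERE file_id = ?",
            (status, file_id),
        )
        counts[status] += 1
    conn.commit()
    return counts


def fake_move(file_id):
    """Stand-in for the real copy step; fails for one file to show ERRORED."""
    if file_id == "syn3":
        raise RuntimeError("simulated copy failure")


conn = sqlite3.connect(":memory:")
index_files(conn, ["syn1", "syn2", "syn3"])
result = migrate_indexed(conn, fake_move)
print(result)
```

Keeping the index in a database separates the scan from the move, so the index can be reviewed before any file is touched and a failed row does not stop the rest of the run.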
@@ -91,7 +93,7 @@ to the move.
9193
## 1. Set up and get project
9294

9395
```python
94-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=4-15}
96+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=4-18}
9597
```
9698

9799
## 2. Create an external S3 storage location
@@ -101,7 +103,7 @@ properly configured with an `owner.txt` file. Synapse will transfer data
101103
directly to and from this bucket on the user's behalf.
102104

103105
```python
104-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=17-30}
106+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=20-33}
105107
```
106108

107109
<details class="example">
@@ -119,7 +121,7 @@ Create a folder and assign it the S3 storage location. All files uploaded into
119121
this folder will be stored in your S3 bucket.
120122

121123
```python
122-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=32-40}
124+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=39-51}
123125
```
124126

125127
## 4. Create a Google Cloud Storage location
@@ -128,27 +130,25 @@ Create a storage location backed by a Google Cloud Storage bucket and assign it
128130
to a folder.
129131

130132
```python
131-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=42-62}
133+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=54-75}
132134
```
133135

134136
## 5. Create an SFTP storage location
135137

136-
SFTP storage locations point to an external SFTP server. Files are not
137-
transferred through Synapse — Synapse only stores metadata. Requires the
138-
`pysftp` package.
138+
SFTP storage locations point to an external SFTP server, where files are stored outside of Synapse. Synapse only manages the metadata and does not handle the file transfer itself. This setup requires the pysftp package, and files must be uploaded separately through the **client** once configured.
139139

140140
```python
141-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=64-87}
141+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=78-102}
142142
```
143143

144144
## 6. Create an HTTPS storage location
145145

146146
`EXTERNAL_HTTPS` uses the same underlying API type as `EXTERNAL_SFTP` but is
147147
used when the external server is accessed over HTTPS. Note that the Python
148-
client does NOT support uploading files to HTTPS storage locations directly yet.
148+
client does NOT support uploading files to HTTPS storage locations directly yet. To add files, use the Synapse REST API directly.
149149

150150
```python
151-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=89-111}
151+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=107-128}
152152
```
153153

154154
## 7. Create an External Object Store storage location
@@ -158,21 +158,23 @@ accessed by Synapse. Unlike `EXTERNAL_S3`, the Python client transfers data
158158
directly to the object store using locally configured AWS credentials —
159159
Synapse is never involved in the data transfer, only in storing the metadata.
160160

161-
You can add a profile to work with s3 in ~/.synapseConfig
161+
You can add a profile to work with s3 in `~/.synapseConfig`
162162

163163
Add a section matching your endpoint+bucket URL:
164-
164+
```
165165
[https://s3.us-east-1.amazonaws.com/test-external-object-store]
166166
profile_name = my-s3-profile
167+
```
168+
Then ensure my-s3-profile exists in `~/.aws/config` with valid keys:
167169

168-
Then ensure my-s3-profile exists in ~/.aws/credentials with valid keys:
169-
170+
```
170171
[my-s3-profile]
171172
aws_access_key_id = ...
172173
aws_secret_access_key = ...
174+
```
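
To make the section-matching rule concrete, here is a minimal sketch that looks up `profile_name` for an endpoint and bucket using Python's standard `configparser`. It assumes the section name is the endpoint URL joined to the bucket name with a single `/`, as in the example above. How the client actually resolves the profile is not shown in this diff, and `profile_for` is a hypothetical helper.

```python
import configparser

# Sample config text mirroring the ~/.synapseConfig section shown above.
SAMPLE = """
[https://s3.us-east-1.amazonaws.com/test-external-object-store]
profile_name = my-s3-profile
"""


def profile_for(config_text, endpoint_url, bucket):
    """Return the profile_name configured for endpoint_url/bucket, or None."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = f"{endpoint_url.rstrip('/')}/{bucket}"
    if parser.has_section(section):
        return parser.get(section, "profile_name", fallback=None)
    return None


print(profile_for(SAMPLE, "https://s3.us-east-1.amazonaws.com", "test-external-object-store"))
```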
173175

174176
```python
175-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=113-139}
177+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=135-164}
176178
```
177179

178180
## 8. Create a Proxy storage location
@@ -182,7 +184,7 @@ authentication and access to the underlying storage. Files are registered by
182184
creating a `ProxyFileHandle` via the REST API. Then, files can be uploaded via store function with data_file_handle_id.
183185

184186
```python
185-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=141-194}
187+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=168-226}
186188
```
187189

188190
## 9. Retrieve and inspect storage location settings
@@ -191,7 +193,7 @@ You can retrieve a storage location by ID. Only fields relevant to the storage
191193
type are populated.
192194

193195
```python
194-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=196-204}
196+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=230-236}
195197
```
196198

197199
<details class="example">
@@ -205,24 +207,39 @@ Base key: synapse-data
205207
```
206208
</details>
207209

208-
## 10. Index and migrate files to a new storage location
210+
## 10. Update a storage location
211+
212+
Storage locations are immutable — individual fields cannot be edited after
213+
creation. To "update" a storage location, create a new one with the desired
214+
settings and reassign it to the folder or project.
215+
216+
```python
217+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=245-280}
218+
```
219+
220+
## 11. Index and migrate files to a new storage location
209221

210222
> **Warning:** This will migrate files associated with the folder. Run against a
211223
> test project first and review the index result before migrating production data.
212224
213-
Phase 1 indexes all files that need to move into a local SQLite database. This will return a MigrationResults object. You can use the `as_csv` to check the details of indexing status.
225+
Phase 1. indexes all files that need to move into a local SQLite database. This will return a MigrationResults object. You can use the `as_csv` to check the details of indexing status.
214226

215227
```python
216-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=214-221}
228+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=288-298}
217229
```
218-
Phase 2 reads that database and performs the actual migration. This will return a MigrationResults object. You can use the `as_csv` to check the details of migration status and errors if any.
230+
231+
Index results can be checked in the index results csv
232+
![indexresults](./tutorial_screenshots/index_results.png)
233+
234+
Phase 2. reads that database and performs the actual migration. This will return a MigrationResults object. You can use the `as_csv` to check the details of migration status and errors if any.
235+
219236
```python
220-
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=224-234}
237+
{!docs/tutorials/python/tutorial_scripts/storage_location.py!lines=300-310}
221238
```
239+
222240
Currently, detailed Traceback is saved in the exception columns of the csv.
223241
![migrationresults](./tutorial_screenshots/migration_results.png)
224242

225-
226243
## Source code for this tutorial
227244

228245
<details class="quote">
