Commit 5c87093

Merge branch 'develop' into SYNPY-1802
2 parents 8b04266 + 532ceb3 commit 5c87093

20 files changed

Lines changed: 4551 additions & 107 deletions

File tree

docs/CLAUDE.md

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -33,6 +33,7 @@ Reference markdown files use `::: synapseclient.ClassName` syntax to trigger aut
3333
- `filters: ["!^_", "!to_synapse_request", "!fill_from_dict"]` — private members, `to_synapse_request()`, and `fill_from_dict()` are excluded from docs
3434
- `inherited_members: true` — shows mixin methods on inheriting classes
3535
- Member lists are explicit — each reference page specifies which methods to document
36+
- When adding a new public method to a model class, add it to the `members:` list in the corresponding reference pages (`docs/reference/experimental/sync/` and `docs/reference/experimental/async/`). Without this, mkdocstrings won't generate an anchor and cross-references like `[synapseclient.models.ClassName.method]` will break.
3637

3738
### Anchor links for cross-referencing
3839
Pattern: `[](){ #reference-anchor }` in reference pages. Tutorials link to reference via `[API Reference][project-reference-sync]`. Explicit type hints use: `[syn.login][synapseclient.Synapse.login]`.

docs/explanations/manifest_csv.md

Lines changed: 119 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,119 @@
1+
# Manifest CSV
2+
3+
The manifest is a CSV file with file locations and metadata used to bulk upload and download files in Synapse. It is the standard manifest format used by `Project.sync_from_synapse`, `Project.sync_to_synapse`, `Folder.sync_from_synapse`, `Folder.sync_to_synapse`, the Synapse UI download cart, and the `synapse get-download-list` CLI command.
4+
5+
!!! note
6+
This CSV manifest replaces the legacy TSV manifest produced by `synapseutils.syncFromSynapse`. The `syncFromSynapse` and `syncToSynapse` utility functions are deprecated and will be removed in v5.0.0. Use `Project.sync_from_synapse` / `Folder.sync_from_synapse` and `Project.sync_to_synapse` / `Folder.sync_to_synapse` instead. See the [legacy TSV manifest documentation](manifest_tsv.md) for details on the old format.
7+
8+
## Manifest file format
9+
10+
The manifest is a comma-separated values (CSV) file with one row per file and columns describing that file. The minimum required columns for uploading are **path** and **parentId**, where `path` is the local file path and `parentId` is the Synapse ID of the project or folder to which the file is uploaded. Values that contain commas are automatically quoted (e.g., `"hello, world"`).
11+
12+
### Required fields for upload
13+
14+
| Field | Meaning | Example |
15+
|----------|----------------------------|-------------------------|
16+
| path | local file path or URL | /path/to/local/file.txt |
17+
| parentId | Synapse ID of parent | syn1235 |
18+
19+
!!! note
20+
The legacy TSV manifest used a column named `parent`. The CSV manifest uses `parentId` instead, which is consistent with the Synapse REST API field name. If you are migrating an existing TSV manifest to CSV, rename the `parent` column to `parentId`.
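The column rename described in the note can be sketched with Python's standard `csv` module. This is a minimal illustration, not part of `synapseclient`: the helper name `tsv_manifest_to_csv` and the sample row are hypothetical, and it handles only the `parent` to `parentId` rename, not any other differences between the two formats.

```python
import csv
import io


def tsv_manifest_to_csv(tsv_text: str) -> str:
    """Convert legacy TSV manifest text to CSV text, renaming `parent` to `parentId`.

    Hypothetical helper for illustration; it is not part of synapseclient.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    original_names = list(reader.fieldnames or [])
    # Rename only the `parent` header; every other column keeps its name.
    new_names = ["parentId" if name == "parent" else name for name in original_names]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(new_names)
    for row in reader:
        # Values containing commas are quoted automatically by csv.writer.
        writer.writerow([row[name] for name in original_names])
    return out.getvalue()


legacy = "path\tparent\tannot1\n/path/file1.txt\tsyn1243\taaaa, bbbb\n"
converted = tsv_manifest_to_csv(legacy)
print(converted)
```

Working on text rather than file paths keeps the sketch self-contained; in practice the same function could be fed the contents of an existing TSV manifest and its result written to a `.csv` file.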
21+
22+
### Standard fields
23+
24+
These columns are recognized by `sync_to_synapse` and have specific meaning. Any of these columns may be present in the manifest but only `path` and `parentId` are required for upload.
25+
Each example shows what a single row would contain in that column. For instance, "syn1235;/path/to_local/file.txt" below states that both "syn1235" and "/path/to_local/file.txt" should be added as items used to generate the file. You can also specify a single item, for example "syn1234".
26+
27+
| Field | Meaning | Example |
28+
|---------------------|--------------------------------------------|----------------------------------------------|
29+
| path | local file path or URL | /path/to/local/file.txt |
30+
| parentId | Synapse ID of parent container | syn1235 |
31+
| ID | Synapse entity ID | syn2345 |
32+
| name | name of file in Synapse | Example_file |
33+
| synapseStore | whether to upload the file | True |
34+
| contentType | content type of file to overwrite defaults | text/html |
35+
| forceVersion | whether to update version | False |
36+
| activityName | name of activity in provenance | Ran normalization |
37+
| activityDescription | text description of what was done | Ran algorithm xyz with parameters... |
38+
| used | list of items used to generate file | syn1235;/path/to_local/file.txt |
39+
| executed | list of items executed | https://github.org/;/path/to_local/code.py |
40+
41+
### Metadata fields (ignored during upload)
42+
43+
These columns are present in manifests generated by the Synapse UI download cart and `synapse get-download-list` CLI. They are ignored by `sync_to_synapse` and are **not** treated as annotations.
44+
45+
| Field | Meaning |
46+
|-------------------|-------------------------------|
47+
| error | any error in downloading file |
48+
| versionNumber | version of the file |
49+
| dataFileSizeBytes | size of the file in bytes |
50+
| createdBy | user who created the file |
51+
| createdOn | date the file was created |
52+
| modifiedBy | user who last modified |
53+
| modifiedOn | date last modified |
54+
| synapseURL | URL to the file in Synapse |
55+
| dataFileMD5Hex | MD5 hash of the file |
56+
57+
### Annotations
58+
59+
Any columns that are not in the standard or metadata fields described above will be interpreted as annotations of the file.
60+
61+
Adding annotations to each row:
62+
63+
| path | parentId | annot1 | annot2 | annot3 | annot4 | annot5 |
64+
| --- | --- | --- | --- | --- | --- | --- |
65+
| /path/file1.txt | syn1243 | bar | 3.1415 | "aaaa, bbbb" | "[14,27,30]" | "Annotation, with a comma" |
66+
| /path/file2.txt | syn12433 | baz | 2.71 | value_1 | "[1,2,3]" | test 123 |
67+
| /path/file3.txt | syn12455 | zzz | 3.52 | value_3 | "[42,56,77]" | a single annotation |
68+
69+
#### Multiple values of annotations per key
70+
71+
Use multiple values for a single annotation sparingly, as they make the data more
72+
difficult to manage. However, they are supported.
73+
74+
**Annotations can be comma `,` separated lists surrounded by brackets `[]`.**
75+
76+
Because the manifest is a CSV file, multi-value annotations that contain commas are automatically quoted. For example, `[a,b,c]` will appear in the CSV as `"[a,b,c]"`.
77+
78+
This is an annotation with 3 values:
79+
80+
| path | parentId | annot1 |
81+
|-----------------|----------|--------------|
82+
| /path/file1.txt | syn1243 | "[a,b,c]" |
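The automatic quoting described above is the standard behavior of CSV writers. As a rough sketch using Python's built-in `csv` module (the rows and the `annot2` column are made-up sample data, and this is not how `synapseclient` itself writes manifests), a bracketed multi-value annotation is quoted on write and recovered unchanged on read:

```python
import csv
import io

# Sample manifest rows; the values are illustrative only.
rows = [
    ["path", "parentId", "annot1", "annot2"],
    ["/path/file1.txt", "syn1243", "[a,b,c]", "hello, world"],
    ["/path/file2.txt", "syn12433", "baz", "test 123"],
]

# csv.writer quotes only the fields that need it (those containing commas here).
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
manifest_text = buffer.getvalue()
print(manifest_text)

# Reading the text back strips the quotes and restores the original values.
parsed = list(csv.reader(io.StringIO(manifest_text)))
print(parsed[1][2])
```

Because the quoting is handled by the CSV layer, the bracketed value itself never needs to be escaped by hand.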
83+
84+
85+
86+
### Dates in the manifest file
87+
88+
Dates within the manifest file are always written in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format, in UTC and without milliseconds. For example: `2023-12-20T16:55:08Z`.
89+
90+
Dates written in other ISO 8601 formats are also recognized on input. For example, a datetime with an explicit timezone offset such as `2023-12-20 23:55:08-07:00` is recognized as a valid datetime. However, `sync_from_synapse` always writes dates in the UTC format specified above.
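To illustrate the normalization, the following sketch converts an offset-aware ISO 8601 string into the UTC form shown above using only the standard library. The function name `to_manifest_utc` is hypothetical, and treating a value without an offset as UTC is an assumption of this sketch rather than documented client behavior.

```python
from datetime import datetime, timezone


def to_manifest_utc(value: str) -> str:
    """Render an ISO 8601 datetime string as UTC without milliseconds.

    Hypothetical helper for illustration; it is not part of synapseclient.
    """
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption for this sketch: a value with no offset is taken to be UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    utc_value = parsed.astimezone(timezone.utc).replace(microsecond=0)
    return utc_value.strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_manifest_utc("2023-12-20 23:55:08-07:00"))  # 2023-12-21T06:55:08Z
```

Note that the conversion can change the calendar date: 23:55 at UTC-07:00 falls on the following day in UTC.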
91+
92+
## Manifest sources
93+
94+
The CSV manifest format is shared across multiple tools:
95+
96+
| Source | Filename |
97+
|----------------------------------------------------------|---------------------------------|
98+
| `Project.sync_from_synapse` / `Folder.sync_from_synapse` | manifest.csv |
99+
| Synapse UI download cart | manifest.csv |
100+
| CLI `synapse get-download-list` | `manifest_<timestamp>.csv` |
101+
102+
A manifest generated by any of these sources can be used as input to `sync_to_synapse`, provided the `path` column is present with valid local file paths. Manifests from the Synapse UI do not include a `path` column by default, so users must add it before uploading.
103+
104+
### Example manifest file
105+
106+
| path | parentId | ID | name | annot1 | annot2 | collection_date | used | executed |
107+
|-----------------|----------|---------|-----------|--------|--------|---------------------------|--------------------------|------------------------------|
108+
| /path/file1.txt | syn1243 | syn5001 | file1.txt | bar | 3.1415 | 2023-12-04T07:00:00Z | syn124;/path/file2.txt | https://github.org/foo/bar |
109+
| /path/file2.txt | syn12433 | syn5002 | file2.txt | baz | 2.71 | 2001-01-01T08:00:00Z | | https://github.org/foo/baz |
110+
| /path/file3.txt | syn12455 | syn5003 | file3.txt | zzz | 3.52 | 2023-12-04T07:00:00Z | | https://github.org/foo/zzz |
111+
112+
## References
113+
114+
- [Project.sync_from_synapse][synapseclient.models.Project.sync_from_synapse]
115+
- [Project.sync_to_synapse][synapseclient.models.Project.sync_to_synapse]
116+
- [Folder.sync_from_synapse][synapseclient.models.Folder.sync_from_synapse]
117+
- [Folder.sync_to_synapse][synapseclient.models.Folder.sync_to_synapse]
118+
- [Manifest TSV (legacy)](manifest_tsv.md)
119+
- [Managing custom metadata at scale](https://help.synapse.org/docs/Managing-Custom-Metadata-at-Scale.2004254976.html#ManagingCustomMetadataatScale-BatchUploadFileswithAnnotations)

docs/explanations/manifest_tsv.md

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,9 @@
1-
# Manifest
1+
# Manifest TSV (Legacy)
22
The manifest is a TSV file with file locations and metadata to be pushed to Synapse. The purpose is to allow bulk actions through a TSV without the need to manually execute commands for every requested action.
33

4+
!!! warning "Deprecated"
5+
This TSV manifest format is produced by [synapseutils.syncFromSynapse][] and consumed by [synapseutils.syncToSynapse][], both of which are deprecated and will be removed in v5.0.0. Use `Project.sync_from_synapse` / `Folder.sync_from_synapse` and `Project.sync_to_synapse` / `Folder.sync_to_synapse` instead, which use the [CSV manifest format](manifest_csv.md).
6+
47
## Manifest file format
58

69
The format of the manifest file is a tab delimited file with one row per file to upload and columns describing the file. The minimum required columns are **path** and **parent** where path is the local file path and parent is the Synapse Id of the project or folder where the file is uploaded to.
@@ -20,6 +23,9 @@ Any additional columns will be added as annotations.
2023
| path | local file path or URL | /path/to/local/file.txt |
2124
| parent | synapse id | syn1235 |
2225

26+
!!! note "Column renamed in CSV format"
27+
The CSV manifest format uses `parentId` instead of `parent`. If you are migrating to the new [CSV manifest format](manifest_csv.md), rename the `parent` column to `parentId`.
28+
2329
### Common fields:
2430

2531
| Field | Meaning | Example |

docs/reference/experimental/async/folder.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@ at your own risk.
1616
- copy_async
1717
- walk_async
1818
- sync_from_synapse_async
19+
- sync_to_synapse_async
1920
- flatten_file_list
2021
- map_directory_to_all_contained_files
2122
- get_permissions_async

docs/reference/experimental/async/project.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@ at your own risk.
1515
- delete_async
1616
- walk_async
1717
- sync_from_synapse_async
18+
- sync_to_synapse_async
1819
- flatten_file_list
1920
- map_directory_to_all_contained_files
2021
- get_permissions_async

docs/reference/experimental/sync/folder.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,7 @@ at your own risk.
2727
- copy
2828
- walk
2929
- sync_from_synapse
30+
- sync_to_synapse
3031
- flatten_file_list
3132
- map_directory_to_all_contained_files
3233
- get_permissions

docs/reference/experimental/sync/project.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,7 @@ at your own risk.
2626
- delete
2727
- walk
2828
- sync_from_synapse
29+
- sync_to_synapse
2930
- flatten_file_list
3031
- map_directory_to_all_contained_files
3132
- get_permissions

docs/tutorials/configuration.md

Lines changed: 85 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -2,32 +2,107 @@
22

33
The Synapse Python client can be configured either programmatically or by using a configuration file.
44

5-
**The default configuration file does not need to be modified for most use-cases**.
5+
!!! note "Default Configuration"
6+
The default configuration file does not need to be modified for most use-cases.
67

8+
When installing the Synapse Python client, the `.synapseConfig` file is added to your home directory if it doesn't exist already. This file stores configuration options including your Synapse auth token, cache location, multi-threading settings, and storage credentials.
79

8-
When installing the Synapse Python client, the `.synapseConfig` is added to your home directory. This configuration file is used to store a number of configuration options, including your Synapse authtoken, cache, and multi-threading settings.
9-
10-
A full example `.synapseConfig` can be found in the [github repository](https://github.com/Sage-Bionetworks/synapsePythonClient/blob/develop/synapseclient/.synapseConfig).
10+
A full annotated example `.synapseConfig` can be found in the [GitHub repository](https://github.com/Sage-Bionetworks/synapsePythonClient/blob/develop/synapseclient/.synapseConfig).
1111

1212
## `.synapseConfig` sections
1313

14-
### `[authentication]`
14+
### `[default]` and `[profile <name>]`
15+
16+
Holds Synapse login credentials. `[default]` is used when no profile is specified; named profiles use `[profile <name>]` syntax. See the [authentication](./authentication.md) document for full details including how to create tokens, select profiles, and use environment variables.
17+
18+
### `[sftp://hostname]`
19+
20+
Credentials for files stored on SFTP servers. Use one section per server; the section name is the full SFTP URL.
21+
22+
| Key | Description |
23+
| --- | --- |
24+
| `username` | Username for the SFTP server. |
25+
| `password` | Password for the SFTP server. |
26+
27+
```ini
28+
[sftp://some.sftp.url.com]
29+
username = sftpuser
30+
password = sftppassword
31+
```
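Because the section name is the full URL, a lookup by server URL is a plain section lookup in an INI parser. The sketch below uses Python's `configparser` to show the idea; the helper `sftp_credentials` is hypothetical and this is not necessarily how the client reads the file internally.

```python
import configparser
from typing import Optional, Tuple

EXAMPLE_CONFIG = """
[sftp://some.sftp.url.com]
username = sftpuser
password = sftppassword
"""


def sftp_credentials(
    config_text: str, server_url: str
) -> Optional[Tuple[Optional[str], Optional[str]]]:
    """Return (username, password) for the given SFTP URL, or None if no section matches.

    Hypothetical helper for illustration; it is not part of synapseclient.
    """
    # interpolation=None is a choice made for this sketch so that a '%'
    # character in a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section(server_url):
        return None
    section = parser[server_url]
    return section.get("username"), section.get("password")


print(sftp_credentials(EXAMPLE_CONFIG, "sftp://some.sftp.url.com"))
print(sftp_credentials(EXAMPLE_CONFIG, "sftp://other.example.org"))
```

The match is exact, so in this sketch `sftp://some.sftp.url.com` and `sftp://some.sftp.url.com/` would be treated as different sections.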
1532

16-
See details on this section in the [authentication](./authentication.md) document.
33+
### `[https://s3.amazonaws.com/bucket_name]`
34+
35+
Credentials for files stored in AWS S3 or S3-compatible storage that Synapse does not manage access for. Use one section per bucket; the section name is the full endpoint URL including the bucket name.
36+
37+
| Key | Description |
38+
| --- | --- |
39+
| `profile_name` | Name of an AWS CLI profile from `~/.aws/credentials`. If omitted, the `default` AWS profile is used. |
40+
41+
```ini
42+
[https://s3.amazonaws.com/bucket_name]
43+
profile_name = local_credential_profile_name
44+
```
45+
46+
For more information on AWS credentials files, see the [AWS CLI documentation](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).
1747

1848
### `[cache]`
1949

20-
Your downloaded files are cached to avoid repeat downloads of the same file. Change 'location' to use a different folder on your computer as the cache location
50+
Downloaded files are cached to avoid repeat downloads of the same file.
51+
52+
| Key | Description |
53+
| --- | --- |
54+
| `location` | Path to the cache directory. Supports `~` and environment variables. Default: `~/.synapseCache`. |
55+
56+
```ini
57+
[cache]
58+
location = ~/.synapseCache
59+
```
60+
61+
### `[debug]`
62+
63+
When this section is present (no keys required), the client prints debug-level log output. Equivalent to passing `debug=True` to the `Synapse()` constructor.
64+
65+
```ini
66+
[debug]
67+
```
2168

2269
### `[endpoints]`
2370

24-
Configuring these will cause the Python client to use these as Synapse service endpoints instead of the default prod endpoints.
71+
Override the default Synapse production service endpoints. Useful for testing against staging or development environments.
72+
73+
| Key | Description |
74+
| --- | --- |
75+
| `repoEndpoint` | Synapse repository REST API endpoint. |
76+
| `authEndpoint` | Synapse authentication service endpoint. |
77+
| `fileHandleEndpoint` | Synapse file service endpoint. |
78+
| `portalEndpoint` | Synapse web portal URL. |
79+
80+
Note: The following are the default endpoints.
81+
82+
```ini
83+
[endpoints]
84+
repoEndpoint = https://repo-prod.prod.sagebase.org/repo/v1
85+
authEndpoint = https://auth-prod.prod.sagebase.org/auth/v1
86+
fileHandleEndpoint = https://file-prod.prod.sagebase.org/file/v1
87+
portalEndpoint = https://www.synapse.org/
88+
```
2589

2690
### `[transfer]`
2791

28-
Settings to configure how Synapse uploads/downloads data.
92+
Settings to configure how Synapse uploads and downloads data.
93+
94+
| Key | Description |
95+
| --- | --- |
96+
| `max_threads` | Number of concurrent threads/connections for file transfers. Applies to AWS S3 transfers (uploads and downloads). Default: `min(cpu_count + 4, 128)`. Maximum: `128`. Minimum: `1`. |
97+
| `use_boto_sts` | If `true`, use AWS STS (Security Token Service) to obtain temporary credentials for S3 transfers instead of using stored AWS credentials directly. Valid values: `true` or `false` (case-insensitive). Default: `false`. |
98+
99+
```ini
100+
[transfer]
101+
max_threads = 16
102+
use_boto_sts = false
103+
```
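The defaults and bounds in the table can be read as a small resolution rule. The sketch below expresses that rule with `configparser`; the function names are hypothetical, and clamping out-of-range values (rather than rejecting them) is an assumption of this sketch, not confirmed client behavior.

```python
import configparser
import os

MAX_THREADS_CAP = 128  # documented maximum
MIN_THREADS = 1  # documented minimum


def resolve_max_threads(config_text: str) -> int:
    """Resolve `max_threads` from INI text using the documented default and bounds.

    Hypothetical helper for illustration; it is not part of synapseclient.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Documented default: min(cpu_count + 4, 128).
    default = min((os.cpu_count() or 1) + 4, MAX_THREADS_CAP)
    value = parser.getint("transfer", "max_threads", fallback=default)
    # Assumption: out-of-range values are clamped into [1, 128].
    return max(MIN_THREADS, min(value, MAX_THREADS_CAP))


def resolve_use_boto_sts(config_text: str) -> bool:
    """Resolve `use_boto_sts`, defaulting to False when it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # getboolean is case-insensitive; it also accepts yes/no, on/off and 1/0,
    # which is broader than the `true`/`false` values documented above.
    return parser.getboolean("transfer", "use_boto_sts", fallback=False)


example = "[transfer]\nmax_threads = 16\nuse_boto_sts = false\n"
print(resolve_max_threads(example), resolve_use_boto_sts(example))
```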
29104

30-
You may also set the `max_threads` programmatically via:
105+
You may also set `max_threads` programmatically:
31106

32107
```python
33108
import synapseclient
