Commit 162c80b

Merge branch 'develop' into synpy-1764-trivy-scanning
2 parents 629e014 + e06c715 commit 162c80b

13 files changed

Lines changed: 316 additions & 133 deletions

CLAUDE.md

Lines changed: 10 additions & 3 deletions
@@ -11,8 +11,9 @@ Synapse Python Client — official Python SDK and CLI for Synapse (synapse.org),
1111
- Models: stdlib dataclasses (NOT Pydantic)
1212
- Tests: pytest 8.2, pytest-asyncio, pytest-socket, pytest-xdist
1313
- Docs: MkDocs with Material theme, mkdocstrings
14-
- Linting: ruff, black (line-length 88), isort (profile=black), bandit, flake8
14+
- Linting: ruff, black (line-length 88), isort (profile=black), bandit
1515
- CI: GitHub Actions → SonarCloud, PyPI deploy on release
16+
- Docker: `Dockerfile` at repo root, published to `ghcr.io/sage-bionetworks/synapsepythonclient`
1617

1718
## Commands
1819

@@ -94,8 +95,8 @@ Data flow: User → `operations/` factory → model async methods → `api/` ser
9495
## Constraints
9596

9697
- Do not use Pydantic for models — the codebase uses stdlib dataclasses with custom serialization. Mixing would break the `@async_to_sync` decorator and `fill_from_dict()` pattern.
97-
- Do not write synchronous test files — write async tests only. The `@async_to_sync` decorator is validated by a dedicated smoke test. Duplicate sync tests were removed to cut CI cost.
98-
- Unit tests must not make network calls — `pytest-socket` blocks all sockets. Use `pytest-mock` for HTTP mocking.
98+
- For new tests, prefer async test modules. Existing synchronous unit tests under `tests/unit/` are retained and maintained; the `@async_to_sync` decorator is covered by a dedicated smoke test, so avoid adding duplicate sync/async test coverage.
99+
- On non-Windows platforms, unit tests must not make external network calls — `pytest-socket` blocks internet-facing sockets while allowing Unix domain sockets. Socket blocking is skipped on Windows. Use `pytest-mock` for HTTP mocking.
99100
- `develop` is the default/main branch, not `main` or `master`. PRs target `develop`.
100101
- Legacy classes in root `synapseclient/` (entity.py, table.py, etc.) are kept for backwards compatibility. New features go in `models/` using the dataclass pattern.
101102
- Avoid adding new methods to `client.py` (9600+ lines) — prefer the `api/` + `models/` layered pattern.
@@ -108,3 +109,9 @@ Data flow: User → `operations/` factory → model async methods → `api/` ser
108109
- Unit test client fixture: session-scoped, `skip_checks=True`, `cache_client=False`
109110
- Integration tests use `--reruns 3` for flaky retries and `-n 8 --dist loadscope` for parallelism
110111
- Integration fixtures create per-worker Synapse projects; use `schedule_for_cleanup()` for teardown
112+
- Auth env vars: `SYNAPSE_AUTH_TOKEN` (bearer token), `SYNAPSE_PROFILE` (config file profile, default: `"default"`), `SYNAPSE_TOKEN_AWS_SSM_PARAMETER_NAME` (AWS SSM path)
113+
- CI runs integration tests only on Python 3.10 and 3.14 (oldest + newest) to limit Synapse server load
114+
115+
## Maintenance
116+
117+
Each CLAUDE.md file has a `<!-- Last reviewed: YYYY-MM -->` header. Update this when the file is reviewed or modified. If a code change invalidates guidance in a CLAUDE.md file, update the guidance in the same PR.

docs/CLAUDE.md

Lines changed: 9 additions & 1 deletion
@@ -8,6 +8,9 @@ User-facing documentation for the Synapse Python Client. Built with MkDocs + Mat
88

99
MkDocs with Material theme, mkdocstrings (Google-style docstrings), termynal (CLI animations), markdown-include (file embedding).
1010

11+
### Python style
12+
- Use built-in generics (`list`, `dict`, `tuple`, `set`) instead of `typing.List`, `typing.Dict`, etc. (Python 3.9+)
13+
1114
## Conventions
1215

1316
### Content types (Diataxis framework)
@@ -50,4 +53,9 @@ Defined in `mkdocs.yml` nav section. 5 main sections: Home, Tutorials, How-To Gu
5053
- Do not edit tutorial code inline in markdown — edit the `.py` script file in `tutorial_scripts/` and update line ranges if needed.
5154
- Reference docs auto-generate from source docstrings — to change method documentation, edit the docstring in the Python source, not the markdown.
5255
- `mkdocs.yml` is at the repo root, not in `docs/` — it configures the entire doc build.
53-
- Docs deploy via `mkdocs gh-deploy --force` targeting the `master` branch (not `develop`).
56+
- Docs deploy to Read the Docs (configured via `.readthedocs.yaml` at repo root).
57+
- Local build output goes to `docs_site/` (via `site_dir` in `mkdocs.yml`) — gitignored.
58+
- Cross-referencing uses the `autorefs` plugin: `[display text][synapseclient.ClassName.method]` auto-resolves to mkdocstrings anchors.
59+
60+
### news.md
61+
Release notes live in `docs/news.md`. Each release gets a heading with the version number and date, followed by bullet points describing changes. Group entries by category (Features, Bug Fixes, etc.). Reference Jira ticket numbers (SYNPY-XXXX) in each entry.

synapseclient/api/CLAUDE.md

Lines changed: 39 additions & 4 deletions
@@ -16,33 +16,68 @@ async def verb_resource(
1616
) -> Dict[str, Any]:
1717
```
1818
- All functions are `async def`
19+
- `synapse_client` is **always** `Optional["Synapse"] = None` — never make it required. Callers omit it to use the cached singleton returned by `Synapse.get_client()`.
1920
- `synapse_client` is always the last parameter, keyword-only (after `*`)
2021
- Use `Synapse.get_client(synapse_client=synapse_client)` to get the client instance
2122
- Use `TYPE_CHECKING` guard for `Synapse` import — avoids circular dependencies between `api/` and `client.py`
23+
- Construct a `query_params` dictionary for non-null optional args, and pass it to the `params` arg of the REST call. See `entity_services.py` for the pattern.
24+
25+
### Docstring conventions
26+
Module-level — every file opens with boilerplate linking to the Synapse REST controller:
27+
```python
28+
"""This module is responsible for exposing the services defined at:
29+
<https://rest-docs.synapse.org/rest/#org.sagebionetworks.repo.web.controller.XController>
30+
"""
31+
```
32+
Function-level (Google style):
33+
```python
34+
"""
35+
One-line summary.
36+
37+
<https://rest-docs.synapse.org/rest/POST/endpoint.html>
38+
39+
Arguments:
40+
param: Description.
41+
synapse_client: If not passed in and caching was not disabled by
42+
`Synapse.allow_client_caching(False)` this will use the last created
43+
instance from the Synapse class constructor.
44+
45+
Returns:
46+
Description of return value.
47+
"""
48+
```
49+
- The `synapse_client` argument description is boilerplate — always copy it verbatim, not paraphrased.
50+
- The REST endpoint URL uses `<link>` format (angled brackets), not markdown `[text](url)`.
51+
- Parameter descriptions in `Arguments:` must be copied verbatim from the Synapse REST API docs for that endpoint — do not paraphrase or infer.
2252

2353
### REST call pattern
2454
```python
2555
client = Synapse.get_client(synapse_client=synapse_client)
2656
return await client.rest_post_async(uri="/endpoint", body=json.dumps(request))
2757
```
28-
Available methods: `rest_get_async`, `rest_post_async`, `rest_put_async`, `rest_delete_async`. Pass `endpoint=client.fileHandleEndpoint` for file handle operations; omit for the default repository endpoint. Use `json.dumps()` for request bodies — not raw dicts.
58+
Available methods: `rest_get_async`, `rest_post_async`, `rest_put_async`, `rest_delete_async`. Pass `endpoint=client.fileHandleEndpoint` for file handle operations; omit for the default repository endpoint. Use `json.dumps()` for request bodies — not raw dicts. Always assign the response to a named `response` variable before returning or extracting attributes from it.
2959

3060
### Return values
3161
- Most functions return raw `Dict[str, Any]` — transformation happens in the model layer via `fill_from_dict()`
3262
- Some return typed dataclass instances (e.g., `EntityHeader` from `entity_services.py`) when the data is only used internally
3363
- Delete operations return `None`
3464

3565
### Pagination
36-
Use helpers from `api_client.py`:
66+
Use async pagination helpers when the API endpoint returns a list of results. For single-object responses, a simple `return` is sufficient.
67+
68+
Helpers from `api_client.py`:
3769
- `rest_get_paginated_async()` — for GET endpoints with limit/offset. Expects `results` or `children` key in response.
3870
- `rest_post_paginated_async()` — for POST endpoints with `nextPageToken`. Expects `page` array in response.
39-
Both are async generators yielding individual items.
71+
Both are async generators yielding individual items. Reference `entity_services.py`, `table_services.py`, or `evaluation_services.py` for pagination patterns.
4072

4173
### Entity factory (`entity_factory.py`)
4274
Polymorphic entity deserialization via concrete type dispatch. Maps Java class names from `core/constants/concrete_types.py` to model classes. When adding a new entity type, register the type mapping here.
4375

76+
### When to add a new service file vs. update an existing one
77+
Add a new file when the Synapse REST controller is different (each file maps to one controller). Update an existing file when adding endpoints under the same controller.
78+
4479
### Adding a new service file
4580
1. Create `synapseclient/api/new_service.py`
4681
2. Add all public functions to `api/__init__.py` imports and `__all__` — every public function must be re-exported
4782
3. Use `json.dumps()` for request bodies (not dict)
48-
4. Reference `entity_services.py` for CRUD pattern, `table_services.py` for pagination pattern
83+
4. Reference `entity_services.py` for CRUD pattern, `table_services.py` or `evaluation_services.py` for pagination pattern
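
The service-function conventions described in this file's diff (keyword-only optional `synapse_client`, a `query_params` dictionary built from non-null optional arguments, and a named `response` variable) can be sketched roughly as follows. This is a minimal illustration and not code from the repository: `FakeClient`, `get_client`, `get_widget`, the `/widget/{id}` URI, and the `includeDetails` parameter are all hypothetical stand-ins so the example runs without the `synapseclient` package or network access.

```python
import asyncio
from typing import Any, Dict, List, Optional, Tuple


class FakeClient:
    """Hypothetical stand-in for the Synapse client; records calls instead of sending HTTP."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    async def rest_get_async(
        self, uri: str, params: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        self.calls.append((uri, dict(params or {})))
        return {"uri": uri, "params": dict(params or {})}


_CACHED_CLIENT = FakeClient()


def get_client(synapse_client: Optional[FakeClient] = None) -> FakeClient:
    """Plays the role described for Synapse.get_client(): fall back to a cached instance."""
    return synapse_client if synapse_client is not None else _CACHED_CLIENT


async def get_widget(
    widget_id: str,
    version: Optional[int] = None,
    include_details: Optional[bool] = None,
    *,
    synapse_client: Optional[FakeClient] = None,
) -> Dict[str, Any]:
    """Get a widget (hypothetical resource and endpoint, for illustration only)."""
    # Only forward the optional arguments the caller actually supplied.
    query_params: Dict[str, Any] = {}
    if version is not None:
        query_params["version"] = version
    if include_details is not None:
        query_params["includeDetails"] = include_details

    client = get_client(synapse_client=synapse_client)
    # Assign to a named `response` variable before returning it.
    response = await client.rest_get_async(
        uri=f"/widget/{widget_id}", params=query_params
    )
    return response
```

Calling `asyncio.run(get_widget("w1", version=2, synapse_client=FakeClient()))` sends only the `version` key, since `include_details` was left as `None`.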

synapseclient/core/CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ Provider chain tries in order: login args → config file → env var (`SYNAPSE_
3232
- Download validates MD5 post-transfer, raises `SynapseMd5MismatchError` on mismatch
3333
- Progress via `tqdm`; multi-threaded uploads suppress per-file messages via `cumulative_transfer_progress`
3434

35-
### concrete_types.py
35+
### concrete_types.py (`core/constants/concrete_types.py`)
3636
Maps Java class names from Synapse REST API for polymorphic deserialization. When adding a new entity type, add its concrete type string here AND in `api/entity_factory.py` type map AND in `models/mixins/asynchronous_job.py` ASYNC_JOB_URIS if it's an async job type.
3737

3838
### Key reusable utilities (`utils.py`)

synapseclient/core/download/CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ File download from Synapse storage with MD5 validation, collision handling, and
77
## Conventions
88

99
### Primary download path
10-
`download_async.py` is the primary async download implementation. `download_functions.py` contains shared helpers and the sync download wrapper.
10+
`download_async.py` is the primary async download implementation. `download_functions.py` contains shared helpers and the sync download wrapper. The default part size of 8 MiB was empirically optimized for Synapse download throughput — do not change it without benchmarking.
1111

1212
### MD5 validation
1313
Post-transfer MD5 validation is mandatory. Raises `SynapseMd5MismatchError` on mismatch — the download is retried automatically (60 retries spanning ~30 minutes).

synapseclient/extensions/curator/CLAUDE.md

Lines changed: 4 additions & 18 deletions
@@ -10,28 +10,14 @@ Optional dependencies (gated by `[curator]` extras): pandas, pandarallel, networ
1010

1111
## Conventions
1212

13-
### schema_generation.py (5984 lines)
14-
Largest file in the codebase. Contains `DataModelParser`, `DataModelComponent`, `DataModelRelationships` classes. Uses networkx (DiGraph, MultiDiGraph) for node/edge relationships and cycle detection (via multiprocessing). Many deprecated validation rule enums marked for removal (SYNPY-1724, SYNPY-1692). Active development area — multiple recent PRs modifying conditionals, display names, and grouping.
15-
16-
### schema_registry.py
17-
Query engine for the schema registry table. Default table ID: `syn69735275` (configurable via parameter). Builds SQL WHERE clauses from filter kwargs — supports exact match and LIKE pattern match. `return_latest_only=True` returns newest version URI only.
13+
### schema_generation.py
14+
Largest file in the codebase. Uses networkx (DiGraph, MultiDiGraph) for node/edge relationships and cycle detection (via multiprocessing). Many deprecated validation rule enums marked for removal (SYNPY-1724, SYNPY-1692). Active development area.
1815

1916
### schema_management.py
20-
Thin wrappers around `JSONSchema` OOP model:
21-
- `register_jsonschema()` / `register_jsonschema_async()` — loads schema from file, calls `.store_async()`
22-
- `bind_jsonschema()` / `bind_jsonschema_async()` — binds schema to entity
23-
- `fix_schema_name()` — replaces dashes/underscores with periods for Synapse compliance
24-
25-
Uses `wrap_async_to_sync()` for sync versions (not class decorator).
26-
27-
### file_based_metadata_task.py
28-
Creates EntityView from JSON Schema bound to folder/project. `create_json_schema_entity_view()` auto-reorders columns (createdBy→name→id to front). `create_or_update_wiki_with_entity_view()` embeds EntityView query in Wiki page.
29-
30-
### record_based_metadata_task.py
31-
Extracts schema properties → DataFrame → RecordSet → CurationTask + Grid. Supports URI-based schemas via `JSONSchema.from_uri()`.
17+
Uses `wrap_async_to_sync()` for sync versions (not class decorator). `fix_schema_name()` replaces dashes/underscores with periods for Synapse compliance.
3218

3319
### utils.py
34-
`project_id_from_entity_id()` — traverses folder hierarchy up to project (max 1000 iterations). Uses legacy sync `get()` API in a loop — known tech debt.
20+
`project_id_from_entity_id()` — traverses folder hierarchy up to project (max 1000 iterations). Uses `operations.get` in a loop — known tech debt.
3521

3622
## Constraints
3723

synapseclient/extensions/curator/file_based_metadata_task.py

Lines changed: 45 additions & 26 deletions
@@ -24,7 +24,7 @@
2424
from synapseclient.operations import FileOptions, get
2525

2626
TYPE_DICT = {
27-
"string": ColumnType.STRING,
27+
"string": ColumnType.MEDIUMTEXT,
2828
"number": ColumnType.DOUBLE,
2929
"integer": ColumnType.INTEGER,
3030
"boolean": ColumnType.BOOLEAN,
@@ -199,48 +199,67 @@ def _create_columns_from_json_schema(json_schema: dict[str, Any]) -> list[Column
199199
raise ValueError(
200200
"The 'properties' field in the JSON Schema must be a dictionary."
201201
)
202-
columns = []
203-
for name, prop_schema in properties.items():
204-
column_type = _get_column_type_from_js_property(prop_schema)
205-
maximum_size = None
206-
if column_type == "STRING":
207-
maximum_size = 100
208-
if column_type in LIST_TYPE_DICT.values():
209-
maximum_size = 5
210-
211-
column = Column(
212-
name=name,
213-
column_type=column_type,
214-
maximum_size=maximum_size,
215-
default_value=None,
216-
)
217-
columns.append(column)
202+
columns = [
203+
_create_synapse_column_from_js_property(prop_schema, name)
204+
for name, prop_schema in properties.items()
205+
]
218206
return columns
219207

220208

209+
def _create_synapse_column_from_js_property(
210+
js_property: dict[str, Any], name: str
211+
) -> Column:
212+
"""
213+
Creates a Synapse Column based on a JSON Schema property.
214+
215+
Args:
216+
js_property: A JSON Schema property in dict form.
217+
name: The name of the column.
218+
219+
Returns:
220+
A Synapse Column based on the JSON Schema property.
221+
"""
222+
column_type = _get_column_type_from_js_property(js_property)
223+
return Column(name=name, column_type=column_type)
224+
225+
221226
def _get_column_type_from_js_property(js_property: dict[str, Any]) -> ColumnType:
222227
"""
223228
Gets the Synapse column type from a JSON Schema property.
224229
The JSON Schema should be valid but that should not be assumed.
225-
If the type can not be determined ColumnType.STRING will be returned.
230+
If the type can not be determined ColumnType.MEDIUMTEXT will be returned.
226231
227232
Args:
228233
js_property: A JSON Schema property in dict form.
229234
230235
Returns:
231236
A Synapse ColumnType based on the JSON Schema type
232237
"""
233-
# Enums are always strings in Synapse tables
238+
# Enums are set as MediumText columns
234239
if "enum" in js_property:
235-
return ColumnType.STRING
240+
return ColumnType.MEDIUMTEXT
236241
if "type" in js_property:
237-
if js_property["type"] == "array":
242+
js_type = js_property["type"]
243+
# Synapse columns cannot be more than one type
244+
# If the JSONSchema type is a list of types, check if it's a nullable single type
245+
if isinstance(js_type, list):
246+
types = [t for t in js_type if t != "null"]
247+
if len(types) == 1:
248+
js_type = types[0]
249+
# If there are multiple non-null types, we cannot determine a single column type, so default to MediumText
250+
else:
251+
return ColumnType.MEDIUMTEXT
252+
if js_type == "array":
238253
return _get_list_column_type_from_js_property(js_property)
239-
return TYPE_DICT.get(js_property["type"], ColumnType.STRING)
254+
# If there is only one JSON Schema type, return the corresponding Synapse column type,
255+
# defaulting to MediumText if there is no match
256+
return TYPE_DICT.get(js_type, ColumnType.MEDIUMTEXT)
240257
# A oneOf list usually indicates that the type could be one or more different things
258+
# Curator extension does not create the types of JSON Schemas where this is the case
259+
# but if it is present we will attempt to determine the type based on the items in the oneOf list.
241260
if "oneOf" in js_property and isinstance(js_property["oneOf"], list):
242261
return _get_column_type_from_js_one_of_list(js_property["oneOf"])
243-
return ColumnType.STRING
262+
return ColumnType.MEDIUMTEXT
244263

245264

246265
def _get_column_type_from_js_one_of_list(js_one_of_list: list[Any]) -> ColumnType:
@@ -258,15 +277,15 @@ def _get_column_type_from_js_one_of_list(js_one_of_list: list[Any]) -> ColumnTyp
258277
items = [item for item in js_one_of_list if isinstance(item, dict)]
259278
# Enums are always strings in Synapse tables
260279
if [item for item in items if "enum" in item]:
261-
return ColumnType.STRING
280+
return ColumnType.MEDIUMTEXT
262281
# For Synapse ColumnType we can ignore null types in JSON Schemas
263282
type_items = [item for item in items if "type" in item if item["type"] != "null"]
264283
if len(type_items) == 1:
265284
type_item = type_items[0]
266285
if type_item["type"] == "array":
267286
return _get_list_column_type_from_js_property(type_item)
268-
return TYPE_DICT.get(type_item["type"], ColumnType.STRING)
269-
return ColumnType.STRING
287+
return TYPE_DICT.get(type_item["type"], ColumnType.MEDIUMTEXT)
288+
return ColumnType.MEDIUMTEXT
270289

271290

272291
def _get_list_column_type_from_js_property(js_property: dict[str, Any]) -> ColumnType:
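
The nullable-type handling added to `_get_column_type_from_js_property` in this diff can be exercised in isolation with a sketch like the one below. It is an approximation under stated assumptions: `ColumnType` here is a local stand-in enum rather than the real Synapse class, `TYPE_DICT` contains only the four entries visible in the diff, `resolve_scalar_column_type` is a hypothetical name, and the `array` and `oneOf` branches are left out. The `isinstance(js_type, str)` guard is an addition for this sketch and is not in the diff.

```python
from enum import Enum
from typing import Any


class ColumnType(str, Enum):
    """Local stand-in for the Synapse ColumnType enum (only the members used here)."""

    MEDIUMTEXT = "MEDIUMTEXT"
    DOUBLE = "DOUBLE"
    INTEGER = "INTEGER"
    BOOLEAN = "BOOLEAN"


# Only the mappings visible in the diff; the real dictionary may hold more.
TYPE_DICT = {
    "string": ColumnType.MEDIUMTEXT,
    "number": ColumnType.DOUBLE,
    "integer": ColumnType.INTEGER,
    "boolean": ColumnType.BOOLEAN,
}


def resolve_scalar_column_type(js_property: dict[str, Any]) -> ColumnType:
    """Hypothetical helper mirroring the scalar branches shown in the diff."""
    # Enums map to MEDIUMTEXT regardless of any declared type.
    if "enum" in js_property:
        return ColumnType.MEDIUMTEXT
    js_type = js_property.get("type")
    # A list such as ["integer", "null"] is treated as a nullable single type.
    if isinstance(js_type, list):
        non_null = [t for t in js_type if t != "null"]
        if len(non_null) != 1:
            # Several non-null types cannot map to one column type.
            return ColumnType.MEDIUMTEXT
        js_type = non_null[0]
    # Guard added for this sketch: anything that is not a string falls back.
    if not isinstance(js_type, str):
        return ColumnType.MEDIUMTEXT
    return TYPE_DICT.get(js_type, ColumnType.MEDIUMTEXT)
```

With this sketch, `{"type": ["integer", "null"]}` resolves to `ColumnType.INTEGER`, while `{"type": ["integer", "string"]}` falls back to `ColumnType.MEDIUMTEXT`.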
