117 changes: 117 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,117 @@
<!-- Last reviewed: 2026-03 -->

## Project

Synapse Python Client — official Python SDK and CLI for Synapse (synapse.org), a collaborative science platform by Sage Bionetworks. Provides programmatic access to entities (projects, files, folders, tables, views), metadata, permissions, evaluations, and data curation workflows. Published to PyPI as `synapseclient`.

## Stack

- Python 3.10–3.14 (`setup.cfg`: `python_requires = >=3.10, <3.15`)
- HTTP: httpx (async), requests (sync/legacy)
- Models: stdlib dataclasses (NOT Pydantic)
- Tests: pytest 8.2, pytest-asyncio, pytest-socket, pytest-xdist
- Docs: MkDocs with Material theme, mkdocstrings
- Linting: ruff, black (line-length 88), isort (profile=black), bandit
- CI: GitHub Actions → SonarCloud, PyPI deploy on release
- Docker: `Dockerfile` at repo root, published to `ghcr.io/sage-bionetworks/synapsepythonclient`

## Commands

```bash
# Install for development
pip install -e ".[boto3,pandas,pysftp,tests,curator,dev]"

# Unit tests
pytest -sv tests/unit

# Integration tests (requires Synapse credentials, runs in parallel)
pytest -sv --reruns 3 tests/integration -n 8 --dist loadscope

# Pre-commit checks (ruff, black, isort, bandit)
pre-commit run --all-files

# Build docs locally
pip install -e ".[docs]" && mkdocs serve
```

## Conventions

### Async-first with generated sync wrappers
All new methods must be async and carry an `_async` suffix. The `@async_to_sync` class decorator (`core/async_utils.py`) auto-generates the sync counterparts at class definition time. Never write sync methods by hand on model classes; the decorator handles it.
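
As a rough illustration of the idea (not the project's actual implementation in `core/async_utils.py`), a class decorator can walk the class body and attach a sync twin for each `*_async` coroutine. The names `async_to_sync_sketch` and `Project` below are hypothetical, and the sketch assumes no event loop is already running in the calling thread:

```python
import asyncio
import inspect


def async_to_sync_sketch(cls):
    """Hypothetical stand-in: add a sync twin for every `*_async` coroutine."""

    def make_sync(async_method, sync_name):
        def sync_method(self, *args, **kwargs):
            # Assumption: no event loop is running in this thread.
            return asyncio.run(async_method(self, *args, **kwargs))

        sync_method.__name__ = sync_name
        sync_method.__doc__ = async_method.__doc__
        return sync_method

    # Copy the items first because setattr mutates the class namespace.
    for name, member in list(vars(cls).items()):
        if name.endswith("_async") and inspect.iscoroutinefunction(member):
            sync_name = name[: -len("_async")]
            setattr(cls, sync_name, make_sync(member, sync_name))
    return cls


@async_to_sync_sketch
class Project:
    def __init__(self, name: str) -> None:
        self.name = name

    async def store_async(self) -> str:
        await asyncio.sleep(0)  # stands in for a REST call
        return f"stored {self.name}"
```

With this sketch, `Project("demo").store()` runs `store_async()` to completion and returns its result, which is why only the async method needs to be written by hand.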

### `wrap_async_to_sync()` for standalone functions
Use `wrap_async_to_sync()` (not `@async_to_sync`) for free-standing async functions outside of classes — see `operations/` layer for the pattern. The class decorator only works on classes.

### Protocol classes for sync type hints
Each model in `models/` has a corresponding protocol in `models/protocols/` defining the sync method signatures. When adding a new async method to a model, add its sync signature to the protocol class so IDE type hints work.

### Dataclass models with `fill_from_dict()`
Models are `@dataclass` classes, NOT Pydantic. REST responses are deserialized via `fill_from_dict()` methods on each model. New models must follow this pattern.
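
The pattern can be sketched as follows. The `ExampleFolder` class and its field set are illustrative assumptions, not a copy of any model in `models/`; the camelCase keys mirror how Synapse REST responses name fields such as `parentId`:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ExampleFolder:
    """Hypothetical model: plain stdlib dataclass, no Pydantic."""

    id: Optional[str] = None
    name: Optional[str] = None
    parent_id: Optional[str] = None
    etag: Optional[str] = None

    def fill_from_dict(self, synapse_folder: dict[str, Any]) -> "ExampleFolder":
        """Copy fields from a REST response dict onto this instance."""
        self.id = synapse_folder.get("id")
        self.name = synapse_folder.get("name")
        # REST payloads use camelCase; model attributes use snake_case.
        self.parent_id = synapse_folder.get("parentId")
        self.etag = synapse_folder.get("etag")
        return self
```

Returning `self` lets callers chain the call, for example `ExampleFolder().fill_from_dict(response)`.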

### Concrete types are Java class names
`core/constants/concrete_types.py` maps Java class names (e.g., `org.sagebionetworks.repo.model.FileEntity`) for polymorphic entity deserialization. When adding new entity types, register the concrete type string here AND in `api/entity_factory.py` AND in `models/mixins/asynchronous_job.py` if it's an async job type.
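
To make the dispatch idea concrete, here is a small sketch under stated assumptions: the constant names, the `Example*` classes, and the `MODEL_BY_CONCRETE_TYPE` registry are hypothetical and do not reproduce `concrete_types.py` or `entity_factory.py`. Only the Java class name strings and the `concreteType` response key come from Synapse itself:

```python
from dataclasses import dataclass
from typing import Any, Optional

# Constant names are illustrative; the values are Synapse Java class names.
FILE_ENTITY = "org.sagebionetworks.repo.model.FileEntity"
FOLDER_ENTITY = "org.sagebionetworks.repo.model.Folder"


@dataclass
class ExampleFile:
    id: Optional[str] = None


@dataclass
class ExampleFolder:
    id: Optional[str] = None


# Hypothetical registry: concrete type string -> model class.
MODEL_BY_CONCRETE_TYPE = {
    FILE_ENTITY: ExampleFile,
    FOLDER_ENTITY: ExampleFolder,
}


def entity_from_dict(payload: dict[str, Any]):
    """Pick the model class from the payload's `concreteType` field."""
    concrete_type = payload.get("concreteType")
    model_class = MODEL_BY_CONCRETE_TYPE.get(concrete_type)
    if model_class is None:
        raise ValueError(f"Unregistered concrete type: {concrete_type!r}")
    return model_class(id=payload.get("id"))
```

An unregistered type fails loudly in this sketch, which shows why a new entity type has to be registered in every place that dispatches on the concrete type string.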

### Options dataclass pattern
The `operations/` layer uses dataclass option objects (`StoreFileOptions`, `FileOptions`, `TableOptions`, etc.) to bundle type-specific configuration for CRUD operations. Follow this pattern for new entity-type-specific options.
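
A minimal sketch of the options pattern is shown below. The class names, their fields, and the `plan_get` function are hypothetical; the real `FileOptions` and `TableOptions` may expose different fields:

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class ExampleFileOptions:
    """Hypothetical options bundle; field names are illustrative only."""

    download_file: bool = True
    download_location: Optional[str] = None


@dataclass
class ExampleTableOptions:
    """Hypothetical options bundle for table-like entities."""

    include_columns: bool = True


def plan_get(
    synapse_id: str,
    *,
    file_options: Optional[ExampleFileOptions] = None,
    table_options: Optional[ExampleTableOptions] = None,
) -> dict[str, Any]:
    """Collect only the type-specific settings the caller supplied."""
    plan: dict[str, Any] = {"id": synapse_id}
    if file_options is not None:
        plan["file"] = asdict(file_options)
    if table_options is not None:
        plan["table"] = asdict(table_options)
    return plan
```

Grouping settings per entity type keeps the top-level function signature short and lets each options class grow without touching the others.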

### Mixin composition for shared behavior
Shared functionality lives in `models/mixins/` (AccessControllable, StorableContainer, AsynchronousJob, etc.). Prefer adding to existing mixins over duplicating logic across models.
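
As an illustration only, a mixin can hold one async method that several dataclass models inherit. `ExampleDescribable` and the two models are hypothetical names, not classes from `models/mixins/`:

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


class ExampleDescribable:
    """Hypothetical mixin: relies on `id` and `name` from the host model."""

    async def describe_async(self) -> str:
        await asyncio.sleep(0)  # stands in for a REST call
        return f"{type(self).__name__}({self.id}, {self.name})"


@dataclass
class ExampleProject(ExampleDescribable):
    id: Optional[str] = None
    name: Optional[str] = None


@dataclass
class ExampleFolder(ExampleDescribable):
    id: Optional[str] = None
    name: Optional[str] = None
```

Both models get `describe_async()` from one definition, so a fix to the shared logic lands in a single place.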

### `synapse_client` parameter pattern
Most functions accept an optional `synapse_client` parameter. If omitted, `Synapse.get_client()` returns the cached singleton. Never pass `None` explicitly — omit the argument instead.
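
The resolution rule can be sketched with a stand-in class. `ExampleClient` is hypothetical and only mimics the described behavior (an explicit client wins, otherwise the last created instance is used); it is not the `Synapse` class:

```python
import asyncio
from typing import ClassVar, Optional


class ExampleClient:
    """Hypothetical stand-in for `Synapse`: remembers the last instance created."""

    _last_created: ClassVar[Optional["ExampleClient"]] = None

    def __init__(self, label: str) -> None:
        self.label = label
        ExampleClient._last_created = self

    @classmethod
    def get_client(
        cls, synapse_client: Optional["ExampleClient"] = None
    ) -> "ExampleClient":
        # An explicitly supplied client always wins over the cached one.
        if synapse_client is not None:
            return synapse_client
        if cls._last_created is None:
            raise RuntimeError("No cached client; create one first.")
        return cls._last_created


async def which_client_async(
    *, synapse_client: Optional[ExampleClient] = None
) -> str:
    client = ExampleClient.get_client(synapse_client=synapse_client)
    return client.label
```

Callers that want the cached client simply call `which_client_async()` with no argument.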

### Branch naming
Use `SYNPY-{issue_number}` or `synpy-{issue_number}` prefix for feature branches. PR titles follow `[SYNPY-XXXX] Description` format.

## Architecture

```
synapseclient/
├── client.py # Synapse class — public entry point, REST methods, auth (9600+ lines)
├── api/ # REST API layer — one file per resource type (21 files)
│ └── entity_factory.py # Polymorphic entity deserialization via concrete type dispatch
├── models/ # Dataclass entities (Project, File, Table, etc.) (28 files)
│ ├── protocols/ # Sync method type signatures for IDE hints (18 files)
│ ├── mixins/ # Shared behavior (ACL, containers, async jobs, tables) (7 files)
│ └── services/ # Model-level business logic (storable_entity, search)
├── operations/ # High-level CRUD: get(), store(), delete() — factory dispatch
├── core/ # Infrastructure: upload/download, retry, cache, creds, OTel
│ ├── upload/ # Multipart upload (sync + async)
│ ├── download/ # File download (sync + async)
│ ├── credentials/ # Auth chain (PAT, env var, config file, AWS SSM)
│ ├── constants/ # Concrete types, config keys, limits, method flags
│ ├── models/ # ACL, Permission, DictObject, custom JSON serialization
│ └── multithread_download/ # Threaded download manager
├── extensions/
│ └── curator/ # Schema curation (pandas, networkx, rdflib) — optional
├── services/ # JSON schema validation services
└── entity.py, table.py, ... # Legacy classes (pre-OOP rewrite, read-only)

synapseutils/ # Legacy bulk utilities (copy, sync, migrate, walk) — sync-only
```

Data flow: User → `operations/` factory → model async methods → `api/` service functions → `client.py` REST calls → Synapse API. Responses deserialized via `fill_from_dict()` on model instances.

## Constraints

- Do not use Pydantic for models — the codebase uses stdlib dataclasses with custom serialization. Mixing would break the `@async_to_sync` decorator and `fill_from_dict()` pattern.
- For new tests, prefer async test modules. Existing synchronous unit tests under `tests/unit/` are retained and maintained; the `@async_to_sync` decorator is covered by a dedicated smoke test, so avoid adding duplicate sync/async test coverage.
- On non-Windows platforms, unit tests must not make external network calls — `pytest-socket` blocks internet-facing sockets while allowing Unix domain sockets. Socket blocking is skipped on Windows. Use `pytest-mock` for HTTP mocking.
- `develop` is the default/main branch, not `main` or `master`. PRs target `develop`.
- Legacy classes in root `synapseclient/` (entity.py, table.py, etc.) are kept for backwards compatibility. New features go in `models/` using the dataclass pattern.
- Avoid adding new methods to `client.py` (9600+ lines) — prefer the `api/` + `models/` layered pattern.
- `synapseutils/` is legacy sync-only (uses `requests`, NOT `httpx`). Do not add async methods there — new async equivalents go in `models/` or `operations/`.

## Testing

- `asyncio_mode = auto` in pytest.ini — no need for `@pytest.mark.asyncio`
- `asyncio_default_fixture_loop_scope = session` — all async tests share one event loop
- Unit test client fixture: session-scoped, `skip_checks=True`, `cache_client=False`
- Integration tests use `--reruns 3` for flaky retries and `-n 8 --dist loadscope` for parallelism
- Integration fixtures create per-worker Synapse projects; use `schedule_for_cleanup()` for teardown
- Auth env vars: `SYNAPSE_AUTH_TOKEN` (bearer token), `SYNAPSE_PROFILE` (config file profile, default: `"default"`), `SYNAPSE_TOKEN_AWS_SSM_PARAMETER_NAME` (AWS SSM path)
- CI runs integration tests only on Python 3.10 and 3.14 (oldest + newest) to limit Synapse server load
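
With `asyncio_mode = auto`, an async test is just an `async def test_*` function. The sketch below uses a hypothetical `fetch_name` coroutine as the code under test:

```python
import asyncio


async def fetch_name(entity_id: str) -> str:
    """Hypothetical coroutine under test."""
    await asyncio.sleep(0)  # stands in for awaited I/O
    return f"name-of-{entity_id}"


async def test_fetch_name_returns_expected_value() -> None:
    # No @pytest.mark.asyncio marker: asyncio_mode = auto collects this as-is.
    assert await fetch_name("syn123") == "name-of-syn123"
```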

## Maintenance

Each CLAUDE.md file has a `<!-- Last reviewed: YYYY-MM -->` header. Update this when the file is reviewed or modified. If a code change invalidates guidance in a CLAUDE.md file, update the guidance in the same PR.
61 changes: 61 additions & 0 deletions docs/CLAUDE.md
@@ -0,0 +1,61 @@
<!-- Last reviewed: 2026-03 -->

## Project

User-facing documentation for the Synapse Python Client. Built with MkDocs + Material theme, deployed to Read the Docs. Follows the Diataxis documentation framework with four content types: tutorials, guides, reference, and explanations.

## Stack

MkDocs with Material theme, mkdocstrings (Google-style docstrings), termynal (CLI animations), markdown-include (file embedding).

### Python style
- Use built-in generics (`list`, `dict`, `tuple`, `set`) instead of `typing.List`, `typing.Dict`, etc. (Python 3.9+)

## Conventions

### Content types (Diataxis framework)
- **tutorials/** — Step-by-step learning (competence-building). Themed around a biomedical researcher working with Alzheimer's Disease data. Progressive build-up: Project → Folder → File → Annotations → etc.
- **guides/** — How-to guides for specific use cases (problem-solution oriented). Includes extension-specific guides (curator).
- **reference/** — API reference auto-generated from docstrings via mkdocstrings. Split into `experimental/sync/` and `experimental/async/` for new OOP API.
- **explanations/** — Deep conceptual content ("why" not just "how"). Design decisions, internal machinery.

### File inclusion pattern (markdown-include)
Tutorial code lives in `tutorials/python/tutorial_scripts/*.py` and is embedded in markdown via line-range includes:
```markdown
{!docs/tutorials/python/tutorial_scripts/annotation.py!lines=5-23}
```
Single source of truth — edit the `.py` file, not the markdown. Changing line numbers in scripts requires updating the line ranges in the corresponding `.md` files.
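
Because the line ranges are easy to leave stale, a small check along these lines could catch an include that points past the end of its script. The helper names are hypothetical and the regular expression assumes the directive always has the `{!path!lines=start-end}` shape shown above:

```python
import re

# Matches directives of the form {!path!lines=START-END}.
INCLUDE_PATTERN = re.compile(
    r"\{!(?P<path>[^!]+)!lines=(?P<start>\d+)-(?P<end>\d+)\}"
)


def find_line_range_includes(markdown_text: str) -> list[tuple[str, int, int]]:
    """Return (path, start, end) for every line-range include in the text."""
    return [
        (match["path"], int(match["start"]), int(match["end"]))
        for match in INCLUDE_PATTERN.finditer(markdown_text)
    ]


def range_fits(script_text: str, start: int, end: int) -> bool:
    """True when the 1-based inclusive range lies inside the script."""
    total_lines = len(script_text.splitlines())
    return 1 <= start <= end <= total_lines
```

This only verifies that a range is in bounds; it cannot tell whether the included lines are still the intended ones, so a visual check of the rendered page is still needed.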

### mkdocstrings reference generation
Reference markdown files use `::: synapseclient.ClassName` syntax to trigger auto-generation from docstrings. Key configuration:
- `docstring_style: google` — parse Google-style docstrings
- `members_order: source` — preserve source code order
- `filters: ["!^_", "!to_synapse_request", "!fill_from_dict"]` — private members, `to_synapse_request()`, and `fill_from_dict()` are excluded from docs
- `inherited_members: true` — shows mixin methods on inheriting classes
- Member lists are explicit — each reference page specifies which methods to document

### Anchor links for cross-referencing
Reference pages define anchors with `[](){ #reference-anchor }`. Tutorials link to them via `[API Reference][project-reference-sync]`. Links to a specific API object use its fully qualified name: `[syn.login][synapseclient.Synapse.login]`.

### termynal CLI animations
Terminal animation blocks are marked with a `<!-- termynal -->` HTML comment. Prompts are configured as `$` or `>`. Used in authentication.md and the installation docs.

### Custom CSS (`css/custom.css`)
- API reference indentation: `doc-contents` has 25px left padding with border
- Smaller table font (0.7rem) for API docs
- Wide layout: `max-width: 1700px` for complex content

### Navigation structure
Defined in the `nav` section of `mkdocs.yml`. 6 main sections: Home, Tutorials, How-To Guides, API Reference, Further Reading, News. API Reference has ~85 markdown files (~40 legacy, ~45 experimental).

## Constraints

- Do not edit tutorial code inline in markdown — edit the `.py` script file in `tutorial_scripts/` and update line ranges if needed.
- Reference docs auto-generate from source docstrings — to change method documentation, edit the docstring in the Python source, not the markdown.
- `mkdocs.yml` is at the repo root, not in `docs/` — it configures the entire doc build.
- Docs deploy to Read the Docs (configured via `.readthedocs.yaml` at repo root).
- Local build output goes to `docs_site/` (via `site_dir` in `mkdocs.yml`) — gitignored.
- Cross-referencing uses the `autorefs` plugin: `[display text][synapseclient.ClassName.method]` auto-resolves to mkdocstrings anchors.

### news.md
Release notes live in `docs/news.md`. Each release gets a heading with the version number and date, followed by bullet points describing changes. Group entries by category (Features, Bug Fixes, etc.). Reference Jira ticket numbers (SYNPY-XXXX) in each entry.
83 changes: 83 additions & 0 deletions synapseclient/api/CLAUDE.md
@@ -0,0 +1,83 @@
<!-- Last reviewed: 2026-03 -->

## Project

REST API service layer — thin async functions that map to Synapse REST endpoints. One file per resource type. Called by model layer, never by end users directly.

## Conventions

### Function signature pattern
```python
async def verb_resource(
    required_param: str,
    optional_param: Optional[str] = None,
    *,
    synapse_client: Optional["Synapse"] = None,
) -> Dict[str, Any]:
```
- All functions are `async def`
- `synapse_client` is **always** `Optional["Synapse"] = None` — never make it required. Callers omit it to use the cached singleton returned by `Synapse.get_client()`.
- `synapse_client` is always the last parameter, keyword-only (after `*`)
- Use `Synapse.get_client(synapse_client=synapse_client)` to get the client instance
- Use `TYPE_CHECKING` guard for `Synapse` import — avoids circular dependencies between `api/` and `client.py`
- Construct a `query_params` dictionary for non-null optional args, and pass it to the `params` arg of the REST call. See `entity_services.py` for the pattern.
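
Put together, a service function following these rules might look like the sketch below. Everything named `Example*` or `example`, the `/example/{id}` URI, and the `build_query_params` helper are assumptions made for illustration. The stub client is passed explicitly so the block runs on its own, whereas real code would resolve the client with `Synapse.get_client(synapse_client=synapse_client)`:

```python
import asyncio
from typing import TYPE_CHECKING, Any, Dict, Optional

if TYPE_CHECKING:
    # Imported for type checkers only, which avoids a circular import at runtime.
    from synapseclient import Synapse


class ExampleStubClient:
    """Hypothetical stand-in for a client; echoes the request it receives."""

    async def rest_get_async(
        self, uri: str, params: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        return {"uri": uri, "params": params or {}}


def build_query_params(**optional_args: Any) -> Dict[str, Any]:
    """Keep only the optional arguments the caller actually set."""
    return {key: value for key, value in optional_args.items() if value is not None}


async def get_example_resource(
    resource_id: str,
    include_details: Optional[bool] = None,
    version: Optional[int] = None,
    *,
    synapse_client: Optional["Synapse"] = None,
) -> Dict[str, Any]:
    # Sketch only: real code calls Synapse.get_client(synapse_client=synapse_client).
    client = synapse_client
    query_params = build_query_params(includeDetails=include_details, version=version)
    response = await client.rest_get_async(
        uri=f"/example/{resource_id}", params=query_params
    )
    return response
```

Dropping `None` values before the call keeps unset options out of the query string instead of sending them as empty parameters.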

### Docstring conventions
Module-level — every file opens with boilerplate linking to the Synapse REST controller:
```python
"""This module is responsible for exposing the services defined at:
<https://rest-docs.synapse.org/rest/#org.sagebionetworks.repo.web.controller.XController>
"""
```
Function-level (Google style):
```python
"""
One-line summary.

<https://rest-docs.synapse.org/rest/POST/endpoint.html>

Arguments:
    param: Description.
    synapse_client: If not passed in and caching was not disabled by
        `Synapse.allow_client_caching(False)` this will use the last created
        instance from the Synapse class constructor.

Returns:
    Description of return value.
"""
```
- The `synapse_client` argument description is boilerplate — always copy it verbatim, not paraphrased.
- The REST endpoint URL uses `<link>` format (angle brackets), not markdown `[text](url)`.
- Parameter descriptions in `Arguments:` must be copied verbatim from the Synapse REST API docs for that endpoint — do not paraphrase or infer.

### REST call pattern
```python
client = Synapse.get_client(synapse_client=synapse_client)
response = await client.rest_post_async(uri="/endpoint", body=json.dumps(request))
return response
```
Available methods: `rest_get_async`, `rest_post_async`, `rest_put_async`, `rest_delete_async`. Pass `endpoint=client.fileHandleEndpoint` for file handle operations; omit for the default repository endpoint. Use `json.dumps()` for request bodies — not raw dicts. Always assign the response to a named `response` variable before returning or extracting attributes from it.

### Return values
- Most functions return raw `Dict[str, Any]` — transformation happens in the model layer via `fill_from_dict()`
- Some return typed dataclass instances (e.g., `EntityHeader` from `entity_services.py`) when the data is only used internally
- Delete operations return `None`

### Pagination
Use async pagination helpers when the API endpoint returns a list of results. For single-object responses, a simple `return` is sufficient.

Helpers from `api_client.py`:
- `rest_get_paginated_async()` — for GET endpoints with limit/offset. Expects `results` or `children` key in response.
- `rest_post_paginated_async()` — for POST endpoints with `nextPageToken`. Expects `page` array in response.

Both are async generators yielding individual items. Reference `entity_services.py`, `table_services.py`, or `evaluation_services.py` for pagination patterns.
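
To show what limit/offset pagination as an async generator looks like, here is a self-contained sketch. It does not reproduce `rest_get_paginated_async()`; the `paginate_limit_offset` function, the `FetchPage` alias, and the fake page fetcher are hypothetical, and the stop condition (a short page ends the loop) is an assumption of this example:

```python
import asyncio
from typing import Any, AsyncGenerator, Awaitable, Callable, Dict, List

# Hypothetical callable: (limit, offset) -> one page of the response.
FetchPage = Callable[[int, int], Awaitable[Dict[str, Any]]]


async def paginate_limit_offset(
    fetch_page: FetchPage, limit: int = 2
) -> AsyncGenerator[Dict[str, Any], None]:
    """Yield items one at a time, requesting pages until one comes back short."""
    offset = 0
    while True:
        page = await fetch_page(limit, offset)
        # Items may sit under either key, depending on the endpoint.
        items = page.get("results") or page.get("children") or []
        for item in items:
            yield item
        if len(items) < limit:
            break
        offset += limit


async def fake_fetch_page(limit: int, offset: int) -> Dict[str, Any]:
    """Stand-in for a REST call over five fake entities."""
    data = [{"id": f"syn{i}"} for i in range(5)]
    return {"results": data[offset : offset + limit]}


async def collect_ids() -> List[str]:
    return [item["id"] async for item in paginate_limit_offset(fake_fetch_page)]
```

Callers consume the generator with `async for`, so they never see page boundaries and can stop early without fetching the remaining pages.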

### Entity factory (`entity_factory.py`)
Polymorphic entity deserialization via concrete type dispatch. Maps Java class names from `core/constants/concrete_types.py` to model classes. When adding a new entity type, register the type mapping here.

### When to add a new service file vs. update an existing one
Add a new file when the Synapse REST controller is different (each file maps to one controller). Update an existing file when adding endpoints under the same controller.

### Adding a new service file
1. Create `synapseclient/api/new_service.py`
2. Add all public functions to `api/__init__.py` imports and `__all__` — every public function must be re-exported
3. Use `json.dumps()` for request bodies (not dict)
4. Reference `entity_services.py` for CRUD pattern, `table_services.py` or `evaluation_services.py` for pagination pattern