| 1 | +<!-- Last reviewed: 2026-03 --> |
| 2 | + |
| 3 | +## Project |
| 4 | + |
| 5 | +Synapse Python Client — official Python SDK and CLI for Synapse (synapse.org), a collaborative science platform by Sage Bionetworks. Provides programmatic access to entities (projects, files, folders, tables, views), metadata, permissions, evaluations, and data curation workflows. Published to PyPI as `synapseclient`. |
| 6 | + |
| 7 | +## Stack |
| 8 | + |
| 9 | +- Python 3.10–3.14 (`setup.cfg`: `python_requires = >=3.10, <3.15`) |
| 10 | +- HTTP: httpx (async), requests (sync/legacy) |
| 11 | +- Models: stdlib dataclasses (NOT Pydantic) |
| 12 | +- Tests: pytest 8.2, pytest-asyncio, pytest-socket, pytest-xdist |
| 13 | +- Docs: MkDocs with Material theme, mkdocstrings |
| 14 | +- Linting: ruff, black (line-length 88), isort (profile=black), bandit |
| 15 | +- CI: GitHub Actions → SonarCloud, PyPI deploy on release |
| 16 | +- Docker: `Dockerfile` at repo root, published to `ghcr.io/sage-bionetworks/synapsepythonclient` |
| 17 | + |
| 18 | +## Commands |
| 19 | + |
| 20 | +```bash |
| 21 | +# Install for development |
| 22 | +pip install -e ".[boto3,pandas,pysftp,tests,curator,dev]" |
| 23 | + |
| 24 | +# Unit tests |
| 25 | +pytest -sv tests/unit |
| 26 | + |
| 27 | +# Integration tests (requires Synapse credentials, runs in parallel) |
| 28 | +pytest -sv --reruns 3 tests/integration -n 8 --dist loadscope |
| 29 | + |
| 30 | +# Pre-commit checks (ruff, black, isort, bandit) |
| 31 | +pre-commit run --all-files |
| 32 | + |
| 33 | +# Build docs locally |
| 34 | +pip install -e ".[docs]" && mkdocs serve |
| 35 | +``` |
| 36 | + |
| 37 | +## Conventions |
| 38 | + |
| 39 | +### Async-first with generated sync wrappers |
| 40 | +All new methods must be async and carry an `_async` suffix. The `@async_to_sync` class decorator (`core/async_utils.py`) auto-generates the sync counterparts at class definition time. Never write sync methods manually on model classes; the decorator handles it. |
| 41 | + |
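To make the async-first convention concrete, here is a minimal stdlib-only sketch of how a class decorator can derive sync twins from `_async` methods at class definition time. It is not the implementation in `core/async_utils.py`; `async_to_sync_sketch`, `_make_sync`, and `Note` are hypothetical names used only for illustration, and the use of `asyncio.run` is an assumption that only holds when no event loop is already running.

```python
import asyncio
import inspect


def _make_sync(async_method, sync_name):
    """Build a blocking wrapper around one coroutine method (illustrative helper)."""

    def sync_method(self, *args, **kwargs):
        # Assumption: no event loop is running in this thread, so asyncio.run is safe.
        return asyncio.run(async_method(self, *args, **kwargs))

    sync_method.__name__ = sync_name
    sync_method.__doc__ = async_method.__doc__
    return sync_method


def async_to_sync_sketch(cls):
    """Hypothetical stand-in: add `foo` for every coroutine method named `foo_async`."""
    for name, attr in list(vars(cls).items()):
        if not (name.endswith("_async") and inspect.iscoroutinefunction(attr)):
            continue
        sync_name = name[: -len("_async")]
        if sync_name in vars(cls):
            continue  # never overwrite a method the class already defines
        setattr(cls, sync_name, _make_sync(attr, sync_name))
    return cls


@async_to_sync_sketch
class Note:
    def __init__(self, text):
        self.text = text

    async def shout_async(self):
        await asyncio.sleep(0)  # placeholder for real async I/O
        return self.text.upper()
```

With this sketch, `Note("hi").shout()` blocks and returns the same value that `await Note("hi").shout_async()` would, which is why only the async method needs to be written by hand.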
| 42 | +### `wrap_async_to_sync()` for standalone functions |
| 43 | +Use `wrap_async_to_sync()` (not `@async_to_sync`) for free-standing async functions outside of classes; see the `operations/` layer for the pattern. The class decorator only works on classes. |
| 44 | + |
| 45 | +### Protocol classes for sync type hints |
| 46 | +Each model in `models/` has a corresponding protocol in `models/protocols/` defining the sync method signatures. When adding a new async method to a model, add its sync signature to the protocol class so IDE type hints work. |
| 47 | + |
| 48 | +### Dataclass models with `fill_from_dict()` |
| 49 | +Models are `@dataclass` classes, NOT Pydantic. REST responses are deserialized via `fill_from_dict()` methods on each model. New models must follow this pattern. |
| 50 | + |
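As a rough illustration of the dataclass pattern, the sketch below shows a model that populates itself from a REST-style dictionary. `FolderSketch`, its fields, and the response keys (`id`, `name`, `parentId`) are assumptions made for this example, as is returning `self` from `fill_from_dict()`; check an existing model in `models/` for the conventions actually used.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class FolderSketch:
    """Hypothetical model; real models live in models/ and may differ."""

    id: Optional[str] = None
    name: Optional[str] = None
    parent_id: Optional[str] = None

    def fill_from_dict(self, payload: Dict[str, Any]) -> "FolderSketch":
        # Map camelCase response keys (assumed) onto snake_case attributes.
        self.id = payload.get("id")
        self.name = payload.get("name")
        self.parent_id = payload.get("parentId")
        return self


folder = FolderSketch().fill_from_dict(
    {"id": "syn123", "name": "raw-data", "parentId": "syn1"}
)
```

Keeping deserialization in an explicit method, rather than in a validation framework, leaves the class a plain dataclass that decorators and mixins can wrap freely.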
| 51 | +### Concrete types are Java class names |
| 52 | +`core/constants/concrete_types.py` maps Java class names (e.g., `org.sagebionetworks.repo.model.FileEntity`) for polymorphic entity deserialization. When adding new entity types, register the concrete type string here AND in `api/entity_factory.py` AND in `models/mixins/asynchronous_job.py` if it's an async job type. |
| 53 | + |
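The dispatch idea behind concrete types can be sketched as a registry keyed by the Java class name. This is an illustration only, not the code in `api/entity_factory.py`: `FileSketch`, `_REGISTRY`, and `entity_from_dict` are hypothetical, and the `concreteType` response key is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Concrete type string taken from the example above.
FILE_ENTITY = "org.sagebionetworks.repo.model.FileEntity"


@dataclass
class FileSketch:
    """Hypothetical model used only to demonstrate dispatch."""

    id: str = ""
    name: str = ""


# Hypothetical registry: concrete type string -> zero-argument model factory.
_REGISTRY: Dict[str, Callable[[], Any]] = {FILE_ENTITY: FileSketch}


def entity_from_dict(payload: Dict[str, Any]) -> Any:
    """Pick a model class from the payload's concrete type, then populate it."""
    concrete_type = payload.get("concreteType")  # assumed key name
    try:
        factory = _REGISTRY[concrete_type]
    except KeyError:
        raise ValueError(f"Unregistered concrete type: {concrete_type!r}") from None
    entity = factory()
    entity.id = payload.get("id", "")
    entity.name = payload.get("name", "")
    return entity
```

The failure path shows why registration matters: a type string missing from the registry cannot be deserialized at all, so a new entity type has to be registered everywhere the string is looked up.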
| 54 | +### Options dataclass pattern |
| 55 | +The `operations/` layer uses dataclass option objects (`StoreFileOptions`, `FileOptions`, `TableOptions`, etc.) to bundle type-specific configuration for CRUD operations. Follow this pattern for new entity-type-specific options. |
| 56 | + |
| 57 | +### Mixin composition for shared behavior |
| 58 | +Shared functionality lives in `models/mixins/` (AccessControllable, StorableContainer, AsynchronousJob, etc.). Prefer adding to existing mixins over duplicating logic across models. |
| 59 | + |
| 60 | +### `synapse_client` parameter pattern |
| 61 | +Most functions accept an optional `synapse_client` parameter. If omitted, `Synapse.get_client()` returns the cached singleton. Never pass `None` explicitly — omit the argument instead. |
| 62 | + |
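A minimal sketch of the optional-client pattern follows. `ClientSketch` and `describe_entity` are hypothetical stand-ins rather than the real `Synapse` class, and creating the cached instance lazily inside `get_client()` is an assumption made to keep the example self-contained.

```python
from typing import Optional


class ClientSketch:
    """Hypothetical client with a cached singleton, for illustration only."""

    _cached: Optional["ClientSketch"] = None

    def __init__(self, label: str) -> None:
        self.label = label

    @classmethod
    def get_client(cls) -> "ClientSketch":
        # Assumption: lazily create the singleton on first use.
        if cls._cached is None:
            cls._cached = cls("cached-default")
        return cls._cached


def describe_entity(
    entity_id: str, *, synapse_client: Optional[ClientSketch] = None
) -> str:
    """Fall back to the cached client when the caller omits `synapse_client`."""
    client = synapse_client if synapse_client is not None else ClientSketch.get_client()
    return f"{entity_id} via {client.label}"
```

Callers write `describe_entity("syn123")` to use the cached client, or `describe_entity("syn123", synapse_client=my_client)` to supply their own; passing `synapse_client=None` is the spelling the convention asks you to avoid.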
| 63 | +### Branch naming |
| 64 | +Use a `SYNPY-{issue_number}` or `synpy-{issue_number}` prefix for feature branches. PR titles follow the `[SYNPY-XXXX] Description` format. |
| 65 | + |
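The naming rules can be checked mechanically. The sketch below is one possible check, assuming that issue numbers are purely numeric and that a single space follows the bracketed key in PR titles; neither helper exists in the repository.

```python
import re

# Assumption: issue numbers are digits only.
BRANCH_PREFIX = re.compile(r"^(?:SYNPY|synpy)-\d+")
PR_TITLE = re.compile(r"^\[SYNPY-\d+\] \S.*$")


def is_valid_branch(name: str) -> bool:
    """True when the branch starts with SYNPY-<n> or synpy-<n>."""
    return BRANCH_PREFIX.match(name) is not None


def is_valid_pr_title(title: str) -> bool:
    """True when the title looks like '[SYNPY-<n>] Description'."""
    return PR_TITLE.match(title) is not None
```

A check like this could run in a local hook before pushing, so naming mistakes surface before review.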
| 66 | +## Architecture |
| 67 | + |
| 68 | +``` |
| 69 | +synapseclient/ |
| 70 | +├── client.py # Synapse class — public entry point, REST methods, auth (9600+ lines) |
| 71 | +├── api/ # REST API layer — one file per resource type (21 files) |
| 72 | +│ └── entity_factory.py # Polymorphic entity deserialization via concrete type dispatch |
| 73 | +├── models/ # Dataclass entities (Project, File, Table, etc.) (28 files) |
| 74 | +│ ├── protocols/ # Sync method type signatures for IDE hints (18 files) |
| 75 | +│ ├── mixins/ # Shared behavior (ACL, containers, async jobs, tables) (7 files) |
| 76 | +│ └── services/ # Model-level business logic (storable_entity, search) |
| 77 | +├── operations/ # High-level CRUD: get(), store(), delete() — factory dispatch |
| 78 | +├── core/ # Infrastructure: upload/download, retry, cache, creds, OTel |
| 79 | +│ ├── upload/ # Multipart upload (sync + async) |
| 80 | +│ ├── download/ # File download (sync + async) |
| 81 | +│ ├── credentials/ # Auth chain (PAT, env var, config file, AWS SSM) |
| 82 | +│ ├── constants/ # Concrete types, config keys, limits, method flags |
| 83 | +│ ├── models/ # ACL, Permission, DictObject, custom JSON serialization |
| 84 | +│ └── multithread_download/ # Threaded download manager |
| 85 | +├── extensions/ |
| 86 | +│ └── curator/ # Schema curation (pandas, networkx, rdflib) — optional |
| 87 | +├── services/ # JSON schema validation services |
| 88 | +└── entity.py, table.py, ... # Legacy classes (pre-OOP rewrite, read-only) |
| 89 | + |
| 90 | +synapseutils/ # Legacy bulk utilities (copy, sync, migrate, walk) — sync-only |
| 91 | +``` |
| 92 | + |
| 93 | +Data flow: User → `operations/` factory → model async methods → `api/` service functions → `client.py` REST calls → Synapse API. Responses deserialized via `fill_from_dict()` on model instances. |
| 94 | + |
| 95 | +## Constraints |
| 96 | + |
| 97 | +- Do not use Pydantic for models — the codebase uses stdlib dataclasses with custom serialization. Mixing would break the `@async_to_sync` decorator and `fill_from_dict()` pattern. |
| 98 | +- For new tests, prefer async test modules. Existing synchronous unit tests under `tests/unit/` are retained and maintained; the `@async_to_sync` decorator is covered by a dedicated smoke test, so avoid adding duplicate sync/async test coverage. |
| 99 | +- On non-Windows platforms, unit tests must not make external network calls — `pytest-socket` blocks internet-facing sockets while allowing Unix domain sockets. Socket blocking is skipped on Windows. Use `pytest-mock` for HTTP mocking. |
| 100 | +- `develop` is the default/main branch, not `main` or `master`. PRs target `develop`. |
| 101 | +- Legacy classes in root `synapseclient/` (entity.py, table.py, etc.) are kept for backwards compatibility. New features go in `models/` using the dataclass pattern. |
| 102 | +- Avoid adding new methods to `client.py` (9600+ lines) — prefer the `api/` + `models/` layered pattern. |
| 103 | +- `synapseutils/` is legacy sync-only (uses `requests`, NOT `httpx`). Do not add async methods there — new async equivalents go in `models/` or `operations/`. |
| 104 | + |
| 105 | +## Testing |
| 106 | + |
| 107 | +- `asyncio_mode = auto` in pytest.ini — no need for `@pytest.mark.asyncio` |
| 108 | +- `asyncio_default_fixture_loop_scope = session` — all async tests share one event loop |
| 109 | +- Unit test client fixture: session-scoped, `skip_checks=True`, `cache_client=False` |
| 110 | +- Integration tests use `--reruns 3` for flaky retries and `-n 8 --dist loadscope` for parallelism |
| 111 | +- Integration fixtures create per-worker Synapse projects; use `schedule_for_cleanup()` for teardown |
| 112 | +- Auth env vars: `SYNAPSE_AUTH_TOKEN` (bearer token), `SYNAPSE_PROFILE` (config file profile, default: `"default"`), `SYNAPSE_TOKEN_AWS_SSM_PARAMETER_NAME` (AWS SSM path) |
| 113 | +- CI runs integration tests only on Python 3.10 and 3.14 (oldest + newest) to limit Synapse server load |
| 114 | + |
| 115 | +## Maintenance |
| 116 | + |
| 117 | +Each CLAUDE.md file has a `<!-- Last reviewed: YYYY-MM -->` header. Update this when the file is reviewed or modified. If a code change invalidates guidance in a CLAUDE.md file, update the guidance in the same PR. |