Commit e06c715

Add initial CLAUDE.md for AI-assisted development (#1342)
1 parent fa8f6fa commit e06c715

16 files changed

Lines changed: 738 additions & 0 deletions

CLAUDE.md

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
1+
<!-- Last reviewed: 2026-03 -->
2+
3+
## Project
4+
5+
Synapse Python Client — official Python SDK and CLI for Synapse (synapse.org), a collaborative science platform by Sage Bionetworks. Provides programmatic access to entities (projects, files, folders, tables, views), metadata, permissions, evaluations, and data curation workflows. Published to PyPI as `synapseclient`.
6+
7+
## Stack
8+
9+
- Python 3.10–3.14 (`setup.cfg`: `python_requires = >=3.10, <3.15`)
10+
- HTTP: httpx (async), requests (sync/legacy)
11+
- Models: stdlib dataclasses (NOT Pydantic)
12+
- Tests: pytest 8.2, pytest-asyncio, pytest-socket, pytest-xdist
13+
- Docs: MkDocs with Material theme, mkdocstrings
14+
- Linting: ruff, black (line-length 88), isort (profile=black), bandit
15+
- CI: GitHub Actions → SonarCloud, PyPI deploy on release
16+
- Docker: `Dockerfile` at repo root, published to `ghcr.io/sage-bionetworks/synapsepythonclient`
17+
18+
## Commands
19+
20+
```bash
# Install for development
pip install -e ".[boto3,pandas,pysftp,tests,curator,dev]"

# Unit tests
pytest -sv tests/unit

# Integration tests (requires Synapse credentials, runs in parallel)
pytest -sv --reruns 3 tests/integration -n 8 --dist loadscope

# Pre-commit checks (ruff, black, isort, bandit)
pre-commit run --all-files

# Build docs locally
pip install -e ".[docs]" && mkdocs serve
```
36+
37+
## Conventions
38+
39+
### Async-first with generated sync wrappers
40+
All new methods must be async and carry the `_async` suffix. The `@async_to_sync` class decorator (`core/async_utils.py`) auto-generates sync counterparts at class definition time. Never write sync methods manually on model classes; the decorator handles it.
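
To make the idea concrete, here is a minimal sketch of how a class decorator could generate sync counterparts at class definition time. It is not the implementation in `core/async_utils.py`: the names `async_to_sync_sketch` and `Widget` are hypothetical, and running each call through `asyncio.run()` is a simplifying assumption.

```python
import asyncio
import inspect


def async_to_sync_sketch(cls):
    """Hypothetical stand-in: add a sync twin for every `*_async` coroutine method."""
    # Copy the items first, because setattr() below mutates the class namespace.
    for name, attr in list(vars(cls).items()):
        if not (name.endswith("_async") and inspect.iscoroutinefunction(attr)):
            continue
        sync_name = name[: -len("_async")]
        if sync_name in vars(cls):
            continue  # never overwrite a method the class already defines

        def make_sync(async_method):
            def sync_method(self, *args, **kwargs):
                # Simplification: asyncio.run() raises if an event loop is already running.
                return asyncio.run(async_method(self, *args, **kwargs))

            sync_method.__name__ = sync_name
            sync_method.__doc__ = async_method.__doc__
            return sync_method

        setattr(cls, sync_name, make_sync(attr))
    return cls


@async_to_sync_sketch
class Widget:
    """Illustrative model, not a synapseclient class."""

    async def store_async(self, name: str) -> str:
        await asyncio.sleep(0)  # placeholder for real async I/O
        return f"stored {name}"
```

With this sketch, `Widget().store("a")` and `await Widget().store_async("a")` return the same value, which is why only the async method needs to be written by hand.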
41+
42+
### `wrap_async_to_sync()` for standalone functions
43+
Use `wrap_async_to_sync()` (not `@async_to_sync`) for free-standing async functions outside of classes — see `operations/` layer for the pattern. The class decorator only works on classes.
44+
45+
### Protocol classes for sync type hints
46+
Each model in `models/` has a corresponding protocol in `models/protocols/` defining the sync method signatures. When adding a new async method to a model, add its sync signature to the protocol class so IDE type hints work.
47+
48+
### Dataclass models with `fill_from_dict()`
49+
Models are `@dataclass` classes, NOT Pydantic. REST responses are deserialized via `fill_from_dict()` methods on each model. New models must follow this pattern.
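
A minimal sketch of the pattern, assuming a REST payload with `id`, `name`, and `parentId` keys. The class `FolderSketch` and its fields are illustrative only and are not copied from `models/`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FolderSketch:
    """Illustrative dataclass model; real models carry many more fields."""

    id: Optional[str] = None
    name: Optional[str] = None
    parent_id: Optional[str] = None

    def fill_from_dict(self, synapse_response: dict[str, Any]) -> "FolderSketch":
        """Copy values from a REST response dict onto this instance."""
        # Assumed payload keys: camelCase on the wire, snake_case on the model.
        self.id = synapse_response.get("id")
        self.name = synapse_response.get("name")
        self.parent_id = synapse_response.get("parentId")
        return self
```

Returning `self` lets callers write `FolderSketch().fill_from_dict(payload)` in one expression; whether the real models do the same is an assumption here.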
50+
51+
### Concrete types are Java class names
52+
`core/constants/concrete_types.py` maps Java class names (e.g., `org.sagebionetworks.repo.model.FileEntity`) for polymorphic entity deserialization. When adding new entity types, register the concrete type string here AND in `api/entity_factory.py` AND in `models/mixins/asynchronous_job.py` if it's an async job type.
53+
54+
### Options dataclass pattern
55+
The `operations/` layer uses dataclass option objects (`StoreFileOptions`, `FileOptions`, `TableOptions`, etc.) to bundle type-specific configuration for CRUD operations. Follow this pattern for new entity-type-specific options.
56+
57+
### Mixin composition for shared behavior
58+
Shared functionality lives in `models/mixins/` (AccessControllable, StorableContainer, AsynchronousJob, etc.). Prefer adding to existing mixins over duplicating logic across models.
59+
60+
### `synapse_client` parameter pattern
61+
Most functions accept an optional `synapse_client` parameter. If omitted, `Synapse.get_client()` returns the cached singleton. Never pass `None` explicitly — omit the argument instead.
62+
63+
### Branch naming
64+
Use `SYNPY-{issue_number}` or `synpy-{issue_number}` prefix for feature branches. PR titles follow `[SYNPY-XXXX] Description` format.
65+
66+
## Architecture
67+
68+
```
synapseclient/
├── client.py                    # Synapse class: public entry point, REST methods, auth (9600+ lines)
├── api/                         # REST API layer, one file per resource type (21 files)
│   └── entity_factory.py        # Polymorphic entity deserialization via concrete type dispatch
├── models/                      # Dataclass entities (Project, File, Table, etc.) (28 files)
│   ├── protocols/               # Sync method type signatures for IDE hints (18 files)
│   ├── mixins/                  # Shared behavior (ACL, containers, async jobs, tables) (7 files)
│   └── services/                # Model-level business logic (storable_entity, search)
├── operations/                  # High-level CRUD: get(), store(), delete(); factory dispatch
├── core/                        # Infrastructure: upload/download, retry, cache, creds, OTel
│   ├── upload/                  # Multipart upload (sync + async)
│   ├── download/                # File download (sync + async)
│   ├── credentials/             # Auth chain (PAT, env var, config file, AWS SSM)
│   ├── constants/               # Concrete types, config keys, limits, method flags
│   ├── models/                  # ACL, Permission, DictObject, custom JSON serialization
│   └── multithread_download/    # Threaded download manager
├── extensions/
│   └── curator/                 # Schema curation (pandas, networkx, rdflib); optional
├── services/                    # JSON schema validation services
└── entity.py, table.py, ...     # Legacy classes (pre-OOP rewrite, read-only)

synapseutils/                    # Legacy bulk utilities (copy, sync, migrate, walk); sync-only
```
92+
93+
Data flow: User → `operations/` factory → model async methods → `api/` service functions → `client.py` REST calls → Synapse API. Responses are deserialized via `fill_from_dict()` on model instances.
94+
95+
## Constraints
96+
97+
- Do not use Pydantic for models — the codebase uses stdlib dataclasses with custom serialization. Mixing would break the `@async_to_sync` decorator and `fill_from_dict()` pattern.
98+
- For new tests, prefer async test modules. Existing synchronous unit tests under `tests/unit/` are retained and maintained; the `@async_to_sync` decorator is covered by a dedicated smoke test, so avoid adding duplicate sync/async test coverage.
99+
- On non-Windows platforms, unit tests must not make external network calls — `pytest-socket` blocks internet-facing sockets while allowing Unix domain sockets. Socket blocking is skipped on Windows. Use `pytest-mock` for HTTP mocking.
100+
- `develop` is the default/main branch, not `main` or `master`. PRs target `develop`.
101+
- Legacy classes in root `synapseclient/` (entity.py, table.py, etc.) are kept for backwards compatibility. New features go in `models/` using the dataclass pattern.
102+
- Avoid adding new methods to `client.py` (9600+ lines) — prefer the `api/` + `models/` layered pattern.
103+
- `synapseutils/` is legacy sync-only (uses `requests`, NOT `httpx`). Do not add async methods there — new async equivalents go in `models/` or `operations/`.
104+
105+
## Testing
106+
107+
- `asyncio_mode = auto` in pytest.ini — no need for `@pytest.mark.asyncio`
108+
- `asyncio_default_fixture_loop_scope = session` — all async tests share one event loop
109+
- Unit test client fixture: session-scoped, `skip_checks=True`, `cache_client=False`
110+
- Integration tests use `--reruns 3` for flaky retries and `-n 8 --dist loadscope` for parallelism
111+
- Integration fixtures create per-worker Synapse projects; use `schedule_for_cleanup()` for teardown
112+
- Auth env vars: `SYNAPSE_AUTH_TOKEN` (bearer token), `SYNAPSE_PROFILE` (config file profile, default: `"default"`), `SYNAPSE_TOKEN_AWS_SSM_PARAMETER_NAME` (AWS SSM path)
113+
- CI runs integration tests only on Python 3.10 and 3.14 (oldest + newest) to limit Synapse server load
114+
115+
## Maintenance
116+
117+
Each CLAUDE.md file has a `<!-- Last reviewed: YYYY-MM -->` header. Update this when the file is reviewed or modified. If a code change invalidates guidance in a CLAUDE.md file, update the guidance in the same PR.

docs/CLAUDE.md

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
1+
<!-- Last reviewed: 2026-03 -->
2+
3+
## Project
4+
5+
User-facing documentation for the Synapse Python Client. Built with MkDocs + Material theme, deployed via GitHub Pages. Follows the Diataxis documentation framework with four content types: tutorials, guides, reference, and explanations.
6+
7+
## Stack
8+
9+
MkDocs with Material theme, mkdocstrings (Google-style docstrings), termynal (CLI animations), markdown-include (file embedding).
10+
11+
### Python style
12+
- Use built-in generics (`list`, `dict`, `tuple`, `set`) instead of `typing.List`, `typing.Dict`, etc. (Python 3.9+)
13+
14+
## Conventions
15+
16+
### Content types (Diataxis framework)
17+
- **tutorials/** — Step-by-step learning (competence-building). Themed around a biomedical researcher working with Alzheimer's Disease data. Progressive build-up: Project → Folder → File → Annotations → etc.
18+
- **guides/** — How-to guides for specific use cases (problem-solution oriented). Includes extension-specific guides (curator).
19+
- **reference/**: API reference auto-generated from docstrings via mkdocstrings. Split into `experimental/sync/` and `experimental/async/` for the new OOP API.
20+
- **explanations/** — Deep conceptual content ("why" not just "how"). Design decisions, internal machinery.
21+
22+
### File inclusion pattern (markdown-include)
23+
Tutorial code lives in `tutorials/python/tutorial_scripts/*.py` and is embedded in markdown via line-range includes:
24+
```markdown
{!docs/tutorials/python/tutorial_scripts/annotation.py!lines=5-23}
```
27+
Single source of truth — edit the `.py` file, not the markdown. Changing line numbers in scripts requires updating the line ranges in the corresponding `.md` files.
28+
29+
### mkdocstrings reference generation
30+
Reference markdown files use `::: synapseclient.ClassName` syntax to trigger auto-generation from docstrings. Key configuration:
31+
- `docstring_style: google` — parse Google-style docstrings
32+
- `members_order: source` — preserve source code order
33+
- `filters: ["!^_", "!to_synapse_request", "!fill_from_dict"]` — private members, `to_synapse_request()`, and `fill_from_dict()` are excluded from docs
34+
- `inherited_members: true` — shows mixin methods on inheriting classes
35+
- Member lists are explicit — each reference page specifies which methods to document
36+
37+
### Anchor links for cross-referencing
38+
Pattern: `[](){ #reference-anchor }` in reference pages. Tutorials link to reference via `[API Reference][project-reference-sync]`. Explicit type hints use: `[syn.login][synapseclient.Synapse.login]`.
39+
40+
### termynal CLI animations
41+
Terminal animation blocks marked with `<!-- termynal -->` HTML comment. Prompts configured as `$` or `>`. Used in authentication.md and installation docs.
42+
43+
### Custom CSS (`css/custom.css`)
44+
- API reference indentation: `doc-contents` has 25px left padding with border
45+
- Smaller table font (0.7rem) for API docs
46+
- Wide layout: `max-width: 1700px` for complex content
47+
48+
### Navigation structure
49+
Defined in the `nav` section of `mkdocs.yml`. Six main sections: Home, Tutorials, How-To Guides, API Reference, Further Reading, News. API Reference has ~85 markdown files (~40 legacy, ~45 experimental).
50+
51+
## Constraints
52+
53+
- Do not edit tutorial code inline in markdown — edit the `.py` script file in `tutorial_scripts/` and update line ranges if needed.
54+
- Reference docs auto-generate from source docstrings — to change method documentation, edit the docstring in the Python source, not the markdown.
55+
- `mkdocs.yml` is at the repo root, not in `docs/` — it configures the entire doc build.
56+
- Docs deploy to Read the Docs (configured via `.readthedocs.yaml` at repo root).
57+
- Local build output goes to `docs_site/` (via `site_dir` in `mkdocs.yml`) — gitignored.
58+
- Cross-referencing uses the `autorefs` plugin: `[display text][synapseclient.ClassName.method]` auto-resolves to mkdocstrings anchors.
59+
60+
### news.md
61+
Release notes live in `docs/news.md`. Each release gets a heading with the version number and date, followed by bullet points describing changes. Group entries by category (Features, Bug Fixes, etc.). Reference Jira ticket numbers (SYNPY-XXXX) in each entry.

synapseclient/api/CLAUDE.md

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
1+
<!-- Last reviewed: 2026-03 -->
2+
3+
## Project
4+
5+
REST API service layer — thin async functions that map to Synapse REST endpoints. One file per resource type. Called by model layer, never by end users directly.
6+
7+
## Conventions
8+
9+
### Function signature pattern
10+
```python
from typing import Any, Dict, Optional


async def verb_resource(
    required_param: str,
    optional_param: Optional[str] = None,  # a None default needs an Optional[...] hint
    *,
    synapse_client: Optional["Synapse"] = None,
) -> Dict[str, Any]:
    ...
```
18+
- All functions are `async def`
19+
- `synapse_client` is **always** `Optional["Synapse"] = None` — never make it required. Callers omit it to use the cached singleton returned by `Synapse.get_client()`.
20+
- `synapse_client` is always the last parameter, keyword-only (after `*`)
21+
- Use `Synapse.get_client(synapse_client=synapse_client)` to get the client instance
22+
- Use `TYPE_CHECKING` guard for `Synapse` import — avoids circular dependencies between `api/` and `client.py`
23+
- Construct a `query_params` dictionary for non-null optional args, and pass it to the `params` arg of the REST call. See `entity_services.py` for the pattern.
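
The bullets above can be sketched as one small service function. Everything named here is hypothetical: `FakeClient` stands in for the real client so the example runs offline, and `get_widget`, the `/widget/{id}` URI, and the `includeHistory` key are invented for illustration. Real code would obtain the client via `Synapse.get_client(synapse_client=synapse_client)`.

```python
import asyncio
from typing import Any, Optional


class FakeClient:
    """Hypothetical stand-in for the Synapse client; echoes the call instead of sending it."""

    async def rest_get_async(
        self, uri: str, params: Optional[dict[str, Any]] = None
    ) -> dict[str, Any]:
        return {"uri": uri, "params": params or {}}


async def get_widget(
    widget_id: str,
    version: Optional[int] = None,
    include_history: Optional[bool] = None,
    *,
    synapse_client: Optional[FakeClient] = None,
) -> dict[str, Any]:
    """Fetch a (made-up) widget, sending only the optional args that were supplied."""
    # Stand-in for Synapse.get_client(synapse_client=synapse_client).
    client = synapse_client or FakeClient()

    # Only non-null optional args become query parameters.
    query_params: dict[str, Any] = {}
    if version is not None:
        query_params["version"] = version
    if include_history is not None:
        query_params["includeHistory"] = include_history

    response = await client.rest_get_async(uri=f"/widget/{widget_id}", params=query_params)
    return response
```

Checking `is not None` rather than truthiness keeps legitimate falsy values such as `0` or `False` in the request.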
24+
25+
### Docstring conventions
26+
Module-level — every file opens with boilerplate linking to the Synapse REST controller:
27+
```python
"""This module is responsible for exposing the services defined at:
<https://rest-docs.synapse.org/rest/#org.sagebionetworks.repo.web.controller.XController>
"""
```
32+
Function-level (Google style):
33+
```python
"""
One-line summary.

<https://rest-docs.synapse.org/rest/POST/endpoint.html>

Arguments:
    param: Description.
    synapse_client: If not passed in and caching was not disabled by
        `Synapse.allow_client_caching(False)` this will use the last created
        instance from the Synapse class constructor.

Returns:
    Description of return value.
"""
```
49+
- The `synapse_client` argument description is boilerplate — always copy it verbatim, not paraphrased.
50+
- The REST endpoint URL uses `<link>` format (angle brackets), not markdown `[text](url)`.
51+
- Parameter descriptions in `Arguments:` must be copied verbatim from the Synapse REST API docs for that endpoint — do not paraphrase or infer.
52+
53+
### REST call pattern
54+
```python
client = Synapse.get_client(synapse_client=synapse_client)
# Assign to a named `response` variable before returning it.
response = await client.rest_post_async(uri="/endpoint", body=json.dumps(request))
return response
```
58+
Available methods: `rest_get_async`, `rest_post_async`, `rest_put_async`, `rest_delete_async`. Pass `endpoint=client.fileHandleEndpoint` for file handle operations; omit for the default repository endpoint. Use `json.dumps()` for request bodies — not raw dicts. Always assign the response to a named `response` variable before returning or extracting attributes from it.
59+
60+
### Return values
61+
- Most functions return raw `Dict[str, Any]` — transformation happens in the model layer via `fill_from_dict()`
62+
- Some return typed dataclass instances (e.g., `EntityHeader` from `entity_services.py`) when the data is only used internally
63+
- Delete operations return `None`
64+
65+
### Pagination
66+
Use async pagination helpers when the API endpoint returns a list of results. For single-object responses, a simple `return` is sufficient.
67+
68+
Helpers from `api_client.py`:
69+
- `rest_get_paginated_async()` — for GET endpoints with limit/offset. Expects `results` or `children` key in response.
70+
- `rest_post_paginated_async()` — for POST endpoints with `nextPageToken`. Expects `page` array in response.
71+
Both are async generators yielding individual items. Reference `entity_services.py`, `table_services.py`, or `evaluation_services.py` for pagination patterns.
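
As a rough illustration of the limit/offset style, the sketch below shows an async generator that yields individual items from successive pages. It is not the code in `api_client.py`: `paginate_limit_offset`, `fake_fetch_page`, and the stop condition (a short page ends iteration) are assumptions made for this example.

```python
import asyncio
from typing import Any, AsyncGenerator, Awaitable, Callable

# Hypothetical page fetcher: takes (limit, offset) and returns one response dict.
FetchPage = Callable[[int, int], Awaitable[dict[str, Any]]]


async def paginate_limit_offset(
    fetch_page: FetchPage, limit: int = 2
) -> AsyncGenerator[Any, None]:
    """Yield items one at a time, requesting pages until a short page arrives."""
    offset = 0
    while True:
        page = await fetch_page(limit, offset)
        # The response is expected to hold its items under `results` or `children`.
        items = page.get("results") or page.get("children") or []
        for item in items:
            yield item
        if len(items) < limit:
            break  # assumed stop condition: a short page means no more data
        offset += limit


_DATA = ["a", "b", "c", "d", "e"]


async def fake_fetch_page(limit: int, offset: int) -> dict[str, Any]:
    """Offline stand-in for a REST GET with limit/offset query parameters."""
    return {"results": _DATA[offset : offset + limit]}


async def collect() -> list[Any]:
    return [item async for item in paginate_limit_offset(fake_fetch_page, limit=2)]
```

Callers consume such a generator with `async for`, so a service function can stream results without holding every page in memory.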
72+
73+
### Entity factory (`entity_factory.py`)
74+
Polymorphic entity deserialization via concrete type dispatch. Maps Java class names from `core/constants/concrete_types.py` to model classes. When adding a new entity type, register the type mapping here.
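
A minimal sketch of concrete type dispatch, assuming the response carries its Java class name under a `concreteType` key. The `FileEntity` string comes from the guidance above; the `Folder` string, the registry name `CONCRETE_TYPE_TO_MODEL`, and the `SketchFile`/`SketchFolder` classes are assumptions for illustration and do not mirror `entity_factory.py`.

```python
from dataclasses import dataclass
from typing import Any, Optional

FILE_ENTITY = "org.sagebionetworks.repo.model.FileEntity"
FOLDER_ENTITY = "org.sagebionetworks.repo.model.Folder"  # assumed value


@dataclass
class SketchEntity:
    """Illustrative base model with the shared deserialization hook."""

    id: Optional[str] = None

    def fill_from_dict(self, data: dict[str, Any]) -> "SketchEntity":
        self.id = data.get("id")
        return self


@dataclass
class SketchFile(SketchEntity):
    pass


@dataclass
class SketchFolder(SketchEntity):
    pass


# Registry: a new entity type is supported by adding one entry here.
CONCRETE_TYPE_TO_MODEL: dict[str, type[SketchEntity]] = {
    FILE_ENTITY: SketchFile,
    FOLDER_ENTITY: SketchFolder,
}


def entity_from_dict(data: dict[str, Any]) -> SketchEntity:
    """Pick the model class from the payload's concrete type, then fill it."""
    concrete_type = data.get("concreteType")
    model_cls = CONCRETE_TYPE_TO_MODEL.get(concrete_type)
    if model_cls is None:
        raise ValueError(f"Unregistered concrete type: {concrete_type!r}")
    return model_cls().fill_from_dict(data)
```

Failing loudly on an unregistered type makes a missing registration visible immediately instead of silently returning a generic object.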
75+
76+
### When to add a new service file vs. update an existing one
77+
Add a new file when the Synapse REST controller is different (each file maps to one controller). Update an existing file when adding endpoints under the same controller.
78+
79+
### Adding a new service file
80+
1. Create `synapseclient/api/new_service.py`
81+
2. Add all public functions to `api/__init__.py` imports and `__all__` — every public function must be re-exported
82+
3. Use `json.dumps()` for request bodies (not dict)
83+
4. Reference `entity_services.py` for CRUD pattern, `table_services.py` or `evaluation_services.py` for pagination pattern
