Commit aa2c5d2

handle merge conflict
2 parents 6f049b6 + 5463edc commit aa2c5d2

7 files changed

Lines changed: 249 additions & 255 deletions

docs/guides/extensions/curator/metadata_curation.md

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ By following this guide, you will:
 - Python environment with synapseclient and the `curator` extension installed (i.e., `pip install --upgrade "synapseclient[curator]"`)
 - An existing Synapse project and folder where you want to manage metadata
 - A JSON Schema registered in Synapse (many schemas are already available for Sage-affiliated projects, or you can register your own by following the [JSON Schema tutorial](../../../tutorials/python/json_schema.md))
+- If you are leveraging the [Curator CSV data model](../../../explanations/curator_data_model.md), you can create JSON Schemas by following this [tutorial](../../extensions/curator/schema_operations.md)
 - (Optional) An existing Synapse team if you want multiple users to collaborate on the same Grid session. Pass the team's ID as `assignee_principal_id` when creating the curation task.
 
 ## Step 1: Authenticate and import required functions
Lines changed: 178 additions & 0 deletions
@@ -0,0 +1,178 @@
# How to Generate JSON Schemas from Curator CSV data models

JSON Schema is a tool used to validate data. In Synapse, JSON Schemas can be used to validate the metadata applied to an entity such as a project, file, folder, table, or view, including the [annotations](https://help.synapse.org/docs/Annotating-Data-With-Metadata.2667708522.html) applied to it. To learn more about JSON Schemas, check out [JSON-Schema.org](https://json-schema.org/).

Synapse supports a subset of features from [json-schema-draft-07](https://json-schema.org/draft-07). To see the list of features currently supported, see the [JSON Schema object definition](https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/schema/JsonSchema.html) from Synapse's REST API Documentation.
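
To make the idea of validation concrete, here is a minimal sketch in plain Python. The schema, the `check_annotations` helper, and the annotation keys are all hypothetical, and the helper only covers the `required` and `enum` keywords. Synapse performs its own validation, so treat this as an illustration of the concept rather than a description of how the service works.

```python
# Hypothetical draft-07 style schema for a "Patient" data type (illustration only).
patient_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "patientId": {"type": "string"},
        "sex": {"enum": ["Female", "Male", "Other"]},
    },
    "required": ["patientId", "sex"],
}


def check_annotations(annotations: dict, schema: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid.

    Only the `required` and `enum` keywords are checked in this sketch.
    """
    problems = []
    for key in schema.get("required", []):
        if key not in annotations:
            problems.append(f"missing required key: {key}")
    for key, rules in schema.get("properties", {}).items():
        if key in annotations and "enum" in rules:
            if annotations[key] not in rules["enum"]:
                problems.append(f"invalid value for {key}: {annotations[key]!r}")
    return problems


valid = check_annotations({"patientId": "P-001", "sex": "Female"}, patient_schema)
invalid = check_annotations({"sex": "Unknown"}, patient_schema)
print(valid)
print(invalid)
```

The first call reports no problems, while the second reports both a missing required key and a value outside the allowed list.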

In this tutorial, you will learn how to create, register, and bind JSON Schemas using an existing data model.

## Tutorial Purpose

You will learn the complete JSON Schema workflow:

1. **Generate** JSON Schemas from your data model
2. **Register** schemas to a Synapse organization
3. **Bind** schemas to Synapse entities for metadata validation

This tutorial uses the Python client as a library. To use the CLI tool, see the [command line documentation](../command_line_client.md).

## Prerequisites

* You have a working [installation](../../tutorials/installation.md) of the Synapse Python Client, including the Curator extensions package.
* You have a data model; see the [data model documentation](../../explanations/curator_data_model.md).

## 1. Initial set up

```python
from synapseclient import Synapse
from synapseclient.extensions.curator import (
    bind_jsonschema,
    generate_jsonschema,
    register_jsonschema,
)

# Log in with your stored Synapse credentials; later steps pass this client
# as `synapse_client=syn`
syn = Synapse()
syn.login()

# Path or URL to your data model (CSV or JSONLD format)
# Example: "path/to/my_data_model.csv" or "https://raw.githubusercontent.com/example.csv"
DATA_MODEL_SOURCE = "tests/unit/synapseclient/extensions/schema_files/example.model.csv"
# List of component names/data types to create schemas for, or None for all components/data types
# Example: ["Patient", "Biospecimen"] or None
DATA_TYPE = ["Patient"]
# Directory where JSON Schema files will be saved
OUTPUT_DIRECTORY = "temp"
# Path to a generated JSON Schema file for registration
SCHEMA_PATH = "temp/Patient.json"
# Your Synapse organization name for schema registration
ORGANIZATION_NAME = "my.organization"
# Name for the schema
SCHEMA_NAME = "patient.schema"
# Version number for the schema
SCHEMA_VERSION = "0.0.1"
# Synapse entity ID to bind the schema to (file, folder, etc.)
ENTITY_ID = "syn12345678"
```

To create a JSON Schema, you need a data model and the data types you want to create schemas for.
The data model must be in either CSV or JSON-LD format, and it may be a local path or a URL.
See the [data model documentation](../../explanations/curator_data_model.md) for details.

The data types must exist in your data model. Pass a list of data types, or `None` to create schemas for all data types in the data model.
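
Before generating schemas, it can help to confirm that the data types you plan to request are actually listed in the model. The sketch below does this for a small in-memory CSV using only the standard library. The `missing_data_types` helper and the sample rows are hypothetical, and the `Attribute` column name is an assumption about the CSV layout, so adjust it to match your own data model.

```python
import csv
import io

# Tiny stand-in for a CSV data model; the "Attribute" column name is an assumption.
SAMPLE_MODEL_CSV = """Attribute,Description,Required
Patient,A person in the study,TRUE
Biospecimen,A sample taken from a patient,TRUE
"""


def missing_data_types(model_csv: str, data_types: list) -> list:
    """Return the requested data types that are not listed in the model."""
    reader = csv.DictReader(io.StringIO(model_csv))
    known = {row["Attribute"].strip() for row in reader}
    return [name for name in data_types if name not in known]


found_all = missing_data_types(SAMPLE_MODEL_CSV, ["Patient"])
some_missing = missing_data_types(SAMPLE_MODEL_CSV, ["Patient", "Assay"])
print(found_all)
print(some_missing)
```

An empty list means every requested data type was found; any names returned would need to be added to the model or removed from the request.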

## 2. Create JSON Schemas

### Create multiple JSON Schemas

The generated JSON Schema looks like [this](https://repo-prod.prod.sagebase.org/repo/v1/schema/type/registered/dpetest-test.schematic.Patient). When the `output` parameter is set to the path of a directory such as "temp", the file is created as "temp/Patient.json". The `data_types` parameter is a list and can contain multiple data types.

```python
# Create JSON Schemas for multiple data types
schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    output=OUTPUT_DIRECTORY,
    data_types=DATA_TYPE,
    synapse_client=syn,
)
```

### Create every JSON Schema

If you don't set the `data_types` parameter, a JSON Schema will be created for every data type in the data model.

```python
# Create JSON Schemas for all data types
schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    output=OUTPUT_DIRECTORY,
    synapse_client=syn,
)
```

### Create a JSON Schema with a certain path

If you have only one data type and set the `output` parameter to a file path (ending in `.json`), the JSON Schema file will be written to that path.

```python
# Specify path for JSON Schema
schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    data_types=DATA_TYPE,
    output="test.json",
    synapse_client=syn,
)
```

### Create a JSON Schema in the current working directory

If you don't set the `output` parameter, the JSON Schema file will be created in the current working directory.

```python
# Create the JSON Schema in the current working directory
schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    data_types=DATA_TYPE,
    synapse_client=syn,
)
```
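
The three `output` behaviours described in this section (a directory, a `.json` file path, or no value) can be summarised in a small helper. This is only a sketch of the documented behaviour, not the library's implementation: `expected_schema_path` is a hypothetical function, and the `<data type>.json` file name used for the current-working-directory case is an assumption based on the "temp/Patient.json" example.

```python
from pathlib import Path
from typing import Optional


def expected_schema_path(output: Optional[str], data_type: str) -> Path:
    """Guess where the schema file for `data_type` is written (illustration only)."""
    if output is None:
        # No output given: the file goes in the current working directory.
        # The "<data type>.json" file name here is an assumption.
        return Path.cwd() / f"{data_type}.json"
    if output.endswith(".json"):
        # A file path is used as-is (intended for a single data type).
        return Path(output)
    # Anything else is treated as a directory.
    return Path(output) / f"{data_type}.json"


print(expected_schema_path("temp", "Patient"))
print(expected_schema_path("test.json", "Patient"))
print(expected_schema_path(None, "Patient"))
```

In practice, the `file_paths` value returned by `generate_jsonschema` is the authoritative record of where files were written.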
114+
115+
116+
### Create a JSON Schema using display names
117+
118+
You can have Curator format the property names and valid values in the JSON Schema. This will remove whitespace and special characters.
119+
120+
```python
121+
# Create JSON Schema using display names for both properties names and valid values
122+
schemas, file_paths = generate_jsonschema(
123+
data_model_source=DATA_MODEL_SOURCE,
124+
data_types=DATA_TYPE,
125+
data_model_labels="display_label",
126+
synapse_client=syn,
127+
)
128+
```

## 3. Register a JSON Schema to Synapse

Once you've created a JSON Schema file, you can register it to a Synapse organization.

The `register_jsonschema` function:
- Takes a path to your generated JSON Schema file
- Registers it with the specified organization in Synapse
- Returns the schema URI and a success message
- Optionally takes a version (e.g., "0.0.1"); otherwise the version is auto-generated

```python
# Register a JSON Schema to Synapse
json_schema = register_jsonschema(
    schema_path=SCHEMA_PATH,
    organization_name=ORGANIZATION_NAME,
    schema_name=SCHEMA_NAME,
    schema_version=SCHEMA_VERSION,
    synapse_client=syn,
)
print(f"Registered schema URI: {json_schema.uri}")
```
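
To give a rough idea of what a schema URI looks like, the sketch below assembles one from the organization name, schema name, and version. It assumes the `organization-schemaName-version` layout suggested by the registered example linked earlier (`dpetest-test.schematic.Patient`) and a three-part numeric version. `expected_schema_uri` is a hypothetical helper, and the value actually returned by `register_jsonschema` may differ.

```python
import re

# Three-part numeric version such as "0.0.1" (an assumption for this sketch).
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def expected_schema_uri(organization: str, schema_name: str, version: str = None) -> str:
    """Build a URI of the assumed form organization-schemaName[-version]."""
    if version is None:
        return f"{organization}-{schema_name}"
    if not SEMVER.match(version):
        raise ValueError(f"version must look like '0.0.1', got {version!r}")
    return f"{organization}-{schema_name}-{version}"


uri = expected_schema_uri("my.organization", "patient.schema", "0.0.1")
print(uri)
```

Comparing this expected value with `json_schema.uri` after registration is a quick sanity check that the schema landed under the organization and name you intended.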

## 4. Bind a JSON Schema to a Synapse Entity

After registering a schema, you can bind it to Synapse entities (files, folders, etc.) for metadata validation.

The `bind_jsonschema` function:
- Takes a Synapse entity ID (e.g., "syn12345678")
- Binds the registered schema URI to that entity
- Optionally enables derived annotations to auto-populate metadata
- Returns binding details

```python
# Bind a JSON Schema to a Synapse entity
result = bind_jsonschema(
    entity_id=ENTITY_ID,
    json_schema_uri=json_schema.uri,
    enable_derived_annotations=True,
    synapse_client=syn,
)
print(f"Successfully bound schema to entity: {result}")
```

## Reference
- [JSON Schema Object Definition](https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/schema/JsonSchema.html)
- [JSON Schema Draft 7](https://json-schema.org/draft-07)
- [JSON-Schema.org](https://json-schema.org/)

docs/tutorials/python/schema_operations.md

Lines changed: 0 additions & 139 deletions
This file was deleted.
