This guide walks you through programmatic metadata curation in Synapse, from setting up curation tasks to validating data and managing grid sessions.
- How to find schemas and create curation tasks
- How to manage grid sessions: import CSV data, download data, and synchronize changes
- How to check validation results before committing (pre-commit validation via WebSocket)
- How to check validation results after committing (export-based validation)
- How to manage curation task lifecycle (list, update, delete with cleanup)

Prerequisites:

- Python environment with `pip install --upgrade "synapseclient[curator]"`
- A Synapse account with project creation permissions
- A JSON Schema registered in Synapse (see JSON Schema tutorial or Schema Operations guide)

{!docs/guides/extensions/curator/scripts/setup_and_create_tasks.py!lines=7-15}

The schema registry contains validated JSON schemas organized by data coordination center (DCC) and data type.

{!docs/guides/extensions/curator/scripts/setup_and_create_tasks.py!lines=17-24}

To browse all available versions of a schema:

{!docs/guides/extensions/curator/scripts/setup_and_create_tasks.py!lines=26-31}

A curation task guides collaborators through metadata entry. There are two types:

Use a record-based task when metadata is stored as tabular records, like a spreadsheet of sample annotations.

{!docs/guides/extensions/curator/scripts/setup_and_create_tasks.py!lines=33-51}

This creates a RecordSet, a CurationTask, and an initial Grid session for collaborative editing.

Use a file-based task when metadata describes individual files in a folder.

{!docs/guides/extensions/curator/scripts/setup_and_create_tasks.py!lines=53-63}

Grid sessions are the core editing interface for curation. You can create them, import CSV data, download data, check validation, and synchronize changes.

{!docs/guides/extensions/curator/scripts/grid_session_operations.py!lines=7-20}

Upload CSV data into an active grid session. You can provide a local file path, a pandas DataFrame, or an existing file handle ID. The CSV must match the grid's column schema.
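
Because the CSV must match the grid's column schema, a local header check before importing can catch mismatches early. The following is a minimal sketch using only the standard library. The helper `check_csv_columns` and the column names are hypothetical and are not part of the Synapse client. It compares column names only; whether the service also requires a particular column order is not covered here.

```python
import csv
import io


def check_csv_columns(csv_text, expected_columns):
    """Compare a CSV header row against the column names a grid expects.

    Returns (missing, unexpected): expected columns absent from the CSV,
    and CSV columns that are not in the expected list.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader, [])]
    missing = [c for c in expected_columns if c not in header]
    unexpected = [c for c in header if c not in expected_columns]
    return missing, unexpected


# Hypothetical column names, for illustration only.
expected = ["sample_id", "species", "tissue"]
csv_text = "sample_id,species,age\nS1,human,42\n"

missing, unexpected = check_csv_columns(csv_text, expected)
print("missing:", missing)
print("unexpected:", unexpected)
```

If either list is non-empty, fixing the CSV locally is usually cheaper than discovering the mismatch after an import attempt.
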

{!docs/guides/extensions/curator/scripts/grid_session_operations.py!lines=22-53}

Export the current grid state to a local CSV file. The downloaded CSV does not include validation columns.

{!docs/guides/extensions/curator/scripts/grid_session_operations.py!lines=55-57}

Apply grid session changes back to the source entity (table, view, or RecordSet).

{!docs/guides/extensions/curator/scripts/grid_session_operations.py!lines=59-67}

{!docs/guides/extensions/curator/scripts/grid_session_operations.py!lines=69-78}

There are two ways to check whether metadata passes schema validation:

Pre-commit validation: get per-row validation results from an active grid session without committing changes. This connects via WebSocket, reads the current grid state, and returns validation data.

{!docs/guides/extensions/curator/scripts/precommit_validation.py!lines=7-29}

When to use: You want to check validation before committing changes. This is useful for automated pipelines that import data, validate, and only commit if validation passes.

Note: If you call get_snapshot() immediately after importing CSV data, some rows may show validation_status = "pending" while the backend processes validation. Wait briefly and retry if needed.
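
The wait-and-retry advice, combined with the "only commit if validation passes" pattern, can be sketched as below. This is an illustration under stated assumptions, not the client's API: `fetch_rows` stands in for whatever code reads the snapshot, each row is assumed to be a dict with a `validation_status` key, and `"valid"` is an assumed value for a passing row. Only `"pending"` is taken from the note above.

```python
import time


def wait_for_validation(fetch_rows, attempts=5, delay_seconds=2.0):
    """Call fetch_rows() until no row is pending or attempts run out.

    fetch_rows is a caller-supplied function returning a list of dicts,
    each with a "validation_status" key (an assumed shape, not the
    documented return type of get_snapshot()).
    """
    rows = fetch_rows()
    for _ in range(attempts - 1):
        if not any(r.get("validation_status") == "pending" for r in rows):
            break
        time.sleep(delay_seconds)
        rows = fetch_rows()
    return rows


def ready_to_commit(rows):
    """True only if there is at least one row and every row is valid.

    The "valid" status string is an assumption for this sketch.
    """
    return bool(rows) and all(r.get("validation_status") == "valid" for r in rows)


# Stand-in for a real snapshot call: pending on the first poll,
# settled on the second.
responses = iter([
    [{"row": 1, "validation_status": "pending"}],
    [{"row": 1, "validation_status": "valid"}],
])
rows = wait_for_validation(lambda: next(responses), delay_seconds=0)
print(ready_to_commit(rows))
```

Bounding the number of attempts keeps a pipeline from hanging if validation never settles. Rows still pending after the last attempt are treated as not ready to commit.
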

Export-based validation: export the grid session back to the RecordSet. This commits changes and generates detailed validation results.

{!docs/guides/extensions/curator/scripts/postcommit_validation.py!lines=7-34}

When to use: You want committed validation results with full detail. The RecordSet's get_detailed_validation_results() returns a pandas DataFrame with row-level error messages.

{!docs/guides/extensions/curator/scripts/manage_tasks.py!lines=7-18}

{!docs/guides/extensions/curator/scripts/manage_tasks.py!lines=20-22}

{!docs/guides/extensions/curator/scripts/manage_tasks.py!lines=24-29}

When delete_file_view=True, the task's associated EntityView is also deleted. This only applies to file-based metadata tasks. Record-based tasks do not have an EntityView.

For file-based workflows, you can validate annotations on files within a folder:

{!docs/guides/extensions/curator/scripts/validate_folder.py!lines=7-22}

This example demonstrates the full workflow for power users who work entirely through the Python client without the grid UI:

{!docs/guides/extensions/curator/scripts/full_csv_workflow.py!lines=7-62}

- [create_record_based_metadata_task][synapseclient.extensions.curator.create_record_based_metadata_task]
- [create_file_based_metadata_task][synapseclient.extensions.curator.create_file_based_metadata_task]
- [query_schema_registry][synapseclient.extensions.curator.query_schema_registry]
- [Grid.create][synapseclient.models.grid.Grid.create]
- [Grid.import_csv][synapseclient.models.grid.Grid.import_csv]
- [Grid.download_csv][synapseclient.models.grid.Grid.download_csv]
- [Grid.synchronize][synapseclient.models.grid.Grid.synchronize]
- [Grid.export_to_record_set][synapseclient.models.grid.Grid.export_to_record_set]
- [Grid.get_snapshot][synapseclient.models.grid.Grid.get_snapshot]
- [Grid.get_validation][synapseclient.models.grid.Grid.get_validation]
- [Grid.delete][synapseclient.models.grid.Grid.delete]
- [Grid.list][synapseclient.models.grid.Grid.list]
- [CurationTask.store][synapseclient.models.CurationTask.store]
- [CurationTask.get][synapseclient.models.CurationTask.get]
- [CurationTask.delete][synapseclient.models.CurationTask.delete]
- [CurationTask.list][synapseclient.models.CurationTask.list]
- [RecordSet.get_detailed_validation_results][synapseclient.models.RecordSet.get_detailed_validation_results]
- [Folder.get_schema_validation_statistics][synapseclient.models.Folder.get_schema_validation_statistics]
- [Folder.get_invalid_validation][synapseclient.models.Folder.get_invalid_validation]
- Schema Operations - Generate and register JSON schemas
- JSON Schema Tutorial - Learn JSON schema basics
- Curator Data Model - CSV data model format