
added new sync_to_synapse method #1353

Open
andrewelamb wants to merge 8 commits into develop from SYNPY-1800


@andrewelamb
Contributor

@andrewelamb andrewelamb commented Apr 6, 2026

Problem:

The syncToSynapse function doesn't fit the new OOP model paradigm.

Solution:

  • A new sync_to_synapse method was added to the StorableContainer class
  • syncToSynapse was deprecated
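
A minimal sketch of how a deprecated function can delegate to a new method while warning callers. Every name here (`StorableContainerSketch`, `legacy_sync_to_synapse`, the single `manifest_path` parameter) is a hypothetical stand-in and does not reflect the actual signatures in this PR:

```python
import warnings


class StorableContainerSketch:
    """Hypothetical stand-in for the container class, not the real model."""

    def sync_to_synapse(self, manifest_path: str) -> str:
        # Placeholder body: the real method would upload the files
        # listed in the manifest.
        return f"synced {manifest_path}"


def legacy_sync_to_synapse(manifest_path: str) -> str:
    """Hypothetical deprecated entry point kept as a thin wrapper."""
    warnings.warn(
        "This function is deprecated; use the container's "
        "sync_to_synapse method instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return StorableContainerSketch().sync_to_synapse(manifest_path)
```

Keeping the old entry point as a wrapper lets existing callers keep working while the warning points them at the method-based API.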

Testing:

  • Unit tests for the new helper functions
  • Integration tests for the new sync_to_synapse method

@andrewelamb andrewelamb requested a review from a team as a code owner April 6, 2026 15:17
@andrewelamb andrewelamb marked this pull request as draft April 6, 2026 15:17
@andrewelamb andrewelamb marked this pull request as ready for review April 6, 2026 16:43
@andrewelamb
Contributor Author

Code review

Found 2 issues:

  1. "dataFileSizeBites" appears to be a typo for "dataFileSizeBytes" in NON_ANNOTATION_COLUMNS and docs. The integration test at test_dataset_async.py:44 uses "dataFileSizeBytes", confirming the correct spelling. With the typo, this column would not be recognized as a non-annotation column and would instead be treated as a user annotation during manifest upload, silently corrupting metadata.

"versionNumber",
"dataFileSizeBites",
"createdBy",

| versionNumber | version of the file |
| dataFileSizeBites | size of the file in bytes |
| createdBy | user who created the file |
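
The effect described in this issue can be reproduced with a small, self-contained sketch. The `split_annotations` helper and the sample row are assumptions made for illustration; the real manifest-upload code path may differ, and the real NON_ANNOTATION_COLUMNS list has more entries than the excerpt used here:

```python
# Excerpts only: the correctly spelled list and the list as it appears in the PR.
NON_ANNOTATION_COLUMNS = ["versionNumber", "dataFileSizeBytes", "createdBy"]
NON_ANNOTATION_COLUMNS_WITH_TYPO = ["versionNumber", "dataFileSizeBites", "createdBy"]


def split_annotations(row: dict, non_annotation_columns: list) -> dict:
    """Hypothetical helper: keep only columns that would become annotations."""
    return {k: v for k, v in row.items() if k not in non_annotation_columns}


# A manifest row using the correctly spelled column, plus one real annotation.
row = {
    "versionNumber": 1,
    "dataFileSizeBytes": 2048,
    "createdBy": "1234",
    "tissue": "liver",
}

annotations_ok = split_annotations(row, NON_ANNOTATION_COLUMNS)
annotations_typo = split_annotations(row, NON_ANNOTATION_COLUMNS_WITH_TYPO)
```

With the correct spelling only `tissue` is left as an annotation; with the typo, `dataFileSizeBytes` is no longer filtered out and is carried along as if it were user metadata.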

  2. The new sync_to_synapse validates that the manifest contains a "parentId" column, but the existing generate_sync_manifest (and REQUIRED_FIELDS in sync.py) uses "parent" as the column name. A manifest generated by generate_sync_manifest and then passed to project.sync_to_synapse() will raise ValueError: Manifest must contain a 'parentId' column. The tutorial at upload_data_in_bulk.py demonstrates exactly this broken workflow.

# Validate required columns before attempting any column-dependent operations
for col in ("path", "parentId"):
    if col not in df.columns:

REQUIRED_FIELDS = ["path", "parent"]
FILE_CONSTRUCTOR_FIELDS = ["name", "id", "synapseStore", "contentType"]
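
One possible way to reconcile the two column names, sketched under assumptions: the `LEGACY_ALIASES` map and both helper functions are hypothetical and not part of this PR, and plain lists of column names stand in for `df.columns`:

```python
REQUIRED_COLUMNS = ("path", "parentId")

# Hypothetical alias map: legacy manifest column name -> new column name.
LEGACY_ALIASES = {"parent": "parentId"}


def normalize_columns(columns):
    """Rename legacy column names to the names the new method expects."""
    return [LEGACY_ALIASES.get(col, col) for col in columns]


def validate_required_columns(columns):
    """Raise ValueError if a required column is missing."""
    for col in REQUIRED_COLUMNS:
        if col not in columns:
            raise ValueError(f"Manifest must contain a '{col}' column")


# Columns as a legacy-style manifest would name them.
legacy_columns = ["path", "parent", "name"]

# Validating legacy_columns directly would raise ValueError for 'parentId';
# normalizing first lets the same manifest pass validation.
validate_required_columns(normalize_columns(legacy_columns))
```

This sketch does not handle a manifest that already contains both "parent" and "parentId"; a real fix would need to decide which one wins or reject the manifest.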

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

