refactor(datasets): use the shared scverse-misc dataset registry + downloader #1213
Draft
timtreis wants to merge 5 commits into
Replace squidpy's internal pooch-based registry/downloader with the shared scverse_misc.datasets system (scverse-misc[datasets]):
- _registry.py: build a scverse_misc DatasetRegistry from datasets.yaml, folding squidpy-specific shape/library_id into the generic metadata mapping. Drops squidpy's duplicated FileEntry/DatasetEntry/DatasetRegistry/DatasetType.
- _downloader.py: register squidpy's domain loaders (image -> ImageContainer, visium_10x -> read.visium, spatialdata -> read_zarr) via register_loader and override the built-in anndata loader for the shape warning. The pooch download/verify/extract machinery now lives in scverse-misc.
- _datasets.py: public API unchanged; type dispatch uses plain strings.
- pyproject: drop direct pooch dep (now via scverse-misc[datasets]).
Net ~750 lines deleted. Public API (sq.datasets.*) is unchanged. Depends on scverse/scverse-misc#40.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
scverse-misc now ships a generic spatialdata loader, so squidpy no longer needs its own; it registers only its domain loaders (image, visium_10x) plus the anndata shape-warning override.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop the redundant 'visium' prefix in visium() so downloads land in <datasetdir>/visium_10x/<sample>/ like every other type (was doubly nested under visium/visium_10x). Update the hires-image path assertion accordingly. Verified: all @internet datasets tests pass (8 passed).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
registry.{anndata,image,spatialdata}_datasets were squidpy's old registry
properties, removed in the migration. Use dataset_names(type) instead. Visium
samples now cache to <datasetdir>/visium_10x/<sample>/ via the public API.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
scverse-misc dropped its DatasetRegistry/Fetcher/FetchContext classes for a typed data model + functions. Adapt:
- _registry: parse_registry() -> (base_url, dict[str, DatasetEntry]); get_registry() returns the dict, get_base_url() the base. shape/library_id/doc_header now live in entry.metadata.
- _downloader: loaders are (entry, target, download, **kwargs); DatasetDownloader wraps fetch(); visium uses pooch.Untar instead of a manual tarfile loop.
- _datasets/tests updated to the dict + metadata shape.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
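For readers skimming the commits, here is a rough sketch of what "parse_registry() -> (base_url, dict[str, DatasetEntry])" with squidpy-specific keys folded into entry.metadata could look like. It is not the scverse-misc implementation: the DatasetEntry fields, the GENERIC_KEYS set, the example URL, and the dict layout standing in for the parsed datasets.yaml are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-in for the upstream entry type; the real scverse-misc
# DatasetEntry may have different fields.
@dataclass(frozen=True)
class DatasetEntry:
    name: str
    type: str
    files: tuple[str, ...]
    metadata: dict[str, Any] = field(default_factory=dict)


# Keys assumed to be generic; every other key is folded into metadata.
GENERIC_KEYS = {"type", "files"}


def parse_registry(raw: dict[str, Any]) -> tuple[str, dict[str, DatasetEntry]]:
    """Split a parsed registry document into (base_url, name -> entry)."""
    entries: dict[str, DatasetEntry] = {}
    for name, spec in raw["datasets"].items():
        # shape, library_id, doc_header, ... end up here.
        metadata = {k: v for k, v in spec.items() if k not in GENERIC_KEYS}
        entries[name] = DatasetEntry(
            name=name,
            type=spec["type"],
            files=tuple(spec["files"]),
            metadata=metadata,
        )
    return raw["base_url"], entries


# A plain dict standing in for the parsed datasets.yaml (layout is assumed).
raw = {
    "base_url": "https://example.org/data",
    "datasets": {
        "imc": {"type": "anndata", "files": ["imc.h5ad"], "shape": [4668, 34]},
        "sample_a": {
            "type": "visium_10x",
            "files": ["sample_a.tar.gz"],
            "library_id": "sample_a",
        },
    },
}

base_url, registry = parse_registry(raw)
```

With this shape, callers read squidpy-specific values as registry["imc"].metadata["shape"] rather than as dedicated attributes on the entry.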
Demonstrates squidpy consuming the shared dataset infrastructure proposed in
scverse/scverse-misc#40, replacing squidpy's internal pooch-based registry/downloader.
What moves upstream vs. what stays
Moves upstream to scverse-misc[datasets]:
- DatasetRegistry/DatasetEntry/FileEntry
- Fetcher + pluggable register_loader registry
Stays in squidpy:
- datasets.yaml (unchanged)
- datasets.yaml -> registry mapping (folds shape/library_id into metadata)
- Domain loaders: image -> ImageContainer, visium_10x -> read.visium, spatialdata -> read_zarr, anndata (shape-warning override)
- sq.datasets.* (unchanged)
Net effect
- DatasetType enum -> free-form type strings dispatched via register_loader.
- pooch dependency dropped (now transitive via scverse-misc[datasets]).
- sq.datasets.cells(), visium_hne_sdata(), visium(), and the anndata/image loaders all keep the same signatures and behavior.
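The move from an enum to free-form type strings can be pictured with a small string-keyed loader table. This is only a sketch of the general pattern, not the scverse-misc API: the register_loader decorator, the load() helper, and the simplified (entry, target, **kwargs) loader signature below are hypothetical, and the loaders return placeholder values instead of real ImageContainer or AnnData objects.

```python
from pathlib import Path
from typing import Any, Callable

Loader = Callable[..., Any]

# Hypothetical module-level table mapping a type string to its loader.
_LOADERS: dict[str, Loader] = {}


def register_loader(type_name: str) -> Callable[[Loader], Loader]:
    """Register a loader for a free-form type string (no enum required)."""

    def decorator(func: Loader) -> Loader:
        # A later registration for the same string replaces the earlier one.
        _LOADERS[type_name] = func
        return func

    return decorator


def load(entry: dict[str, Any], target: Path, **kwargs: Any) -> Any:
    """Dispatch on entry["type"] and call the matching loader."""
    try:
        loader = _LOADERS[entry["type"]]
    except KeyError:
        raise ValueError(f"no loader registered for type {entry['type']!r}") from None
    return loader(entry, target, **kwargs)


@register_loader("anndata")
def _generic_anndata(entry: dict[str, Any], target: Path, **kwargs: Any) -> Any:
    # Stands in for a built-in loader shipped by the shared package.
    return ("generic", target.name)


@register_loader("anndata")
def _anndata_with_shape_check(entry: dict[str, Any], target: Path, **kwargs: Any) -> Any:
    # Stands in for a downstream override, e.g. one that adds a shape warning.
    return ("shape-checked", target.name)


@register_loader("image")
def _image(entry: dict[str, Any], target: Path, **kwargs: Any) -> Any:
    return ("image", target.name)


result = load({"type": "anndata"}, Path("imc.h5ad"))
```

Because dispatch is keyed on plain strings, a downstream package can add a new type or override an existing one without touching a shared enum.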
Validation
- tests/datasets/{test_registry,test_downloader,test_dataset}.py rewritten to the new structure: 34 passed, 5 internet deselected locally.
- sq.datasets.cells() -> SpatialData with all elements (spatialdata loader)
- sq.datasets.imc() -> AnnData (4668, 34) (anndata loader)
Notes for reviewers
- Cache subdirectories are now named by type (visium_10x, image) rather than the old visium/images. This is internal-only, but the CI prefetch script (.scripts/ci/download_data.py) and any hard-coded cache paths should be updated in a follow-up.
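For the follow-up, rewriting hard-coded cache paths could be as small as the sketch below. It assumes the old and new directory names correspond one to one (visium -> visium_10x, images -> image) and that paths are given relative to the dataset directory; the helper name and the example file names are hypothetical.

```python
from pathlib import PurePosixPath

# Assumed correspondence between old and new cache sub-directory names.
OLD_TO_NEW = {"visium": "visium_10x", "images": "image"}


def migrate_cache_path(relative_path: str) -> str:
    """Rewrite the leading directory of a path relative to the dataset dir."""
    parts = list(PurePosixPath(relative_path).parts)
    if not parts:
        return relative_path
    # Only the first component encodes the dataset type; leave the rest alone.
    parts[0] = OLD_TO_NEW.get(parts[0], parts[0])
    return str(PurePosixPath(*parts))


new_visium = migrate_cache_path("visium/sample_a/spatial/hires.png")
new_image = migrate_cache_path("images/example.tiff")
untouched = migrate_cache_path("anndata/imc.h5ad")
```

Paths whose first component is not in the mapping pass through unchanged, so the helper can be applied to a mixed list of cache paths.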
🤖 Generated with Claude Code