feat: modular providers + commands architecture, direct downloads for MassIVE/JPOST/iProX (0.0.17)#103
feat: modular providers + commands architecture, direct downloads for MassIVE/JPOST/iProX (0.0.17)#103ypriverol wants to merge 27 commits into
Conversation
Remove conda recipe packaging workflow
Improving parallelism
Minor changes in parallelization
Minor change to have the version of the tool.
Improve MassIVE download.
Extends the direct-download support introduced for MassIVE in PR #98 to two more proteomics repositories whose datasets are often standalone (no ProteomeXchange accession): - JPOST (Japan ProteOme STandard Repository): JPST\d{6} accessions, listed and downloaded from ftp.jpostdb.org. - iProX (Integrated Proteome resources): IPX\d{7,10} accessions, listed and downloaded from ftp.iprox.cn. Refactor: - Add is_direct_download_accession() unifying the MSV/JPST/IPX checks, plus _list_direct_download_files() and _download_direct_download_records() dispatchers. All call sites (get_all_raw_file_list, download_all_raw_files, download_all_category_files, get_file_from_api, download_file_by_name, download_files_by_list, get_all_category_file_list) now go through the unified entry points. - Extract _list_ftp_repo_files() helper so the FTP connection lifecycle (connect / login / passive / walk / quit) lives in one place. As part of that, fix the FTP-constructor-outside-try issue flagged in the PR #98 review: a connect failure no longer triggers NameError in finally. - Keep is_massive_accession, _list_massive_public_files, and _download_massive_file_records as thin backward-compatible wrappers so existing tests and external callers continue to work. Tests: add test_jpost_files.py and test_iprox_files.py mirroring the MassIVE coverage (regex match, record building, raw-only filtering, and the download_file_by_name happy path). All 19 direct-download tests pass. Version: 0.0.16.
Address feedback that direct downloads should match the PRIDE feature
set (resume, parallel, retries) and that MassIVE's actual FTP server
requires TLS.
Live findings:
- massive-ftp.ucsd.edu now rejects plain anonymous FTP with
421 TLS is required. The merged PR #98 code was effectively broken
against the live server. Switch to FTP_TLS + PROT P.
- ftp.jpostdb.org accepts plain anonymous FTP; keep as-is.
- ftp.iprox.cn does not resolve (DNS fail) and no other iProX FTP
host responds. iProX is HTTPS-only and needs a different transport
(REST API). Defer iProX support; user to provide the endpoint.
Implementation:
- New static _open_ftp_connection(host, use_tls) opens FTP or FTP_TLS
with the right TLS setup, and transparently falls back to FTPS if
a plain FTP server replies 'TLS is required'.
- _list_ftp_repo_files() and download_ftp_urls() both grow a use_tls
flag. _repo_uses_tls(accession) wires this from the repo type
(MassIVE = True, JPOST = False).
- download_ftp_urls() grows parallel_files: when >1, a ThreadPoolExecutor
runs that many FTP workers per host, each with its own connection.
Existing serial single-connection-per-host path is preserved for
parallel_files <= 1.
- Extracted _download_one_ftp_path() with REST-based resume + per-file
retries; _download_ftp_paths_serial() / _download_ftp_paths_parallel()
pick the right scheduling. REST resume verified live (3 KB pre-stage
-> 10 KB final, MD5 matches full file).
- _download_direct_download_records() now accepts parallel_files and
forwards it (along with use_tls derived from the accession) to
download_ftp_urls. All call sites (download_all_raw_files,
download_all_category_files, download_files_by_list) thread the
user-supplied -w/--parallel-files through.
Tests:
- test_jpost_files / test_massive_files updated to assert the new
kwargs (use_tls, parallel_files).
- New test_repo_uses_tls_true_for_massive_false_for_jpost and
test_download_all_raw_files_threads_parallel_files_for_massive.
- test_iprox_files.py removed (iProX is deferred).
- All 15 unit tests pass.
Live testing:
- MassIVE: listed MSV000080175 (44 files), single-file download of
params.xml (10315 B, MD5 43d87368d705c3f380c1d030b14850c4),
REST resume from a 3000 B partial, and 3-worker parallel download
of files from MSV000080175 + MSV000078335 all succeeded.
- JPOST: rate-limited from this IP ('421 too many connections').
Code path is structurally identical to MassIVE (same FTP helper),
routing covered by unit tests; deferred to a follow-up live check.
Address deferred items from PR review:
1. JPOST PROXI listing
ftp.jpostdb.org rate-limits aggressively per source IP (sticky 421 on
any retry within ~10 min). For listing this is fatal because every
pridepy invocation needs a fresh tree walk. JPOST publishes a JSON
PROXI endpoint at
https://repository.jpostdb.org/proxi/datasets/<JPSTxxxxxx>
that returns datasetFiles[*].value as ftp:// URIs alongside CV labels
(Associated raw file URI, Search engine output file URI, Result file
URI, Peak list file URI, ...). _list_jpost_public_files now hits
PROXI first, builds file records with categories derived from the CV
name (mapped via JPOST_PROXI_CATEGORY_MAP), and falls back to the
FTP tree walk only if PROXI is unreachable or returns no records.
Live-tested against JPST002311 -> 160 files (88 RAW, 72 SEARCH).
2. Post-transfer size check
Neither MassIVE nor JPOST publishes per-file checksum manifests in
a standard location, so md5 verification isn't an option for these
datasets. As a lighter-weight integrity signal, _download_one_ftp_path
now compares the local file size against ftp.size() after retrbinary
returns and treats a mismatch as a retryable failure (next attempt
resumes via REST from the current partial). This catches half-finished
transfers where the data channel was closed early without retrbinary
raising.
3. iProX accession guard
Probing showed iProX REST endpoints (PMD009Controller/findByProjectId.jsonp,
findFilesBySubProjectID.jsonp) all redirect unauthenticated callers to
a CAS login page, and downloads use faspe:// URLs with per-session
tokens. Native support is therefore blocked until iProX exposes an
anonymous JSON API or pridepy carries iProX credentials. Until then,
add is_iprox_accession() and _raise_if_iprox() so every public entry
point (get_all_raw_file_list, download_all_raw_files,
download_all_category_files, download_file_by_name, download_files_by_list,
get_file_from_api) emits a clear NotImplementedError instead of
silently falling through to the PRIDE API and 404-ing.
Tests
- test_jpost_files: PROXI listing maps CV names to PRIDE categories;
PROXI failure falls back to FTP walk.
- test_iprox_guard: regex coverage; assert each entry point raises
NotImplementedError for IPX accessions; assert
is_direct_download_accession returns False for IPX.
- test_ftp_download_validation: size mismatch retries until success;
repeated mismatch raises after max_download_retries; correct size
skips retries.
- 25 tests total (10 MassIVE + 8 JPOST + 5 iProX guard + 3 size check).
Live verification (same MassIVE FTPS path as before)
- MSV000080175 listing + params.xml download still produces
10315 B, MD5 43d87368d705c3f380c1d030b14850c4.
- JPST002311 PROXI listing returns 160 files with correct categories.
Actual file transfer is still blocked from this IP by JPOST's
421-too-many-connections rate limit; the code path is shared with
MassIVE (same _download_one_ftp_path / download_ftp_urls), so a
fresh IP should succeed.
….org User pointed out the iProX dataset XML endpoint: https://www.iprox.cn/FAF016Controller/readXml.jsonp?fileId=file_<id>_xml Probing showed something simpler works: iProX publishes the ProteomeXchange XML for every public dataset at a deterministic, anonymous-accessible path on download.iprox.org. No CAS auth, no Aspera tokens, no fileId discovery needed: http://download.iprox.org/<accession>/PX_<accession>.xml The XML embeds Associated raw file URI / Search engine output file URI cvParams pointing at HTTPS file URLs on the same host (Accept-Ranges: bytes, so we get resume for free). The earlier 'iProX is auth-gated' finding was specifically about the iProX UI's CAS-protected JSON endpoints (PMD009Controller/findBySubProjectId.jsonp etc.); the public download server is anonymous. Changes ------- - IPROX_DOWNLOAD_BASE_URL, IPROX_PX_XML_URL_TEMPLATE, IPROX_PX_CATEGORY_MAP added (the latter is the same CV map JPOST PROXI uses). - _build_iprox_file_record() and _list_iprox_public_files() added, mirroring the JPOST helpers but using the PX XML schema. - is_direct_download_accession() now returns True for IPX accessions. - _raise_if_iprox() removed entirely. All entry points that previously called it no longer do, so IPX accessions flow through the unified direct-download dispatcher. - _list_direct_download_files() dispatches IPX -> _list_iprox_public_files. - _download_direct_download_records() now partitions URLs by scheme: ftp:// records go through download_ftp_urls (MassIVE/JPOST), http(s):// records through download_http_urls (iProX). A dataset whose records somehow contain both flows through both paths correctly. - download_http_urls() grew parallel_files + max_retries kwargs and a ThreadPoolExecutor path. The per-file worker _http_download_one() wraps _parallel_download() (reused from the globus codepath) with retry, so iProX gets the same HEAD-then-Range resume and restart-on- non-206 behaviour we already use for PRIDE HTTPS. Tests (26 pass) --------------- - test_iprox_guard.py renamed to test_iprox_files.py and rewritten as positive-path tests: regex coverage, IPX is a direct-download accession, PX XML parsing maps cvParam name -> PRIDE category, RAW filtering ignores non-FTP/HTTPS URIs, download_file_by_name routes iProX URLs to download_http_urls (not download_ftp_urls) with parallel_files=1. - All previously-added MassIVE / JPOST / size-validation tests still pass (20 + 6 iProX = 26 total). Live verification (this branch, against download.iprox.org) ----------------------------------------------------------- - IPX0017413000 listing: 7 files, correctly categorised (RAW + SEARCH). - download_file_by_name(IPX0017413000, protein_annotation_profile.xlsx): 3,253,230 B downloaded, MD5 c17baf230ffde1e2837ec4eb32dcea68, valid XLSX (PK header). - Range-based resume: pre-staged 1,000,000 B partial, completed to 3,253,230 B with the same MD5. - Parallel HTTPS: 3 worker downloads from IPX0017413000 (xlsx + 2 RAW) ran concurrently; cancelled mid-flight after observing all 3 files growing in parallel (xlsx finished at 3.25 MB, Tumor_NK1.raw was at 142 MB and Control_NK3.raw at 419 MB before cancellation).
…Registry Empty scaffolding for the per-provider refactor (spec: docs/specs/2026-05-27-files-py-provider-refactor-design.md). Introduces the Provider abstract base class, BaseDirectDownloadProvider with a shared download_files() that partitions URLs by scheme and routes through Files.download_ftp_urls / Files.download_http_urls (preserving test patches), and a Registry for accession -> provider resolution. No behaviour change. No providers registered yet.
…t.py Pulled download_ftp_urls, download_http_urls, and their helpers (_open_ftp_connection, _walk_ftp_tree, _list_ftp_repo_files, _download_one_ftp_path, _download_ftp_paths_serial/_parallel, _http_download_one, _parallel_download, _local_path_for_url) out of the Files class verbatim into providers/transport.py. Files keeps a shim staticmethod for each function that does a lazy import and delegates, so existing test patches like patch.object(Files, 'download_ftp_urls') keep intercepting calls. No behaviour change. Test suite green.
Moved Progress, _find_tsv_columns, _is_md5_checksum, read_checksum_file, compute_md5, validate_download, _remove_if_exists, _get_download_url, _resolve_local_path from Files into providers/util.py. Files keeps shim re-exports so existing references keep working. No behaviour change. Test suite green.
Moved MassIVE listing + record building from Files into providers/massive.py as MassiveProvider(BaseDirectDownloadProvider). Provider is registered with the Registry. Files keeps shim methods (is_massive_accession, _list_massive_public_files, _build_massive_file_record, _get_massive_public_root, _get_massive_public_ftp_url, _map_massive_collection_to_category) that delegate to the provider. MASSIVE_CATEGORY_MAP / MASSIVE_ARCHIVE_FTP constants remain on Files as class-attribute re-exports. All 10 MassIVE tests pass without modification. Full suite green.
Moved JPOST listing (PROXI JSON primary + FTP tree walk fallback) and record building from Files into providers/jpost.py as JpostProvider(BaseDirectDownloadProvider). Provider registered with the Registry. Files keeps shim methods (is_jpost_accession, _list_jpost_public_files, _list_jpost_public_files_via_proxi, _build_jpost_file_record, _get_jpost_public_root, _get_jpost_public_ftp_url) that delegate to the provider. JPOST_ARCHIVE_FTP / JPOST_PROXI_BASE_URL / JPOST_PROXI_CATEGORY_MAP constants remain on Files as class-attribute re-exports. All JPOST tests pass without modification. Full suite green.
Moved iProX listing (PX XML from download.iprox.org) and record building from Files into providers/iprox.py as IproxProvider(BaseDirectDownloadProvider). Provider registered with the Registry. Files keeps shim methods (is_iprox_accession, _list_iprox_public_files, _build_iprox_file_record, _get_iprox_public_root, _get_iprox_public_ftp_url) that delegate to the provider. IPROX_DOWNLOAD_BASE_URL / IPROX_PX_XML_URL_TEMPLATE / IPROX_PX_CATEGORY_MAP constants remain on Files as class-attribute re-exports. _download_direct_download_records on Files now dispatches via the registry instead of branching manually. _list_direct_download_files keeps shim dispatching so existing test patches on _list_massive_public_files and _list_jpost_public_files continue to intercept the calls. All iProX tests pass without modification. Full suite green.
…oning Mixed ftp:// + http(s):// records partition correctly: ftp URLs go to Files.download_ftp_urls with use_tls=True (the MassIVE setting), http URLs go to Files.download_http_urls. Both calls intercepted by patch.object(Files, ...) — proving the provider routes back through Files rather than calling transport directly (preserves test patches).
Moved PRIDE-specific logic (V3 API listing, multi-protocol batch downloader with aspera/s3/ftp/globus fallback, private-dataset path, submitter helpers, Globus/S3 per-protocol downloaders, legacy single-connection FTP) from Files into providers/pride.py as PrideProvider(Provider). ~510 LOC moved out of files.py. Files keeps shim methods for every patched helper (_batch_download_by_protocol, _download_with_fallback, _globus_download_one, download_files_from_globus, download_files_from_s3, download_files_from_ftp, download_private_file_name, get_ascp_binary, save_checksum_file, stream_all_files_by_project, stream_all_files_metadata, get_submitted_file_path_prefix, _protocol_sequence, download_files). PRIDE class constants (V3_API_BASE_URL, API_BASE_URL, API_PRIVATE_URL, PRIDE_ARCHIVE_FTP, *_URL_PREFIX, S3_URL, S3_BUCKET, PROTOCOL_ORDER) remain on Files as re-exports. PrideProvider's internal calls to patch-sensitive helpers go through Files.X (lazy import) so existing test patches in test_download_resilience.py keep working without modification. is_direct_download_accession updated to exclude PRIDE (returns True only for MSV/JPST/IPX) now that PrideProvider is registered too. All 8 patch-sensitive tests in test_download_resilience.py pass. Full suite green at 67 passed, 4 skipped.
Files public methods (get_all_raw_file_list, download_all_raw_files,
download_all_category_files, get_all_category_file_list,
download_file_by_name, get_file_from_api, download_files_by_list) now
dispatch via registry.resolve(accession).{list,download}_files(...).
Removed dead helpers: _list_direct_download_files,
_download_direct_download_records. Kept _repo_uses_tls as a registry
shim. Fixed _download_massive_file_records to use registry.
PrideProvider.download_files refactored: the old static method is now
_download_files_batch; a proper Provider-interface instance method
download_files(self, accession, records, ...) wraps it so PRIDE routes
uniformly through the registry like other providers.
Tests updated: patches on Files._list_massive_public_files,
Files._list_jpost_public_files, files_obj.stream_all_files_by_project,
and files_obj.download_files now target the provider classes
(MassiveProvider, JpostProvider, PrideProvider) directly.
Full suite green. files.py size: 1254 LOC (was 1352 before Task 9).
Verifies that Files().download_all_raw_files for a PXD accession flows through Registry.resolve -> PrideProvider.download_files -> _batch_download_by_protocol (patched via Files), and that _download_with_fallback is only called when batch returns failed files. Also mocks validate_download to return success so the happy-path test correctly asserts that fallback is not invoked when all files pass validation after the primary-protocol batch run. Spec acceptance criterion #5.
Empty scaffold for the follow-up refactor that extracts cross-cutting commands (download_files_by_url, download_files_by_list, download_px_raw_files) from Files into their own modules. No code moved yet. No behaviour change. Test suite green at 68 passed, 4 skipped.
…roteomexchange.py Moved download_px_raw_files, _normalize_px_xml_url, _parse_px_xml_for_raw_file_urls from Files into commands/proteomexchange.py. Files keeps shim re-exports. Also removed now-unused xml.etree.ElementTree import from files.py. No behaviour change. Test suite green.
Moved download_files_by_list from Files into commands/by_list.py. Files keeps a shim re-export. No behaviour change. Test suite green.
Moved download_files_by_url and its 6 helpers (_extract_pride_accession, _validate_urls_checksums, _http_download_url, _ftp_download_url, _dispatch_url_scheme, _download_single_url) from Files into commands/by_url.py. Files keeps shim re-exports for each. Internal calls to patch-sensitive helpers (_http_download_url, _ftp_download_url, _dispatch_url_scheme, _download_single_url) go through Files.X (lazy import) so existing test patches like patch.object(Files, '_http_download_url') keep intercepting. files.py drops below 1000 LOC. No behaviour change. Test suite green at 68 passed, 4 skipped.
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…/ as a class
ProteomeXchange behaves more like a provider than a command: it takes a
PXD/PRD accession (or a ProteomeCentral URL) and returns file records,
the same shape as the four other providers. Moving it to providers/
aligns the architecture.
Changes:
- New: pridepy/providers/proteomexchange.py with
ProteomeXchangeProvider(Provider). It implements the full Provider
interface (matches, list_files, download_files) plus a convenience
method download_from_accession_or_url() for the
download-px-raw-files CLI command's existing behaviour
(skip_if_downloaded_already defaults to True, no parallel workers).
- Deleted: pridepy/commands/proteomexchange.py. Its three functions
(_normalize_px_xml_url, _parse_px_xml_for_raw_file_urls,
download_px_raw_files) are now static/instance methods on the
provider class.
- Updated: Files shims for _normalize_px_xml_url,
_parse_px_xml_for_raw_file_urls, and download_px_raw_files now
delegate to ProteomeXchangeProvider.
- Updated: commands/__init__.py docstring notes the move.
Important: ProteomeXchangeProvider is NOT auto-registered with
pridepy.providers.registry. PXD/PRD accessions continue to route through
PrideProvider's V3 API path by default. ProteomeXchangeProvider is the
explicit gateway for the cross-repository XML view, invoked via the
download-px-raw-files CLI command and Files.download_px_raw_files.
Full suite green at 68 passed, 4 skipped. No behaviour change for
existing PXD downloads; download-px-raw-files keeps its XML-based
listing flow exactly as before.
Code reviewFound 1 issue:
|
Two related fixes:
1. CI lint failure: files.py used importlib.resources, subprocess, and
time but never imported them. The Task 8 refactor removed these
top-level imports along with the PRIDE-specific code that used them
in PrideProvider, but two methods on Files (download_files_from_aspera
and the now-dead _download_range) still referenced the undefined
names. Flake8 with --select=F82 fails on 4 F821 undefined-name errors.
Fix: re-add the three stdlib imports at the top of files.py. Also
delete the orphaned _download_range method (no callers).
2. Hoist lazy provider/command imports out of Files method bodies.
The shim pattern previously did 'from pridepy.providers import X'
inside every method body — ~75 occurrences. Since providers do not
import Files at module load (only inside method bodies), the reverse
direction is safe to hoist: Files now imports
{registry, transport, util, IproxProvider, JpostProvider,
MassiveProvider, PrideProvider, ProteomeXchangeProvider, by_list,
by_url} at module top, and the shims reference these names directly.
Lazy imports inside provider/command method bodies that go back to
Files (e.g. BaseDirectDownloadProvider.download_files doing
'from pridepy.files.files import Files') are kept lazy — they are
genuinely cyclic and required for backward-compat test patching.
Also: commands/by_list.py's 'from pridepy.providers import registry'
hoisted to module top (no Files dependency, no cycle risk).
Note: 'import requests' is kept in files.py (noqa: F401) because
test suites patch 'pridepy.files.files.requests.get' directly.
Tests: 68 passed, 4 skipped. flake8 --select=E9,F63,F7,F82 now clean.
|
Superseded by #105, which now targets master and contains this branch's commits as part of the consolidated 0.0.16 release. Closing to keep a single PR. (Branch left intact.) |
Summary
Consolidates three stacked PRs (#100, #101, #102) into one branch. Three layers of work in 19 commits:
Direct downloads for MassIVE/JPOST/iProX (commits d471e0b → df165ab) — adds anonymous public-data downloads for the three non-PRIDE repositories that pridepy users frequently encounter. MassIVE uses FTPS at
massive-ftp.ucsd.edu(server now requires TLS); JPOST uses PROXI JSON atrepository.jpostdb.orgfor listings plus plain FTP atftp.jpostdb.orgfor transfers; iProX uses the PX XML atdownload.iprox.orgplus anonymous HTTPS for transfers (no Aspera credentials required). Resume + parallel + post-transfer size verification across the board.Refactor: per-provider classes (commits eedbef1 → f60a69c, 11 commits) — extracts the per-repository logic out of
Filesinto apridepy/providers/package:pride.py,massive.py,jpost.py,iprox.py, plus a sharedtransport.py,util.py,base.py(Provider ABC + BaseDirectDownloadProvider), andregistry.py(accession → Provider resolution).Filesbecomes a thin facade that dispatches via the registry.Refactor: cross-cutting commands (commits 02de401 → ced7415, 4 commits) — extracts
download_files_by_url+ 6 URL helpers,download_files_by_list, anddownload_px_raw_files+ 2 XML helpers into a newpridepy/commands/package (by_url.py,by_list.py,proteomexchange.py).Bumps to 0.0.17. No behaviour change in (2) and (3); (1) adds new behaviour.
What was 1 file is now 14 files
Before:
pridepy/files/files.pywas a 2,514-LOC class containing PRIDE + MassIVE + JPOST + iProX + ProteomeXchange + Aspera + Globus + S3 + URL/filename batch commands.After:
Live verification (against real servers)
MSV000080175listed (44 files);params.xmldownloaded (10 315 B, MD543d87368d705c3f380c1d030b14850c4); REST resume from a 3 KB pre-stage produced the same MD5; 3-worker parallel pull worked.JPST002311listed (160 files, 88 RAW + 72 SEARCH). FTP transfer not exercised live from my IP (rate-limited by JPOST FTP); code path is identical to MassIVE.IPX0017413000listed (7 files);protein_annotation_profile.xlsxdownloaded (3 253 230 B, MD5c17baf230ffde1e2837ec4eb32dcea68, valid XLSX); HTTPS Range resume from a 1 MB pre-stage worked; 3-worker parallel observed.pridepy download-file-by-name -a MSV000080175 -f params.xmlafter the full refactor chain produces MD543d87368d705c3f380c1d030b14850c4— exact match.Backward compatibility
Filesremains importable and the CLI is unchanged. Every method the existing test suite patches (Files.download_ftp_urls,Files._batch_download_by_protocol,Files._download_with_fallback,Files._globus_download_one,Files._download_single_url,Files._http_download_url,Files._list_*_public_files,Files._repo_uses_tls, etc.) is kept as a shim that delegates to the new location. PrideProvider and the command modules call back intoFiles.Xvia lazy imports for patch-sensitive helpers sopatch.object(Files, ...)keeps intercepting.Three test files had patch targets updated (38 lines across
test_massive_files.py,test_jpost_files.py,test_download_by_list.py) — assertions unchanged; the patch targets moved toMassiveProvider.list_files,JpostProvider.list_files,PrideProvider.list_files/download_filesbecause the new registry dispatch path no longer routes through the oldFiles._list_X_public_filesshims for those methods.Tests
68 passed, 4 skipped (was 66 passed, 4 skipped pre-PR #100). Two new tests added by the refactor:
test_base_direct_download_provider_partitions_urls_by_scheme— verifies BaseDirectDownloadProvider partitions ftp:// vs http(s):// records correctly.test_facade_dispatches_pride_through_registry_to_fallback— verifiesFiles().download_all_raw_files("PXD...")flows through Registry → PrideProvider →_batch_download_by_protocol(patched on Files).Provenance
Supersedes PRs #100, #101, #102 (all to be closed). Spec + plan docs for both refactors live on
docs/files-py-refactor-designbranch (can be merged as a separate small docs PR if desired).Test plan
pridepy download-file-by-name -a MSV000080175 -f params.xml -o /tmp/xproduces MD543d87368d705c3f380c1d030b14850c4pridepy download-file-by-name -a JPST<accession> -f <file> -o /tmp/jpostsucceeds from an IP not rate-limited by JPOST FTPpridepy download-file-by-name -a IPX0017413000 -f protein_annotation_profile.xlsx -o /tmp/iproxsucceeds (anonymous HTTPS)pridepy --versionreports0.0.17Commits (19 work commits, 6 ancestor merge commits)
The 19 real commits, newest first: