Resolves conflict in pridepy/files/files.py: drop the now-obsolete GLOBUS_BASE_URL constant (renamed to PRIDE_ARCHIVE_HTTPS_URL_PREFIX on dev in 7958d9a), and keep the MASSIVE_ARCHIVE_FTP* constants alongside.
Previously download_files_by_list always called stream_all_files_by_project (PRIDE-only), which broke filename-based downloads for MassIVE accessions. Branch on is_massive_accession to list via _list_massive_public_files and download via _download_massive_file_records. Also clarify --parallel-files help text: it applies to all protocols, not only globus.
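The branching described in this commit can be sketched roughly as follows. This is a minimal, self-contained illustration and not the pridepy implementation: `looks_like_massive`, `download_by_names_sketch`, the injected lister/downloader callables, the `fileName` record key, and the accession pattern (optional `R`, then `MSV`, then nine digits) are all assumptions made for the example.

```python
import re
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]

# Assumed accession shape: optional "R" prefix, "MSV", nine digits.
_MASSIVE_RE = re.compile(r"^R?MSV\d{9}$", re.IGNORECASE)


def looks_like_massive(accession: str) -> bool:
    """Return True when the accession matches the assumed MassIVE pattern."""
    return bool(_MASSIVE_RE.match(accession.strip()))


def download_by_names_sketch(
    accession: str,
    file_names: Iterable[str],
    list_massive: Callable[[str], List[Record]],
    list_pride: Callable[[str], List[Record]],
    download: Callable[[List[Record]], None],
) -> List[Record]:
    """Pick the lister by accession type, filter by file name, then download."""
    # Choose the repository-specific listing function instead of always
    # using the PRIDE one.
    lister = list_massive if looks_like_massive(accession) else list_pride
    wanted = set(file_names)
    records = [r for r in lister(accession) if r["fileName"] in wanted]
    download(records)
    return records
```

Passing the listers and the downloader in as arguments keeps the sketch runnable without network access; in the PR the corresponding roles are played by `_list_massive_public_files`, `stream_all_files_by_project`, and `_download_massive_file_records`, whose real signatures are not shown here.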
Add direct download support for MassIVE accessions
📝 Walkthrough

pridepy extends file download and listing workflows to support MassIVE dataset accessions (MSV/rmsv format) via anonymous FTP enumeration. Core entry points now detect and branch on MassIVE accessions, reusing new FTP enumeration and file-record building helpers. FTP path extraction logic is also refactored to use URL parsing.

Changes: MassIVE Dataset Download Support
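As an illustration of extracting an FTP path with URL parsing rather than string slicing, a minimal sketch might look like this. The function name `split_ftp_url` and the example host are hypothetical and are not taken from pridepy.

```python
import posixpath
from typing import Tuple
from urllib.parse import unquote, urlparse


def split_ftp_url(url: str) -> Tuple[str, str, str]:
    """Split an ftp:// URL into (host, remote directory, file name)."""
    parsed = urlparse(url)
    if parsed.scheme != "ftp" or not parsed.hostname:
        raise ValueError(f"not an FTP URL: {url!r}")
    # Decode percent-escapes such as %20 before splitting the path.
    path = unquote(parsed.path)
    directory, filename = posixpath.split(path)
    return parsed.hostname, directory, filename
```

For example, `split_ftp_url("ftp://ftp.example.org/v01/MSV000012345/raw/sample%201.raw")` yields the host, the directory `/v01/MSV000012345/raw`, and the file name `sample 1.raw`. Parsing the URL avoids assumptions about a fixed prefix length, which is what makes string slicing fragile.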
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (inconclusive)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
README.md (1)
129-129: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the Quick Start section numbering.

This should be `### 7)` instead of `### 6)` to maintain sequential numbering after the new MassIVE examples were inserted above.

📝 Proposed fix

    -### 6) Download a named subset of files (manifest)
    +### 7) Download a named subset of files (manifest)

Also update line 151:

    -### 7) Download files from raw URLs
    +### 8) Download files from raw URLs

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` at line 129, The Quick Start heading "### 6) Download a named subset of files (manifest)" is misnumbered after inserting MassIVE examples; update the heading text in README.md to "### 7) Download a named subset of files (manifest)" and also adjust the subsequent Quick Start numbering (the later heading referenced in the comment) so all section numbers remain sequential; locate and update the literal headings (e.g., the "### 6) Download a named subset of files (manifest)" token and the later Quick Start heading token) to the corrected numeric values.
🧹 Nitpick comments (1)
pridepy/files/files.py (1)
300-301: 💤 Low value

Add debug logging when falling back from mlsd to LIST.

Silent exception handling here hides why the fallback path is taken. Adding a debug log would help troubleshoot FTP server compatibility issues.

💡 Proposed improvement

     except (AttributeError, ftplib.error_perm):
    -    pass
    +    logging.debug(f"mlsd not supported for {remote_dir}, falling back to LIST")

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pridepy/files/files.py` around lines 300 - 301, The except block catching (AttributeError, ftplib.error_perm) that falls back from using MLSD to LIST should log a debug message instead of silently passing; update the exception handler where MLSD -> LIST fallback occurs to call the module/logger debug method (e.g., logger.debug or processLogger.debug) and include context like "falling back from MLSD to LIST" plus the exception details and the FTP server address or path if available; keep behavior the same otherwise so the fallback continues after logging.
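The suggested logging can be illustrated in isolation with a short sketch. The function name `list_remote_names` and the Unix-style parsing of `LIST` output are assumptions for this example; the actual code in `pridepy/files/files.py` may be structured differently.

```python
import ftplib
import logging
from typing import List

logger = logging.getLogger(__name__)


def list_remote_names(ftp: ftplib.FTP, remote_dir: str) -> List[str]:
    """List entry names in remote_dir, preferring MLSD and falling back to LIST."""
    try:
        # mlsd() is lazy, so consume it inside the try block to surface errors.
        return [
            name for name, _facts in ftp.mlsd(remote_dir) if name not in (".", "..")
        ]
    except (AttributeError, ftplib.error_perm) as exc:
        # Record why the fallback path is taken instead of passing silently.
        logger.debug(
            "MLSD failed for %s (%s); falling back to LIST", remote_dir, exc
        )

    lines: List[str] = []
    ftp.retrlines(f"LIST {remote_dir}", lines.append)
    names = []
    for line in lines:
        # Assumes Unix-style listings: eight metadata columns, then the name.
        parts = line.split(None, 8)
        if len(parts) == 9:
            names.append(parts[8])
    return names
```

Using lazy `%s` formatting in `logger.debug` avoids building the message when debug logging is disabled, and the fallback behaviour is otherwise unchanged.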
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 90f23e9a-b3a8-4362-8f79-67667310ea1e
📒 Files selected for processing (4)
- README.md
- pridepy/files/files.py
- pridepy/pridepy.py
- pridepy/tests/test_massive_files.py
This pull request adds support for downloading public MassIVE datasets directly using their `MSV...` accessions, alongside existing PRIDE dataset functionality. The changes include both user-facing command-line improvements and substantial updates to the core file handling logic to enable MassIVE dataset discovery, file listing, and downloading via anonymous FTP. Documentation is also updated to reflect the new MassIVE capabilities.

MassIVE dataset support:
- [...] `Files` to enable this functionality.
- [...] (`get_all_raw_file_list`, `download_all_raw_files`, `download_file_by_name`, `download_files_by_list`, `download_all_category_files`, `get_all_category_file_list`) to support MassIVE accessions and route them through the new MassIVE-specific logic.

Command-line interface and documentation:
- [...] `README.md`.

File download and FTP handling improvements:

General code improvements:

These changes collectively allow users to seamlessly download public datasets from MassIVE in addition to PRIDE, using the same CLI commands and Python API.