Query, filter, and retrieve proteomics dataset metadata from ProteomeXchange.
pxseek replaces the original Selenium-based web scraper with a clean, API-driven approach using the ProteomeCentral bulk TSV and per-dataset XML endpoints. No browser or ChromeDriver required.
pxseek has three core commands.
fetch downloads the clean summary table.
filter narrows that table by metadata.
lookup fetches richer XML-derived metadata for a shortlist.
Requires Python 3.12-3.14.
pip install pxseek
Or with uv:
uv tool install pxseek
For development setup and source checkout, see the Installation guide.
The shortest useful workflow is:
uv run pxseek fetch -o px_datasets.tsv
uv run pxseek filter -i px_datasets.tsv -s "Homo sapiens" -k "cancer" -o shortlist.tsv
uv run pxseek lookup --input shortlist.tsv -o detailed.tsv
One rule matters most: filter expects the cleaned artifact written by pxseek fetch, not the raw ProteomeCentral export.
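For scripts that do shell out, the three steps above can be chained so that filter always reads the file fetch has just written. This is a minimal sketch, not part of pxseek: the helper names build_workflow_commands and run_workflow are hypothetical, and it uses only the flags shown in the commands above.

```python
import subprocess
from pathlib import Path


def build_workflow_commands(
    workdir: Path,
    species: str,
    keyword: str,
    prefix: tuple[str, ...] = ("uv", "run", "pxseek"),
) -> list[list[str]]:
    """Return the fetch, filter, and lookup commands in execution order.

    Hypothetical helper: it only assembles the argument lists shown in the
    README workflow, with all three files placed under ``workdir``.
    """
    summary = str(workdir / "px_datasets.tsv")
    shortlist = str(workdir / "shortlist.tsv")
    detailed = str(workdir / "detailed.tsv")
    base = list(prefix)
    return [
        base + ["fetch", "-o", summary],
        # filter reads the file written by fetch, never a raw export
        base + ["filter", "-i", summary, "-s", species, "-k", keyword, "-o", shortlist],
        base + ["lookup", "--input", shortlist, "-o", detailed],
    ]


def run_workflow(workdir: Path, species: str, keyword: str) -> None:
    """Run the three steps in order, stopping at the first failure."""
    for command in build_workflow_commands(workdir, species, keyword):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so filter never runs against a missing or partial fetch output
        subprocess.run(command, check=True)
```

Calling run_workflow(Path("."), "Homo sapiens", "cancer") would reproduce the three commands above, assuming uv and pxseek are available on the machine.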
If you want machine-friendly outputs, use --format json or -o - and keep the rest of the workflow the same. The detailed format and pipeline behavior live in the docs.
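As an illustration of consuming that output downstream, the sketch below reads tab-separated rows from a text stream and collects the dataset identifiers. It rests on two assumptions: that -o - writes the TSV to standard output, and that the identifier column is named dataset_id, as in the Python example in this README. The function read_dataset_ids is hypothetical and the sample rows are placeholders, not real query results.

```python
import csv
import io
from typing import TextIO


def read_dataset_ids(stream: TextIO, id_column: str = "dataset_id") -> list[str]:
    """Collect non-empty values of ``id_column`` from a TSV text stream.

    Hypothetical helper; the column name is an assumption taken from the
    README's Python example, not from a documented schema.
    """
    reader = csv.DictReader(stream, delimiter="\t")
    if reader.fieldnames is None or id_column not in reader.fieldnames:
        raise ValueError(f"column {id_column!r} not found in TSV header")
    # Skip rows where the identifier cell is empty or missing
    return [row[id_column] for row in reader if row[id_column]]


# Placeholder input standing in for piped pxseek output
sample = io.StringIO(
    "dataset_id\ttitle\n"
    "PXD000001\tExample A\n"
    "PXD000002\tExample B\n"
)
print(read_dataset_ids(sample))  # ['PXD000001', 'PXD000002']
```

In a pipeline, the same function could be given sys.stdin in place of the in-memory sample.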
pxseek is CLI-first, but it exposes a small stable workflow API for code that should not shell out to the CLI.
from pxseek import fetch_datasets, filter_datasets, lookup_datasets
summary = fetch_datasets().df
filtered, _ = filter_datasets(summary, species="Homo sapiens", keywords="cancer")
details = lookup_datasets(filtered["dataset_id"]).df
The supported root imports are fetch_datasets(), filter_datasets(), lookup_datasets(), read_artifact(), render_artifact(), and write_artifact().
More detailed documentation and examples live in the GitHub wiki.
The local development workflow matches CI.
uv sync --extra dev
uv run --extra dev pytest
uv run --extra dev ruff check src/ tests/
uv run --extra dev ruff format --check src/ tests/
uv build
The original single-file Selenium scraper is preserved in legacy/proteomeXchange_scraper.py for reference.
If you use pxseek in your work, please cite it:
@software{pxseek2026,
title = {pxseek: Query, filter, and retrieve proteomics dataset metadata from ProteomeXchange},
author = {Enes K. Ergin and Kimia Rostin and Philipp F. Lange},
year = {2026},
url = {https://github.com/LangeLab/pxseek},
version = {0.5.1},
}
A CITATION.cff file is also available in the repository root.
MIT License. See LICENSE for details.