Problem:
Oncodrivefml relies on pytabix, which is a deprecated and unmaintained package. This library is known to have significant issues, including incorrect indexing and position retrieval when querying genomic regions. These problems can lead to unreliable data handling and difficult-to-debug errors.
Proposed Solution:
I propose we replace all usage of pytabix with the pysam.TabixFile class. pysam is a well-maintained, robust library that provides a more reliable and efficient interface for reading tabix-indexed files.
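To make the proposal concrete, here is a minimal sketch of what a region query could look like after the migration. The helper names `open_tabix` and `query_region` are hypothetical and are not taken from the OncodriveFML code base. The sketch also assumes callers pass 1-based, inclusive positions, which should be checked against the existing call sites. `pysam.TabixFile.fetch()` takes 0-based, half-open `start`/`end` values, so the conversion is done explicitly in one place.

```python
def open_tabix(path):
    """Open a bgzip-compressed, tabix-indexed file with pysam."""
    # Imported lazily so query_region() can be exercised without pysam installed.
    import pysam
    return pysam.TabixFile(path)


def query_region(tabix_file, chrom, start, end):
    """Yield the rows overlapping chrom:start-end as lists of column strings.

    Assumption: start and end are 1-based and inclusive. pysam's fetch()
    expects 0-based, half-open coordinates, so only the start is shifted.
    """
    for line in tabix_file.fetch(chrom, start - 1, end):
        # Without a parser argument, fetch() yields each record as one
        # tab-delimited string.
        yield line.split("\t")
```

Keeping the coordinate conversion inside a single helper means the off-by-one behaviour can be unit-tested once, rather than being repeated at every call site that currently queries through pytabix.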
Rationale:
- Stability: pysam is actively developed and is the standard for handling SAM/BAM/VCF/BCF/tabix files in the Python bioinformatics ecosystem.
- Correctness: It resolves the known indexing and region querying bugs present in pytabix.
- Precedent: This migration has been successfully implemented in other bioinformatics packages, such as boostdm and intogen-core, demonstrating its viability and benefits.
This change will improve the long-term stability and correctness of our data processing pipelines.