
drVizer

drVizer is a Python library for building transcript-structure figures from GTF models, BED annotations, and BAM coverage tracks.

It is designed for direct RNA sequencing and transcriptomics workflows that need reusable, scriptable, publication-ready matplotlib figures.

Features

  • Build transcript structure figures from one or more GTF files.
  • Overlay BED annotation tracks as interval blocks or numeric score bars.
  • Add BAM-derived coverage tracks with sum or mean aggregation.
  • Reuse parsed state across many plotting calls through a builder-style API.
  • Project transcript-coordinate BED and BAM inputs back into genomic space.
  • Split transcript-coordinate tracks into transcript-specific subtracks.
  • Share automatic y-axis scaling across matched numeric tracks with y_axis_group.
  • Export matplotlib figures for publication workflows.

Installation

Editable install for local development:

git clone https://github.com/x1han/drVizer.git
cd drVizer
pip install -e .

Regular local install:

pip install .

BAM coverage tracks require pysam:

pip install pysam

Quick start

The public workflow is centered on DrViz:

from drvizer import DrViz

parser = (
    DrViz()
    .load_gtf("genes.gtf")
    .add_bed_track("repeats.bed", label="TE")
    .add_bam_track("reads.bam", label="Coverage")
    .build()
)

fig = parser.plot("TP53", show=False)

build() freezes the configured GTF and tracks into a reusable parser, so plotting many genes does not repeat setup work.

Core API

load_gtf(...)

Load one or more GTF files. GTF parsing is required before tracks can be built.

.load_gtf("genes.gtf")
.load_gtf(["reference.gtf", "novel_transcripts.gtf"])

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| gtf_files | str or list[str] | required | Path to one GTF file or ordered list of GTF files. Exon/CDS features are parsed for transcript rendering. Loading a new GTF resets previously added tracks because tracks depend on the active GTF model. |

drVizer uses exon/CDS features and supports gene ID, gene name, or transcript ID lookup during plotting.
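
For orientation, the sketch below shows how exon and CDS records carry the gene_id, gene_name, and transcript_id attributes that such lookups rely on. It is a standalone illustration using only the standard library, not drVizer's parser; the helper names parse_attributes and index_features and the sample records are hypothetical.

```python
import re
from collections import defaultdict

# Three hypothetical GTF records (tab-separated, 1-based inclusive coordinates).
SAMPLE_GTF = (
    'chr17\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "G1"; gene_name "GENE1"; transcript_id "T1";\n'
    'chr17\tdemo\tCDS\t150\t200\t.\t+\t0\tgene_id "G1"; gene_name "GENE1"; transcript_id "T1";\n'
    'chr17\tdemo\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_name "GENE1";\n'
)

ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the ninth GTF column into a dict of key -> value."""
    return dict(ATTR_RE.findall(field))


def index_features(text, wanted=("exon", "CDS")):
    """Group exon/CDS intervals by transcript_id, skipping other feature types."""
    by_transcript = defaultdict(list)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] not in wanted:
            continue
        attrs = parse_attributes(cols[8])
        by_transcript[attrs["transcript_id"]].append(
            (cols[2], cols[0], int(cols[3]), int(cols[4]), cols[6], attrs.get("gene_name"))
        )
    return dict(by_transcript)


features = index_features(SAMPLE_GTF)
# The "gene" record is skipped; transcript T1 keeps its exon and CDS intervals.
```

A GTF that lacks exon or CDS lines for a transcript would leave nothing to index in a scheme like this, which is one reason to check the feature column when a lookup returns no transcripts.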

add_bed_track(...)

Add BED-backed annotation tracks.

.add_bed_track("repeats.bed", label="TE", color="tomato")

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| bed_files | str or list[str] | required | BED file path or multiple BED files grouped into one logical track. BED3 and wider BED records are supported. |
| label | str or None | auto Track_N | Track label shown on the left side of the figure. Duplicate labels are made unique in registration order. |
| color | str or list[str] | "orange" | Matplotlib color for the track. When multiple BED files are passed, a list gives per-file colors. |
| alpha | float or list[float] | 0.8 | Track transparency from 0 to 1. When multiple BED files are passed, a list gives per-file alpha values. |
| parser_type | "distribution" or "score" | "distribution" | Rendering mode. distribution draws interval blocks; score draws numeric BED scores as bars. |
| y_axis_range | float or None | None | Fixed y-axis maximum for score tracks. Takes precedence over automatic scaling and y_axis_group. |
| y_axis_group | str or None | None | Shared automatic y-axis scaling group for numeric BED score tracks. Invalid for distribution tracks. |
| transcript_coord | bool | False | Treat BED chrom field as transcript ID and project transcript coordinates back to genomic coordinates through the loaded GTF. |
| layer_order | None, "ascending", or "descending" | "ascending" | Controls drawing order for layered BED elements. |
| split_by_transcript | None, "nc", or "cn" | None | Split transcript-coordinate BED data into transcript-specific subtracks. Requires transcript_coord=True. |

Score tracks can share automatic y-axis scaling:

.add_bed_track("control_m6a.bed", label="Control m6A", parser_type="score", y_axis_group="m6A")
.add_bed_track("treated_m6a.bed", label="Treated m6A", parser_type="score", y_axis_group="m6A")

add_bam_track(...)

Add BAM-backed coverage tracks.

.add_bam_track("reads.bam", label="Coverage", color="steelblue")

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| bam_files | str or list[str] | required | BAM file path or multiple BAM files grouped into one logical coverage track. BAM support requires pysam. |
| label | str | "Coverage" | Track label shown on the left side of the figure. Duplicate labels are made unique in registration order. |
| color | str or list[str] | "steelblue" | Matplotlib color for coverage. When multiple BAM files are rendered as per-file series, a list gives per-file colors. |
| alpha | float or list[float] | 0.6 | Coverage transparency from 0 to 1. When multiple BAM files are rendered as per-file series, a list gives per-file alpha values. |
| aggregate_method | "sum" or "mean" | "sum" | Combines multiple BAM files by summed coverage or average coverage. |
| y_axis_range | float or None | None | Fixed coverage y-axis maximum. Takes precedence over automatic scaling and y_axis_group. |
| y_axis_group | str or None | None | Shared automatic y-axis scaling group for numeric coverage tracks. |
| transcript_coord | bool | False | Treat BAM reference names as transcript IDs and project coverage back to genomic coordinates through the loaded GTF. |
| layer_order | None, "ascending", or "descending" | "ascending" | Controls drawing order for per-file coverage series where individual series are rendered. |
| split_by_transcript | None, "nc", or "cn" | None | Split transcript-coordinate BAM coverage into transcript-specific subtracks. Requires transcript_coord=True. |

Multiple BAM files can be combined:

.add_bam_track(
    ["sample_a.bam", "sample_b.bam"],
    label="Reads",
    aggregate_method="mean",
)

Supported aggregation modes:

  • aggregate_method="sum": sum coverage across BAM files.
  • aggregate_method="mean": average coverage across BAM files.
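
The difference between the two modes can be seen on plain per-position coverage lists. The sketch below is a minimal illustration of the arithmetic only; it does not read BAM files and is not drVizer's implementation, and the function name aggregate_coverage and the sample values are hypothetical.

```python
def aggregate_coverage(per_file, method="sum"):
    """Combine equal-length per-position coverage lists from several files."""
    if not per_file:
        raise ValueError("at least one coverage list is required")
    length = len(per_file[0])
    if any(len(cov) != length for cov in per_file):
        raise ValueError("coverage lists must span the same positions")
    # Add up the depth observed at each position across all files.
    totals = [sum(column) for column in zip(*per_file)]
    if method == "sum":
        return totals
    if method == "mean":
        return [total / len(per_file) for total in totals]
    raise ValueError(f"unknown aggregate method: {method!r}")


sample_a = [0, 4, 8, 2]
sample_b = [2, 0, 4, 2]
summed = aggregate_coverage([sample_a, sample_b], "sum")     # [2, 4, 12, 4]
averaged = aggregate_coverage([sample_a, sample_b], "mean")  # [1.0, 2.0, 6.0, 2.0]
```

Summed coverage grows with the number of files, so mean is usually the easier choice when tracks built from different numbers of replicates are compared side by side.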

build() and plot(...)

Use build() when plotting multiple genes from the same inputs:

parser = (
    DrViz()
    .load_gtf("genes.gtf")
    .add_bed_track("repeats.bed", label="TE")
    .add_bam_track("reads.bam", label="Coverage")
    .build()
)

fig1 = parser.plot("TP53", show=False)
fig2 = parser.plot("MYC", show=False)
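
When the gene list is long, the same pattern can be wrapped in a small helper. The sketch below relies only on the plot(...) arguments documented in this README (gene, show, output); the helper name plot_genes, the one-file-per-gene naming, and the ".pdf" default are assumptions for illustration.

```python
from pathlib import Path


def plot_genes(parser, genes, out_dir, suffix=".pdf"):
    """Plot each gene with one built parser and return {gene: output path}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for gene in genes:
        target = out_dir / f"{gene}{suffix}"
        # show=False avoids opening a window; output= asks plot() to save the figure.
        parser.plot(gene, show=False, output=str(target))
        written[gene] = target
    return written
```

With the parser built above, a call such as plot_genes(parser, ["TP53", "MYC"], "figures") would request one saved figure per gene without repeating the GTF and track setup.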

Use one-shot plot(...) for quick figures:

fig = (
    DrViz()
    .load_gtf("genes.gtf")
    .add_bed_track("repeats.bed", label="TE")
    .plot("TP53", show=False)
)

ReusableParser.plot(...) accepts these parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| gene | str or list[str] | required | Gene ID, gene name, transcript ID, or same-chromosome list of identifiers to plot. |
| transcript_to_show | str, list[str], or None | None | Restrict output to one transcript or selected transcripts from the requested gene. |
| output | str or None | None | Optional output path. When set, figure is saved with matplotlib using tight bounding box and 300 DPI. |
| figsize | tuple or None | None | Explicit final figure size in inches. Overrides automatically computed size. |
| figfact | tuple or None | None | Multiplicative width/height factor applied to automatically computed figure size. Ignored when figsize is set. |
| show | bool | True | Display figure through matplotlib. If False, figure is closed after creation but still returned. |
| close | bool | False | Close figure after showing it. Applies when show=True. |
| **kwargs | any | | Forwarded to the visualizer, including transcript sorting and layout options. |
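
The interaction between figsize and figfact described above can be summarized as a small precedence rule. The sketch below illustrates that rule only; it assumes figfact is a (width factor, height factor) pair, and the function name resolve_figsize and the automatic size used here are hypothetical, not taken from drVizer's code.

```python
def resolve_figsize(auto_size, figsize=None, figfact=None):
    """Return the final (width, height) in inches."""
    if figsize is not None:
        return tuple(figsize)  # explicit size wins; figfact is ignored
    if figfact is not None:
        width_factor, height_factor = figfact
        return (auto_size[0] * width_factor, auto_size[1] * height_factor)
    return tuple(auto_size)


auto = (8.0, 4.0)  # hypothetical automatically computed size
default_size = resolve_figsize(auto)                                    # (8.0, 4.0)
scaled_size = resolve_figsize(auto, figfact=(1.5, 0.5))                 # (12.0, 2.0)
explicit_size = resolve_figsize(auto, figsize=(10, 6), figfact=(2, 2))  # (10, 6)
```

In practice, figfact suits cases where the automatic layout is roughly right but needs to be wider or flatter, while figsize suits fixed journal column dimensions.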

Transcript-coordinate workflows

Set transcript_coord=True when BED or BAM records use transcript IDs instead of genomic chromosome names.

.add_bed_track("mods.transcript.bed", transcript_coord=True)
.add_bam_track("reads.transcript.bam", transcript_coord=True)

drVizer projects transcript-coordinate intervals and coverage back into genomic plotting space through the loaded GTF model.
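
The projection idea can be sketched independently of the library: walk the exons in transcript order and subtract exon lengths until the offset falls inside one. The example below assumes 0-based offsets and 0-based half-open exon intervals, which may differ from drVizer's internal conventions; the function name transcript_to_genomic and the exon coordinates are hypothetical.

```python
def transcript_to_genomic(offset, exons, strand="+"):
    """Map a 0-based transcript offset to a 0-based genomic position.

    exons: list of (start, end) genomic intervals, 0-based half-open.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    # Transcript order is ascending on "+" and descending on "-".
    ordered = sorted(exons, reverse=(strand == "-"))
    remaining = offset
    for start, end in ordered:
        length = end - start
        if remaining < length:
            return start + remaining if strand == "+" else end - 1 - remaining
        remaining -= length
    raise ValueError("offset lies beyond the end of the transcript")


exons = [(100, 110), (200, 220)]  # hypothetical two-exon transcript
first_plus = transcript_to_genomic(0, exons)          # 100
second_exon = transcript_to_genomic(12, exons)        # 202
first_minus = transcript_to_genomic(0, exons, "-")    # 219
last_minus = transcript_to_genomic(25, exons, "-")    # 104
```

An interval that spans an exon junction in transcript space maps to two or more genomic blocks, which is why projected BED records and coverage can appear split across exons in the final figure.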

Split transcript tracks

Transcript-coordinate BED and BAM tracks can be split by transcript with split_by_transcript:

split_by_transcript="nc"  # transcript-major order
split_by_transcript="cn"  # track-major order

Supported modes:

  • None: keep combined track behavior.
  • "nc": transcript-major ordering; each transcript groups its split tracks together.
  • "cn": track-major ordering; each track groups its transcript-specific subtracks together.

Split transcript tracks require transcript_coord=True and do not support multi-gene plotting.
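
The two orderings differ only in which loop is outermost. The sketch below lists the resulting (transcript, track) rows for each mode; it illustrates the ordering as described above and does not reproduce drVizer's layout code. The function name split_row_order and the labels are hypothetical.

```python
from itertools import product


def split_row_order(tracks, transcripts, mode):
    """Return (transcript, track) rows in the order a split layout would stack them."""
    if mode == "nc":  # transcript-major: all tracks for one transcript, then the next
        return [(tx, tr) for tx, tr in product(transcripts, tracks)]
    if mode == "cn":  # track-major: all transcripts for one track, then the next
        return [(tx, tr) for tr, tx in product(tracks, transcripts)]
    raise ValueError("mode must be 'nc' or 'cn'")


tracks = ["m6A", "Coverage"]
transcripts = ["T1", "T2"]
nc_rows = split_row_order(tracks, transcripts, "nc")
# [("T1", "m6A"), ("T1", "Coverage"), ("T2", "m6A"), ("T2", "Coverage")]
cn_rows = split_row_order(tracks, transcripts, "cn")
# [("T1", "m6A"), ("T2", "m6A"), ("T1", "Coverage"), ("T2", "Coverage")]
```

Transcript-major ordering reads well when each isoform is inspected on its own; track-major ordering makes it easier to compare one signal across isoforms.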

Numeric y-axis control

Numeric tracks support two y-axis controls:

  • y_axis_range: manually fix the y-axis maximum.
  • y_axis_group: share automatic y-axis scaling across numeric tracks with the same group name.

y_axis_range takes precedence over y_axis_group.

parser = (
    DrViz()
    .load_gtf("genes.gtf")
    .add_bam_track("control.bam", label="Control", y_axis_group="reads")
    .add_bam_track("treated.bam", label="Treated", y_axis_group="reads")
    .add_bed_track("m6a.bed", label="m6A", parser_type="score", y_axis_group="mod_score")
    .build()
)
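
The precedence between the two controls can be sketched on plain dictionaries. The example below is an illustration under stated assumptions, not drVizer's scaling code: the function name resolve_y_max and the track records are hypothetical, and whether a track with a fixed y_axis_range still contributes its values to the group maximum is an assumption made here.

```python
def resolve_y_max(tracks):
    """Return {label: y-axis maximum} for a list of numeric track records.

    Each record is a dict with 'label', 'values', and optional
    'y_axis_range' and 'y_axis_group' keys.
    """
    # First pass: the largest value seen in each named group.
    group_max = {}
    for t in tracks:
        group = t.get("y_axis_group")
        if group is not None:
            group_max[group] = max(group_max.get(group, 0), max(t["values"], default=0))
    # Second pass: fixed range first, then the shared group maximum, then the track's own maximum.
    resolved = {}
    for t in tracks:
        if t.get("y_axis_range") is not None:
            resolved[t["label"]] = t["y_axis_range"]
        elif t.get("y_axis_group") is not None:
            resolved[t["label"]] = group_max[t["y_axis_group"]]
        else:
            resolved[t["label"]] = max(t["values"], default=0)
    return resolved


tracks = [
    {"label": "Control", "values": [3, 9, 4], "y_axis_group": "reads"},
    {"label": "Treated", "values": [20, 5, 1], "y_axis_group": "reads"},
    {"label": "Fixed", "values": [7, 2], "y_axis_group": "reads", "y_axis_range": 10},
    {"label": "m6A", "values": [0.2, 0.7]},
]
y_max = resolve_y_max(tracks)
# {"Control": 20, "Treated": 20, "Fixed": 10, "m6A": 0.7}
```

Control and Treated end up on the same scale because they share the "reads" group, while the fixed range on the third track overrides the group value.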

Output behavior

drVizer renders matplotlib figures and returns the generated Figure object from plot(...).

fig = parser.plot("TP53", show=False, output="tp53.pdf")
fig.set_size_inches((10, 6))
fig.savefig("tp53.png", dpi=300, bbox_inches="tight")

Documentation

Detailed docs live in docs/.

Testing

Use the project DRS environment for validation:

/datf/hanxi/software/miniconda3/envs/DRS/bin/python -m pytest -q

Public entry point

from drvizer import DrViz
