Commit 3aafa95

Merge pull request #7 from MPUSP/dev
feat: added pixi env and R circos plot example
2 parents 1505a9a + 4a9b1f1 commit 3aafa95

10 files changed

Lines changed: 5951 additions & 0 deletions

.gitattributes

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+# SCM syntax highlighting & preventing 3-way merges
+pixi.lock merge=binary linguist-language=YAML linguist-generated=true -diff

.github/workflows/test.yml

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+name: Test
+
+on:
+  pull_request:
+    branches: [main]
+
+jobs:
+  test-notebooks:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+
+      - uses: prefix-dev/setup-pixi@v0
+        with:
+          # add separate env for each test
+          environments: >-
+            circlize
+          cache: true
+
+      - name: test-circlize
+        run: pixi run -e circlize test-circlize

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -6,3 +6,6 @@
 *.Rproj
 output/*
 *.tar.gz
+# pixi environments
+.pixi/*
+!.pixi/config.toml

README.md

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ This repository is a collection of reusable, self-contained code chunks and exam
 - [Plot sequence logos with `logomaker` (python)](https://MPUSP.github.io/bioinfo-code-chunks/plot_logos.html)
 - [Plot coverage tracks (R)](https://MPUSP.github.io/bioinfo-code-chunks/plot_coverage.nb.html)
 - [Plot Circos genomes with `pycircos` (python)](https://MPUSP.github.io/bioinfo-code-chunks/plot_circos.html)
+- [Plot Circos genomes with `circlize` (R)](https://MPUSP.github.io/bioinfo-code-chunks/plot_circos.nb.html)
 - [Homology search for protein sequences (python)](https://MPUSP.github.io/bioinfo-code-chunks/homology_search.html)
 - [ENA fastq data submission (python)](https://MPUSP.github.io/bioinfo-code-chunks/ena_submission.html)
 

docs/plot_circos.nb.html

Lines changed: 2056 additions & 0 deletions
Large diffs are not rendered by default.

output/circlize.png

617 KB

pipeline/plot_circos.Rmd

Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
+---
+title: "Plot Circos plots with R `circlize`"
+author: Michael Jahn
+date: "`r format(Sys.time(), '%d %B, %Y')`"
+output:
+  html_notebook:
+    theme: cosmo
+    toc: no
+    number_sections: no
+  html_document:
+    toc: no
+    df_print: paged
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+## Background
+
+- `circlize` is a powerful R package for circular visualizations, so-called 'Circos' plots
+- Circos plots are a great way to visualize genomic data in a compact and informative way
+- typically, they consist of a circular layout with different tracks representing various genomic features, such as annotated genes, GC content and GC skew, and overlaid coverage or interaction data
+
+## Libraries and test data
+
+### Packages
+
+- `circlize` can be installed from within R
+- other packages used in this tutorial are `tidyverse`, `Biostrings`, `GenomicFeatures`, `GenomicRanges`, and `rtracklayer`
+
+```{r, eval = FALSE}
+install.packages("circlize")
+```
+
+- you can also use conda/mamba or pixi to install dependencies in a dedicated environment:
+
+```{bash, eval = FALSE}
+pixi init
+pixi add r-circlize
+...
+```
+
+- to render this notebook automatically with the enclosed pixi env, run:
+
+```{bash, eval = FALSE}
+pixi run test-notebook
+```
+
+- to start an interactive shell with the environment, run:
+
+```{bash, eval = FALSE}
+pixi shell --environment circlize
+```
+
+- load required libraries
+
+```{r}
+suppressPackageStartupMessages({
+  library(tidyverse)
+  library(circlize)
+  library(Biostrings)
+  library(GenomicRanges)
+  library(GenomicFeatures)
+  library(rtracklayer)
+})
+```
+
+### Import utility functions
+
+- `validate_genomic_input` takes as input two data frames, one with genomic coordinates and one with chromosome information, and checks if coordinates correspond
+- `plot_circlize` takes as input two objects, a DNA sequence as `DNAStringSet` and a `GRangesList` with genomic features
+- from this data it will automatically plot a circular (genome) map with standard features and tracks
+- additional features or data can be plotted as additional tracks, see examples below
+
+```{r}
+source("../source/circlize.R")
+```
+
+### Import genome annotation
+
+- we import a `*.fasta` and a `*.gff` file corresponding to the same genome assembly
+- we truncate the genome seqname(s) such that GFF and FASTA match
+
+```{r}
+fasta <- Biostrings::readDNAStringSet("../data/spyogenes_genome.fna")
+gff <- rtracklayer::import("../data/spyogenes_genome.gff")
+
+names(fasta) <- stringr::str_split_i(names(fasta), "[ \\|]", 1)
+```
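The seqname truncation in the chunk above can be illustrated on a single header. This is a minimal sketch in base R, assuming a typical NCBI-style FASTA header; the description text in the header string is illustrative and not read from the repository's data. `sub()` stands in for `stringr::str_split_i()` so the snippet runs without extra packages.

```r
# Illustrative FASTA header; the description text is an assumption,
# only the accession "NC_002737.2" appears in the notebook itself.
header <- "NC_002737.2 Streptococcus pyogenes M1 GAS, complete genome"

# Keep everything before the first space or pipe character, which mirrors
# splitting on "[ \\|]" and taking the first piece.
seqname <- sub("[ |].*$", "", header)

# The same rule applied to a pipe-delimited header (also illustrative)
seqname_pipe <- sub("[ |].*$", "", "NC_002737.2|some_description")

print(seqname)
print(seqname_pipe)
```

After truncation, the FASTA name is the bare accession, which is the form GFF files usually carry in their first column.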
+
+### Check annotation data
+
+- the plotting function contains an internal function to validate the genomic coordinates
+- however, we can also check this up front and make corrections if necessary
+
+```{r}
+# genome info
+df_chroms <- data.frame(
+  name = names(fasta),
+  start = rep(0, length(fasta)),
+  end = width(fasta)
+)
+
+# gene annotation
+genes <- gff[gff$type == "gene"]
+df_genes <- tibble(
+  chr = as.character(seqnames(genes)),
+  start = start(genes),
+  end = end(genes)
+)
+
+# validate if genomic coordinates from annotation and chromosome info correspond
+df_genes <- validate_genomic_input(df_genes, df_chroms)
+```
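The kind of check described above can be sketched as follows. `source/circlize.R` is not part of the rendered diff, so this is not the repository's `validate_genomic_input`. The helper `check_coords_sketch` and the toy data are hypothetical, and dropping out-of-range features with a warning is an assumption about one reasonable behavior.

```r
# Hypothetical helper (not the repository's `validate_genomic_input`):
# keep only features that lie on a known chromosome and within its bounds.
check_coords_sketch <- function(df_features, df_chroms) {
  # look up the chromosome end for every feature; NA if the name is unknown
  chrom_end <- df_chroms$end[match(df_features$chr, df_chroms$name)]
  ok <- !is.na(chrom_end) &
    df_features$start >= 0 &
    df_features$start <= df_features$end &
    df_features$end <= chrom_end
  if (any(!ok)) {
    warning(sum(!ok), " feature(s) outside chromosome bounds were dropped")
  }
  df_features[ok, , drop = FALSE]
}

# toy data: one chromosome of length 1000 and three features,
# one past the chromosome end and one on an unknown chromosome
df_chroms_toy <- data.frame(name = "chr1", start = 0, end = 1000)
df_features_toy <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  start = c(10, 900, 5),
  end = c(50, 1200, 20)
)

valid <- suppressWarnings(check_coords_sketch(df_features_toy, df_chroms_toy))
print(valid)
```

Running a check like this before plotting surfaces seqname mismatches between GFF and FASTA early, rather than as an error from inside the plotting call.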
+
+- we can also prepare extra data tracks that we supply as a named list including the desired settings
+
+```{r}
+extra <- list(
+  experiment = list(
+    data = data.frame(
+      chr = "NC_002737.2",
+      start = df_genes$start[seq(1, nrow(df_genes), by = 10)],
+      end = df_genes$end[seq(1, nrow(df_genes), by = 10)],
+      value = rnorm(ceiling(nrow(df_genes) / 10), mean = 10, sd = 5)
+    ),
+    type = "points",
+    color = "#96389f",
+    height = 0.07,
+    ylim = c(0, 20)
+  )
+)
+
+extra[["experiment2"]] <- list(
+  data = data.frame(
+    chr = "NC_002737.2",
+    start = df_genes$start[seq(1, nrow(df_genes), by = 10)],
+    end = df_genes$end[seq(1, nrow(df_genes), by = 10)],
+    value = rep(1, ceiling(nrow(df_genes) / 10))
+  ),
+  type = "rect",
+  color = sample(colors(), ceiling(nrow(df_genes) / 10))
+)
+```
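Both extra tracks take every tenth gene via `seq(1, nrow(df_genes), by = 10)` and size the `value` and `color` vectors with `ceiling(nrow(df_genes) / 10)`; the two lengths agree for any non-empty table. The sketch below shows this on a toy table, computing the index once and reusing it. The toy data frame, its 23 rows, and the fixed seed are assumptions made for illustration; the notebook itself does not set a seed.

```r
# toy stand-in for `df_genes`; the 23 rows and coordinates are made up
df_genes_toy <- data.frame(
  start = seq(1, by = 100, length.out = 23),
  end = seq(80, by = 100, length.out = 23)
)

# index of every tenth gene, computed once and reused for all columns
idx <- seq(1, nrow(df_genes_toy), by = 10)

# the index length equals ceiling(n / 10), the count used in the notebook
stopifnot(length(idx) == ceiling(nrow(df_genes_toy) / 10))

set.seed(1) # assumption: a fixed seed, so the random values are reproducible
track_toy <- data.frame(
  chr = "NC_002737.2",
  start = df_genes_toy$start[idx],
  end = df_genes_toy$end[idx],
  value = rnorm(length(idx), mean = 10, sd = 5)
)

print(dim(track_toy))
```

Deriving every column from a single `idx` keeps the vector lengths consistent if the step size is changed later.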
+
+### Plot Circos plot and save to disk
+
+- use PNG to avoid the very large files that vector formats like PDF or SVG can produce
+- plotting can take a while as there is a lot of information
+
+```{r, message = FALSE, warning = FALSE, results = "hide"}
+png("../output/circlize.png", width = 2000, height = 2000, res = 300)
+plot_circlize(fasta, gff, extra = extra)
+dev.off()
+```
+
+```{r, echo = FALSE}
+# display PNG file here
+knitr::include_graphics("../output/circlize.png")
+```
