RNAseq differential expression analysis pipeline using the airway dataset as a worked example.
The analysis (rnaseq_analysis.qmd) covers the following steps:
| Step | Tool | Output |
|---|---|---|
| Differential expression | DESeq2 | deseq2_results.csv |
| Heatmap (top 50 variable genes) | ComplexHeatmap | inline figure |
| Volcano plot | EnhancedVolcano | inline figure |
| Gene-set enrichment (GSEA) | clusterProfiler GSEA() |
gsea_results.csv, inline figure |
| Over-representation analysis (ORA) | clusterProfiler enricher() |
ora_results.csv, inline figure |
Gene sets for both GSEA and ORA are the CP:KEGG_MEDICUS collection sourced from MSigDB via the msigdbr package.
Install the required packages from Bioconductor and CRAN:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(c(
"airway",
"DESeq2",
"ComplexHeatmap",
"EnhancedVolcano",
"clusterProfiler",
"org.Hs.eg.db",
"AnnotationDbi"
))
install.packages(c("msigdbr", "dplyr", "tibble", "circlize"))ashr (used for LFC shrinkage) is also required:
install.packages("ashr")
Alternatively, use the provided Dockerfile to build a container, changing the information between the <> characters as needed:
docker buildx build --platform linux/amd64,linux/arm64 --push -t <dockerhub_username>/<container_name>:<container_tag> .
Or Singularity pull to use on Oscar with Open On Demand RStudio on Singularity (https://docs.ccv.brown.edu/oscar/connecting-to-oscar/open-ondemand):
singularity pull chordoma_rna_example.sif docker://jwalla12/chordoma_rna_example:quarto
Open rnaseq_analysis.qmd in RStudio (or another Quarto-aware editor) to run the
analysis interactively chunk by chunk, or render it into a self-contained HTML report:
quarto::quarto_render("rnaseq_analysis.qmd")or from a terminal:
quarto render rnaseq_analysis.qmdAll CSV output files are written to the working directory. Figures are embedded directly in the rendered HTML document.
The airway
package provides RNA-Seq read counts from an experiment on airway smooth muscle cells
treated with dexamethasone (Himes et al., 2014, PLoS ONE). The contrast of interest is
treated (trt) vs untreated (untrt) while accounting for cell-line (cell) as a
covariate.
- Design formula:
~ cell + dex - Pre-filtering: genes with fewer than 10 total counts removed
- LFC shrinkage with the
ashrmethod
- Top 50 most variable genes selected after variance-stabilising transformation (VST)
- Rows Z-score normalised; columns annotated by treatment and cell line
- X-axis: shrunken log2 fold-change
- Y-axis: adjusted p-value
- Thresholds: |LFC| > 1, padj < 0.05
- Gene sets: CP:KEGG_MEDICUS (MSigDB C2 category, Homo sapiens)
- GSEA: genes ranked by DESeq2 Wald statistic;
GSEA()witheps = 0 - ORA: significant DE genes (padj < 0.05, |LFC| > 1) tested against all genes
with a valid symbol as the background universe;
enricher()used