Commit 10c9b66

committed
Updated vignette
1 parent 94c9455 commit 10c9b66

5 files changed

Lines changed: 115 additions & 114 deletions

File tree

vignettes/bulkAnalyseR.Rmd

Lines changed: 115 additions & 90 deletions
Original file line number | Diff line number | Diff line change
@@ -16,24 +16,53 @@ knitr::opts_chunk$set(
1616
)
1717
```
1818

19-
Bulk mRNA-seq experiments are essential for exploring a wide range of biological questions. To bring closer the data analysis to its interpretation and facilitate both interactive, exploratory tasks and the sharing of (easily accessible) information, we present bulkAnalyseR an R package that offers a seamless, customisable solution for most bulk RNAseq datasets. By integrating state-of-the-art approaches without relying on extensive computational support, and replacing static images with interactive panels, our aim is to further support and strengthen the reusability of data. bulkAnalyseR enables standard analyses of bulk data, using an expression matrix as starting point. It presents the outputs of various steps in an interactive web-based interface, making it easy to generate, explore and verify hypotheses. Moreover, the app can be easily shared and published, incentivising research reproducibility and allowing others to explore the same processed data and enhance the biological conclusions.
19+
Bulk mRNA-seq experiments are essential for exploring a wide range of biological questions. To bring the data analysis closer to its interpretation and facilitate both interactive, exploratory tasks and the sharing of (easily accessible) information, we present *bulkAnalyseR*, an R package that offers a seamless, customisable solution for most bulk RNA-seq datasets. By integrating state-of-the-art approaches without relying on extensive computational support, and replacing static images with interactive panels, we aim to further support and strengthen the reusability of data. bulkAnalyseR enables standard analyses of bulk data, using an expression matrix as a starting point. It presents the outputs of various steps in an interactive web-based interface, making it easy to generate, explore and verify hypotheses. Moreover, the app can be easily shared and published, incentivising research reproducibility and allowing others to explore the same processed data and enhance the biological conclusions.
2020

2121
```{r workflow, echo = FALSE, out.width = "80%"}
2222
knitr::include_graphics("figures/workflow.png")
2323
```
2424

2525
## Installation
2626

27-
To install the package, first install all bioconductor dependencies:
27+
To install the package, first install all CRAN dependencies:
28+
29+
```{r cran_install, eval=FALSE}
30+
packages.cran <- c("ggplot2",
31+
"shiny",
32+
"shinythemes",
33+
"gprofiler2",
34+
"stats",
35+
"ggrepel",
36+
"utils",
37+
"RColorBrewer",
38+
"circlize",
39+
"shinyWidgets",
40+
"shinyjqui",
41+
"dplyr",
42+
"magrittr",
43+
"ggforce",
44+
"rlang",
45+
"glue",
46+
"matrixStats",
47+
"noisyr",
48+
"tibble",
49+
"ggnewscale",
50+
"ggrastr",
51+
"visNetwork")
52+
new.packages.cran <- packages.cran[!(packages.cran %in% installed.packages()[, "Package"])]
53+
if(length(new.packages.cran))
54+
install.packages(new.packages.cran)
55+
```
56+
57+
Then install the Bioconductor dependencies:
2858

2959
```{r bioc_install, eval = FALSE}
30-
packages.bioc <- c("edgeR",
60+
packages.bioc <- c("edgeR",
3161
"DESeq2",
3262
"preprocessCore",
3363
"AnnotationDbi",
3464
"GENIE3",
3565
"ComplexHeatmap")
36-
)
3766
3867
new.packages.bioc <- packages.bioc[!(packages.bioc %in% installed.packages()[,"Package"])]
3968
if(length(new.packages.bioc)){
@@ -43,40 +72,9 @@ if(length(new.packages.bioc)){
4372
}
4473
```
4574

46-
Then, you can install *bulkAnalyseR* (and all its other dependencies) from CRAN:
47-
48-
```{r cran_install, eval=FALSE}
49-
install.packages("bulkAnalyseR")
50-
```
51-
52-
To install the latest stable version from GitHub, first install CRAN dependencies:
75+
Finally, you can install the latest stable version of *bulkAnalyseR* from GitHub:
5376

5477
```{r github_install, eval = FALSE}
55-
packages.cran <- c("ggplot2",
56-
"shiny",
57-
"shinythemes",
58-
"gprofiler2",
59-
"stats",
60-
"ggrepel",
61-
"utils",
62-
"RColorBrewer",
63-
"circlize",
64-
"shinyWidgets",
65-
"shinyjqui",
66-
"dplyr",
67-
"magrittr",
68-
"ggforce",
69-
"rlang",
70-
"glue",
71-
"matrixStats",
72-
"noisyr",
73-
"tibble",
74-
"ggnewscale",
75-
"ggrastr",
76-
"visNetwork")
77-
new.packages.cran <- packages.cran[!(packages.cran %in% installed.packages()[,"Package"])]
78-
if(length(new.packages.cran))
79-
install.packages(new.packages.cran)
8078
if (!requireNamespace("devtools", quietly = TRUE))
8179
install.packages("devtools")
8280
@@ -93,10 +91,10 @@ library(bulkAnalyseR)
9391

9492
### Loading an expression matrix
9593

96-
For this demonstration we will be using a subset of the count matrix for an experiment included in [a 2019 paper by Yang et al](https://www.sciencedirect.com/science/article/pii/S2405471219301152). Rows represent genes/features and columns represent samples:
94+
For this vignette we are using a subset of the count matrix for an experiment included in [a 2019 paper by Yang et al](https://www.sciencedirect.com/science/article/pii/S2405471219301152). Rows represent genes/features and columns represent samples:
9795

9896
```{r read}
99-
counts.in <- system.file("extdata", "counts_raw.csv", package = "bulkAnalyseR")
97+
counts.in <- system.file("extdata", "expression_matrix.csv", package = "bulkAnalyseR")
10098
exp <- as.matrix(read.csv(counts.in, row.names = 1))
10199
head(exp)
102100
```
@@ -112,9 +110,9 @@ meta <- data.frame(
112110
)
113111
```
114112

115-
```{r convert type, include=FALSE}
116-
meta$srr=as.character(meta$srr)
117-
meta$timepoint=as.character(meta$timepoint)
113+
```{r convert type, include = FALSE, eval = FALSE}
114+
meta$srr = as.character(meta$srr)
115+
meta$timepoint = as.character(meta$timepoint)
118116
```
119117

120118
This metadata table should be a data frame containing at minimum two columns: the first column must contain the column names of the expression.matrix, while the last column is assumed to contain the experimental conditions that will be tested for differential expression.
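
The requirement above can be checked before building the app. The following is a minimal sketch on made-up toy data; `check_metadata` is a hypothetical helper and not a *bulkAnalyseR* function, and it only verifies that every sample in the matrix has a metadata row (sample order is not checked):

```r
# Toy inputs, made up for illustration only
expression.matrix <- matrix(
  1:12, nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("S", 1:4))
)
metadata <- data.frame(
  id = paste0("S", 1:4),
  condition = c("ctrl", "ctrl", "treated", "treated")
)

# Hypothetical helper (not part of bulkAnalyseR): the first metadata column
# must cover the column names of the expression matrix, and at least two
# columns must be present (sample IDs first, conditions last)
check_metadata <- function(expression.matrix, metadata) {
  stopifnot(is.data.frame(metadata), ncol(metadata) >= 2)
  ids <- as.character(metadata[[1]])
  missing.ids <- setdiff(colnames(expression.matrix), ids)
  if (length(missing.ids) > 0) {
    stop("Samples without metadata: ", paste(missing.ids, collapse = ", "))
  }
  invisible(TRUE)
}

check_metadata(expression.matrix, metadata)
```

If a sample is missing from the first metadata column, the helper stops with a message naming it.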
@@ -124,45 +122,46 @@ This metadata table should be a data frame containing at minimum two columns: th
124122
Before using the expression matrix to create our shiny app, some preprocessing should be performed. *bulkAnalyseR* contains the function **preprocessExpressionMatrix**, which takes the expression matrix as input, denoises the data using [*noisyR*](https://github.com/Core-Bioinformatics/noisyR) and normalises it using either quantile (by default) or RPM normalisation (specified using the *normalisation.method* parameter). By specifying *output.plot = TRUE*, you can also print the expression-similarity line plots from *noisyR* to the console, and you can pass further parameters on to the *noisyR* function [*noisyr_counts*](https://core-bioinformatics.github.io/noisyR/reference/noisyr_counts.html).
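
For intuition, RPM (reads per million) normalisation divides each sample by its library size and rescales to one million reads. The sketch below uses a made-up toy matrix and base R only; it illustrates the standard definition and is not necessarily how *preprocessExpressionMatrix* implements it internally:

```r
# Toy count matrix (made-up values): rows are genes, columns are samples
counts <- matrix(
  c(10, 30, 60,
     5,  5, 90),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)

# Divide each column by its library size, then scale to one million reads
rpm <- sweep(counts, 2, colSums(counts), "/") * 1e6

# After RPM normalisation every sample sums to one million
colSums(rpm)
```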
125123

126124
```{r preprocess,fig.width=7, fig.height=5}
127-
exp.proc <- preprocessExpressionMatrix(exp,
128-
output.plot = TRUE)
125+
exp.proc <- preprocessExpressionMatrix(exp, output.plot = TRUE)
129126
```
130127

131128
It is not recommended to use data which has not been denoised and normalised as input to *generateShinyApp*. You can also perform your own preprocessing outside *preprocessExpressionMatrix*.
132129

133130
## Creating a shiny app
134131

135-
The central function in *bulkAnalyseR* is **generateShinyApp**. This function creates an app.R file and all required objects to run the app in .rda format in the target directory. The key inputs to **generateShinyApp** are *expression.matrix* (after being processed using *preprocessExpressionMatrix*) and *meta*. You can also specify the title of the app (which will appear in the navigation bar at the top of the app) with *app.title*, the directory where the app should be saved with *shiny.dir* and the shiny theme you wish to use ('flatly' is the default, you can find the other options [here](https://rstudio.github.io/shinythemes/)). You also need to specify the organism on which your data was generated, firstly using the *organism* parameter using the *gprofiler2* naming convention e.g. 'hsapiens','mmusculus' (see [here](https://biit.cs.ut.ee/gprofiler/page/organism-list) for the full list of organisms and IDs), and secondly specifying the database for annotations to convert ENSEMBL IDs to gene names e.g. org.Hs.eg.db - the full list of bioconductor packaged databases can be seen using this command:
132+
The central function in *bulkAnalyseR* is **generateShinyApp**. This function creates an app.R file and all required objects to run the app in .rda format in the target directory. The key inputs to **generateShinyApp** are *expression.matrix* (after being processed using *preprocessExpressionMatrix*) and *metadata*. You can also specify the title of the app (which will appear in the navigation bar at the top of the app) with *app.title*, the directory where the app should be saved with *shiny.dir* and the shiny theme you wish to use ('flatly' is the default, you can find the other options [here](https://rstudio.github.io/shinythemes/)). You also need to specify the organism on which your data was generated, firstly using the *organism* parameter using the *gprofiler2* naming convention e.g. 'hsapiens','mmusculus' (see [here](https://biit.cs.ut.ee/gprofiler/page/organism-list) for the full list of organisms and IDs), and secondly specifying the database for annotations to convert ENSEMBL IDs to gene names e.g. org.Hs.eg.db - the full list of bioconductor packaged databases can be seen using this command:
136133

137134
```{r bioconductor dbs}
138135
BiocManager::available("^org\\.")
139136
```
140137

141-
The dataset in this example was generated on *M. musculus* so we would generate the app using this function call:
138+
The dataset in this example was generated on *M. musculus* so we would generate the app using this function call (note that the org.Mm.eg.db needs to be installed):
142139

143140
```{r generate app, eval=FALSE}
144-
generateShinyApp(expression.matrix = exp.proc,
145-
metadata = meta,
146-
shiny.dir = "shiny_Yang2019",
147-
app.title = "Shiny app for visualisation of three timepoints from the Yang 2019 data",
148-
organism = "mmusculus",
149-
org.db = "org.Mm.eg.db"
150-
)
141+
generateShinyApp(
142+
expression.matrix = exp.proc,
143+
metadata = meta,
144+
shiny.dir = "shiny_Yang2019",
145+
app.title = "Shiny app for visualisation of three timepoints from the Yang 2019 data",
146+
organism = "mmusculus",
147+
org.db = "org.Mm.eg.db"
148+
)
151149
```
152150

153151
This will create a folder called *shiny_Yang2019* containing two data files, *expression_matrix.rda* and *metadata.rda*, and *app.R*, which defines the app. To see the app, you can call *shiny::runApp('shiny_Yang2019')* and the app will start. The app generated is standalone and can be shared with collaborators or published online through a platform like [shinyapps.io](https://www.shinyapps.io/). This provides an easy way for anyone to explore the data and verify the conclusions, increasing access and promoting reproducibility of the bioinformatics analysis.
154152

155-
By default, the app will have 9 panels: Sample select, QC, DE, Volcano/MA plots, DE summary, Enrichment, Expression patterns, Cross plots, GRN. You can choose to remove one or more panels using the *panels.default* parameter.
153+
By default, the app will have 9 panels: Sample select, Quality checks, Differential expression, Volcano and MA plots, DE summary, Enrichment, Expression patterns, Cross plots, GRN inference. You can choose to remove one or more panels using the *panels.default* parameter.
156154

157155
```{r only QC and DE panels, eval = FALSE}
158-
generateShinyApp(expression.matrix = exp.proc,
159-
metadata = meta,
160-
shiny.dir = "shiny_Yang2019_onlyQC_DE",
161-
app.title = "Shiny app for visualisation of three timepoints from the Yang 2019 data",
162-
organism = "mmusculus",
163-
org.db = "org.Mm.eg.db",
164-
panels.default = c('QC','DE')
165-
)
156+
generateShinyApp(
157+
expression.matrix = exp.proc,
158+
metadata = meta,
159+
shiny.dir = "shiny_Yang2019_onlyQC_DE",
160+
app.title = "Shiny app for visualisation of three timepoints from the Yang 2019 data",
161+
organism = "mmusculus",
162+
org.db = "org.Mm.eg.db",
163+
panels.default = c('QC','DE')
164+
)
166165
```
167166

168167
See the following sections for more details about the default panels:
@@ -266,10 +265,32 @@ knitr::include_graphics("figures/Enrichment.png")
266265

267266
### Expression patterns
268267

268+
The expression pattern tab allows the creation of expression patterns to identify potential genes of interest across a variety of conditions. The most common application of this is a time series, but it could be suitable for another logical progression between conditions. To define the series, the user must select a column of the metadata and drag states into the "Series of states to use" area.
269+
270+
The pattern identification is done by calculating a confidence interval for each gene in each condition, using all samples in that condition and the number of standard deviations away from the mean provided. The pattern between two consecutive conditions is defined as straight (S) if the intervals overlap and up (U) or down (D) if they don't. The full expression pattern is the concatenation of individual patterns (for example, "UUS" for 4 conditions).
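
The rule described above can be sketched in a few lines of base R. `pattern_for_gene` and its `n.sd` argument are hypothetical names used for illustration, the data are made up, and the app's own implementation may differ in detail (for example in how the interval is computed):

```r
# Hypothetical helper, not part of bulkAnalyseR: classify the change between
# consecutive conditions as "U", "D" or "S" from mean +/- n.sd * sd intervals
pattern_for_gene <- function(values, condition, n.sd = 2) {
  conds <- unique(condition)  # conditions in order of first appearance
  m <- sapply(conds, function(cn) mean(values[condition == cn]))
  s <- sapply(conds, function(cn) sd(values[condition == cn]))
  lower <- m - n.sd * s
  upper <- m + n.sd * s
  steps <- character(length(conds) - 1)
  for (i in seq_along(steps)) {
    if (lower[i + 1] > upper[i]) {
      steps[i] <- "U"  # next interval lies entirely above the current one
    } else if (upper[i + 1] < lower[i]) {
      steps[i] <- "D"  # next interval lies entirely below the current one
    } else {
      steps[i] <- "S"  # intervals overlap
    }
  }
  paste(steps, collapse = "")
}

# Toy expression values for one gene: four conditions, three replicates each
condition <- rep(c("c1", "c2", "c3", "c4"), each = 3)
values <- c(1, 1.1, 0.9,  5, 5.1, 4.9,  9, 9.2, 8.8,  9.1, 9, 8.9)
pattern_for_gene(values, condition)
```

With these toy values the gene rises twice and then stays level, so it would be assigned the "UUS" pattern.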
271+
272+
The grouped expression matrix can then be downloaded, showing which pattern each gene was assigned to. Plots are also created for the genes in the selected pattern ("Pattern to plot").
273+
269274
```{r patterns, echo = FALSE, out.width = "80%"}
270275
knitr::include_graphics("figures/ExpressionPatterns.png")
271276
```
272277

278+
#### Pattern line plot
279+
280+
A line plot is shown with the mean expression of the genes assigned to the chosen pattern in each condition. The expression values are mean-scaled by default. A legend is shown if fewer than 10 genes are present.
281+
282+
```{r patternLine, echo = FALSE, out.width = "80%"}
283+
knitr::include_graphics("figures/ExpressionPatternsLinePlot.png")
284+
```
285+
286+
#### Pattern expression heatmap
287+
288+
A heatmap is shown with the mean expression of the genes assigned to the chosen pattern in each condition. The expression values are z-score transformed by default. Gene names are shown if fewer than 50 genes are present.
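
As an illustration of a z-score transformation, the sketch below standardises each gene (row) of a made-up matrix of per-condition means so that it has mean 0 and standard deviation 1. That the app applies the transformation per gene in exactly this way is an assumption:

```r
# Toy matrix of mean expression per condition (made-up values)
mean.expr <- matrix(
  c( 1,  2,  3,
    10, 20, 30),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("c1", "c2", "c3"))
)

# scale() works on columns, so transpose, scale, and transpose back
# to obtain row-wise (per-gene) z-scores
z <- t(scale(t(mean.expr)))
z
```

Both toy genes end up with the same z-scores despite a tenfold difference in abundance, which is why this transformation makes patterns comparable across genes in a heatmap.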
289+
290+
```{r patternHeatmap, echo = FALSE, out.width = "80%"}
291+
knitr::include_graphics("figures/ExpressionPatternsHeatmap.png")
292+
```
293+
273294
### Cross plots
274295

275296
The cross plot tab allows you to compare two differential expression analyses against each other, for example two comparisons of interest or the same comparison using edgeR and DESeq2. The plot shows the log2 fold change of the two differential expression calls on each axis. Genes which are DE in both comparisons are coloured purple, in comparison 1 but not comparison 2 in blue and in comparison 2 but not comparison 1 in red. You can label selected genes and click on genes on the plot itself to gain more information and generate hypotheses.
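
The colour scheme described above amounts to a simple classification of each gene by its DE status in the two comparisons. A minimal sketch on made-up flags follows; the column names are hypothetical, and the "grey" colour for genes that are DE in neither comparison is an assumption, since it is not stated above:

```r
# Toy DE calls for four genes in two comparisons (made-up values)
cross <- data.frame(
  gene = c("g1", "g2", "g3", "g4"),
  de1  = c(TRUE, TRUE, FALSE, FALSE),   # DE in comparison 1
  de2  = c(TRUE, FALSE, TRUE, FALSE)    # DE in comparison 2
)

# Purple: DE in both; blue: comparison 1 only; red: comparison 2 only;
# grey (assumed): DE in neither comparison
cross$colour <- with(
  cross,
  ifelse(de1 & de2, "purple",
         ifelse(de1, "blue",
                ifelse(de2, "red", "grey")))
)
cross
```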
@@ -280,6 +301,8 @@ knitr::include_graphics("figures/CrossPlot.png")
280301

281302
### Gene regulatory networks (GRN)
282303

304+
The GRN tab enables the creation of small gene regulatory networks (GRNs) to facilitate further exploration and hypothesis generation based on genes of interest. Target genes can be selected and a small network with them as targets can be generated by clicking the "Start GRN inference" button. The number of regulators can then be adjusted and the plot downloaded in interactive html format. Note that target genes can also regulate each other if the selected genes are functionally similar.
305+
283306
```{r GRN, echo = FALSE, out.width = "80%"}
284307
knitr::include_graphics("figures/GRN.png")
285308
```
@@ -289,19 +312,20 @@ knitr::include_graphics("figures/GRN.png")
289312
Alongside the default 9 panels, you can also define your own panels and add them to the app. As an example, we could add an extra QC panel (in this case it will be exactly the same):
290313

291314
```{r add extra panel, eval = FALSE}
292-
generateShinyApp(expression.matrix = exp.proc,
293-
metadata = meta,
294-
shiny.dir = "shiny_Yang2019_ExtraQC",
295-
app.title = "Shiny app for visualisation of three timepoints from the Yang 2019 data - extra QC",
296-
organism = "mmusculus",
297-
org.db = "org.Mm.eg.db",
298-
panels.extra = tibble::tibble(
299-
UIfun = "QCpanelUI",
300-
UIvars = "'QC2', metadata",
301-
serverFun = "QCpanelServer",
302-
serverVars = "'QC2', expression.matrix, metadata"
303-
)
304-
)
315+
generateShinyApp(
316+
expression.matrix = exp.proc,
317+
metadata = meta,
318+
shiny.dir = "shiny_Yang2019_ExtraQC",
319+
app.title = "Shiny app for visualisation of three timepoints from the Yang 2019 data - extra QC",
320+
organism = "mmusculus",
321+
org.db = "org.Mm.eg.db",
322+
panels.extra = tibble::tibble(
323+
UIfun = "QCpanelUI",
324+
UIvars = "'QC2', metadata",
325+
serverFun = "QCpanelServer",
326+
serverVars = "'QC2', expression.matrix, metadata"
327+
)
328+
)
305329
```
306330

307331
If you need to add extra data or package imports for the extra panel(s) then you can do this using the *data.extra* and *packages.extra* parameters. Make sure you have the extra data loaded when you create the app. For example:
@@ -310,22 +334,23 @@ If you need to add extra data or package imports for the extra panel(s) then you
310334
311335
extra.data1 = matrix(rnorm(36),nrow=6)
312336
extra.data2 = matrix(rnorm(60),nrow=10)
313-
314-
generateShinyApp(expression.matrix = exp.proc,
315-
metadata = meta,
316-
shiny.dir = "shiny_Yang2019_ExtraData",
317-
app.title = "Shiny app for visualisation of three timepoints from the Yang 2019 data - extra QC",
318-
organism = "mmusculus",
319-
org.db = "org.Mm.eg.db",
320-
panels.extra = tibble::tibble(
321-
UIfun = "QCpanelUI",
322-
UIvars = "'QC2', metadata",
323-
serverFun = "QCpanelServer",
324-
serverVars = "'QC2', expression.matrix, metadata"
325-
),
326-
data.extra = c("extra.data1", "extra.data2"),
327-
packages.extra = "somePackage",
328-
)
337+
338+
generateShinyApp(
339+
expression.matrix = exp.proc,
340+
metadata = meta,
341+
shiny.dir = "shiny_Yang2019_ExtraData",
342+
app.title = "Shiny app for visualisation of three timepoints from the Yang 2019 data - extra QC",
343+
organism = "mmusculus",
344+
org.db = "org.Mm.eg.db",
345+
panels.extra = tibble::tibble(
346+
UIfun = "QCpanelUI",
347+
UIvars = "'QC2', metadata",
348+
serverFun = "QCpanelServer",
349+
serverVars = "'QC2', expression.matrix, metadata"
350+
),
351+
data.extra = c("extra.data1", "extra.data2"),
352+
packages.extra = "somePackage"
353+
)
329354
330355
```
331356

-400 KB
49.1 KB
91.3 KB

vignettes/quickStartGuide.Rmd

Lines changed: 0 additions & 24 deletions
This file was deleted.
