-Bulk mRNA-seq experiments are essential for exploring a wide range of biological questions. To bring closer the data analysis to its interpretation and facilitate both interactive, exploratory tasks and the sharing of (easily accessible) information, we present bulkAnalyseR an R package that offers a seamless, customisable solution for most bulk RNAseq datasets. By integrating state-of-the-art approaches without relying on extensive computational support, and replacing static images with interactive panels, our aim is to further support and strengthen the reusability of data. bulkAnalyseR enables standard analyses of bulk data, using an expression matrix as starting point. It presents the outputs of various steps in an interactive web-based interface, making it easy to generate, explore and verify hypotheses. Moreover, the app can be easily shared and published, incentivising research reproducibility and allowing others to explore the same processed data and enhance the biological conclusions.
+Bulk mRNA-seq experiments are essential for exploring a wide range of biological questions. To bring the data analysis closer to its interpretation and facilitate both interactive, exploratory tasks and the sharing of (easily accessible) information, we present *bulkAnalyseR* an R package that offers a seamless, customisable solution for most bulk RNAseq datasets. By integrating state-of-the-art approaches without relying on extensive computational support, and replacing static images with interactive panels, our aim is to further support and strengthen the reusability of data. bulkAnalyseR enables standard analyses of bulk data, using an expression matrix as starting point. It presents the outputs of various steps in an interactive web-based interface, making it easy to generate, explore and verify hypotheses. Moreover, the app can be easily shared and published, incentivising research reproducibility and allowing others to explore the same processed data and enhance the biological conclusions.

```{r workflow, echo = FALSE, out.width = "80%"}
knitr::include_graphics("figures/workflow.png")
```

## Installation

-To install the package, first install all bioconductor dependencies:
+To install the package, first install all CRAN dependencies:
if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")
@@ -93,10 +91,10 @@ library(bulkAnalyseR)

### Loading an expression matrix

-For this demonstration we will be using a subset of the count matrix for an experiment included in [a 2019 paper by Yang et al](https://www.sciencedirect.com/science/article/pii/S2405471219301152). Rows represent genes/features and columns represent samples:
+For this vignette we are using a subset of the count matrix for an experiment included in [a 2019 paper by Yang et al](https://www.sciencedirect.com/science/article/pii/S2405471219301152). Rows represent genes/features and columns represent samples:
```{r convert type, include = FALSE, eval = FALSE}
+meta$srr = as.character(meta$srr)
+meta$timepoint = as.character(meta$timepoint)
```

This metadata table should be a data frame containing at minimum two columns: the first column must contain the column names of the expression.matrix, while the last column is assumed to contain the experimental conditions that will be tested for differential expression.
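
As a minimal sketch of this layout, the following base R snippet builds a metadata table with the required structure and checks it against an expression matrix. The objects `toy.matrix` and `toy.meta`, the sample names and the condition labels are all made up for illustration and are not part of the Yang et al. data.

```r
# Toy expression matrix: 4 genes (rows) x 6 samples (columns); values are made up
set.seed(1)
toy.matrix <- matrix(
  rpois(24, lambda = 50),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:6))
)

# Toy metadata: the first column holds the sample names (the column names of
# the expression matrix) and the last column holds the experimental conditions
toy.meta <- data.frame(
  sample = colnames(toy.matrix),
  replicate = rep(c("r1", "r2"), times = 3),
  condition = rep(c("0h", "12h", "36h"), each = 2),
  stringsAsFactors = FALSE
)

# Checks mirroring the requirements stated above
has.two.cols <- ncol(toy.meta) >= 2
names.match <- identical(toy.meta[[1]], colnames(toy.matrix))
conditions <- unique(toy.meta[[ncol(toy.meta)]])
```

Running checks like these before building the app is one way to catch a mismatch between sample names and metadata rows early.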
@@ -124,45 +122,46 @@ This metadata table should be a data frame containing at minimum two columns: th
Before using the expression matrix to create our shiny app, some preprocessing should be performed. *bulkAnalyseR* contains the function **preprocessExpressionMatrix**, which takes the expression matrix as input, denoises the data using [*noisyR*](https://github.com/Core-Bioinformatics/noisyR), and normalises it using either quantile normalisation (the default) or RPM normalisation (selected with the *normalisation.method* parameter). Setting *output.plot = TRUE* also prints the expression-similarity line plots from *noisyR*, and you can specify further parameters of the *noisyR* function [*noisyr_counts*](https://core-bioinformatics.github.io/noisyR/reference/noisyr_counts.html).
It is not recommended to use data which has not been denoised and normalised as input to *generateShinyApp*. You can also perform your own preprocessing outside *preprocessExpressionMatrix*.
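
To make the two normalisation options more concrete, here is a minimal base R sketch of RPM and quantile normalisation on a toy matrix. It illustrates the general techniques only and is not the code used inside *preprocessExpressionMatrix*; the helper names `rpm.normalise` and `quantile.normalise` and the toy data are hypothetical.

```r
# Toy count matrix: 5 genes x 3 samples; values are made up
set.seed(42)
counts <- matrix(
  rpois(15, lambda = 100),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("sample", 1:3))
)

# RPM (reads per million): scale each sample so that its counts sum to 1e6
rpm.normalise <- function(mat) {
  sweep(mat, 2, colSums(mat), "/") * 1e6
}

# Quantile normalisation: give every sample the same distribution by replacing
# the k-th smallest value in each column with the mean of the k-th smallest
# values across all columns
quantile.normalise <- function(mat) {
  ranks <- apply(mat, 2, rank, ties.method = "first")
  sorted <- apply(mat, 2, sort)
  reference <- rowMeans(sorted)
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(mat)
  out
}

counts.rpm <- rpm.normalise(counts)
counts.qn <- quantile.normalise(counts)
```

After RPM normalisation every column sums to one million, and after quantile normalisation every column contains the same set of values, only in a different order.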

## Creating a shiny app

-The central function in *bulkAnalyseR* is **generateShinyApp**. This function creates an app.R file and all required objects to run the app in .rda format in the target directory. The key inputs to **generateShinyApp** are *expression.matrix* (after being processed using *preprocessExpressionMatrix*) and *meta*. You can also specify the title of the app (which will appear in the navigation bar at the top of the app) with *app.title*, the directory where the app should be saved with *shiny.dir* and the shiny theme you wish to use ('flatly' is the default, you can find the other options [here](https://rstudio.github.io/shinythemes/)). You also need to specify the organism on which your data was generated, firstly using the *organism* parameter using the *gprofiler2* naming convention e.g. 'hsapiens','mmusculus' (see [here](https://biit.cs.ut.ee/gprofiler/page/organism-list) for the full list of organisms and IDs), and secondly specifying the database for annotations to convert ENSEMBL IDs to gene names e.g. org.Hs.eg.db - the full list of bioconductor packaged databases can be seen using this command:
+The central function in *bulkAnalyseR* is **generateShinyApp**. This function creates an app.R file and all required objects to run the app in .rda format in the target directory. The key inputs to **generateShinyApp** are *expression.matrix* (after being processed using *preprocessExpressionMatrix*) and *metadata*. You can also specify the title of the app (which will appear in the navigation bar at the top of the app) with *app.title*, the directory where the app should be saved with *shiny.dir* and the shiny theme you wish to use ('flatly' is the default, you can find the other options [here](https://rstudio.github.io/shinythemes/)). You also need to specify the organism on which your data was generated, firstly using the *organism* parameter using the *gprofiler2* naming convention e.g. 'hsapiens','mmusculus' (see [here](https://biit.cs.ut.ee/gprofiler/page/organism-list) for the full list of organisms and IDs), and secondly specifying the database for annotations to convert ENSEMBL IDs to gene names e.g. org.Hs.eg.db - the full list of bioconductor packaged databases can be seen using this command:

```{r bioconductor dbs}
BiocManager::available("^org\\.")
```

-The dataset in this example was generated on *M. musculus* so we would generate the app using this function call:
+The dataset in this example was generated on *M. musculus* so we would generate the app using this function call (note that the org.Mm.eg.db needs to be installed):

```{r generate app, eval=FALSE}
-generateShinyApp(expression.matrix = exp.proc,
-                 metadata = meta,
-                 shiny.dir = "shiny_Yang2019",
-                 app.title = "Shiny app for visualisation of three timepoints from the Yang 2019 data",
-                 organism = "mmusculus",
-                 org.db = "org.Mm.eg.db"
-)
+generateShinyApp(
+  expression.matrix = exp.proc,
+  metadata = meta,
+  shiny.dir = "shiny_Yang2019",
+  app.title = "Shiny app for visualisation of three timepoints from the Yang 2019 data",
+  organism = "mmusculus",
+  org.db = "org.Mm.eg.db"
+)
```

This will create a folder called *shiny_Yang2019* containing two data files, *expression_matrix.rda* and *metadata.rda*, and *app.R*, which defines the app. To see the app, you can call *shiny::runApp('shiny_Yang2019')* and the app will start. The generated app is standalone and can be shared with collaborators or published online through a platform like [shinyapps.io](https://www.shinyapps.io/). This provides an easy way for anyone to explore the data and verify the conclusions, increasing access and promoting reproducibility of the bioinformatics analysis.

-By default, the app will have 9 panels: Sample select, QC, DE, Volcano/MA plots, DE summary, Enrichment, Expression patterns, Cross plots, GRN. You can choose to remove one or more panels using the *panels.default* parameter.
+By default, the app will have 9 panels: Sample select, Quality checks, Differential expression, Volcano and MA plots, DE summary, Enrichment, Expression patterns, Cross plots, GRN inference. You can choose to remove one or more panels using the *panels.default* parameter.

```{r only QC and DE panels, eval = FALSE}
-generateShinyApp(expression.matrix = exp.proc,
-                 metadata = meta,
-                 shiny.dir = "shiny_Yang2019_onlyQC_DE",
-                 app.title = "Shiny app for visualisation of three timepoints from the Yang 2019 data",
-                 organism = "mmusculus",
-                 org.db = "org.Mm.eg.db",
-                 panels.default = c('QC','DE')
-)
+generateShinyApp(
+  expression.matrix = exp.proc,
+  metadata = meta,
+  shiny.dir = "shiny_Yang2019_onlyQC_DE",
+  app.title = "Shiny app for visualisation of three timepoints from the Yang 2019 data",
+  organism = "mmusculus",
+  org.db = "org.Mm.eg.db",
+  panels.default = c('QC','DE')
+)
```

See the following sections for more details about the default panels:
The expression pattern tab allows the creation of expression patterns to identify potential genes of interest across a variety of conditions. The most common application of this is a time series, but it could be suitable for another logical progression between conditions. To define the series, the user must select a column of the metadata and drag states into the "Series of states to use" area.
+
+The pattern identification is done by calculating a confidence interval for each gene in each condition, using all samples in that condition and the number of standard deviations away from the mean provided. The pattern between two consecutive conditions is defined as straight (S) if the intervals overlap and up (U) or down (D) if they don't. The full expression pattern is the concatenation of individual patterns (for example, "UUS" for 4 conditions).
+
+The grouped expression matrix can then be downloaded, showing which pattern each gene was assigned to. Plots are also created for the genes in the selected pattern ("Pattern to plot").
A line plot is shown with the mean expression of the genes assigned to the chosen pattern in each condition. The expression values are mean-scaled by default. A legend is shown if fewer than 10 genes are present.
A heatmap is shown with the mean expression of the genes assigned to the chosen pattern in each condition. The expression values are z-score transformed by default. Gene names are shown if fewer than 50 genes are present.
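
The pattern assignment used in the expression pattern tab (S, U or D between consecutive conditions, based on whether per-condition intervals overlap) can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's own code: the helper `assign.pattern` is hypothetical, the interval is assumed to be the mean plus or minus `n.sd` standard deviations, and the expression values are made up.

```r
# Hypothetical helper: assign an S/U/D pattern to a single gene.
#   expr      numeric vector of expression values, one per sample
#   condition condition label of each sample
#   states    ordered series of conditions to compare
#   n.sd      number of standard deviations around the mean (assumed interval)
assign.pattern <- function(expr, condition, states, n.sd = 2) {
  means <- sapply(states, function(s) mean(expr[condition == s]))
  sds <- sapply(states, function(s) sd(expr[condition == s]))
  lower <- means - n.sd * sds
  upper <- means + n.sd * sds

  steps <- character(length(states) - 1)
  for (i in seq_len(length(states) - 1)) {
    if (lower[i + 1] > upper[i]) {
      steps[i] <- "U"  # next interval lies entirely above the current one
    } else if (upper[i + 1] < lower[i]) {
      steps[i] <- "D"  # next interval lies entirely below the current one
    } else {
      steps[i] <- "S"  # intervals overlap
    }
  }
  paste(steps, collapse = "")
}

# Made-up example: 4 conditions with 3 samples each
states <- c("t0", "t1", "t2", "t3")
condition <- rep(states, each = 3)
expr <- c(10, 11, 12, 30, 31, 32, 60, 61, 62, 59, 61, 63)
pattern <- assign.pattern(expr, condition, states, n.sd = 2)
```

With these values the gene rises twice and then stays level, so `pattern` is `"UUS"`, which matches the 4-condition example mentioned in the text.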
The cross plot tab allows you to compare two differential expression analyses against each other, for example two comparisons of interest or the same comparison using edgeR and DESeq2. The plot shows the log2 fold change of the two differential expression calls on each axis. Genes which are DE in both comparisons are coloured purple, in comparison 1 but not comparison 2 in blue and in comparison 2 but not comparison 1 in red. You can label selected genes and click on genes on the plot itself to gain more information and generate hypotheses.
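
The colouring rule described for the cross plot can be sketched in base R like this. The data frame `de`, its column names, the thresholds and the grey colour for genes that are DE in neither comparison are assumptions made for illustration; they are not taken from the package.

```r
# Hypothetical DE results for two comparisons; all values are made up
de <- data.frame(
  gene = paste0("gene", 1:5),
  log2FC.1 = c(2.1, -1.8, 0.2, 1.5, -0.1),
  padj.1 = c(0.001, 0.010, 0.800, 0.200, 0.900),
  log2FC.2 = c(1.9, -0.3, 0.1, 2.2, -2.5),
  padj.2 = c(0.002, 0.600, 0.700, 0.004, 0.030),
  stringsAsFactors = FALSE
)

# Assumed significance thresholds (not taken from the package)
padj.threshold <- 0.05
lfc.threshold <- 1

is.de.1 <- de$padj.1 < padj.threshold & abs(de$log2FC.1) > lfc.threshold
is.de.2 <- de$padj.2 < padj.threshold & abs(de$log2FC.2) > lfc.threshold

# Purple: DE in both; blue: comparison 1 only; red: comparison 2 only.
# Grey for genes that are DE in neither comparison is an assumption.
de$colour <- ifelse(
  is.de.1 & is.de.2, "purple",
  ifelse(is.de.1, "blue", ifelse(is.de.2, "red", "grey"))
)
```

A call such as `plot(de$log2FC.1, de$log2FC.2, col = de$colour, pch = 19)` would then draw a static version of the cross plot, with one comparison on each axis.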
The GRN tab enables the creation of small gene regulatory networks (GRNs) to facilitate further exploration and hypothesis generation based on genes of interest. Target genes can be selected and a small network with them as targets can be generated by clicking the "Start GRN inference" button. The number of regulators can then be adjusted and the plot downloaded in interactive html format. Note that target genes can also regulate each other if the selected genes are functionally similar.
Alongside the 9 default panels, you can also define your own panels and add them to the app. As an example, we could add an extra QC panel (in this case it will be exactly the same as the default one):

```{r add extra panel, eval = FALSE}
-generateShinyApp(expression.matrix = exp.proc,
-                 metadata = meta,
-                 shiny.dir = "shiny_Yang2019_ExtraQC",
-                 app.title = "Shiny app for visualisation of three timepoints from the Yang 2019 data - extra QC",
-                 organism = "mmusculus",
-                 org.db = "org.Mm.eg.db",
-                 panels.extra = tibble::tibble(
-                   UIfun = "QCpanelUI",
-                   UIvars = "'QC2', metadata",
-                   serverFun = "QCpanelServer",
-                   serverVars = "'QC2', expression.matrix, metadata"
-                 )
-)
+generateShinyApp(
+  expression.matrix = exp.proc,
+  metadata = meta,
+  shiny.dir = "shiny_Yang2019_ExtraQC",
+  app.title = "Shiny app for visualisation of three timepoints from the Yang 2019 data - extra QC",
+  organism = "mmusculus",
+  org.db = "org.Mm.eg.db",
+  panels.extra = tibble::tibble(
+    UIfun = "QCpanelUI",
+    UIvars = "'QC2', metadata",
+    serverFun = "QCpanelServer",
+    serverVars = "'QC2', expression.matrix, metadata"
+  )
+)
```

If you need to add extra data or package imports for the extra panel(s) then you can do this using the *data.extra* and *packages.extra* parameters. Make sure you have the extra data loaded when you create the app. For example:
@@ -310,22 +334,23 @@ If you need to add extra data or package imports for the extra panel(s) then you
extra.data1 = matrix(rnorm(36),nrow=6)
extra.data2 = matrix(rnorm(60),nrow=10)
-
-generateShinyApp(expression.matrix = exp.proc,
-                 metadata = meta,
-                 shiny.dir = "shiny_Yang2019_ExtraData",
-                 app.title = "Shiny app for visualisation of three timepoints from the Yang 2019 data - extra QC",
-                 organism = "mmusculus",
-                 org.db = "org.Mm.eg.db",
-                 panels.extra = tibble::tibble(
-                   UIfun = "QCpanelUI",
-                   UIvars = "'QC2', metadata",
-                   serverFun = "QCpanelServer",
-                   serverVars = "'QC2', expression.matrix, metadata"
-                 ),
-                 data.extra = c("extra.data1", "extra.data2"),
-                 packages.extra = "somePackage",
-)
+
+generateShinyApp(
+  expression.matrix = exp.proc,
+  metadata = meta,
+  shiny.dir = "shiny_Yang2019_ExtraData",
+  app.title = "Shiny app for visualisation of three timepoints from the Yang 2019 data - extra QC",