Commit 6181d5d

minor changes
1 parent 22ff531 commit 6181d5d

6 files changed

Lines changed: 174 additions & 164 deletions

R/PlotScores.R

Lines changed: 16 additions & 16 deletions
@@ -786,16 +786,16 @@ PlotScores_Categorical <- function(data, metadata, gene_sets,
   if (!is.null(title)) title <- wrap_title(title, width = widthTitle)
 
   # Create label for y axis based on method.
-  if (method == "ssGSEA") {
-    ylab <- "ssGSEA Enrichment Score"
-  } else if (method == "logmedian") {
-    ylab <- "Normalised Signature Score"
-  } else if (method == "ranking") {
-    ylab <- "Signature Genes' Ranking"
-  }
+  # if (method == "ssGSEA") {
+  # ylab <- "ssGSEA Enrichment Score"
+  # } else if (method == "logmedian") {
+  # ylab <- "Normalised Signature Score"
+  # } else if (method == "ranking") {
+  # ylab <- "Signature Genes' Ranking"
+  # }
 
   combined_plot <- ggpubr::annotate_figure(combined_plot,
-    left = grid::textGrob(ylab,
+    left = grid::textGrob(paste0("Gene Set's Score (", method, ")"),
       rot = 90, vjust = 1,
       gp = grid::gpar(cex = 1.3,
         fontsize = labsize)),
@@ -1093,16 +1093,16 @@ PlotScores_Numeric <- function(data,
   if (!is.null(title)) title <- wrap_title(title, width = widthTitle)
 
   # Create label for y axis based on method.
-  if (method == "ssGSEA") {
-    ylab <- "ssGSEA Enrichment Score"
-  } else if (method == "logmedian") {
-    ylab <- "Normalised Signature Score"
-  } else if (method == "ranking") {
-    ylab <- "Signature Genes' Ranking"
-  }
+  # if (method == "ssGSEA") {
+  # ylab <- "ssGSEA Enrichment Score"
+  # } else if (method == "logmedian") {
+  # ylab <- "Normalised Signature Score"
+  # } else if (method == "ranking") {
+  # ylab <- "Signature Genes' Ranking"
+  # }
 
   combined_plot <- ggpubr::annotate_figure(combined_plot,
-    left = grid::textGrob(ylab,
+    left = grid::textGrob(paste0("Gene Set's Score (", method, ")"),
       rot = 90,
       vjust = 1,
       gp = grid::gpar(cex = 1.3,
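
The effect of the `R/PlotScores.R` change on the y-axis label can be sketched in isolation. The helpers `old_ylab()` and `new_ylab()` are hypothetical names introduced only for this illustration; the package builds the label inline inside `PlotScores_Categorical` and `PlotScores_Numeric`.

```r
# Hypothetical helpers (not part of markeR) mirroring the two versions
# of the y-axis label logic shown in the diff.

# Before: one hard-coded label per method. The original code left `ylab`
# unassigned for any other method; that case is represented here by NULL.
old_ylab <- function(method) {
  if (method == "ssGSEA") {
    "ssGSEA Enrichment Score"
  } else if (method == "logmedian") {
    "Normalised Signature Score"
  } else if (method == "ranking") {
    "Signature Genes' Ranking"
  } else {
    NULL
  }
}

# After: a single generic label that embeds the method name.
new_ylab <- function(method) {
  paste0("Gene Set's Score (", method, ")")
}

methods <- c("ssGSEA", "logmedian", "ranking")
labels <- vapply(methods, new_ylab, character(1))
print(labels)
```

One consequence of the new form is that the label is defined for any `method` string, whereas the old branch-based version only covered the three listed methods.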

README.Rmd

Lines changed: 48 additions & 50 deletions
@@ -26,13 +26,13 @@ knitr::opts_chunk$set(
 
 <!-- badges: end -->
 
-**markeR** provides a suite of methods for using gene sets to quantify and evaluate the extent to which a given gene signature marks a specific phenotype from gene expression data. The package implements various scoring, enrichment and classification approaches, along with tools to compute performance metrics and visualize results.
+**`markeR`** is an R package that provides a modular and extensible framework for the systematic evaluation of gene sets as phenotypic markers using transcriptomic data. The package is designed to support both quantitative analyses and visual exploration of gene set behaviour across experimental and clinical phenotypes.
 
-> **To cite markeR please use:**
+> **To cite `markeR` please use:**
 >
 > Martins-Silva R, Kaizeler A, Barbosa-Morais N (2025). _markeR: an R Toolkit for Evaluating Gene Sets as Phenotypic Markers_. Gulbenkian Institute for Molecular Medicine, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal. R package version 0.99.4, https://github.com/DiseaseTranscriptomicsLab/markeR.
 
-The folder `inst/Paper/` is in the **paper** branch and contains all scripts and materials used in the original markeR paper to reproduce analyses and figures. You can browse it [here](https://github.com/DiseaseTranscriptomicsLab/markeR/tree/paper/inst/Paper).
+The folder `inst/Paper/` is in the **paper** branch and contains all scripts and materials used in the original `markeR` paper to reproduce analyses and figures. You can browse it [here](https://github.com/DiseaseTranscriptomicsLab/markeR/tree/paper/inst/Paper).
 
 
 ![](man/figures/Workflow.png)
@@ -69,7 +69,7 @@ library(markeR)
 ```
 
 
-Or install the latest development release of markeR from [GitHub](https://github.com/) with:
+Or install the latest development release of `markeR` from [GitHub](https://github.com/) with:
 
 ``` r
 # install.packages("devtools")
@@ -91,8 +91,6 @@ This package is officially supported on `R > 4.5.0`. ⚠️ Older versions of `R
 
 ## Common Workflow
 
-`markeR` provides a modular pipeline to quantify transcriptomic signatures and assess their association with phenotypic or clinical variables. The typical workflow includes the following steps:
-
 ### 1. Input Requirements
 
 Depending on the analysis mode, inputs vary slightly.
@@ -129,7 +127,10 @@ gene_sets
 ```
 
 * **Expression Data Frame**:
-A filtered and normalised gene expression data frame (genes × samples). Row names must be gene identifiers, and column names must match the sample IDs in the metadata.
+A filtered and normalised, non log-transformed, gene expression matrix (genes × samples). Row names must be gene identifiers; column names must match sample IDs in the metadata.
+
+**Warning:** If you are using microarray data or outputs from common RNA-seq pipelines (*e.g.*, edgeR), note that the expression values may already be log2-normalised. The input to `markeR` must necessarily be **non-log-transformed**. If your data are log2-transformed, you can revert them by applying `2^data`.
+
 
 ```{r example-expression-matrix, echo=FALSE}
 # Simulate expression matrix: 10 genes × 5 samples
@@ -145,7 +146,7 @@ head(expr_df)
 ```
 
 * **Sample Metadata**:
-A data frame with annotations for each sample, with the sample ID in the first column. The row names must match the column names of the expression matrix.
+A data frame with samples as rows and annotations as columns. The first column should contain sample IDs matching the expression matrix column names.
 
 ```{r example-metadata, echo=FALSE}
 # Simulate sample metadata
@@ -162,82 +163,79 @@ metadata
 
 ### 2. Select Mode of Analysis
 
-* **Discovery Mode**:
-Explore how a single, well-characterised gene set relates to a specific variable of interest. Suitable for hypothesis generation.
+`markeR` provides two modes of operation:
 
-* **Benchmarking Mode**:
-Evaluate one or more gene sets against multiple metadata variables using a standardised scoring and effect size framework. This mode provides comprehensive visualisations and comparisons across methods.
+* **Benchmarking**:
+evaluates gene sets' performance in marking a metadata variable, *i.e.*, a phenotype, returning comparative visualisations across scoring and enrichment methods.
+
+* **Discovery**:
+examines the relationship between a gene set and one or more variables of interest, suitable for exploratory or hypothesis-generating analyses.
 
 ### 3. Choose a Quantification Approach
 
-`markeR` supports two complementary strategies for quantifying the association between gene sets and phenotypes:
+Two complementary strategies are implemented for quantifying associations between gene sets and phenotypes:
 
 #### 3.1 Score-Based Approach
 
-This strategy generates a **single numeric score per sample**, reflecting the activity of a gene set. It enables flexible downstream analyses, including comparisons across phenotypic groups.
-
-Three scoring methods are available:
 
-* **Log2-median**: Calculates the median log2 expression of the genes in the set. Sensitive to absolute shifts in expression.
+A score summarising the collective expression of a gene set therein is assigned **to each sample**. Scores can be visualised using built-in functions, or used directly in downstream analyses (*e.g.*, comparisons between phenotypic groups of samples, correlations with numerical phenotypes).
 
-* **Ranking**: Ranks all genes within each sample and averages the ranks of gene set members. Captures relative ordering rather than magnitude.
+Available methods:
 
-* **ssGSEA**: Computes a single-sample gene set enrichment score using the ssGSEA algorithm. Reflects the coordinated up- or down-regulation of the set in each sample.
+* **Log2-median**: mean of the across-sample normalised log2 median-centred expression levels of the genes in the set; for bidirectional gene sets, the sample score is the partial score for the subset of putatively upregulated genes minus that of the downregulated subset.
 
-These methods vary in assumptions and sensitivity. Robust gene sets are expected to perform consistently across all three.
+* **Ranking**: mean expression rank of gene set members in each sample; for bidirectional gene sets, the sample score is the partial score for the subset of putatively upregulated genes minus that of the downregulated subset, and normalised by the number of genes in the set.
 
-#### 3.2 Enrichment-Based Approach
+* **ssGSEA**: single-sample gene set enrichment score using ssGSEA; for bidirectional gene sets, the sample score is the partial score for the subset of putatively upregulated genes minus that of the downregulated subset.
 
-This approach uses a classical **gene set enrichment analysis (GSEA)** framework to evaluate whether the gene set is significantly overrepresented at the top or bottom of a ranked list of genes (e.g., ranked by fold change or correlation with phenotype).
+Gene sets that are robust phenotypic markers are expected to yield consistently high scores across methods.
 
-* **GSEA**: Computes a Normalised Enrichment Score (NES) for each contrast or variable of interest, adjusting for gene set size and multiple testing.
+#### 3.2 Enrichment-Based Approach
 
-Use this approach when interested in collective behaviour of gene sets in relation to ranked differential signals.
+Enrichment-based methods implement **Gene Set Enrichment Analysis (GSEA)**. Genes are ranked according to differential expression statistics, and a Normalised Enrichment Score (NES) per variable of interest is computed, accompanied by a p-value adjusted for multiple hypothesis testing.
 
 ### 4. Visualisation and Evaluation
 
-In **Benchmarking Mode**, `markeR` offers a range of visual summaries:
+In **Benchmarking Mode**, `markeR` offers a range of visual summaries:  
+
+* Violin plots of score distributions by categorical phenotype;
+* Scatter plots of association between scores and numerical phenotypes;
+* Volcano plots and heatmaps of scores or differential gene set expression based on effect sizes (Cohen’s *d* or *f*);
+* ROC curves and respective AUC values of gene sets' phenotypic classification performance;
+* Violin plots of effect size distributions (Cohen’s *d*) for pairwise group differences in scores, for original and simulated gene sets;
+* Plots summarising NES alongside adjusted p-values (*e.g.*, lollipop plots);
+* GSEA plots showing running enrichment scores across ranked gene lists.
 
-* Violin or scatter plots showing score distributions by phenotype
-* Volcano plots and heatmaps based on effect sizes (Cohen’s *d* or *f*)
-* ROC curves and AUC values
-* Null distribution testing using random gene sets matched for size and directionality
-* Lollipop plots summarising enrichment scores (NES) with adjusted p-values
-* Enrichment plots showing running enrichment scores across ranked gene lists
-* Volcano plots
 
 In **Discovery Mode**, the output focuses on a single gene set:
 
-* Score distributions by phenotype
-* Pairwise contrasts (Cohen’s *d*) and overall effect sizes (Cohens *f*)
-* Enrichment score summaries (NES) with adjusted p-values (e.g., lollipop plots)
+* Score distributions stratified by variable;
+* Effect sizes for pairwise and multiple-group differences (Cohen's *d* and *f*, respectively);
+* Cross-variable summaries of NES and adjusted p-values (*e.g.*, lollipop plots).
 
-Benchmarking mode offers the most comprehensive set of features and allows users to seamlessly move from discovery to benchmarking mode once a variable of interest has been identified and further testing is required. The main difference from Discovery mode is that Benchmarking is designed to evaluate multiple gene sets simultaneously, whereas Discovery mode focuses on quantifying a single, robust gene set.
+The Benchmarking Mode offers the most comprehensive set of features. Users are allowed to seamlessly move from Discovery to Benchmarking once a variable of interest has been identified and further testing is required. Benchmarking is designed to evaluate multiple gene sets simultaneously, whereas Discovery focuses on the performance of a single gene set.
 
 ### 5. Individual Gene Exploration
 
-To better understand the contribution of individual genes within a gene set and identify whether specific genes drive the overall signal, `markeR` offers a suite of gene-level exploratory analyses, including:
+To better understand the contribution of individual genes within a gene set, and identify whether specific genes drive the set's collective signal, `markeR` provides `VisualiseIndividualGenes.` Available options include:
 
-* Expression heatmaps of genes across samples and groups
-* Violin plots showing expression distributions of individual genes
-* Correlation heatmaps to reveal co-expression patterns among genes in the set
-* ROC curves and AUC values for individual genes to evaluate their discriminatory power
-* Effect size calculations (Cohen’s *d*) per gene to quantify differential expression
-* Principal Component Analysis (PCA) on gene set genes to assess variance explained and sample clustering
+* Expression heatmaps of genes across samples or groups of samples;
+* Violin plots showing cross-sample expression distributions of individual genes;
+* Heatmaps of pairwise cross-sample expression correlation between genes in the set;
+* ROC curves and AUC values to evaluate single genes' performance as phenotypic markers;
+* Effect size estimation (Cohen’s *d*) of expression differences between groups of samples;
+* Principal Component Analysis (PCA) of expression of genes in the set, to evaluate which genes dominate collective variance and how samples separate according to the gene set's expression.
 
 ### 6. Compare with Reference Gene Sets
 
-`markeR` allows comparison of user-defined gene sets to reference sets (e.g., from MSigDB) using:
+`markeR` also supports comparison of user-defined gene sets against reference collections (e.g., MSigDB). Two complementary similarity metrics are implemented:
 
 * **Jaccard Index**:
-Measures gene overlap relative to union size.
+the ratio of the number of genes in common over the total number of genes in the two sets.
 
-* **Log Odds Ratio (logOR)**:
-Computes enrichment using a user-defined gene universe and Fisher’s exact test.
-
-Filters can be applied based on similarity thresholds (e.g., minimum Jaccard, OR, or p-value).
-
+* **Log Odds Ratio (logOR)** from Fisher’s exact test of association between gene sets, given a specified gene universe.
 
+Filters can be applied based on similarity thresholds (e.g., minimum Jaccard, OR, or Fisher's test p-value).
 
 ## Contact
 
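
The two similarity metrics described in section 6 of the revised README can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the gene names, set sizes and universe are made up, "total number of genes in the two sets" is assumed to mean the size of their union (as in the previous README wording), and the natural logarithm is assumed for the log odds ratio.

```r
# Hypothetical gene universe and gene sets, for illustration only.
universe <- paste0("gene", 1:100)
user_set <- paste0("gene", 1:20)   # user-defined gene set
ref_set  <- paste0("gene", 11:40)  # reference gene set

# Jaccard index: genes in common over genes in the union of the two sets.
jaccard <- length(intersect(user_set, ref_set)) /
  length(union(user_set, ref_set))

# 2 x 2 contingency table of set membership across the gene universe.
membership <- table(
  in_user = factor(universe %in% user_set, levels = c(TRUE, FALSE)),
  in_ref  = factor(universe %in% ref_set,  levels = c(TRUE, FALSE))
)

# Fisher's exact test; `estimate` is the conditional maximum likelihood
# estimate of the odds ratio, so it can differ slightly from the sample
# odds ratio computed directly from the table.
ft <- stats::fisher.test(membership)
log_or <- log(unname(ft$estimate))  # natural log is an assumption

print(jaccard)
print(log_or)
print(ft$p.value)
```

Thresholds on `jaccard`, the odds ratio or `ft$p.value` could then be used to filter reference sets, in the spirit of the filters mentioned in the README.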
