SISBID
diff --git a/2025_SISBID_R_packages.Rmd b/2025_SISBID_R_packages.Rmd
Lines changed: 35 additions & 0 deletions
diff --git a/2025_SISBID_Testing_Lab.Rmd b/2025_SISBID_Testing_Lab.Rmd
Lines changed: 149 additions & 0 deletions
diff --git a/2025_SISBID_Testing_Lab.html b/2025_SISBID_Testing_Lab.html
Lines changed: 531 additions & 0 deletions
diff --git a/2025_SISBID_Testing_Lab.pdf b/2025_SISBID_Testing_Lab.pdf
535 KB
diff --git a/2025_SISBID_rmarkdown_install.Rmd b/2025_SISBID_rmarkdown_install.Rmd
Lines changed: 56 additions & 0 deletions
diff --git a/2025_SISBID_rmarkdown_install.html b/2025_SISBID_rmarkdown_install.html
Lines changed: 464 additions & 0 deletions
diff --git a/2025_SISBID_rmarkdown_install.pdf b/2025_SISBID_rmarkdown_install.pdf
110 KB
@@ -0,0 +1,35 @@
+---
+title: "Packages to Install"
+author: "Genevera I. Allen & Yufeng Liu"
+output:
+  pdf_document: default
+  html_document:
+    df_print: paged
+---
+
+# Install packages for demos and labs
+```{r setup, include=FALSE}
+options(repos="https://cran.rstudio.com")
+install.packages('GGally')
+install.packages('igraph')
+install.packages('fastICA')
+install.packages('kknn')
+install.packages('ggplot2')
+install.packages('huge')
+install.packages('glasso')
+install.packages('umap')
+if (!require("BiocManager", quietly = TRUE))
+    install.packages("BiocManager")
+BiocManager::install("Biobase")
+BiocManager::install("GO.db")
+BiocManager::install("impute")
+BiocManager::install("preprocessCore")
+install.packages('NMF')
+#install.packages('WGCNA')
+install.packages('ISLR')
+install.packages('sda')
+install.packages('Rtsne')
+install.packages("tidyr")
+install.packages("glmnet")
+```
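+
+A minimal sketch, not part of the original setup chunk, of how one might skip packages that are already installed. The helper name `missing_packages` and the vector `cran_pkgs` are made up for illustration, and the install call is left commented out so that running the chunk changes nothing.
+
+```{r}
+# CRAN packages named in the chunk above
+cran_pkgs <- c("GGally", "igraph", "fastICA", "kknn", "ggplot2", "huge",
+               "glasso", "umap", "NMF", "ISLR", "sda", "Rtsne", "tidyr", "glmnet")
+
+# Hypothetical helper: return the packages whose namespace cannot be loaded
+missing_packages <- function(pkgs) {
+  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
+  pkgs[!installed]
+}
+
+to_install <- missing_packages(cran_pkgs)
+print(to_install)
+# install.packages(to_install)  # uncomment to install only what is missing
+```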
+
@@ -0,0 +1,149 @@
+---
+title: "2025 SISBID High-Dimensional Hypothesis Testing Lab"
+author: "Genevera I. Allen and Yufeng Liu"
+output:
+  html_document: default
+  pdf_document: default
+always_allow_html: yes
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+Load Packages
+```{r message= FALSE}
+library(sda)
+library(ggplot2)
+```
+
+$H_0$: the feature is not associated with the response.
+
+## Data set 1 - Simulated Data
+Small simulated data set to demonstrate multiple testing when **all null hypotheses hold**.
+```{r}
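+# simulate 1000 features for 50 samples; the labels y are independent of x
+x <- matrix(rnorm(1000*50),ncol=50)
+y <- sample(c(0,1),50,replace=TRUE)
+# two-sample t-test for each feature (row of x)
+ps <- numeric(1000)
+for(i in 1:1000){
+  ps[i] <- t.test(x[i,y==0],x[i,y==1])$p.value
+}
+cat("Around 5% of p-values are below 0.05:",mean(ps<.05),fill=TRUE)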
+```
+
+Benjamini-Hochberg Algorithm for FDR Control
+```{r}
+fdrs.bh <- p.adjust(ps, method="BH")
+BHData = data.frame(cbind(ps,fdrs.bh))
+colnames(BHData) = c("OriginalP","BH.P")
+ggplot(BHData) + 
+  geom_point(mapping = aes(x = OriginalP, y = BH.P)) + 
+  ggtitle("Original p-value vs p-value after Benjamini-Hochberg correction") + 
+  theme(plot.title = element_text(hjust = 0.5)) + 
+  xlab("Original p-values") + ylab("p-values after Benjamini-Hochberg correction")
+```
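+
+For reference, the adjusted values plotted above can be reproduced by hand. The sketch below assumes there are no missing p-values and uses a made-up helper name, `bh_adjust`. It sorts the p-values from largest to smallest, scales the p-value of rank $i$ by $m/i$, and takes a running minimum so the adjusted values stay monotone.
+
+```{r}
+bh_adjust <- function(p) {
+  m <- length(p)
+  o <- order(p, decreasing = TRUE)   # positions from largest to smallest p-value
+  ro <- order(o)                     # permutation restoring the original order
+  ranks <- m:1                       # rank of each sorted p-value (largest has rank m)
+  pmin(1, cummin(m / ranks * p[o]))[ro]
+}
+
+set.seed(1)
+p_demo <- runif(20)                  # illustrative p-values, not the lab data
+all.equal(bh_adjust(p_demo), p.adjust(p_demo, method = "BH"))
+```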
+
+
+```{r}
+BHData$index = 1:nrow(BHData)
+ggplot(BHData) + 
+  geom_point(mapping = aes(x = index, y = BH.P)) + 
+  ggtitle("p-value after Benjamini-Hochberg correction") + 
+  theme(plot.title = element_text(hjust = 0.5)) + 
+  xlab("Index") + ylab("p-values after Benjamini-Hochberg correction")
+```
+
+
+## Data set 2 - Simulated Data   
+Small simulated data set to demonstrate multiple testing when **not all null hypotheses hold**.
+```{r}
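+# simulate data as before, then shift the first 100 features in group y == 0
+x <- matrix(rnorm(1000*50),ncol=50)
+y <- sample(c(0,1),50,replace=TRUE)
+x[1:100,y==0] <- x[1:100,y==0] + 1
+# two-sample t-test for each feature (row of x)
+ps <- numeric(1000)
+for(i in 1:1000) {
+  ps[i] <- t.test(x[i,y==0],x[i,y==1])$p.value
+}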
+cat("Way more than 5% of p-values are below 0.05:",mean(ps<.05),fill=TRUE)
+```
+
+```{r}
+fdrs.bh <- p.adjust(ps, method="BH")
+# plot
+BHData = data.frame(cbind(ps,fdrs.bh))
+colnames(BHData) = c("OriginalP","BH.P")
+ggplot(BHData) + 
+  geom_point(mapping = aes(x = OriginalP, y = BH.P)) + 
+  ggtitle("Original p-value vs p-value after Benjamini-Hochberg correction") + 
+  theme(plot.title = element_text(hjust = 0.5)) + 
+  xlab("Original p-values") + ylab("p-values after Benjamini-Hochberg correction")
+
+BHData$index = 1:nrow(BHData)
+ggplot(BHData) + 
+  geom_point(mapping = aes(x = index, y = BH.P)) + 
+  ggtitle("p-value after Benjamini-Hochberg correction") + 
+  theme(plot.title = element_text(hjust = 0.5)) + 
+  xlab("Index") + ylab("p-values after Benjamini-Hochberg correction")
+```
+
+
+```{r}
+cat("Number of tests with BH-adjusted p-value below 0.4:",sum(fdrs.bh<0.4), fill=TRUE)
+cat("Number of BH rejections at level 0.4, computed directly:",
+    max(which(sort(ps,decreasing=FALSE) < .4*(1:1000)/1000)), fill=TRUE)
+BHData = BHData[order(ps,decreasing = FALSE),]
+BHData$index = 1:nrow(BHData)
+# plot
+ggplot(BHData) + 
+  geom_point(mapping = aes(x = index, y = OriginalP)) + 
+  ggtitle("Original p-values with different correction methods") + 
+  geom_abline(intercept = 0.4/1000,slope = 0,col= "red") + #Bonferroni
+  geom_abline(intercept = 0 ,slope = 0.4/1000,col= "blue") + #BH procedure
+  theme(plot.title = element_text(hjust = 0.5)) + 
+  xlab("Index") + ylab("Ordered p-values")
+
+```
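+
+The two counts printed at the start of the chunk above come from the same step-up rule: reject the $k$ smallest p-values, where $k$ is the largest index with $p_{(k)} \le \alpha k / m$. Below is a small sketch with a hypothetical helper `bh_count`. The name and the simulated p-values are illustrative only, and it uses `<=` as in the usual statement of the procedure, whereas the chunk above uses `<`.
+
+```{r}
+bh_count <- function(p, alpha) {
+  m <- length(p)
+  ok <- sort(p) <= alpha * seq_len(m) / m   # compare ordered p-values to the BH line
+  if (any(ok)) max(which(ok)) else 0L       # 0 rejections when no p-value passes
+}
+
+set.seed(2)
+p_demo <- c(runif(90), rbeta(10, 1, 50))    # mostly null p-values plus a few small ones
+c(direct = bh_count(p_demo, 0.4),
+  adjusted = sum(p.adjust(p_demo, method = "BH") <= 0.4))
+```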
+
+## Data set 3 - Real Data: Prostate Data (Singh et al. 2002)
+This data set consists of gene expression levels for 6033 genes among 102 men. The dataset is available from the R package "sda".
+
+* Problem 1 - We wish to identify important genes that differentiate cancer patients from healthy patients. What kinds of tests are reasonable?
+* Problem 2 - In order to adjust for multiple comparisons, which procedures should one use?
+* Problem 3 - Examine the list of genes identified.
+```{r}
+## import data
+data(singh2002)
+x = singh2002$x
+y = singh2002$y
+
+n1 = sum(y == "healthy")
+n2 = length(y) - n1
+```
+
+
+```{r}
+## two-sample t-test for each gene (column of x), healthy vs. cancer
+ps <- numeric(ncol(x))
+for(i in 1:ncol(x)) {
+  ps[i] <- t.test(x[y == "healthy",i], x[y != "healthy",i])$p.value
+}
+## ordered p-values
+p1 <- sort(ps)
+
+## plot ordered p-values 
+plot(p1[1:100], pch=rep('*',100),ylim=c(0,0.003), ylab="ordered p-values")
+## rejection boundary of Benjamini-Hochberg's procedure at 0.1
+abline(a=0, b=0.1/ncol(x), col="red")
+## rejection boundary of Bonferroni at 0.1 
+abline(a=0.1/ncol(x), b=0, col="blue",lty=5)
+
+cat("Number of rejections by Bonferroni:",
+    sum(ps < .1/ncol(x)), fill=TRUE)
+cat("Number of rejections by BH, computed directly:",
+    max(which(sort(ps,decreasing=FALSE) < .1*(1:ncol(x))/ncol(x))),
+    fill=TRUE)
+
+arrows(x0 = 61, y0 = 0.00085, x1 = 58, y1 = p1[57], length = 0.1)
+text(63.5, 0.00085, labels="imax = 57", cex=.8, pos=4, col="black")
+legend("topleft",legend=c("BH's Procedure", "Bonferroni", "Ordered p-values"),
+       lty=c(1, 5, NA), col=c("red","blue", "black"), pch = c(NA, NA, '*'))
+```
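+
+For Problem 3, one possible way to list the selected genes is sketched below on simulated data, so that it runs without `sda`. The helper `select_bh`, the gene names `g1`, `g2`, ..., and the simulated shift are all assumptions for illustration; with the real data one would pass `singh2002$x` and `singh2002$y` instead.
+
+```{r}
+set.seed(1)
+n <- 40; m <- 200
+x_sim <- matrix(rnorm(n * m), nrow = n,
+                dimnames = list(NULL, paste0("g", seq_len(m))))
+y_sim <- factor(rep(c("healthy", "cancer"), each = n / 2))
+# shift the first 5 simulated genes in the cancer group
+x_sim[y_sim == "cancer", 1:5] <- x_sim[y_sim == "cancer", 1:5] + 5
+
+# Hypothetical helper: Welch t-test per column, then BH at level alpha
+select_bh <- function(x, y, group, alpha = 0.1) {
+  in_group <- y == group
+  pvals <- vapply(seq_len(ncol(x)),
+                  function(j) t.test(x[in_group, j], x[!in_group, j])$p.value,
+                  numeric(1))
+  colnames(x)[p.adjust(pvals, method = "BH") <= alpha]
+}
+
+selected <- select_bh(x_sim, y_sim, group = "healthy")
+print(selected)
+```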
@@ -0,0 +1,56 @@
+---
+title: "Install R Markdown"
+author: "Genevera I. Allen & Yufeng Liu"
+output:
+  pdf_document: default
+  html_document: default
+---
+
+# A Guide to Installing and Using R Markdown
+
+## Install RStudio First
+RStudio is a desktop application that serves as an integrated development environment (IDE) for the R programming language. Use the link below to install the version of RStudio that matches your operating system.
+
+https://rstudio.com/products/rstudio/download/
+
+R Markdown should be included in RStudio.
+
+For more information on R Markdown and how to use it, follow the link below.
+
+https://rmarkdown.rstudio.com/authoring_quick_tour.html#Installation
+
+## Topics we are covering in this module
+
+- Dimension Reduction
+- Clustering
+- Testing 
+- Graphical Models
+- Validation
+
+
+# Knit R Markdown File
+To knit the R Markdown file, click the "Knit" button at the top of the document. If LaTeX is not installed on your computer (LaTeX is not required), the document can still be knit to an HTML file by choosing "Knit to HTML" from the Knit menu. Knit this file to make sure R Markdown is properly installed.
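+
+As a rough check from the R console, one could confirm that the `rmarkdown` package is available before knitting. This is a sketch, not part of the original instructions; the variable name `has_rmarkdown` is made up, and the render call is left commented out because it assumes this file sits in the working directory.
+
+```{r}
+# TRUE if the rmarkdown package can be loaded, FALSE otherwise
+has_rmarkdown <- requireNamespace("rmarkdown", quietly = TRUE)
+if (!has_rmarkdown) {
+  message("rmarkdown is not installed; try install.packages('rmarkdown')")
+}
+# Knit to HTML from the console (assumes the file is in the working directory):
+# rmarkdown::render("2025_SISBID_rmarkdown_install.Rmd", output_format = "html_document")
+```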
+
+# Install Packages
+To install the packages needed to complete the labs, please refer to the R packages Rmd file ("2025_SISBID_R_packages.Rmd"). The file contains the packages needed for the module. To install, place the cursor on the line you would like to run and click the "Run" button at the top of the document.
+
+Packages that need to be installed from CRAN (the packages file also installs Biobase, GO.db, impute, and preprocessCore through BiocManager):
+
+- GGally
+- igraph
+- fastICA
+- kknn
+- ggplot2
+- huge
+- glasso
+- umap
+- NMF
+- ISLR
+- sda
+- Rtsne
+- tidyr
+- glmnet
+
+
+