Identifying differentially methylated CpG sites between healthy and diseased individuals using Illumina HumanMethylation450k array data.
Course: DNA/RNA Dynamics
Platform: Illumina HumanMethylation450k Array
Language: R (Bioconductor)
This project performs a full differential DNA methylation analysis pipeline, from raw IDAT files through quality control, normalization, statistical testing, and visualization. The goal is to detect CpG sites with significant methylation differences between healthy and diseased sample groups.
Raw IDAT Files
↓
Sample & Probe Quality Control
↓
Detection p-value Filtering (threshold: 0.05)
↓
Quantile Normalization (preprocessQuantile)
↓
Statistical Testing (t-tests per CpG)
↓
Correction for Multiple Testing
↓
Visualization & Reporting
epigenetic-methylation-450k/
├── analysis.R # Complete methylation analysis pipeline
├── .gitignore # Files to ignore (data, outputs)
└── README.md # This file
Note: Raw IDAT files and sample sheets are not included due to file size. Place them in the working directory before running the script.
git clone https://github.com/MahanBalooei/epigenetic-methylation-450k.git
cd epigenetic-methylation-450k
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(c(
"minfi",
"IlluminaHumanMethylation450kmanifest",
"IlluminaHumanMethylation450kanno.ilmn12.hg19"
))
install.packages(c("gplots", "qqman"))
Place your raw IDAT files (or methylation beta matrix) and sample sheet into the working directory. Update file paths and sample group labels in analysis.R as needed.
Open analysis.R in RStudio and run the script, or from the terminal:
Rscript analysis.R

| Package | Purpose |
|---|---|
| minfi | Reading IDAT files, QC, normalization |
| IlluminaHumanMethylation450kmanifest | Array annotation |
| gplots | Heatmap visualization |
| qqman | Manhattan and QQ plots |

The pipeline produces the following plots:
- Density plots — Beta and M value distributions (raw vs. normalized)
- 6-panel QC plot — Overall sample quality assessment
- PCA plots — Colored by group, sex, and Sentrix ID
- Boxplots — p-value distributions (raw, uncorrected, corrected)
- Volcano plot — Significant CpG sites by effect size and significance
- Manhattan plot — Genome-wide view of differential methylation
- Heatmap — Hierarchical clustering of top differentially methylated probes
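The density plots compare two scales for the same measurement. As a minimal sketch, assuming the standard logit relationship M = log2(Beta / (1 - Beta)), the conversion can be written in base R. The helper names `beta_to_m` and `m_to_beta`, the `offset` guard, and the simulated values are illustrative assumptions, not taken from analysis.R.

```r
# Convert Beta values to M values: M = log2(beta / (1 - beta)).
# `offset` is a hypothetical guard that keeps Beta away from exactly 0 or 1,
# where the logit would be infinite.
beta_to_m <- function(beta, offset = 1e-6) {
  beta <- pmin(pmax(beta, offset), 1 - offset)
  log2(beta / (1 - beta))
}

# Inverse transform: beta = 2^M / (2^M + 1).
m_to_beta <- function(m) {
  2^m / (2^m + 1)
}

set.seed(1)
# Simulated Beta values for illustration only (not project data).
beta <- rbeta(1000, shape1 = 0.5, shape2 = 0.5)
m <- beta_to_m(beta)

# Kernel density estimates on both scales.
d_beta <- density(beta)
d_m <- density(m)

# Draw the curves only in an interactive session.
if (interactive()) {
  plot(d_beta, main = "Beta values (simulated)")
  plot(d_m, main = "M values (simulated)")
}
```

Beta values are bounded between 0 and 1 and are easier to read as a methylation fraction, while M values are unbounded and tend to suit statistical tests better.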
Quality Control
- Per-sample and per-probe QC using minfi
- Detection p-value filtering at threshold 0.05 to remove unreliable probes
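The detection p-value filter can be illustrated on a small simulated matrix. This is a sketch, not the logic of analysis.R: in the real pipeline the matrix would come from `minfi::detectionP()`, and the rule used here (drop a probe if it fails in any sample) is an assumption, since the README does not state how failures are aggregated. Probe and sample names are made up.

```r
set.seed(42)
# Simulated detection p-values: 6 probes (rows) x 4 samples (columns).
# Values and names are hypothetical, for illustration only.
detP <- matrix(runif(24, 0, 0.01), nrow = 6,
               dimnames = list(paste0("cg", 1:6), paste0("S", 1:4)))
detP["cg2", "S3"] <- 0.20   # a single failed measurement
detP["cg5", ] <- 0.30       # a probe failing in every sample

threshold <- 0.05
failed <- detP > threshold

# Counts of failed measurements per probe and per sample.
failed_per_probe <- rowSums(failed)
failed_per_sample <- colSums(failed)

# Assumed rule: keep only probes that pass in all samples.
keep <- failed_per_probe == 0
detP_filtered <- detP[keep, , drop = FALSE]
```

The per-sample counts are useful for spotting low-quality arrays, while the per-probe counts drive the probe filter itself.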
Normalization
- Quantile normalization via preprocessQuantile to reduce technical variation between arrays
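A possible shape for the loading and normalization step is sketched below. It uses minfi functions that exist in the package (`read.metharray.sheet`, `read.metharray.exp`, `preprocessQuantile`, `getBeta`, `getM`), but the wrapper `normalize_idats` and its `idat_dir` argument are hypothetical, and the actual code in analysis.R may differ. The function is only defined here, because running it requires minfi and real IDAT files.

```r
# Sketch only: requires the minfi package and a directory holding the IDAT
# files plus a sample sheet CSV. `idat_dir` is a hypothetical argument name.
normalize_idats <- function(idat_dir) {
  if (!requireNamespace("minfi", quietly = TRUE)) {
    stop("Package 'minfi' is required; install it via BiocManager::install('minfi').")
  }
  # Parse the sample sheet found under idat_dir.
  targets <- minfi::read.metharray.sheet(idat_dir)
  # Load raw red/green channel intensities for the listed samples.
  rg_set <- minfi::read.metharray.exp(targets = targets)
  # Quantile-normalize the arrays.
  gr_set <- minfi::preprocessQuantile(rg_set)
  list(
    beta = minfi::getBeta(gr_set),  # Beta values, probes x samples
    m    = minfi::getM(gr_set)      # M values, probes x samples
  )
}
```

With data in place, a call such as `normalized <- normalize_idats(".")` would return both matrices for downstream testing and plotting.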
Statistical Testing
- Pairwise t-tests for each CpG site between healthy and diseased groups
- Multiple testing correction applied to control false discovery rate
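The testing and correction steps can be sketched on simulated data as follows. The Beta matrix, group sizes, and spiked-in difference are made up for illustration. The README does not name the correction method, so Benjamini-Hochberg (which controls the false discovery rate) and Bonferroni are both shown as assumptions. `t.test` with its defaults performs Welch's unequal-variance test, which may or may not match the script.

```r
set.seed(123)
n_probes <- 200
group <- factor(rep(c("healthy", "diseased"), each = 4))

# Simulated Beta matrix (probes x samples), for illustration only.
beta <- matrix(rnorm(n_probes * 8, mean = 0.5, sd = 0.05), nrow = n_probes,
               dimnames = list(paste0("cg", seq_len(n_probes)), NULL))
# Spike a difference into the first 10 probes of the diseased group.
is_diseased <- group == "diseased"
beta[1:10, is_diseased] <- beta[1:10, is_diseased] + 0.3

# Two-sample t-test per CpG site (one test per row).
p_raw <- apply(beta, 1, function(x) t.test(x ~ group)$p.value)

# Multiple testing correction: BH controls the false discovery rate,
# Bonferroni is a stricter family-wise alternative.
p_bh <- p.adjust(p_raw, method = "BH")
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Effect size: difference in mean Beta (diseased minus healthy).
delta_beta <- rowMeans(beta[, is_diseased]) - rowMeans(beta[, !is_diseased])

results <- data.frame(p_raw, p_bh, p_bonf, delta_beta,
                      row.names = rownames(beta))
significant <- rownames(results)[results$p_bh < 0.05]
```

A table like `results` holds the quantities a volcano plot needs: `delta_beta` on the x-axis and `-log10(p_raw)` on the y-axis.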
| Name | GitHub |
|---|---|
| Elif Güler | @elif-guler |
| Eyip Sinay Dalmaz | @Dalmaz-ES |
| Simay Erol | @Simay9 |
| Barkin Kemec | @mbkemec |
| Negin Nilforoosh | @neginnilforosh |
| Kimia Kanouni | @kanounik |
| Mahan Balooei | @MahanBalooei |
This project is licensed under the MIT License — see the LICENSE file for details.