This repository contains the complete downstream RNA-seq analysis pipeline for transcriptomic data derived from heterozygous and homozygous NAA15 mutant samples. The analysis spans differential expression, functional enrichment (GO/KEGG), and targeted screening for overlap with curated neurodevelopmental disorder (NDD) gene sets, with the aim of contextualizing transcriptional alterations in a human NDD framework.
- 🔬 Differential Expression — DESeq2 with apeglm shrinkage to identify significant DEGs across Homo and Hetero contrasts, with volcano plots, MA plots, and heatmaps.
- 📊 Exploratory Data Analysis — Variance Stabilizing Transformation (VST) and PCA to assess sample quality and genotype-driven variation.
- 🧩 Functional Enrichment — GO Biological Process and KEGG pathway over-representation analysis using clusterProfiler.
- 🧠 NDD Gene Contextualization — Systematic screening of DEGs against a curated NDD gene list, validated with Fisher's Exact Test and fgsea GSEA.
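
The overlap test in the last bullet can be illustrated with a small base R sketch. It is not the code in `06_ndd_gene_presence_analysis.r`; the gene names, set sizes, and the one-sided alternative are all assumptions chosen for illustration. The real inputs would come from the DE results and the curated NDD list in data/.

```r
# Hypothetical toy inputs (not the repository's data).
universe  <- paste0("GENE", 1:1000)        # all genes tested for DE
degs      <- universe[1:100]               # genes called significant
ndd_genes <- universe[c(1:20, 501:530)]    # curated NDD genes; 20 are also DEGs

# 2x2 contingency table: DEG status vs. NDD membership.
# TRUE is placed first so the [1, 1] cell is the overlap.
in_deg <- factor(universe %in% degs,      levels = c(TRUE, FALSE))
in_ndd <- factor(universe %in% ndd_genes, levels = c(TRUE, FALSE))
tab <- table(DEG = in_deg, NDD = in_ndd)

# One-sided test (an assumption): are NDD genes over-represented among DEGs?
ft <- fisher.test(tab, alternative = "greater")

print(tab)
print(ft$p.value)
```

The choice of gene universe matters: restricting it to genes that passed expression filtering, rather than the whole genome, avoids inflating the apparent enrichment.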
- R ≥ 4.5.0
- renv for reproducible dependency management
    git clone https://github.com/Hariharan-M-2/Neurodevelopmental-Disorder-RNAseq-Analysis.git
    cd Neurodevelopmental-Disorder-RNAseq-Analysis

All package versions are locked in renv.lock. Restore the environment in R:

    if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")
    renv::restore()

Open the project in RStudio (or set the working directory to the project root) and run:

    source("analysis/00_run_pipeline.r")

This will execute all analysis scripts sequentially, creating outputs in output/ and plots in plots/.
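
For readers curious about what "sequentially" could mean in practice, here is a minimal sketch of a runner. It is a hypothetical stand-in, not the contents of `00_run_pipeline.r`: the function name `run_pipeline`, its arguments, and the file-name pattern are assumptions. It sources the numbered scripts in order into the global environment so that later steps can see objects created by earlier ones.

```r
# Hypothetical runner sketch; the real 00_run_pipeline.r may work differently.
run_pipeline <- function(script_dir = "analysis",
                         out_dirs = c("output", "plots")) {
  # Make sure the output folders exist before any step writes to them.
  for (d in out_dirs) dir.create(d, showWarnings = FALSE, recursive = TRUE)

  # Collect two-digit-prefixed scripts and skip the 00_ runner itself.
  scripts <- list.files(script_dir, pattern = "^[0-9]{2}_.*\\.[rR]$",
                        full.names = TRUE)
  scripts <- sort(scripts[!startsWith(basename(scripts), "00_")])

  for (s in scripts) {
    message("Running ", basename(s))
    source(s)  # default local = FALSE: evaluated in the global environment
  }
  invisible(basename(scripts))
}

# Usage, from the project root:
# run_pipeline()
```

Numeric prefixes make the execution order explicit in the file names, so adding a step only requires choosing the right prefix.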
Neurodevelopmental-Disorder-RNAseq-Analysis/
│
├── analysis/
│ ├── 00_run_pipeline.r # Runs the full workflow
│ ├── 01_gene_annotation.r # Converts Ensembl IDs to gene symbols
│ ├── 02_deseq_dataset_creation.r # DESeq2 object creation & filtering
│ ├── 03_exploratory_data_analysis.r # Perform VST and PCA
│ ├── 04_differential_expression_analysis.r # Run DESeq2 for DE analysis
│ ├── 05_functional_enrichment_analysis.r # GO & KEGG enrichment analysis
│ └── 06_ndd_gene_presence_analysis.r # NDD gene screening & GSEA
│
├── data/ # Input data (counts, metadata, NDD list)
├── docs/ # Biological interpretation & figures
├── case_studies/ # Extended NDD case study analysis
├── renv.lock # Locked dependency versions
└── README.md
| Document | Description |
|---|---|
| docs/experimental-design.md | Study objective, sample design (13 SRA samples across 5 genotypes), and detailed methodology for each analysis step |
| docs/biological-interpretation.md | Combined results & discussion with figures, GO/KEGG enrichment tables, and NDD gene contextualization |
| case_studies/genotype_ndd_analysis.md | In-depth case study on NAA15 biology: NatA complex function, genotype–phenotype correlations, and molecular mechanisms linking NAA15 mutations to NDD |
Raw Count Matrix + Metadata
↓
Gene Annotation
↓
DESeq2 Dataset Creation
↓
Exploratory Data Analysis
↓
Differential Expression (Homo vs WT, Hetero vs WT)
↓
GO & KEGG Functional Enrichment
↓
NDD Gene Screening & GSEA
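
The first steps of the workflow above (count matrix plus metadata, then dataset creation and filtering) can be sketched in base R. This is an illustration under stated assumptions, not the repository's code: the toy matrix, the sample names, and the thresholds `min_count` and `min_samples` are hypothetical, and the real pipeline builds a DESeq2 object rather than a plain matrix.

```r
set.seed(1)

# Hypothetical toy inputs: 200 genes x 6 samples (the real study has 13 samples).
samples <- paste0("S", 1:6)
counts <- matrix(
  rnbinom(200 * 6, mu = 50, size = 1),
  nrow = 200,
  dimnames = list(paste0("ENSG", sprintf("%011d", 1:200)), samples)
)
counts[1:20, ] <- 0L  # simulate unexpressed genes

# Metadata with WT as the reference level, so contrasts read "Hetero vs WT"
# and "Homo vs WT".
meta <- data.frame(
  genotype = factor(rep(c("WT", "Hetero", "Homo"), each = 2),
                    levels = c("WT", "Hetero", "Homo")),
  row.names = samples
)

# Count columns and metadata rows must be in the same order.
stopifnot(identical(colnames(counts), rownames(meta)))

# Pre-filtering: keep genes with at least `min_count` reads in at least
# `min_samples` samples (both thresholds are assumptions).
min_count   <- 10
min_samples <- 2
keep     <- rowSums(counts >= min_count) >= min_samples
filtered <- counts[keep, , drop = FALSE]

message(nrow(filtered), " of ", nrow(counts), " genes kept")
```

Setting the reference level explicitly avoids relying on alphabetical factor ordering, which would otherwise make Hetero the baseline.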
🧑🏻‍💻 Hariharan M - This work was completed during an internship at the Computational and Genomics Lab, CRIC - CDFD, Hyderabad, under the supervision of Dr. Akaash Ranjan.
The goal is to turn data into information 🦋, and information into insight.