# A multimodal clinical and brain morphology analysis

**Repository:** `alzheimer_brain_morphology_mental_health`

**Goal:** Identify statistical and predictive relationships between brain morphology, daily functioning, mental health severity, and Alzheimer’s diagnosis.
This project investigates how structural brain measures and mental‑health‑related clinical indices relate to Alzheimer’s disease severity and diagnosis. It integrates:
- clinical severity indices
- daily activity performance
- brain volume measurements
- group‑based statistical tests (ANOVA, MANOVA, Tukey)
- predictive modeling (Linear Regression, Random Forest)
- correlation analysis between neuroimaging and clinical variables
The entire codebase follows a clean, modular, industry‑grade architecture, with strict separation between computation, visualization, and result storage.
## Project structure

```
alzheimer_brain_morphology_mental_health/
│
├── data/
│   ├── raw/                           # Raw data
│   └── processed/                     # Cleaned and transformed data
│
├── notebooks/
│   ├── 01_EDA_relationships.ipynb     # Exploratory Data Analysis
│   ├── 02_modeling.ipynb              # Predictive modeling
│   ├── 03_correlation_analysis.ipynb  # Statistical correlations
│   └── 04_subgroups.ipynb             # Subgroup-based analysis
│
├── src/
│   ├── analysis.py                    # ANOVA, MANOVA, correlations, chi-square, Tukey
│   ├── config.py                      # Global variables, column groups, parameters
│   ├── modeling.py                    # LR, RF, model comparison
│   ├── preprocessing.py               # Data loading, cleaning, missing values
│   └── visualization.py               # Modular plotting utilities
│
├── reports/
│   ├── figures/                       # Generated plots
│   ├── tables/                        # Statistical outputs (ANOVA, Tukey, MANOVA…)
│   └── executive_summary.md           # High-level summary of findings
│
├── spark/                             # Optional PySpark version (not required)
│   ├── spark_version_optional.ipynb
│   └── environment.yml
│
├── requirements.txt
├── README.md
├── Project_Highlights.md
└── .gitignore
```
## Optional Spark version

The `spark/` folder contains `spark_version_optional.ipynb` and `environment.yml`.
This reflects the original PySpark implementation of the project.
However, the repository has been intentionally adapted so that:
- recruiters do NOT need to install PySpark
- the main workflow runs entirely in pandas
- Spark is provided only as an optional, advanced version
This design ensures:
- lightweight execution
- compatibility with standard Python environments
- demonstration of scalability to distributed systems without imposing Spark as a dependency
The Spark notebook showcases the ability to scale the analysis to large datasets while keeping the main workflow accessible.
## Analysis workflow

### Data preparation
- CSV loading
- numeric conversion
- missing value handling
- clean separation between raw and processed data
### Exploratory analysis
- detection of severity and activity columns
- descriptive statistics
- visualization of group differences
### Statistical analysis
- combined subgroup ANOVA
- Tukey HSD post‑hoc comparisons
- MANOVA for multivariate effects
- correlation matrices and p‑value heatmaps
- ANOVA with covariates (`female`, `educ`)
### Predictive modeling
- Linear Regression
- Random Forest
- R² comparison across severity indices
- feature importance extraction
### Visualization
- regression plots
- barplots and group‑mean plots
- feature importance charts
- heatmaps (correlations, p‑values)
All visualizations are optional to save (`save_path=None` by default), ensuring full modularity.
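A minimal sketch of the optional-save pattern: plotting functions return the figure and only write to disk when `save_path` is provided. The function name and columns are hypothetical, not the repository's actual API.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from typing import Optional

def plot_group_means(df: pd.DataFrame, value_col: str, group_col: str,
                     save_path: Optional[str] = None) -> plt.Figure:
    """Barplot of group means; saved to disk only when save_path is given."""
    means = df.groupby(group_col)[value_col].mean()
    fig, ax = plt.subplots()
    means.plot(kind="bar", ax=ax)
    ax.set_ylabel(f"Mean {value_col}")
    if save_path is not None:          # no side effects by default
        fig.savefig(save_path, bbox_inches="tight")
    plt.close(fig)
    return fig

df = pd.DataFrame({"group": ["a", "a", "b", "b"], "score": [1.0, 2.0, 3.0, 5.0]})
fig = plot_group_means(df, "score", "group")  # nothing written to disk
```

Keeping all saving decisions at the call site (the notebooks) is what makes the `src/` layer side-effect free.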
## Source modules (`src/`)

### `preprocessing.py`
- data loading
- numeric conversion
- missing value handling
- summary statistics
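The numeric-conversion and missing-value steps might look like the sketch below; the function, columns, and median-imputation strategy are illustrative assumptions, not the module's actual implementation.

```python
import pandas as pd

def preprocess(df: pd.DataFrame, numeric_cols: list) -> pd.DataFrame:
    """Coerce columns to numeric and impute missing values with the median."""
    out = df.copy()
    for col in numeric_cols:
        out[col] = pd.to_numeric(out[col], errors="coerce")   # non-numeric -> NaN
        out[col] = out[col].fillna(out[col].median())         # simple imputation
    return out

# Hypothetical raw input with a non-numeric entry and a missing value
raw = pd.DataFrame({"educ": ["12", "16", "n/a", "8"], "mmse": [28, None, 30, 24]})
clean = preprocess(raw, ["educ", "mmse"])
print(clean.dtypes)
```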
### `analysis.py`
- Pearson correlations
- ANOVA, MANOVA
- chi‑square tests
- Tukey post‑hoc
- returns clean DataFrames
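As an example of the "returns clean DataFrames" convention, a Tukey HSD helper can convert the `statsmodels` summary table into a tidy DataFrame. The helper name and the toy groups are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Toy data: three groups with shifted means
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "volume": np.concatenate([rng.normal(m, 1, 50) for m in (0.0, 0.5, 1.5)]),
    "group": np.repeat(["control", "mild", "moderate"], 50),
})

def tukey_table(df: pd.DataFrame, value_col: str, group_col: str) -> pd.DataFrame:
    """Tukey HSD post-hoc comparisons returned as a tidy DataFrame."""
    res = pairwise_tukeyhsd(df[value_col], df[group_col])
    table = res.summary()                 # SimpleTable: header row + data rows
    return pd.DataFrame(table.data[1:], columns=table.data[0])

tukey = tukey_table(df, "volume", "group")
print(tukey[["group1", "group2", "reject"]])
```

Returning a DataFrame (rather than the raw results object) lets the notebooks filter, sort, and save the comparisons directly to `reports/tables/`.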
### `modeling.py`
- Linear Regression
- Random Forest
- model comparison utilities
- returns structured dictionaries (model, R², predictions, importances)
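A sketch of the structured-dictionary pattern, assuming a simple train/test split; the function name, feature names, and split strategy are hypothetical, not the module's actual code.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def fit_random_forest(X: pd.DataFrame, y: pd.Series, seed: int = 0) -> dict:
    """Fit a Random Forest and return a structured result dictionary."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    model = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X_tr, y_tr)
    preds = model.predict(X_te)
    return {
        "model": model,
        "r2": r2_score(y_te, preds),
        "predictions": preds,
        "importances": pd.Series(model.feature_importances_, index=X.columns),
    }

# Synthetic data with illustrative feature names
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 3)), columns=["hippocampus", "temporal", "educ"])
y = 2.0 * X["hippocampus"] + 0.5 * X["temporal"] + rng.normal(0, 0.1, 300)
result = fit_random_forest(X, y)
```

Because every model returns the same keys, the comparison utilities can rank models by `result["r2"]` and plot `result["importances"]` without model-specific code.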
### `visualization.py`
- modular plotting utilities
- saving only when `save_path` is provided
- consistent, publication‑ready aesthetics
### `config.py`
- column groups
- brain volume categories
- statistical parameters
- default covariates
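A hypothetical sketch of what such a `config.py` might centralize; every name and value below is an illustrative assumption, not the repository's actual configuration.

```python
# Hypothetical config.py sketch: centralizes column groups and parameters
SEVERITY_COLS = ["cdr", "mmse"]            # clinical severity indices (illustrative)
ACTIVITY_COLS = ["stove_use", "games"]     # daily-functioning measures (illustrative)
BRAIN_VOLUME_COLS = {
    "temporal": ["left_temporal", "right_temporal"],
    "hippocampal": ["left_hippocampus", "right_hippocampus"],
}
DEFAULT_COVARIATES = ["female", "educ"]    # covariates used in adjusted ANOVA models
ALPHA = 0.05                               # significance threshold for all tests
```

Keeping these constants in one module means a renamed column or a new significance threshold only needs to change in one place.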
## Requirements

- Python 3.10+
- pandas, numpy, matplotlib, seaborn, scikit-learn, statsmodels, scipy

PySpark is optional and only required for the notebook inside `spark/`.
## How to run

1. Clone the repository
2. Install dependencies
3. Run the notebooks in order (`01_...` → `02_...` → `03_...` → `04_...`)
4. Results are saved automatically to `reports/tables/` and `reports/figures/`
## Key findings

- Several daily activities show significant differences by gender and education.
- Women exhibit stronger impairments in stove use, attention, social events, and games.
- Strong correlations exist between temporal/hippocampal volumes and clinical severity.
- Random Forest consistently outperforms Linear Regression in predictive accuracy.
## License

MIT License.

## Author

Patricia C. Torrell
Clinical Data Analyst transitioning into Data Analytics, focused on clinical modeling, reproducible pipelines, and interpretable ML.

LinkedIn: https://www.linkedin.com/in/patricia-c-torrell
GitHub: https://github.com/PatriCT240.github.io
## Project highlights

- Industry‑grade project architecture with strict modular separation (`preprocessing`, `analysis`, `modeling`, `visualization`, `config`).
- Reproducible and transparent workflow, with all saving logic handled from notebooks and no side effects inside `src/`.
- Advanced statistical expertise: ANOVA, MANOVA, Tukey HSD, correlation matrices, chi‑square tests, subgroup analysis.
- Predictive modeling proficiency using Linear Regression and Random Forest, with structured model outputs and feature importance analysis.
- Clinical domain understanding, working with severity indices, daily functioning measures, and structural brain morphology metrics.
- Clean data engineering practices: raw vs processed data separation, numeric conversion, missing‑value strategies, and standardized preprocessing.
- Professional visualization layer with modular, publication‑ready plots and optional saving paths.
- PySpark‑ready pipeline included as an optional scalable version, demonstrating ability to work with distributed systems without imposing heavy dependencies.
- Clear communication and documentation, including an executive summary, project highlights, and a recruiter‑friendly README.