BojarLab/Bio513_metabolomics

Computational Metabolomics (Computer Lab Module in BIO513)

The Lab

The aim of this lab is to gain practical insight into metabolomics by reading the article:

Mickiewicz B. et al. (2020). NMR-based metabolic profiling provides diagnostic and prognostic information in critically ill children with suspected infection. Scientific Reports 10. https://doi.org/10.1038/s41598-020-77319-0

and analysing the corresponding dataset children_infection.csv using Python.

The lab has three steps:

  1. Read the relevant parts of the article and try to understand what analyses were performed and what the main findings were.
  2. Apply the uni- and multivariate methods you have learnt to the same data using Python. Three student notebooks (block1, block2, block3) are provided as a starting point. The full worked example (metabolomics_workflow_notebook.ipynb) and its documentation (Guide_to_Metabolomics_Workflow.docx) are also available for reference.
  3. Compare your results with the article and discuss similarities, differences, and possible reasons for them.

You are free to perform any relevant analyses. However, since PCA and OPLS-DA are standard methods in metabolomics, you are encouraged to include them. Univariate analysis should also be performed.

Note on software: R is more commonly used in metabolomics research due to the availability of specialised packages. Here we use Python, which gives you more flexibility in how you structure the analysis.
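As a rough illustration of the PCA step encouraged above, the sketch below runs scikit-learn's PCA on a synthetic samples-by-metabolites matrix. The synthetic data, the log2 transform, and the autoscaling are assumptions made for illustration; the provided notebooks may preprocess the data differently.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a samples x metabolites matrix (NOT the course data).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(30, 8))

# Assumed preprocessing: log2 transform, then autoscaling
# (mean 0 and unit variance for every metabolite).
X_scaled = StandardScaler().fit_transform(np.log2(X))

# Project the samples onto the first two principal components.
pca = PCA(n_components=2)
scores = pca.fit_transform(X_scaled)

print("Scores shape:", scores.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

The `scores` array holds one row per sample and can be plotted as a scores plot, coloured by group.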


The Dataset

The dataset children_infection.csv contains ¹H NMR urine metabolite profiles from children admitted to a paediatric intensive care unit (PICU), across three groups: Infection, SIRS (systemic inflammatory response syndrome without confirmed infection), and Control (healthy children).
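A first look at a table like this usually means counting samples per group and checking for missing values. The sketch below does that on a tiny in-memory stand-in; the column names `Group`, `metabolite_a`, and `metabolite_b` are hypothetical, since the actual column layout of children_infection.csv is not described here.

```python
import io

import pandas as pd


def summarise_groups(df: pd.DataFrame, group_col: str) -> pd.Series:
    """Return the number of samples in each group, sorted by group name."""
    return df[group_col].value_counts().sort_index()


# Tiny in-memory stand-in with hypothetical column names (NOT the course data).
demo_csv = io.StringIO(
    "Group,metabolite_a,metabolite_b\n"
    "Control,1.0,2.0\n"
    "Infection,1.5,2.5\n"
    "Infection,1.7,\n"
    "SIRS,0.9,2.1\n"
)
demo = pd.read_csv(demo_csv)

counts = summarise_groups(demo, "Group")
missing_per_column = demo.isna().sum()

print(counts)
print(missing_per_column)
```

With the real file, `pd.read_csv("children_infection.csv")` would replace the in-memory stand-in, and `"Group"` would be replaced by whatever the grouping column is actually called.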


Repository Contents

| File | Description |
|---|---|
| block1_data_loading_preprocessing.ipynb | Student notebook, Block 1: data loading and preprocessing |
| block2_univariate_pca.ipynb | Student notebook, Block 2: univariate analysis and PCA |
| block3_oplsda_visualisation.ipynb | Student notebook, Block 3: OPLS-DA and visualisation |
| metabolomics_workflow_notebook.ipynb | Full worked example pipeline for reference |
| Guide_to_Metabolomics_Workflow.docx | Guide to the workflow and all parameters |
| children_infection.csv | NMR metabolite dataset (all three groups) |

Getting Started

Option A — Google Colab (recommended, no installation required)

  1. Go to colab.research.google.com, choose File → Open notebook → GitHub, and paste this repository URL.

  2. Run the Step 1 cell to install packages. When prompted, go to Runtime → Restart session.

  3. Run the Step 2 cell — children_infection.csv will be downloaded automatically from GitHub.

⚠️ Colab sessions are temporary. Download any output files you want to keep before closing the session. At the start of the next block, upload the files from the previous block when prompted.

Option B — Local Jupyter

  1. Clone the repository:
       git clone https://github.com/BojarLab/Bio513_metabolomics.git
       cd Bio513_metabolomics
  2. Install dependencies:
       pip install pandas numpy matplotlib scipy scikit-learn seaborn
  3. Launch Jupyter and open the notebooks in order:
       jupyter notebook
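Before opening the notebooks locally, it can help to confirm that the packages from the pip command are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for this example.

```python
import importlib.util

# Import names for the packages in the pip command above
# (scikit-learn is imported as "sklearn").
REQUIRED = ["pandas", "numpy", "matplotlib", "scipy", "sklearn", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


not_installed = missing_packages(REQUIRED)
print("Missing packages:", not_installed if not_installed else "none")
```

If anything is listed as missing, re-run the pip command in the same environment that Jupyter uses.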

Intermediate Files

Each block saves output files that are used as input in the next block:

| File | Generated by | Used by |
|---|---|---|
| results/processed_data.csv | Block 1 | Blocks 2 & 3 |
| results/transformed_unscaled_data.csv | Block 1 | Blocks 2 & 3 |
| results/univariate_results.csv | Block 2 | Block 3 |
| results/metabolomics_report.pdf | Block 3 | Report submission |

Local Jupyter: Files are saved and loaded automatically.
Google Colab: Download files at the end of each block and re-upload them at the start of the next.
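A quick way to see whether a block has everything it needs is to check for its input files before running it. The sketch below encodes the dependencies listed above; the helper `missing_inputs` and the `BLOCK_INPUTS` mapping are illustrative and are not part of the provided notebooks.

```python
from pathlib import Path

# Input files each block expects, taken from the list of intermediate files above.
BLOCK_INPUTS = {
    "Block 2": [
        "results/processed_data.csv",
        "results/transformed_unscaled_data.csv",
    ],
    "Block 3": [
        "results/processed_data.csv",
        "results/transformed_unscaled_data.csv",
        "results/univariate_results.csv",
    ],
}


def missing_inputs(block, base_dir="."):
    """Return the expected input files for `block` that are absent under `base_dir`."""
    base = Path(base_dir)
    return [path for path in BLOCK_INPUTS[block] if not (base / path).is_file()]


for block in BLOCK_INPUTS:
    print(block, "missing:", missing_inputs(block))
```

On Colab, a non-empty list is a reminder to re-upload the files downloaded at the end of the previous block.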


Group Work

The lab is designed to be completed in the same groups as your seminar. Groups should:

  • Discuss results together before writing them
  • Submit one report per group

Report Guidelines

Submit one report per group, written as a short scientific paper (approximately 1500–2500 words, excluding figures). Present your results and compare methods and findings with those reported in the original article. Discuss your results in relation to the study's conclusions.

Structure

| Section | Content |
|---|---|
| Introduction | Background on NMR metabolomics, the clinical context, and the aim of your analysis |
| Methods | Preprocessing choices with justification; statistical tests; multivariate modelling; validation strategy |
| Results | Univariate findings, PCA, OPLS-DA performance and top metabolites (reference your figures) |
| Discussion | Biological interpretation; comparison with Mickiewicz et al.; limitations |

Figures

Include at least four figures with captions:

  1. Volcano plot
  2. PCA scores plot
  3. OPLS-DA scores plot and permutation test
  4. VIP plot or S-plot
  5. (Optional) Boxplots or clustered heatmap
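The values behind a volcano plot are a per-metabolite log2 fold change and a p-value, typically corrected for multiple testing. The sketch below computes them on synthetic data, assuming a Welch t-test and Benjamini-Hochberg correction; these choices and the function names are illustrative and may differ from both the article and the provided notebooks.

```python
import numpy as np
import pandas as pd
from scipy import stats


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q


def volcano_table(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Per-metabolite log2 fold change (a vs b), Welch t-test p-value and q-value."""
    log2fc = np.log2(a.mean(axis=0) / b.mean(axis=0))
    _, pvals = stats.ttest_ind(a.to_numpy(), b.to_numpy(), axis=0, equal_var=False)
    return pd.DataFrame(
        {
            "log2_fold_change": log2fc,
            "p_value": pvals,
            "q_value": benjamini_hochberg(pvals),
        }
    )


# Synthetic two-group data (NOT the course data); metabolite m1 is raised 4-fold.
rng = np.random.default_rng(1)
cols = ["m1", "m2", "m3", "m4"]
group_a = pd.DataFrame(rng.lognormal(0.0, 0.2, size=(20, 4)), columns=cols)
group_b = pd.DataFrame(rng.lognormal(0.0, 0.2, size=(20, 4)), columns=cols)
group_a["m1"] *= 4

table = volcano_table(group_a, group_b)
print(table)
```

Plotting `log2_fold_change` on the x-axis against `-log10(q_value)` on the y-axis gives the volcano plot.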

Assessment criteria

  • Correct execution and interpretation of the analyses
  • Quality and clarity of figures and captions
  • Depth of biological interpretation and connection to the original article
  • Critical discussion of methodological choices and their limitations
  • Clarity of writing
