The aim of this lab is to gain practical insight into metabolomics by reading the article:
Mickiewicz B. et al. (2020). NMR-based metabolic profiling provides diagnostic and prognostic information in critically ill children with suspected infection. Critical Care 16:R172. https://doi.org/10.1038/s41598-020-77319-0
and analysing the corresponding dataset children_infection.csv using Python.
The lab has three steps:
- Read the relevant parts of the article and try to understand what analyses were performed and what the main findings were.
- Apply the uni- and multivariate methods you have learnt on the same data using Python. Three student notebooks (block1, block2, block3) are provided as a starting point. The full worked example (metabolomics_workflow_notebook.ipynb) and its documentation (Guide_to_Metabolomics_Workflow.docx) are also available for reference.
- Compare your results with the article and discuss similarities, differences, and possible reasons for them.
You are free to perform any relevant analyses. However, since PCA and OPLS-DA are standard methods in metabolomics, you are encouraged to include them. Univariate analysis should also be performed.
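
As a rough illustration of the PCA step, the sketch below runs PCA on synthetic data with scikit-learn. The group sizes, column names, log2 transform, and autoscaling are assumptions made for this example and are not necessarily the preprocessing used in the provided notebooks.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: 30 samples x 8 "metabolites".
# Group sizes and column names are hypothetical.
rng = np.random.default_rng(0)
groups = np.repeat(["Control", "Infection", "SIRS"], 10)
X = pd.DataFrame(
    rng.lognormal(mean=0.0, sigma=0.5, size=(30, 8)),
    columns=[f"metabolite_{i}" for i in range(8)],
)
# Raise two metabolites in the Infection group so there is structure to find.
X.loc[groups == "Infection", ["metabolite_0", "metabolite_1"]] *= 3.0

# Log-transform, then autoscale (zero mean, unit variance per metabolite).
X_scaled = StandardScaler().fit_transform(np.log2(X))

# Fit a two-component PCA and collect the scores per sample.
pca = PCA(n_components=2)
scores = pca.fit_transform(X_scaled)
scores_df = pd.DataFrame(scores, columns=["PC1", "PC2"])
scores_df["group"] = groups

explained = pca.explained_variance_ratio_
print(scores_df.groupby("group")[["PC1", "PC2"]].mean().round(2))
print("Explained variance ratio:", explained.round(3))
```

On the real dataset, the same pattern would apply to the metabolite columns only, with the group column kept aside for colouring the scores plot.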
Note on software: R is more commonly used in metabolomics research due to the availability of specialised packages. Here we use Python, which gives you more flexibility in how you structure the analysis.
The dataset children_infection.csv contains ¹H NMR urine metabolite profiles from children admitted to a paediatric intensive care unit (PICU), across three groups: Infection, SIRS (systemic inflammatory response syndrome without confirmed infection), and Control (healthy children).
| File | Description |
|---|---|
| `block1_data_loading_preprocessing.ipynb` | Student notebook, Block 1: data loading and preprocessing |
| `block2_univariate_pca.ipynb` | Student notebook, Block 2: univariate analysis and PCA |
| `block3_oplsda_visualisation.ipynb` | Student notebook, Block 3: OPLS-DA and visualisation |
| `metabolomics_workflow_notebook.ipynb` | Full worked example pipeline for reference |
| `Guide_to_Metabolomics_Workflow.docx` | Guide to the workflow and all parameters |
| `children_infection.csv` | NMR metabolite dataset (all three groups) |

To run the notebooks in Google Colab:
- Go to colab.research.google.com, choose File → Open notebook → GitHub, and paste this repository URL.
- Run the Step 1 cell to install packages. When prompted, go to Runtime → Restart session.
- Run the Step 2 cell; `children_infection.csv` will be downloaded automatically from GitHub.

⚠️ Colab sessions are temporary. Download any output files you want to keep before closing the session. At the start of the next block, upload the files from the previous block when prompted.
To run the notebooks locally:
- Clone the repository:

      git clone https://github.com/BojarLab/Bio513_metabolomics.git
      cd Bio513_metabolomics

- Install dependencies:

      pip install pandas numpy matplotlib scipy scikit-learn seaborn

- Launch Jupyter and open the notebooks in order:

      jupyter notebook

Each block saves output files that are used as input in the next block:

| File | Generated by | Used by |
|---|---|---|
| `results/processed_data.csv` | Block 1 | Blocks 2 & 3 |
| `results/transformed_unscaled_data.csv` | Block 1 | Blocks 2 & 3 |
| `results/univariate_results.csv` | Block 2 | Block 3 |
| `results/metabolomics_report.pdf` | Block 3 | Report submission |

Local Jupyter: Files are saved and loaded automatically.
Google Colab: Download files at the end of each block and re-upload them at the start of the next.
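
The hand-off between blocks can be pictured as a save at the end of one notebook and a checked load at the start of the next. The following is a minimal sketch, not the notebooks' actual code: the temporary folder, the `sample_id` index, and the metabolite columns are hypothetical, and only the `results/processed_data.csv` file name comes from the table above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# A temporary folder stands in for the repository root in this sketch.
root = Path(tempfile.mkdtemp())
results_dir = root / "results"
results_dir.mkdir(parents=True, exist_ok=True)

# Hypothetical miniature of the Block 1 output; the real columns may differ.
processed = pd.DataFrame(
    {
        "group": ["Control", "Infection", "SIRS"],
        "metabolite_a": [0.12, 1.40, 0.75],
        "metabolite_b": [-0.33, 0.90, 0.10],
    },
    index=pd.Index(["s1", "s2", "s3"], name="sample_id"),
)

# End of Block 1: save the table.
out_path = results_dir / "processed_data.csv"
processed.to_csv(out_path)

# Start of Block 2: check that the file exists before loading it.
if not out_path.exists():
    raise FileNotFoundError(
        f"{out_path} not found: run Block 1 first (or re-upload the file in Colab)."
    )
reloaded = pd.read_csv(out_path, index_col="sample_id")
print(reloaded)
```

An explicit existence check like this gives a clearer error than a failed `read_csv` call when a block is run out of order or a Colab upload was skipped.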
The lab is designed to be completed in the same groups as your seminar. Groups should:
- Discuss results together before writing them up
- Submit one report per group
The report should be written as a short scientific paper (approximately 1500–2500 words, excluding figures). Present your results, compare your methods and findings with those reported in the original article, and discuss them in relation to the study's conclusions.
| Section | Content |
|---|---|
| Introduction | Background on NMR metabolomics, the clinical context, and the aim of your analysis |
| Methods | Preprocessing choices with justification; statistical tests; multivariate modelling; validation strategy |
| Results | Univariate findings, PCA, OPLS-DA performance and top metabolites — reference your figures |
| Discussion | Biological interpretation; comparison with Mickiewicz et al.; limitations |
Include at least four figures with captions:
- Volcano plot
- PCA scores plot
- OPLS-DA scores plot and permutation test
- VIP plot or S-plot
- (Optional) Boxplots or clustered heatmap
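
For orientation, a volcano plot combines a per-metabolite fold change with a per-metabolite p-value. The sketch below builds one from synthetic two-group data; the group sizes, the use of Welch's t-test on log2 values, the Benjamini-Hochberg correction, and the thresholds (q < 0.05, |log2 FC| > 1) are illustrative assumptions rather than the settings used in the article or the notebooks.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Synthetic concentrations: 15 samples per group x 50 metabolites (all positive).
n, m = 15, 50
control = rng.lognormal(mean=0.0, sigma=0.4, size=(n, m))
infection = rng.lognormal(mean=0.0, sigma=0.4, size=(n, m))
infection[:, :5] *= 2.5  # the first five metabolites are truly elevated

# log2 fold change of group means, and Welch's t-test on log2-transformed values.
log2_fc = np.log2(infection.mean(axis=0) / control.mean(axis=0))
_, p = stats.ttest_ind(np.log2(infection), np.log2(control), axis=0, equal_var=False)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values)."""
    p_values = np.asarray(p_values, dtype=float)
    order = np.argsort(p_values)
    ranked = p_values[order] * len(p_values) / np.arange(1, len(p_values) + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1].clip(max=1.0)
    q = np.empty_like(ranked)
    q[order] = ranked
    return q


q = benjamini_hochberg(p)
significant = (q < 0.05) & (np.abs(log2_fc) > 1.0)  # illustrative thresholds

fig, ax = plt.subplots(figsize=(5, 4))
ax.scatter(log2_fc[~significant], -np.log10(p[~significant]), c="grey", s=15)
ax.scatter(log2_fc[significant], -np.log10(p[significant]), c="crimson", s=15)
ax.axvline(-1.0, ls="--", lw=0.8, c="k")
ax.axvline(1.0, ls="--", lw=0.8, c="k")
ax.set_xlabel("log2 fold change (Infection / Control)")
ax.set_ylabel("-log10(p)")
fig.tight_layout()
```

Calling `fig.savefig(...)` afterwards writes the figure to disk for the report. Whichever thresholds and test you choose, state them in the Methods section so the plot can be compared with the article.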
Reports are assessed on:
- Correct execution and interpretation of the analyses
- Quality and clarity of figures and captions
- Depth of biological interpretation and connection to the original article
- Critical discussion of methodological choices and their limitations
- Clarity of writing