
zavolanlab/TemperatureDependentCelegansAging_Spang


Temperature-Dependent C. elegans Aging (in collaboration with the Anne Spang lab) - Analysis and Pipelines

This repository contains the computational workflows and downstream analysis notebooks for the experiments on temperature-dependent C. elegans aging conducted by the lab of Prof. Anne Spang.

The repository is set up so that both the workflows and the analysis can be run from a Jupyter notebook on an HPC cluster.

On the sciCORE HPC, the OnDemand service makes it straightforward to run a Jupyter notebook on a compute node.

We utilize a hybrid approach: Snakemake for robust, scalable data processing on HPC clusters (sciCORE), and Jupyter Notebooks for interactive downstream analysis and visualization.

Current state

Currently, we are analyzing bulk RNA-seq data from C. elegans kept under two conditions: a stable 23 °C and an oscillating temperature.

Repository Structure

.
├── TemperatureDependentCelegansAging_Spang.current.ipynb                  # a Jupyter notebook dedicated to the project, includes analysis and workflow configuration
├── TemperatureDependentCelegansAging_Spang.template.env       # Template for required environment variables/paths
└── WF/                                 # Snakemake Workflow Engine
    ├── Snakefile-prepare-faster-se     # 1st step of the Pipeline: RNA-seq data processing (alignment, FastQC) optimized for single-end reads prepared with Takara SMART-Seq mRNA LP kit
    ├── Snakefile-quantification-faster # 2nd step of the Pipeline: quantification of gene expression
    ├── config.template.yaml            # Template configuration for Snakemake parameters
    ├── envs/                           # Conda environments isolated for specific Snakemake rules
    ├── profile/                        # SLURM execution profile for the HPC
    └── scripts/                        # Python and R scripts utilized by both Snakemake and Jupyter

Quick Start & Setup

To ensure strict reproducibility and security, this project uses .env files to manage all absolute paths (data directories, genome annotations, etc.). Do not hardcode paths into the Python or Snakemake files.

1. Clone the Repository

Clone this repository into your local user space ($HOME):

git clone https://github.com/zavolanlab/TemperatureDependentCelegansAging_Spang.git
cd TemperatureDependentCelegansAging_Spang

2. Configure Environment Paths

You must map the project to your local HPC paths. Copy the template under a new name and fill in your absolute paths, for example:

cp TemperatureDependentCelegansAging_Spang.template.env TemperatureDependentCelegansAging_Spang.scicore.env
# Open TemperatureDependentCelegansAging_Spang.scicore.env and edit the "Base Directories" section to match your system
  • Recommended if you are a Zavolan group member on sciCORE: move TemperatureDependentCelegansAging_Spang.scicore.env to the project GROUP folder and symlink it into your local repository directory:
    ln -s <a file with specified sciCORE paths> TemperatureDependentCelegansAging_Spang.scicore.env

This way, TemperatureDependentCelegansAging_Spang.scicore.env is accessible to all group members but is not tracked by git. (Note: *.env files are ignored by git to protect private cluster paths, except for TemperatureDependentCelegansAging_Spang.template.env.) TemperatureDependentCelegansAging_Spang.scicore.env already exists in the GROUP folder of the project on sciCORE; look for the README there.

3. Install the conda environment with zavolab_pyutils

The analysis in the notebook relies largely on functions from the zavolab_pyutils repository. Follow the instructions in that repository under "Developer Setup from source, with conda environment", and use the resulting conda environment "zavolab_pyutils" to run the Jupyter notebook.

4. Essential for developers! Install nbstripout

When in the TemperatureDependentCelegansAging_Spang directory, run:

nbstripout --install

This automatically strips cell outputs from Jupyter notebooks when they are committed, so the outputs never reach GitHub. Otherwise, there is a risk of exposing your HPC cluster paths to the public.
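As a sanity check, one could inspect the version of the notebook that git actually stores (for example, the text printed by `git show HEAD:<notebook>`) and confirm that no code cell still carries outputs. The helper below is a minimal sketch, not part of this repository; the function name `cells_with_output` and the tiny in-memory notebook are hypothetical and only illustrate the notebook JSON layout.

```python
import json


def cells_with_output(notebook_json):
    """Return the indices of code cells that still hold outputs or an execution count."""
    notebook = json.loads(notebook_json)
    dirty = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells never carry outputs
        if cell.get("outputs") or cell.get("execution_count") is not None:
            dirty.append(index)
    return dirty


# Hypothetical notebook: one markdown cell, one executed cell, one stripped cell.
example_notebook = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["print(path)"],
         "outputs": [{"output_type": "stream", "text": ["..."]}],
         "execution_count": 3},
        {"cell_type": "code", "source": ["x = 1"],
         "outputs": [], "execution_count": None},
    ]
})
dirty_cells = cells_with_output(example_notebook)
print(dirty_cells)
```

An empty list means the stored notebook is clean. Note that the working copy on disk keeps its outputs; only the committed version is stripped.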

5. Use the Jupyter notebook to configure the workflows and prepare the input table

The workflows are configured inside the Jupyter notebook, i.e. the input .tsv with the sample specification and the .yaml config are created there.
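The sketch below illustrates the general idea of generating a sample table and a config file from Python. It is not the notebook's actual code: the column names (`sample`, `condition`, `fastq`), the config key `samples_table`, the file names, and the placeholder paths are all assumptions; the real keys are defined by WF/config.template.yaml and the notebook.

```python
import csv
import tempfile
from pathlib import Path


def write_sample_table(samples, tsv_path):
    """Write one row per sample to a tab-separated file with a header line."""
    fieldnames = list(samples[0].keys())
    with open(tsv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(samples)


def write_flat_yaml(settings, yaml_path):
    """Write a flat key/value mapping as simple YAML.

    Values are wrapped in double quotes, so they must not themselves
    contain double quotes or backslashes.
    """
    lines = [f'{key}: "{value}"' for key, value in settings.items()]
    Path(yaml_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Hypothetical samples for the two conditions; the paths are placeholders.
samples = [
    {"sample": "stable_rep1", "condition": "stable",
     "fastq": "/path/to/stable_rep1.fastq.gz"},
    {"sample": "oscillating_rep1", "condition": "oscillating",
     "fastq": "/path/to/oscillating_rep1.fastq.gz"},
]

out_dir = Path(tempfile.mkdtemp())  # stand-in for a directory taken from the .env file
table_path = out_dir / "samples.tsv"
config_path = out_dir / "config.yaml"
write_sample_table(samples, table_path)
write_flat_yaml({"samples_table": str(table_path)}, config_path)
print(table_path.read_text(encoding="utf-8"))
```

Writing both files from the same code keeps the path recorded in the config in sync with the table that was actually written.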

6. Executing the Workflows

The heavy lifting is divided into separate Snakemake workflows (currently three), located in the WF/ directory.

The bash commands are also prepared inside the Jupyter notebook; copy them to the command line and run them there.

On an HPC cluster such as sciCORE, run the workflows from a login node. Snakemake then submits the jobs to compute nodes automatically.
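To illustrate how such a command might be assembled in a notebook cell, here is a minimal sketch. It assumes a filled-in config at the hypothetical path `WF/config.yaml` (a copy of config.template.yaml) and uses only the standard Snakemake options `--snakefile`, `--configfile`, and `--profile`; the commands prepared in the actual notebook may use different or additional options.

```python
import shlex


def snakemake_command(snakefile, configfile, profile_dir):
    """Build a shell-safe Snakemake command line as a single string."""
    args = [
        "snakemake",
        "--snakefile", snakefile,    # which workflow to run
        "--configfile", configfile,  # workflow parameters
        "--profile", profile_dir,    # SLURM execution profile
    ]
    return shlex.join(args)  # quotes any argument that needs it


# WF/config.yaml is an assumed name for the filled-in config.
command = snakemake_command(
    "WF/Snakefile-prepare-faster-se",
    "WF/config.yaml",
    "WF/profile",
)
print(command)
```

The printed string can then be pasted into a terminal on the login node.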

7. Downstream Analysis

Once the Snakemake workflows are complete, all results are routed to the shared group directories defined in your .env file.

Use the respective sections of the Jupyter notebook to analyze the outputs.

The notebook automatically loads your .env paths using python-dotenv, allowing it to dynamically locate all workflow results, figures, and metadata regardless of where you cloned this repository.
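A minimal sketch of this pattern is shown below. The variable name `RESULTS_DIR`, the example path, and the helper `resolve_result_path` are hypothetical; the real variable names come from TemperatureDependentCelegansAging_Spang.template.env. The sketch uses `dotenv_values` from python-dotenv and, purely so that it also runs where the package is not installed, falls back to a simplified stand-in parser.

```python
import tempfile
from pathlib import Path

try:
    from dotenv import dotenv_values  # provided by python-dotenv
except ImportError:
    def dotenv_values(dotenv_path):
        """Simplified stand-in: parse KEY=VALUE lines, skipping blanks and comments."""
        values = {}
        for raw in Path(dotenv_path).read_text(encoding="utf-8").splitlines():
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
        return values


def resolve_result_path(env_file, base_key, *parts):
    """Join a base directory read from the .env file with relative path parts."""
    paths = dotenv_values(env_file)
    base = paths.get(base_key)
    if not base:
        raise KeyError(f"{base_key} is not set in {env_file}")
    return Path(base).joinpath(*parts)


# Hypothetical .env file with a single base directory.
env_file = Path(tempfile.mkdtemp()) / "example.env"
env_file.write_text("# Base Directories\nRESULTS_DIR=/scratch/example/results\n",
                    encoding="utf-8")

counts_path = resolve_result_path(env_file, "RESULTS_DIR", "quantification", "counts.tsv")
print(counts_path)
```

Because every location is derived from the .env file, the same notebook code works unchanged for any clone of the repository.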
