kircherlab/mpra_capture_flow

Snakemake workflow: MPRACaptureFlow

Workflow for processing and analyzing Capture-C MPRA data, following the methodology described in the preprint Capture-C MPRA: A high-throughput method to simultaneously characterize promoter interactions and regulatory activity by Arnould, Keukeleire, et al. (2025), https://doi.org/10.1101/2025.06.11.658967.

Developers

  • Pia Keukeleire (@pi-zz-a), Institute of Human Genetics, UKSH / University of Lübeck

Usage

If you use this workflow in a paper, please give credit to the authors by citing the URL of this (original) repository as well as the preprint (see above).

Cloning this repository and the required tools (described in Step 2) takes only a minute on a normal machine. Snakemake installs all required packages into conda environments automatically, which can take anywhere from a few minutes to an hour.

Running the workflow on the provided sample data should take around 5 minutes on a normal machine.

Step 1: Obtain a copy of this workflow

  1. Create a new GitHub repository using this workflow as a template.
  2. Clone the newly created repository to your local system, into the place where you want to perform the data analysis.

Step 2: Download required tools

This workflow relies on CHiCAGO (Freire-Pritchett et al., 2021) for identifying significant cHi-C loops, on MPRAsnakeflow for creating count tables from the MPRA sequencing data and on BCalm (Keukeleire et al., 2025) for quantifying MPRA activity.

Step 3: Configure workflow

Configure the workflow to your needs by editing the files in the config/ folder. Adjust config.yaml to configure the workflow execution and samples.tsv to specify your sample setup. To run the workflow on the small example dataset (data/small_test.fastq.gz), use config/example_config.yaml and config/example_samples.tsv.

Step 4: Install Snakemake

Install Snakemake version >= 7.15.2 using conda:

conda create -c bioconda -c conda-forge -n snakemake snakemake
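The conda command above does not pin a version, so it can be worth confirming that the installed Snakemake meets the stated minimum. The following is a minimal sketch of such a check; the helper names `parse_version` and `meets_minimum` are hypothetical and not part of this workflow.

```python
import re

# Minimum Snakemake version stated in this README.
MINIMUM = (7, 15, 2)


def parse_version(text):
    """Return up to the first three numeric components of a version string."""
    parts = re.findall(r"\d+", text)[:3]
    return tuple(int(part) for part in parts)


def meets_minimum(text, minimum=MINIMUM):
    """True if the version string is at least the required minimum."""
    return parse_version(text) >= minimum


# Illustrative version strings, not real measurements.
print(meets_minimum("7.15.2"))
print(meets_minimum("7.8.0"))
```

In practice, the string passed to `meets_minimum` would be the output of `snakemake --version`, run inside the activated environment.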

For installation details, see the instructions in the Snakemake documentation.

Step 5: Execute workflow

Activate the conda environment:

conda activate snakemake

Test your configuration by performing a dry-run via

snakemake --use-conda --configfile config/example_config.yaml -n

The workflow needs to be run twice: a first run that produces the input files for MPRAsnakeflow and, after MPRAsnakeflow has been run, a second run that produces the BCalm quantification output. For the first run, comment out the third rule in the Snakefile; for the second run, include it again.
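Only the dry-run command is shown above; an actual run uses the same options without `-n` and with a core count given via Snakemake's `--cores` option. The sketch below builds both command lines. The helper `snakemake_command` and the choice of 4 cores are assumptions made for illustration, not part of this repository.

```python
import shlex


def snakemake_command(configfile, cores=None):
    """Build a Snakemake command line.

    With cores=None a dry run (-n) is built; otherwise a real run
    using the given number of cores.
    """
    cmd = ["snakemake", "--use-conda", "--configfile", configfile]
    if cores is None:
        cmd.append("-n")
    else:
        cmd += ["--cores", str(cores)]
    return cmd


config = "config/example_config.yaml"
# Dry run, matching the command shown in this README.
print(shlex.join(snakemake_command(config)))
# Real run; the core count (4) is an arbitrary example value.
print(shlex.join(snakemake_command(config, cores=4)))
```

The same real-run command would be issued for both passes: once with the third rule commented out, and once more, after MPRAsnakeflow has finished, with the rule restored. The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the run from Python.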

For the example data, I included an example output count matrix which you can find in data/mprasnakeflow_example_counts.tsv.gz.
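To get a feel for the example count matrix before the second run, one can look at its first few rows. This is a minimal sketch using only the Python standard library; it makes no assumption about the column names, and the helper `preview_tsv_gz` is hypothetical.

```python
import csv
import gzip
from itertools import islice


def preview_tsv_gz(path, n_rows=5):
    """Return the first n_rows rows of a gzip-compressed TSV file.

    Each row is returned as a list of strings; a header line, if
    present, is simply the first row.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        return list(islice(reader, n_rows))
```

For example, `preview_tsv_gz("data/mprasnakeflow_example_counts.tsv.gz")` would return the first five rows of the example file when called from the repository root.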

For running MPRAsnakeflow, one can use the following configurations (for more detailed instructions, see the MPRAsnakeflow repository):

---
experiments:
  example:
    bc_length: 15
    umi_length: 16
    data_folder: data/ # folder containing your MPRA sequencing files
    experiment_file: # file describing the sequencing files
    demultiplex: false
    assignments:
      fromFile:
        type: file
        assignment_file: mpra_capture_flow/results/example_project/mprasf/assignment_barcodes.sorted.tsv.gz
    design_file: ../../mpra_capture_flow/results/example_project/mprasf/mprasnakeflow_design.fa
    configs:
      minimal:
        filter:
          bc_threshold: 1
          min_dna_counts: 1
          min_rna_counts: 1
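YAML does not allow tab characters for indentation, so a configuration copied from a rendered page or edited with tab-indenting settings can fail to parse. The following sketch flags such lines before the file is handed to MPRAsnakeflow; the helper `tab_indented_lines` is hypothetical and only inspects leading whitespace.

```python
def tab_indented_lines(text):
    """Return 1-based numbers of lines whose indentation contains a tab."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace is everything stripped from the left.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            flagged.append(number)
    return flagged


# Small illustrative input: the third line is indented with a tab.
sample = "experiments:\n  example:\n\tbc_length: 15\n"
print(tab_indented_lines(sample))
```

An empty list means no tab indentation was found; it does not guarantee that the file is otherwise valid YAML.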

For further analysis and for reproducing the manuscript figures, see the repository containing my analysis notebooks: https://github.com/kircherlab/CMPRA_figures.

See the Snakemake documentation for further details.
