Mold Mutation Detection Pipeline

Author: PSBioinfo
Objective: Build a Python pipeline to identify mutations responsible for different mold colors in ear samples of 50 patients.

Overview

This pipeline performs the following steps:

1. Demultiplexes the pooled FASTQ file using the barcodes listed in the clinical metadata.
2. Trims low-quality reads.
3. Aligns reads to a reference genome using BWA.
4. Converts SAM alignments to sorted BAM files using SAMtools.
5. Detects mutations using pileup and basic variant analysis.
6. Generates a report listing, for each sample, the mold color, read count, and detected mutation.
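The trimming rule is not described further. One common approach is to cut each read at the first base whose quality drops below a threshold. The sketch below assumes Phred+33 quality encoding and that rule. The threshold of 20 and the names `phred33_scores` and `trim_read` are hypothetical and are not taken from Pipeline.py.

```python
def phred33_scores(quality: str) -> list[int]:
    """Convert a Phred+33 encoded quality string to integer scores."""
    return [ord(ch) - 33 for ch in quality]


def trim_read(seq: str, quality: str, min_q: int = 20) -> tuple[str, str]:
    """Truncate a read at the first base whose quality is below min_q.

    Hypothetical trimming rule; the pipeline's actual rule may differ.
    """
    for i, q in enumerate(phred33_scores(quality)):
        if q < min_q:
            # Keep only the high-quality prefix (sequence and qualities together).
            return seq[:i], quality[:i]
    return seq, quality


if __name__ == "__main__":
    # In Phred+33, 'I' encodes Q40 and '#' encodes Q2.
    print(trim_read("ACGTACGT", "IIII#III"))  # ('ACGT', 'IIII')
```

Cutting at the first low-quality base is deliberately simple. Sliding-window trimmers are more forgiving of isolated bad bases.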

Project Structure

mold-mutation-pipeline/
├── data/              # Input files (mock FASTQ and metadata)
├── scripts/           # Python scripts (pipeline and parser)
├── output/            # Output report file
├── fastqs/            # Auto-generated trimmed FASTQ files
├── sam_files/         # Auto-generated SAM files
├── bam_files/         # Auto-generated sorted BAM files
├── requirements.txt   # Python dependencies (pysam)
├── README.md          # This project description
└── LICENSE            # MIT License
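The following is a minimal sketch of how the alignment stage could populate sam_files/ and bam_files/. It assumes that `bwa` and `samtools` are on PATH and that the reference has already been indexed with `bwa index`. The per-sample file names (`<sample>_trimmed.fastq`, `<sample>.sam`, `<sample>.sorted.bam`) and all function names are hypothetical. Pipeline.py may organise this differently.

```python
import os
import subprocess


def sample_paths(sample: str) -> dict[str, str]:
    """Per-sample paths under the auto-generated directories (hypothetical naming)."""
    return {
        "fastq": os.path.join("fastqs", f"{sample}_trimmed.fastq"),
        "sam": os.path.join("sam_files", f"{sample}.sam"),
        "bam": os.path.join("bam_files", f"{sample}.sorted.bam"),
    }


def alignment_commands(sample: str, reference: str) -> list[list[str]]:
    """Argument lists for aligning, sorting and indexing one sample."""
    p = sample_paths(sample)
    return [
        ["bwa", "mem", reference, p["fastq"]],           # writes SAM to stdout
        ["samtools", "sort", "-o", p["bam"], p["sam"]],  # SAM -> coordinate-sorted BAM
        ["samtools", "index", p["bam"]],                 # .bai so the BAM can be queried by region
    ]


def run_alignment(sample: str, reference: str = "dgorgon_reference.fa") -> None:
    """Execute the commands; requires bwa and samtools to be installed."""
    for d in ("sam_files", "bam_files"):
        os.makedirs(d, exist_ok=True)  # auto-create output directories
    bwa_cmd, sort_cmd, index_cmd = alignment_commands(sample, reference)
    with open(sample_paths(sample)["sam"], "w") as sam_out:
        subprocess.run(bwa_cmd, stdout=sam_out, check=True)
    subprocess.run(sort_cmd, check=True)
    subprocess.run(index_cmd, check=True)
```

Building the argument lists separately from running them makes the commands easy to inspect or log without invoking the external tools.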

Sample Data

Mock data files are provided for demonstration purposes:

data/sample.fastq: Two short FASTQ reads with barcodes
data/sample_clinical_data.txt: Clinical data file with sample names, mold colors, and barcodes

These files simulate the input format expected by the pipeline and allow the code to be tested without real patient data.
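The next block sketches readers for the two inputs. The repository's own parser script is not shown here, so this is an independent version. It assumes standard four-line FASTQ records. For the clinical file, whose exact layout is not documented, it assumes tab-separated columns in the order sample name, mold color, barcode, preceded by one header line. `Sample`, `read_fastq`, and `read_clinical_data` are hypothetical names.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class Sample(NamedTuple):
    name: str
    color: str
    barcode: str


def read_fastq(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield header[1:], seq, qual


def read_clinical_data(handle: TextIO) -> dict[str, Sample]:
    """Map barcode -> Sample from tab-separated metadata (assumed layout)."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the assumed header line
    samples = {}
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or incomplete lines
        name, color, barcode = (field.strip() for field in row[:3])
        samples[barcode.upper()] = Sample(name, color, barcode.upper())
    return samples


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    fastq = io.StringIO("@read1\nACGTTTTT\n+\nIIIIIIII\n")
    meta = io.StringIO("Name\tColor\tBarcode\nPatientX\tgreen\tACGT\n")
    print(list(read_fastq(fastq)))
    print(read_clinical_data(meta))
```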

How to Run

From the root directory, execute:

python3 scripts/Pipeline.py \
    -f data/sample.fastq \
    -c data/sample_clinical_data.txt \
    -o fastqs

This creates the intermediate files and writes report.txt to the output/ directory.
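The command-line interface above could be declared with `argparse` as shown below. The meanings of the flags are inferred from the example invocation and the project structure: `-f` for the pooled FASTQ, `-c` for the clinical data, and `-o` for the trimmed-FASTQ directory. The long option names are hypothetical.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Options mirroring the documented invocation (meanings assumed)."""
    parser = argparse.ArgumentParser(description="Mold mutation detection pipeline")
    parser.add_argument("-f", "--fastq", required=True,
                        help="pooled, barcoded FASTQ file")
    parser.add_argument("-c", "--clinical", required=True,
                        help="clinical data file (sample name, mold color, barcode)")
    parser.add_argument("-o", "--outdir", default="fastqs",
                        help="directory for per-sample trimmed FASTQ files")
    return parser.parse_args(argv)


if __name__ == "__main__":
    print(parse_args())
```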

Example Output

Sample Patient01 had a green mold, 1452 reads, and had 48.62% of reads at position 125 with the mutation T.
Sample Patient02 had a blue mold, 1380 reads, and had 36.44% of reads at position 147 with the mutation A.
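Each report line pairs a position with the share of covering reads that carry a non-reference base. The sketch below shows one way to derive and format those values. It assumes per-position base counts have already been collected from a pileup (pysam is the listed dependency) and that the strongest non-reference signal per sample is reported. That selection rule, the function names, and the counts in the test are hypothetical.

```python
from typing import Optional


def top_mutation(base_counts: dict[str, int], ref_base: str) -> Optional[tuple[str, float]]:
    """Most frequent non-reference base at a position and its % of covering reads."""
    depth = sum(base_counts.values())
    alts = {b: n for b, n in base_counts.items() if b != ref_base and n > 0}
    if depth == 0 or not alts:
        return None  # no coverage or no mismatching reads
    base = max(alts, key=alts.get)
    return base, 100.0 * alts[base] / depth


def strongest_mutation(pileup: dict[int, dict[str, int]],
                       ref_bases: dict[int, str]) -> Optional[tuple[int, str, float]]:
    """Position, base and percentage of the highest-fraction mutation (assumed rule)."""
    best = None
    for pos, counts in pileup.items():
        hit = top_mutation(counts, ref_bases[pos])
        if hit and (best is None or hit[1] > best[2]):
            best = (pos, hit[0], hit[1])
    return best


def report_line(name: str, color: str, n_reads: int, position: int,
                base: str, percent: float) -> str:
    """Format one sample in the style of the example report."""
    return (f"Sample {name} had a {color} mold, {n_reads} reads, and had "
            f"{percent:.2f}% of reads at position {position} with the mutation {base}.")
```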

Notes

The reference genome (dgorgon_reference.fa) should be placed in the project directory and indexed with bwa index before alignment.
Intermediate files (fastqs/, sam_files/, bam_files/) are auto-created during runtime.
The script expects barcode matching to be exact.
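Exact barcode matching can be implemented as a dictionary lookup on the start of each read. The sketch below assumes the following:

* Every barcode has the same length.
* The barcode sits at the 5' end of the read.
* The barcode and its quality characters are removed before trimming and alignment.
* Reads with no exact match are dropped.

These are assumptions, and `demultiplex` is a hypothetical name.

```python
from collections import defaultdict
from typing import Iterable


def demultiplex(records: Iterable[tuple[str, str, str]],
                barcodes: dict[str, str]) -> dict[str, list[tuple[str, str, str]]]:
    """Bin (read_id, seq, qual) records by an exact 5' barcode match.

    `barcodes` maps barcode -> sample name.
    """
    lengths = {len(b) for b in barcodes}
    if len(lengths) != 1:
        raise ValueError("expected all barcodes to share one length")
    k = lengths.pop()
    bins = defaultdict(list)
    for read_id, seq, qual in records:
        sample = barcodes.get(seq[:k])  # exact match only; no mismatches tolerated
        if sample is not None:
            bins[sample].append((read_id, seq[k:], qual[k:]))  # strip the barcode
    return dict(bins)
```

A single lookup per read keeps this linear in the number of reads. Allowing mismatches would instead require comparing each read against every barcode, for example by Hamming distance.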

Dependencies

Install Python dependencies:

pip install -r requirements.txt
