
Workflow overview

A Snakemake workflow for the post-processing of microbial genome assemblies.

- Parse the samples.csv table containing the samples' metadata (Python)
- Annotate assemblies using one of the following tools:
  1. NCBI's Prokaryotic Genome Annotation Pipeline (PGAP). Note: PGAP must be installed manually.
  2. Prokka, a fast and lightweight prokaryotic annotation tool
  3. Bakta, a fast, alignment-free annotation tool. Note: Bakta automatically downloads its companion database from Zenodo (light: 1.5 GB, full: 40 GB).
- Create a QC report for the assemblies using QUAST
- Perform a pangenome analysis (orthologs/homologs) using Panaroo
- Compute pairwise average nucleotide identity (ANI) between the assemblies using FastANI and plot a phylogenetic tree based on the ANI distances
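To illustrate what "a tree based on the ANI distances" starts from, here is a minimal sketch that turns FastANI's tab-separated output into a symmetric distance matrix. It assumes FastANI's standard output columns (query, reference, ANI in percent, followed by two fragment counts) and the common convention distance = 1 - ANI/100. The function name `ani_distance_matrix` is hypothetical and not part of the workflow, whose own tree construction may differ.

```python
import csv
import io
from collections import defaultdict


def ani_distance_matrix(fastani_tsv):
    """Build a symmetric distance matrix from FastANI tabular output.

    Hypothetical helper. Assumes columns: query, reference, ANI (%), ...
    and uses distance = 1 - ANI/100.
    """
    pairs = defaultdict(list)
    names = set()
    for row in csv.reader(io.StringIO(fastani_tsv), delimiter="\t"):
        if not row:
            continue
        query, ref, ani = row[0], row[1], float(row[2])
        names.update((query, ref))
        if query == ref:
            continue  # self-comparisons are fixed at distance 0 below
        # FastANI may report slightly different values for A-vs-B and B-vs-A,
        # so collect both directions under one unordered key.
        pairs[tuple(sorted((query, ref)))].append(1.0 - ani / 100.0)

    labels = sorted(names)
    # Off-diagonal cells start as None: FastANI omits pairs whose ANI is
    # far below ~80 %, so those distances are simply unknown here.
    matrix = {a: {b: (0.0 if a == b else None) for b in labels} for a in labels}
    for (a, b), dists in pairs.items():
        mean = sum(dists) / len(dists)  # average the two directions
        matrix[a][b] = matrix[b][a] = mean
    return labels, matrix
```

Averaging both directions keeps the matrix symmetric, which distance-based tree methods such as UPGMA or neighbor joining expect. Missing (`None`) pairs would need an explicit policy, for example treating them as maximally distant, before clustering.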

This workflow requires assemblies in FASTA format as input. The sample sheet (samples.csv) has the following layout:

sample	species	strain	id_prefix	file
EC2224	"Streptococcus pyogenes"	SF370	SPY	assembly.fasta
...	...	...	...	...
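The samplesheet layout above can be read and checked with Python's standard library. This is a minimal sketch, not the workflow's own parsing code: it assumes a comma-separated file (as the name samples.csv suggests) with the five columns shown, and the function name `read_samplesheet` and its checks (required columns, unique sample names, non-empty `file`) are hypothetical.

```python
import csv
import io

# Column names taken from the samplesheet layout shown above.
REQUIRED_COLUMNS = ("sample", "species", "strain", "id_prefix", "file")


def read_samplesheet(text, delimiter=","):
    """Parse samplesheet text into a dict keyed by sample name.

    Hypothetical helper. Defaults to comma-separated input; pass a tab
    delimiter for a tab-separated sheet. Quoted fields such as
    "Streptococcus pyogenes" are unquoted by the csv module.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"samplesheet is missing columns: {missing}")

    samples = {}
    for row in reader:
        name = row["sample"]
        if name in samples:
            raise ValueError(f"duplicate sample name: {name}")
        if not row["file"]:  # also catches short rows, where DictReader fills None
            raise ValueError(f"sample {name} has no assembly file")
        samples[name] = row
    return samples
```

For the example row, `read_samplesheet(...)["EC2224"]["species"]` yields `Streptococcus pyogenes` with the quotes removed.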

Note: Pangenome analysis with Panaroo and pairwise similarity analysis with FastANI require at least two samples.
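The two-sample requirement in the note above follows from the pair count: n assemblies give n(n-1)/2 unordered pairs, which is zero for a single assembly. A minimal sketch of such a gate is below. The function name `comparative_plan` and its return keys are hypothetical, and the workflow may enforce this differently.

```python
def comparative_plan(sample_names):
    """Decide whether the comparative (multi-sample) steps can run.

    Hypothetical helper: pairwise ANI needs at least one pair of
    assemblies, and a pangenome needs more than one genome, so both
    steps are gated on having two or more samples.
    """
    n = len(sample_names)
    n_pairs = n * (n - 1) // 2  # unordered pairs of distinct assemblies
    enabled = n >= 2
    return {"panaroo": enabled, "fastani": enabled, "ani_pairs": n_pairs}
```

Per-sample steps such as annotation and the QUAST report are unaffected by this check.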