Workflow overview

A Snakemake workflow for the post-processing of microbial genome assemblies.

  1. Parse the samples.csv table containing the samples' metadata (python)
  2. Annotate assemblies using one of the following tools:
    1. NCBI's Prokaryotic Genome Annotation Pipeline (PGAP). Note: PGAP needs to be installed manually
    2. prokka, a fast and light-weight prokaryotic annotation tool
    3. bakta, a fast, alignment-free annotation tool. Note: Bakta will automatically download its companion database from Zenodo (light: 1.5 GB, full: 40 GB)
  3. Create a QC report for the assemblies using Quast
  4. Create a pangenome analysis (orthologs/homologs) using Panaroo
  5. Compute pairwise average nucleotide identity (ANI) between the assemblies using FastANI and plot a phylogenetic tree based on the ANI distances.
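
As an illustration of step 5, the sketch below turns FastANI-style output into a symmetric distance table. It assumes the usual tab-separated FastANI columns (query, reference, ANI, mapped fragments, total fragments) and takes the distance to be 1 - ANI/100, averaged over both directions. That conversion and the function name `ani_to_distances` are assumptions made for this example, not necessarily what the workflow does.

```python
import csv
import io


def ani_to_distances(handle):
    """Build a symmetric distance lookup from tab-separated ANI results."""
    ani = {}
    for row in csv.reader(handle, delimiter="\t"):
        query, ref, value = row[0], row[1], float(row[2])
        ani[(query, ref)] = value

    names = sorted({name for pair in ani for name in pair})
    dist = {}
    for a in names:
        for b in names:
            if a == b:
                dist[(a, b)] = 0.0
                continue
            # Average both directions when both were reported.
            values = [ani[k] for k in ((a, b), (b, a)) if k in ani]
            # Pairs that were not reported at all are stored as None.
            dist[(a, b)] = 1.0 - (sum(values) / len(values)) / 100.0 if values else None
    return names, dist


# Hypothetical two-sample result, for illustration only.
example = io.StringIO(
    "a.fasta\tb.fasta\t98.0\t500\t520\n"
    "b.fasta\ta.fasta\t97.0\t498\t515\n"
)
names, dist = ani_to_distances(example)
print(names)
print(dist[("a.fasta", "b.fasta")] == dist[("b.fasta", "a.fasta")])
```

A table like this could then be passed to any hierarchical clustering or tree-building routine to draw the tree.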

Running the workflow

Input data

This workflow requires genome assemblies in FASTA format as input. The samplesheet table has the following layout:

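| sample | species                  | strain | id_prefix | file           |
| ------ | ------------------------ | ------ | --------- | -------------- |
| EC2224 | "Streptococcus pyogenes" | SF370  | SPY       | assembly.fasta |
| ...    | ...                      | ...    | ...       | ...            |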

Note: Pangenome analysis with Panaroo and pairwise similarity analysis with FastANI require at least two samples.
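
A minimal sketch of how such a samplesheet could be read and checked is shown below. It assumes a comma-separated file with a header row containing the five columns listed above. The helper names `read_samplesheet` and `can_compare` are hypothetical and are not taken from the workflow's own code.

```python
import csv
import io

REQUIRED_COLUMNS = ["sample", "species", "strain", "id_prefix", "file"]


def read_samplesheet(handle):
    """Read the samplesheet into a list of dicts, one per sample."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"samplesheet is missing columns: {missing}")
    return list(reader)


def can_compare(samples):
    """Panaroo and FastANI comparisons need at least two samples."""
    return len(samples) >= 2


# In-memory stand-in for samples.csv (assumed comma-separated layout).
example = io.StringIO(
    "sample,species,strain,id_prefix,file\n"
    'EC2224,"Streptococcus pyogenes",SF370,SPY,assembly.fasta\n'
)
samples = read_samplesheet(example)
print(samples[0]["species"])
print(can_compare(samples))
```

With only the single example row, `can_compare` returns `False`, which mirrors the note above: the pangenome and ANI steps only make sense once a second sample is added.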