Commit 5d6c810

feat: fix panaroo bug, added fastANI rule, harmonized multi-threading
1 parent 50f20da commit 5d6c810

9 files changed

Lines changed: 75 additions & 9 deletions

.test/config/config.yml

Lines changed: 4 additions & 0 deletions
@@ -27,3 +27,7 @@ panaroo:
   remove_source: "cmsearch"
   remove_feature: "tRNA|rRNA|ncRNA|exon|sequence_feature"
   extra: "--clean-mode strict --remove-invalid-genes"
+
+fastani:
+  skip: False
+  extra: ""

config/README.md

Lines changed: 2 additions & 1 deletion
@@ -9,6 +9,7 @@ A Snakemake workflow for the post-processing of microbial genome assemblies.
    3. [bakta](https://github.com/oschwengers/bakta), a fast, alignment-free annotation tool. Note: Bakta will automatically download its companion database from zenodo (light: 1.5 GB, full: 40 GB)
 3. Create a QC report for the assemblies using [Quast](https://github.com/ablab/quast)
 4. Create a pangenome analysis (orthologs/homologs) using [Panaroo](https://gthlab.au/panaroo/)
+5. Compute pairwise average nucleotide identity (ANI) between the assemblies using [FastANI](https://github.com/ParBLiSS/FastANI) and plot a phylogenetic tree based on the ANI distances.
 
 ## Running the workflow
 
@@ -22,4 +23,4 @@ The samplesheet table has the following layout:
 | EC2224 | "Streptococcus pyogenes" | SF370 | SPY | assembly.fasta |
 | ... | ... | ... | ... | ... |
 
-**Note:** Pangenome analysis with `Panaroo` requires at least two samples.
+**Note:** Pangenome analysis with `Panaroo` and pairwise similarity analysis with `FastANI` require at least two samples.

config/config.yml

Lines changed: 4 additions & 0 deletions
@@ -27,3 +27,7 @@ panaroo:
   remove_source: "cmsearch"
   remove_feature: "tRNA|rRNA|ncRNA|exon|sequence_feature"
   extra: "--clean-mode strict --remove-invalid-genes"
+
+fastani:
+  skip: False
+  extra: ""

config/schemas/config.schema.yml

Lines changed: 12 additions & 1 deletion
@@ -119,7 +119,17 @@ properties:
         type: string
         description: Extra command-line arguments for Panaroo
         default: "--clean-mode strict --remove-invalid-genes"
-
+  fastani:
+    type: object
+    properties:
+      skip:
+        type: boolean
+        description: Whether to skip FastANI analysis
+        default: false
+      extra:
+        type: string
+        description: Extra command-line arguments for FastANI
+        default: ""
 required:
   - samplesheet
   - tool
@@ -128,3 +138,4 @@ required:
   - bakta
   - quast
   - panaroo
+  - fastani
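
The schema fragment above declares two optional settings with defaults. As a rough illustration of what that contract means for a user-supplied config, here is a minimal hand-written sketch in plain Python. `check_fastani_config` is a hypothetical helper, not part of the workflow, and it only mimics the two properties shown in the hunk rather than performing real JSON Schema validation.

```python
def check_fastani_config(cfg: dict) -> dict:
    """Fill in defaults for a `fastani` config block and check value types.

    Hypothetical stand-in for the schema fragment: `skip` is a boolean
    (default False) and `extra` is a string (default "").
    """
    defaults = {"skip": False, "extra": ""}
    expected = {"skip": bool, "extra": str}
    # User-supplied values override the defaults.
    merged = {**defaults, **cfg}
    for key, typ in expected.items():
        if not isinstance(merged[key], typ):
            raise TypeError(f"fastani.{key} must be of type {typ.__name__}")
    return merged


if __name__ == "__main__":
    print(check_fastani_config({}))
    print(check_fastani_config({"skip": True}))
```

An empty block resolves to the defaults, so adding `fastani` to the `required` list only forces the key to exist, not its individual settings.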

workflow/envs/fastani.yml

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+name: fastani
+channels:
+  - conda-forge
+  - bioconda
+  - nodefaults
+dependencies:
+  - fastani=1.34

workflow/envs/panaroo.yml

Lines changed: 2 additions & 1 deletion
@@ -6,4 +6,5 @@ channels:
 dependencies:
   - numpy=1.26.4
   - scipy=1.11.4
-  - panaroo=1.5.2
+  - biopython=1.84
+  - panaroo=1.6.0

workflow/rules/annotate.smk

Lines changed: 3 additions & 4 deletions
@@ -48,7 +48,6 @@ rule annotate_pgap:
         "results/annotation/pgap/logs/{sample}_pgap.log",
     conda:
         "../envs/base.yml"
-    threads: 1
     params:
         pgap=config["pgap"]["bin"],
         use_yaml_config=config["pgap"]["use_yaml_config"],
@@ -87,7 +86,7 @@ rule annotate_prokka:
         "results/annotation/prokka/logs/{sample}_prokka.log",
     conda:
         "../envs/prokka.yml"
-    threads: workflow.cores * 0.25
+    threads: max(workflow.cores * 0.5, 1)
     params:
         prefix=lambda wc: wc.sample,
         locustag=lambda wc: samples.loc[wc.sample]["id_prefix"],
@@ -127,7 +126,7 @@ rule get_bakta_db:
         "results/annotation/bakta/database/db.log",
     conda:
         "../envs/bakta.yml"
-    threads: workflow.cores * 0.25
+    threads: max(workflow.cores * 0.25, 1)
     params:
         download_db=config["bakta"]["download_db"],
         existing_db=config["bakta"]["existing_db"],
@@ -160,7 +159,7 @@ rule annotate_bakta:
         "results/annotation/bakta/logs/{sample}_bakta.log",
     conda:
         "../envs/bakta.yml"
-    threads: workflow.cores * 0.25
+    threads: max(workflow.cores * 0.25, 1)
     params:
         prefix=lambda wc: wc.sample,
         locustag=lambda wc: format_bakta_locustag(samples.loc[wc.sample]["id_prefix"]),
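
The thread expressions in this file now share one pattern: a fraction of the cores given to Snakemake, with a floor of one. The arithmetic can be sketched outside Snakemake as below. `scaled_threads` is a hypothetical helper, and the explicit `int()` truncation is an assumption of this sketch (the rules pass the float expression directly).

```python
def scaled_threads(cores: int, fraction: float) -> int:
    """Return a fraction of `cores` as a thread count, never below 1.

    Hypothetical helper mirroring `max(workflow.cores * fraction, 1)`;
    truncating to an integer is an assumption made for this sketch.
    """
    return max(int(cores * fraction), 1)


if __name__ == "__main__":
    # Show how the two fractions used in the rules scale with core count.
    for cores in (1, 2, 4, 8, 16):
        quarter = scaled_threads(cores, 0.25)
        half = scaled_threads(cores, 0.5)
        print(f"cores={cores} quarter={quarter} half={half}")
```

The floor matters on small machines: without it, a single-core run would request fewer than one thread for the rules that use the 0.25 fraction.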

workflow/rules/common.smk

Lines changed: 5 additions & 0 deletions
@@ -62,6 +62,11 @@ def get_final_input(wildcards):
             "results/qc/panaroo/{tool}/summary_statistics.txt",
             tool=config["tool"],
         )
+    if len(samples.index) > 1 and not config["fastani"]["skip"]:
+        inputs += expand(
+            "results/qc/fastani/{tool}/summary.txt",
+            tool=config["tool"],
+        )
     return inputs
 
 
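
The new branch requests the FastANI summary only when there is more than one sample and the step is not skipped. A minimal standalone sketch of that gating logic follows. `fastani_targets` is a hypothetical function, and treating `tool` as either a single name or a list of names is an assumption of this sketch.

```python
from typing import Dict, List


def fastani_targets(n_samples: int, config: Dict) -> List[str]:
    """Return the FastANI summary targets, or [] when none are wanted.

    Hypothetical stand-in for the new branch in get_final_input():
    targets are produced only for more than one sample and only when
    config["fastani"]["skip"] is false.
    """
    if n_samples <= 1 or config["fastani"]["skip"]:
        return []
    tools = config["tool"]
    if isinstance(tools, str):
        # Assumption: a single tool name may be given as a plain string.
        tools = [tools]
    return [f"results/qc/fastani/{tool}/summary.txt" for tool in tools]


if __name__ == "__main__":
    cfg = {"tool": "bakta", "fastani": {"skip": False, "extra": ""}}
    print(fastani_targets(1, cfg))
    print(fastani_targets(3, cfg))
```

Keeping the sample-count check next to the `skip` flag means a single-sample run never schedules the rule, matching the README note that FastANI needs at least two samples.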

workflow/rules/qc.smk

Lines changed: 36 additions & 2 deletions
@@ -7,7 +7,7 @@ rule quast:
         "results/qc/quast/{tool}/quast.log",
     conda:
         "../envs/quast.yml"
-    threads: 4
+    threads: max(workflow.cores * 0.5, 1)
     params:
         outdir=lambda wc, output: os.path.dirname(output.report),
         ref_fasta=(
@@ -36,6 +36,40 @@ rule quast:
         """
 
 
+rule fastani:
+    input:
+        fasta=get_quast_fasta,
+    output:
+        txt="results/qc/fastani/{tool}/summary.txt",
+    log:
+        "results/qc/fastani/{tool}/fastani.log",
+    conda:
+        "../envs/fastani.yml"
+    threads: max(workflow.cores * 0.5, 1)
+    params:
+        outdir=lambda wc, output: os.path.dirname(output.txt),
+        ref_fasta=(
+            config["quast"]["reference_fasta"]
+            if config["quast"]["reference_fasta"]
+            else []
+        ),
+        extra=config["fastani"]["extra"],
+    message:
+        """--- Running FastANI to compare genome similarity (all vs all) ---"""
+    shell:
+        """
+        printf '%s\n' {input.fasta} {params.ref_fasta} \
+          > {params.outdir}/input_files.txt;
+        fastANI \
+          --ql {params.outdir}/input_files.txt \
+          --rl {params.outdir}/input_files.txt \
+          --output {output.txt} \
+          --threads {threads} \
+          {params.extra} \
+          > {log} 2>&1
+        """
+
+
 rule prepare_panaroo:
     input:
         fasta="results/annotation/{tool}/{sample}/{sample}.fna",
@@ -74,7 +108,7 @@ rule panaroo:
         "results/qc/panaroo/{tool}/panaroo.log",
     conda:
         "../envs/panaroo.yml"
-    threads: 4
+    threads: max(workflow.cores * 0.5, 1)
     params:
         outdir=lambda wc, output: os.path.dirname(output.stats),
         extra=config["panaroo"]["extra"],
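
The `fastani` rule writes its all-vs-all result to `summary.txt`. FastANI reports one tab-delimited row per genome pair: query, reference, ANI value, mapped fragments, total query fragments. How such a file could be turned into pairwise distances is sketched below. The function names, the file names in the example, and the distance definition (one minus the mean of both directions' ANI, as a fraction) are assumptions of this sketch, not something shown in the commit.

```python
import csv
import io
from typing import Dict, Optional, TextIO, Tuple

AniTable = Dict[Tuple[str, str], float]


def read_ani_table(handle: TextIO) -> AniTable:
    """Map (query, reference) to the ANI percentage in a FastANI output."""
    ani: AniTable = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or truncated rows
        ani[(row[0], row[1])] = float(row[2])
    return ani


def ani_distance(ani: AniTable, a: str, b: str) -> Optional[float]:
    """Return 1 - mean ANI (as a fraction) for a pair, or None if unreported.

    Averaging both directions is an assumption of this sketch; pairs
    that FastANI did not report have no defined distance here.
    """
    values = [ani[key] for key in ((a, b), (b, a)) if key in ani]
    if not values:
        return None
    return 1.0 - sum(values) / len(values) / 100.0


if __name__ == "__main__":
    # Illustrative rows only, not real results.
    example = "a.fna\tb.fna\t98.0\t900\t1000\nb.fna\ta.fna\t97.0\t890\t1000\n"
    table = read_ani_table(io.StringIO(example))
    print(ani_distance(table, "a.fna", "b.fna"))
```

Pairs missing from the output are returned as `None` rather than given an arbitrary value, so any downstream tree building has to decide explicitly how to handle them.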
