The usage of this workflow is described in the [Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog/docs/workflows/MPUSP/snakemake-assembly-postprocessing).

Detailed information about input data and workflow configuration can also be found in the [`config/README.md`](config/README.md).

If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this repository.

_Workflow overview:_

1. Parse the `samples.csv` table containing the samples' metadata (`python`)
2. Annotate assemblies using NCBI's Prokaryotic Genome Annotation Pipeline ([PGAP](https://github.com/ncbi/pgap))
> Schwengers O, Jelonek L, Dieckmann MA, Beyvers S, Blom J, Goesmann A. _Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification_. Microb Genom, 7(11):000685 **2021**. PMID: 34739369. https://doi.org/10.1099/mgen.0.000685.

> Li W, O'Neill KR, Haft DH, DiCuccio M, Chetvernin V, Badretdin A, Coulouris G, Chitsaz F, Derbyshire MK, Durkin AS, Gonzales NR, Gwadz M, Lanczycki CJ, Song JS, Thanki N, Wang J, Yamashita RA, Yang M, Zheng C, Marchler-Bauer A, Thibaud-Nissen F. _RefSeq: Expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation._ Nucleic Acids Res, **2021** Jan 8;49(D1):D1020-D1028. https://doi.org/10.1093/nar/gkaa1105.

> Gurevich A, Saveliev V, Vyahhi N, Tesler G. _QUAST: quality assessment tool for genome assemblies_. Bioinformatics. 29(8):1072-5, **2013**. PMID: 23422339. https://doi.org/10.1093/bioinformatics/btt086.

> Tonkin-Hill G, MacAlasdair N, Ruis C, Weimann A, Horesh G, Lees JA, Gladstone RA, Lo S, Beaudoin C, Floto RA, Frost SDW, Corander J, Bentley SD, Parkhill J. _Producing polished prokaryotic pangenomes with the Panaroo pipeline_. Genome Biol. 21(1):180, **2020**. PMID: 32698896. https://doi.org/10.1186/s13059-020-02090-4.

> Köster, J., Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., & Nahnsen, S. _Sustainable data analysis with Snakemake_. F1000Research, 10:33, **2021**. https://doi.org/10.12688/f1000research.29032.2.

**Note:** Pangenome analysis with `Panaroo` requires at least two samples.
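
To illustrate the two-sample requirement, here is a minimal Python sketch that counts the samples in a sample sheet before a pangenome step is attempted. It is not taken from the workflow's code: the helper names `count_samples` and `can_run_pangenome`, the example column names, and the assumption that the sheet has one header row followed by one row per sample are all hypothetical.

```python
import csv
import io


def count_samples(handle):
    """Count non-empty data rows in a CSV sample sheet (header row skipped)."""
    reader = csv.reader(handle)
    next(reader, None)  # assumption: the first row is a header
    return sum(1 for row in reader if any(cell.strip() for cell in row))


def can_run_pangenome(n_samples, minimum=2):
    """Return True if enough samples are present for a pangenome analysis."""
    return n_samples >= minimum


# Hypothetical sample sheet; the column names are placeholders only.
example = io.StringIO("sample,file\nA,a.fasta\nB,b.fasta\n")
n = count_samples(example)
print(n, can_run_pangenome(n))
```

A check like this could run before the workflow starts, so that a single-sample run fails early with a clear message instead of inside the pangenome step.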
### Parameters
This table lists all parameters that can be used to run the workflow.

| Parameter | Type | Details | Default |
|:---|:---|:---|:---|
|**samplesheet**| string | Path to the sample sheet file in csv format ||
|**tool**| array[string]| Annotation tool to use (one of `prokka`, `pgap`, `bakta`) ||
|**pgap**|| PGAP configuration object ||
| bin | string | Path to the PGAP script ||
| use_yaml_config | boolean | Whether to use YAML configuration for PGAP |`False`|
|_prepare_yaml_files_|| Paths to YAML templates for PGAP ||
| generic | string | Path to the generic YAML configuration file ||
| submol | string | Path to the submol YAML configuration file ||
|**prokka**|| Prokka configuration object ||
| center | string | Center name for Prokka annotation (used in sequence IDs) ||
| extra | string | Extra command-line arguments for Prokka |`--addgenes`|
|**bakta**|| Bakta configuration object ||
| download_db | string | Bakta database type (`full`, `light`, or `none`) |`light`|
| existing_db | string | Path to an existing Bakta database (optional). Needs to be combined with `download_db='none'`||
| extra | string | Extra command-line arguments for Bakta |`--keep-contig-headers --compliant`|
|**quast**|| QUAST configuration object ||
| reference_fasta | string | Path to the reference genome for QUAST ||
| reference_gff | string | Path to the reference annotation for QUAST ||
| extra | string | Extra command-line arguments for QUAST ||
|**panaroo**|| Panaroo configuration object ||
| remove_source | string | Source types to remove in Panaroo (regex supported) |`cmsearch`|
| remove_feature | string | Feature types to remove in Panaroo (regex supported) |`tRNA\|rRNA\|ncRNA\|exon\|sequence_feature`|
| extra | string | Extra command-line arguments for Panaroo |`--clean-mode strict --remove-invalid-genes`|
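
The constraints stated in the table can be sketched as a small Python check. This is an illustration under assumptions, not the workflow's own validation: the helpers `with_defaults` and `validate` are hypothetical, the nested dictionary layout is inferred from the table, and `--keep-contig-headers --compliant` is assumed to be the default of `bakta.extra`.

```python
import copy

# Defaults copied from the parameter table; parameters without a
# documented default are left out. The nesting is an assumption.
DEFAULTS = {
    "pgap": {"use_yaml_config": False},
    "prokka": {"extra": "--addgenes"},
    "bakta": {
        "download_db": "light",
        "extra": "--keep-contig-headers --compliant",  # assumed to belong to bakta.extra
    },
    "panaroo": {
        "remove_source": "cmsearch",
        "remove_feature": "tRNA|rRNA|ncRNA|exon|sequence_feature",
        "extra": "--clean-mode strict --remove-invalid-genes",
    },
}
ALLOWED_TOOLS = {"prokka", "pgap", "bakta"}


def with_defaults(user_config):
    """Return a copy of DEFAULTS overlaid with the user's settings."""
    merged = copy.deepcopy(DEFAULTS)
    for key, value in user_config.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key].update(value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def validate(config):
    """Collect human-readable problems; an empty list means no issue was found."""
    errors = []
    tools = config.get("tool") or []
    if isinstance(tools, str):
        tools = [tools]
    unknown = sorted(set(tools) - ALLOWED_TOOLS)
    if unknown:
        errors.append("unknown annotation tool(s): " + ", ".join(unknown))
    bakta = config.get("bakta", {})
    if bakta.get("existing_db") and bakta.get("download_db") != "none":
        errors.append("bakta.existing_db requires bakta.download_db to be 'none'")
    return errors


# Hypothetical paths, for illustration only.
config = with_defaults({
    "samplesheet": "config/samples.csv",
    "tool": ["bakta"],
    "bakta": {"existing_db": "/path/to/bakta_db"},
})
for problem in validate(config):
    print(problem)
```

In this example `download_db` keeps its default of `light`, so the check reports the conflict with `existing_db`; setting `download_db` to `none` would clear it.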