|
1 | | -## Running the workflow |
2 | | - |
3 | | -### Input data |
| 1 | +## Workflow overview |
4 | 2 |
|
5 | | -This workflow requires `fasta` input data. |
6 | | -The samplesheet table has the following layout: |
| 3 | +A Snakemake workflow for the post-processing of microbial genome assemblies. |
7 | 4 |
|
8 | | -| sample | species | strain | id_prefix | file | |
9 | | -| ------ | ------------------------ | ------ | --------- | -------------- | |
10 | | -| EC2224 | "Streptococcus pyogenes" | SF370 | Spy | assembly.fasta | |
| 5 | +1. Parse the `samples.csv` table containing the samples' metadata (`python`)
| 6 | +2. Annotate assemblies using one of the following tools: |
| 7 | + 1. NCBI's Prokaryotic Genome Annotation Pipeline ([PGAP](https://github.com/ncbi/pgap)). Note: needs to be installed manually |
| 8 | + 2. [prokka](https://github.com/tseemann/prokka), a fast and light-weight prokaryotic annotation tool |
| 9 | + 3. [bakta](https://github.com/oschwengers/bakta), a fast, alignment-free annotation tool. Note: Bakta will automatically download its companion database from Zenodo (light: 1.5 GB, full: 40 GB)
| 10 | +3. Create a QC report for the assemblies using [Quast](https://github.com/ablab/quast) |
| 11 | +4. Create a pangenome analysis (orthologs/homologs) using [Panaroo](https://gthlab.au/panaroo/) |
11 | 12 |
|
12 | | -### Execution |
| 13 | +## Installation |
13 | 14 |
|
14 | | -To run the workflow from command line, change to the working directory and activate the conda environment. |
| 15 | +**Step 1: Clone this repository** |
15 | 16 |
|
16 | 17 | ```bash |
| 18 | +git clone https://github.com/MPUSP/snakemake-assembly-postprocessing.git |
17 | 19 | cd snakemake-assembly-postprocessing |
18 | | -conda activate snakemake-assembly-postprocessing |
19 | 20 | ``` |
20 | 21 |
|
21 | | -Adjust options in the default config file `config/config.yml`. |
22 | | -Before running the entire workflow, perform a dry run using: |
| 22 | +**Step 2: Install dependencies** |
23 | 23 |
|
24 | | -```bash |
25 | | -snakemake --cores 1 --sdm conda --directory .test --dry-run |
26 | | -``` |
| 24 | +It is recommended to install snakemake and run the workflow with `conda` or `mamba`. [Miniforge](https://conda-forge.org/download/) is the preferred conda-forge installer and includes `conda`, `mamba` and their dependencies. |
| 25 | + |
| 26 | +**Step 3: Create snakemake environment** |
27 | 27 |
|
28 | | -To run the workflow with test files using **conda**: |
| 28 | +This step creates a new conda environment called `snakemake-assembly-postprocessing`. |
29 | 29 |
|
30 | 30 | ```bash |
31 | | -snakemake --cores 1 --sdm conda --directory .test |
| 31 | +mamba create -c conda-forge -c bioconda -n snakemake-assembly-postprocessing snakemake pandas |
| 32 | +conda activate snakemake-assembly-postprocessing |
32 | 33 | ``` |
| 34 | + |
| 35 | +**Step 4: Install PGAP** |
| 36 | + |
| 37 | +- If you want to use [PGAP](https://github.com/ncbi/pgap) for annotation, it needs to be installed separately.
| 38 | +- PGAP can be downloaded from https://github.com/ncbi/pgap. Please follow the installation instructions there. |
| 39 | +- Define the path to the `pgap.py` script (located in the `scripts` folder) in the `config` file (recommended: `./resources`) |
| 40 | + |
| 41 | +## Running the workflow |
| 42 | + |
| 43 | +### Input data |
| 44 | + |
| 45 | +This workflow requires `fasta` input data. |
| 46 | +The samplesheet table has the following layout: |
| 47 | + |
| 48 | +| sample | species | strain | id_prefix | file | |
| 49 | +| ------ | ------------------------ | ------ | --------- | -------------- | |
| 50 | +| EC2224 | "Streptococcus pyogenes" | SF370 | SPY | assembly.fasta | |
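
The samplesheet layout above can be checked before a run with a few lines of Python. This is a minimal sketch, not the workflow's own parser: it assumes `samples.csv` is comma-separated with a header row matching the table, and the helper name `read_samplesheet` and the inline example data are hypothetical.

```python
import csv
import io

# Column names taken from the samplesheet layout shown above.
REQUIRED_COLUMNS = ["sample", "species", "strain", "id_prefix", "file"]


def read_samplesheet(handle):
    """Read a comma-separated samplesheet and return one dict per sample.

    Hypothetical helper for illustration; raises ValueError if a required
    column is missing from the header row.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"samplesheet is missing columns: {missing}")
    rows = []
    for row in reader:
        # Keep only the required columns and trim surrounding whitespace.
        rows.append({col: (row[col] or "").strip() for col in REQUIRED_COLUMNS})
    return rows


# Inline example data mirroring the table above (assumed CSV form).
example = io.StringIO(
    "sample,species,strain,id_prefix,file\n"
    'EC2224,"Streptococcus pyogenes",SF370,SPY,assembly.fasta\n'
)
samples = read_samplesheet(example)
print(samples[0]["sample"], samples[0]["file"])
```

To check a real file, the same function can be called with an open file handle, for example `read_samplesheet(open("samples.csv", newline=""))`; the file name and location here are assumptions.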