Commit a4a147a

committed
fix: update readmes
1 parent 0e829f6 commit a4a147a

2 files changed

Lines changed: 79 additions & 22 deletions

README.md

Lines changed: 43 additions & 4 deletions
@@ -8,12 +8,27 @@

 A Snakemake workflow for the post-processing of microbial genome assemblies.

+- [snakemake-assembly-postprocessing](#snakemake-assembly-postprocessing)
+- [Usage](#usage)
+- [Workflow overview](#workflow-overview)
+- [Installation](#installation)
+- [Deployment options](#deployment-options)
+- [Authors](#authors)
+- [References](#references)
+
 ## Usage

 The usage of this workflow is described in the [Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog/docs/workflows/MPUSP/snakemake-assembly-postprocessing).

+Detailed information about input data and workflow configuration can also be found in the [`config/README.md`](config/README.md).
+
 If you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this repository.

+_Workflow overview:_
+
+<!-- include overview-->
+<img src="resources/images/dag.svg" align="center" />
+
 ## Workflow overview

 1. Parse `samples.csv` table containing the samples's meta data (`python`)
@@ -24,10 +39,6 @@ If you use this workflow in a paper, don't forget to give credits to the authors
 3. Create a QC report for the assemblies using [Quast](https://github.com/ablab/quast)
 4. Create a pangenome analysis (orthologs/homologs) using [Panaroo](https://gthlab.au/panaroo/)

-## Requirements
-
-- [PGAP](https://github.com/ncbi/pgap)
-
 ## Installation

 **Step 1: Clone this repository**
@@ -52,9 +63,37 @@ conda activate snakemake-assembly-postprocessing

 **Step 4: Install PGAP**

+- if you want to use [PGAP](https://github.com/ncbi/pgap) for annotation, it needs to be installed separately
 - PGAP can be downloaded from https://github.com/ncbi/pgap. Please follow the installation instructions there.
 - Define the path to the `pgap.py` script (located in the `scripts` folder) in the `config` file (recommended: `./resources`)

+## Deployment options
+
+To run the workflow from command line, change the working directory.
+
+```bash
+cd path/to/snakemake-simple-mapping
+```
+
+Adjust options in the default config file `config/config.yml`.
+Before running the complete workflow, you can perform a dry run using:
+
+```bash
+snakemake --dry-run
+```
+
+To run the workflow with test files using **conda**:
+
+```bash
+snakemake --cores 2 --sdm conda --directory .test
+```
+
+To run the workflow with test files using **apptainer**:
+
+```bash
+snakemake --cores 2 --sdm conda apptainer --directory .test
+```
+
 ## Authors

 - Dr. Rina Ahmed-Begrich

config/README.md

Lines changed: 36 additions & 18 deletions
@@ -1,32 +1,50 @@
-## Running the workflow
-
-### Input data
+## Workflow overview

-This workflow requires `fasta` input data.
-The samplesheet table has the following layout:
+A Snakemake workflow for the post-processing of microbial genome assemblies.

-| sample | species | strain | id_prefix | file |
-| ------ | ------------------------ | ------ | --------- | -------------- |
-| EC2224 | "Streptococcus pyogenes" | SF370 | Spy | assembly.fasta |
+1. Parse `samples.csv` table containing the samples's meta data (`python`)
+2. Annotate assemblies using one of the following tools:
+1. NCBI's Prokaryotic Genome Annotation Pipeline ([PGAP](https://github.com/ncbi/pgap)). Note: needs to be installed manually
+2. [prokka](https://github.com/tseemann/prokka), a fast and light-weight prokaryotic annotation tool
+3. [bakta](https://github.com/oschwengers/bakta), a fast, alignment-free annotation tool. Note: Bakta will automatically download its companion database from zenodo (light: 1.5 GB, full: 40 GB)
+3. Create a QC report for the assemblies using [Quast](https://github.com/ablab/quast)
+4. Create a pangenome analysis (orthologs/homologs) using [Panaroo](https://gthlab.au/panaroo/)

-### Execution
+## Installation

-To run the workflow from command line, change to the working directory and activate the conda environment.
+**Step 1: Clone this repository**

 ```bash
+git clone https://github.com/MPUSP/snakemake-assembly-postprocessing.git
 cd snakemake-assembly-postprocessing
-conda activate snakemake-assembly-postprocessing
 ```

-Adjust options in the default config file `config/config.yml`.
-Before running the entire workflow, perform a dry run using:
+**Step 2: Install dependencies**

-```bash
-snakemake --cores 1 --sdm conda --directory .test --dry-run
-```
+It is recommended to install snakemake and run the workflow with `conda` or `mamba`. [Miniforge](https://conda-forge.org/download/) is the preferred conda-forge installer and includes `conda`, `mamba` and their dependencies.
+
+**Step 3: Create snakemake environment**

-To run the workflow with test files using **conda**:
+This step creates a new conda environment called `snakemake-assembly-postprocessing`.

 ```bash
-snakemake --cores 1 --sdm conda --directory .test
+mamba create -c conda-forge -c bioconda -n snakemake-assembly-postprocessing snakemake pandas
+conda activate snakemake-assembly-postprocessing
 ```
+
+**Step 4: Install PGAP**
+
+- if you want to use [PGAP](https://github.com/ncbi/pgap) for annotation, it needs to be installed separately
+- PGAP can be downloaded from https://github.com/ncbi/pgap. Please follow the installation instructions there.
+- Define the path to the `pgap.py` script (located in the `scripts` folder) in the `config` file (recommended: `./resources`)
+
+## Running the workflow
+
+### Input data
+
+This workflow requires `fasta` input data.
+The samplesheet table has the following layout:
+
+| sample | species | strain | id_prefix | file |
+| ------ | ------------------------ | ------ | --------- | -------------- |
+| EC2224 | "Streptococcus pyogenes" | SF370 | SPY | assembly.fasta |
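
The samplesheet layout documented in `config/README.md` can be checked before a run with a few lines of Python. The following is a minimal sketch, not the workflow's own parsing code: the function name `read_samplesheet` is hypothetical, and it assumes the table is stored as a comma-separated file with exactly the five columns shown above.

```python
import csv
import io

# Column names taken from the samplesheet layout in config/README.md.
REQUIRED_COLUMNS = ["sample", "species", "strain", "id_prefix", "file"]


def read_samplesheet(handle):
    """Hypothetical helper: return one dict per sample row.

    Raises ValueError if a required column or value is missing.
    """
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"samplesheet is missing columns: {missing}")
    rows = []
    # Data rows start on line 2, after the header line.
    for line_no, row in enumerate(reader, start=2):
        empty = [c for c in REQUIRED_COLUMNS if not (row[c] or "").strip()]
        if empty:
            raise ValueError(f"line {line_no}: empty value for {empty}")
        rows.append({c: row[c].strip() for c in REQUIRED_COLUMNS})
    return rows


# In-memory stand-in for a samples.csv file, using the example row above.
example = io.StringIO(
    "sample,species,strain,id_prefix,file\n"
    'EC2224,"Streptococcus pyogenes",SF370,SPY,assembly.fasta\n'
)
samples = read_samplesheet(example)
print(samples[0]["species"])
```

To check a real file, the same function can be passed an open file handle, for example `open("samples.csv", newline="")`; the file name here follows the `samples.csv` mentioned in the workflow overview.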
