fix: update readmes

m-jahn · m-jahn · commit a4a147aa1fe1 · 2025-12-10T12:43:20.000+01:00
diff --git a/README.md b/README.md
@@ -8,12 +8,27 @@
 
 A Snakemake workflow for the post-processing of microbial genome assemblies.
 
+- [snakemake-assembly-postprocessing](#snakemake-assembly-postprocessing)
+  - [Usage](#usage)
+  - [Workflow overview](#workflow-overview)
+  - [Installation](#installation)
+  - [Deployment options](#deployment-options)
+  - [Authors](#authors)
+  - [References](#references)
+
 ## Usage
 
 The usage of this workflow is described in the [Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog/docs/workflows/MPUSP/snakemake-assembly-postprocessing).
 
+Detailed information about input data and workflow configuration can also be found in [`config/README.md`](config/README.md).
+
 If you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this repository.
 
+_Workflow overview:_
+
+<!-- include overview-->
+<img src="resources/images/dag.svg" align="center" />
+
 ## Workflow overview
 
 1. Parse `samples.csv` table containing the samples's meta data (`python`)
@@ -24,10 +39,6 @@ If you use this workflow in a paper, don't forget to give credits to the authors
 3. Create a QC report for the assemblies using [Quast](https://github.com/ablab/quast)
 4. Create a pangenome analysis (orthologs/homologs) using [Panaroo](https://gthlab.au/panaroo/)
 
-## Requirements
-
-- [PGAP](https://github.com/ncbi/pgap)
-
 ## Installation
 
 **Step 1: Clone this repository**
@@ -52,9 +63,37 @@ conda activate snakemake-assembly-postprocessing
 
 **Step 4: Install PGAP**
 
+- If you want to use [PGAP](https://github.com/ncbi/pgap) for annotation, it needs to be installed separately.
 - PGAP can be downloaded from https://github.com/ncbi/pgap. Please follow the installation instructions there.
 - Define the path to the `pgap.py` script (located in the `scripts` folder) in the `config` file (recommended: `./resources`)
 
+## Deployment options
+
+To run the workflow from the command line, change to the working directory.
+
+```bash
+cd path/to/snakemake-assembly-postprocessing
+```
+
+Adjust options in the default config file `config/config.yml`.
+Before running the complete workflow, you can perform a dry run using:
+
+```bash
+snakemake --dry-run
+```
+
+To run the workflow with test files using **conda**:
+
+```bash
+snakemake --cores 2 --sdm conda --directory .test
+```
+
+To run the workflow with test files using **apptainer**:
+
+```bash
+snakemake --cores 2 --sdm conda apptainer --directory .test
+```
+
 ## Authors
 
 - Dr. Rina Ahmed-Begrich
diff --git a/config/README.md b/config/README.md
@@ -1,32 +1,50 @@
-## Running the workflow
-
-### Input data
+## Workflow overview
 
-This workflow requires `fasta` input data.
-The samplesheet table has the following layout:
+A Snakemake workflow for the post-processing of microbial genome assemblies.
 
-| sample | species                  | strain | id_prefix | file           |
-| ------ | ------------------------ | ------ | --------- | -------------- |
-| EC2224 | "Streptococcus pyogenes" | SF370  | Spy       | assembly.fasta |
+1. Parse `samples.csv` table containing the samples' metadata (`python`)
+2. Annotate assemblies using one of the following tools:
+   1. NCBI's Prokaryotic Genome Annotation Pipeline ([PGAP](https://github.com/ncbi/pgap)). Note: needs to be installed manually
+   2. [prokka](https://github.com/tseemann/prokka), a fast and light-weight prokaryotic annotation tool
+   3. [bakta](https://github.com/oschwengers/bakta), a fast, alignment-free annotation tool. Note: Bakta will automatically download its companion database from Zenodo (light: 1.5 GB, full: 40 GB)
+3. Create a QC report for the assemblies using [Quast](https://github.com/ablab/quast)
+4. Create a pangenome analysis (orthologs/homologs) using [Panaroo](https://gthlab.au/panaroo/)
 
-### Execution
+## Installation
 
-To run the workflow from command line, change to the working directory and activate the conda environment.
+**Step 1: Clone this repository**
 
 ```bash
+git clone https://github.com/MPUSP/snakemake-assembly-postprocessing.git
 cd snakemake-assembly-postprocessing
-conda activate snakemake-assembly-postprocessing
 ```
 
-Adjust options in the default config file `config/config.yml`.
-Before running the entire workflow, perform a dry run using:
+**Step 2: Install dependencies**
 
-```bash
-snakemake --cores 1 --sdm conda --directory .test --dry-run
-```
+It is recommended to install snakemake and run the workflow with `conda` or `mamba`. [Miniforge](https://conda-forge.org/download/) is the preferred conda-forge installer and includes `conda`, `mamba` and their dependencies.
+
+**Step 3: Create snakemake environment**
 
-To run the workflow with test files using **conda**:
+This step creates a new conda environment called `snakemake-assembly-postprocessing`.
 
 ```bash
-snakemake --cores 1 --sdm conda --directory .test
+mamba create -c conda-forge -c bioconda -n snakemake-assembly-postprocessing snakemake pandas
+conda activate snakemake-assembly-postprocessing
 ```
+
+**Step 4: Install PGAP**
+
+- If you want to use [PGAP](https://github.com/ncbi/pgap) for annotation, it needs to be installed separately.
+- PGAP can be downloaded from https://github.com/ncbi/pgap. Please follow the installation instructions there.
+- Define the path to the `pgap.py` script (located in the `scripts` folder) in the `config` file (recommended: `./resources`)
+
+## Running the workflow
+
+### Input data
+
+This workflow requires `fasta` input data.
+The samplesheet table has the following layout:
+
+| sample | species                  | strain | id_prefix | file           |
+| ------ | ------------------------ | ------ | --------- | -------------- |
+| EC2224 | "Streptococcus pyogenes" | SF370  | SPY       | assembly.fasta |