Commit f66c06f

Update README.md
1 parent 4fd144b commit f66c06f

1 file changed

Lines changed: 34 additions & 34 deletions

README.md

@@ -14,13 +14,12 @@ Note that the name of leaves of the tree (species name) should be the same as th
1414
There should not be any repeated names among the leaf names and internal node names. The tree should not contain quotation marks.
1515

1616
3. The omamer database which is available for download from the [OMA browser](https://omabrowser.org/oma/current/).
17-
The FastOMA workflow will automatically download the omamer database for LUCA if the argument `--omamer_db` is not
17+
The FastOMA workflow will automatically download the omamer database for LUCA (7.7 GB) if the argument `--omamer_db` is not
1818
provided on the command line. The argument can be a local file (e.g. a previously downloaded omamer database file) or
19-
a URL to an alternative omamer database, e.g. a subset of the LUCA database which is smaller. However, we recommend
20-
to use the LUCA database if possible.
19+
a URL to an alternative omamer database, e.g. a smaller subset of the LUCA database such as [Primates](https://omabrowser.org/All/Primates-v2.0.0.h5) (~100 MB). However, for a broader set of reference gene families, we recommend using the LUCA database if possible.
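
As a rough illustration of the choice described above, the sketch below passes `--omamer_db` only when a local database file is already present and otherwise leaves the argument out, so that the workflow downloads the LUCA database itself. The helper name `omamer_db_arg` and the folder names `in_folder` and `out_folder` are assumptions made for this example; the command is only printed, not executed.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of FastOMA): print the --omamer_db argument
# only when the given database file exists locally; print nothing otherwise,
# so that the workflow falls back to downloading the LUCA database.
omamer_db_arg() {
    local db_path="$1"
    if [ -f "$db_path" ]; then
        printf -- '--omamer_db %s\n' "$db_path"
    fi
}

# Optional: fetch the smaller Primates subset first.
# wget https://omabrowser.org/All/Primates-v2.0.0.h5   # ~100 MB

# Print the command that would be run (folder names are placeholders).
echo nextflow run FastOMA.nf -profile docker \
    --input_folder in_folder --output_folder out_folder \
    $(omamer_db_arg Primates-v2.0.0.h5)
```

Dropping the leading `echo` would run the workflow for real, assuming nextflow and docker are installed.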
2120

2221

23-
You can see an example in the [testdata](https://github.com/sinamajidian/FastOMA/tree/master/testdata/in_folder) folder.
22+
You can see an example in the [testdata](https://github.com/DessimozLab/FastOMA/tree/main/testdata/in_folder) folder.
2423
```
2524
$ ls proteome
2625
AQUAE.fa CHLTR.fa MYCGE.fa
@@ -30,7 +29,8 @@ $ cat species_tree.nwk
3029

3130
Besides, internal node names should not contain any special characters (e.g. `\` `/` or `space`).
3231
The reason is that FastOMA writes some files whose names contain the internal node's name.
33-
If the species tree does not have label for some/all internal nodes, FastOMA labels them sequentially.
32+
If the species tree does not have label for some/all internal nodes, FastOMA labels them sequentially.
33+
The updated tree is stored in the output folder as `species_tree_checked.nwk`.
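
The naming rules for the species tree can be checked before a run with a few standard tools. The following is a minimal sketch, not FastOMA's own validation: the function names are made up for this example, and it assumes a plain Newick file without quoted labels or bracketed comments.

```bash
#!/usr/bin/env bash
# Sketch only: assumes plain Newick without quoted labels or comments.

# Print every leaf and internal node label, one per line.
# Branch lengths (":0.1") are stripped and empty tokens are dropped.
newick_labels() {
    tr '(),;' '\n' < "$1" | sed -e 's/:.*$//' -e '/^[[:space:]]*$/d'
}

# Print labels that occur more than once (ideally prints nothing).
duplicated_labels() {
    newick_labels "$1" | sort | uniq -d
}

# Print labels containing a backslash, a slash or a space (ideally prints nothing).
labels_with_special_chars() {
    newick_labels "$1" | grep '[\\/ ]'
}

# Usage:
#   duplicated_labels species_tree.nwk
#   labels_with_special_chars species_tree.nwk
```

Both checks print the offending labels, so an empty output suggests the tree follows the rules above. Note that a space after a comma in the tree is also reported by the second check.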
3434

3535

3636

@@ -63,7 +63,7 @@ See also [How to install FastOMA](#how-to-install-FastOMA) for additional ways h
6363
section on the different [profiles](#using-different-nextflow-profiles).
6464

6565

66-
## More details on how to run
66+
### More details on how to run
6767
We provide for every commit of the repository a docker image for FastOMA on dockerhub. You can specify the container as
6868
part of the nextflow command with the parameter `container_version`. If you want to use the container of the current
6969
git checkout version, you can specify this in the following way:
@@ -78,26 +78,27 @@ nextflow run FastOMA.nf -profile docker \
7878

7979
# How to install FastOMA
8080

81-
## Running workflow directly
81+
There are four ways to run/install FastOMA detailed below:
8282

83-
The FastOMA workflow can be run directly using nextflow's ability to fetch a workflow from github. A specific version
84-
can be selected by specifying the `-r` option to nextflow to select a specific version of FastOMA:
83+
### 1. Running workflow directly
84+
85+
The FastOMA workflow can be run directly without any installation using nextflow's ability to fetch a workflow from github. A specific version of FastOMA can be selected with nextflow's `-r` option:
8586

8687
```bash
8788
nextflow run dessimozlab/FastOMA -r 0.2.0 -profile conda
8889
```
8990

90-
This will fetch version 0.2.0 from github and run the FastOMA workflow using the conda profile.
91+
This will fetch version 0.2.0 from github and run the FastOMA workflow using the conda profile. See section [How to run fastOMA](#how-to-run-fastoma).
9192

92-
## Cloning the FastOMA repo and running from there
93+
### 2. Cloning the FastOMA repo and running from there
9394

9495
```bash
9596
git clone https://github.com/DessimozLab/FastOMA.git
9697
cd FastOMA
9798
nextflow run FastOMA.nf -profile docker --container_version "sha-$(git rev-list --max-count=1 --abbrev-commit HEAD)" ...
9899
```
99100

100-
## Manual installation (for development) in python virtual environment
101+
### 3. Manual installation (for development) in python virtual environment
101102

102103
- install [mafft](https://mafft.cbrc.jp/alignment/software) and [FastTree](http://www.microbesonline.org/fasttree/) and ensure the software is accessible on the PATH.
103104
- install python >= 3.9
@@ -115,7 +116,7 @@ nextflow run FastOMA.nf -profile docker --container_version "sha-$(git rev-list
115116
```
116117

117118

118-
## Manual installation in conda/mamba environment
119+
### 4. Manual installation in conda/mamba environment
119120
In the FastOMA repository, we provide a conda environment file that can be used to generate a conda / mamba
120121
environment:
121122
```
@@ -130,15 +131,15 @@ Afterwards, you can run the workflow using nextflow (which is installed as part
130131
```
131132
nextflow run FastOMA.nf -profile standard|slurm --input_folder /path/to/input_folder --output_folder /path/to/output
132133
```
134+
Note that you should use either the profile `standard` or `slurm` so that the nextflow executor uses the activated environment.
133135

134-
not that you should use either the profile `standard` or `slurm` such the nextflow executor will use the activated environment.
135136

136-
# Using different nextflow profiles
137+
## Using different nextflow profiles
137138

138139
Nextflow provides support to run a workflow on different infrastructures. Selection of this is done using the `-profile` argument.
139140
For FastOMA, we've implemented the following profiles below. Additional ones can also be created by specifying them in the `nextflow.config` file.
140141

141-
## Docker
142+
### Docker
142143
With `-profile docker` one can use docker as an execution platform. It requires docker to be installed on the system. The pipeline
143144
will automatically fetch missing containers from dockerhub (e.g. dessimozlab/fastoma) if not found locally. By default, the version
144145
`latest` is used by the pipeline, however we provide images for any branch and release as well; even for every recent commit.
@@ -153,35 +154,37 @@ nextflow run FastOMA.nf -profile docker \
153154
This will use the container that is tagged with the current commit id. Similarly, one could also use
154155
`--container_version "0.2.0"` to use the container with version `dessimozlab/fastoma:0.2.0` from dockerhub.
155156

156-
## Singularity
157+
### Singularity
157158
With `-profile singularity` singularity containers will be used to run the workflow. It requires singularity to
158159
be installed on your system. The containers are automatically pulled from dockerhub and converted to singularity
159160
containers. The same options as for [Docker](#docker) will be available.
160161

161-
## Conda
162+
### Conda
162163
With `-profile conda`, the FastOMA workflow will create a conda environment which contains the necessary
163164
dependencies and use this environment to run the workflow steps. Note that this environment does not need
164165
to be activated manually. If you prefer to install the dependencies inside a conda or mamba environment
165166
yourself, this can be achieved as described in [](#manual-installation-for-development-in-python-virtual-environment).
166167

167-
## Slurm (with singularity/conda)
168+
### Slurm (with singularity/conda)
168169
On an HPC system, you typically run processes using a scheduler system such as slurm or LSF. We provide
169170
profiles `-profile slurm`, `-profile slurm_singularity` and `-profile slurm_conda` to run FastOMA with
170171
the respective engine using [slurm](https://slurm.schedmd.com/overview.html) as a scheduler system.
171172
If you need a different scheduler, it is quite straightforward to
172173
set it up in `nextflow.config` based on the existing profiles and the documentation of
173174
[nextflow executors](https://www.nextflow.io/docs/latest/executor.html).
174175

176+
175177
# How to run FastOMA on the test data
176-
Then, cd to the `testdata` folder and download the omamer database and change its name to `omamerdb.h5`.
178+
First, cd to the `testdata` folder and download the omamer database (optional) and change its name to `omamerdb.h5`.
177179
```
178180
cd FastOMA/testdata
179181
wget https://omabrowser.org/All/Primates-v2.0.0.h5 # 105MB
180182
mv Primates-v2.0.0.h5 in_folder/omamerdb.h5
181183
```
182-
(This is for the test however, I would suggest downloading the `LUCA-v2.0.0.h5` instead of `Primates-v2.0.0.h5` for your real analysis.). Check the item 2 in the [input section](https://github.com/sinamajidian/FastOMA#input) for details.
184+
(This is for the test; for your real analysis, I would suggest downloading `LUCA-v2.0.0.h5` instead of `Primates-v2.0.0.h5`.)
185+
Check the item 2 in the [input section](https://github.com/sinamajidian/FastOMA#input) for details.
183186

184-
Now we have such a structure in our testdata folder.
187+
Now we have such a structure in our testdata folder.
185188
```
186189
$ tree ../testdata/in_folder
187190
├── omamerdb.h5
@@ -205,7 +208,7 @@ nextflow run ../FastOMA.nf \
205208

206209
Note that to have a comprehensive test, we set the default value of needed cpus as 10.
207210

208-
## expected log for test data
211+
## Expected log for test data
209212
After a few minutes, the run for test data finishes.
210213
```
211214
[] process > check_input () [100%] 1 of 1 ✔
@@ -224,8 +227,12 @@ These are decided based on the FASTA file size. Finally, once all jobs of `hog_b
224227

225228
If the run is interrupted, adding `-resume` to the nextflow command line may let you continue your previous nextflow job.
226229

230+
Pro tip: Nextflow creates a folder named `work` for storing its temporary files. The characters in the brackets of the nextflow log (not shown here) are the short form of the folder address in `work/`
231+
where the last task of that job was run.
232+
E.g., for `[3f/2efg] process > check_input (1)`, you can `cd work/3f/2efg`, use tab to complete the folder name, and then see the temporary files of the `check_input` task. In that folder there are some hidden files: `.command.log`, `.command.sh` and `.command.run`.
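
The tab completion step can also be done with a shell glob. The sketch below is one possible way to do it; `task_dir` is a hypothetical helper written for this example, and the prefix `3f/2efg` is the illustrative one from the log line above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expand the short prefix shown in the nextflow log
# (e.g. 3f/2efg) to the full task directory below the work folder.
task_dir() {
    local hash_prefix="$1" work_root="${2:-work}"
    # The log shows only the beginning of the directory name; the glob completes it.
    local matches=( "$work_root"/$hash_prefix* )
    # Without a match the pattern stays literal and is not a directory.
    [ -d "${matches[0]}" ] && printf '%s\n' "${matches[0]}"
}

# Usage, run from the folder where nextflow was started:
#   cd "$(task_dir 3f/2efg)" && ls -a    # lists .command.log, .command.sh, .command.run
```

The second argument is optional and defaults to `work`, which is where nextflow places its task folders unless configured otherwise.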
227233

228-
## expected output structure for test data
234+
235+
## Expected output structure for test data
229236

230237
The output of FastOMA includes several output files regarding orthology inference
231238
(`OrthologousGroups.tsv`, `RootHOGs.tsv`, `FastOMA_HOGs.orthoxml`, `orthologs.tsv.gz` and `species_tree_checked.nwk`),
@@ -327,7 +334,7 @@ if activated (in `_config.py` and fastOMA installed with `pip -e` ).
327334

328335

329336

330-
### using omamer's output
337+
### Using omamer's output
331338
The first step of the FastOMA pipeline is to run [OMAmer](https://github.com/DessimozLab/omamer). If you already have the hogmap files, you can put them in `in_folder/hogmap_in`.
332339
Then your structure of files will be
333340
```
@@ -349,7 +356,7 @@ Let's save the planet together with
349356
[green computational Biology](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009324).
350357

351358

352-
## Run on a cluster
359+
### Run on a cluster
353360
For running on a SLURM cluster you can add `-c ../nextflow_slurm.config` to the command line.
354361

355362
```
@@ -362,7 +369,7 @@ nextflow ../FastOMA.nf -c ../nextflow_slurm.config --input_folder in_folder
362369

363370
You may need to re-run the nextflow command line with `-resume` added if the allocated time is not enough for your dataset.
364371

365-
You may need to increase the number of opoened files in your system with `ulimit -n 131072` or higher.
372+
You may need to increase the number of open files allowed on your system with `ulimit -n 131072` or higher, as nextflow generates hundreds of files depending on the size of your input dataset.
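
A small check before launching can show whether the limit needs raising at all. This is a sketch under the assumption that bash is the login shell; `limit_too_low` is a helper name invented for this example, and raising the soft limit only succeeds up to the hard limit set by the system administrator.

```bash
#!/usr/bin/env bash
# Return success when the current soft limit is below the wanted value.
limit_too_low() {
    local current="$1" wanted="$2"
    [ "$current" != "unlimited" ] && [ "$current" -lt "$wanted" ]
}

wanted=131072
current=$(ulimit -n)
if limit_too_low "$current" "$wanted"; then
    # Raising the soft limit works only up to the hard limit (ulimit -Hn).
    ulimit -n "$wanted" 2>/dev/null \
        || echo "Could not raise the limit to $wanted (hard limit: $(ulimit -Hn))" >&2
fi
echo "open-file limit: $(ulimit -n)"
```

The limit applies to the current shell and its children, so the check belongs in the same shell session or job script that starts nextflow.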
366373

367374

368375
## Handle splice files
@@ -399,13 +406,6 @@ These are initial gene families that are used in `infer_subhogs` step, which cou
399406

400407

401408

402-
# Downstream analysis
403-
404-
- High resolution tree inference
405-
406-
- Phylostragraphy with pyham
407-
408-
409409
## Change log
410410
- Update v0.1.6: adding dynamic resources, additional and improved output
411411
- Update v0.1.5: docker, add help, clean nextflow
