README.md
Lines changed: 34 additions & 34 deletions
Original file line number
Diff line number
Diff line change
@@ -14,13 +14,12 @@ Note that the name of leaves of the tree (species name) should be the same as th
14
14
Leaf names and internal node names must not contain any duplicates, and names in the tree should not be quoted.
15
15
16
16
3. The omamer database which is available for download from the [OMA browser](https://omabrowser.org/oma/current/).
17
-
The FastOMA workflow will automatically download the omamer database for LUCA if the argument `--omamer_db` is not
17
+
The FastOMA workflow will automatically download the omamer database for LUCA (7.7 GB) if the argument `--omamer_db` is not
18
18
provided on the command line. The argument can be a local file (e.g. a previously downloaded omamer database file) or
19
-
a URL to an alternative omamer database, e.g. a subset of the LUCA database which is smaller. However, we recommend
20
-
to use the LUCA database if possible.
19
+
a URL to an alternative omamer database, e.g. a smaller subset of the LUCA database such as [Primates](https://omabrowser.org/All/Primates-v2.0.0.h5), which is ~100 MB. However, to cover a broader set of reference gene families, we recommend using the LUCA database if possible.
21
20
22
21
23
-
You can see an example in the [testdata](https://github.com/sinamajidian/FastOMA/tree/master/testdata/in_folder) folder.
22
+
You can see an example in the [testdata](https://github.com/DessimozLab/FastOMA/tree/main/testdata/in_folder) folder.
24
23
```
25
24
$ ls proteome
26
25
AQUAE.fa CHLTR.fa MYCGE.fa
@@ -30,7 +29,8 @@ $ cat species_tree.nwk
30
29
31
30
Besides, internal node names should not contain any special characters (e.g. `\`, `/` or a space).
32
31
The reason is that FastOMA writes some files whose names contain the internal node's name.
33
-
If the species tree does not have label for some/all internal nodes, FastOMA labels them sequentially.
32
+
If the species tree does not have labels for some or all internal nodes, FastOMA labels them sequentially.
33
+
The updated tree is stored in the output folder as `species_tree_checked.nwk`.
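The naming rules above can be checked before starting a run. The following is a minimal sketch, not FastOMA's own validation: the helper names `newick_labels` and `check_labels` are hypothetical, the parsing is a simple regular-expression pass that assumes an unquoted Newick string, and the example tree is made up from the species names in the test data.

```python
import re
from collections import Counter


def newick_labels(newick: str) -> list[str]:
    """Return all node labels (leaves and internal nodes) of an unquoted Newick string."""
    # Drop branch lengths such as ":0.12", then split on the structural characters.
    without_lengths = re.sub(r":[^,();]*", "", newick)
    return [tok.strip() for tok in re.split(r"[(),;]", without_lengths) if tok.strip()]


def check_labels(newick: str) -> list[str]:
    """Collect human-readable problems; an empty list means the checks passed."""
    labels = newick_labels(newick)
    problems = [f"duplicate name: {name}" for name, n in Counter(labels).items() if n > 1]
    # Backslash, slash, whitespace and quotes are treated as special characters here.
    problems += [
        f"special character in: {name}"
        for name in labels
        if re.search(r"[\\/\s'\"]", name)
    ]
    return problems


# Hypothetical tree using species names from the test data.
print(check_labels("((AQUAE,CHLTR)inter1,MYCGE)LUCA;"))
```

An empty list means no duplicate or problematic names were found by this sketch; FastOMA may apply additional checks of its own.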
34
34
35
35
36
36
@@ -63,7 +63,7 @@ See also [How to install FastOMA](#how-to-install-FastOMA) for additional ways h
63
63
section on the different [profiles](#using-different-nextflow-profiles).
64
64
65
65
66
-
## More details on how to run
66
+
### More details on how to run
67
67
We provide for every commit of the repository a docker image for FastOMA on dockerhub. You can specify the container as
68
68
part of the nextflow command with the parameter `container_version`. If you want to use the container of the current
69
69
git checkout version, you can specify this in the following way:
@@ -78,26 +78,27 @@ nextflow run FastOMA.nf -profile docker \
78
78
79
79
# How to install FastOMA
80
80
81
-
## Running workflow directly
81
+
There are four ways to run/install FastOMA detailed below:
82
82
83
-
The FastOMA workflow can be run directly using nextflow's ability to fetch a workflow from github. A specific version
84
-
can be selected by specifying the `-r` option to nextflow to select a specific version of FastOMA:
83
+
### 1. Running workflow directly
84
+
85
+
The FastOMA workflow can be run directly, without any installation, using nextflow's ability to fetch a workflow from GitHub. A specific version of FastOMA can be selected with nextflow's `-r` option:
85
86
86
87
```bash
87
88
nextflow run dessimozlab/FastOMA -r 0.2.0 -profile conda
88
89
```
89
90
90
-
This will fetch version 0.2.0 from github and run the FastOMA workflow using the conda profile.
91
+
This will fetch version 0.2.0 from github and run the FastOMA workflow using the conda profile. See section [How to run fastOMA](#how-to-run-fastoma).
91
92
92
-
## Cloning the FastOMA repo and running from there
93
+
### 2. Cloning the FastOMA repo and running from there
## Manual installation (for development) in python virtual environment
101
+
### 3. Manual installation (for development) in python virtual environment
101
102
102
103
- install [mafft](https://mafft.cbrc.jp/alignment/software) and [FastTree](http://www.microbesonline.org/fasttree/) and ensure the software is accessible on the PATH.
(This is for the test however, I would suggest downloading the `LUCA-v2.0.0.h5` instead of `Primates-v2.0.0.h5` for your real analysis.). Check the item 2 in the [input section](https://github.com/sinamajidian/FastOMA#input) for details.
184
+
(This is for the test. For your real analysis, we suggest downloading `LUCA-v2.0.0.h5` instead of `Primates-v2.0.0.h5`.)
185
+
Check the item 2 in the [input section](https://github.com/sinamajidian/FastOMA#input) for details.
183
186
184
-
Now we have such a structure in our testdata folder.
187
+
The testdata folder now has the following structure.
185
188
```
186
189
$ tree ../testdata/in_folder
187
190
├── omamerdb.h5
@@ -205,7 +208,7 @@ nextflow run ../FastOMA.nf \
205
208
206
209
Note that to have a comprehensive test, we set the default value of needed cpus as 10.
207
210
208
-
## expected log for test data
211
+
## Expected log for test data
209
212
After a few minutes, the run on the test data finishes.
210
213
```
211
214
[] process > check_input () [100%] 1 of 1 ✔
@@ -224,8 +227,12 @@ These are decided based on the FASTA file size. Finally, once all jobs of `hog_b
224
227
225
228
If the run is interrupted, you might be able to continue your previous nextflow job by adding `-resume` to the nextflow command line.
226
229
230
+
Pro-tip: Nextflow creates a folder named `work` for storing its temporary files. The characters in the brackets of the nextflow log (not shown here) are the short form of the folder path in `work/`
231
+
where the last task of that job was run.
232
+
E.g. for `[3f/2efg] process > check_input (1)`, you can `cd work/3f/2efg` and use tab to complete the folder name; there you can see the temporary files of the `check_input` task. This folder contains some hidden files: `.command.log`, `.command.sh` and `.command.run`.
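The tab-completion step in the tip can also be scripted. This is a minimal sketch under assumptions: the helper names `find_task_dirs` and `hidden_command_files` are hypothetical, and it relies only on the layout described in the tip (a two-level folder under `work/` whose name starts with the short form shown in the log).

```python
from pathlib import Path


def find_task_dirs(work_dir: str, short_hash: str) -> list[Path]:
    """Expand a short form such as '3f/2efg' to the matching folders under work_dir."""
    prefix, _, rest = short_hash.partition("/")
    # The log shows only the beginning of the second-level folder name.
    return sorted(p for p in Path(work_dir).glob(f"{prefix}/{rest}*") if p.is_dir())


def hidden_command_files(task_dir: Path) -> list[str]:
    """List the hidden .command.* files found in a task folder."""
    return sorted(p.name for p in task_dir.glob(".command.*"))


if __name__ == "__main__":
    # '3f/2efg' is the example short form from the tip above.
    for task_dir in find_task_dirs("work", "3f/2efg"):
        print(task_dir, hidden_command_files(task_dir))
```

If the short form matches more than one folder, all candidates are returned so that the right one can be picked by hand.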
227
233
228
-
## expected output structure for test data
234
+
235
+
## Expected output structure for test data
229
236
230
237
The output of FastOMA includes several output files regarding orthology inference
231
238
(`OrthologousGroups.tsv`, `RootHOGs.tsv`, `FastOMA_HOGs.orthoxml`, `orthologs.tsv.gz` and `species_tree_checked.nwk`),
@@ -327,7 +334,7 @@ if activated (in `_config.py` and fastOMA installed with `pip -e` ).
327
334
328
335
329
336
330
-
### using omamer's output
337
+
### Using omamer's output
331
338
The first step of the FastOMA pipeline is to run [OMAmer](https://github.com/DessimozLab/omamer). If you already have the hogmap files, you can put them in `in_folder/hogmap_in`.
332
339
Then your structure of files will be
333
340
```
@@ -349,7 +356,7 @@ Let's save the planet together with
You may need to re-run the nextflow command with `-resume` if the allocated time is not enough for your dataset.
364
371
365
-
You may need to increase the number of opoened files in your system with `ulimit -n 131072` or higher.
372
+
You may need to increase the limit on open files in your system with `ulimit -n 131072` or higher, as nextflow generates hundreds of files depending on the size of your input dataset.
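To see whether the current limit is already high enough, the soft and hard limits can be read with Python's standard `resource` module (Unix only). This is a minimal sketch: the helper names are hypothetical and the 131072 target is simply the value suggested above.

```python
import resource


def open_file_limits() -> tuple[int, int]:
    """Return the (soft, hard) limits on open files for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def needs_raise(soft: int, target: int = 131072) -> bool:
    """True if the soft limit is finite and below the suggested target."""
    return soft != resource.RLIM_INFINITY and soft < target


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft={soft} hard={hard} raise needed: {needs_raise(soft)}")
```

The soft limit can only be raised up to the hard limit without administrator rights, so a low hard limit may need to be changed by a system administrator.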
366
373
367
374
368
375
## Handle splice files
@@ -399,13 +406,6 @@ These are initial gene families that are used in `infer_subhogs` step, which cou
399
406
400
407
401
408
402
-
# Downstream analysis
403
-
404
-
- High resolution tree inference
405
-
406
-
- Phylostragraphy with pyham
407
-
408
-
409
409
## Change log
410
410
- Update v0.1.6: adding dynamic resources, additional and improved output