Commit 071b6cd

Merge branch 'main' into rel-0.4.0
2 parents e71e471 + 466e876 commit 071b6cd

3 files changed

Lines changed: 20 additions & 15 deletions

FastOMA.nf

Lines changed: 3 additions & 2 deletions
@@ -217,7 +217,7 @@ process omamer_run{
 
 
 process infer_roothogs{
-label "process_single"
+label "process_medium"
 publishDir = [
 path: params.temp_output,
 enabled: params.debug_enabled,
@@ -237,7 +237,6 @@ process infer_roothogs{
 --hogmap hogmaps \
 --splice ${splice_folder} \
 --out-rhog-folder "omamer_rhogs" \
---min-sequence-length ${params.min_sequence_length} \
 -vv
 """
 }
@@ -411,6 +410,7 @@ process hog_rest{
 
 
 process collect_subhogs{
+label "process_high"
 publishDir params.output_folder, mode: 'copy'
 input:
 path pickles, stageAs: "pickle_folders/?"
@@ -456,6 +456,7 @@ process extract_pairwise_ortholog_relations {
 
 
 process fastoma_report {
+label "process_medium"
 publishDir params.output_folder, mode: 'copy'
 input:
 path notebook

README.md

Lines changed: 17 additions & 13 deletions
@@ -1,8 +1,13 @@
 FastOMA
 ======
-FastOMA is a scalable software package to infer orthology relationship.
+FastOMA is a scalable software package to infer orthology relationship.
 
-Want to learn more about FastOMA and try it online, check out [FastOMA academy](https://omabrowser.org/oma/academy/module/fastOMA_2023) and FastOMA talk at ISMB 2023 on [YouTube](https://youtu.be/KGetTUMDvlA?si=efeqKKarwpIFgXyN)!
+Want to learn more about FastOMA and try it online, check out [FastOMA academy](https://omabrowser.org/oma/academy/module/fastOMA) and FastOMA talk at ISMB 2023 on [YouTube](https://youtu.be/KGetTUMDvlA?si=efeqKKarwpIFgXyN)! And read FastOMA's publication in [Nature Methods](https://www.nature.com/articles/s41592-024-02552-8).
+
+
+<div align="center">
+<img width="300px" src="./archive/fastOMA_logo.png" alt="FastOMA logo" />
+</div>
 
 
 # Input and Output:
 
@@ -145,7 +150,7 @@ bashMiniconda3.sh
 
 Then follow the instruction on the terminal. Finally, close and re-open the terminal and run
 ```
-conda create -n fastoma python=3.9 --file environment-conda.yml
+conda env create -n fastoma python=3.9 --file environment-conda.yml
 conda activate fastoma
 ```
 Then, clone and install fastOMA using
@@ -419,8 +424,8 @@ You may need to increase the number of opoened files in your system with `ulimit
 
 ## Handle splice files
 You can put the splice files in the folder `in_folder/splice`. They should be named as `species_name.splice` for each species.
-For each row of different isforoms of a preotien, FastOMA selects the best one (based on omamer family score and isoform length).
-We also use those proteins that are not in splice file but present in the FASTA proteome file.
+For each row of different isoforms of a protein, FastOMA selects the best one (based on OMAmer family score and isoform length).
+We also use those proteins that are not in the splice file but present in the FASTA proteome file.
 ```
 $ head HUMAN.splice
 HUMAN00001;HUMAN00002;HUMAN00003;HUMAN00004;HUMAN00005;HUMAN00006
@@ -432,26 +437,25 @@ HUMAN00036
 HUMAN00037
 ```
 
-The selected isforoms will be added as a new column to the input splice files stored as tsv at `out_folder/temp_output/selected_isoforms/`
+To find the selected isoforms you can follow the instruction [here](https://github.com/DessimozLab/FastOMA/wiki/How-to-find-the-selected-isoforms).
 
-## Under the hood: what are fastOMA gene families?
+## Under the hood: what are FastOMA gene families?
 Firstly, those proteins that are mapped to the same OMAdb rootHOG (e.g. HOG:D0066142 for HOG:D0066142.1a.1a) by OMAmer are
 grouped together to create query rootHOGs (no protein from OMAdb is stored), from now on called rootHOG.
-Then, as OMAmer provide us with alternative mapping, we try to merge those rootHOGs (high chance of split HOGs) that have
+Then, as OMAmer provides us with alternative mapping, we try to merge those rootHOGs (high chance of split HOGs) that have
 many shared mappings. The query proteins of these rootHOGs will be stored in only one rootHOG.
 These will be saved as fasta files in `out_folder/temp_output/temp_omamer_rhogs` with file names format `HOG_LXXXXX.fa`. `L` is the release ID of OMADB.
 Replacing `_` with ':' gives the HOG ID which could be investigated in the [OMA Browser](https://omabrowser.org/oma/hog/HOG:D0114562/Sar/iham/).
 
 There are some cases that only one protein is mapped to one rootHOG, called singleton (which is not good, we are hoping for orthologous groups/pairs).
-Using alternative OMAmer mapping, FastOMA tries to put these to other rootHOGs. Still some will be left.
+Using alternative OMAmer mappings, FastOMA tries to put these to other rootHOGs. Still some will be left.
 
-FastOMA uses the [linclust](https://github.com/soedinglab/MMseqs2#cluster) software to find new gene families on set of unmapped proteins and singletons.
-These will be saved as fasta files in `out_folder/temp_output/temp_omamer_rhogs` with file names format `HOG_clustXXXXX.fa`.
+FastOMA uses the [linclust](https://github.com/soedinglab/MMseqs2#cluster) software to find new gene families on the set of unmapped proteins and singletons.
+These will be saved as fasta files in `out_folder/temp_output/temp_omamer_rhogs` with a file names format as `HOG_clustXXXXX.fa`.
 These are initial gene families that are used in `infer_subhogs` step, which could be split into a few smaller gene families.
 
 ## Cite us
-
-Majidian, Sina, Yannis Nevers, Ali Yazdizadeh Kharrazi, Alex Warwick Vesztrocy, Stefano Pascarelli, David Moi, Natasha Glover, Adrian M. Altenhoff, and Christophe Dessimoz. "Orthology inference at scale with FastOMA." bioRxiv (2024): 2024-01. https://www.biorxiv.org/content/10.1101/2024.01.29.577392v1.full
+Citation: Majidian, Sina, Yannis Nevers, Ali Yazdizadeh Kharrazi, Alex Warwick Vesztrocy, Stefano Pascarelli, David Moi, Natasha Glover, Adrian M. Altenhoff, and Christophe Dessimoz. "Orthology inference at scale with FastOMA." Nature Methods (2025). https://www.nature.com/articles/s41592-024-02552-8 [Preprint](https://www.biorxiv.org/content/10.1101/2024.01.29.577392v1.full).
 
 
 ## Change log
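
The "Handle splice files" text in the README diff says that each row of a `.splice` file lists the isoforms of one protein and that one isoform is kept per row, based on OMAmer family score and isoform length. A minimal Python sketch of that idea follows. It is not FastOMA's code: the function names, the score and length values, and the rule "highest score wins, length breaks ties" are all assumptions made for illustration, since the README does not say how the two criteria are combined.

```python
def parse_splice_rows(text):
    """Split splice-file text into rows; each row lists isoform IDs separated by ';'."""
    return [row.strip().split(";") for row in text.splitlines() if row.strip()]


def select_isoforms(rows, family_score, length):
    """Pick one isoform per row.

    Assumed rule (not taken from FastOMA): the highest family score wins and
    the longer isoform breaks ties. Missing scores or lengths count as zero.
    """
    return [
        max(row, key=lambda pid: (family_score.get(pid, 0.0), length.get(pid, 0)))
        for row in rows
    ]


# Hypothetical input in the format shown by `head HUMAN.splice`.
splice_text = "HUMAN00001;HUMAN00002;HUMAN00003\nHUMAN00036\nHUMAN00037\n"
# Made-up scores and lengths, for illustration only.
scores = {"HUMAN00001": 0.4, "HUMAN00002": 0.9, "HUMAN00003": 0.9}
lengths = {"HUMAN00001": 500, "HUMAN00002": 310, "HUMAN00003": 420,
           "HUMAN00036": 120, "HUMAN00037": 95}

selected = select_isoforms(parse_splice_rows(splice_text), scores, lengths)
print(selected)
```

Rows with a single ID pass through unchanged, which matches the single-entry rows such as `HUMAN00036` in the example file.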

archive/fastOMA_logo.png

58.1 KB
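
The "Under the hood" text in the README diff notes that rootHOG FASTA files are named `HOG_LXXXXX.fa` and that replacing `_` with `:` gives the HOG ID used in the OMA Browser. The conversion can be sketched in Python as below. The function name is hypothetical, and returning `None` for `HOG_clustXXXXX.fa` files is an assumption based on the README describing those as new gene families found by linclust rather than OMAdb rootHOGs.

```python
from pathlib import Path


def rhog_file_to_hog_id(file_name):
    """Turn a rootHOG FASTA file name such as 'HOG_D0114562.fa' into 'HOG:D0114562'.

    Returns None for 'HOG_clust...' files, assuming these linclust-derived
    families have no corresponding OMAdb HOG ID.
    """
    stem = Path(file_name).stem  # drops the directory and the '.fa' suffix
    if not stem.startswith("HOG_"):
        raise ValueError(f"unexpected rootHOG file name: {file_name}")
    if stem.startswith("HOG_clust"):
        return None
    return stem.replace("_", ":", 1)


hog_id = rhog_file_to_hog_id("out_folder/temp_output/temp_omamer_rhogs/HOG_D0114562.fa")
print(hog_id)
```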