We have performed extensive experiments to evaluate the performance of Model2Vec models. The results are documented in the [results](results/README.md) folder. The results are presented in the following sections:
For readability, the MTEB task names are abbreviated as follows:

- Sum: Summarization
</details>
The results show that [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) is the most performant static embedding model. It reaches 93.21% of the performance of [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) with an average MTEB score of 52.13 while being orders of magnitude faster.

Note: the [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M), [static-retrieval-mrl-en-v1](https://huggingface.co/minishlab/static-retrieval-mrl-en-v1), and [static-similarity-mrl-multilingual-v1](https://huggingface.co/minishlab/static-similarity-mrl-multilingual-v1) models are task-specific models. We've included them for completeness, but they should not be compared directly to the other models on tasks they are not designed for.
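The relative-performance figure is a simple ratio of average MTEB scores. A minimal sketch (the all-MiniLM-L6-v2 average of ~55.93 is an assumption inferred from the quoted ratio, not a number taken from the tables):

```python
# potion-base-32M's average MTEB score is quoted above; the all-MiniLM-L6-v2
# score (~55.93) is inferred from the 93.21% ratio and is an assumption here.
potion_base_32m = 52.13
all_minilm_l6_v2 = 55.93

relative = 100 * potion_base_32m / all_minilm_l6_v2
print(f"potion-base-32M reaches {relative:.2f}% of all-MiniLM-L6-v2")  # ≈ 93.21%
```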
The figure below shows the relationship between the number of sentences per second and the average MTEB score. The circle sizes correspond to the number of parameters in the models (larger = more parameters).
This plot shows that the potion and M2V models are much faster than the other models, while still being competitive in terms of performance with the [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) model.
NOTE: for fairness of comparison, we disabled multiprocessing for Model2Vec for this benchmark. All sentence-transformers models are run with the [sentence-transformers](https://github.com/UKPLab/sentence-transformers) library's default settings for `encode`.
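The throughput measurement boils down to timing `encode` over a fixed sentence list. A minimal, self-contained sketch — the `dummy_encode` stand-in is hypothetical so the snippet runs anywhere; the real benchmark would call a Model2Vec or sentence-transformers model's `encode` instead:

```python
import time

def sentences_per_second(encode, sentences, repeats=3):
    """Best-of-n wall-clock throughput for an encode() callable."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encode(sentences)
        best = min(best, time.perf_counter() - start)
    return len(sentences) / best

# Hypothetical stand-in encoder; in the real benchmark this would be a
# Model2Vec or sentence-transformers model's encode() with default settings.
def dummy_encode(sents):
    return [[float(len(s))] for s in sents]

sentences = ["an example sentence"] * 1000
print(f"{sentences_per_second(dummy_encode, sentences):,.0f} sentences/sec")
```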
*Figure: The average MTEB score plotted against sentences per second. The circle size indicates model size.*
## MMTEB Results (Multilingual)
The results for the multilingual models are shown in the table below. We compare against the [LaBSE](https://huggingface.co/sentence-transformers/LaBSE) model, as well as other multilingual static embedding models.
Note: the MMTEB leaderboard ranks models using a [Borda count](https://en.wikipedia.org/wiki/Borda_count) over per-task ranks rather than a simple average. This rewards models that perform consistently well across all tasks, rather than those that excel on one task type while performing poorly on others.
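To see how a Borda count can disagree with a simple average, consider a toy example with made-up scores (not real MMTEB numbers): each model earns one point per model it beats on each task.

```python
# Hypothetical per-task scores for three models (illustrative, not MMTEB data).
scores = {
    "model_a": [95.0, 20.0, 20.0],  # excels on one task, weak on the rest
    "model_b": [46.0, 44.0, 43.0],  # consistently strong
    "model_c": [40.0, 41.0, 42.0],
}
models = list(scores)
n_tasks = len(next(iter(scores.values())))

borda = {m: 0 for m in models}
for t in range(n_tasks):
    # Rank models on task t; a model gets one point per model it outranks.
    ranked = sorted(models, key=lambda m: scores[m][t])
    for points, m in enumerate(ranked):
        borda[m] += points

mean = {m: sum(s) / n_tasks for m, s in scores.items()}
print("Borda:", borda)  # model_b wins despite a lower simple average
print("Mean :", mean)   # model_a has the highest simple average
```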
| Model | Mean (Task) | Mean (TaskType) | BitMining | Class | Clust | InstRet | MultiClass | PairClass | Rank | Ret | STS |
For readability, the MMTEB task names are abbreviated as follows:

- Class: Classification
- Clust: Clustering
- InstRet: Instruction Retrieval
- MultiClass: Multilabel Classification
- PairClass: PairClassification
- Rank: Reranking
- Ret: Retrieval
- STS: Semantic Textual Similarity
</details>
## Retrieval Results
Some of our models are specifically designed for retrieval tasks. The results are shown in the table below, with [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) included as a transformer baseline and [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) as a general-purpose static baseline.
As can be seen, [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M) is the most performant static retrieval model, reaching 81.69% of the performance of [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) with a retrieval score of 35.06.
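MTEB retrieval tasks are typically scored with nDCG@10, so the retrieval scores above can be read as a mean nDCG@10 across datasets. A minimal sketch of the metric, using toy relevance judgments rather than real benchmark data:

```python
import math

def ndcg_at_k(relevances, k=10):
    """nDCG@k for a ranked list of graded relevance judgments."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# Toy judgments: the single relevant document is ranked second.
print(round(ndcg_at_k([0, 1, 0, 0]), 3))  # → 0.631
```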