Commit 9461399

candidate 0.5.0
1 parent c6dd3c9 commit 9461399

9 files changed

Lines changed: 135 additions & 55 deletions

.github/workflows/wheels.yml

Lines changed: 2 additions & 2 deletions
@@ -51,7 +51,7 @@ jobs:
 runs-on: macos-latest
 strategy:
 matrix:
-python-version: ["3.8", "3.9", "3.10", "3.11"]
+python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
 steps:
 - uses: actions/checkout@v3
 - uses: actions-rs/toolchain@v1
@@ -79,7 +79,7 @@ jobs:
 runs-on: windows-latest
 strategy:
 matrix:
-python-version: ["3.8", "3.9", "3.10", "3.11"]
+python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
 steps:
 - uses: actions/checkout@v3
 - uses: actions-rs/toolchain@v1

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -2,6 +2,11 @@

 For changes to the main Rust package, please see <https://github.com/kno10/rust-kmedoids/blob/main/CHANGELOG.md>

+## kmedoids 0.5.0 (2023-12-10)
+
+- add DynMSC, Silhouette clustering with optimal number of clusters
+- update dependency versions
+
 ## kmedoids 0.4.3 (2023-04-20)

 - fix silhouette evaluation for k > 2 (in Rust)
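
The changelog entry above describes DynMSC as Silhouette clustering that also picks the number of clusters. The selection criterion can be illustrated with a minimal sketch. It is not the DynMSC algorithm and does not call the kmedoids package: it exhaustively tries every medoid set on a tiny 1-D data set, which is only feasible for a handful of points. The helper names `medoid_silhouette` and `best_k_brute_force` are hypothetical, and scoring a point with two medoids at distance 0 as 0 is an assumed convention.

```python
from itertools import combinations


def medoid_silhouette(diss, medoids):
    """Average of 1 - d1/d2 over all points, where d1 and d2 are the
    distances to the nearest and second-nearest medoid (needs >= 2 medoids)."""
    total = 0.0
    for row in diss:
        d1, d2 = sorted(row[m] for m in medoids)[:2]
        # Assumed convention: a point at distance 0 from both medoids scores 0.
        total += (1.0 - d1 / d2) if d2 > 0 else 0.0
    return total / len(diss)


def best_k_brute_force(diss, kmax):
    """Try every medoid set for k = 2..kmax; return (best_k, scores per k)."""
    n = len(diss)
    scores = {}
    for k in range(2, kmax + 1):
        scores[k] = max(
            medoid_silhouette(diss, candidate)
            for candidate in combinations(range(n), k)
        )
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Three well-separated groups of three points each.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
diss = [[abs(a - b) for b in points] for a in points]
best_k, scores = best_k_brute_force(diss, kmax=4)
print("best k:", best_k)
print("best Medoid Silhouette per k:", scores)
```

On this toy data the best score is reached at k = 3. Going to k = 4 forces two medoids into one group, which lowers the score of that group's remaining point.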

CITATION.cff

Lines changed: 18 additions & 4 deletions
@@ -3,15 +3,15 @@ message: "If you use this software, please cite it as below."
 authors:
 - family-names: Schubert
 given-names: Erich
-orcid: 0000-0001-9143-4880
+orcid: "https://orcid.org/0000-0001-9143-4880"
 - family-names: Lenssen
 given-names: Lars
-orcid: 0000-0003-0037-0418
+orcid: "https://orcid.org/0000-0003-0037-0418"
 title: "Fast k-medoids Clustering in Rust and Python"
 journal: "J. Open Source Softw."
 doi: 10.21105/joss.04183
-version: 0.4.3
-date-released: 2023-04-20
+version: 0.5.0
+date-released: 2023-12-10
 license: GPL-3.0
 preferred-citation:
 title: "Fast k-medoids Clustering in Rust and Python"
@@ -40,6 +40,8 @@ references:
 year: "2021"
 type: article
 journal: "Inf. Syst."
+volume: 101
+start: 101804
 authors:
 - family-names: Schubert
 given-names: Erich
@@ -55,3 +57,15 @@ references:
 given-names: Lars
 - family-names: Schubert
 given-names: Erich
+- title: "Medoid silhouette clustering with automatic cluster number selection"
+doi: "10.1016/j.is.2023.102290"
+year: "2024"
+type: article
+journal: "Inf. Syst."
+volume: 120
+start: 102290
+authors:
+- family-names: Lenssen
+given-names: Lars
+- family-names: Schubert
+given-names: Erich
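
The `orcid` change above replaces bare identifiers with full `https://orcid.org/` URLs, which, as far as I understand the Citation File Format schema, is the form it expects. A minimal sketch of that normalization follows. The helper `normalize_orcid` is hypothetical and not part of this repository, and the pattern checks only the shape of the identifier, not its checksum.

```python
import re

# Shape of an ORCID iD: four groups of four, last character may be X.
ORCID_ID = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")
ORCID_PREFIX = "https://orcid.org/"


def normalize_orcid(value):
    """Return the ORCID as a full https://orcid.org/ URL, or raise ValueError."""
    bare = value.strip()
    if bare.startswith(ORCID_PREFIX):
        bare = bare[len(ORCID_PREFIX):]
    if not ORCID_ID.fullmatch(bare):
        raise ValueError(f"not an ORCID iD: {value!r}")
    return ORCID_PREFIX + bare


print(normalize_orcid("0000-0001-9143-4880"))
print(normalize_orcid("https://orcid.org/0000-0003-0037-0418"))
```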

Cargo.toml

Lines changed: 5 additions & 5 deletions
@@ -1,7 +1,7 @@
 [package]
 edition = "2021"
 name = "kmedoids"
-version = "0.4.3"
+version = "0.5.0"
 authors = ["Erich Schubert <erich.schubert@tu-dortmund.de>", "Lars Lenssen <lars.lenssen@tu-dortmund.de>"]
 description = "k-Medoids clustering with the FasterPAM algorithm"
 homepage = "https://github.com/kno10/python-kmedoids"
@@ -14,13 +14,13 @@ name = "kmedoids"
 crate-type = ["cdylib"]

 [dependencies]
-rustkmedoids = { version = "0.4.3", package = "kmedoids", git = "https://github.com/kno10/rust-kmedoids" }
-numpy = "0.18"
+rustkmedoids = { version = "0.5.0", package = "kmedoids", git = "https://github.com/kno10/rust-kmedoids" }
+numpy = "0.20"
 ndarray = "0.15"
 rand = "0.8"
-rayon = "1.7"
+rayon = "1.8"

 [dependencies.pyo3]
-version = "0.18"
+version = "0.20"
 features = ["extension-module"]
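
This commit bumps the package version in several files (CITATION.cff, Cargo.toml, pyproject.toml). A release check along the following lines could catch a file that was missed. This is a sketch under stated assumptions: the helpers `first_version` and `mismatched` are hypothetical, the file contents are abbreviated inline strings rather than files read from disk, and the regex assumes the package's own version is the first `version` key that starts a line (true for the Cargo.toml shown above, where `[package]` precedes `[dependencies.pyo3]`).

```python
import re

# Matches `version = "0.5.0"` (TOML) and `version: 0.5.0` (YAML) at line start.
VERSION_LINE = re.compile(r'^version\s*[:=]\s*"?([0-9][^"\s]*)"?', re.MULTILINE)


def first_version(text):
    """Return the first line-start version value in text, or None."""
    match = VERSION_LINE.search(text)
    return match.group(1) if match else None


def mismatched(files, expected):
    """Return the sorted names of files whose version differs from expected."""
    return sorted(
        name for name, text in files.items() if first_version(text) != expected
    )


# Abbreviated stand-ins for the three files touched by the version bump.
files = {
    "Cargo.toml": '[package]\nedition = "2021"\nname = "kmedoids"\nversion = "0.5.0"\n',
    "pyproject.toml": '[project]\nname = "kmedoids"\nversion = "0.5.0"\n',
    "CITATION.cff": "doi: 10.21105/joss.04183\nversion: 0.5.0\ndate-released: 2023-12-10\n",
}
print(mismatched(files, "0.5.0"))
```

An empty list means all files agree with the expected version. A file left at `0.4.3` would be reported by name.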

README.md

Lines changed: 10 additions & 3 deletions
@@ -38,9 +38,10 @@ For further details on medoid Silhouette clustering with automatic cluster numbe
 > Lars Lenssen, Erich Schubert:
 > **Medoid silhouette clustering with automatic cluster number selection**
 > Information Systems (120), 2024, 102290
-> <https://doi.org/10.1016/j.is.2023.102290>
+> <https://doi.org/10.1016/j.is.2023.102290>
+> Preprint: <https://arxiv.org/abs/2309.03751>

-an earlier version was published as:
+the basic FasterMSC method was first published as:

 > Lars Lenssen, Erich Schubert:
 > **Clustering by Direct Optimization of the Medoid Silhouette**
@@ -139,6 +140,12 @@ print("Loss with PAM:", pam.loss)

 ### Choose the optimal number of clusters

+This package includes DynMSC, an algorithm that optimizes the Medoid Silhouette,
+and chooses the "optimal" number of clusters in a range of 2..kmax.
+Beware that if you allow a too large kmax, the optimum result will likely have many
+one-elemental clusters. A too high kmax may mask more desirable results, hence it
+is recommended that you choose only 2-3 times the number of clusters you expect as maximum.
+
 ```python
 import kmedoids, numpy
 from sklearn.datasets import fetch_openml
@@ -169,7 +176,7 @@ For larger data sets, it is recommended to only cluster a representative sample
 * Silhouette index for evaluation (Rousseeuw, 1987)
 * **FasterMSC** (Lenssen and Schubert, 2022)
 * FastMSC (Lenssen and Schubert, 2022)
-* DynMSC (Lenssen and Schubert, 2023)
+* **DynMSC** (Lenssen and Schubert, 2023)
 * PAMSIL (Van der Laan and Pollard, 2003)
 * PAMMEDSIL (Van der Laan and Pollard, 2003)
 * Medoid Silhouette index for evaluation (Van der Laan and Pollard, 2003)

docs/index.rst

Lines changed: 37 additions & 22 deletions
@@ -81,7 +81,7 @@ Example
 print("Loss is:", c.loss)

 Using the sklearn-compatible API
--------------------
+--------------------------------

 Note that KMedoids defaults to the `"precomputed"` metric, expecting a pairwise distance matrix.
 If you have sklearn installed, you can use `metric="euclidean"`.
@@ -114,8 +114,14 @@ MNIST (10k samples)
 print("PAM took: %.2f ms" % ((time.time() - start)*1000))
 print("Loss with PAM:", pam.loss)

-Choose the optimal number of clusters
--------------------
+Choosing the optimal number of clusters
+---------------------------------------
+
+This package includes :ref:`DynMSC<dynmsc>`, an algorithm that optimizes the Medoid Silhouette,
+and chooses the "optimal" number of clusters in a range of 2..kmax.
+Beware that if you allow a too large kmax, the optimum result will likely have many
+one-elemental clusters. A too high kmax may mask more desirable results, hence it
+is recommended that you choose only 2-3 times the number of clusters you expect as maximum.

 .. code-block:: python

@@ -142,18 +148,26 @@ For larger data sets, it is recommended to only cluster a representative sample
 Implemented Algorithms
 ======================

+K-Medoids Clustering:
+
 * :ref:`FasterPAM<fasterpam>` (Schubert and Rousseeuw, 2020, 2021)
 * :ref:`FastPAM1<fastpam1>` (Schubert and Rousseeuw, 2019, 2021)
 * :ref:`PAM<pam>` (Kaufman and Rousseeuw, 1987) with BUILD and SWAP
-* :ref:`Alternating<alternating>` (k-means-style approach)
 * :ref:`BUILD<build>` (Kaufman and Rousseeuw, 1987)
-* :ref:`Silhouette<silhouette>` (Kaufman and Rousseeuw, 1987)
+* :ref:`Alternating<alternating>` (k-means-style approach)
+
+Silhouette Clustering:
+
+* :ref:`DynMSC<dynmsc>` (Lenssen and Schubert, 2023)
 * :ref:`FasterMSC<fastermsc>` (Lenssen and Schubert, 2022)
 * :ref:`FastMSC<fastmsc>` (Lenssen and Schubert, 2022)
-* :ref:`DynMSC<dynmsc>` (Lenssen and Schubert, 2023)
-* :ref:`PAMSIL<pamsil>` (Van der Laan and Pollard, 2003)
 * :ref:`PAMMEDSIL<pammedsil>` (Van der Laan and Pollard, 2003)
-* :ref:`MedoidSilhouette<medoid_silhouette>` (Van der Laan and Pollard, 2003)
+* :ref:`PAMSIL<pamsil>` (Van der Laan and Pollard, 2003)
+
+Evaluation:
+
+* :ref:`Medoid Silhouette<medoid_silhouette>` (Van der Laan and Pollard, 2003)
+* :ref:`Silhouette<silhouette>` (Kaufman and Rousseeuw, 1987)

 Note that the k-means style "alternating" algorithm yields rather poor result quality
 (see Schubert and Rousseeuw 2021 for an example and explanation).
@@ -193,6 +207,13 @@ PAM BUILD

 .. autofunction:: pam_build

+.. _DynMSC:
+
+DynMSC
+======
+
+.. autofunction:: dynmsc
+
 .. _FasterMSC:

 FasterMSC
@@ -207,12 +228,12 @@ FastMSC

 .. autofunction:: fastmsc

-.. _DynMSC:
+.. _PAMMEDSIL:

-DynMSC
+PAMMEDSIL
 =========

-.. autofunction:: dynmsc
+.. autofunction:: pammedsil

 .. _PAMSIL:

@@ -221,13 +242,6 @@ PAMSIL

 .. autofunction:: pamsil

-.. _PAMMEDSIL:
-
-PAMMEDSIL
-=========
-
-.. autofunction:: pammedsil
-
 .. _Silhouette:

 Silhouette
@@ -288,10 +302,11 @@ an earlier (slower, and now obsolete) version was published as:

 For further details on medoid Silhouette clustering with automatic cluster number selection (FasterMSC, DynMSC), see:

-| Lars Lenssen, Erich Schubert:
-| **Medoid silhouette clustering with automatic cluster number selection**
-| Information Systems (120), 2024, 102290
-| https://doi.org/10.1016/j.is.2023.102290
+| Lars Lenssen, Erich Schubert:
+| **Medoid silhouette clustering with automatic cluster number selection**
+| Information Systems (120), 2024, 102290
+| https://doi.org/10.1016/j.is.2023.102290
+| Preprint: https://arxiv.org/abs/2309.03751

 an earlier version was published as:

kmedoids/__init__.py

Lines changed: 17 additions & 6 deletions
@@ -8,18 +8,22 @@
 - PAM (the original Partitioning Around Medoids algorithm)
 - Alternating (k-means style algorithm, yields results of lower quality)
 - BUILD (the initialization of PAM)
-- Silhouette evaluation

 Additionally, the package implements clustering algorithms
 for direct optimization of the (Medoid) Silhouette,
 in decreasing order of performance:

 - FasterMSC
 - FastMSC (same result as PAMMEDSIL; but faster)
-- DynMSC
+- DynMSC (automatic choice of k; faster than repeated FasterMSC)
 - PAMMEDSIL
 - PAMSIL

+Evaluation measures:
+
+- Silhouette evaluation
+- Medoid Silhouette evaluation
+
 References:

 | Erich Schubert and Lars Lenssen:
@@ -43,6 +47,7 @@
 | Medoid silhouette clustering with automatic cluster number selection
 | Information Systems (120), 2024, 102290
 | <https://doi.org/10.1016/j.is.2023.102290>
+| Preprint: <https://arxiv.org/abs/2309.03751>

 | Lars Lenssen, Erich Schubert:
 | Clustering by Direct Optimization of the Medoid Silhouette
@@ -78,7 +83,9 @@
 "alternating",
 "pam_build",
 "silhouette",
-"KMedoidsResult"
+"medoid_silhouette",
+"KMedoidsResult",
+"DynkResult",
 ]

 class KMedoidsResult:
@@ -113,7 +120,7 @@ def __repr__(self):

 class DynkResult:
 """
-K-medoids clustering result with automatic number of clusters
+K-medoids or Silhouette clustering result with automatic number of clusters

 :param loss: Loss of this clustering (sum of deviations)
 :type loss: float
@@ -519,6 +526,7 @@ def fastmsc(diss, medoids, max_iter=100, init="random", random_state=None):
 | Medoid silhouette clustering with automatic cluster number selection
 | Information Systems (120), 2024, 102290
 | <https://doi.org/10.1016/j.is.2023.102290>
+| Preprint: <https://arxiv.org/abs/2309.03751>

 | Lars Lenssen, Erich Schubert:
 | Clustering by Direct Optimization of the Medoid Silhouette
@@ -568,6 +576,7 @@ def fastermsc(diss, medoids, max_iter=100, init="random", random_state=None):
 | Medoid silhouette clustering with automatic cluster number selection
 | Information Systems (120), 2024, 102290
 | <https://doi.org/10.1016/j.is.2023.102290>
+| Preprint: <https://arxiv.org/abs/2309.03751>

 | Lars Lenssen, Erich Schubert:
 | Clustering by Direct Optimization of the Medoid Silhouette
@@ -617,10 +626,11 @@ def dynmsc(diss, medoids, max_iter=100, init="random", random_state=None):
 | Medoid silhouette clustering with automatic cluster number selection
 | Information Systems (120), 2024, 102290
 | <https://doi.org/10.1016/j.is.2023.102290>
+| Preprint: <https://arxiv.org/abs/2309.03751>

 :param diss: square numpy array of dissimilarities
 :type diss: ndarray
-:param medoids: maximum number of clusters to find or existing medoids with length of maximum number of clusters to find
+:param medoids: maximum number of clusters to find or existing medoids with length of maximum number of clusters to find
 :type medoids: int or ndarray
 :param max_iter: maximum number of iterations
 :type max_iter: int
@@ -831,6 +841,7 @@ class KMedoids(SKLearnClusterer):
 | Medoid silhouette clustering with automatic cluster number selection
 | Information Systems (120), 2024, 102290
 | <https://doi.org/10.1016/j.is.2023.102290>
+| Preprint: <https://arxiv.org/abs/2309.03751>

 | Lars Lenssen, Erich Schubert:
 | Clustering by Direct Optimization of the Medoid Silhouette
@@ -850,7 +861,7 @@ class KMedoids(SKLearnClusterer):
 | In: Journal of Statistical Computation and Simulation, pp 575-584, 2003
 | https://doi.org/10.1080/0094965031000136012

-:param n_clusters: The number of clusters to form
+:param n_clusters: The number of clusters to form (maximum number of clusters if `method="dynmsc"`)
 :type n_clusters: int
 :param metric: It is recommended to use 'precomputed', in particular when experimenting with different `n_clusters`.
 If you have sklearn installed, you may pass any metric supported by `sklearn.metrics.pairwise_distances`.

pyproject.toml

Lines changed: 2 additions & 2 deletions
@@ -1,10 +1,10 @@
 [build-system]
-requires = ["maturin>=0.14,<0.15"]
+requires = ["maturin>=1.4,<2"]
 build-backend = "maturin"

 [project]
 name = "kmedoids"
-version = "0.4.3"
+version = "0.5.0"
 description = "k-Medoids Clustering in Python with FasterPAM"
 requires-dist = ["numpy"]
 classifier = [
