Conversation
Thank you for your contribution. Can you link to an implementation that uses this, or a paper that describes the approach and its advantages and disadvantages? I'm not familiar with it, and the advantages aren't clear to me at present. Understanding the motivation behind this change will make it easier to review.
The scikit-learn implementation does this, referencing the original k-means++ paper (specifically a remark in its conclusion):
https://theory.stanford.edu/~sergei/papers/kMeansPP-soda.pdf
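For context (this is not the PR's code), the remark in question describes "greedy" k-means++ seeding: for each new centre, draw `n_trials` candidate points with probability proportional to D² and keep the one that most reduces the total potential. A minimal self-contained sketch, using plain `Vec<f64>` points and a tiny hand-rolled LCG in place of the crate's `ndarray` types and a real RNG:

```rust
// Sketch of greedy k-means++ seeding (illustrative, not the crate's code).

fn dist2(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Minimal deterministic LCG (Knuth's MMIX constants), only so the
/// example has no external dependencies.
struct Lcg(u64);
impl Lcg {
    fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

fn greedy_plusplus(data: &[Vec<f64>], k: usize, n_trials: usize, rng: &mut Lcg) -> Vec<Vec<f64>> {
    assert!(k >= 1 && n_trials >= 1 && !data.is_empty());
    // First centre: a uniformly random point.
    let first = (rng.next_f64() * data.len() as f64) as usize % data.len();
    let mut centers = vec![data[first].clone()];
    // d2[i] = squared distance from point i to its nearest chosen centre.
    let mut d2: Vec<f64> = data.iter().map(|p| dist2(p, &centers[0])).collect();

    while centers.len() < k {
        let total: f64 = d2.iter().sum();
        // (candidate index, its potential, its updated d2 vector)
        let mut best: Option<(usize, f64, Vec<f64>)> = None;
        for _ in 0..n_trials {
            // Sample one candidate index with probability proportional to d2.
            let mut r = rng.next_f64() * total;
            let mut idx = data.len() - 1; // fallback for float round-off
            for (i, &d) in d2.iter().enumerate() {
                r -= d;
                if r <= 0.0 {
                    idx = i;
                    break;
                }
            }
            // Potential (sum of D^2) if this candidate became a centre.
            let cand_d2: Vec<f64> = data
                .iter()
                .zip(&d2)
                .map(|(p, &d)| d.min(dist2(p, &data[idx])))
                .collect();
            let pot: f64 = cand_d2.iter().sum();
            if best.as_ref().map_or(true, |t| pot < t.1) {
                best = Some((idx, pot, cand_d2));
            }
        }
        let (idx, _, new_d2) = best.expect("n_trials >= 1");
        centers.push(data[idx].clone());
        d2 = new_d2;
    }
    centers
}
```

Each extra candidate costs one O(n·d) pass over the data, but the seeding potential tends to improve; if I recall correctly, scikit-learn exposes the same knob as `n_local_trials` in its k-means++ initialisation.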
```rust
) -> Array2<V> {
    assert!(k > 1);
    assert!(data.dim().0 > 0);
    let n_trials = n_trials.unwrap_or(2 + (k as f64).ln().floor() as usize);
```
I have an issue with this default value: it doesn't preserve the original behavior of this implementation.
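For reference, the quoted default evaluates as follows (a standalone sketch of just the formula from the diff, nothing else from the crate). Because the code asserts `k > 1`, `ln(k) >= ln 2 > 0`, so the default is always at least 2: a caller passing `None` gets at least two candidates per centre and never the original single-candidate draw.

```rust
// The default n_trials formula from the quoted diff, in isolation.
fn default_n_trials(k: usize) -> usize {
    2 + (k as f64).ln().floor() as usize
}

// default_n_trials(2) == 2, default_n_trials(10) == 4, default_n_trials(100) == 6
```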
```rust
pub fn kmeans_lloyd<V: Value>(
    data: &ArrayView2<V>,
    k: usize,
    n_trials: Option<usize>,
```
This changes the behavior and structure of the algorithm enough that I think it's better kept as a separate implementation: create a new `kmeans_lloyd_with_n_trials` function that takes the `n_trials` parameter.
As long as the original behavior is preserved when this function is called with `n_trials` as `None`, I'm happy to drop this one. It might still be worth splitting the implementation of `initialize_plusplus` for the two different cases, though.
This is apparently what a lot of implementations actually use.
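A minimal way to express the behaviour-preservation constraint discussed above (a hypothetical helper, not the crate's code; the real split would live in `initialize_plusplus` and the proposed `kmeans_lloyd_with_n_trials`): map `n_trials: Option<usize>` to a seeding mode so that `None` keeps the original single-candidate draw.

```rust
// Hypothetical sketch: dispatch on n_trials so that the existing entry
// point (called with None) keeps the original single-candidate k-means++
// draw, while the proposed kmeans_lloyd_with_n_trials passes Some(n).
#[derive(Debug, PartialEq)]
enum Seeding {
    /// Original behaviour: one D^2-weighted candidate per new centre.
    Single,
    /// Greedy variant: best of `n_trials` D^2-weighted candidates.
    Greedy { n_trials: usize },
}

fn seeding_mode(n_trials: Option<usize>) -> Seeding {
    match n_trials {
        None => Seeding::Single, // preserves the pre-PR behaviour
        Some(n) => Seeding::Greedy { n_trials: n.max(1) },
    }
}
```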