k-means++ seeding

English | Tech/Engineering | Computers (general)
Source text: Detecting Meaningful Clusters from High-dimensional Data: A Strongly Consistent Sparse Center-based Clustering Approach

Algorithm 1 gives a formal description of the LW-kmeans algorithm. The algorithm is initialized by randomly choosing the k initial cluster centroids from the n datapoints. A k-means++ seeding [61] is also possible and leads to a slight improvement in the results, as shown in Section 6 of the supplement.
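The plain random initialization described in the context sentence amounts to drawing k of the n datapoints uniformly at random, without replacement, to serve as the starting centroids. A minimal Python sketch, assuming points are tuples of floats; the helper name `random_seeds` is hypothetical and not taken from the LW-kmeans paper:

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def random_seeds(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """Hypothetical helper: pick k datapoints uniformly at random as initial centroids."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of datapoints")
    # sample() draws without replacement, so no datapoint index is chosen twice
    return rng.sample(list(points), k)


data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (9.8, 0.1)]
print(random_seeds(data, 3, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the choice reproducible; a different seed gives a different set of starting centroids.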

References

(k-means++) (seeding)

Helena Chavarria

Discussion

Althea Draper Jan 27, 2022: This paper might help where it discusses the selection of seeds: "Smart selection of initial centers in the K-means clustering algorithm to improve topic detection" (in Persian) https://jcsit.ir/article/49
Althea Draper Jan 27, 2022: One method of determining clusters in a set of data points is centroid-based clustering, where each data point belongs to the cluster whose center is closest to it. Say there are 3 clusters in the data: then 3 initial centroid points are chosen at random. These are called seeds. The distances between the data points and the centroids are calculated, each point is assigned to its nearest centroid, and a better estimate of each of the 3 centroids is worked out from the points assigned to it. The process is repeated until the centroids stop 'moving', i.e. the process has homed in on the 'true' centroids (in practice a local optimum, which depends on the starting seeds). The seeds can also be chosen more carefully, and this is where k-means++ seeding comes in: the k-means++ algorithm generates the initial seeds (starting centroid points), which are then fed into the k-means algorithm instead of randomly chosen seeds. This generally results in better-defined clusters than starting from randomly chosen seeds. See https://www.csc.kth.se/utbildning/kth/kurser/DD143X/dkand13/... (pages 1-8); this link also has some good graphics: https://devopedia.org/k-means-clustering
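The iterative procedure described in the comment above (assign each point to its nearest centroid, move each centroid to the mean of its points, repeat until the centroids stop moving) is Lloyd's algorithm, the standard k-means heuristic. A minimal pure-Python sketch; the function name `lloyd`, the `max_iter` cap, and the rule that an empty cluster keeps its old centroid are assumptions for illustration:

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points: Sequence[Point], seeds: Sequence[Point], max_iter: int = 100) -> List[Point]:
    """Refine the seeds until the centroids stop moving (or max_iter is reached)."""
    centroids = [tuple(c) for c in seeds]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters: List[List[Point]] = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of the points assigned to it
        updated = []
        for c, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(coords) / len(members) for coords in zip(*members)))
            else:
                updated.append(c)  # assumption: an empty cluster keeps its old centroid
        if updated == centroids:  # the centroids have stopped 'moving'
            break
        centroids = updated
    return centroids


data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(lloyd(data, seeds=[(0.0, 0.0), (10.0, 10.0)]))
```

Whatever the seeds are, this loop only converges to a local optimum, which is why the choice of seeds (random or k-means++) matters.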
Helena Chavarria Jan 27, 2022: Seeding: you use seeding to provide initial values for lookup lists, for demo purposes, proofs of concept, etc.
teimoor bahrami (asker) Jan 27, 2022: What does seeding mean here? Does it mean initializing?
Althea Draper Jan 27, 2022: "The k-means++ seeding algorithm is one of the most popular algorithms that is used for finding the initial k centers when using the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows: Pick the first center randomly from the given points. After picking (i-1) centers, pick the ith center to be a point p with probability proportional to the square of the Euclidean distance of p to the closest previously (i − 1) chosen centers." from https://www.google.co.uk/books/edition/Theory_and_Applicatio... (page 7). There is more information at the bottom of this page https://www.mathworks.com/help/stats/kmeans.html and in this piece https://medium.com/@srv96/kmeans-a-careful-seeding-technique...
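The sampling procedure quoted above (often called D² sampling) can be sketched in pure Python as follows. The function name `kmeanspp_seeds` is hypothetical, and the fallback when every point already coincides with a chosen center is an assumption, since the quoted description does not cover that case:

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """Pick k initial centers by k-means++ (D^2) sampling."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # First center: chosen uniformly at random from the given points
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its closest chosen center
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Assumption: every point already sits on a center, so fall back to uniform
            nxt = rng.choice(points)
        else:
            # ith center: point p drawn with probability proportional to d2[p]
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Only the new center can shrink each point's nearest-center distance
        d2 = [min(old, sq_dist(p, nxt)) for old, p in zip(d2, points)]
    return centers


data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (9.8, 0.1)]
print(kmeanspp_seeds(data, 3, random.Random(0)))
```

Because a point that is already a center has weight zero, it cannot be drawn again, and points far from every existing center are strongly favored. The resulting centers are what would be passed to k-means in place of randomly chosen seeds.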

Reference comments

55 mins: Helena Chavarria (Spain), native in English, 1822 answers

Reference:

(k-means++) (seeding)

Theorem 1.1. For any set of data points, $E[\phi] \le 8(\ln k + 2)\,\phi_{\mathrm{OPT}}$.
This sampling is both fast and simple, and it already achieves approximation guarantees that k-means cannot. We propose using it to seed the initial centers for k-means, leading to a combined algorithm we call k-means++

https://theory.stanford.edu/~sergei/papers/kMeansPP-soda.pdf
I've no idea what it means, but it's two terms: 'k-means++' plus 'seeding'.
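In the Arthur and Vassilvitskii paper linked above, φ is the k-means potential, the sum over all data points of the squared distance to the nearest center, and φ_OPT is its value for an optimal set of k centers. Theorem 1.1 therefore says that k-means++ seeding alone is expected to be within a factor of 8(ln k + 2) of the optimum, a factor that grows only logarithmically with k; the following k-means iterations never increase φ. A small sketch computing both quantities (the names `potential` and `bound_factor` are hypothetical; φ_OPT itself is not computed, since finding it is NP-hard):

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def potential(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-means potential phi: sum over points of squared distance to the nearest center."""
    return sum(
        min(sum((x - y) ** 2 for x, y in zip(p, c)) for c in centers)
        for p in points
    )


def bound_factor(k: int) -> float:
    """The factor 8(ln k + 2) from Theorem 1.1 (ln is the natural logarithm)."""
    return 8 * (math.log(k) + 2)


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
print(potential(pts, [(0.0, 0.0), (10.0, 10.0)]))  # prints 1.0 (0 + 1 + 0)
for k in (1, 3, 10, 100):
    print(k, round(bound_factor(k), 2))
```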

--------------------------------------------------
Note added at 58 mins (2022-01-27 10:57:41 GMT)
--------------------------------------------------

In data mining, k-means++ is an algorithm for choosing the initial values (or "seeds") for the k-means clustering algorithm. It was proposed in 2007 by David Arthur and Sergei Vassilvitskii, as an approximation algorithm for the NP-hard k-means problem, a way of avoiding the sometimes poor clusterings found by the standard k-means algorithm. It is similar to the first of three seeding methods proposed, in independent work, in 2006 by Rafail Ostrovsky, Yuval Rabani, Leonard Schulman and Chaitanya Swamy. (The distribution of the first seed is different.)

https://en.wikipedia.org/wiki/K-means++

--------------------------------------------------
Note added at 3 hrs (2022-01-27 13:44:28 GMT)
--------------------------------------------------

No, I'm afraid I can't refer you to the 'best dictionary for understanding words'. That's one of the reasons why translation is so difficult. You might be lucky and find an online glossary but usually translators need to use places like ProZ, dictionaries, websites, personal experience and common sense. I'm sorry I can't help you.

Note from asker:

Thanks. Can you refer me to the best dictionary for understanding words?
