Selection of K in K-means clustering

Cited by: 367

Authors:

Pham, DT [1]

Dimov, SS [1]

Nguyen, CD [1]

Affiliation:

[1] Cardiff Univ, Mfg Engn Ctr, Cardiff CF24 0YF, Wales

Source:

PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART C-JOURNAL OF MECHANICAL ENGINEERING SCIENCE | 2005, Vol. 219, No. 1

Keywords:

clustering; K-means algorithm; cluster number selection

DOI:

10.1243/095440605X8298

Chinese Library Classification (CLC) number:

TH [Machinery and Instrument Industry]

Discipline classification code:

0802

Abstract:

The K-means algorithm is a popular data-clustering algorithm. However, one of its drawbacks is the requirement for the number of clusters, K, to be specified before the algorithm is applied. This paper first reviews existing methods for selecting the number of clusters for the algorithm. Factors that affect this selection are then discussed and a new measure to assist the selection is proposed. The paper concludes with an analysis of the results of using the proposed measure to determine the number of clusters for the K-means algorithm for different data sets.

Citation

Pages: 103-119

Page count: 17

31 references in total

[1] New methods for the initialisation of clusters [J]. AlDaoud, MB; Roberts, SA. PATTERN RECOGNITION LETTERS, 1996, 17 (05): 451-455

[2] ALDAOUD MB, 1995, 9518 U LEEDS SCH COM

[3] [Anonymous], 208 STANF U DEP STAT

[4] [Anonymous], 1998, PATTERN RECOGNITION

[5] [Anonymous], 2002, ACM SIGKDD EXPLOR NE, DOI 10.1145/568574.568575

[6] [Anonymous], ELECT ENG COMPUTER S

[7] Bayardo R.J., 1999, P 5 ACM SIGKDD INT C

[8] BILMES J, TR97018 INT COMP SCI

[9] Blake C.L., 1998, UCI repository of machine learning databases

[10] Bottou L., 1995, Advances in Neural Information Processing Systems 7, P585
