The seeding algorithms for spherical k-means clustering

被引:39
作者
Li, Min [1 ]
Xu, Dachuan [2 ]
Zhang, Dongmei [3 ]
Zou, Juan [4 ]
机构
[1] Shandong Normal Univ, Sch Math & Stat, Jinan 250014, Peoples R China
[2] Beijing Univ Technol, Beijing Inst Sci & Engn Comp, Beijing 100124, Peoples R China
[3] Shandong Jianzhu Univ, Sch Comp Sci & Technol, Jinan 250101, Peoples R China
[4] Qufu Normal Univ, Sch Math Sci, Qufu 273165, Shandong, Peoples R China
基金
中国国家自然科学基金;
关键词
Approximation algorithm; Spherical k-means clustering; k-means problem; Separable set; Seeding algorithm;
D O I
10.1007/s10898-019-00779-w
中图分类号
C93 [管理学]; O22 [运筹学];
学科分类号
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
摘要
In order to cluster the textual data with high dimension in modern data analysis, the spherical k-means clustering is presented. It aims to partition the given points with unit length into k sets so as to minimize the within-cluster sum of cosine dissimilarity. In this paper, we mainly study seeding algorithms for spherical k-means clustering, for its special case (with separable sets), as well as for its generalized problem (alpha spherical k-means clustering). About the spherical k-means clustering with separable sets, an approximate algorithm with a constant factor is presented. Moreover, it can be generalized to the alpha-spherical separable k-means clustering. By slickly constructing a useful function, we also show that the famous seeding algorithms such as k-means++ and k-means|| for k-means problem can be applied directly to solve the alpha-spherical k-means clustering. Except for theoretical analysis, the numerical experiment is also included.
引用
收藏
页码:695 / 708
页数:14
相关论文
共 21 条
[11]   Scalable K-Means++ [J].
Bahmani, Bahman ;
Moseley, Benjamin ;
Vattani, Andrea ;
Kumar, Ravi ;
Vassilvitskii, Sergei .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 5 (07) :622-633
[12]  
Banerjee A, 2005, J MACH LEARN RES, V6, P1345
[13]  
Blömer J, 2016, LECT NOTES COMPUT SC, V9220, P81, DOI 10.1007/978-3-319-49487-6_3
[14]   Concept decompositions for large sparse text data using clustering [J].
Dhillon, IS ;
Modha, DS .
MACHINE LEARNING, 2001, 42 (1-2) :143-175
[15]   Clustering large graphs via the Singular Value Decomposition [J].
Drineas, P ;
Frieze, A ;
Kannan, R ;
Vempala, S ;
Vinay, V .
MACHINE LEARNING, 2004, 56 (1-3) :9-33
[16]  
Endo Yasunori, 2015, Modeling Decisions for Artificial Intelligence. 12th International Conference, MDAI 2015. Proceedings: 9321, P103, DOI 10.1007/978-3-319-23240-9_9
[17]  
Hornik K, 2012, J STAT SOFTW, V50, P1
[18]   Improved and simplified inapproximability for k-means [J].
Lee, Euiwoong ;
Schmidt, Melanie ;
Wright, John .
INFORMATION PROCESSING LETTERS, 2017, 120 :40-43
[19]  
LLOYD SP, 1982, IEEE T INFORM THEORY, V28, P129, DOI 10.1109/TIT.1982.1056489
[20]   The Effectiveness of Lloyd-Type Methods for the k-Means Problem [J].
Ostrovsky, Rafail ;
Rabani, Yuval ;
Schulman, Leonard J. ;
Swamy, Chaitanya .
JOURNAL OF THE ACM, 2012, 59 (06)