The seeding algorithms for spherical k-means clustering

被引:39
作者
Li, Min [1 ]
Xu, Dachuan [2 ]
Zhang, Dongmei [3 ]
Zou, Juan [4 ]
机构
[1] Shandong Normal Univ, Sch Math & Stat, Jinan 250014, Peoples R China
[2] Beijing Univ Technol, Beijing Inst Sci & Engn Comp, Beijing 100124, Peoples R China
[3] Shandong Jianzhu Univ, Sch Comp Sci & Technol, Jinan 250101, Peoples R China
[4] Qufu Normal Univ, Sch Math Sci, Qufu 273165, Shandong, Peoples R China
基金
中国国家自然科学基金;
关键词
Approximation algorithm; Spherical k-means clustering; k-means problem; Separable set; Seeding algorithm;
D O I
10.1007/s10898-019-00779-w
中图分类号
C93 [管理学]; O22 [运筹学];
学科分类号
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
摘要
In order to cluster the textual data with high dimension in modern data analysis, the spherical k-means clustering is presented. It aims to partition the given points with unit length into k sets so as to minimize the within-cluster sum of cosine dissimilarity. In this paper, we mainly study seeding algorithms for spherical k-means clustering, for its special case (with separable sets), as well as for its generalized problem (alpha spherical k-means clustering). About the spherical k-means clustering with separable sets, an approximate algorithm with a constant factor is presented. Moreover, it can be generalized to the alpha-spherical separable k-means clustering. By slickly constructing a useful function, we also show that the famous seeding algorithms such as k-means++ and k-means|| for k-means problem can be applied directly to solve the alpha-spherical k-means clustering. Except for theoretical analysis, the numerical experiment is also included.
引用
收藏
页码:695 / 708
页数:14
相关论文
共 21 条
[1]  
Ackermann M.R, 2009, THESIS
[2]   Quality Controlled ECG Compression using Discrete Cosine Transform (DCT) and Laplacian Pyramid (LP) [J].
Aggarwal, Vibha ;
Patterh, Manjeet Singh .
2009 INTERNATIONAL CONFERENCE ON MULTIMEDIA, SIGNAL PROCESSING AND COMMUNICATION TECHNOLOGIES, 2009, :12-+
[3]   Multi Objective Evolutionary Approach for Biometric Fusion [J].
Ahmadian, Kushan ;
Gavrilova, Marina .
ICBAKE: 2009 INTERNATIONAL CONFERENCE ON BIOMETRICS AND KANSEI ENGINEERING, 2009, :12-17
[4]   NP-hardness of Euclidean sum-of-squares clustering [J].
Aloise, Daniel ;
Deshpande, Amit ;
Hansen, Pierre ;
Popat, Preyas .
MACHINE LEARNING, 2009, 75 (02) :245-248
[5]  
[Anonymous], 2007, P 18 ANN ACM SIAM S
[6]  
[Anonymous], 2015, 31 INT S COMP GEOM S, DOI 10.4230/LIPICS.SOCG.2015.754
[7]  
[Anonymous], 2001, TECHNICAL REPORT
[8]  
Bachem O., 2016, ADV NEURAL INFORM PR, P55, DOI DOI 10.5555/3157096.3157103
[9]  
Bachem O, 2016, AAAI CONF ARTIF INTE, P1459
[10]  
Bachem O, 2017, PR MACH LEARN RES, V70