The seeding algorithms for spherical k-means clustering

被引：41

作者：

Li, Min ^{[1
]}

Xu, Dachuan ^{[2
]}

Zhang, Dongmei ^{[3
]}

Zou, Juan ^{[4
]}

机构：

[1] Shandong Normal Univ, Sch Math & Stat, Jinan 250014, Peoples R China

[2] Beijing Univ Technol, Beijing Inst Sci & Engn Comp, Beijing 100124, Peoples R China

[3] Shandong Jianzhu Univ, Sch Comp Sci & Technol, Jinan 250101, Peoples R China

[4] Qufu Normal Univ, Sch Math Sci, Qufu 273165, Shandong, Peoples R China

来源：

JOURNAL OF GLOBAL OPTIMIZATION | 2020年 / 76卷 / 04期

基金：

中国国家自然科学基金;

关键词：

Approximation algorithm; Spherical k-means clustering; k-means problem; Separable set; Seeding algorithm;

D O I：

10.1007/s10898-019-00779-w

中图分类号：

C93 [管理学]; O22 [运筹学];

学科分类号：

070105 ; 12 ; 1201 ; 1202 ; 120202 ;

摘要：

In order to cluster the textual data with high dimension in modern data analysis, the spherical k-means clustering is presented. It aims to partition the given points with unit length into k sets so as to minimize the within-cluster sum of cosine dissimilarity. In this paper, we mainly study seeding algorithms for spherical k-means clustering, for its special case (with separable sets), as well as for its generalized problem (alpha spherical k-means clustering). About the spherical k-means clustering with separable sets, an approximate algorithm with a constant factor is presented. Moreover, it can be generalized to the alpha-spherical separable k-means clustering. By slickly constructing a useful function, we also show that the famous seeding algorithms such as k-means++ and k-means|| for k-means problem can be applied directly to solve the alpha-spherical k-means clustering. Except for theoretical analysis, the numerical experiment is also included.

引用

页码：695 / 708

页数：14

共 21 条

[11] Scalable K-Means++ [J].

Bahmani, Bahman ;

Moseley, Benjamin ;

Vattani, Andrea ;

Kumar, Ravi ;

Vassilvitskii, Sergei .

PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 5 (07) :622-633

[12]

Banerjee A, 2005, J MACH LEARN RES, V6, P1345

[13]

Blömer J, 2016, LECT NOTES COMPUT SC, V9220, P81, DOI 10.1007/978-3-319-49487-6_3

[14] Concept decompositions for large sparse text data using clustering [J].

Dhillon, IS ;

Modha, DS .

MACHINE LEARNING, 2001, 42 (1-2) :143-175

[15] Clustering large graphs via the Singular Value Decomposition [J].

Drineas, P ;

Frieze, A ;

Kannan, R ;

Vempala, S ;

Vinay, V .

MACHINE LEARNING, 2004, 56 (1-3) :9-33

[16]

Endo Yasunori, 2015, Modeling Decisions for Artificial Intelligence. 12th International Conference, MDAI 2015. Proceedings: 9321, P103, DOI 10.1007/978-3-319-23240-9_9

[17]

Hornik K, 2012, J STAT SOFTW, V50, P1

[18] Improved and simplified inapproximability for k-means [J].

Lee, Euiwoong ;

Schmidt, Melanie ;

Wright, John .

INFORMATION PROCESSING LETTERS, 2017, 120 :40-43

[19]

LLOYD SP, 1982, IEEE T INFORM THEORY, V28, P129, DOI 10.1109/TIT.1982.1056489

[20] The Effectiveness of Lloyd-Type Methods for the k-Means Problem [J].

Ostrovsky, Rafail ;

Rabani, Yuval ;

Schulman, Leonard J. ;

Swamy, Chaitanya .

JOURNAL OF THE ACM, 2012, 59 (06)

← 1 2 3 →