The provably good parallel seeding algorithms for the k-means problem with penalties

Cited by: 1
Authors
Li, Min [1]
Xu, Dachuan [2]
Zhang, Dongmei [3]
Zhou, Huiling [2]
Affiliations
[1] Shandong Normal Univ, Sch Math & Stat, Jinan 250014, Peoples R China
[2] Beijing Univ Technol, Dept Operat Res & Sci Comp, Beijing 100124, Peoples R China
[3] Shandong Jianzhu Univ, Sch Comp Sci & Technol, Jinan 250101, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
approximation algorithm; k-means; k-means problem with penalties; parallel seeding algorithm
DOI
10.1111/itor.12808
Chinese Library Classification (CLC) number
C93 [Management Science]
Subject classification codes
12; 1201; 1202; 120202
Abstract
As a classic NP-hard problem in machine learning and computational geometry, the k-means problem aims to partition a given set of points into k clusters so as to minimize the within-cluster sum of squares. The k-means problem with penalties (k-MPWP), a generalization of the k-means problem, allows each point to be either clustered or penalized at some positive cost. In this paper, we apply the parallel seeding algorithm to the k-MPWP and provide an analysis that bounds the expected solution quality when both the number of iterations and the number of points sampled in each iteration can be chosen arbitrarily. The resulting approximation guarantee is O(f(M) ln k), where f(M) is a polynomial function of the maximal ratio M between the penalties. On the one hand, this result improves on the parallel algorithm for the k-MPWP given by Li et al., in which the number of iterations depends on other factors. On the other hand, it generalizes the result for the k-means problem presented by Bachem et al., because the k-MPWP is a variant of the k-means problem. Moreover, we present a numerical experiment to show the effectiveness of the parallel algorithm for the k-means problem with penalties.
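The objective and sampling scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names (`penalized_cost`, `parallel_seeding`), the parameters `rounds` and `per_round`, the uniform choice of the first center, and the sampling rule (each round draws points with probability proportional to the smaller of their squared distance to the current centers and their penalty) are all assumptions made for illustration. A complete method would presumably also reduce the candidate set to k centers, which this sketch omits.

```python
import numpy as np

def penalized_cost(X, centers, penalties):
    """Per-point cost: min(squared distance to nearest center, penalty)."""
    diffs = X[:, None, :] - centers[None, :, :]
    d2 = (diffs ** 2).sum(axis=2).min(axis=1)
    return np.minimum(d2, penalties)

def parallel_seeding(X, penalties, rounds, per_round, rng):
    """Hypothetical sketch: in each round, draw `per_round` candidates with
    probability proportional to each point's current penalized cost."""
    n = X.shape[0]
    chosen = [int(rng.integers(n))]  # assumption: first center is uniform
    for _ in range(rounds):
        cost = penalized_cost(X, X[chosen], penalties)
        total = cost.sum()
        if total == 0:  # every point already coincides with a center
            break
        picks = rng.choice(n, size=per_round, p=cost / total)
        chosen.extend(int(i) for i in picks)
    # Duplicates within a round are possible, so keep unique indices only.
    return X[sorted(set(chosen))]
```

Here both `rounds` and `per_round` are free parameters, mirroring the setting in which the number of iterations and the number of points sampled per iteration can be chosen arbitrarily. Within a round the draws depend only on the costs computed at the start of that round, which is what would make the sampling step parallelizable.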
Pages: 158-171
Number of pages: 14
Related papers
26 in total
[1]  
Aggarwal A, 2009, LECT NOTES COMPUT SC, V5687, P15, DOI 10.1007/978-3-642-03685-9_2
[2]   Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms [J].
Ahmadian, Sara ;
Norouzi-Fard, Ashkan ;
Svensson, Ola ;
Ward, Justin .
2017 IEEE 58TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS), 2017, :61-72
[3]   NP-hardness of Euclidean sum-of-squares clustering [J].
Aloise, Daniel ;
Deshpande, Amit ;
Hansen, Pierre ;
Popat, Preyas .
MACHINE LEARNING, 2009, 75 (02) :245-248
[4]  
[Anonymous], 2007, SOC IND APPL MATH
[5]   An output-oriented classification of multiple attribute decision-making techniques based on fuzzy c-means clustering method [J].
Asgharizadeh, Ezzatollah ;
Yazdi, Mohammadreza Taghizadeh ;
Balani, Abdolkarim Mohammadi .
INTERNATIONAL TRANSACTIONS IN OPERATIONAL RESEARCH, 2019, 26 (06) :2476-2493
[6]  
Awasthi Pranjal, 2015, 31 INT S COMPUTATION, V34, P754
[7]  
Bachem O., 2016, P NIPS, P1
[8]  
Bachem O, 2016, AAAI CONF ARTIF INTE, P1459
[9]  
Bachem O, 2017, PR MACH LEARN RES, V70
[10]   Scalable K-Means++ [J].
Bahmani, Bahman ;
Moseley, Benjamin ;
Vattani, Andrea ;
Kumar, Ravi ;
Vassilvitskii, Sergei .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 5 (07) :622-633