Scalable Fast Evolutionary k-means Clustering

Cited by: 3
Authors
de Oliveira, Gilberto Viana [1]
Naldi, Murilo Coelho [1]
Affiliations
[1] Univ Fed Vicosa, Dept Informat, Vicosa, MG, Brazil
Source
2015 BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS 2015) | 2015
Keywords
evolutionary clustering; k-means; MapReduce;
DOI
10.1109/BRACIS.2015.20
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The increasing amount of data requires greater scalability from clustering algorithms. The intrinsic parallelism of the MapReduce model brings manageability and reliability to large-scale distributed operations. However, its restrictions hinder the direct application of several traditional clustering algorithms. k-means is one of the few clustering algorithms that satisfy the MapReduce constraints, but it requires the number of clusters to be specified in advance and is sensitive to their initialization. This paper proposes a MapReduce algorithm that evolves clusters without requiring k-means' parameters to be specified. Through evolutionary operators, the clusters obtained so far are used to search for better solutions, allowing the algorithm to find quality solutions quickly. The algorithm is compared with state-of-the-art MapReduce versions of a systematic algorithm that is able to find the number of k-means clusters and their initializations. Computational experiments and statistical analyses of the results indicate that the proposed algorithm obtains clusters of quality equal or superior to those of the compared algorithm, and does so faster.
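As an illustration of why k-means fits the MapReduce model, a single k-means iteration can be sketched as a map step (assign each point to its nearest centroid) followed by a reduce step (average the points in each group). This is only a minimal single-process sketch of plain k-means, not the evolutionary algorithm proposed in the paper; the function names `map_assign`, `reduce_update`, and `kmeans_iteration` are hypothetical and chosen for this example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def map_assign(point: Point, centroids: Sequence[Point]) -> Tuple[int, Point]:
    """Map step (hypothetical name): emit (index of nearest centroid, point)."""
    nearest = min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )
    return nearest, point


def reduce_update(points: List[Point]) -> Point:
    """Reduce step (hypothetical name): average the points of one cluster."""
    count = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / count for d in range(dims))


def kmeans_iteration(points: Sequence[Point], centroids: Sequence[Point]) -> List[Point]:
    """One k-means iteration; the grouping dict stands in for the shuffle phase."""
    groups: Dict[int, List[Point]] = defaultdict(list)
    for point in points:
        key, value = map_assign(point, centroids)
        groups[key].append(value)
    # Assumption of this sketch: a centroid with no assigned points is kept as-is.
    return [
        reduce_update(groups[j]) if j in groups else tuple(centroids[j])
        for j in range(len(centroids))
    ]


data = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
new_centroids = kmeans_iteration(data, [(0.0, 0.0), (10.0, 10.0)])
```

Each point is processed independently in the map step and each cluster independently in the reduce step, which is the property that lets the computation be distributed. The number of centroids and their initial positions still have to be supplied, which is the limitation the abstract says the proposed algorithm addresses.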
Pages: 74-79
Number of pages: 6