Enhancing Parallel k-Means Using Map Reduce for Discovering Knowledge from Big Data

Cited by: 0
Authors
Moertini, Veronica S. [1]
Venica, Liptia [1]
Affiliations
[1] Parahyangan Catholic Univ, Dept Informat, Bandung, Indonesia
Source
PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYSIS (ICCCBDA 2016) | 2016
Keywords
clustering big data; parallel k-means; MapReduce
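DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract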
Knowledge discovery from data using a clustering algorithm includes the stages of data preprocessing, clustering the preprocessed dataset, and evaluating patterns to obtain knowledge. With the growing popularity of Hadoop, the k-Means algorithm has been adapted to MapReduce for clustering big datasets. We enhance this existing algorithm so that it also performs data preprocessing and generates patterns and measures that can be used to evaluate the quality of the clusters. Preliminary experimental results on a small Hadoop cluster indicate that the proposed technique performs well for clustering a case-study big dataset.
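The abstract does not give the algorithm itself, so the following is only a minimal single-machine sketch of how one k-means iteration is commonly expressed as a map step and a reduce step, together with one possible cluster-quality measure. It is not the authors' implementation and does not use Hadoop. The function names (`map_point`, `reduce_cluster`, `kmeans_iteration`, `sse`) and the choice of the sum of squared errors as the quality measure are assumptions made for illustration.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_point(point, centroids):
    """Map step (illustrative): emit (nearest centroid index, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def reduce_cluster(values):
    """Reduce step (illustrative): average all points that share one key."""
    total = [0.0] * len(values[0][0])
    count = 0
    for point, n in values:
        total = [t + p for t, p in zip(total, point)]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points, centroids):
    """Run one map/shuffle/reduce round in memory and return new centroids."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        key, value = map_point(p, centroids)
        groups[key].append(value)
    # A centroid that received no points is kept unchanged (an assumption).
    return [
        reduce_cluster(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


def sse(points, centroids):
    """Assumed quality measure: sum of squared distances to the nearest centroid."""
    return sum(min(dist(p, c) for c in centroids) ** 2 for p in points)
```

In a real MapReduce job the map and reduce functions would run on separate nodes and the iteration would repeat until the centroids stop moving; the in-memory dictionary here only imitates the grouping by key.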
Pages: 81-87
Page count: 7