HdK-Means: Hadoop Based Parallel K-Means Clustering for Big Data

Cited by: 0
Authors
Bandyopadhyay, Soumyendu Sekhar [1]
Halder, Anup Kumar [2]
Chatterjee, Piyali [3]
Nasipuri, Mita [2]
Basu, Subhadip [2]
Affiliations
[1] Regent Educ & Res Fdn, Dept MCA, Kolkata 700121, India
[2] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata 700032, India
[3] Netaji Subhash Engn Coll, Dept Comp Sci & Engn, Kolkata 700152, India
Source
2017 IEEE CALCUTTA CONFERENCE (CALCON) | 2017
Keywords
Clustering; K-Means; Big Data; Hadoop; MapReduce; ALGORITHM;
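DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code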
0808 ; 0809 ;
Abstract
Clustering is one of the most important unsupervised learning techniques, used for prediction and for overcoming anomalies by grouping data. As the quantity of data increases every day, processing it with limited computational resources has become difficult. The problem now needs to be treated as a Big Data problem, which requires advanced technology to store and process the data in a seamlessly distributed fashion. Apache Hadoop offers a solution by providing techniques that run parallel jobs on commodity hardware. In this paper, we discuss an algorithm for running K-Means on Hadoop while varying the data set and the cluster centers. We then compare parallel and sequential execution, keeping all other factors the same. The experimental results show that our algorithm can efficiently process large datasets in a Hadoop environment.
Pages: 452-456
Page count: 5