Distributed Big Data Clustering using MapReduce-based Fuzzy C-Medoids

Cited by: 11
Authors
Sardar T.H. [1]
Ansari Z. [2]
Affiliations
[1] School of Computer Science & Engineering, Jain University, Bangalore
[2] Electrical Engineering Section, University Polytechnic, Aligarh Muslim University, Aligarh
Keywords
Big data; Distributed computing; Distributed fuzzy C-Medoids; Document clustering; Hadoop; MapReduce
DOI
10.1007/s40031-021-00647-w
Abstract
Efficient big data clustering is needed to handle the massive volumes of data generated in today's digitized, connected world. Traditional clustering algorithms do not scale to massively sized and highly unstructured big data, so new architectures and programming paradigms are required to cluster such data efficiently. In this work, a novel MapReduce-based Fuzzy C-Medoids clustering algorithm is designed and applied to cluster a big data repository of document datasets. The performance of the proposed algorithm is experimentally evaluated on Hadoop clusters of different sizes and on document datasets of different sizes. The algorithm is found to be scalable and efficient in performing clustering jobs. © 2021, The Institution of Engineers (India).
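The abstract does not spell out how the algorithm is decomposed, so the following is only a rough, hypothetical sketch and not the authors' implementation. It assumes the standard Fuzzy C-Medoids update rules (membership u_ij proportional to r(x_j, v_i)^(-1/(m-1)), normalized over clusters; each new medoid is the document x_k minimizing the sum over j of u_ij^m * r(x_k, x_j)), cosine dissimilarity on term-frequency vectors, and one possible map/shuffle/reduce split simulated in memory. All function names are illustrative, and passing the full document list to the reducer stands in for whatever data distribution a real Hadoop job would use.

```python
import math
from collections import defaultdict


def cosine_dissimilarity(a, b):
    """1 - cosine similarity of two term-weight vectors (assumed non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def memberships(doc, medoids, m):
    """Standard FCMdd membership of one document in each cluster (fuzzifier m > 1)."""
    d = [cosine_dissimilarity(doc, v) for v in medoids]
    zeros = [i for i, di in enumerate(d) if di <= 1e-12]
    if zeros:
        # The document coincides with one or more medoids: share membership among them.
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
    inv = [(1.0 / di) ** (1.0 / (m - 1.0)) for di in d]
    total = sum(inv)
    return [w / total for w in inv]


def map_phase(doc_id, doc, medoids, m):
    """Hypothetical mapper: emit (cluster, (doc_id, u**m)) for every cluster."""
    for i, u in enumerate(memberships(doc, medoids, m)):
        yield i, (doc_id, u ** m)


def reduce_phase(cluster, values, docs):
    """Hypothetical reducer: new medoid minimizes sum_j u_ij**m * r(x_k, x_j)."""
    best_id, best_cost = None, math.inf
    for cand_id, _ in values:
        cost = sum(w * cosine_dissimilarity(docs[cand_id], docs[j]) for j, w in values)
        if cost < best_cost:
            best_id, best_cost = cand_id, cost
    return cluster, best_id


def fuzzy_c_medoids_mapreduce(docs, init, m=2.0, max_iter=20):
    """Repeat map -> shuffle -> reduce until the medoid set stops changing."""
    medoid_ids = list(init)
    for _ in range(max_iter):
        medoids = [docs[i] for i in medoid_ids]
        groups = defaultdict(list)  # shuffle: group mapper output by cluster key
        for doc_id, doc in enumerate(docs):
            for key, value in map_phase(doc_id, doc, medoids, m):
                groups[key].append(value)
        new_ids = list(medoid_ids)
        for key, values in groups.items():
            cluster, best = reduce_phase(key, values, docs)
            new_ids[cluster] = best
        if new_ids == medoid_ids:
            break
        medoid_ids = new_ids
    return medoid_ids


if __name__ == "__main__":
    # Made-up term-frequency vectors over a 3-word vocabulary.
    docs = [(5, 1, 0), (0, 1, 5), (4, 1, 0), (0, 2, 6), (6, 0, 1), (1, 1, 4)]
    print(fuzzy_c_medoids_mapreduce(docs, init=[2, 5]))  # → [0, 1]
```

In this sketch every mapper emits a weighted record to every cluster, so each reducer scans all documents as medoid candidates; that is quadratic per cluster and is shown only for clarity, not as a claim about how the paper's reducers limit the candidate set.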
Pages: 73-82
Number of pages: 9