Big data clustering with varied density based on MapReduce

Cited by: 35
Authors
Heidari, Safanaz [1]
Alborzi, Mahmood [1]
Radfar, Reza [1]
Afsharkazemi, Mohammad Ali [2]
Ghatari, Ali Rajabzadeh [3]
Affiliations
[1] Islamic Azad Univ, Dept Informat Technol Management, Sci & Res Branch, Tehran, Iran
[2] Islamic Azad Univ, Dept Ind Management, Cent Tehran Branch, Tehran, Iran
[3] Tarbiat Modares Univ, Dept Management, Tehran, Iran
Keywords
Map-Reduce; Density-based clustering; Big data; Algorithm
DOI
10.1186/s40537-019-0236-x
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
DBSCAN is a widely used density-based clustering algorithm whose most important feature is its ability to detect clusters of arbitrary and varied shapes and to identify noise data. However, it faces several challenges, including a failure to find clusters of varied densities. Meanwhile, with the rapid development of the information age, so much data is produced every day that a single machine cannot process it; new technologies are therefore required to store this data and extract information from it. A volume of data that exceeds the capabilities of existing software is called big data. In this paper, we introduce a new algorithm for clustering big data with varied density on a Hadoop platform running MapReduce. The main idea is to use local density to determine each point's density, which avoids connecting clusters of different densities. The proposed algorithm is implemented and compared with other algorithms using the MapReduce paradigm, and among them it shows the best varied-density clustering capability and scalability.
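The abstract does not say how local density is computed, so the following is only an illustrative sketch, not the authors' method. It assumes one common estimate: the inverse of the mean distance from a point to its k nearest neighbors. The function name `knn_local_density`, the parameter `k`, and the sample points are all hypothetical, and the sketch runs on a single machine rather than on MapReduce.

```python
import math


def knn_local_density(points, k):
    """Estimate a density per point as 1 / (mean distance to its k nearest neighbors).

    Assumption for illustration only: the source does not define its density measure.
    """
    densities = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn = sum(dists[:k]) / k
        # Duplicate points give a zero mean distance; treat that as infinite density.
        densities.append(1.0 / mean_knn if mean_knn > 0 else math.inf)
    return densities


# Hypothetical data: one tightly packed group and one loosely packed group.
dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
sparse = [(10.0, 10.0), (12.0, 10.0), (10.0, 12.0), (12.0, 12.0)]

densities = knn_local_density(dense + sparse, k=2)
for point, density in zip(dense + sparse, densities):
    print(point, round(density, 3))
```

Under this assumed measure, the points in the tight group receive much higher density values than those in the loose group. A per-point value of this kind is the sort of signal a varied-density method can use in place of one global distance threshold.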
Pages: 16