A NEW DENSITY BASED SAMPLING TO ENHANCE DBSCAN CLUSTERING ALGORITHM

Cited by: 6
Authors
Al-mamory, Safaa O. [1]
Kamil, Israa S. [2]
Affiliations
[1] Univ Informat Technol & Commun, Coll Business Informat, Baghdad 10067, Iraq
[2] Al Mustaqbal Univ Coll, Babylon 51002, Iraq
Keywords
database; clustering; DBSCAN; sampling
DOI
10.22452/mjcs.vol32no4.5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
DBSCAN is one of the efficient density-based clustering algorithms. It can discover clusters of different shapes and sizes and separate noise and outliers. However, when the dataset contains regions of different densities, DBSCAN becomes ineffective. In this paper, we propose an approach that enables DBSCAN to cluster datasets with different densities by preprocessing the dataset so that it has a single density level. The system is composed of four stages. First, a new approach for separating the dataset based on density is presented. Second, a new density-biased sampling technique is proposed. Third, the sparse data resulting from the first two stages is clustered with DBSCAN. Finally, the data remaining after sampling is clustered with KNN. Experimental results on synthetic and real datasets show that, on average, the clustering produced by the proposed algorithm is better than that of DBSCAN by more than 7%, while retaining the time complexity of DBSCAN.
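The four stages can be illustrated with a rough sketch. The abstract does not give the density-separation rule or the sampling scheme, so everything specific here is an assumption made for illustration: the k-th-neighbour distance as the density estimate, the `spacing` threshold, the greedy thinning rule, and the function names `kth_distance`, `dbscan`, and `cluster_multi_density`. It is not the authors' method, only one simple way to bring a dataset to roughly one density level before running DBSCAN.

```python
import math
from collections import Counter


def kth_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbour (crude density estimate)."""
    d = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return d[k - 1]


def dbscan(points, eps, min_pts):
    """Plain brute-force DBSCAN; returns one label per point, -1 means noise."""
    labels = [None] * len(points)

    def region(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = region(j)
            if len(nj) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nj)
    return labels


def cluster_multi_density(points, spacing, eps, min_pts, k=3):
    # Stage 1 (assumed rule): a point is "dense" if its k-th neighbour
    # is closer than `spacing`; all other points are kept as they are.
    kdist = [kth_distance(points, i, k) for i in range(len(points))]
    dense = [i for i, d in enumerate(kdist) if d < spacing]
    kept = [i for i, d in enumerate(kdist) if d >= spacing]

    # Stage 2 (assumed rule): thin the dense part greedily, adding a dense
    # point only if no already-kept point lies within `spacing` of it.
    for i in dense:
        if all(math.dist(points[i], points[j]) > spacing for j in kept):
            kept.append(i)

    # Stage 3: run DBSCAN on the thinned sample only.
    sample_labels = dbscan([points[i] for i in kept], eps, min_pts)
    labels = [None] * len(points)
    for i, lab in zip(kept, sample_labels):
        labels[i] = lab

    # Stage 4: give each remaining point the majority label of its
    # k nearest sampled points (a k-nearest-neighbour vote).
    for i in range(len(points)):
        if labels[i] is None:
            nearest = sorted(kept, key=lambda j: math.dist(points[i], points[j]))[:k]
            labels[i] = Counter(labels[j] for j in nearest).most_common(1)[0][0]
    return labels


# Toy data: a tight 10x10 grid and a loose 4x4 grid far away from it.
dense_blob = [(x, y) for x in range(10) for y in range(10)]
sparse_blob = [(100 + 4 * x, 100 + 4 * y) for x in range(4) for y in range(4)]
points = dense_blob + sparse_blob
labels = cluster_multi_density(points, spacing=2.5, eps=4.5, min_pts=3)
```

On this toy data the thinning step reduces the tight grid to a spacing comparable to the loose one, so a single `eps` suits both groups. The brute-force neighbour searches are quadratic and are meant only to keep the sketch self-contained.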
Pages: 315-327
Number of pages: 13