APSCAN: A parameter free algorithm for clustering

被引:52
作者
Chen, Xiaoming [1 ,2 ]
Liu, Wanquan [1 ]
Qiu, Huining [1 ,3 ]
Lai, Jianhuang [2 ]
机构
[1] Curtin Univ Technol, Dept Comp, Bentley, WA 6102, Australia
[2] Sun Yat Sen Univ, Sch Informat Sci & Technol, Guangzhou 510275, Guangdong, Peoples R China
[3] Sun Yat Sen Univ, Sch Math & Computat Sci, Guangzhou 510275, Guangdong, Peoples R China
关键词
Clustering algorithm; DBSCAN; Affinity propagation algorithm; KERNEL;
D O I
10.1016/j.patrec.2011.02.001
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
DBSCAN is a density based clustering algorithm and its effectiveness for spatial datasets has been demonstrated in the existing literature. However, there are two distinct drawbacks for DBSCAN: (i) the performances of clustering depend on two specified parameters. One is the maximum radius of a neighborhood and the other is the minimum number of the data points contained in such neighborhood. In fact these two specified parameters define a single density. Nevertheless, without enough prior knowledge, these two parameters are difficult to be determined; (ii) with these two parameters for a single density, DBSCAN does not perform well to datasets with varying densities. The above two issues bring some difficulties in applications. To address these two problems in a systematic way, in this paper we propose a novel parameter free clustering algorithm named as APSCAN. Firstly, we utilize the Affinity Propagation (AP) algorithm to detect local densities for a dataset and generate a normalized density list. Secondly, we combine the first pair of density parameters with any other pair of density parameters in the normalized density list as input parameters for a proposed DDBSCAN (Double-Density-Based SCAN) to produce a set of clustering results. In this way, we can obtain different clustering results with varying density parameters derived from the normalized density list. Thirdly, we develop an updated rule for the results obtained by implementing the DDBSCAN with different input parameters and then synthesize these clustering results into a final result. The proposed APSCAN has two advantages: first it does not need to predefine the two parameters as required in DBSCAN and second, it not only can cluster datasets with varying densities but also preserve the nonlinear data structure for such datasets. Crown Copyright (C) 2011 Published by Elsevier B.V. All rights reserved.
引用
收藏
页码:973 / 986
页数:14
相关论文
共 34 条
  • [1] [Anonymous], 1994, VLDB J, DOI [10.1007/BF01231602, DOI 10.1007/BF01231602]
  • [2] [Anonymous], 1988, Algorithms for Clustering Data
  • [3] BEN HA, 2002, J MACHINE LEARN RES, V2, P125
  • [4] ST-DBSCAN: An algorithm for clustering spatial-temp oral data
    Birant, Derya
    Kut, Alp
    [J]. DATA & KNOWLEDGE ENGINEERING, 2007, 60 (01) : 208 - 221
  • [5] CASTRO VE, 2000, P 4 EUR WORKSH PRINC, P208
  • [6] CRISTIANINI N, 2001, NIPS, V14, P649
  • [7] '1+1 > 2':: Merging distance and density based clustering
    Dash, M
    Liu, H
    Xu, XW
    [J]. SEVENTH INTERNATIONAL CONFERENCE ON DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2001, : 32 - 39
  • [8] Dueck D, 2007, IEEE I CONF COMP VIS, P198
  • [9] ESTER M, 1996, P INT C VER LARG DAT, P28
  • [10] Everitt B. S., 2001, CLUSTER ANAL