ROBP a robust border-peeling clustering using Cauchy kernel

被引:22
作者
Du, Mingjing [1 ]
Wang, Ru [1 ]
Ji, Ru [1 ]
Wang, Xia [1 ]
Dong, Yongquan [1 ]
机构
[1] Jiangsu Normal Univ, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
基金
中国国家自然科学基金;
关键词
Density-based clustering; Border-peeling clustering; Cauchy kernel; INFORMATION; SPARSE; FACTORIZATION; SIMILARITY;
D O I
10.1016/j.ins.2021.04.089
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Recently a novel density-based clustering algorithm, namely, border-peeling (BP) clustering algorithm, is proposed to group data by iteratively identifying border points and peeling off them until separable areas of data remain. The BP clustering is able to correctly recognize the true structure of clusters and automatically detect the outliers on several test cases. However, there are some drawbacks in BP, and these may hinder its widespread application. The BP clustering might yield bad results on datasets with non-uniformly distributed clusters. Especially, the BP clustering tends to over-partition the data with complex shape. To overcome these defects, a robust border-peeling clustering algorithm (named as ROBP) is proposed in this paper. Our method improves the BP clustering algorithm from two aspects: density influence (i.e. density estimation) and linkage criterion (i.e. association strategy). In density estimation, we use Cauchy kernel with longer tails instead of Gaussian kernel in the local scaling function, and further propose a kernel density estimator, i.e., the density estimator based on Cauchy kernel. It can calculate quickly and accurately the density influence value of each point. In association strategy, we design a linkage criterion based on the shared neighborhood information. The linkage criterion can create some links between peeled border points and their neighboring peeled border points, in order to avoid over-segmentation of the clusters. We integrate the proposed linkage criterion and the uni-directional association strategy, and further propose a bidirectional association strategy. In experiments, we compare ROBP with 7 representative density-based clustering (or hierarchical clustering) algorithms, including BP, DBSCAN, HDBSCAN, density peak (DP) clustering, DPC-KNN, DPC-DBFN and McDPC, on 8 synthetic datasets and 11 real-world datasets. Results show that the proposed algorithm outperforms 7 competitors in most cases. Moreover, we compare the robustness of ROBP and BP, and evaluate their running time. Experimental results indicate that ROBP is much more robust and reliable, as well as it is competitive to BP in computational efficiency. (c) 2021 Elsevier Inc. All rights reserved.
引用
收藏
页码:375 / 400
页数:26
相关论文
共 48 条
[1]  
Abbas M. A., 2012, 2012 11th International Conference on Information Sciences, Signal Processing and their Applications (ISSPA), P1192, DOI 10.1109/ISSPA.2012.6310472
[2]  
Ankerst M, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P49
[3]   Border-Peeling Clustering [J].
Averbuch-Elor, Hadar ;
Bar, Nadav ;
Cohen-Or, Daniel .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (07) :1791-1797
[4]  
Campello Ricardo J. G. B., 2013, Advances in Knowledge Discovery and Data Mining. 17th Pacific-Asia Conference (PAKDD 2013). Proceedings, P160, DOI 10.1007/978-3-642-37456-2_14
[5]   Density-based clustering [J].
Campello, Ricardo J. G. B. ;
Kroeger, Peer ;
Sander, Jorg ;
Zimek, Arthur .
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2020, 10 (02)
[6]   Automatic clustering based on density peak detection using generalized extreme value distribution [J].
Ding, Jiajun ;
He, Xiongxiong ;
Yuan, Junqing ;
Jiang, Bo .
SOFT COMPUTING, 2018, 22 (09) :2777-2796
[7]   An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood [J].
Ding, Shifei ;
Du, Mingjing ;
Sun, Tongfeng ;
Xu, Xiao ;
Xue, Yu .
KNOWLEDGE-BASED SYSTEMS, 2017, 133 :294-313
[8]   Study on density peaks clustering based on k-nearest neighbors and principal component analysis [J].
Du, Mingjing ;
Ding, Shifei ;
Jia, Hongjie .
KNOWLEDGE-BASED SYSTEMS, 2016, 99 :135-145
[9]  
Ertöz L, 2003, SIAM PROC S, P47
[10]  
Ester M., 1996, PROC 2 INT C KNOWLED, P226, DOI DOI 10.5555/3001460.3001507