Three-way evidence theory-based density peak clustering with the principle of justifiable granularity

Cited by: 9
Authors
Ju, Hengrong [1]
Lu, Yang [1]
Ding, Weiping [1]
Cao, Jinxin [1]
Yang, Xibei [2]
Affiliations
[1] Nantong Univ, Sch Informat Sci & Technol, Nantong 226000, Peoples R China
[2] Jiangsu Univ Sci & Technol, Sch Comp, Zhenjiang 212003, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Clustering; Density peaks; Evidence theory; Justifiable granularity; Three-way clustering; Two-layer nearest neighbor; INFORMATION GRANULARITY; K-MEANS; FUNDAMENTALS;
DOI
10.1016/j.asoc.2023.111217
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering by fast search and find of density peaks (DPC) is an effective clustering approach that can find all the cluster centers at once with just one parameter and without iterative processing. However, the cutoff distance, a key parameter of density measurement in the DPC approach, affects the quality of the final clustering results. Its selection relies on experimental experience and lacks a semantic explanation. Furthermore, the allocation strategy of the traditional DPC approach may assign several points incorrectly, which causes subsequent points to be assigned incorrectly as well and ultimately produces a chain of allocation errors. To overcome these deficiencies, this paper proposes a novel three-way evidence theory-based density peak clustering with the principle of justifiable granularity (3W-PEDP). First, the computation of the cutoff distance is converted into a search for nearest neighbors. From the perspective of granular computing, 3W-PEDP transforms the neighbor selection issue into the construction of justifiable granularity, and the optimal neighbors are obtained by constructing coverage and specificity criteria. Second, inspired by three-way clustering, we adopt a two-stage method for sample allocation. On the one hand, for core point allocation, a two-layer nearest neighbor is constructed based on the obtained optimal neighbors. On the other hand, we design a new evidence mass function to guide the assignment of the remaining points. This evidence mass function considers not only the labels of the assigned samples but also fuses the information of the neighborhoods around the unassigned samples. Finally, we assess the effectiveness of 3W-PEDP on numerous public synthetic datasets and UCI real-world datasets, and present detailed comparison results with several popular clustering methods. In addition, experimental studies verify the effectiveness of constructing justifiable granularity in selecting the optimal neighbors. The experimental results demonstrate that 3W-PEDP has good adaptability and robustness and achieves better clustering performance than the compared methods. Our source code is available at https://github.com/Luyangabc/3W-PEDP.
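For orientation, the baseline DPC procedure that the abstract criticizes can be sketched as follows. This is a minimal illustration of conventional DPC, not of 3W-PEDP and not the authors' released code. The function name `dpc`, the parameters `d_c` and `n_clusters`, the cutoff-kernel density, and the rule of taking the points with the largest density-times-distance score as centers are all assumptions made for this sketch.

```python
import math


def dpc(points, d_c, n_clusters):
    """Baseline density peak clustering (illustrative sketch only)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other points strictly closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from highest to lowest density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta: distance to the nearest point that precedes i in density order.
    # parent: that point, whose label i will later inherit.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbor; use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Assumed center rule: the n_clusters points with the largest rho * delta.
    # The stable sort keeps density order among equal scores.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_clusters]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    # Single-pass allocation: each remaining point copies its parent's label,
    # so one wrong label is inherited by every point that depends on it.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


if __name__ == "__main__":
    blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
    blob_b = [(x + 10, y + 10) for x, y in blob_a]
    labels, centers = dpc(blob_a + blob_b, d_c=1.0, n_clusters=2)
    print(labels, centers)
```

The last loop shows where the chain of allocation errors comes from: every non-center point inherits the label of a single denser neighbor, so a mistake made early in the density order is copied downstream. The result also depends directly on the hand-picked `d_c`, which is the parameter the abstract proposes to replace with a neighbor search.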
Pages: 19