Similarity-Based Three-Way Clustering by Using Dimensionality Reduction

Cited by: 3
Authors
Li, Anlong [1]
Meng, Yiping [2]
Wang, Pingxin [2]
Affiliations
[1] Jiangsu Univ Sci & Technol, Sch Comp, Zhenjiang 212100, Peoples R China
[2] Jiangsu Univ Sci & Technol, Sch Sci, Zhenjiang 212100, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
three-way clustering; co-association frequency; dimension reduction; similar classes; ROUGH SET; MODEL; PREDICTION; NETWORKS;
DOI
10.3390/math12131951
Chinese Library Classification (CLC) number
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
Three-way clustering describes each cluster by a core region and a fringe region, which divides the dataset into three parts. This division helps identify the central core and the sparse outer region of a cluster. One of the main challenges in three-way clustering is constructing these two sets in a meaningful way. To handle high-dimensional data and improve clustering stability, this paper proposes a novel three-way clustering method. The proposed method first applies dimensionality reduction techniques to reduce the data dimensions and eliminate noise. On the reduced dataset, random sampling and feature extraction are performed multiple times to introduce randomness and diversity, which enhances the algorithm's robustness. Ensemble strategies are applied to these subsets, and the k-means algorithm is used to obtain multiple clustering results. From these results, we compute the co-association frequency between samples and obtain a fused clustering result using the single-linkage method of hierarchical clustering. To describe the core region and fringe region of each cluster, the similar class of each sample is defined by the co-association frequency. The lower and upper approximations of each cluster are then obtained from the similar classes. The samples in the lower approximation of a cluster form its core region, and the difference between the upper and lower approximations is defined as its fringe region. A three-way explanation of each cluster is thus naturally formed. Experiments on several UC Irvine Machine Learning Repository (UCI) datasets, evaluated with clustering metrics such as Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), and Accuracy (ACC), show that the proposed strategy is effective in improving the structure of clustering results.
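
The chain from base clusterings to core and fringe regions can be illustrated with a small sketch. This is not the authors' implementation; it only shows one plausible reading of the abstract. The function names, the fusion threshold, the parameter `alpha`, the definition of a similar class as "all samples whose co-association frequency with the given sample is at least `alpha`", and the rough-set style lower and upper approximations are all assumptions made for illustration. The base clusterings are taken as given, so the dimensionality reduction, sampling, and k-means steps are omitted.

```python
def co_association(labelings):
    """Fraction of base clusterings in which each pair of samples shares a cluster."""
    n, m = len(labelings[0]), len(labelings)
    return [[sum(lab[i] == lab[j] for lab in labelings) / m for j in range(n)]
            for i in range(n)]


def single_linkage_components(freq, threshold):
    """Single-linkage fusion cut at `threshold` (assumed parameter).

    Cutting a single-linkage hierarchy at a similarity level is equivalent to
    taking connected components of the graph that links samples whose
    co-association frequency is at least that level.
    """
    n = len(freq)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and freq[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def three_way(freq, labels, alpha):
    """Return {cluster: (core, fringe)} from assumed rough-set style approximations."""
    n = len(freq)
    # Assumed similar class: samples co-associated with i at frequency >= alpha.
    similar = [{j for j in range(n) if freq[i][j] >= alpha} for i in range(n)]
    regions = {}
    for c in sorted(set(labels)):
        members = {i for i in range(n) if labels[i] == c}
        lower = {i for i in range(n) if similar[i] <= members}  # similar class inside cluster
        upper = {i for i in range(n) if similar[i] & members}   # similar class touches cluster
        regions[c] = (lower, upper - lower)                     # core, fringe
    return regions


# Four hypothetical base clusterings of six samples (toy data, not from the paper).
labelings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
freq = co_association(labelings)
labels = single_linkage_components(freq, threshold=0.75)
regions = three_way(freq, labels, alpha=0.5)
print(labels)   # [0, 0, 0, 1, 1, 1]
print(regions)  # {0: ({0, 1}, {2, 3}), 1: ({4, 5}, {2, 3})}
```

In this toy example, samples 2 and 3 are grouped inconsistently across the base clusterings, so they fall into the fringe regions of both fused clusters, while samples 0, 1 and 4, 5 form the two core regions.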
Pages: 19