An automatic three-way clustering method based on sample similarity

Cited by: 27
Authors
Jia, Xiuyi [1]
Rao, Ya [1]
Li, Weiwei [2]
Yang, Sichun [3]
Yu, Hong [4]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] Nanjing Univ Aeronaut & Astronaut, Coll Astronaut, Nanjing 210016, Peoples R China
[3] Anhui Univ Technol, Sch Comp Sci & Technol, Maanshan 243032, Peoples R China
[4] Chongqing Univ Posts & Telecommun, Coll Comp Sci & Technol, Chongqing 400065, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Three-way decisions; Three-way clustering; Sample similarity;
DOI
10.1007/s13042-020-01255-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Three-way clustering extends traditional clustering by adding the concept of a fringe region, which can effectively mitigate the inaccurate decisions that traditional two-way clustering methods make when information is inaccurate or data are insufficient. Existing three-way clustering methods often select the number of clusters and the thresholds for the three-way partition by subjective tuning. However, a fixed number of clusters and fixed partition thresholds cannot adapt to data sets of different sizes and densities. To address this problem, this paper proposes an improved three-way clustering method. First, we define a roughness degree, based on sample similarity, to measure the uncertainty of the fringe region. Based on the roughness degree, we define a novel partitioning validity index to evaluate clustering partitions and propose an automatic threshold selection method. Second, again building on sample similarity, we introduce intra-class similarity and inter-class similarity to describe quantitatively how the relationship between a sample and the clusters changes, and we integrate the two into a novel clustering validity index that measures clustering performance under different numbers of clusters. On this basis, we propose an automatic cluster number selection method. Finally, we combine the proposed threshold selection and cluster number selection methods into an automatic three-way clustering approach. Comparison experiments demonstrate the effectiveness of the proposed approach.
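The core/fringe idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's definitions of sample similarity, roughness degree, or the validity indices, so everything below is an assumption made for illustration: the Gaussian-style `similarity`, the function name `three_way_partition`, and the rule that the thresholds `alpha` and `beta` are fixed by hand rather than selected automatically as in the paper.

```python
import math


def similarity(x, y):
    # Assumed similarity in (0, 1]: exp of the negative squared Euclidean
    # distance. This is NOT the paper's definition, only a stand-in.
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2)


def three_way_partition(samples, centers, alpha, beta):
    """Split samples into core and fringe regions of each cluster.

    Hypothetical rule used here:
      - a sample goes to the core region of its most similar cluster
        if that similarity is >= alpha;
      - otherwise it goes to the fringe region of every cluster whose
        similarity is >= beta;
      - samples below beta for all clusters are left unassigned.
    """
    if not 0 <= beta < alpha <= 1:
        raise ValueError("expected 0 <= beta < alpha <= 1")
    core = [[] for _ in centers]
    fringe = [[] for _ in centers]
    for i, x in enumerate(samples):
        sims = [similarity(x, c) for c in centers]
        best = max(range(len(centers)), key=lambda k: sims[k])
        if sims[best] >= alpha:
            core[best].append(i)
        else:
            for k, s in enumerate(sims):
                if s >= beta:
                    fringe[k].append(i)
    return core, fringe


# Example: two cluster centers and a sample lying halfway between them.
centers = [(0.0, 0.0), (2.0, 0.0)]
samples = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
core, fringe = three_way_partition(samples, centers, alpha=0.7, beta=0.3)
```

In this sketch the halfway sample is similar to both centers but not strongly to either, so it lands in the fringe of both clusters instead of being forced into one, which is the deferred decision that distinguishes a three-way partition from a two-way one.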
Pages: 1545-1556
Number of pages: 12
Related papers
42 in total
[1]   A three-way clustering approach for handling missing data using GTRS [J].
Afridi, Mohammad Khan ;
Azam, Nouman ;
Yao, JingTao ;
Alanazi, Eisa .
INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2018, 98 :11-24
[2]  
Dunn J. C., 1973, Journal of Cybernetics, V3, P32, DOI 10.1080/01969727308546046
[3]   ON SOME INVARIANT CRITERIA FOR GROUPING DATA [J].
FRIEDMAN, HP ;
RUBIN, J .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1967, 62 (320) :1159-&
[4]  
Gu YN, 2015, PR IEEE I C PROGR IN, P51, DOI 10.1109/PIC.2015.7489808
[5]  
Hong Yu, 2012, Rough Sets and Current Trends in Computing. Proceedings 8th International Conference, RSCTC 2012, P277, DOI 10.1007/978-3-642-32115-3_33
[6]   Three-way decisions based on semi-three-way decision spaces [J].
Hu, Bao Qing .
INFORMATION SCIENCES, 2017, 382 :415-440
[7]   Data clustering: A review [J].
Jain, AK ;
Murty, MN ;
Flynn, PJ .
ACM COMPUTING SURVEYS, 1999, 31 (03) :264-323
[8]  
Jia X.Y., 2014, LNCS Transactions on Rough Sets, VXVIII, P69, DOI 10.1007/978-3-662-44680-5_5
[9]   Similarity-based attribute reduction in rough set theory: a clustering perspective [J].
Jia, Xiuyi ;
Rao, Ya ;
Shang, Lin ;
Li, Tongjun .
INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2020, 11 (05) :1047-1060
[10]   A multiphase cost-sensitive learning method based on the multiclass three-way decision-theoretic rough set model [J].
Jia, Xiuyi ;
Li, Weiwei ;
Shang, Lin .
INFORMATION SCIENCES, 2019, 485 :248-262