Research on Sampling Method of CFSFDP Clustering Algorithm and Its Criteria for Determining the Best Sample Size

Cited by: 6
Authors
Cheng, Chen [1]
Yang, Jun [1]
Kong, Xuefeng [1]
Affiliations
[1] Beihang University, School of Reliability and Systems Engineering, Beijing, People's Republic of China
Source
2018 2ND INTERNATIONAL CONFERENCE ON ADVANCES IN ARTIFICIAL INTELLIGENCE (ICAAI 2018) | 2018
Keywords
CFSFDP; simple random sampling; sampling rate; sampling accuracy; the best sample size; FAST SEARCH; FIND;
DOI
10.1145/3292448.3292451
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering by fast search and find of density peaks (CFSFDP) is a novel density-based fast clustering method that has been widely studied and applied in many fields. However, when the data set is very large, the algorithm becomes inefficient because it consumes considerable time and storage space. To address this problem, a simple random sampling (SRS) method is proposed to speed up the optimized CFSFDP algorithm on large real-world data sets. Clustering performance is measured by the rate of correct classification of the sample, which we call the sampling accuracy. We first use SRS to generate small samples for cluster analysis. We then explore the relationship between the sampling rate and the sampling accuracy. Finally, to determine the best sample size, that is, one that achieves high sampling accuracy at high efficiency, the mean and standard deviation of the sampling accuracy are adopted as two criteria. A real case study illustrates the implementation and effectiveness of the proposed method.
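The procedure outlined in the abstract (draw simple random samples, score each by sampling accuracy, then pick a sample size using the mean and standard deviation of that accuracy) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the thresholds `min_mean` and `max_std`, the rule "smallest rate that meets both thresholds", and the use of reference labels whose numbering already matches the clusterer's output are all hypothetical. The `midpoint_split` clusterer is only a stand-in so the sketch runs on its own; it is not CFSFDP.

```python
import random
import statistics


def simple_random_sample(n, rate, rng):
    """Draw sample indices without replacement (simple random sampling)."""
    size = max(2, round(rate * n))  # assumed: at least two points per sample
    return rng.sample(range(n), size)


def sampling_accuracy(sample_labels, reference_labels):
    """Fraction of sampled points whose cluster label matches the reference.

    Assumes both labelings use the same cluster numbering.
    """
    matches = sum(s == r for s, r in zip(sample_labels, reference_labels))
    return matches / len(sample_labels)


def accuracy_stats(data, reference, cluster_fn, rate, repeats, rng):
    """Mean and standard deviation of sampling accuracy over repeated samples."""
    accuracies = []
    for _ in range(repeats):
        idx = simple_random_sample(len(data), rate, rng)
        labels = cluster_fn([data[i] for i in idx])
        accuracies.append(sampling_accuracy(labels, [reference[i] for i in idx]))
    return statistics.mean(accuracies), statistics.pstdev(accuracies)


def best_sampling_rate(data, reference, cluster_fn, rates,
                       min_mean, max_std, repeats=30, seed=0):
    """Return the smallest rate meeting both criteria, or None.

    Hypothetical decision rule: mean accuracy >= min_mean and
    standard deviation <= max_std.
    """
    rng = random.Random(seed)
    for rate in sorted(rates):
        mean_acc, std_acc = accuracy_stats(
            data, reference, cluster_fn, rate, repeats, rng)
        if mean_acc >= min_mean and std_acc <= max_std:
            return rate, mean_acc, std_acc
    return None


def midpoint_split(points):
    """Stand-in clusterer (NOT CFSFDP): split 1-D points at the range midpoint."""
    cut = (min(points) + max(points)) / 2
    return [0 if p < cut else 1 for p in points]


if __name__ == "__main__":
    # Two well-separated 1-D groups with known reference labels.
    data = [i / 50 for i in range(50)] + [10 + i / 50 for i in range(50)]
    reference = [0] * 50 + [1] * 50
    print(best_sampling_rate(data, reference, midpoint_split,
                             rates=[0.05, 0.1, 0.2, 0.5],
                             min_mean=0.99, max_std=0.01))
```

In practice `cluster_fn` would be replaced by a CFSFDP implementation, and the reference labels would come from whatever ground truth or full-data clustering the study uses.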
Pages: 24-28
Page count: 5