Parallel Fuzzy C-Means Text Clustering Algorithm Based on Improved Canopy

Cited by: 0
Authors
Luan, Lan [1]
Du, ShaoBo [1]
Affiliations
[1] GuiZhou Univ Commerce, Sch Comp & Informat Engn, Guiyang, Peoples R China
Source
2022 IEEE 10TH INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATION AND NETWORKS (ICICN 2022) | 2022
Keywords
fuzzy c-means; canopy algorithm; parallelization; spark; clustering;
DOI
10.1109/ICICN56848.2022.10006478
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
The Fuzzy C-Means (FCM) clustering algorithm is a flexible clustering algorithm that can be used for large-scale data mining. However, FCM is sensitive to the selection of initial cluster centers and runs slowly when clustering large-scale text data. Therefore, a parallel fuzzy C-means clustering algorithm based on an improved Canopy algorithm is proposed. Because the Canopy algorithm requires reasonable thresholds to be specified, the concept of weight is introduced to improve it and thereby raise the FCM clustering accuracy. First, the density value of each sample point in the data set is calculated, and the point with the maximum density is selected as the first cluster center. Next, the weights of the remaining sample points are calculated and used to select the centers of the other clusters. Finally, the computation is parallelized on the Spark framework to improve the efficiency of the algorithm. Experimental results show that the proposed algorithm improves clustering accuracy by 10%-20% compared with the FCM algorithm and improves overall performance.
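The abstract does not give the density or weight formulas, so the following is only a minimal single-machine sketch of the general idea, not the authors' method. The function names (`pick_centers`, `memberships`, `fcm`), the `radius` parameter, the neighbour-count density, and the density-times-distance weight are all assumptions made for illustration; only the membership and center updates are the standard FCM formulas.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pick_centers(points, k, radius):
    """Hypothetical density/weight seeding (NOT the paper's exact formulas).

    Assumed density: number of points within `radius` (the point itself included).
    Assumed weight: density times distance to the nearest already chosen center.
    """
    density = [sum(1 for q in points if dist(p, q) <= radius) for p in points]
    # First center: the point with the highest density.
    chosen = [max(range(len(points)), key=lambda i: density[i])]
    while len(chosen) < k:
        def weight(i):
            return density[i] * min(dist(points[i], points[c]) for c in chosen)
        candidates = [i for i in range(len(points)) if i not in chosen]
        chosen.append(max(candidates, key=weight))
    return [list(points[i]) for i in chosen]

def memberships(points, centers, m=2.0, eps=1e-9):
    """Standard FCM membership: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        # Clamp distances so a point sitting on a center does not divide by zero.
        d = [max(dist(p, c), eps) for c in centers]
        u.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return u

def fcm(points, centers, m=2.0, iters=50):
    """Alternate the standard FCM membership and center updates."""
    dims = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            # Each center is the mean of all points weighted by u_ij^m.
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wi * p[t] for wi, p in zip(w, points)) / total
                 for t in range(dims)]
            )
        centers = new_centers
    return centers, memberships(points, centers, m)

# Toy data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fcm(points, pick_centers(points, k=2, radius=1.5))
```

In this sketch the membership row of each point depends only on that point and the current centers, which is the property that makes a map-style parallelization (such as the Spark approach the abstract mentions) natural; the actual partitioning scheme used in the paper is not described in the abstract.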
Pages: 625-631
Number of pages: 7