Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering

Cited by: 362
Authors
Abualigah, Laith Mohammad [1]
Khader, Ahamad Tajudin [1]
Affiliation
[1] USM, Sch Comp Sci, Gelugor, Pulau Pinang, Malaysia
Keywords
Unsupervised text feature selection; Particle swarm optimization; Genetic operators; K-means text clustering; Hybridization; DIMENSION REDUCTION; KRILL HERD
DOI
10.1007/s11227-017-2046-2
CLC number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Text clustering is an appropriate method for partitioning a huge number of text documents into groups. The size of the documents affects text clustering by degrading its performance. Moreover, text documents contain sparse and uninformative features, which reduce the performance of the underlying text clustering algorithm and increase the computational time. Unsupervised feature selection is a fundamental technique for selecting a new subset of informative text features, which improves the performance of text clustering and reduces the computational time. This paper proposes a hybrid of the particle swarm optimization algorithm with genetic operators for the feature selection problem. The k-means clustering algorithm is used to evaluate the effectiveness of the obtained feature subsets. The experiments were conducted on eight common text datasets with varied characteristics. The results show that the proposed hybrid algorithm (H-FSPSOTC) improved the performance of the clustering algorithm by generating a new subset of more informative features. The proposed algorithm is also compared with other algorithms published in the literature. Overall, the feature selection technique helps the clustering algorithm obtain more accurate clusters.
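The abstract does not specify the particle encoding, the fitness function, or the exact genetic operators. The following Python/NumPy sketch is therefore only a minimal illustration of the general idea, not the authors' H-FSPSOTC method. It rests on several assumptions:

* binary PSO with a sigmoid transfer function;
* single-point crossover and bit-flip mutation as the genetic operators;
* a hypothetical fitness (sum of the variances of the selected term columns minus a per-term penalty) standing in for the paper's criterion.

The names `feature_score`, `hybrid_pso_ga`, and `kmeans`, along with all parameter values, are hypothetical. The selected subset is then clustered with a plain Lloyd's k-means, mirroring the evaluation step described in the abstract.

```python
import numpy as np

def feature_score(X, mask, lam=0.05):
    """HYPOTHETICAL fitness (not the paper's): total variance of the
    selected term columns minus a penalty of `lam` per selected term."""
    if not mask.any():
        return -np.inf
    return X[:, mask].var(axis=0).sum() - lam * mask.sum()

def hybrid_pso_ga(X, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5,
                  p_cross=0.8, p_mut=0.01, seed=0):
    """Binary PSO whose swarm is also perturbed by crossover and mutation."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pos = rng.random((n_particles, n_feat)) < 0.5      # bit i = keep term i
    vel = rng.uniform(-1.0, 1.0, (n_particles, n_feat))
    pbest = pos.copy()
    pbest_fit = np.array([feature_score(X, p) for p in pos])
    g = pbest_fit.argmax()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(n_iter):
        # PSO step: velocity update, then sigmoid(velocity) = P(bit is 1)
        posf = pos.astype(float)
        r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
        vel = (w * vel + c1 * r1 * (pbest.astype(float) - posf)
               + c2 * r2 * (gbest.astype(float) - posf))
        vel = np.clip(vel, -4.0, 4.0)
        pos = rng.random(vel.shape) < 1.0 / (1.0 + np.exp(-vel))

        # Genetic operator 1: single-point crossover on consecutive pairs
        for i in range(0, n_particles - 1, 2):
            if rng.random() < p_cross:
                cut = rng.integers(1, n_feat)
                a, b = pos[i].copy(), pos[i + 1].copy()
                pos[i, cut:], pos[i + 1, cut:] = b[cut:], a[cut:]
        # Genetic operator 2: bit-flip mutation
        pos ^= rng.random(pos.shape) < p_mut

        # Update personal and global bests
        fit = np.array([feature_score(X, p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        if pbest_fit.max() > gbest_fit:
            g = pbest_fit.argmax()
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Synthetic TF-IDF-like matrix: 40 documents x 10 terms, of which only
# terms 0 and 1 carry cluster information (illustrative data only).
rng = np.random.default_rng(42)
n = 40
X = rng.normal(0.0, 0.01, (n, 10))
X[: n // 2, 0] += 1.0
X[n // 2 :, 1] += 1.0
mask = hybrid_pso_ga(X)
labels = kmeans(X[:, mask], k=2)
print("selected terms:", np.flatnonzero(mask))
print("cluster labels:", labels)
```

In this sketch, the returned mask is simply the best subset found under the assumed fitness. In practice, the quality of the subset would be judged by clustering metrics computed on the k-means output, as the abstract describes.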
Pages: 4773-4795
Number of pages: 23