Particle Swarm Optimization based Two-Stage Feature Selection in Text Mining

Cited by: 24
Authors
Bai, Xiaohan [1]
Gao, Xiaoying [1]
Xue, Bing [1]
Affiliations
[1] Victoria Univ Wellington, Sch Engn & Comp Sci, Wellington, New Zealand
Source
2018 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC) | 2018
Keywords
Feature selection; text mining; particle swarm optimization; two-stage method; ALGORITHM; CLASSIFICATION
DOI
10.1109/CEC.2018.8477773
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text mining is an important and popular data mining topic, where a fundamental objective is to enable users to extract informative data from text-based assets and perform related operations on the text, such as retrieval, classification, and summarization. For text classification, one of the most important steps is feature selection, because not all the features in a text dataset are useful for classification. Irrelevant and redundant features should be removed to increase accuracy and decrease complexity and running time, but this is often an expensive process, and most existing methods use a simple filter to remove features, which might lose some useful ones because of feature interactions. Furthermore, there is little research using particle swarm optimization (PSO) algorithms to select informative features for text classification. This paper presents a novel two-stage method for text feature selection: at the first stage, features are selected by four different filter ranking methods, and at the second stage, PSO removes further irrelevant features to compose the final feature subset. The proposed algorithm is compared with four traditional feature selection methods on the commonly used Reuters-21578 dataset. The experimental results show that the proposed two-stage method can substantially reduce the dimensionality of the feature space and improve the classification accuracy.
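The two-stage idea described in the abstract (filter ranking first, then PSO over the surviving features) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not name the four filter methods, so a class-mean-gap score stands in for them, and the nearest-centroid fitness, the subset-size penalty, the PSO constants, the toy data, and all function names are assumptions made for illustration only.

```python
import math
import random

def filter_rank(X, y, k):
    """Stage 1: keep the k features with the largest class-mean gap (stand-in filter)."""
    def score(j):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

def accuracy(X, y, feats):
    """Assumed wrapper fitness: training accuracy of a nearest-centroid classifier."""
    if not feats:
        return 0.0
    cent = {}
    for c in (0, 1):
        rows = [r for r, label in zip(X, y) if label == c]
        cent[c] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    def dist(r, c):
        return sum((r[j] - m) ** 2 for j, m in zip(feats, cent[c]))
    hits = sum((0 if dist(r, 0) <= dist(r, 1) else 1) == label
               for r, label in zip(X, y))
    return hits / len(y)

def binary_pso(X, y, candidates, n_particles=10, n_iter=30, seed=0):
    """Stage 2: binary PSO; each bit says whether a candidate feature is kept."""
    rng = random.Random(seed)
    d = len(candidates)
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed inertia and acceleration constants
    def fitness(bits):
        feats = [f for f, b in zip(candidates, bits) if b]
        # Small assumed penalty so smaller subsets win at equal accuracy.
        return accuracy(X, y, feats) - 0.01 * len(feats) / d
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(d):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp the velocity
                # Sigmoid transfer: velocity becomes the probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return [f for f, b in zip(candidates, gbest) if b]

def two_stage_select(X, y, k):
    """Run the filter stage, then PSO on the features it kept."""
    return binary_pso(X, y, filter_rank(X, y, k))

# Toy data (hypothetical): features 0 and 2 carry the label, the rest are noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[label + data_rng.gauss(0, 0.3), data_rng.random(),
      1 - label + data_rng.gauss(0, 0.3), data_rng.random(),
      data_rng.random(), data_rng.random()] for label in y]
selected = two_stage_select(X, y, k=4)
print(selected)
```

In this sketch the filter stage only narrows the search space, while the PSO stage evaluates whole subsets, which is how a wrapper step can account for the feature interactions that a pure filter may miss.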
Pages: 989-996
Page count: 8
Related papers
35 in total
  • [1] Text feature selection using ant colony optimization
    Aghdam, Mehdi Hosseinzadeh
    Ghasem-Aghaee, Nasser
    Basiri, Mohammad Ehsan
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (03) : 6843 - 6853
  • [2] Allahyari Mehdi, 2017, ARXIV
  • [3] [Anonymous], 1971, The SMART Retrieval System-Experiments in Automatic Document Processing
  • [4] [Anonymous], 2007, Tech. rep.
  • [5] Two step particle swarm optimization to solve the feature selection problem
    Bello, Rafael
    Gomez, Yudel
    Nowe, Ann
    Garcia, Maria M.
    [J]. PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, 2007, : 691 - +
  • [6] Benesty J, 2009, SPRINGER TOP SIGN PR, V2, P37, DOI 10.1007/978-3-642-00296-0_5
  • [7] Genetic programming for feature construction and selection in classification on high-dimensional data
    Binh Tran
    Xue, Bing
    Zhang, Mengjie
    [J]. MEMETIC COMPUTING, 2016, 8 (01) : 3 - 15
  • [8] Feature Selection Based on Hybridization of Genetic Algorithm and Particle Swarm Optimization
    Ghamisi, Pedram
    Benediktsson, Jon Atli
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2015, 12 (02) : 309 - 313
  • [9] Guyon I., 2003, INTRO VARIABLE FEATU
  • [10] Kennedy J, 1995, 1995 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS PROCEEDINGS, VOLS 1-6, P1942, DOI 10.1109/icnn.1995.488968