Revisiting two-stage feature selection based on coverage policies for text classification

Cited by: 2
Authors
Mendez-Molina, Arquimides [1, 2]
Li Ona-Garcia, Ana [1, 2]
Ariel Carrasco-Ochoa, Jesus [1]
Martinez-Trinidad, Jose Fco. [1]
Affiliations
[1] INAOE, Comp Sci Coordinat, Puebla, Mexico
[2] Univ Camaguey, Dept Comp Sci, Camaguey, Cuba
Keywords
Text classification; feature selection; parameter tuning
DOI
10.3233/JIFS-169480
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection is a crucial aspect of classification problems, especially in domains such as text classification, where the number of features is usually large. Recently, a two-stage feature selection method for text classification was introduced that combines class-based and corpus-based feature selection. Based on their experiments, the authors identified parameter values for both the corpus-based and class-based approaches that allow feature selection to improve on traditional methods in text classification. In this paper, we revisit this two-stage feature selection method and, based on several experiments, come to a different conclusion: the parameters suggested by the original work do not necessarily provide the best results. Based on our experiments, we conclude that, by combining the best parameter value for each stage for the specific corpus under study, the two-stage selection method based on coverage policies provides a subset of features that yields a statistically significant increase in classifier success rates over the traditional methods.
Pages: 2949-2957
Page count: 9