Using typical testors for feature selection in text categorization

Cited by: 0
Authors
Pons-Porrata, Aurora [1]
Gil-Garcia, Reynaldo [1]
Berlanga-Llavori, Rafael [2]
Affiliations
[1] Univ Oriente, Ctr Pattern Recognit & Data Mining, Santiago De Cuba, Cuba
[2] Univ Jaume I, Castellon de La Plana, Spain
Source
PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS AND APPLICATIONS, PROCEEDINGS | 2007 / Vol. 4756
Keywords
feature selection; typical testors; text categorization;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A major difficulty of text categorization problems is the high dimensionality of the feature space. Thus, feature selection is often performed in order to increase both the efficiency and effectiveness of the classification. In this paper, we propose a feature selection method based on Testor Theory. This criterion takes into account inter-feature relationships. We experimentally compared our method with the widely used information gain using two well-known classification algorithms: k-nearest neighbour and Support Vector Machine. Two benchmark text collections were chosen as the testbeds: Reuters-21578 and Reuters Corpus Version 1 (RCV1v2). We found that our method consistently outperformed information gain for both classifiers and both data collections, especially when aggressive feature selection is carried out.
Pages: 643 / +
Number of pages: 2