A global-ranking local feature selection method for text categorization

Cited by: 56
Authors
Pinheiro, Roberto H. W. [1]
Cavalcanti, George D. C. [1]
Correa, Renato F. [2]
Ren, Tsang Ing [1]
Affiliations
[1] Fed Univ Pernambuco UFPE, Ctr Informat CIn, BR-50740560 Recife, PE, Brazil
[2] Fed Univ Pernambuco UFPE, Dept Informat Sci DCI, BR-50740550 Recife, PE, Brazil
Keywords
Text categorization; Feature selection; Filtering method; Variable Ranking; ALOFT
DOI
10.1016/j.eswa.2012.05.008
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a filtering method for feature selection called ALOFT (At Least One FeaTure). The proposed method focuses on specific characteristics of the text categorization domain. It also ensures that every document in the training set is represented by at least one feature, and the number of selected features is determined in a data-driven way. We compare the effectiveness of the proposed method with the Variable Ranking method using three text categorization benchmarks (Reuters-21578, 20 Newsgroups and WebKB), two different classifiers (k-Nearest Neighbor and Naive Bayes) and five feature evaluation functions. The experiments show that ALOFT obtains equivalent or better results than the classical Variable Ranking. (c) 2012 Elsevier Ltd. All rights reserved.
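The abstract states two properties (every training document keeps at least one feature, and the feature count is not fixed in advance) without giving the procedure. A minimal sketch of one way to satisfy both, assuming each document is a set of terms and each term already has a score from some feature evaluation function, is to keep the best-scoring term of every document and take the union. The function name `select_at_least_one`, the toy documents and the scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def select_at_least_one(docs: List[Set[str]], scores: Dict[str, float]) -> Set[str]:
    """For each document, keep its single best-scoring term; return the union."""
    selected: Set[str] = set()
    for terms in docs:
        scored = [t for t in terms if t in scores]
        if not scored:
            continue  # empty document or no scored terms: nothing to pick
        # Break score ties alphabetically so the result is deterministic.
        best = min(scored, key=lambda t: (-scores[t], t))
        selected.add(best)
    return selected


# Hypothetical toy data; real scores would come from a feature evaluation function.
docs = [{"oil", "price", "market"}, {"wheat", "price"}, {"goal", "match"}]
scores = {"oil": 0.9, "price": 0.4, "market": 0.3,
          "wheat": 0.8, "goal": 0.7, "match": 0.2}

features = select_at_least_one(docs, scores)
print(sorted(features))  # ['goal', 'oil', 'wheat']
```

In this sketch the size of the result depends only on the data: documents that share a best-scoring term contribute a single feature, so no cutoff parameter is needed.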
Pages: 12851-12857
Number of pages: 7