Term-weighting learning via genetic programming for text classification

Cited by: 49
Authors
Jair Escalante, Hugo [1]
Garcia-Limon, Mauricio A. [1]
Morales-Reyes, Alicia [1]
Graff, Mario [2]
Montes-y-Gomez, Manuel [1]
Morales, Eduardo F. [1]
Martinez-Carranza, Jose [1]
Affiliations
[1] Inst Nacl Astrofis Opt & Electr, Dept Comp Sci, Puebla 72840, Mexico
[2] INFOTEC Ctr Invest & Innovac Tecnol Informac & Co, Aguascalientes, Mexico
Keywords
Term-weighting learning; Genetic programming; Text mining; Representation learning; Bag of words; SCHEMES;
DOI
10.1016/j.knosys.2015.03.025
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes a novel approach to learning term-weighting schemes (TWSs) in the context of text classification. In text mining, a TWS determines how documents are represented in a vector space model before a classifier is applied. Although acceptable performance has been obtained with standard TWSs (e.g., Boolean and term-frequency schemes), defining TWSs has traditionally been an art. Further, it remains difficult to determine the best TWS for a particular problem, and it is not yet clear whether schemes better than those currently available can be generated by combining known TWSs. We propose a genetic program that aims to learn effective TWSs that improve on the performance of current schemes in text classification. The genetic program learns how to combine a set of basic units to give rise to discriminative TWSs. We report an extensive experimental study comprising data sets from thematic and non-thematic text classification as well as from image classification. The study supports the validity of the proposed method: TWSs learned with the genetic program outperform traditional schemes and other TWSs proposed in recent works. Further, we show that TWSs learned from a specific domain can be effectively used for other tasks. (C) 2015 Elsevier B.V. All rights reserved.
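To make the role of a TWS concrete, the following is a minimal sketch, not the authors' implementation. It assumes whitespace tokenization and two basic units (term frequency and document frequency). The helper names `basic_units`, `boolean_tws`, `tf_tws`, `tf_idf_tws`, and `represent` are hypothetical. The sketch shows the Boolean and term-frequency schemes named in the abstract, plus one hand-written combination of basic units (the well-known TF-IDF form) of the kind a genetic program might search over.

```python
import math
from collections import Counter


def basic_units(docs):
    """Compute illustrative basic units: per-document term counts,
    corpus-level document frequencies, and the number of documents."""
    tokenized = [doc.lower().split() for doc in docs]  # assumption: whitespace tokens
    tf = [Counter(tokens) for tokens in tokenized]
    df = Counter(term for counts in tf for term in counts)
    return tf, df, len(docs)


def boolean_tws(tf_d, term, df, n_docs):
    """Boolean scheme: 1 if the term occurs in the document, else 0."""
    return 1.0 if tf_d[term] > 0 else 0.0


def tf_tws(tf_d, term, df, n_docs):
    """Term-frequency scheme: raw count of the term in the document."""
    return float(tf_d[term])


def tf_idf_tws(tf_d, term, df, n_docs):
    """A hand-written combination of two basic units: tf * log(N / df).
    A learned TWS would be some other expression over such units."""
    return tf_d[term] * math.log(n_docs / df[term])


def represent(docs, tws):
    """Map each document to a vector over the sorted vocabulary using `tws`."""
    tf, df, n_docs = basic_units(docs)
    vocab = sorted(df)
    vectors = [[tws(tf_d, term, df, n_docs) for term in vocab] for tf_d in tf]
    return vocab, vectors


if __name__ == "__main__":
    docs = ["good movie good plot", "bad movie"]
    for scheme in (boolean_tws, tf_tws, tf_idf_tws):
        vocab, vectors = represent(docs, scheme)
        print(scheme.__name__, vocab, vectors)
```

Because every scheme shares the same signature, swapping one TWS for another changes only the document vectors, while the downstream classifier stays the same. That is the degree of freedom the abstract describes the genetic program as exploring.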
Pages: 176-189
Number of pages: 14