Information-theoretic feature selection algorithms for text classification

Authors
Novovicová, J [1]
Malík, A [1]
Affiliations
[1] Acad Sci Czech Republ, Inst Informat Theory & Automat, CR-18208 Prague, Czech Republic
Source
PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), VOLS 1-5 | 2005
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A major characteristic of the text document classification problem is the extremely high dimensionality of text data. In this paper we present four new algorithms for feature/word selection for the purpose of text classification. We use sequential forward selection methods based on improved mutual information criterion functions. The performance of the proposed evaluation functions, compared to information gain, which evaluates features individually, is discussed. We present experimental results using a naive Bayes classifier based on the multinomial model, a linear support vector machine, and k-nearest neighbor classifiers on the Reuters data set. Finally, we analyze the experimental results from various perspectives, including precision, recall and F1-measure. Preliminary experimental results indicate the effectiveness of the proposed feature selection algorithms in text classification.
Pages: 3272-3277
Page count: 6