A Bayesian feature selection paradigm for text classification

被引:29
|
作者
Feng, Guozhong [1 ,2 ]
Guo, Jianhua [1 ,2 ]
Jing, Bing-Yi [3 ]
Hao, Lizhu [1 ,2 ]
机构
[1] NE Normal Univ, Sch Math & Stat, Changchun 130024, Jilin Province, Peoples R China
[2] NE Normal Univ, Key Lab Appl Stat MOE, Changchun 130024, Jilin Province, Peoples R China
[3] Hong Kong Univ Sci & Technol, Dept Math, Hong Kong, Hong Kong, Peoples R China
基金
中国国家自然科学基金;
关键词
Bayesian feature selection; Metropolis search; Mixture model; Text classification; VARIABLE SELECTION; MODELS; CATEGORIZATION;
D O I
10.1016/j.ipm.2011.08.002
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The automated classification of texts into predefined categories has witnessed a booming interest, due to the increased availability of documents in digital form and the ensuing need to organize them. An important problem for text classification is feature selection, whose goals are to improve classification effectiveness, computational efficiency, or both. Due to categorization unbalancedness and feature sparsity in social text collection, filter methods may work poorly. In this paper, we perform feature selection in the training process, automatically selecting the best feature subset by learning, from a set of preclassified documents, the characteristics of the categories. We propose a generative probabilistic model, describing categories by distributions, handling the feature selection problem by introducing a binary exclusion/inclusion latent vector, which is updated via an efficient Metropolis search. Real-life examples illustrate the effectiveness of the approach. (C) 2011 Elsevier Ltd. All rights reserved.
引用
收藏
页码:283 / 302
页数:20
相关论文
共 50 条
  • [1] Comparative Analysis on Feature Selection Based Bayesian Text Classification
    Yang, Guang
    Lin, Zhong-Yi
    Chang, Yu-Xin
    Wang, Lei
    Tian, Jin-Kun
    PROCEEDINGS OF 2012 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2012), 2012, : 1190 - 1194
  • [2] Feature Selection in Text Classification
    Sahin, Durmus Ozkan
    Ates, Nurullah
    Kilic, Erdal
    2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016, : 1777 - 1780
  • [3] Dynamic feature selection in text classification
    Doan, Son
    Horiguchi, Susumu
    INTELLIGENT CONTROL AND AUTOMATION, 2006, 344 : 664 - 675
  • [4] Contextual feature selection for text classification
    Paradis, Francois
    Nie, Jian-Yun
    INFORMATION PROCESSING & MANAGEMENT, 2007, 43 (02) : 344 - 352
  • [5] Feature selection for text classification: A review
    Deng, Xuelian
    Li, Yuqing
    Weng, Jian
    Zhang, Jilian
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (03) : 3797 - 3816
  • [6] Hybrid feature selection for text classification
    Gunal, Serkan
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2012, 20 : 1296 - 1311
  • [7] Feature Selection Strategy in Text Classification
    Fung, Pui Cheong Gabriel
    Morstatter, Fred
    Liu, Huan
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT I: 15TH PACIFIC-ASIA CONFERENCE, PAKDD 2011, 2011, 6634 : 26 - 37
  • [8] Feature selection for text classification: A review
    Xuelian Deng
    Yuqing Li
    Jian Weng
    Jilian Zhang
    Multimedia Tools and Applications, 2019, 78 : 3797 - 3816
  • [9] Feature Selection for Ordinal Text Classification
    Baccianella, Stefano
    Esuli, Andrea
    Sebastiani, Fabrizio
    NEURAL COMPUTATION, 2014, 26 (03) : 557 - 591
  • [10] Feature Selection Methods for Text Classification
    Dasgupta, Anirban
    Drineas, Petros
    Harb, Boulos
    Josifovski, Vanja
    Mahoney, Michael W.
    KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2007, : 230 - +