A Bayesian feature selection paradigm for text classification

Times cited: 29
Authors
Feng, Guozhong [1,2]
Guo, Jianhua [1,2]
Jing, Bing-Yi [3]
Hao, Lizhu [1,2]
Affiliations
[1] NE Normal Univ, Sch Math & Stat, Changchun 130024, Jilin Province, Peoples R China
[2] NE Normal Univ, Key Lab Appl Stat MOE, Changchun 130024, Jilin Province, Peoples R China
[3] Hong Kong Univ Sci & Technol, Dept Math, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Bayesian feature selection; Metropolis search; Mixture model; Text classification; Variable selection; Models; Categorization
DOI
10.1016/j.ipm.2011.08.002
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The automated classification of texts into predefined categories has attracted booming interest, owing to the increased availability of documents in digital form and the ensuing need to organize them. An important problem in text classification is feature selection, whose goals are to improve classification effectiveness, computational efficiency, or both. Because of category imbalance and feature sparsity in social text collections, filter methods may work poorly. In this paper, we perform feature selection within the training process, automatically selecting the best feature subset by learning the characteristics of the categories from a set of preclassified documents. We propose a generative probabilistic model that describes categories by distributions and handles the feature selection problem by introducing a binary exclusion/inclusion latent vector, which is updated via an efficient Metropolis search. Real-life examples illustrate the effectiveness of the approach. (C) 2011 Elsevier Ltd. All rights reserved.
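The abstract describes feature selection through a binary inclusion/exclusion latent vector updated by Metropolis search. The Python sketch below illustrates that general idea under assumptions that are not taken from the paper: documents are binary bag-of-words vectors, an included feature has its own Beta-Bernoulli rate in each category while an excluded feature shares a single rate across all categories, and each coordinate of the latent vector has an independent Bernoulli(`prior_incl`) prior. All function names and parameters (`feature_scores`, `metropolis_feature_search`, `prior_incl`, and so on) are hypothetical; this is not the authors' model or code.

```python
import math
import random

def log_beta(a, b):
    # log of the Beta function B(a, b), computed via log-gamma
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal(k, n, a=1.0, b=1.0):
    # Beta-Bernoulli marginal likelihood of k ones in n binary observations
    return log_beta(a + k, b + n - k) - log_beta(a, b)

def feature_scores(X, y, num_classes):
    """Per-feature log marginal likelihoods under two hypotheses.

    Assumed (illustrative) model: an included feature has its own
    Bernoulli rate in every class; an excluded feature shares one
    rate across all classes.
    """
    n_docs, n_feats = len(X), len(X[0])
    class_sizes = [sum(1 for c in y if c == k) for k in range(num_classes)]
    s_in, s_out = [], []
    for j in range(n_feats):
        ones = [0] * num_classes  # count of documents containing feature j, per class
        for row, c in zip(X, y):
            ones[c] += row[j]
        s_in.append(sum(log_marginal(ones[c], class_sizes[c])
                        for c in range(num_classes)))
        s_out.append(log_marginal(sum(ones), n_docs))
    return s_in, s_out

def metropolis_feature_search(X, y, num_classes, prior_incl=0.5,
                              n_iter=5000, burn_in=500, seed=0):
    """Single-bit-flip Metropolis search over the inclusion vector gamma.

    Returns the final gamma and the post-burn-in frequency with which
    each feature was included.
    """
    rng = random.Random(seed)
    s_in, s_out = feature_scores(X, y, num_classes)
    prior_log_odds = math.log(prior_incl / (1 - prior_incl))
    n_feats = len(s_in)
    gamma = [0] * n_feats  # start with every feature excluded
    counts = [0] * n_feats
    for t in range(n_iter):
        j = rng.randrange(n_feats)  # symmetric proposal: flip one coordinate
        # log posterior ratio: (gamma with bit j flipped) versus current gamma
        delta = s_in[j] - s_out[j] + prior_log_odds
        if gamma[j] == 1:
            delta = -delta
        # Metropolis acceptance with probability min(1, exp(delta))
        if delta >= 0 or rng.random() < math.exp(delta):
            gamma[j] = 1 - gamma[j]
        if t >= burn_in:
            for i in range(n_feats):
                counts[i] += gamma[i]
    kept = n_iter - burn_in
    return gamma, [c / kept for c in counts]

# Toy data: feature 0 separates the two classes, feature 1 does not.
y = [0] * 20 + [1] * 20
X = [[1, i % 2] for i in range(20)] + [[0, i % 2] for i in range(20)]
gamma, pip = metropolis_feature_search(X, y, num_classes=2)
print(gamma, pip)
```

Because each proposal flips one coordinate chosen uniformly at random, the proposal is symmetric and the acceptance probability reduces to min(1, posterior ratio). Under this simplified model the posterior factorizes over features, so the exact inclusion probability of feature j is simply the logistic function of `s_in[j] - s_out[j]` plus the prior log-odds; the sampler is therefore only illustrative here, and a search of this kind becomes worthwhile when the likelihood couples features together.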
Pages: 283-302
Number of pages: 20