An Efficient Feature Selection using Hidden Topic in Text Categorization

被引:10
作者
Zhang, Zhiwei [1 ]
Phan, Xuan-Hieu [1 ]
Horiguchi, Susumu [1 ]
机构
[1] Tohoku Univ, Grad Sch Informat Sci, Sendai, Miyagi 980, Japan
来源
2008 22ND INTERNATIONAL WORKSHOPS ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS, VOLS 1-3 | 2008年
关键词
D O I
10.1109/WAINA.2008.137
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Text categorization is an important research area in information retrieval. In order to save the storage space and get better accuracy, efficient and effective feature selection methods for reducing the data before analysis are highly desired Usual v, researches on feature selection use only a proper measurement such as information gain. In this paper, we propose a new feature selection method by adopting an attractive hidden topic analysis and entropy-based feature ranking. Experiments dealing with the well-known Reuters-21578 and Ohsumed datasets show that our method can achieve a better classification accuracy while reducing the feature dimension dramatically.
引用
收藏
页码:1223 / 1228
页数:6
相关论文
共 17 条
  • [1] An introduction to MCMC for machine learning
    Andrieu, C
    de Freitas, N
    Doucet, A
    Jordan, MI
    [J]. MACHINE LEARNING, 2003, 50 (1-2) : 5 - 43
  • [2] [Anonymous], 2005, PARAMETER ESTIMATION
  • [3] Berger AL, 1996, COMPUT LINGUIST, V22, P39
  • [4] BHATTACHARYA I, 2006, P SIAM SDM
  • [5] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) : 993 - 1022
  • [6] DEERWESTER S, 1990, J AM SOC INFORM SCI, V41, P391, DOI 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO
  • [7] 2-9
  • [8] STOCHASTIC RELAXATION, GIBBS DISTRIBUTIONS, AND THE BAYESIAN RESTORATION OF IMAGES
    GEMAN, S
    GEMAN, D
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1984, 6 (06) : 721 - 741
  • [9] Finding scientific topics
    Griffiths, TL
    Steyvers, M
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2004, 101 : 5228 - 5235
  • [10] Guyon I, 2003, J MACH LEARN RES, P1157, DOI [10.1016/j.aca.2011.07.027, DOI 10.1016/J.ACA.2011.07.027]