Divergence-Based Feature Selection for Naive Bayes Text Classification

Cited: 0
Authors
Wang, Huizhen [1 ]
Zhu, Jingbo [1 ]
Su, Keh-Yih [2 ]
Affiliations
[1] Northeastern Univ, Nat Language Proc Lab, Shenyang, Liaoning, Peoples R China
[2] Behav Design Corp, Hsinchu, Taiwan
Source
IEEE NLP-KE 2008: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING | 2008
Keywords
Divergence-based; feature selection; text classification; overall-divergence;
DOI
Not available
Chinese Library Classification (CLC)
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
A new divergence-based approach to feature selection for naive Bayes text classification is proposed in this paper. In this approach, the discrimination power of each feature is used directly to rank features through a criterion named overall-divergence, which is based on divergence measures evaluated between pairs of class density functions. Compared with other state-of-the-art algorithms (e.g., information gain (IG) and chi-square (CHI)), the proposed approach shows greater discrimination power for separating confusable classes, and achieves better or comparable performance on the evaluation data sets.
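The record does not reproduce the paper's exact formula, but the core idea of the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a symmetric KL divergence between Bernoulli class-conditional feature distributions, summed over all class pairs to form an overall-divergence score; the function name and data layout are hypothetical.

```python
import math
from itertools import combinations

def overall_divergence(class_feature_probs):
    """Rank features by summed pairwise symmetric KL divergence.

    class_feature_probs: dict mapping a class label to a list of
    P(feature_i | class) values (Bernoulli occurrence probabilities,
    assumed to be strictly between 0 and 1).
    Returns a list of (feature_index, score), highest score first.
    """
    classes = list(class_feature_probs)
    n_features = len(next(iter(class_feature_probs.values())))
    scores = []
    for i in range(n_features):
        score = 0.0
        # Accumulate divergence over every pair of class densities.
        for c1, c2 in combinations(classes, 2):
            p = class_feature_probs[c1][i]
            q = class_feature_probs[c2][i]
            # Symmetric KL divergence between two Bernoulli distributions.
            for a, b in ((p, q), (q, p)):
                score += a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))
        scores.append((i, score))
    return sorted(scores, key=lambda t: t[1], reverse=True)

# Toy usage: feature 0 separates the two classes (0.9 vs 0.1),
# feature 1 does not (0.5 vs 0.5), so feature 0 ranks first.
ranked = overall_divergence({"c1": [0.9, 0.5], "c2": [0.1, 0.5]})
```

A feature whose class-conditional distributions coincide contributes zero divergence for that class pair, so uninformative features fall to the bottom of the ranking, which matches the abstract's claim that discrimination power drives the feature ordering.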
Pages: 209 / +
Page count: 3