A Bayesian Classification Approach Using Class-Specific Features for Text Categorization

Cited by: 87
Authors
Tang, Bo [1]
He, Haibo [1]
Baggenstoss, Paul M. [2]
Kay, Steven [1]
Affiliations
[1] Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881 USA
[2] Fraunhofer FKIE, Fraunhoferstr 20, D-53343 Wachtberg, Germany
Funding
National Science Foundation (USA);
Keywords
Feature selection; text categorization; class-specific features; PDF projection and estimation; naive Bayes; dimension reduction; SELECTION;
DOI
10.1109/TKDE.2016.2522427
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present a Bayesian classification approach for automatic text categorization using class-specific features. Unlike conventional text categorization approaches, our proposed method selects a specific feature subset for each class. To apply these class-specific features for classification, we follow Baggenstoss's PDF Projection Theorem (PPT) to reconstruct the PDFs in the raw data space from the class-specific PDFs in the low-dimensional feature subspace, and build a Bayesian classification rule. A notable advantage of our approach is that most feature selection criteria, such as Information Gain (IG) and Maximum Discrimination (MD), can be easily incorporated into it. We evaluate our method's classification performance on several real-world benchmarks and compare it with state-of-the-art feature selection approaches. The superior results demonstrate the effectiveness of the proposed approach and indicate its wide potential for applications in data mining.
Pages: 1602-1606
Page count: 5