Text classification based on the TAN model

Cited by: 3
Authors
Shi, HB [1]
Wang, ZH [1]
Huang, HK [1]
Jing, LP [1]
Affiliations
[1] No Jiaotong Univ, Sch Comp & Informat Technol, Beijing 100044, Peoples R China
Source
2002 IEEE REGION 10 CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND POWER ENGINEERING, VOLS I-III, PROCEEDINGS | 2002
Keywords
text classification; TAN; naive Bayes; Bayesian network; feature selection;
DOI
10.1109/TENCON.2002.1181210
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper proposes a text classification method based on the TAN model. The naive Bayesian classifier is the most effective and popular text classification method, but its attribute independence assumption prevents it from expressing the dependence among text terms. TAN (Tree Augmented Naive Bayes) combines the simplicity of naive Bayes with the ability of a Bayesian network to express dependence among attributes. This paper reviews some existing text classification methods, introduces the TAN model, and applies it to text classification. The naive Bayesian and TAN classifiers are also compared experimentally. The experimental results show that the TAN classifier performs better.
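The abstract does not say how the tree over terms is built. As a minimal sketch only, the code below follows the standard TAN construction from Friedman, Geiger and Goldszmidt (1997): weight each pair of terms by its conditional mutual information given the class, then grow a maximum-weight spanning tree so that every term gets at most one other term as a parent. The function names, the binary term-presence encoding, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def conditional_mutual_information(docs, labels, i, j):
    """Estimate I(X_i; X_j | C) from term-presence vectors (maximum likelihood, no smoothing)."""
    n = len(docs)
    joint = Counter((d[i], d[j], c) for d, c in zip(docs, labels))
    xi_c = Counter((d[i], c) for d, c in zip(docs, labels))
    xj_c = Counter((d[j], c) for d, c in zip(docs, labels))
    c_count = Counter(labels)
    total = 0.0
    for (a, b, c), n_abc in joint.items():
        # P(a,b,c) * log( P(a,b|c) / (P(a|c) * P(b|c)) ), written with raw counts
        ratio = n_abc * c_count[c] / (xi_c[(a, c)] * xj_c[(b, c)])
        total += (n_abc / n) * math.log(ratio)
    return total


def tan_parents(docs, labels, root=0):
    """Return {term: parent term or None} for a maximum spanning tree grown from `root` (Prim's algorithm)."""
    n_terms = len(docs[0])
    weight = {}
    for i, j in combinations(range(n_terms), 2):
        w = conditional_mutual_information(docs, labels, i, j)
        weight[(i, j)] = weight[(j, i)] = w
    parent = {root: None}
    while len(parent) < n_terms:
        # Heaviest edge leading from a term already in the tree to one outside it
        u, v = max(
            ((u, v) for u in parent for v in range(n_terms) if v not in parent),
            key=lambda edge: weight[edge],
        )
        parent[v] = u
    return parent


# Toy data (hypothetical): terms 0 and 1 always co-occur, term 2 is unrelated.
docs = [(1, 1, 0), (1, 1, 1), (0, 0, 0), (0, 0, 1)] * 2
labels = ["a"] * 4 + ["b"] * 4
print(tan_parents(docs, labels))
```

In the resulting classifier each term is conditioned on the class and on its single tree parent, whereas naive Bayes conditions each term on the class alone. A practical version would also need smoothed probability estimates and some feature selection to keep the number of term pairs manageable.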
Pages: 43-46
Page count: 4
Related papers
12 items in total
[1] [Anonymous], COMP EVENT MODELS NA
[2] CHICKERING DM, 1996, AI STATV
[3] Dumais S., 1998, Proceedings of the 1998 ACM CIKM International Conference on Information and Knowledge Management, P148, DOI 10.1145/288627.288651
[4] EAMONN JK, 1999, LEARNING AUGMENTED B
[5] Bayesian network classifiers [J]. Friedman, N; Geiger, D; Goldszmidt, M. MACHINE LEARNING, 1997, 29 (2-3): 131-163
[6] Han E. -H. S., 2001, PAC AS C KNOWL DISC, P53, DOI 10.1007/3-540-45357-1_9
[7] Joachims T., 1998, Lecture Notes in Computer Science, P137, DOI 10.1007/BFB0026683
[8] LAM W, 1998, P 21 ANN INT ACM SIG, P81, DOI 10.1145/290941.290961
[9] LANGLEY P, 1992, AAAI-92 PROCEEDINGS : TENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, P223
[10] Pearl Judea., PROBABILISTIC REASON