On machine learning methods for Chinese document categorization

Cited by: 64
Authors
He, J
Tan, AH
Tan, CL
Affiliations
[1] Natl Univ Singapore, Sch Comp, Singapore 119260, Singapore
[2] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
Keywords
text categorization; machine learning; comparative experiments
DOI
10.1023/A:1023202221875
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper reports our comparative evaluation of three machine learning methods, namely k Nearest Neighbor (kNN), Support Vector Machines (SVM), and Adaptive Resonance Associative Map (ARAM), for Chinese document categorization. Based on two Chinese corpora, a series of controlled experiments evaluated their learning capabilities and efficiency in mining text classification knowledge. Benchmark experiments showed that their predictive performance was roughly comparable, especially on clean and well-organized data sets. While kNN and ARAM yielded better performance than SVM on small and clean data sets, SVM and ARAM significantly outperformed kNN on noisy data. In terms of efficiency, kNN was notably more costly in time and memory than the other two methods. SVM was highly efficient in learning from well-organized samples of moderate size, although on relatively large and noisy data the efficiency of SVM and ARAM was comparable.
Pages: 311-322
Page count: 12