A Term Weighting Scheme Approach for Vietnamese Text Classification

Cited by: 2
Authors
Vu Thanh Nguyen [1]
Nguyen Tri Hai [1]
Nguyen Hoang Nghia [1]
Tuan Dinh Le [2]
Affiliations
[1] Univ Informat Technol VNU HCM, Ho Chi Minh City, Vietnam
[2] Long An Univ Econ & Ind, Tan An City, Long An Provinc, Vietnam
Source
FUTURE DATA AND SECURITY ENGINEERING, FDSE 2015 | 2015 / Vol. 9446
Keywords
Term weighting scheme; Vietnamese text classification; tf.idf; tf.rf
DOI
10.1007/978-3-319-26135-5_4
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The term weighting scheme, which converts documents into vectors in the term space, is a vital step in automatic text categorization. Previous studies have shown that the choice of term weighting scheme has a dominant effect on classification performance. Term weighting has been studied extensively for English text classification, but few studies address Vietnamese text classification. In this paper, we propose a term weighting scheme called normalize(tf.rf(max)), which is based on tf.rf, one of the most effective term weighting schemes to date. We conducted experiments comparing the proposed normalize(tf.rf(max)) scheme with tf.rf and tf.idf on a Vietnamese text classification benchmark. The results show that the proposed scheme achieves about 3%-5% higher accuracy than the other term weighting schemes.
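For orientation, the two baseline schemes named in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes documents are already word-segmented into token lists, uses the common textbook forms tf * log(N / df) for tf.idf and tf * log2(2 + a / max(1, c)) for tf.rf, and the function names and toy tokens are hypothetical. The proposed normalize(tf.rf(max)) scheme is not defined in this record, so it is deliberately left out.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Weight each term by tf * ln(N / df). docs is a list of token lists."""
    n = len(docs)
    # df[t] = number of documents that contain term t at least once
    df = Counter(t for d in docs for t in set(d))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(d).items()}
            for d in docs]

def tf_rf(docs, labels, positive):
    """Weight each term by tf * log2(2 + a / max(1, c)) for one target category.

    a = documents of the positive category containing the term,
    c = documents of all other categories containing the term.
    """
    a, c = Counter(), Counter()
    for d, y in zip(docs, labels):
        (a if y == positive else c).update(set(d))
    rf = {t: math.log2(2 + a[t] / max(1, c[t])) for t in set(a) | set(c)}
    return [{t: tf * rf[t] for t, tf in Counter(d).items()} for d in docs]

# Toy corpus (assumed data, ASCII placeholders for segmented Vietnamese words)
docs = [["bong", "da", "bong"], ["kinh", "te"], ["bong", "kinh"]]
labels = ["sport", "econ", "econ"]
idf_weights = tf_idf(docs)
rf_weights = tf_rf(docs, labels, positive="sport")
```

Unlike tf.idf, the rf factor uses category labels, so a term concentrated in the positive category is weighted up relative to one spread evenly across categories.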
Pages: 46-53
Page count: 8