Text Classification Using SVM with Exponential Kernel

Cited by: 5
Authors
Chen, Junting [1]
Zhong, Jian [1]
Xie, Yicai [1]
Cai, Caiyun
Affiliation
[1] Gannan Normal Univ, Modern Educ Technol Ctr, Ganzhou 341000, Peoples R China
Source
COMPUTER AND INFORMATION TECHNOLOGY | 2014 / Vol. 519-520
Keywords
Text classification; exponential kernel; semantic similarity; support vector machine (SVM); kernel method;
DOI
10.4028/www.scientific.net/AMM.519-520.807
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text classification is challenging because text data are high-dimensional and sparse and because natural language has complex semantics. Documents are typically represented in a vector space using the "bag of words" (BoW) technique. Although easy to use, the BoW representation ignores semantic similarity between words. In this paper, we address this shortcoming by enriching the BoW representation with the exponential kernel, which models semantic similarity through a diffusion process on a graph defined by lexicon and co-occurrence information. Experimental evaluation on real data sets shows that, combined with a support vector machine (SVM), our approach achieves higher classification accuracy than the plain BoW approach.
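The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: this record does not give the exact construction. The sketch assumes the term graph G is a plain co-occurrence matrix with a zeroed diagonal, the semantic matrix is S = exp(lam * G) approximated by a truncated Taylor series, and the document kernel is X S X^T. The toy vocabulary, the documents, the decay factor `lam`, and all function names are hypothetical.

```python
# Toy sketch only: the graph construction and the decay factor are assumptions,
# not the construction used in the paper.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def exponential_kernel(g, lam=0.5, terms=30):
    """Approximate exp(lam * g) by summing the first `terms` Taylor terms."""
    n = len(g)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds (lam * g)^k / k!
    for k in range(1, terms + 1):
        term = [[lam * v / k for v in row] for row in matmul(term, g)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result

vocab = ["car", "automobile", "engine", "banana"]
docs = [  # BoW count vectors, one row per document
    [1, 0, 1, 0],  # "car engine"
    [0, 1, 1, 0],  # "automobile engine"
    [0, 0, 0, 1],  # "banana"
    [1, 0, 0, 0],  # "car"
    [0, 1, 0, 0],  # "automobile"
]

# Term graph: two terms are linked if they co-occur in a document.
cooc = matmul(transpose(docs), docs)
graph = [[0 if i == j else cooc[i][j] for j in range(len(vocab))]
         for i in range(len(vocab))]

semantic = exponential_kernel(graph)                     # term-by-term similarity
k_bow = matmul(docs, transpose(docs))                    # plain BoW linear kernel
k_sem = matmul(matmul(docs, semantic), transpose(docs))  # enriched kernel
```

In this toy corpus, "car" and "automobile" never appear in the same document, so the plain BoW kernel between the last two documents is zero. The diffusion through the shared neighbour "engine" gives them a positive similarity in `k_sem`, while "banana" stays unrelated. A Gram matrix of this kind could then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.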
Pages: 807 / +
Page count: 2