Combining Lexical and Semantic Features for Short Text Classification

Cited by: 40
Authors
Yang, Lili [1]
Li, Chunping [1]
Ding, Qiang [2]
Li, Li [2]
Affiliations
[1] Tsinghua Univ, Sch Software, Tsinghua Natl Lab Informat Sci & Technol, Beijing 100084, Peoples R China
[2] Huawei Technol Co LTD, Shannon Lab, Beijing 100095, Peoples R China
Source
17TH INTERNATIONAL CONFERENCE IN KNOWLEDGE BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS - KES2013 | 2013 / Vol. 22
Keywords
Short text; Topic model; Wikipedia; Feature selection
DOI
10.1016/j.procs.2013.09.083
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel approach to classifying short texts that combines their lexical and semantic features. We present an improved measure for lexical feature selection and obtain the semantic features from a background knowledge repository that covers the target category domains. Lexical and semantic features are combined by mapping words to topics with different weights, which reduces the dimensionality of the feature space to the number of topics. We use Wikipedia as background knowledge and a Support Vector Machine (SVM) as the classifier. Experimental results show that our approach is more effective than existing methods for classifying short texts. © 2013 The Authors. Published by Elsevier B.V.
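The abstract's idea of mapping words to topics with different weights can be illustrated with a minimal sketch. It is not the authors' implementation: the `WORD_TOPIC` table, the `LEXICAL_WEIGHT` scores, the two-topic setup, and the function name `topic_features` are all hypothetical stand-ins. In the paper the word-to-topic weights would presumably come from a topic model built on Wikipedia and the lexical scores from the proposed feature selection measure.

```python
from collections import Counter

# Hypothetical word-to-topic weights (assumed values, e.g. as might be
# estimated by a topic model trained on a background corpus).
WORD_TOPIC = {
    "goal":   [0.9, 0.1],
    "match":  [0.7, 0.3],
    "stock":  [0.1, 0.9],
    "market": [0.2, 0.8],
}

# Hypothetical lexical feature-selection scores per word (assumed values).
LEXICAL_WEIGHT = {"goal": 1.0, "match": 0.5, "stock": 1.0, "market": 0.8}


def topic_features(text, num_topics=2):
    """Map a short text to a vector with one entry per topic."""
    counts = Counter(text.lower().split())
    vec = [0.0] * num_topics
    for word, tf in counts.items():
        if word not in WORD_TOPIC:
            continue  # words without a topic mapping contribute nothing
        lexical = LEXICAL_WEIGHT.get(word, 0.0)
        for k, semantic in enumerate(WORD_TOPIC[word]):
            # Combine term frequency, lexical score, and topic weight.
            vec[k] += tf * lexical * semantic
    total = sum(vec)
    # Normalize so the entries sum to 1 when any known word was seen.
    return [v / total for v in vec] if total > 0 else vec


features = topic_features("goal in the match")
print(len(features), features)
```

Whatever the vocabulary size, the resulting vector has as many entries as there are topics, which is the dimensionality reduction the abstract describes. Vectors of this kind could then be passed to an SVM implementation for training and prediction.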
Pages: 78-86
Page count: 9