SBTM: Topic Modeling over Short Texts

Times cited: 7
Authors
Pang, Jianhui [1]
Li, Xiangsheng [1]
Xie, Haoran [2]
Rao, Yanghui [1]
Affiliations
[1] Sun Yat Sen Univ, Guangzhou, Guangdong, Peoples R China
[2] Caritas Inst Higher Educ, Hong Kong, Hong Kong, Peoples R China
Source
DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2016 | 2016 / Vol. 9645
Keywords
Short text classification; Sentiment detection; Topic-based similarity; Biterm topic model; COMMUNITY; SEARCH
DOI
10.1007/978-3-319-32055-7_4
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
With the rapid development of social media services such as Twitter and Sina Weibo, short texts are becoming increasingly prevalent. However, inferring topics from short texts remains challenging for many content analysis tasks because word co-occurrence patterns in short texts are sparse. In this paper, we propose a classification model named the sentimental biterm topic model (SBTM), which is applied to sentiment classification over short texts. To alleviate the sparsity problem, the similarity between words and documents is first estimated by singular value decomposition (SVD). Then, the most similar words are added to each short document in the corpus. Extensive evaluations on sentiment detection over short texts validate the effectiveness of the proposed method.
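The abstract describes the sparsity fix only at a high level. The Python sketch below illustrates one plausible reading of it, in the spirit of latent semantic analysis [7]: build a term-document count matrix, take a rank-k truncated SVD, and append to each document the words closest to it in the latent space. The abstract does not give these details, so the raw-count weighting, the square-root singular-value scaling, the use of cosine similarity, the exclusion of words already present, and the hypothetical names `augment_short_texts`, `k`, and `n_add` are all assumptions, not the SBTM authors' exact procedure.

```python
import numpy as np


def augment_short_texts(docs, k=2, n_add=2):
    """Append to each short document the n_add words most similar to it.

    Hypothetical sketch (assumptions, not the paper's exact method):
    raw term-document counts, rank-k truncated SVD, cosine similarity
    between word and document vectors in the latent space.
    """
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-document matrix A (|V| x |D|) of raw counts (assumed weighting).
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc:
            A[index[w], j] += 1

    # Truncated SVD: A ~= U_k S_k V_k^T. Scaling both sides by sqrt(S_k)
    # makes word.doc dot products reproduce the rank-k approximation of A.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    sqrt_s = np.sqrt(s[:k])
    word_vecs = U[:, :k] * sqrt_s      # one row per word
    doc_vecs = Vt[:k, :].T * sqrt_s    # one row per document

    def normalize(M):
        # Row-normalize, leaving all-zero rows untouched.
        norms = np.linalg.norm(M, axis=1, keepdims=True)
        return M / np.where(norms == 0, 1, norms)

    # Cosine similarity between every document and every word: |D| x |V|.
    sim = normalize(doc_vecs) @ normalize(word_vecs).T

    augmented = []
    for j, doc in enumerate(docs):
        present = set(doc)
        ranked = np.argsort(-sim[j])   # most similar words first
        extra = [vocab[i] for i in ranked if vocab[i] not in present][:n_add]
        augmented.append(list(doc) + extra)
    return augmented
```

For example, `augment_short_texts([["happy", "sunny", "day"], ["sad", "rainy", "day"], ["happy", "holiday"]], k=2, n_add=1)` returns each document with one latent-space neighbour appended; presumably the enriched corpus is what the topic model is then trained on. In practice `k` and `n_add` would need tuning, and a weighting such as TF-IDF may behave differently from raw counts.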
Pages: 43-56
Number of pages: 14
References
26 in total
[1] [Anonymous], 2007, SEMEVAL2007
[2] [Anonymous], 2006, Proceedings of the 15th international conference on World Wide Web
[3] Banerjee Somnath, 2007, 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, P787, DOI 10.1145/1277741.1277909
[4] Mining Social Emotions from Affective Text [J]. Bao, Shenghua; Xu, Shengliang; Zhang, Li; Yan, Rong; Su, Zhong; Han, Dingyi; Yu, Yong. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (09): 1658-1670
[5] Latent Dirichlet allocation [J]. Blei, DM; Ng, AY; Jordan, MI. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5): 993-1022
[6] BTM: Topic Modeling over Short Texts [J]. Cheng, Xueqi; Yan, Xiaohui; Lan, Yanyan; Guo, Jiafeng. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (12): 2928-2941
[7] DEERWESTER S, 1990, J AM SOC INFORM SCI, V41, P391, DOI 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
[9] Frame-Based Detection of Opinion Holders and Topics: A Model and a Tool [J]. Gangemi, Aldo; Presutti, Valentina; Recupero, Diego Reforgiato. IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE, 2014, 9 (01): 20-30
[10] STOCHASTIC RELAXATION, GIBBS DISTRIBUTIONS, AND THE BAYESIAN RESTORATION OF IMAGES [J]. GEMAN, S; GEMAN, D. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1984, 6 (06): 721-741