Exploiting Global Semantic Similarity Biterms for Short-text Topic Discovery

Cited by: 3
Authors
Lu, Heng-yang [1]
Ge, Gao-jian [1]
Li, Yun [1]
Wang, Chong-jun [1]
Xie, Jun-yuan [1]
Affiliations
[1] Nanjing Univ, Dept Comp Sci & Technol, Natl Key Lab Novel Software Technol, Nanjing, Jiangsu, Peoples R China
Source
2018 IEEE 30TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI) | 2018
Funding
National Natural Science Foundation of China
Keywords
topic model; short texts; semantic similarity; word embeddings; global biterms;
DOI
10.1109/ICTAI.2018.00151
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The demand for mining massive short-text data from the Internet has driven research on topic models. Many schemes try to solve the sparsity problem caused by short texts, mainly through data aggregation or model improvement. Among them, the Biterm Topic Model changes the way topics are modeled by working on document-level biterms, and it has proved both creative and effective. However, it may ignore word pairs that are semantically similar but rarely co-occur, which this paper calls global biterms. Inspired by the successful application of word embeddings in GPU-DMM, we exploit word embeddings to extract semantically similar word pairs from the whole corpus to help discover better topics. We call this model GloSS; it takes advantage of both the biterm approach to topic modeling and word embeddings. Experiments on two open-source, real-world datasets show that GloSS outperforms state-of-the-art topic models for short texts.
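The idea of a global biterm can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the selection rule, so the cosine-similarity threshold, the function names, and the toy two-dimensional embeddings below are all assumptions, and "rarely co-occurrent" is simplified to "never co-occurring in the same document".

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def document_biterms(docs):
    """Unordered word pairs that co-occur inside at least one short document."""
    pairs = set()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs.add((a, b))
    return pairs


def global_biterms(docs, embeddings, threshold=0.9):
    """Hypothetical selection rule: word pairs from the whole vocabulary that
    are similar in embedding space but never share a document.
    The threshold value is an assumption, not taken from the paper."""
    vocab = sorted({w for doc in docs for w in doc if w in embeddings})
    local = document_biterms(docs)
    return [
        (a, b)
        for a, b in combinations(vocab, 2)
        if (a, b) not in local
        and cosine(embeddings[a], embeddings[b]) >= threshold
    ]


# Toy corpus and made-up 2-d embeddings, for illustration only.
toy_docs = [["apple", "phone"], ["iphone", "screen"], ["banana", "fruit"]]
toy_embeddings = {
    "apple": [1.0, 0.1],
    "iphone": [0.9, 0.2],
    "phone": [0.8, 0.3],
    "screen": [0.7, 0.6],
    "banana": [0.0, 1.0],
    "fruit": [0.1, 1.0],
}
result = global_biterms(toy_docs, toy_embeddings)
```

In this toy setup, "apple" and "iphone" never appear in the same document, so a purely document-level biterm model would not pair them, while the embedding-based pass does. With a real corpus, pretrained embeddings would replace the toy vectors, and the all-pairs loop would need a more scalable nearest-neighbour search.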
Pages: 975-982
Page count: 8
Related papers
29 in total
[1]  
[Anonymous], 2008, Polya Urn Models
[2]  
[Anonymous], 2005, Advances in neural information processing systems
[3]  
[Anonymous], 2010, P 3 ACM INT C WEB SE, DOI 10.1145/1718487.1718520
[4]  
[Anonymous], 2014, P 18 C EMP METH NAT, DOI 10.3115/V1/D14-1082
[5]  
[Anonymous], 2010, INTERSPEECH, DOI 10.1016/J.CSL.2010.08.008
[6]  
Baroni M, 2014, PROCEEDINGS OF THE 52ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, P238
[7]  
Bengio Y., 2008, Scholarpedia, V3, P3881, DOI 10.4249/scholarpedia.3881
[8]   Latent Dirichlet allocation [J].
Blei, DM ;
Ng, AY ;
Jordan, MI .
JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) :993-1022
[9]   A Semantic Graph based Topic Model for Question Retrieval in Community Question Answering [J].
Chen, Long ;
Jose, Joemon M. ;
Yu, Haitao ;
Yuan, Fajie ;
Zhang, Dell .
PROCEEDINGS OF THE NINTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM'16), 2016, :287-296
[10]  
Dent K. D., 2011, Proceedings of the AAAI 2011 Workshop on Analyzing Microtext, P8