Relational Biterm Topic Model: Short-Text Topic Modeling using Word Embeddings

Cited by: 26
Authors
Li, Ximing [1, 2]
Zhang, Ang [1, 2]
Li, Changchun [1, 2]
Guo, Lantian [3]
Wang, Wenting [2, 4]
Ouyang, Jihong [1, 2]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun, Jilin, Peoples R China
[2] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun, Jilin, Peoples R China
[3] Northwestern Polytech Univ, Sch Automat, Xian, Shaanxi, Peoples R China
[4] Jilin Univ, Coll Math, Changchun, Jilin, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
short text; topic modeling; word embeddings; clustering; text similarity;
DOI
10.1093/comjnl/bxy037
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Short texts, such as Twitter posts, have become increasingly common on the Internet. Inferring topics from massive numbers of short texts is important to many real-world applications. A single short text often contains only a few words, which makes traditional topic models less effective. The recently developed biterm topic model (BTM) models short texts effectively by capturing rich global word co-occurrence information. However, in the sparse short-text setting, many highly related words may never co-occur, so BTM may miss coherent and prominent word co-occurrence patterns that are not observed in the corpus. To address this problem, we propose a novel relational BTM (R-BTM), which links short texts using a similarity list of words computed from word embeddings. To evaluate the effectiveness of R-BTM, we compare it against existing short-text topic models on several standard tasks, including topic quality, clustering and text similarity. Experimental results on real-world datasets indicate that R-BTM outperforms baseline topic models for short texts.
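The two ingredients named in the abstract, biterms and an embedding-based similarity list of words, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `biterms` and `similarity_list`, the use of cosine similarity, the cutoff `k` and the toy two-dimensional vectors are all assumptions made for illustration, and the abstract does not specify how R-BTM builds or uses its similarity list.

```python
from itertools import combinations
import math


def biterms(tokens):
    """Return every unordered word pair (biterm) from one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_list(word, embeddings, k=1):
    """Return the k words closest to `word` (hypothetical helper; cosine is an assumption)."""
    scores = [
        (other, cosine(embeddings[word], vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return [other for other, _ in scores[:k]]


# Toy embeddings, invented for this example only.
toy_embeddings = {
    "apple": [1.0, 0.0],
    "iphone": [0.9, 0.1],
    "banana": [0.0, 1.0],
    "fruit": [0.1, 0.9],
}

doc_biterms = biterms(["new", "iphone", "release"])
nearest_to_apple = similarity_list("apple", toy_embeddings, k=1)
```

With these toy vectors, "apple" and "iphone" never need to co-occur in a text to be treated as related, which is the kind of unobserved relationship the abstract says plain BTM can miss.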
Pages: 359-372
Page count: 14