A CWTM Model of Topic Extraction for Short Text

Cited by: 2

Authors:

Diao, Yunlan [1]

Du, Yajun [1]

Xiao, Pan [1]

Liu, Jia [1]

Affiliation:

[1] Xihua University, School of Computer and Software Engineering, Chengdu 610039, People's Republic of China

Source:

KNOWLEDGE GRAPH AND SEMANTIC COMPUTING: LANGUAGE, KNOWLEDGE, AND INTELLIGENCE, CCKS 2017 | 2017 / Vol. 784

Keywords:

Topic model; Short texts; Couple word

DOI:

10.1007/978-981-10-7359-5_9

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Topic models are designed to discover latent topics in massive micro-blog data. Extracting latent topics supports subsequent analysis, but the particular nature of this data means that traditional topic model algorithms cannot be applied to it directly. Although traditional text topic mining has been widely studied in data mining, short texts such as micro-blog posts have distinctive characteristics, including network language and newly emerging words. Owing to the short message length, data sparsity, and incomplete descriptions, topics in micro-blog posts cannot be extracted efficiently. In this paper, we propose a simple, fast, and effective topic model for short texts, named the couple-word topic model (CWTM). Based on the Dirichlet Multinomial Mixture (DMM) model, it leverages couple-word co-occurrence, rather than traditional word co-occurrence, to distill better topics from short texts. The method alleviates the data sparseness problem, improves model performance, and uses the Gibbs sampling algorithm to infer parameters. Through extensive experiments on two real-world short text collections, we find that CWTM achieves comparable or better topic representations than traditional topic models.
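The abstract does not spell out how couple words are formed or how the sampler is derived, so the following is only a rough sketch of the general idea, not the authors' CWTM. It assumes that a "couple word" is an unordered pair of distinct words co-occurring in the same short text, and it runs a standard collapsed Gibbs sampler for a DMM (one topic per document) over those pairs. The function names `couple_words` and `dmm_gibbs`, the hyperparameter defaults, and the pair encoding are all illustrative assumptions.

```python
import math
import random
from collections import Counter
from itertools import combinations


def couple_words(tokens):
    """Assumed definition: unordered pairs of distinct words in one short text."""
    return [a + "_" + b for a, b in combinations(sorted(set(tokens)), 2)]


def dmm_gibbs(docs, num_topics, alpha=0.1, beta=0.1, iters=30, seed=0):
    """Collapsed Gibbs sampling for a DMM: each document gets a single topic.

    `docs` is a list of lists of units (here, couple words).
    Returns the list of sampled topic labels, one per document.
    """
    rng = random.Random(seed)
    vocab_size = len({u for d in docs for u in d})
    doc_counts = [Counter(d) for d in docs]
    z = [rng.randrange(num_topics) for _ in docs]   # random initial labels
    m = [0] * num_topics                            # documents per topic
    n = [0] * num_topics                            # units per topic
    nw = [Counter() for _ in range(num_topics)]     # per-topic unit counts
    for d, k in enumerate(z):
        m[k] += 1
        n[k] += len(docs[d])
        nw[k].update(doc_counts[d])

    for _ in range(iters):
        for d, doc in enumerate(docs):
            # Remove document d from its current topic.
            k_old = z[d]
            m[k_old] -= 1
            n[k_old] -= len(doc)
            nw[k_old].subtract(doc_counts[d])

            # Log of the unnormalised conditional for each topic.
            log_p = []
            for k in range(num_topics):
                lp = math.log(m[k] + alpha)
                for u, c in doc_counts[d].items():
                    for j in range(c):
                        lp += math.log(nw[k][u] + beta + j)
                for i in range(len(doc)):
                    lp -= math.log(n[k] + vocab_size * beta + i)
                log_p.append(lp)

            # Sample a new topic and add the document back.
            top = max(log_p)
            weights = [math.exp(lp - top) for lp in log_p]
            k_new = rng.choices(range(num_topics), weights=weights)[0]
            z[d] = k_new
            m[k_new] += 1
            n[k_new] += len(doc)
            nw[k_new].update(doc_counts[d])
    return z


texts = [
    ["topic", "model", "short", "text"],
    ["short", "text", "topic"],
    ["football", "match", "goal"],
    ["goal", "match", "score"],
]
pair_docs = [couple_words(t) for t in texts]
labels = dmm_gibbs(pair_docs, num_topics=2)
```

Replacing single words with pairs gives each short text more units that carry co-occurrence information, which is one plausible reading of how such a model could ease sparsity. The toy corpus above is made up for illustration and says nothing about the paper's datasets or results.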

Citation

Pages: 80-91

Page count: 12