A CWTM Model of Topic Extraction for Short Text

Cited by: 2
Authors
Diao, Yunlan [1 ]
Du, Yajun [1 ]
Xiao, Pan [1 ]
Liu, Jia [1 ]
Affiliations
[1] Xihua Univ, Sch Comp & Software Engn, Chengdu 610039, Peoples R China
Source
KNOWLEDGE GRAPH AND SEMANTIC COMPUTING: LANGUAGE, KNOWLEDGE, AND INTELLIGENCE, CCKS 2017 | 2017, Vol. 784
Keywords
Topic model; Short texts; Couple word
DOI
10.1007/978-981-10-7359-5_9
CLC Classification Number
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Topic models are designed to find latent topics in massive micro-blog data. On the one hand, extracting latent topics supports subsequent analysis; on the other hand, because of the peculiarities of such data, it cannot be handled directly by traditional topic model algorithms. Although topic mining over conventional documents has been widely studied in data mining, short texts such as micro-blogs exhibit the distinctive characteristics of network language and newly emerging words. Owing to their brevity, data sparsity, and incomplete descriptions, topics cannot be extracted from micro-blogs efficiently. In this paper, we propose a simple, fast, and effective topic model for short texts, named the couple-word topic model (CWTM). Built on the Dirichlet Multinomial Mixture (DMM) model, it leverages couple-word co-occurrence, rather than conventional single-word co-occurrence, to distill better topics from short texts. The method alleviates the data sparsity problem, improves model performance, and uses Gibbs sampling to derive the parameters. Extensive experiments on two real-world short-text collections show that CWTM achieves comparable or better topic representations than traditional topic models.
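The abstract does not give the model's sampling equations, so the following is only a rough illustrative sketch, assuming that a "couple word" is an unordered pair of words co-occurring in the same short text and that topics are assigned per document in the DMM style via a simplified collapsed Gibbs conditional. The function names, priors (alpha, beta), and the update below are assumptions made for this example, not the published CWTM derivation.

# Illustrative sketch only: a DMM-style collapsed Gibbs sampler whose topic
# assignments are scored with within-text word pairs ("couple words").
# Names, priors, and the simplified conditional are assumptions made for this
# example; they are not the published CWTM derivation.
import math
import random
from itertools import combinations


def couple_words(tokens):
    # Unordered pairs of distinct words co-occurring in one short text.
    return list(combinations(sorted(set(tokens)), 2))


def cwtm_like_gibbs(docs, num_topics=10, alpha=0.1, beta=0.01, iters=50, seed=0):
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w2id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # One topic per document (the DMM assumption); each couple word contributes
    # both of its member words to that topic's word counts.
    doc_pairs = [[(w2id[a], w2id[b]) for a, b in couple_words(doc)] for doc in docs]
    z = [rng.randrange(num_topics) for _ in docs]
    n_k = [0] * num_topics                        # documents assigned to topic k
    n_kw = [[0] * V for _ in range(num_topics)]   # word counts under topic k
    n_kt = [0] * num_topics                       # total word slots under topic k

    def move(d, k, delta):
        n_k[k] += delta
        for a, b in doc_pairs[d]:
            n_kw[k][a] += delta
            n_kw[k][b] += delta
            n_kt[k] += 2 * delta

    for d in range(len(docs)):
        move(d, z[d], +1)

    for _ in range(iters):
        for d in range(len(docs)):
            move(d, z[d], -1)                     # take document d out of its topic
            logw = []
            for k in range(num_topics):
                lp = math.log(n_k[k] + alpha)     # topic popularity (Dirichlet prior)
                denom = n_kt[k] + V * beta
                for a, b in doc_pairs[d]:         # likelihood of d's couple words
                    lp += math.log((n_kw[k][a] + beta) / denom)
                    lp += math.log((n_kw[k][b] + beta) / denom)
                logw.append(lp)
            m = max(logw)                         # log-space keeps weights from underflowing
            z[d] = rng.choices(range(num_topics),
                               weights=[math.exp(v - m) for v in logw])[0]
            move(d, z[d], +1)

    # Top words per topic, recovered from the pair-based counts.
    top = {k: [vocab[i] for i in sorted(range(V), key=lambda i, k=k: n_kw[k][i], reverse=True)[:5]]
           for k in range(num_topics)}
    return z, top

For example, cwtm_like_gibbs([["iphone", "apple", "release"], ["nba", "game", "score"], ["apple", "stock", "price"]], num_topics=2) returns one topic label per short text plus the top words of each topic; pooling couple-word counts across texts is what supplies the extra co-occurrence evidence that individual sparse texts lack.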
Pages: 80-91
Page count: 12