Short-Text Topic Modeling via Non-negative Matrix Factorization Enriched with Local Word-Context Correlations

Cited by: 120
Authors
Shi, Tian [1]
Kang, Kyeongpil [2]
Choo, Jaegul [2]
Reddy, Chandan K. [1]
Affiliations
[1] Virginia Tech, Blacksburg, VA USA
[2] Korea Univ, Seoul, South Korea
Source
WEB CONFERENCE 2018: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW2018) | 2018
Funding
National Science Foundation (USA); National Research Foundation, Singapore;
Keywords
Topic modeling; short texts; non-negative matrix factorization; word embedding;
DOI
10.1145/3178876.3186009
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Short texts are a prevalent form of social communication on the Internet, and billions of them are generated every day. Discovering knowledge from them has attracted considerable interest from both industry and academia. Because short texts carry limited contextual information and are sparse, noisy, and ambiguous, automatically learning topics from them remains an important challenge. To tackle this problem, we propose a semantics-assisted non-negative matrix factorization (SeaNMF) model for discovering topics in short texts. It incorporates word-context semantic correlations into the model, where the semantic relationships between words and their contexts are learned from the skip-gram view of the corpus. The SeaNMF model is solved using a block coordinate descent algorithm. We also develop a sparse variant of SeaNMF that achieves better model interpretability. Extensive quantitative evaluations on various real-world short-text datasets show that the proposed models outperform several other state-of-the-art methods in terms of topic coherence and classification accuracy. A qualitative semantic analysis demonstrates the interpretability of our models, which discover meaningful and consistent topics. With its simple formulation and superior performance, SeaNMF can serve as an effective standard topic model for short texts.
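The idea in the abstract of factorizing a term-document matrix together with a word-context correlation matrix can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy corpus, treats each whole short text as the context window, uses positive pointwise mutual information (PPMI) as a stand-in for the skip-gram-derived correlations, and uses multiplicative updates in place of the block coordinate descent solver named in the abstract. The names `ppmi`, `objective`, `joint_nmf`, and the weight `alpha` are hypothetical.

```python
import numpy as np
from itertools import permutations

# Toy corpus of short texts (illustrative only).
docs = [
    "cheap flights paris hotel",
    "hotel deals paris",
    "football match score",
    "match score goal football",
]
tokens = [d.split() for d in docs]
vocab = sorted({w for t in tokens for w in t})
idx = {w: i for i, w in enumerate(vocab)}

# A: word-by-document counts; cooc: word-by-context counts,
# assuming the whole short text is the context window.
A = np.zeros((len(vocab), len(docs)))
cooc = np.zeros((len(vocab), len(vocab)))
for j, t in enumerate(tokens):
    for w in t:
        A[idx[w], j] += 1
    for w, c in permutations(t, 2):
        cooc[idx[w], idx[c]] += 1

def ppmi(cooc, eps=1e-12):
    """Positive PMI of a co-occurrence count matrix (assumed correlation measure)."""
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)
    col = cooc.sum(axis=0, keepdims=True)
    pmi = np.log((cooc * total + eps) / (row * col + eps))
    return np.maximum(pmi, 0.0)

def objective(A, S, W, H, C, alpha):
    """||A - W H^T||_F^2 + alpha * ||S - W C^T||_F^2."""
    return (np.linalg.norm(A - W @ H.T) ** 2
            + alpha * np.linalg.norm(S - W @ C.T) ** 2)

def joint_nmf(A, S, k=2, alpha=1.0, iters=200, seed=0, eps=1e-9):
    """Jointly factorize A and S with a shared non-negative word factor W.

    Uses multiplicative updates (a swapped-in, simpler solver).
    """
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k))  # word-topic factor (shared)
    H = rng.random((A.shape[1], k))  # document-topic factor
    C = rng.random((S.shape[0], k))  # context-topic factor
    losses = [objective(A, S, W, H, C, alpha)]
    for _ in range(iters):
        H *= (A.T @ W) / (H @ (W.T @ W) + eps)
        C *= (S.T @ W) / (C @ (W.T @ W) + eps)
        W *= (A @ H + alpha * S @ C) / (W @ (H.T @ H + alpha * C.T @ C) + eps)
        losses.append(objective(A, S, W, H, C, alpha))
    return W, H, C, losses

S = ppmi(cooc)
W, H, C, losses = joint_nmf(A, S, k=2, alpha=1.0)

# Show the highest-weighted words of each topic.
for t in range(W.shape[1]):
    top = [vocab[i] for i in np.argsort(-W[:, t])[:3]]
    print(f"topic {t}: {top}")
```

In this sketch the shared factor `W` is what ties the two matrices together: words that co-occur in the same contexts are pushed toward the same topic even when the term-document matrix alone is very sparse.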
Pages: 1105-1114
Number of pages: 10
Related papers
31 in total
  • [1] [Anonymous], 2012, INT C DATA MINING SO
  • [2] [Anonymous], 2015, J GLOBAL OPTIM, DOI 10.1007/s10898-014-0247-2
  • [3] [Anonymous], 2009, NONNEGATIVE MATRIX T
  • [4] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) : 993 - 1022
  • [5] Weakly supervised nonnegative matrix factorization for user-driven clustering
    Choo, Jaegul
    Lee, Changhyun
    Reddy, Chandan K.
    Park, Haesun
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2015, 29 (06) : 1598 - 1621
  • [6] UTOPIAN: User-Driven Topic Modeling Based on Interactive Nonnegative Matrix Factorization
    Choo, Jaegul
    Lee, Changhyun
    Reddy, Chandan K.
    Park, Haesun
    [J]. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2013, 19 (12) : 1992 - 2001
  • [7] DEERWESTER S, 1990, J AM SOC INFORM SCI, V41, P391, DOI 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
  • [9] Fan RE, 2008, J MACH LEARN RES, V9, P1871
  • [10] Probabilistic latent semantic indexing
    Hofmann, T
    [J]. SIGIR'99: PROCEEDINGS OF 22ND INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 1999, : 50 - 57