L-EnsNMF: Boosted Local Topic Discovery via Ensemble of Nonnegative Matrix Factorization

Cited by: 0
Authors
Suh, Sangho [1]
Choo, Jaegul [1]
Lee, Joonseok [2]
Reddy, Chandan K. [3]
Affiliations
[1] Korea Univ, Seoul, South Korea
[2] Google Res, Mountain View, CA USA
[3] Virginia Tech, Arlington, VA USA
Source
2016 IEEE 16th International Conference on Data Mining (ICDM), 2016
Funding
National Science Foundation (USA); National Research Foundation, Singapore
Keywords
Topic modeling; ensemble learning; matrix factorization; gradient boosting; local weighting; constrained least-squares
DOI
10.1109/ICDM.2016.108
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Nonnegative matrix factorization (NMF) has been widely applied in many domains. In document analysis, it is increasingly used for topic modeling, where a low-rank factor matrix from NMF reveals a set of underlying topics. However, the resulting topics often capture only general themes in the data and therefore convey little specific information. To tackle this problem, we propose a novel ensemble model of nonnegative matrix factorization for discovering high-quality local topics. Our method brings the idea of an ensemble model, which has been successful in supervised learning, to an unsupervised topic modeling context. That is, our model successively performs NMF on a residual matrix obtained from previous stages and generates a sequence of topic sets. Our algorithm for updating the input matrix is novel in two respects. The first is its use of the residual matrix, inspired by a state-of-the-art gradient boosting model; the second is a sophisticated local weighting scheme applied to the given matrix to enhance the locality of topics, which in turn delivers high-quality, focused topics of interest to users. We evaluate the proposed method against other topic modeling methods, including a few variants of NMF and latent Dirichlet allocation, using evaluation measures of topic coherence, diversity, coverage, and computing time, among others. We also present a qualitative evaluation of the topics discovered by our method on several real-world data sets.
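The stage-wise idea in the abstract (run NMF, subtract what it explained, run NMF again on the residual) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain multiplicative-update NMF as the base solver, assumes the residual is clipped at zero so it stays nonnegative, and omits the local weighting scheme entirely, since the abstract does not specify it. The function names `nmf` and `ensemble_nmf` and the random toy matrix are hypothetical.

```python
import numpy as np


def nmf(X, k, n_iter=200, eps=1e-10, seed=0):
    """Rank-k NMF, X ~ W @ H, using standard multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def ensemble_nmf(X, k, n_stages, n_iter=200):
    """Run NMF stage by stage, each time on the residual of earlier stages."""
    R = np.asarray(X, dtype=float).copy()
    stages = []
    for stage in range(n_stages):
        W, H = nmf(R, k, n_iter=n_iter, seed=stage)
        stages.append((W, H))
        # Assumed residual update: remove the part this stage explained and
        # clip negatives so the next stage still sees a nonnegative matrix.
        R = np.maximum(R - W @ H, 0.0)
    return stages, R


# Toy term-document matrix: 30 terms (rows) by 20 documents (columns).
X = np.random.default_rng(42).random((30, 20))
stages, residual = ensemble_nmf(X, k=3, n_stages=4)

for i, (W, H) in enumerate(stages):
    # Each column of W is one topic; list its three highest-weighted terms.
    top_terms = np.argsort(W, axis=0)[::-1][:3].T
    print(f"stage {i}: top term indices per topic = {top_terms.tolist()}")
print("remaining residual norm:", np.linalg.norm(residual))
```

Each stage yields its own set of `k` topics, so four stages with `k=3` give twelve topics in total. Because every stage only sees what earlier stages left unexplained, later topic sets tend to pick up structure that a single NMF run would fold into broad topics; the weighting step that the abstract credits for locality would be applied to the residual before each stage.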
Pages: 479-488
Page count: 10