Localized user-driven topic discovery via boosted ensemble of nonnegative matrix factorization

Cited by: 5
Authors
Suh, Sangho [1]
Shin, Sungbok [2]
Lee, Joonseok [3]
Reddy, Chandan K. [4]
Choo, Jaegul [2]
Affiliations
[1] Univ Waterloo, David R Cheriton Sch Comp Sci, Waterloo, ON, Canada
[2] Korea Univ, Dept Comp Sci & Engn, Seoul, South Korea
[3] Google Res, Machine Percept, Mountain View, CA USA
[4] Virginia Tech, Dept Comp Sci, Arlington, VA USA
Funding
National Research Foundation of Singapore; US National Science Foundation
Keywords
Topic modeling; Ensemble learning; Matrix factorization; Gradient boosting; Local weighting; CONSTRAINED LEAST-SQUARES; ALGORITHMS;
DOI
10.1007/s10115-017-1147-9
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Nonnegative matrix factorization (NMF) has been widely used in topic modeling of large-scale document corpora, where a set of underlying topics is extracted as a low-rank factor matrix from NMF. However, the resulting topics often convey only general, and thus redundant, information about the documents rather than information that may be minor but potentially meaningful to users. To address this problem, we present a novel ensemble method based on nonnegative matrix factorization that discovers meaningful local topics. Our method brings the idea of an ensemble model, which has shown advantages in supervised learning, into an unsupervised topic modeling context. That is, our model successively performs NMF on a residual matrix obtained from previous stages and generates a sequence of topic sets. The update algorithm we employ is novel in two respects. The first lies in utilizing the residual matrix, inspired by a state-of-the-art gradient boosting model; the second stems from applying a sophisticated local weighting scheme on the given matrix to enhance the locality of topics, which in turn delivers high-quality, focused topics of interest to users. We subsequently extend this ensemble model by adding keyword- and document-based user interaction to introduce user-driven topic discovery.
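The stage-wise idea in the abstract (run NMF, subtract what it explained, run NMF again on the residual) can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses plain Lee-Seung multiplicative updates, omits the local weighting scheme and the user interaction entirely, and clips the residual at zero as an assumption made here to keep each stage's input nonnegative. The function names `nmf` and `residual_ensemble_nmf`, and the convention that rows are terms and columns are documents, are hypothetical choices for illustration.

```python
import numpy as np


def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF, X ~ W @ H, via Lee-Seung multiplicative updates.

    Illustrative helper only; the paper's own update rule is not reproduced here.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps  # term-by-topic factor (assumed layout)
    H = rng.random((k, n)) + eps  # topic-by-document factor (assumed layout)
    for _ in range(n_iter):
        # Multiplicative updates keep both factors nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def residual_ensemble_nmf(X, k, n_stages, n_iter=200):
    """Run NMF stage by stage, each time on the residual of the previous stages.

    Returns the list of (W, H) pairs, one topic set per stage, and the final residual.
    """
    R = np.asarray(X, dtype=float).copy()
    stages = []
    for s in range(n_stages):
        W, H = nmf(R, k, n_iter=n_iter, seed=s)
        stages.append((W, H))
        # Assumption: clip at zero so the next stage still receives a nonnegative matrix.
        R = np.maximum(R - W @ H, 0.0)
    return stages, R


if __name__ == "__main__":
    X = np.random.default_rng(1).random((6, 5))  # toy term-document matrix
    stages, R = residual_ensemble_nmf(X, k=2, n_stages=3)
    print(len(stages), "stages; residual norm:", np.linalg.norm(R))
```

Because each stage only sees what earlier stages failed to explain, later topic sets tend to describe smaller parts of the corpus; the local weighting described in the abstract is what the authors use to sharpen that effect, and it is not modeled in this sketch.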
Pages: 503-531
Number of pages: 29
Related papers
40 in total
  • [1] Localized user-driven topic discovery via boosted ensemble of nonnegative matrix factorization
    Sangho Suh
    Sungbok Shin
    Joonseok Lee
    Chandan K. Reddy
    Jaegul Choo
    Knowledge and Information Systems, 2018, 56 : 503 - 531
  • [2] L-EnsNMF: Boosted Local Topic Discovery via Ensemble of Nonnegative Matrix Factorization
    Suh, Sangho
    Choo, Jaegul
    Lee, Joonseok
    Reddy, Chandan K.
    2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2016, : 479 - 488
  • [3] UTOPIAN: User-Driven Topic Modeling Based on Interactive Nonnegative Matrix Factorization
    Choo, Jaegul
    Lee, Changhyun
    Reddy, Chandan K.
    Park, Haesun
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2013, 19 (12) : 1992 - 2001
  • [4] Simultaneous Discovery of Common and Discriminative Topics via Joint Nonnegative Matrix Factorization
    Kim, Hannah
    Choo, Jaegul
    Kim, Jingu
    Reddy, Chandan K.
    Park, Haesun
    KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015, : 567 - 576
  • [5] Topic Modeling on Triage Notes With Semiorthogonal Nonnegative Matrix Factorization
    Li, Yutong
    Zhu, Ruoqing
    Qu, Annie
    Ye, Han
    Sun, Zhankun
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2021, 116 (536) : 1609 - 1624
  • [6] Nonnegative Matrix Factorization Via Archetypal Analysis
    Javadi, Hamid
    Montanari, Andrea
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2020, 115 (530) : 896 - 907
  • [7] Neural nonnegative matrix factorization for hierarchical multilayer topic modeling
    Haddock, Jamie
    Will, Tyler
    Vendrow, Joshua
    Zhang, Runyu
    Molitor, Denali
    Needell, Deanna
    Gao, Mengdi
    Sadovnik, Eli
    SAMPLING THEORY SIGNAL PROCESSING AND DATA ANALYSIS, 2024, 22 (01):
  • [8] DC-NMF: nonnegative matrix factorization based on divide-and-conquer for fast clustering and topic modeling
    Du, Rundong
    Kuang, Da
    Drake, Barry
    Park, Haesun
    JOURNAL OF GLOBAL OPTIMIZATION, 2017, 68 (04) : 777 - 798
  • [9] Sparse nonnegative matrix factorization for protein sequence motif discovery
    Kim, Wooyoung
    Chen, Bernard
    Kim, Jingu
    Pan, Yi
    Park, Haesun
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (10) : 13198 - 13207
  • [10] Stability of topic modeling via matrix factorization
    Belford, Mark
    Mac Namee, Brian
    Greene, Derek
    EXPERT SYSTEMS WITH APPLICATIONS, 2018, 91 : 159 - 169