Localized user-driven topic discovery via boosted ensemble of nonnegative matrix factorization

Cited by: 5
Authors
Suh, Sangho [1]
Shin, Sungbok [2]
Lee, Joonseok [3]
Reddy, Chandan K. [4]
Choo, Jaegul [2]
Affiliations
[1] Univ Waterloo, David R Cheriton Sch Comp Sci, Waterloo, ON, Canada
[2] Korea Univ, Dept Comp Sci & Engn, Seoul, South Korea
[3] Google Res, Machine Percept, Mountain View, CA USA
[4] Virginia Tech, Dept Comp Sci, Arlington, VA USA
Funding
National Research Foundation, Singapore; US National Science Foundation;
Keywords
Topic modeling; Ensemble learning; Matrix factorization; Gradient boosting; Local weighting; CONSTRAINED LEAST-SQUARES; ALGORITHMS;
DOI
10.1007/s10115-017-1147-9
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Nonnegative matrix factorization (NMF) has been widely used in topic modeling of large-scale document corpora, where a set of underlying topics is extracted as a low-rank factor matrix from NMF. However, the resulting topics often convey only general, and thus redundant, information about the documents rather than information that might be minor but potentially meaningful to users. To address this problem, we present a novel ensemble method based on nonnegative matrix factorization that discovers meaningful local topics. Our method brings the idea of an ensemble model, which has shown advantages in supervised learning, into an unsupervised topic modeling context. That is, our model successively performs NMF on a residual matrix obtained from previous stages and generates a sequence of topic sets. The update algorithm we employ is novel in two aspects. The first lies in utilizing the residual matrix, inspired by a state-of-the-art gradient boosting model, and the second stems from applying a sophisticated local weighting scheme to the given matrix to enhance the locality of topics, which in turn delivers high-quality, focused topics of interest to users. We subsequently extend this ensemble model by adding keyword- and document-based user interaction to introduce user-driven topic discovery.
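
The stage-wise idea described in the abstract (fit NMF on a locally weighted residual, subtract the fit, repeat) can be illustrated with a minimal sketch. It is not the authors' algorithm: the anchor selection rule, the cosine-based weights, the plain multiplicative-update solver, and all function names (`nmf`, `cosine_to`, `boosted_local_nmf`) are assumptions made for illustration, and the input is assumed to be a nonnegative word-by-document matrix.

```python
import numpy as np


def nmf(X, k, n_iter=200, seed=0, eps=1e-10):
    """Rank-k NMF, X ~= W @ H, using Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cosine_to(M, v, eps=1e-10):
    """Cosine similarity between every row of M and the vector v."""
    return (M @ v) / (np.linalg.norm(M, axis=1) * np.linalg.norm(v) + eps)


def boosted_local_nmf(X, k, n_stages, seed=0):
    """Stage-wise NMF on a locally weighted residual (illustrative only).

    X is a nonnegative word-by-document matrix. Returns the list of
    per-stage (W, H) pairs and the residual Frobenius norm after each stage.
    """
    R = np.asarray(X, dtype=float).copy()
    stages, norms = [], [np.linalg.norm(R)]
    for s in range(n_stages):
        if R.sum() <= 0:  # nothing left to explain
            break
        # Assumed anchor rule: the document with the most unexplained mass,
        # and the heaviest word inside that document.
        doc = int(np.argmax(R.sum(axis=0)))
        word = int(np.argmax(R[:, doc]))
        # Local weights in [0, 1]: similarity to the anchor document/word.
        doc_w = cosine_to(R.T, R[:, doc])
        word_w = cosine_to(R, R[word, :])
        R_local = word_w[:, None] * R * doc_w[None, :]
        # Fit this stage's topics on the locally weighted residual.
        W, H = nmf(R_local, k, seed=seed + s)
        stages.append((W, H))
        # Boosting-style update: remove what was explained, clip at zero.
        R = np.maximum(R - W @ H, 0.0)
        norms.append(np.linalg.norm(R))
    return stages, norms


if __name__ == "__main__":
    X = np.random.default_rng(1).random((30, 20))  # toy word-by-document data
    stages, norms = boosted_local_nmf(X, k=3, n_stages=4)
    print(len(stages), [round(n, 3) for n in norms])
```

Because each stage subtracts a nonnegative fit and clips at zero, the residual norm cannot increase from one stage to the next. The columns of each stage's `W` play the role of that stage's topic set; the user-driven extension mentioned in the abstract would correspond to choosing the anchor word or document interactively rather than by the fixed rule assumed here.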
Pages: 503-531
Page count: 29