Continuous Semi-Supervised Nonnegative Matrix Factorization

Cited by: 2
Authors
Lindstrom, Michael R. R. [1]
Ding, Xiaofu [2]
Liu, Feng [2]
Somayajula, Anand [2]
Needell, Deanna [2]
Affiliations
[1] Univ Texas Rio Grande Valley, Sch Math & Stat Sci, Edinburg, TX 78539 USA
[2] Univ Calif Los Angeles, Dept Math, Los Angeles, CA 90095 USA
Keywords
topic modelling; regression; nonnegative matrix factorization; optimization; constrained least-squares; algorithms
DOI
10.3390/a16040187
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Nonnegative matrix factorization can be used to automatically detect topics within a corpus in an unsupervised fashion. The technique amounts to an approximation of a nonnegative matrix as the product of two nonnegative matrices of lower rank. In certain applications it is desirable to extract topics and use them to predict quantitative outcomes. In this paper, we show that nonnegative matrix factorization can be combined with regression on a continuous response variable by minimizing a penalty function that adds a weighted regression error to a matrix factorization error. We show theoretically that as the weighting increases, the regression error in training decreases weakly (that is, it never increases). We test our method on synthetic data and on real data from Rate My Professors reviews, predicting an instructor's rating from the text of their reviews. In practice, when used as a dimensionality reduction method (when the number of topics chosen in the model is fewer than the true number of topics), the method performs better than doing regression after topics are identified, both during training and testing, and it retains interpretability.
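The penalty function described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration and not the authors' implementation: the objective is taken to be ||X - WH||_F^2 + lam * ||y - theta·H||^2 with W and H nonnegative and theta unconstrained, and the solver is a plain alternating scheme (exact least squares for theta, projected gradient steps for W and H) swapped in because the abstract does not state the algorithm. The names `cssnmf_objective` and `fit_cssnmf` are hypothetical.

```python
import numpy as np


def cssnmf_objective(X, y, W, H, theta, lam):
    """Assumed penalty: factorization error plus lam times regression error."""
    fact_err = np.linalg.norm(X - W @ H, "fro") ** 2
    reg_err = np.linalg.norm(y - theta @ H) ** 2
    return fact_err + lam * reg_err


def fit_cssnmf(X, y, k, lam, n_iter=200, seed=0):
    """Alternating minimization sketch (not the paper's algorithm).

    X: (m, n) nonnegative data matrix, one column per document.
    y: (n,) continuous response, one value per document.
    Returns W (m, k), H (k, n), theta (k,), and the objective history.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    eps = 1e-12  # guards against a zero step-size denominator
    history = []
    for _ in range(n_iter):
        # theta is unconstrained: solve H^T theta ~ y by least squares.
        theta = np.linalg.lstsq(H.T, y, rcond=None)[0]

        # Projected gradient step in W with step 1/L, L = 2 * ||H H^T||_2.
        grad_W = 2.0 * (W @ H - X) @ H.T
        L_W = 2.0 * np.linalg.norm(H @ H.T, 2) + eps
        W = np.maximum(W - grad_W / L_W, 0.0)

        # Projected gradient step in H; both error terms contribute.
        grad_H = 2.0 * W.T @ (W @ H - X) + 2.0 * lam * np.outer(theta, theta @ H - y)
        L_H = 2.0 * np.linalg.norm(W.T @ W + lam * np.outer(theta, theta), 2) + eps
        H = np.maximum(H - grad_H / L_H, 0.0)

        history.append(cssnmf_objective(X, y, W, H, theta, lam))
    return W, H, theta, history
```

Under these assumptions, setting `lam = 0` reduces the W and H updates to ordinary NMF with regression fitted on the side, while a larger `lam` lets the response influence the learned topics. The step sizes 1/L are chosen so that each update cannot increase the objective.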
Pages: 16
Related papers
30 in total
  • [1] [Anonymous], RAT MY PROF
  • [2] [Anonymous], SCIP OPT NNLS
  • [3] Austin W, 2018, INT GEOSCI REMOTE SE, P5772, DOI 10.1109/IGARSS.2018.8518592
  • [4] Algorithms and applications for approximate nonnegative matrix factorization
    Berry, Michael W.
    Browne, Murray
    Langville, Amy N.
    Pauca, V. Paul
    Plemmons, Robert J.
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 52 (01) : 155 - 173
  • [5] Bleske-Rechek A., 2011, PRACT ASSESS RES EVA, V16, P18
  • [6] Bro R, 1997, J CHEMOMETR, V11, P393, DOI 10.1002/(SICI)1099-128X(199709/10)11:5<393::AID-CEM483>3.0.CO;2-L
  • [8] Experimental explorations on short text topic mining between LDA and NMF based Schemes
    Chen, Yong
    Zhang, Hui
    Liu, Rui
    Ye, Zhiwen
    Lin, Jianying
    [J]. KNOWLEDGE-BASED SYSTEMS, 2019, 163 : 1 - 13
  • [9] A Critical Review of LASSO and Its Derivatives for Variable Selection Under Dependence Among Covariates
    Freijeiro-Gonzalez, Laura
    Febrero-Bande, Manuel
    Gonzalez-Manteiga, Wenceslao
    [J]. INTERNATIONAL STATISTICAL REVIEW, 2022, 90 (01) : 118 - 145
  • [10] Haddock Jamie, 2021, 2021 55th Asilomar Conference on Signals, Systems, and Computers, P1355, DOI 10.1109/IEEECONF53345.2021.9723109