Learning Low-Rank Document Embeddings with Weighted Nuclear Norm Regularization

Cited by: 0
Authors
Pfahler, Lukas [1 ]
Morik, Katharina [1 ]
Elwert, Frederik [2 ]
Tabti, Samira [2 ]
Krech, Volkhard [2 ]
Affiliations
[1] TU Dortmund Univ, D-44227 Dortmund, Germany
[2] Ruhr Univ Bochum, D-44801 Bochum, Germany
Source
2017 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA) | 2017
Keywords
DOI
10.1109/DSAA.2017.46
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Recently, neural embeddings of documents have shown success in various language processing tasks. These low-dimensional, dense feature vectors of text documents capture semantic similarities better than traditional methods. However, the underlying optimization problem is non-convex and usually solved using stochastic gradient descent. Hence, solutions are most likely sub-optimal and not reproducible, as they are the result of a randomized algorithm. We present an alternative formulation for learning low-rank representations. Instead of explicitly learning low-dimensional features, we compute a low-rank representation implicitly by regularizing full-dimensional solutions. Our approach uses the weighted nuclear norm, a regularizer that penalizes the singular values of matrices. We optimize the regularized objective using accelerated proximal gradient descent. We apply the approach to learn embeddings of documents. We show that our approach outperforms traditional convex approaches in a numerical study. Furthermore, we demonstrate that the embeddings are useful for detecting similarities on a standard dataset. We then apply our approach in an interdisciplinary research project to detect topics in religious online discussions. The topic descriptions obtained from a clustering of the embeddings are coherent and insightful. In comparison to existing approaches, they are also reproducible. An earlier version of this work stated that the weighted nuclear norm is a convex regularizer. This is wrong: the weighted nuclear norm is non-convex, even though the name falsely suggests that it is a matrix norm.
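As a concrete illustration of the procedure sketched in the abstract (penalizing singular values with the weighted nuclear norm and optimizing with accelerated proximal gradient descent), the following minimal Python sketch shows a weighted singular-value thresholding step, the proximal operator commonly associated with weighted nuclear norm penalties, and a FISTA-style accelerated proximal gradient loop. All names, the NumPy dependency, and the generic smooth-loss gradient grad_f are assumptions made for illustration, not the authors' implementation.

    import numpy as np

    def prox_weighted_nuclear_norm(X, weights, step):
        # Shrink the i-th singular value of X by step * weights[i], clip at zero,
        # and rebuild the matrix. weights is assumed sorted to match the
        # descending singular values returned by the SVD.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s_shrunk = np.maximum(s - step * np.asarray(weights), 0.0)
        return (U * s_shrunk) @ Vt  # same as U @ np.diag(s_shrunk) @ Vt

    def accelerated_proximal_gradient(grad_f, X0, weights, step, n_iter=100):
        # FISTA-style loop: gradient step on the smooth loss f (via grad_f),
        # followed by the weighted singular-value thresholding step above.
        X_prev, Y, t = X0.copy(), X0.copy(), 1.0
        for _ in range(n_iter):
            X = prox_weighted_nuclear_norm(Y - step * grad_f(Y), weights, step)
            t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
            Y = X + ((t - 1.0) / t_next) * (X - X_prev)
            X_prev, t = X, t_next
        return X_prev

For example, with a squared reconstruction loss f(X) = 0.5 * ||X - A||_F^2, grad_f is simply lambda X: X - A, and assigning larger weights to the smaller singular values pushes the solution toward low rank.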
Pages: 21-29 (9 pages)
Related Papers (50 in total)
  • [1] Weighted truncated nuclear norm regularization for low-rank quaternion matrix completion
    Yang, Liqiao
    Kou, Kit Ian
    Miao, Jifei
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 81
  • [2] Low-rank Tensor Learning with Nonconvex Overlapped Nuclear Norm Regularization
    Yao, Quanming
    Wang, Yaqing
    Han, Bo
    Kwok, James T.
    arXiv, 2022,
  • [3] Low-rank Tensor Learning with Nonconvex Overlapped Nuclear Norm Regularization
    Yao, Quanming
    Wang, Yaqing
    Han, Bo
    Kwok, James T.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [4] Low-Rank Tensor Completion by Truncated Nuclear Norm Regularization
    Xue, Shengke
    Qiu, Wenyuan
    Liu, Fan
    Jin, Xinyu
    2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 2600 - 2605
  • [5] Nuclear norm regularization with a low-rank constraint for matrix completion
    Zhang, Hui
    Cheng, Lizhi
    Zhu, Wei
    INVERSE PROBLEMS, 2010, 26 (11)
  • [6] Weighted hybrid truncated norm regularization method for low-rank matrix completion
    Wan, Xiying
    Cheng, Guanghui
    NUMERICAL ALGORITHMS, 2023, 94 (02) : 619 - 641
  • [7] Low-Rank Subspace Learning of Multikernel Based on Weighted Truncated Nuclear Norm for Image Segmentation
    Li, Li
    Wang, Xiao
    Pu, Lei
    Wang, Jing
    Zhang, Xiaoqian
    IEEE ACCESS, 2022, 10 : 66290 - 66299
  • [8] Low-Rank Tensor Completion via Tensor Nuclear Norm With Hybrid Smooth Regularization
    Zhao, Xi-Le
    Nie, Xin
    Zheng, Yu-Bang
    Ji, Teng-Yu
    Huang, Ting-Zhu
    IEEE ACCESS, 2019, 7 : 131888 - 131901