A probabilistic matrix factorization algorithm for approximation of sparse matrices in natural language processing

Cited by: 4
Authors
Tarantino, Gianmaria [1 ]
Monica, Stefania [1 ]
Bergenti, Federico [1 ]
Affiliations
[1] Univ Parma, Dipartimento Sci Matemat Fis & Informat, I-43124 Parma, Italy
Keywords
Latent semantic analysis; Natural language processing; Singular value decomposition
DOI
10.1016/j.icte.2018.04.005
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper suggests a variation of a well-known probabilistic matrix factorization algorithm which is commonly used in data analysis and scientific computing, and which has been considered recently to serve natural language processing. The proposed variation is meant to take benefit from the fact that matrices processed in natural language processing tasks are normally sparse rectangular matrices with one dimension much larger than the other, and this can be used to ensure adequate accuracy with acceptable computation time. Preliminary experiments on real-world textual corpora show that the proposed algorithm achieves relevant improvements compared to the original one. © 2018 The Korean Institute of Communications and Information Sciences (KICS). Publishing Services by Elsevier B.V.
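For orientation only, the following is a minimal sketch of a generic randomized (probabilistic) truncated SVD of the range-finder kind, applied to a tall rectangular matrix. It is an assumption that the "well-known" algorithm mentioned in the abstract belongs to this family, and the sketch does not reproduce the variation proposed in the paper. The function name `randomized_svd`, the parameters `oversample` and `n_iter`, and the synthetic test matrix are all hypothetical choices made for illustration.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, n_iter=2, seed=0):
    """Approximate the top `rank` singular triplets of an m x n matrix A.

    Generic randomized range-finder sketch; NOT the paper's variation.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, m, n)  # sketch width, capped by the matrix size
    # Sample the range of A with a Gaussian test matrix and orthonormalize.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k)))
    # Power iterations sharpen the spectrum; QR after each product for stability.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # Project A onto the k-dimensional subspace and decompose the small matrix.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

# Synthetic tall matrix of exact rank 5 (illustrative data, not from the paper).
rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 5)) @ rng.standard_normal((5, 50))
U, s, Vt = randomized_svd(A, rank=5)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

The exact SVD is only ever computed on the small `k x n` matrix `B`, which is why this family of methods suits matrices with one dimension much larger than the other. On real term-document data the matrix is not exactly low rank, so the reconstruction error would depend on how quickly its singular values decay.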
Pages: 87-90
Number of pages: 4