SGD_Tucker: A Novel Stochastic Optimization Strategy for Parallel Sparse Tucker Decomposition

Cited by: 13
Authors
Li, Hao [1 ,2 ,3 ]
Li, Zixuan [1 ,2 ]
Li, Kenli [1 ,2 ]
Rellermeyer, Jan S. [3 ]
Chen, Lydia Y. [3 ]
Li, Keqin [1 ,2 ,4 ]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China
[3] Delft Univ Technol, NL-2628 CD Delft, Netherlands
[4] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561 USA
Funding
Swiss National Science Foundation; National Natural Science Foundation of China
Keywords
Tensors; Sparse matrices; Optimization; Stochastic processes; Matrix decomposition; Indexes; Data models; High-order, high-dimension and sparse tensor; low-rank representation learning; machine learning algorithm; sparse Tucker decomposition; stochastic optimization; parallel strategy; FACTORIZATION; REDUCTION; NETWORKS
DOI
10.1109/TPDS.2020.3047460
CLC Number
TP301 [Theory, Methods]
Discipline Code
081202
Abstract
Sparse Tucker Decomposition (STD) algorithms learn a core tensor and a group of factor matrices to obtain an optimal low-rank representation of a High-Order, High-Dimension, and Sparse Tensor (HOHDST). However, existing STD algorithms suffer from an explosion of intermediate variables, because forming those variables, i.e., Khatri-Rao products, Kronecker products, and matrix-matrix multiplications, involves all elements of the sparse tensor. This bottleneck prevents a deep fusion of efficient computation with big-data platforms. To overcome it, a novel stochastic optimization strategy (SGD_Tucker) is proposed for STD, which automatically divides the high-dimension intermediate variables into small batches of intermediate matrices. Specifically, SGD_Tucker operates only on randomly selected small samples rather than on all elements, while maintaining the overall accuracy and convergence rate. In practice, SGD_Tucker offers two distinct advances over the state of the art. First, SGD_Tucker prunes the communication overhead for the core tensor in distributed settings. Second, the low data dependence of SGD_Tucker enables fine-grained parallelization, which allows SGD_Tucker to achieve lower computational overheads at the same accuracy. Experimental results show that SGD_Tucker runs at least 2X faster than the state of the art.
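To make the sampling idea in the abstract concrete, the sketch below is a minimal Python/NumPy illustration, not the paper's actual algorithm: the function name sgd_tucker_step, its parameters, the plain squared-error objective, and the absence of regularization and parallelization are all assumptions. It performs one stochastic gradient step for a 3-way Tucker model G x1 A x2 B x3 C, touching only a random mini-batch of the observed (nonzero) entries instead of the whole tensor, so no full Khatri-Rao or Kronecker intermediate is ever materialized.

```python
# Hypothetical sketch of batch-sampled SGD for sparse Tucker decomposition.
# Model: x[i,j,k] ~= sum_{p,q,r} G[p,q,r] * A[i,p] * B[j,q] * C[k,r]
import numpy as np

def sgd_tucker_step(coords, vals, G, factors, lr=0.01, batch=256, rng=None):
    """One SGD step over a random mini-batch of observed entries.

    coords : (nnz, 3) integer indices of observed entries
    vals   : (nnz,) observed values
    G      : core tensor, shape (R1, R2, R3)
    factors: [A, B, C] with A (I, R1), B (J, R2), C (K, R3)
    """
    rng = rng or np.random.default_rng()
    A, B, C = factors
    sel = rng.choice(len(vals), size=min(batch, len(vals)), replace=False)
    for n in sel:
        i, j, k = coords[n]
        a, b, c = A[i], B[j], C[k]
        # Predicted entry and residual for this single sample.
        pred = np.einsum('pqr,p,q,r->', G, a, b, c)
        err = pred - vals[n]
        # Gradients of 0.5 * err^2 w.r.t. each factor row and the core.
        grad_a = err * np.einsum('pqr,q,r->p', G, b, c)
        grad_b = err * np.einsum('pqr,p,r->q', G, a, c)
        grad_c = err * np.einsum('pqr,p,q->r', G, a, b)
        grad_G = err * np.einsum('p,q,r->pqr', a, b, c)
        A[i] -= lr * grad_a
        B[j] -= lr * grad_b
        C[k] -= lr * grad_c
        G -= lr * grad_G
    return G, [A, B, C]
```

Because each step reads and writes only the factor rows indexed by the sampled entries plus the (small) core tensor, per-step cost scales with the batch size and the Tucker ranks rather than with the number of nonzeros, which is the property the abstract exploits for fine-grained parallelization.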
Pages: 1828-1841
Number of pages: 14