SGD_Tucker: A Novel Stochastic Optimization Strategy for Parallel Sparse Tucker Decomposition

Cited by: 13
Authors
Li, Hao [1 ,2 ,3 ]
Li, Zixuan [1 ,2 ]
Li, Kenli [1 ,2 ]
Rellermeyer, Jan S. [3 ]
Chen, Lydia Y. [3 ]
Li, Keqin [1 ,2 ,4 ]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China
[3] Delft Univ Technol, NL-2628 CD Delft, Netherlands
[4] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561 USA
Funding
Swiss National Science Foundation; National Natural Science Foundation of China
Keywords
Tensors; Sparse matrices; Optimization; Stochastic processes; Matrix decomposition; Indexes; Data models; High-order, high-dimension and sparse tensor; low-rank representation learning; machine learning algorithm; sparse Tucker decomposition; stochastic optimization; parallel strategy; FACTORIZATION; REDUCTION; NETWORKS
DOI
10.1109/TPDS.2020.3047460
CLC Number
TP301 [Theory, Methods]
Discipline Code
081202
Abstract
Sparse Tucker Decomposition (STD) algorithms learn a core tensor and a group of factor matrices to obtain an optimal low-rank representation of a High-Order, High-Dimension, and Sparse Tensor (HOHDST). However, existing STD algorithms suffer from an explosion of intermediate variables, because forming those variables, i.e., Khatri-Rao products, Kronecker products, and matrix-matrix multiplications, involves all elements of the sparse tensor. This bottleneck prevents a deep fusion of efficient computation with big-data platforms. To overcome it, a novel stochastic optimization strategy (SGD_Tucker) is proposed for STD, which automatically divides the high-dimension intermediate variables into small batches of intermediate matrices. Specifically, SGD_Tucker operates only on randomly selected small samples rather than on all elements, while maintaining the overall accuracy and convergence rate. In practice, SGD_Tucker offers two distinct advances over the state of the art. First, SGD_Tucker prunes the communication overhead for the core tensor in distributed settings. Second, the low data dependence of SGD_Tucker enables fine-grained parallelization, which allows SGD_Tucker to achieve lower computational overheads at the same accuracy. Experimental results show that SGD_Tucker runs at least 2X faster than the state of the art.
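To make the sampling idea in the abstract concrete, the sketch below is a minimal Python/NumPy illustration, not the paper's actual algorithm: the function name sgd_tucker_step, its parameters, the plain squared-error objective, and the absence of regularization and parallelization are all assumptions. It performs one stochastic gradient step for a 3-way Tucker model G x1 A x2 B x3 C, touching only a random mini-batch of the observed (nonzero) entries instead of the whole tensor, so no full Khatri-Rao or Kronecker intermediate is ever materialized.

```python
# Hypothetical sketch of batch-sampled SGD for sparse Tucker decomposition.
# Model: x[i,j,k] ~= sum_{p,q,r} G[p,q,r] * A[i,p] * B[j,q] * C[k,r]
import numpy as np

def sgd_tucker_step(coords, vals, G, factors, lr=0.01, batch=256, rng=None):
    """One SGD step over a random mini-batch of observed entries.

    coords : (nnz, 3) integer indices of observed entries
    vals   : (nnz,) observed values
    G      : core tensor, shape (R1, R2, R3)
    factors: [A, B, C] with A (I, R1), B (J, R2), C (K, R3)
    """
    rng = rng or np.random.default_rng()
    A, B, C = factors
    sel = rng.choice(len(vals), size=min(batch, len(vals)), replace=False)
    for n in sel:
        i, j, k = coords[n]
        a, b, c = A[i], B[j], C[k]
        # Predicted entry and residual for this single sample.
        pred = np.einsum('pqr,p,q,r->', G, a, b, c)
        err = pred - vals[n]
        # Gradients of 0.5 * err^2 w.r.t. each factor row and the core.
        grad_a = err * np.einsum('pqr,q,r->p', G, b, c)
        grad_b = err * np.einsum('pqr,p,r->q', G, a, c)
        grad_c = err * np.einsum('pqr,p,q->r', G, a, b)
        grad_G = err * np.einsum('p,q,r->pqr', a, b, c)
        A[i] -= lr * grad_a
        B[j] -= lr * grad_b
        C[k] -= lr * grad_c
        G -= lr * grad_G
    return G, [A, B, C]
```

Because each step reads and writes only the factor rows indexed by the sampled entries plus the (small) core tensor, per-step cost scales with the batch size and the Tucker ranks rather than with the number of nonzeros, which is the property the abstract exploits for fine-grained parallelization.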
Pages: 1828-1841
Number of pages: 14