Sparse MTTKRP Acceleration for Tensor Decomposition on GPU

被引:0
作者
Wijeratne, Sasindu [1 ]
Kannan, Rajgopal [2 ]
Prasanna, Viktor [1 ]
机构
[1] Univ Southern Calif, Los Angeles, CA 90007 USA
[2] DEVCOM Army Res Lab, Los Angeles, CA USA
来源
PROCEEDINGS OF THE 21ST ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS 2024, CF 2024 | 2024年
基金
美国国家科学基金会;
关键词
Tensor Decomposition; spMTTKRP; GPU;
D O I
10.1145/3649153.3649187
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
Sparse Matricized Tensor Times Khatri-Rao Product (spMTTKRP) is the bottleneck kernel of sparse tensor decomposition. In this work, we propose a GPU-based algorithm design to address the key challenges in accelerating spMTTKRP computation, including (1) eliminating global atomic operations across GPU thread blocks, (2) avoiding the intermediate values being communicated between GPU thread blocks and GPU global memory, and (3) ensuring a balanced distribution of workloads across GPU thread blocks. Our approach also supports dynamic tensor remapping, enabling the above optimizations in all the modes of the input tensor. Our approach achieves a geometric mean speedup of 1.5x, 2.0x, and 21.7x in total execution time across widely used datasets compared with the state-of-the-art GPU implementations. Our work is the only GPU implementation that can support tensors with modes greater than 4 since the state-of-the-art works have implementation constraints for tensors with a large number of modes.
引用
收藏
页码:88 / 96
页数:9
相关论文
共 33 条
[1]  
Ansorge Richard., 2022, Programming in parallel with CUDA: a practical guide
[2]  
Bollacker Kurt., P 2008 ACM SIGMOD IN
[3]  
Chandra R., 2001, PARALLEL PROGRAMMING
[4]  
Cheng ZY, 2020, INT CONF ACOUST SPEE, P3292, DOI [10.1109/ICASSP40776.2020.9053292, 10.1109/icassp40776.2020.9053292]
[5]  
Cook S, 2013, CUDA PROGRAMMING: A DEVELOPER'S GUIDE TO PARALLEL COMPUTING WITH GPUS, P1
[6]  
Fatica Massimiliano., 2008, 2008 IEEE HOT CHIPS, P1, DOI DOI 10.1109/HOTCHIPS.2008.7476520
[7]   Overview of constrained PARAFAC models [J].
Favier, Gerard ;
de Almeida, Andre L. F. .
EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2014, :1-25
[8]   Tensor decomposition for analysing time-evolving social networks: an overview [J].
Fernandes, Sofia ;
Fanaee-T, Hadi ;
Gama, Joao .
ARTIFICIAL INTELLIGENCE REVIEW, 2021, 54 (04) :2891-2916
[9]   BOUNDS ON MULTIPROCESSING TIMING ANOMALIES [J].
GRAHAM, RL .
SIAM JOURNAL ON APPLIED MATHEMATICS, 1969, 17 (02) :416-&
[10]   Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering [J].
He, Ruining ;
McAuley, Julian .
PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'16), 2016, :507-517