A Heterogeneous Parallel Computing Approach Optimizing SpTTM on CPU-GPU via GCN

Cited by: 3
Authors
Wang, Haotian [1 ]
Yang, Wangdong [1 ]
Ouyang, Renqiu [1 ]
Hu, Rong [1 ]
Li, Kenli [1 ]
Li, Keqin [1 ,2 ]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, 116 Lu Shan South Rd, Changsha 410082, Hunan, Peoples R China
[2] SUNY Coll New Paltz, Dept Comp Sci, 1 Hawk Dr, New Paltz, NY 12561 USA
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
CPU-GPU heterogeneous systems; format selection; GCN; parallel computing; SpTTM; TENSOR DECOMPOSITIONS; SPARSE; FRAMEWORK;
DOI
10.1145/3584373
CLC Number
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
Sparse Tensor-Times-Matrix (SpTTM) is the core computation in tensor analysis. The sparsity patterns of different tensors vary greatly, which poses a major challenge to designing an efficient and general SpTTM. In this paper, we describe SpTTM on CPU-GPU heterogeneous hybrid systems and give a parallel execution strategy for SpTTM in different sparse formats. We analyze the theoretical computing power and estimate the number of tasks needed to achieve load balancing between the CPU and the GPU of the heterogeneous system. We discuss a method that describes the sparse structure of a tensor as a graph and design a new graph neural network, SPT-GCN, to select a suitable sparse tensor format. Furthermore, we perform extensive experiments on real datasets to demonstrate the advantages and efficiency of our proposed input-aware slice-wise SpTTM. The experimental results show that our input-aware slice-wise SpTTM achieves an average speedup of 1.310x over the ParTI! library on a CPU-GPU heterogeneous system.
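To make the kernel concrete, the sketch below shows a minimal sequential mode-n SpTTM over a COO-format sparse tensor in Python. It only illustrates the operation the paper accelerates; it is not the authors' slice-wise CPU-GPU implementation, and the function name spttm_coo, the COO layout, and the dict-based semi-sparse output are assumptions made for this example.

import numpy as np

def spttm_coo(indices, values, shape, U, mode):
    """Mode-`mode` sparse tensor-times-matrix (SpTTM) on a COO tensor.

    indices : (nnz, N) integer array of nonzero coordinates
    values  : (nnz,)   nonzero values
    shape   : tuple of tensor dimensions (I_1, ..., I_N)
    U       : (shape[mode], R) dense factor matrix
    mode    : index of the contracted mode

    Returns the semi-sparse result as a dict that maps the coordinates of
    the kept modes to a dense length-R fiber:
        Y[..., i_{mode-1}, i_{mode+1}, ..., :] += value * U[i_mode, :]
    """
    R = U.shape[1]
    result = {}
    for coord, val in zip(indices, values):
        i_n = coord[mode]                         # index along the contracted mode
        key = tuple(np.delete(coord, mode))       # coordinates of the kept modes
        fiber = result.setdefault(key, np.zeros(R))
        fiber += val * U[i_n, :]                  # accumulate value times a row of U
    return result

# Example: a 4 x 5 x 6 tensor with three nonzeros, contracted along mode 1
indices = np.array([[0, 1, 2], [3, 1, 5], [2, 4, 0]])
values = np.array([1.0, 2.0, 3.0])
U = np.random.rand(5, 8)                          # I_2 = 5 rows, rank R = 8
Y = spttm_coo(indices, values, (4, 5, 6), U, mode=1)

In a heterogeneous setting, per-slice work units of this computation would presumably be distributed across CPU threads and GPU thread blocks, with the load-balancing estimate described in the abstract deciding how many slices each device receives.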
Pages: 23