A Heterogeneous Parallel Computing Approach Optimizing SpTTM on CPU-GPU via GCN

Cited by: 3
Authors
Wang, Haotian [1 ]
Yang, Wangdong [1 ]
Ouyang, Renqiu [1 ]
Hu, Rong [1 ]
Li, Kenli [1 ]
Li, Keqin [1 ,2 ]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, 116 Lu Shan South Rd, Changsha 410082, Hunan, Peoples R China
[2] SUNY Coll New Paltz, Dept Comp Sci, 1 Hawk Dr, New Paltz, NY 12561 USA
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
CPU-GPU heterogeneous systems; format selection; GCN; parallel computing; SpTTM; TENSOR DECOMPOSITIONS; SPARSE; FRAMEWORK;
DOI
10.1145/3584373
CLC Number
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
Sparse Tensor-Times-Matrix (SpTTM) is the core computation in tensor analysis. The sparsity patterns of different tensors vary greatly, which poses a major challenge to designing an efficient and general SpTTM. In this paper, we describe SpTTM on CPU-GPU heterogeneous hybrid systems and give a parallel execution strategy for SpTTM in different sparse formats. We analyze the theoretical computing power and estimate the number of tasks needed to achieve load balancing between the CPU and the GPU of the heterogeneous system. We discuss a method that describes the sparse structure of a tensor as a graph and design a new graph neural network, SPT-GCN, to select a suitable sparse tensor format. Furthermore, we perform extensive experiments on real datasets to demonstrate the advantages and efficiency of our proposed input-aware slice-wise SpTTM. The experimental results show that our input-aware slice-wise SpTTM achieves an average speedup of 1.310x over the ParTI! library on a CPU-GPU heterogeneous system.
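To make the kernel concrete, the sketch below shows a minimal sequential mode-n SpTTM over a COO-format sparse tensor in Python. It only illustrates the operation the paper accelerates; it is not the authors' slice-wise CPU-GPU implementation, and the function name spttm_coo, the COO layout, and the dict-based semi-sparse output are assumptions made for this example.

import numpy as np

def spttm_coo(indices, values, shape, U, mode):
    """Mode-`mode` sparse tensor-times-matrix (SpTTM) on a COO tensor.

    indices : (nnz, N) integer array of nonzero coordinates
    values  : (nnz,)   nonzero values
    shape   : tuple of tensor dimensions (I_1, ..., I_N)
    U       : (shape[mode], R) dense factor matrix
    mode    : index of the contracted mode

    Returns the semi-sparse result as a dict that maps the coordinates of
    the kept modes to a dense length-R fiber:
        Y[..., i_{mode-1}, i_{mode+1}, ..., :] += value * U[i_mode, :]
    """
    R = U.shape[1]
    result = {}
    for coord, val in zip(indices, values):
        i_n = coord[mode]                         # index along the contracted mode
        key = tuple(np.delete(coord, mode))       # coordinates of the kept modes
        fiber = result.setdefault(key, np.zeros(R))
        fiber += val * U[i_n, :]                  # accumulate value times a row of U
    return result

# Example: a 4 x 5 x 6 tensor with three nonzeros, contracted along mode 1
indices = np.array([[0, 1, 2], [3, 1, 5], [2, 4, 0]])
values = np.array([1.0, 2.0, 3.0])
U = np.random.rand(5, 8)                          # I_2 = 5 rows, rank R = 8
Y = spttm_coo(indices, values, (4, 5, 6), U, mode=1)

In a heterogeneous setting, per-slice work units of this computation would presumably be distributed across CPU threads and GPU thread blocks, with the load-balancing estimate described in the abstract deciding how many slices each device receives.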
Pages: 23