fgSpMSpV: A Fine-grained Parallel SpMSpV Framework on HPC Platforms

被引:11
作者
Chen, Yuedan [1 ,2 ]
Xiao, Guoqing [1 ,2 ]
Li, Kenli [1 ,2 ]
Piccialli, Francesco [3 ]
Zomaya, Albert Y. [4 ]
机构
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Natl Supercomp Ctr Changsha, Changsha 410082, Hunan, Peoples R China
[3] Univ Naples Federico II, Dept Elect Engn & Informat Technol, I-80100 Naples, Italy
[4] Univ Sydney, Sch Informat Technol, Sidney, BC 2006, Canada
基金
中国国家自然科学基金;
关键词
Heterogeneous; HPC; manycore; optimization; parallelism; SpMSpV; MATRIX-MATRIX MULTIPLICATION; PERFORMANCE;
D O I
10.1145/3512770
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Sparse matrix-sparse vector (SpMSpV) multiplication is one of the fundamental and important operations in many high-performance scientific and engineering applications. The inherent irregularity and poor data locality lead to two main challenges to scaling SpMSpV over high-performance computing (HPC) systems: (i) a large amount of redundant data limits the utilization of bandwidth and parallel resources; (ii) the irregular access pattern limits the exploitation of computing resources. This paper proposes a fine-grained parallel SpMSpV (fgSpMSpV) framework on Sunway TaihuLight supercomputer to alleviate the challenges for large-scale real-world applications. First, fgSpMSpV adopts an MPI+OpenMP+X parallelization model to exploit the multi-stage and hybrid parallelism of heterogeneous HPC architectures and accelerate both pre-/post-processing and main SpMSpV computation. Second, fgSpMSpV utilizes an adaptive parallel execution to reduce the pre-processing, adapt to the parallelism and memory hierarchy of the Sunway system, while still tame redundant and random memory accesses in SpMSpV, including a set of techniques like the fine-grained partitioner, re-collection method, and Compressed Sparse Column Vector (CSCV) matrix format. Third, fgSpMSpV uses several optimization techniques to further utilize the computing resources. fgSpMSpV on the Sunway TaihuLight gains a noticeable performance improvement from the key optimization techniques with various sparsity of the input. Additionally, fgSpMSpV is implemented on an NVIDIA Tesal P100 GPU and applied to the breath-first-search (BFS) application. fgSpMSpV on a P100 GPU obtains the speedup of up to 134.38x over the state-of-the-art SpMSpV algorithms, and the BFS application using fgSpMSpV achieves the speedup of up to 21.68x over the state-of-the-arts.
引用
收藏
页数:29
相关论文
共 44 条
[21]   PEPS plus plus : Towards Extreme-Scale Simulations of Strongly Correlated Quantum Many-Particle Models on Sunway TaihuLight [J].
He, Lixin ;
An, Hong ;
Yang, Chao ;
Wang, Fei ;
Chen, Junshi ;
Wang, Chao ;
Liang, Weihao ;
Dong, Shaojun ;
Sun, Qiao ;
Han, Wenting ;
Liu, Wenyuan ;
Han, Yongjian ;
Yao, Wenjun .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2018, 29 (12) :2838-2848
[22]  
Jo Yong-Yeon, 2015, P 24 ACM INT C INF K, P1261, DOI 10.1145/2806416.2806445
[23]   MViD: Sparse Matrix-Vector Multiplication in Mobile DRAM for Accelerating Recurrent Neural Networks [J].
Kim, Byeongho ;
Chung, Jongwook ;
Lee, Eojin ;
Jung, Wonkyung ;
Lee, Sunjung ;
Choi, Jaewan ;
Park, Jaehyun ;
Wi, Minbok ;
Lee, Sukhan ;
Ahn, Jung Ho .
IEEE TRANSACTIONS ON COMPUTERS, 2020, 69 (07) :955-967
[24]   Evaluation Criteria for Sparse Matrix Storage Formats [J].
Langr, Daniel ;
Tvrdik, Pavel .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2016, 27 (02) :428-440
[25]   Adaptive SpMV/SpMSpV on GPUs for Input Vectors of Varied Sparsity [J].
Li, Min ;
Ao, Yulong ;
Yang, Chao .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, 32 (07) :1842-1853
[26]  
Luo SQ, 2019, AAAI CONF ARTIF INTE, P4496
[27]   Non-Negativity Constrained Missing Data Estimation for High-Dimensional and Sparse Matrices from Industrial Applications [J].
Luo, Xin ;
Zhou, MengChu ;
Li, Shuai ;
Hu, Lun ;
Shang, Mingsheng .
IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50 (05) :1844-1855
[28]   A Nonnegative Latent Factor Model for Large-Scale Sparse Matrices in Recommender Systems via Alternating Direction Method [J].
Luo, Xin ;
Zhou, MengChu ;
Li, Shuai ;
You, Zhuhong ;
Xia, Yunni ;
Zhu, Qingsheng .
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2016, 27 (03) :579-592
[29]  
Mishra AK, 2017, ASIA S PACIF DES AUT, P635, DOI 10.1109/ASPDAC.2017.7858395
[30]   Improving Efficiency of Parallel Vertex-Centric Algorithms for Irregular Graphs [J].
Ozdal, Muhammet Mustafa .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2019, 30 (10) :2265-2282