Load-balancing Sparse Matrix Vector Product Kernels on GPUs

被引：31

作者：

Anzt, Hartwig ^{[1
,2
]}

Cojean, Terry ^{[1
]}

Chen, Yen-Chen ^{[3
,7
]}

Dongarra, Jack ^{[2
,4
,5
]}

Flegar, Goran ^{[6
]}

Nayak, Pratik ^{[1
]}

Tomov, Stanimire ^{[2
]}

Tsai, Yuhsiang M. ^{[3
]}

Wang, Weichung ^{[3
]}

机构：

[1] Karlsruhe Inst Technol, Karlsruhe, Germany

[2] Univ Tennessee, Knoxville, TN 37996 USA

[3] Natl Taiwan Univ, Taipei, Taiwan

[4] Oak Ridge Natl Lab, Oak Ridge, TN USA

[5] Univ Manchester, Manchester, Lancs, England

[6] Univ Jaume 1, Castellon De La Plana, Spain

[7] Univ Tokyo, Tokyo, Japan

来源：

ACM TRANSACTIONS ON PARALLEL COMPUTING | 2020年 / 7卷 / 01期

关键词：

Sparse Matrix Vector Product (SpMV); irregular matrices; GPUs; MULTIPLICATION;

D O I：

10.1145/3380930

中图分类号：

TP301 [理论、方法];

学科分类号：

081202 ;

摘要：

Efficient processing of Irregular Matrices on Single Instruction, Multiple Data (SIMD)-type architectures is a persistent challenge. Resolving it requires innovations in the development of data formats, computational techniques, and implementations that strike a balance between thread divergence, which is inherent for Irregular Matrices, and padding, which alleviates the performance-detrimental thread divergence but introduces artificial overheads. To this end, in this article, we address the challenge of designing high performance sparse matrix-vector product (SpMV) kernels designed for Nvidia Graphics Processing Units (GPUs). We present a compressed sparse row (CSR) format suitable for unbalanced matrices. We also provide a load-balancing kernel for the coordinate (COO) matrix format and extend it to a hybrid algorithm that stores part of the matrix in SIMD-friendly Ellpack format (ELL) format. The ratio between the ELL- and the COO-part is determined using a theoretical analysis of the nonzeros-per-row distribution. For the over 2,800 test matrices available in the Suite Sparse matrix collection, we compare the performance against SpMV kernels provided by NVIDIA's cuSPARSE library and a heavily-tuned sliced ELL (SELL-P) kernel that prevents unnecessary padding by considering the irregular matrices as a combination of matrix blocks stored in ELL format.

引用

页数：26

共 29 条

[11] An Online Holistic Scheduling Framework for Energy-Constrained Wireless Real-Time Systems [J].

Chantem, Thidapat ;

Yi, Jun ;

Hong, Shengyan ;

Hu, X. Sharon ;

Poellabauer, Christian ;

Zhang, Liqiang .

2011 IEEE 17TH INTERNATIONAL CONFERENCE ON EMBEDDED AND REAL-TIME COMPUTING SYSTEMS AND APPLICATIONS (RTCSA 2011), VOL 1, 2011, :267-276

[12]

Chow E., 2016, 291 LAPACK

[13] Optimizing Sparse Matrix Operations on GPUs using Merge Path [J].

Dalton, Steven ;

Olson, Luke ;

Baxter, Sean ;

Merrill, Duane ;

Garland, Michael .

2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2015, :407-416

[14] Estimating an eigenvector by the power method with a random start [J].

DelCorso, GM .

SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS, 1997, 18 (04) :913-937

[15] Sparse Matrix-Vector Multiplication on GPGPUs [J].

Filippone, Salvatore ;

Cardellini, Valeria ;

Barbieri, Davide ;

Fanfarillo, Alessandro .

ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2017, 43 (04)

[16] Overcoming Load Imbalance for Irregular Sparse Matrices [J].

Flegar, Goran ;

Anzt, Hartwig .

PROCEEDINGS OF IA3 2017: SEVENTH WORKSHOP ON IRREGULAR APPLICATIONS: ARCHITECTURES AND ALGORITHMS, 2017,

[17] Balanced CSR Sparse Matrix-Vector Product on Graphics Processors [J].

Flegar, Goran ;

Quintana-Orti, Enrique S. .

EURO-PAR 2017: PARALLEL PROCESSING, 2017, 10417 :697-709

[18] A Note on Performance Profiles for Benchmarking Software [J].

Gould, Nicholas ;

Scott, Jennifer .

ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2016, 43 (02)

[19]

Grossman Max, 2016, ABS160800636 CORR

[20]

Higham D. J., 2005, MATLAB GUIDE, V150, DOI [10.1137/1.9780898717891, DOI 10.1137/1.9780898717891]

← 1 2 3 →