Optimization techniques for sparse matrix-vector multiplication on GPUs

被引：17

作者：

Maggioni, Marco ^{[1
]}

Berger-Wolf, Tanya ^{[1
]}

机构：

[1] Univ Illinois, Dept Comp Sci, 851 S Morgan,Room 1120 SEO, Chicago, IL 60607 USA

来源：

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING | 2016年 / 93-94卷

基金：

美国国家科学基金会;

关键词：

SpMV; Optimization; GPU; Adaptive; AdELL; Blocking; Compression; Unrolling; Auto-tuning;

D O I：

10.1016/j.jpdc.2016.03.011

中图分类号：

TP301 [理论、方法];

学科分类号：

081202 ;

摘要：

Sparse linear algebra is fundamental to numerous areas of applied mathematics, science and engineering. In this paper, we propose an efficient data structure named AdELL+ for optimizing the SpMV kernel on GPUs, focusing on performance bottlenecks of sparse computation. The foundation of our work is an ELL-based adaptive format which copes with matrix irregularity using balanced warps composed using a parametrized warp-balancing heuristic. We also address the intrinsic bandwidth-limited nature of SpMV with warp granularity, blocking, delta compression and nonzero unrolling, targeting both memory footprint and memory hierarchy efficiency. Finally, we introduce a novel online auto-tuning approach that uses a quality metric to predict efficient block factors and that hides preprocessing overhead with useful SpMV computation. Our experimental results show that AdELL+ achieves comparable or better performance over other state-of-the-art SpMV sparse formats proposed in academia (BCCOO) and industry (CSR+ and CSR-Adaptive). Moreover, our auto-tuning approach makes AdELL+ viable for real world applications. (C) 2016 Elsevier Inc. All rights reserved.

引用

页码：66 / 86

页数：21

共 44 条

[1] [Anonymous], NVIDIAS NEXT GEN CUD
[2] [Anonymous], SIAM C DAT MIN
[3] [Anonymous], CUDA: Compute Unified Device Architecture
[4] Asanovic K., 2006, The landscape of parallel computing research: A view from berkeley
[5] Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications
Ashari, Arash
Sedaghati, Naser
Eisenlohr, John
Parthasarathy, Srinivasan
Sadayappan, P.
[J]. SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014, : 781 - 792
[6] Bardsley E, 2014, LECT NOTES COMPUT SC, V8430, P230, DOI 10.1007/978-3-319-06200-6_18
[7] Baskaran M.M., 2009, Optimizing sparse matrix-vector multiplication on gpus
[8] Baxter S., MODERN GPU CODE INTE
[9] Bell N, 2009, STUDENTS GUIDE TO THE MA TESOL, P1
[10] Boisvert RF, 1997, QUALITY OF NUMERICAL SOFTWARE - ASSESSMENT AND ENHANCEMENT, P125

← 1 2 3 4 5 →