Optimization techniques for sparse matrix-vector multiplication on GPUs

被引:17
作者
Maggioni, Marco [1 ]
Berger-Wolf, Tanya [1 ]
机构
[1] Univ Illinois, Dept Comp Sci, 851 S Morgan,Room 1120 SEO, Chicago, IL 60607 USA
基金
美国国家科学基金会;
关键词
SpMV; Optimization; GPU; Adaptive; AdELL; Blocking; Compression; Unrolling; Auto-tuning;
D O I
10.1016/j.jpdc.2016.03.011
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Sparse linear algebra is fundamental to numerous areas of applied mathematics, science and engineering. In this paper, we propose an efficient data structure named AdELL+ for optimizing the SpMV kernel on GPUs, focusing on performance bottlenecks of sparse computation. The foundation of our work is an ELL-based adaptive format which copes with matrix irregularity using balanced warps composed using a parametrized warp-balancing heuristic. We also address the intrinsic bandwidth-limited nature of SpMV with warp granularity, blocking, delta compression and nonzero unrolling, targeting both memory footprint and memory hierarchy efficiency. Finally, we introduce a novel online auto-tuning approach that uses a quality metric to predict efficient block factors and that hides preprocessing overhead with useful SpMV computation. Our experimental results show that AdELL+ achieves comparable or better performance over other state-of-the-art SpMV sparse formats proposed in academia (BCCOO) and industry (CSR+ and CSR-Adaptive). Moreover, our auto-tuning approach makes AdELL+ viable for real world applications. (C) 2016 Elsevier Inc. All rights reserved.
引用
收藏
页码:66 / 86
页数:21
相关论文
共 44 条
  • [1] [Anonymous], NVIDIAS NEXT GEN CUD
  • [2] [Anonymous], SIAM C DAT MIN
  • [3] [Anonymous], CUDA: Compute Unified Device Architecture
  • [4] Asanovic K., 2006, The landscape of parallel computing research: A view from berkeley
  • [5] Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications
    Ashari, Arash
    Sedaghati, Naser
    Eisenlohr, John
    Parthasarathy, Srinivasan
    Sadayappan, P.
    [J]. SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014, : 781 - 792
  • [6] Bardsley E, 2014, LECT NOTES COMPUT SC, V8430, P230, DOI 10.1007/978-3-319-06200-6_18
  • [7] Baskaran M.M., 2009, Optimizing sparse matrix-vector multiplication on gpus
  • [8] Baxter S., MODERN GPU CODE INTE
  • [9] Bell N, 2009, STUDENTS GUIDE TO THE MA TESOL, P1
  • [10] Boisvert RF, 1997, QUALITY OF NUMERICAL SOFTWARE - ASSESSMENT AND ENHANCEMENT, P125