Auto-tuning of Sparse Matrix-Vector Multiplication on Graphics Processors

Times cited: 0
Authors
Abu-Sufah, Walid [1, 2]
Karim, Asma Abdel [2]
Affiliations
[1] University of Illinois, Urbana, IL 61801, USA
[2] University of Jordan, Amman, Jordan
Source
SUPERCOMPUTING (ISC 2013), 2013, Vol. 7905
Funding
European Union Seventh Framework Programme (FP7)
Keywords
SpMV; GPUs; Auto-tuning; sparse linear algebra; CUDA
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline code
081202
Abstract
We present a heuristics-based auto-tuner for sparse matrix-vector multiplication (SpMV) on GPUs. For a given sparse matrix, our framework delivers a high-performance SpMV kernel. That kernel combines the most effective storage format with code parameters tuned for the underlying GPU architecture. We use 250 matrices from 23 application areas to develop heuristics that prune the auto-tuning search space. For performance evaluation, we use 59 matrices from 12 application areas and several different NVIDIA GPUs. Kernels delivered by our framework achieve a maximum speedup of 7x over NVIDIA library kernels. For most matrices, their performance is within 1% of the kernels found by exhaustive search. Compared with exhaustive-search auto-tuning, our framework can be more than an order of magnitude faster.
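The abstract does not spell out the heuristics themselves, so the following is only a minimal sketch of the general idea under stated assumptions. The search space is limited to two storage formats (CSR and ELL) and a hypothetical "threads per row" launch parameter. A hypothetical pruning rule based on row-length statistics shrinks the (format, parameter) space before each remaining configuration is benchmarked. Several things here are illustrative assumptions, not the authors' method:

- all function names;
- the thresholds `cv_limit` and `pad_limit`;
- the parameter values.

Here, "exhaustive search" means benchmarking every configuration in `full_space()`.

```python
from statistics import mean, pstdev

# Hypothetical search space: storage format x threads assigned per matrix row.
FORMATS = ("csr", "ell")
THREADS_PER_ROW = (1, 2, 4, 8, 16, 32)


def spmv_csr(row_ptr, col_idx, vals, x):
    """y = A x with A in CSR: row i owns entries row_ptr[i]..row_ptr[i+1]-1."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def csr_to_ell(row_ptr, col_idx, vals):
    """Pad every row to the longest row length (ELLPACK layout)."""
    n = len(row_ptr) - 1
    width = max((row_ptr[i + 1] - row_ptr[i] for i in range(n)), default=0)
    ell_cols = [[0] * width for _ in range(n)]
    ell_vals = [[0.0] * width for _ in range(n)]  # padding slots hold zeros
    for i in range(n):
        for j, k in enumerate(range(row_ptr[i], row_ptr[i + 1])):
            ell_cols[i][j] = col_idx[k]
            ell_vals[i][j] = vals[k]
    return ell_cols, ell_vals


def spmv_ell(ell_cols, ell_vals, x):
    """y = A x with A in ELL; each padded slot adds 0.0 * x[0]."""
    return [sum(v * x[c] for c, v in zip(cols, vs))
            for cols, vs in zip(ell_cols, ell_vals)]


def row_length_stats(row_ptr):
    """Return (max row length, mean row length, coefficient of variation)."""
    lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
    if not lengths:
        return 0, 0.0, 0.0
    mu = mean(lengths)
    cv = pstdev(lengths) / mu if mu > 0 else 0.0
    return max(lengths), mu, cv


def full_space():
    """Exhaustive search space: every (format, threads_per_row) pair."""
    return [(f, t) for f in FORMATS for t in THREADS_PER_ROW]


def pruned_space(row_ptr, cv_limit=0.5, pad_limit=2.0):
    """Hypothetical pruning heuristic driven by row-length statistics."""
    max_len, mu, cv = row_length_stats(row_ptr)
    # Keep ELL only when rows are regular enough that padding stays cheap.
    allow_ell = mu > 0 and cv <= cv_limit and max_len / mu <= pad_limit
    # Skip thread counts much larger than the average row length.
    threads = [t for t in THREADS_PER_ROW if t <= max(1.0, 2 * mu)]
    formats = FORMATS if allow_ell else ("csr",)
    return [(f, t) for f in formats for t in threads]


def autotune(space, benchmark):
    """Return the configuration with the lowest cost reported by benchmark."""
    return min(space, key=benchmark)
```

In a real GPU auto-tuner, `benchmark` would launch and time the kernel matching each configuration. Here any callable that maps a configuration to a cost will do. Take a 4x4 tridiagonal matrix with row lengths 2, 3, 3, 2. The pruned space keeps 6 of the 12 configurations. For a matrix with one long row, the padding test drops ELL entirely.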
Pages: 151-164
Number of pages: 14