A Pattern Based Algorithmic Autotuner for Graph Processing on GPUs

被引：26

作者：

Meng, Ke ^{[1
,2
]}

Li, Jiajia ^{[3
]}

Tan, Guangming ^{[1
,2
]}

Sun, Ninghui ^{[1
,2
]}

机构：

[1] Chinese Acad Sci, State Key Lab Comp Architecture, Inst Comp Technol, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Beijing, Peoples R China

[3] Pacific Northwest Natl Lab, Richland, WA 99352 USA

来源：

PROCEEDINGS OF THE 24TH SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '19) | 2019年

基金：

中国国家自然科学基金;

关键词：

GPU; Graph processing; Auto-tuning; IMPLEMENTATION; DRIVEN; MODEL;

D O I：

10.1145/3293883.3295716

中图分类号：

TP31 [计算机软件];

学科分类号：

081202 ; 0835 ;

摘要：

This paper proposes GSWITCH, a pattern-based algorithmic auto-tuning system that dynamically switches between optimization variants with negligible overhead. Its novelty lies in a small set of algorithmic patterns that allow for the configurable assembly of variants of the algorithm. The fast transition of GSWITCH is based on a machine learning model trained using 644 real graphs. Moreover, GSWITCH provides a simple programming interface that conceals low-level tuning details from the user. We evaluate GSWITCH on typical graph algorithms (BFS, CC, PR, SSSP, and BC) using Nvidia Kepler and Pascal GPUs. The results show that GSWITCH runs up to 10x faster than the best configuration of the state-ofthe-art programmable GPU-based graph processing libraries on 10 representative graphs. GSWITCH outperforms Gunrock on 92.4% cases of 644 graphs which is the largest dataset evaluation reported to date.

引用

页码：201 / 213

页数：13

共 64 条

[11] Model-driven Autotuning of Sparse Matrix-Vector Multiply on GPUs
Choi, Jee W.
Singh, Amik
Vuduc, Richard W.
[J]. ACM SIGPLAN NOTICES, 2010, 45 (05) : 115 - 125
[12] Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths
Davidson, Andrew
Baxter, Sean
Garland, Michael
Owens, John D.
[J]. 2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014,
[13] Ding YF, 2015, ACM SIGPLAN NOTICES, V50, P379, DOI [10.1145/2737924.2737969, 10.1145/2813885.2737969]
[14] Domke J., 2011, Proceedings of the 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2011), P616, DOI 10.1109/IPDPS.2011.65
[15] The design and implementation of FFTW3
Frigo, M
Johnson, SG
[J]. PROCEEDINGS OF THE IEEE, 2005, 93 (02) : 216 - 231
[16] Fu Z., 2014, P WORKSH GRAPH DAT M, P1, DOI [DOI 10.1145/2621934.2621936, 10.1145/2621934.2621936]
[17] Gadepally V, 2016, BIG DATA: STORAGE, SHARING, AND SECURITY, P15
[18] Parallel De Bruijn Graph Construction and Traversal for De Novo Genome Assembly
Georganas, Evangelos
Buluc, Aydin
Chapman, Jarrod
Oliker, Leonid
Rokhsar, Daniel
Yelick, Katherine
[J]. SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014, : 437 - 448
[19] Gharaibeh A, 2012, INT CONFER PARA, P345
[20] Gonzalez Joseph E., 2012, P OSDI 2012

← 1 2 3 4 5 6 7 →