A Pattern Based Algorithmic Autotuner for Graph Processing on GPUs

被引:26
作者
Meng, Ke [1 ,2 ]
Li, Jiajia [3 ]
Tan, Guangming [1 ,2 ]
Sun, Ninghui [1 ,2 ]
机构
[1] Chinese Acad Sci, State Key Lab Comp Architecture, Inst Comp Technol, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Pacific Northwest Natl Lab, Richland, WA 99352 USA
来源
PROCEEDINGS OF THE 24TH SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '19) | 2019年
基金
中国国家自然科学基金;
关键词
GPU; Graph processing; Auto-tuning; IMPLEMENTATION; DRIVEN; MODEL;
D O I
10.1145/3293883.3295716
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
This paper proposes GSWITCH, a pattern-based algorithmic auto-tuning system that dynamically switches between optimization variants with negligible overhead. Its novelty lies in a small set of algorithmic patterns that allow for the configurable assembly of variants of the algorithm. The fast transition of GSWITCH is based on a machine learning model trained using 644 real graphs. Moreover, GSWITCH provides a simple programming interface that conceals low-level tuning details from the user. We evaluate GSWITCH on typical graph algorithms (BFS, CC, PR, SSSP, and BC) using Nvidia Kepler and Pascal GPUs. The results show that GSWITCH runs up to 10x faster than the best configuration of the state-ofthe-art programmable GPU-based graph processing libraries on 10 representative graphs. GSWITCH outperforms Gunrock on 92.4% cases of 644 graphs which is the largest dataset evaluation reported to date.
引用
收藏
页码:201 / 213
页数:13
相关论文
共 64 条
  • [11] Model-driven Autotuning of Sparse Matrix-Vector Multiply on GPUs
    Choi, Jee W.
    Singh, Amik
    Vuduc, Richard W.
    [J]. ACM SIGPLAN NOTICES, 2010, 45 (05) : 115 - 125
  • [12] Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths
    Davidson, Andrew
    Baxter, Sean
    Garland, Michael
    Owens, John D.
    [J]. 2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014,
  • [13] Ding YF, 2015, ACM SIGPLAN NOTICES, V50, P379, DOI [10.1145/2737924.2737969, 10.1145/2813885.2737969]
  • [14] Domke J., 2011, Proceedings of the 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2011), P616, DOI 10.1109/IPDPS.2011.65
  • [15] The design and implementation of FFTW3
    Frigo, M
    Johnson, SG
    [J]. PROCEEDINGS OF THE IEEE, 2005, 93 (02) : 216 - 231
  • [16] Fu Z., 2014, P WORKSH GRAPH DAT M, P1, DOI [DOI 10.1145/2621934.2621936, 10.1145/2621934.2621936]
  • [17] Gadepally V, 2016, BIG DATA: STORAGE, SHARING, AND SECURITY, P15
  • [18] Parallel De Bruijn Graph Construction and Traversal for De Novo Genome Assembly
    Georganas, Evangelos
    Buluc, Aydin
    Chapman, Jarrod
    Oliker, Leonid
    Rokhsar, Daniel
    Yelick, Katherine
    [J]. SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014, : 437 - 448
  • [19] Gharaibeh A, 2012, INT CONFER PARA, P345
  • [20] Gonzalez Joseph E., 2012, P OSDI 2012