Compiling Graph Applications for GPUs with GraphIt

被引:11
作者
Brahmakshatriya, Ajay [1 ]
Zhang, Yunming [1 ]
Hong, Changwan [1 ]
Kamil, Shoaib [2 ]
Shun, Julian [1 ]
Amarasinghe, Saman [1 ]
机构
[1] MIT, CSAIL, Cambridge, MA 02139 USA
[2] Adobe Res, San Jose, CA USA
来源
CGO '21: PROCEEDINGS OF THE 2021 IEEE/ACM INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION (CGO) | 2021年
关键词
Compiler Optimizations; Graph Processing; GPUs; Domain-Specific Languages; ALGORITHMS; ANALYTICS;
D O I
10.1109/CGO51591.2021.9370321
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
The performance of graph programs depends highly on the algorithm, the size and structure of the input graphs, as well as the features of the underlying hardware. No single set of optimizations or one hardware platform works well across all settings. To achieve high performance, the programmer must carefully select which set of optimizations and hardware platforms to use. The GraphIt programming language makes it easy for the programmer to write the algorithm once and optimize it for different inputs using a scheduling language. However, GraphIt currently has no support for generating high-performance code for GPUs. Programmers must resort to re-implementing the entire algorithm from scratch in a low-level language with an entirely different set of abstractions and optimizations in order to achieve high performance on GPUs. We propose G2, an extension to the GraphIt compiler framework, that achieves high performance on both CPUs and GPUs using the same algorithm specification. G2 significantly expands the optimization space of GPU graph processing frameworks with a novel GPU scheduling language and compiler that enables combining load balancing, edge traversal direction, active vertexset creation, active vertexset processing ordering, and kernel fusion optimizations. G2 also introduces two performance optimizations, Edge-based Thread Warps CTAs load balancing (ETWC) and EdgeBlocking, to expand the optimization space for GPUs. ETWC improves load balancing by dynamically partitioning the edges of each vertex into blocks that are assigned to threads, warps, and CTAs for execution. EdgeBlocking improves the locality of the program by reordering the edges and restricting random memory accesses to fit within the L2 cache. We evaluate G2 on 5 algorithms and 9 input graphs on both Pascal and Volta generation NVIDIA GPUs, and show that it achieves up to 5.11x speedup over state-of-the-art GPU graph processing frameworks, and is the fastest on 66 out of the 90 experiments.
引用
收藏
页码:248 / 261
页数:14
相关论文
共 50 条
  • [31] A representation for the modules of a graph and applications
    Klein, Sulamita
    Szwarcfiter, Jaime L.
    Journal of the Brazilian Computer Society, 2003, 9 (01) : 9 - 16
  • [32] EMOGI: Efficient Memory-access for Out-of-memory Graph-traversal in GPUs
    Min, Seung Won
    Mailthody, Vikram Sharma
    Qureshi, Zaid
    Xiong, Jinjun
    Ebrahimi, Eiman
    Hwu, Wen-mei
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 14 (02): : 114 - 127
  • [33] DiGraph: An Efficient Path-based Iterative Directed Graph Processing System on Multiple GPUs
    Zhang, Yu
    Liao, Xiaofei
    Jin, Hai
    He, Bingsheng
    Liu, Haikun
    Gu, Lin
    TWENTY-FOURTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS (ASPLOS XXIV), 2019, : 601 - 614
  • [34] OpenMP's Asynchronous Offloading for All-pairs Shortest Path Graph Algorithms on GPUs
    Thavappiragasam, Mathialakan
    Kale, Vivek
    2022 IEEE/ACM INTERNATIONAL WORKSHOP ON HIERARCHICAL PARALLELISM FOR EXASCALE COMPUTING (HIPAR), 2022, : 1 - 11
  • [35] TC-Stream: Large-Scale Graph Triangle Counting on a Single Machine Using GPUs
    Huang, Jianqiang
    Wang, Haojie
    Fei, Xiang
    Wang, Xiaoying
    Chen, Wenguang
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (11) : 3067 - 3078
  • [36] Graph Summarization Methods and Applications: A Survey
    Liu, Yike
    Safavi, Tara
    Dighe, Abhilash
    Koutra, Danai
    ACM COMPUTING SURVEYS, 2018, 51 (03)
  • [37] Technical Survey Graph Databases and Applications
    Wang, Shao-Ting
    Jin, Jennifer
    Rivett, Pete
    Kitazawa, Atsushi
    INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING, 2015, 9 (04) : 523 - 545
  • [38] Distributed evolutionary optimization using Nash games and GPUs - Applications to CFD design problems
    Leskinen, Jyri
    Periaux, Jacques
    COMPUTERS & FLUIDS, 2013, 80 : 190 - 201
  • [39] A novel geometric graph miner and its applications
    Munoz-Briseno, Alfredo
    Lara-Alvarez, Gustavo
    Gago-Alonso, Andres
    Hernandez-Palancar, Jose
    PATTERN RECOGNITION LETTERS, 2016, 84 : 208 - 214
  • [40] Using MAST for modeling and response-time analysis of real-time applications with GPUs
    Gomez, Iosu
    de Cerio, Unai Diaz
    Parra, Jorge
    Rivas, Juan M.
    Gutierrez, J. Javier
    Harbour, Michael Gonzalez
    JOURNAL OF SYSTEMS ARCHITECTURE, 2024, 157