ScalaGraph: A Scalable Accelerator for Massively Parallel Graph Processing

Cited by: 17
Authors
Yao, Pengcheng [1]
Zheng, Long [1]
Huang, Yu [1]
Wang, Qinggang [1]
Gui, Chuangyi [1]
Zeng, Zhen [1]
Liao, Xiaofei [1]
Jin, Hai [1]
Xue, Jingling [2]
Affiliations
[1] Huazhong Univ Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Sch Comp Sci & Technol, Serv Comp Technol & Syst Lab, Cluster & Grid Comp, Wuhan 430074, Peoples R China
[2] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW 2052, Australia
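Source
2022 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2022) | 2022
Funding
National Natural Science Foundation of China;
Keywords
graph processing; accelerator; scalability;
DOI
10.1109/HPCA53966.2022.00023
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract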
Graph processing is a promising way to extract valuable insights from graphs. Emerging 3D-stacked memories and silicon technologies can now provide terabytes per second of memory bandwidth and thousands of processing elements (PEs) to meet the high hardware demands of graph applications. However, this leap in hardware capability does not translate into a large performance gain for graph processing, and sometimes even degrades performance. In this paper, we find that the centralized on-chip memory hierarchy adopted in existing graph accelerators is the main cause of poor scalability, because its hardware overhead grows quadratically with the number of PEs. We present a novel distributed on-chip memory hierarchy that leverages the network-on-chip (NoC) to enable massively parallel graph processing. We architect ScalaGraph, a new graph processing accelerator, to exploit this insight. ScalaGraph adopts a software-hardware co-design that minimizes NoC communication overheads via an efficient row-oriented dataflow mapping and runtime aggregation. A specialized scheduling mechanism is also proposed to mitigate load imbalance. Our results on a Xilinx Alveo U280 FPGA card show that ScalaGraph, in a modest configuration of 512 PEs, achieves 2.2x and 3.2x speedups over GraphDynS, a state-of-the-art graph accelerator, and Gunrock, a GPU-based graph system, respectively. Moreover, ScalaGraph supports at least 1,024 PEs with nearly linear performance scaling, whereas GraphDynS fails to work at that scale.
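The quadratic-overhead claim can be illustrated with a rough counting sketch. It assumes the centralized hierarchy connects every PE to every on-chip memory bank through a full crossbar, and that the distributed alternative is a 2D mesh NoC; the abstract names neither topology, and the function names are hypothetical. The counts are structural only (crosspoints and links), not area or timing figures from the paper.

```python
def crossbar_crosspoints(num_pes: int, num_banks: int) -> int:
    """Crosspoints in a full crossbar linking every PE to every bank (assumed topology)."""
    return num_pes * num_banks


def mesh_links(rows: int, cols: int) -> int:
    """Bidirectional neighbor links in a rows x cols 2D mesh (assumed NoC topology)."""
    horizontal = rows * (cols - 1)
    vertical = cols * (rows - 1)
    return horizontal + vertical


if __name__ == "__main__":
    # One memory bank per PE is an illustrative assumption, not a figure from the paper.
    for rows, cols in [(8, 16), (16, 32), (32, 32)]:
        num_pes = rows * cols
        print(
            f"PEs={num_pes:5d}  "
            f"crossbar crosspoints={crossbar_crosspoints(num_pes, num_pes):8d}  "
            f"mesh links={mesh_links(rows, cols):5d}"
        )
```

Doubling the PE count roughly quadruples the crossbar count but only about doubles the mesh link count, which is the kind of gap the abstract attributes to centralized versus distributed on-chip memory hierarchies.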
Pages: 199-212
Page count: 14