TDGraph: A Topology-Driven Accelerator for High-Performance Streaming Graph Processing

被引:17
作者
Zhao, Jin [1 ]
Yang, Yun [1 ]
Zhang, Yu [1 ]
Liao, Xiaofei [1 ]
Gu, Lin [1 ]
He, Ligang [2 ]
He, Bingsheng [3 ]
Jin, Hai [1 ]
Liu, Haikun [1 ]
Jiang, Xinyu [1 ]
Yu, Hui [1 ]
机构
[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab,Natl Engn Res Ctr Big Dat, Wuhan, Peoples R China
[2] Univ Warwick, Dept Comp Sci, Coventry, W Midlands, England
[3] Natl Univ Singapore, Singapore, Singapore
来源
PROCEEDINGS OF THE 2022 THE 49TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '22) | 2022年
基金
中国国家自然科学基金;
关键词
streaming graphs; incremental computation; state propagation; accelerator; many-core processor;
D O I
10.1145/3470496.3527409
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Many solutions have been recently proposed to support the processing of streaming graphs. However, for the processing of each graph snapshot of a streaming graph, the newstates of the vertices affected by the graph updates are propagated irregularly along the graph topology. Despite the years' research efforts, existing approaches still suffer from the serious problems of redundant computation overhead and irregular memory access, which severely underutilizes a many-core processor. To address these issues, this paper proposes a topology-driven programmable accelerator TDGraph, which is the first accelerator to augment the many-core processors to achieve high performance processing of streaming graphs. Specifically, we propose an efficient topology-driven incremental execution approach into the accelerator design for more regular state propagation and better data locality. TDGraph takes the vertices affected by graph updates as the roots to prefetch other vertices along the graph topology and synchronizes the incremental computations of them on the fly. In this way, most state propagations originated from multiple vertices affected by different graph updates can be conducted together along the graph topology, which help reduce the redundant computations and data access cost. Besides, through the efficient coalescing of the accesses to vertex states, TDGraph further improves the utilization of the cache and memory bandwidth. We have evaluated TDGraph on a simulated 64-core processor. The results show that, the state-of-the-art software system achieves the speedup of 7.1 similar to 21.4 times after integrating with TDGraph, while incurring only 0.73% area cost. Compared with four cutting-edge accelerators, i.e., HATS, Minnow, PHI, and DepGraph, TDGraph gains the speedups of 4.6 similar to 12.7, 3.2 similar to 8.6, 3.8 similar to 9.7, and 2.3 similar to 6.1 times, respectively.
引用
收藏
页码:116 / 129
页数:14
相关论文
共 76 条
[1]   A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing [J].
Ahn, Junwhan ;
Hong, Sungpack ;
Yoo, Sungjoo ;
Mutlu, Onur ;
Choi, Kiyoung .
2015 ACM/IEEE 42ND ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2015, :105-117
[2]   Software Prefetching for Indirect Memory Accesses: A Microarchitectural Perspective [J].
Ainsworth, Sam ;
Jones, Timothy M. .
ACM TRANSACTIONS ON COMPUTER SYSTEMS, 2019, 36 (03)
[3]  
Ainsworth S, 2018, ACM SIGPLAN NOTICES, V53, P578, DOI [10.1145/3173162.3173189, 10.1145/3296957.3173189]
[4]  
Ainsworth Sam, 2016, P 2016 INT C SUP ICS, DOI [10.1145/2925426.2926254, DOI 10.1145/2925426.2926254, 10.1145/ 2925426.2926254]
[5]  
[Anonymous], 2022, SNAP
[6]  
[Anonymous], 2013, ISCA
[7]  
[Anonymous], 2012, P 7 ACM EUR C COMP S
[8]  
[Anonymous], 2022, MACSIM
[9]  
[Anonymous], DDR4 SDRAM SYSTEM PO
[10]   Large-Scale Graph Processing on FPGAs with Caches for Thousands of Simultaneous Misses [J].
Asiatici, Mikhail ;
Ienne, Paolo .
2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021), 2021, :609-622