FPGA-Accelerated Transactional Execution of Graph Workloads

Cited by: 21
Authors
Ma, Xiaoyu [1]
Zhang, Dan [1]
Chiou, Derek [1]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
Source
FPGA'17: PROCEEDINGS OF THE 2017 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS | 2017
Keywords
Graph Application; FPGA Accelerator; Transactional Memory; Throughput Compute; Multi-threaded Architecture;
DOI
10.1145/3020078.3021743
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Many applications that operate on large graphs can be intuitively parallelized by executing a large number of graph operations concurrently, as transactions, to deal with potential conflicts. However, a large number of concurrent operations might incur so many conflicts that they negate the potential benefits of parallelization, which has probably made highly multi-threaded transactional machines seem impractical. Given the large size and topology of many modern graphs, however, such machines can provide real performance, energy efficiency, and programmability benefits. This paper describes an architecture that consists of many lightweight multi-threaded processing engines, a global transactional shared memory, and a work scheduler. We present the challenges of realizing such an architecture, especially the requirement of scalable conflict detection, and propose solutions. We also argue that, despite increased transaction conflicts due to the higher concurrency and single-thread latency, scalable speedup over serial execution can be achieved. We implement the proposed architecture as a synthesizable FPGA RTL design and demonstrate improved per-socket performance (2X) and energy efficiency (22X) compared with a baseline platform that contains two Intel Haswell processors, each with 12 cores.
Pages: 227-236
Page count: 10