Accelerating Equi-Join on a CPU-FPGA Heterogeneous Platform

被引:18
作者
Chen, Ren [1 ]
Prasanna, Viktor K. [1 ]
机构
[1] Univ Southern Calif, Ming Hsieh Dept Elect Engn, Los Angeles, CA 90089 USA
来源
2016 IEEE 24TH ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM) | 2016年
关键词
Keywords Database Operation; Heterogeneous Platform; Hardware Acceleration; CPU-FPGA; Sorting; Join; Selection;
D O I
10.1109/FCCM.2016.62
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Accelerating database applications using FPGAs has recently been an area of growing interest in both academia and industry. Equi-join is one of the key database operations whose performance highly depends on sorting, which exhibits high memory usage on FPGA. A fully pipelined N-key merge sorter consists of log N sorting stages using 0(N) memory totally. For large data sets, external memory has to be employed to perform data buffering between the sorting stages. This introduces pipeline stalls as well as several iterations between FPGA and external memory, causing significant performance degradation. In this paper, we speed-up equi-join using a hybrid CPU-FPGA heterogeneous platform. To alleviate the performance impact of limited memory, we propose a merge sort based hybrid design where the first few sorting stages in the merge sort tree are replaced with "folded" bitonic sorting networks. These "folded" bitonic sorting networks operate in parallel on the FPGA. The partial results are then merged on the CPU to produce the final sorted result. Based on this hybrid sorting design, we develop two streaming join algorithms by optimizing the classic CPU-based nested-loop join and sort-merge join algorithms. On a range of data set sizes, our design achieves throughput improvement of 3.1x and 1.9x compared with software-only and FPGA only implementations, respectively. Our design sustains 21.6% of the peak bandwidth, which is 3.9x utilization obtained by the stateof-the-art FPGA equi-join implementation.
引用
收藏
页码:212 / 219
页数:8
相关论文
共 16 条
[1]  
[Anonymous], 2009, PROC VLDB ENDOW
[2]  
[Anonymous], 2015, P 2015 ACM SIGDA INT
[3]  
Batcher K.E., 1968, P AFIPS SPRING JOINT, P307, DOI DOI 10.1145/1468075.1468121
[4]  
Blanas S., 2013, P 4 ANN S CLOUD COMP
[5]   STORAGE AND ACCESS IN RELATIONAL DATA-BASES [J].
BLASGEN, MW ;
ESWARAN, KP .
IBM SYSTEMS JOURNAL, 1977, 16 (04) :363-377
[6]  
Casper J., 2014, P ACM SIGDA FPGA
[7]  
Chen Ren., 2013, 2013 IEEE High Performance Extreme Computing Conference (HPEC), P1, DOI DOI 10.1109/HPEC.2013.6670343
[8]   A Cloud Task Scheduling Algorithm Based on Users' Satisfaction [J].
Chen, Rongxian ;
Zhang, Yaying ;
Zhang, Dongdong .
2013 FOURTH INTERNATIONAL CONFERENCE ON NETWORKING AND DISTRIBUTED COMPUTING (ICNDC), 2013, :1-5
[9]  
Kaldewey T., 2012, DaMoN, DOI DOI 10.1145/2236584.2236592
[10]   Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs [J].
Kim, Changkyu ;
Sedlar, Eric ;
Chhugani, Jatin ;
Kaldewey, Tim ;
Nguyen, Anthony D. ;
Di Bias, Andrea ;
Lee, Victor W. ;
Satish, Nadathur ;
Dubey, Pradeep .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2009, 2 (02) :1378-1389