Large-scale Distributed Sorting for GPU-based Heterogeneous Supercomputers

被引：0

作者：

Shamoto, Hideyuki ^{[1
,2
]}

Shirahata, Koichi ^{[1
,2
]}

Drozd, Aleksandr ^{[1
,2
]}

Sato, Hitoshi ^{[1
,2
]}

Matsuoka, Satoshi ^{[1
,2
]}

机构：

[1] Tokyo Inst Technol, Tokyo, Japan

[2] Japan Sci & Technol Agcy, CREST, Tokyo, Japan

来源：

2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2014年

关键词：

Sorting; Distributed Systems; GPGPU; Hybrid Programming; Big Data Applications;

D O I：

暂无

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

Splitter-based parallel sorting algorithms are known to be highly efficient for distributed sorting due to their low communication complexity. Although using GPU accelerators could help to reduce the computation cost in general, their effectiveness in distributed sorting algorithms on large-scale heterogeneous GPU-based systems remains unclear. We investigate applicability of using GPU devices to the splitter-based algorithms and extend HykSort, an existing splitter-based algorithm by offloading costly computation phases to GPUs. We also handle GPU memory overflows by introducing an iterative approach which sorts multiple chunks and merges them into one array. We evaluate the performance of our implementation with local sort acceleration on the TSUBAME2.5 supercomputer that comprises over 4000 NVIDIA K20x GPUs. Performance evaluation of weak scaling shows that we achieve 389 times speedup with 0.25TB/s throughput when sorting 4TB 64bit integer on 1024 nodes compared to running on 1 node; on the other hand, for CPU vs. GPU comparison, our implementation achieves only 1.40 times speedup using 1024 nodes. Detailed analysis however reveals that the limitation is almost entirely due to the bottleneck in CPU-GPU host-to-device bandwidth. With orders of magnitude improvements planned for next generation GPUs, the performance boost will be tremendous in accordance with other successful GPU accelerations.

引用

页码：510 / 518

页数：9

共 25 条

[1]

[Anonymous], PAR DISTR PROC WORKS

[2]

[Anonymous], P 3 SPAA

[3]

[Anonymous], 2011, GPU Computing Gems

[4]

[Anonymous], 2012, P 26 ACM INT C SUP, DOI 10.1145/2304576.2304621

[5]

Batcher Kenneth E., 1968, P APRIL 30 MAY2 1968, V32, P307, DOI [DOI 10.1145/1468075.1468121, 10.1145/1468075.1468121]

[6]

Cederman D, 2008, LECT NOTES COMPUT SC, V5193, P246, DOI 10.1007/978-3-540-87744-8_21

[7]

CHENG DR, 2006, ALENEX UNPUB

[8]

Davidson A., 2012, 2012 INNOVATIVE PARA, P1, DOI DOI 10.1109/INPAR.2012.6339592

[9]

Hanrahan P., 2006, P 2006 ACM SIGMOD IN, P721, DOI DOI 10.1145/1142473.1142560

[10]

JEON M, 2006, HIGH PERFORMANCE COM, P25

← 1 2 3 →