Exploring Fine-Grained Task-based Execution on Multi-GPU Systems

Cited by: 16
Authors
Chen, Long [1]
Villa, Oreste [2]
Gao, Guang R. [3]
Affiliations
[1] Qualcomm Inc, San Diego, CA 92121 USA
[2] Pacific NW Natl Lab, Richland, WA 99352 USA
[3] Univ Delaware, Newark, DE 19716 USA
Source
2011 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER) | 2011
Keywords
fine-grained; task; GPGPU; multi-GPU; dynamic load balance; VISUALIZATION;
DOI
10.1109/CLUSTER.2011.50
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Multi-GPU systems, including GPU clusters, are gaining popularity in scientific computing. However, when multiple GPUs are used concurrently, conventional data-parallel GPU programming paradigms, e.g., CUDA, cannot satisfactorily address certain issues, such as load balancing, GPU resource utilization, and overlapping fine-grained computation with communication. In this paper, we present a fine-grained task-based execution framework for multi-GPU systems. By scheduling finer-grained tasks than the conventional CUDA programming method supports among multiple GPUs, and by allowing concurrent task execution on a single GPU, our framework provides a means of solving the above issues and of utilizing multi-GPU systems efficiently. Experiments with a molecular dynamics application show that, for non-uniformly distributed workloads, solutions based on our framework achieve good load balance and considerable performance improvement over solutions based on standard CUDA programming methodologies.
Pages: 386-394
Number of pages: 9