GPU-Chariot: A Programming Framework for Stream Applications Running on Multi-GPU Systems

Cited by: 4
Authors
Ino, Fumihiko [1]
Nakagawa, Shinta [2]
Hagihara, Kenichi [1]
Affiliations
[1] Osaka Univ, Grad Sch Informat Sci & Technol, Suita, Osaka 5650871, Japan
[2] NEC Corp Ltd, Storage Div, Fuchu, Tokyo 1838501, Japan
Source
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2013 / Vol. E96D / No. 12
Keywords
stream processing; GPGPU; CUDA; task scheduling; GRAPHICS
DOI
10.1587/transinf.E96.D.2604
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents a stream programming framework, named GPU-chariot, for accelerating stream applications running on graphics processing units (GPUs). The main contribution of our framework is that it realizes efficient software pipelines on multi-GPU systems by enabling out-of-order execution of CPU functions, kernels, and data transfers. To achieve this out-of-order execution, we apply a runtime scheduler that not only maximizes the utilization of system resources but also encapsulates the number of GPUs available in the system. In addition, we implement a load-balancing capability to flow data efficiently through multiple GPUs. Furthermore, a callback interface enables overlapping execution of functions in third-party libraries. By using kernels with different performance bottlenecks, we show that our out-of-order execution is up to 20% faster than in-order execution. Finally, we conduct several case studies on a 4-GPU system and demonstrate the advantages of GPU-chariot over a manually pipelined code. We conclude that GPU-chariot can be useful when developing stream applications with software pipelines on multiple GPUs and CPUs.
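The benefit of a software pipeline that the abstract refers to can be illustrated with a toy timing model. The sketch below is not GPU-Chariot's scheduler and does not model its out-of-order execution; it only compares a fully serialized run against an in-order pipeline in which each stage (host-to-device transfer, kernel, device-to-host transfer) occupies its own resource. The function names and the stage costs are hypothetical and chosen purely for illustration.

```python
# Toy model (an assumption for illustration, not the paper's method):
# every data chunk passes through the same ordered stages, and each stage
# runs on its own resource, so different chunks can overlap across stages.

def serial_makespan(chunks):
    """Total time when no two stages ever overlap."""
    return sum(sum(stages) for stages in chunks)


def pipelined_makespan(chunks):
    """Total time for an in-order software pipeline.

    A stage of a chunk starts once (a) the previous stage of the same
    chunk is done and (b) the stage's resource is free of the prior chunk.
    """
    n_stages = len(chunks[0])
    resource_free_at = [0] * n_stages  # when each stage's resource is idle
    for stages in chunks:
        prev_stage_done = 0
        for s, cost in enumerate(stages):
            start = max(resource_free_at[s], prev_stage_done)
            resource_free_at[s] = start + cost
            prev_stage_done = resource_free_at[s]
    return resource_free_at[-1]


# Hypothetical per-chunk costs: (upload, kernel, download) in arbitrary units.
chunks = [(2, 3, 1)] * 4
serial = serial_makespan(chunks)
pipelined = pipelined_makespan(chunks)
print(f"serial={serial} pipelined={pipelined}")
```

In this model the pipelined time is bounded by the slowest stage, which is why kernels with different bottlenecks (transfer-bound versus compute-bound) benefit differently from overlapping.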
Pages: 2604-2616
Number of pages: 13