Providing Source Code Level Portability Between CPU and GPU with MapCG

Cited: 7
Authors
Hong, Chun-Tao [1 ]
Chen, De-Hao [1 ]
Chen, Yu-Bei [2 ]
Chen, Wen-Guang [1 ]
Zheng, Wei-Min [1 ]
Lin, Hai-Bo [3 ]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
[3] IBM China Res Lab, Beijing 100094, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
portability; parallel; GPU programming;
DOI
10.1007/s11390-012-1205-4
CLC Number
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Graphics processing units (GPUs) have taken an important role in the general-purpose computing market in recent years. At present, the common approach to programming GPUs is to write GPU-specific code with low-level GPU APIs such as CUDA. Although this approach can achieve good performance, it creates serious portability issues: programmers are required to write a specific version of the code for each potential target architecture, which results in high development and maintenance costs. We believe it is desirable to have a programming model that provides source code portability between CPUs and GPUs, as well as between different GPUs, allowing programmers to write one version of the code that can be compiled and executed efficiently on either CPUs or GPUs without modification. In this paper, we propose MapCG, a MapReduce framework that provides source code level portability between CPUs and GPUs. In contrast to other approaches such as OpenCL, our framework is based on MapReduce, which offers a high-level programming model and makes programming much easier. We describe the design of MapCG, including the MapReduce-style high-level programming framework and the runtime system on the CPU and GPU. A prototype of the MapCG runtime, supporting multi-core CPUs and NVIDIA GPUs, was implemented. Our experimental results show that this implementation can execute the same source code efficiently on multi-core CPU platforms and on GPUs, achieving an average speedup of 1.6~2.5x over previous MapReduce implementations on eight commonly used applications.
Pages: 42-56 (15 pages)