GPUrpc: Exploring Transparent Access to Remote GPUs

Cited by: 1
Authors
Iida, Yuki [1]
Fujii, Yusuke [1]
Azumi, Takuya [2, 4]
Nishio, Nobuhiko [1]
Kato, Shinpei [3, 5]
Affiliations
[1] Ritsumeikan Univ, Grad Sch Informat Sci & Engn, 1-1-1 Noji Higashi, Kusatsu, Shiga 5258577, Japan
[2] Osaka Univ, Grad Sch Engn Sci, Toyonaka, Osaka, Japan
[3] Nagoya Univ, Grad Sch Informat Sci, Nagoya, Aichi, Japan
[4] Osaka Univ, Sch Informat Sci & Engn, 1-3 Machikaneyama Cho, Toyonaka, Osaka 5608531, Japan
[5] Nagoya Univ, Grad Sch Informat Sci, Chikusa Ku, Furo Cho, Nagoya, Aichi 4648603, Japan
Keywords
GPU; parallel computing; cloud computing; distributed computing; high performance computing; performance
DOI
10.1145/2950056
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Graphics processing units (GPUs) are increasingly used for high-performance computing. Programming frameworks for general-purpose computing on GPUs (GPGPU), such as CUDA and OpenCL, are also maturing. Driving this trend is the recent proliferation of mobile devices such as smartphones and wearable computers. These devices increasingly run computationally intensive applications that involve some form of environmental recognition, such as augmented reality (AR) or voice recognition. However, devices with low computational power cannot satisfy such demanding computing requirements. The CPU load of these devices could be reduced by offloading computation onto GPUs in the cloud. This paper presents GPUrpc, a remote procedure call (RPC) extension to Gdev, which is a rich set of runtime libraries and device drivers for achieving first-class GPU resource management. GPUrpc allows developers to use CUDA for GPGPU development work. Existing research uses RPCs based on the CUDA application programming interfaces (APIs); hence, all CUDA API calls require communication. To reduce communication overhead, we use an RPC based on an API at a lower level than the CUDA API, which reduces the number of API calls that require communication. Our evaluation, conducted on Linux and NVIDIA GPUs, shows that the basic performance of our prototype implementation is reliable in comparison with the existing method. Evaluation using the Rodinia benchmark suite, which is designed for research in heterogeneous parallel computing, showed that GPUrpc is effective for applications such as image processing and data mining. GPUrpc can also reduce power consumption to approximately 1/6 that of CPU processing when performing 512 × 512 matrix multiplication.
Pages: 25