Introducing and Implementing the Allpairs Skeleton for Programming Multi-GPU Systems

Cited by: 6

Authors:

Steuwer, Michel [1]

Friese, Malte [1]

Albers, Sebastian [1]

Gorlatch, Sergei [1]

Affiliations:

[1] Univ Munster, Dept Math & Comp Sci, D-48149 Munster, Germany

Source:

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING | 2014 / Vol. 42 / Issue 04

Keywords:

High-level programming models; Algorithmic skeletons; GPU computing; Allpairs computation; SkelCL

DOI:

10.1007/s10766-013-0265-6

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Algorithmic skeletons simplify software development: they abstract typical patterns of parallelism and provide efficient implementations of them, allowing the application developer to focus on the structure of algorithms rather than on implementation details. This becomes especially important for modern parallel systems with multiple graphics processing units (GPUs), whose programming is complex and error-prone because state-of-the-art programming approaches like CUDA and OpenCL lack high-level abstractions. We define a new algorithmic skeleton for allpairs computations, which occur in real-world applications ranging from bioinformatics to physics. We develop the skeleton's generic parallel implementation for multi-GPU systems in OpenCL. To enable the automatic use of the fast GPU memory, we identify and implement an optimized version of the allpairs skeleton for customizing functions that follow a certain memory access pattern. We use matrix multiplication as an application study for the allpairs skeleton and its two implementations and demonstrate that the skeleton greatly simplifies programming, saving up to 90% of lines of code compared with OpenCL. The performance of our optimized implementation is up to 6.8 times higher than that of the generic implementation and is competitive with the performance of manually written, optimized OpenCL code.
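
The allpairs pattern and the matrix multiplication case study named in the abstract can be illustrated with a minimal sequential sketch. This is not the paper's SkelCL/OpenCL implementation and says nothing about its performance. The names `allpairs`, `zip_reduce`, `transpose`, and `matmul` are hypothetical, and reading the "certain memory access pattern" as an element-wise combination followed by a reduction is an assumption of this sketch.

```python
from functools import reduce
from operator import add, mul


def allpairs(f):
    """Return a function that applies f to every pair (row of a, row of b).

    Entry [i][j] of the result is f(a[i], b[j]). Sequential illustration only.
    """
    def apply(a, b):
        return [[f(row_a, row_b) for row_b in b] for row_a in a]
    return apply


def zip_reduce(combine, pairwise):
    """Build a customizing function: combine elements pairwise, then reduce.

    Assumption: this models the restricted access pattern mentioned in the
    abstract; the abstract itself does not spell the pattern out.
    """
    def f(xs, ys):
        return reduce(combine, map(pairwise, xs, ys))
    return f


def transpose(m):
    """Turn the columns of m into rows."""
    return [list(col) for col in zip(*m)]


# Dot product expressed as multiply-then-sum.
dot = zip_reduce(add, mul)


def matmul(a, b):
    """Matrix product as allpairs over rows of a and columns of b."""
    return allpairs(dot)(a, transpose(b))


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Because every output entry depends only on one row from each input, the entries are independent of one another, which is what makes the pattern a natural candidate for parallel execution across one or more GPUs.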


Pages: 601-618

Page count: 18
