Design and initial performance of a high-level unstructured mesh framework on heterogeneous parallel systems

Cited by: 19

Authors:

Mudalige, G. R. [1]

Giles, M. B. [1]

Thiyagalingam, J. [1]

Reguly, I. Z. [1]

Bertolli, C. [2]

Kelly, P. H. J. [3]

Trefethen, A. E. [1]

Affiliations:

[1] Univ Oxford, Oxford E Res Ctr, Oxford OX1 3QG, England

[2] IBM TJ Watson Res Ctr, New York, NY USA

[3] Univ London Imperial Coll Sci Technol & Med, Dept Comp, London, England

Source:

PARALLEL COMPUTING | 2013 / Volume 39 / Issue 11

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

OP2; Domain specific language; Active library; Unstructured mesh; GPU; Heterogeneous systems;

DOI:

10.1016/j.parco.2013.09.004

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

OP2 is a high-level domain specific library framework for the solution of unstructured mesh-based applications. It utilizes source-to-source translation and compilation so that a single application code written using the OP2 API can be transformed into multiple parallel implementations for execution on a range of back-end hardware platforms. In this paper we present the design and performance of OP2's recent developments facilitating code generation and execution on distributed memory heterogeneous systems. OP2 targets the solution of numerical problems based on static unstructured meshes. We discuss the main design issues in parallelizing this class of applications. These include handling data dependencies in accessing indirectly referenced data and design considerations in generating code for execution on a cluster of multi-threaded CPUs and GPUs. Two representative CFD applications, written using the OP2 framework, are utilized to provide a contrasting benchmarking and performance analysis study on a number of heterogeneous systems including a large scale Cray XE6 system and a large GPU cluster. A range of performance metrics are benchmarked including runtime, scalability, achieved compute and bandwidth performance, runtime bottlenecks and system energy consumption. We demonstrate that an application written once at a high level using OP2 is easily portable across a wide range of contrasting platforms and is capable of achieving near-optimal performance without the intervention of the domain application programmer. (C) 2013 Elsevier B.V. All rights reserved.
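The data dependency issue named in the abstract can be illustrated with a small sketch. It is not OP2's API or its implementation; it only assumes a generic edge-to-node mapping in which a loop over edges increments values stored on nodes, so two edges sharing a node would conflict if processed concurrently. One common remedy, shown here as an assumption rather than as the paper's method, is to color the edges so that edges of the same color touch disjoint nodes. All names (`color_edges`, `edge_loop_colored`, `edges`, `weights`) are hypothetical.

```python
from collections import defaultdict


def color_edges(edge_to_node):
    """Greedy coloring: edges that share a node never get the same color."""
    used_at_node = defaultdict(set)  # node -> colors already taken there
    colors = []
    for a, b in edge_to_node:
        c = 0
        while c in used_at_node[a] or c in used_at_node[b]:
            c += 1
        colors.append(c)
        used_at_node[a].add(c)
        used_at_node[b].add(c)
    return colors


def edge_loop_colored(edge_to_node, edge_weight, num_nodes):
    """Accumulate edge weights onto nodes, one color at a time."""
    colors = color_edges(edge_to_node)
    node_sum = [0.0] * num_nodes
    num_colors = max(colors) + 1 if colors else 0
    for c in range(num_colors):
        # Edges of one color touch disjoint nodes, so this inner loop
        # could be executed in parallel without conflicting updates.
        for e, (a, b) in enumerate(edge_to_node):
            if colors[e] == c:
                node_sum[a] += edge_weight[e]  # indirect access via the map
                node_sum[b] += edge_weight[e]
    return node_sum


# Hypothetical 4-node mesh with 5 edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
weights = [1.0, 2.0, 3.0, 4.0, 5.0]
edge_colors = color_edges(edges)
result = edge_loop_colored(edges, weights, 4)
```

The inner loop runs sequentially here; the point of the sketch is only that the coloring makes each color's updates independent of one another.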


Pages: 669-692

Page count: 24

