Automatic Mapping of Parallel Pattern-Based Algorithms on Heterogeneous Architectures

Cited by: 2
Authors
Truemper, Lukas [1 ,2 ]
Miller, Julian [2 ]
Terboven, Christian [2 ]
Mueller, Matthias S. [2 ]
Affiliations
[1] Huddly AS, Oslo, Norway
[2] Rhein Westfal TH Aachen, IT Ctr, Chair High Performance Comp, Aachen, Germany
Source
ARCHITECTURE OF COMPUTING SYSTEMS (ARCS 2021) | 2021 / Vol. 12800
Keywords
Mapping; Heterogeneous architectures; Global transformations; Parallel patterns; Performance portability; MODEL;
DOI
10.1007/978-3-030-81682-7_4
CLC Classification
TP3 [Computing technology, computer technology]
Subject Classification Code
0812
Abstract
Nowadays, specialized hardware is often found in clusters to improve compute performance and energy efficiency. Porting and tuning scientific codes for these heterogeneous clusters requires significant development effort. To mitigate this effort while maintaining high performance, modern parallel programming models introduce a second layer of abstraction, in which an architecture-agnostic source code can be maintained and automatically optimized for the target architecture. However, with increasing heterogeneity, mapping an application to a specific architecture itself becomes a complex decision that requires a differentiated consideration of processor features and algorithmic properties. Furthermore, architecture-agnostic global transformations are necessary to maximize the simultaneous utilization of different processors. We therefore introduce a combinatorial optimization approach that globally transforms parallel algorithms and automatically maps them to heterogeneous architectures. We derive a global transformation and mapping algorithm that is based on a static performance model. Moreover, we demonstrate the approach on five typical algorithmic kernels, showing automatic global transformations such as loop fusion, re-ordering, pipelining, and NUMA awareness, as well as optimal mapping strategies for an exemplary CPU-GPU compute node. Our algorithm achieves performance on par with hand-tuned implementations of all five kernels.
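The abstract describes two ingredients: global transformations (e.g. fusing adjacent pattern instances) and a combinatorial mapping of kernels to processors guided by a static performance model. The following Python sketch illustrates the general idea only; all names, the additive cost model, and the fixed transfer penalty are this example's assumptions, not the paper's actual algorithm or API.

```python
# Hypothetical sketch: fuse adjacent "map" patterns, then exhaustively
# assign each (fused) kernel to CPU or GPU using static cost estimates,
# charging a fixed penalty when consecutive kernels switch devices.
# All names and numbers are illustrative assumptions.
from itertools import product

def fuse_maps(kernels):
    """Fuse runs of adjacent elementwise 'map' patterns into one kernel."""
    fused = []
    for k in kernels:
        if fused and fused[-1]["pattern"] == "map" and k["pattern"] == "map":
            prev = fused[-1]
            fused[-1] = {
                "name": prev["name"] + "+" + k["name"],
                "pattern": "map",
                # assumed additive cost model for a fused loop body
                "cost": {d: prev["cost"][d] + k["cost"][d] for d in prev["cost"]},
            }
        else:
            fused.append(dict(k))
    return fused

def best_mapping(kernels, transfer_cost=2.0):
    """Brute-force the combinatorial device assignment minimizing
    estimated compute cost plus inter-device transfer cost."""
    best = None
    for assign in product(["cpu", "gpu"], repeat=len(kernels)):
        cost = sum(k["cost"][d] for k, d in zip(kernels, assign))
        cost += transfer_cost * sum(a != b for a, b in zip(assign, assign[1:]))
        if best is None or cost < best[0]:
            best = (cost, assign)
    return best

# Toy pipeline: two elementwise maps followed by a reduction.
kernels = [
    {"name": "scale", "pattern": "map",    "cost": {"cpu": 4.0, "gpu": 1.0}},
    {"name": "shift", "pattern": "map",    "cost": {"cpu": 3.0, "gpu": 1.0}},
    {"name": "sum",   "pattern": "reduce", "cost": {"cpu": 2.0, "gpu": 5.0}},
]
fused = fuse_maps(kernels)            # "scale" and "shift" fuse into one kernel
cost, assign = best_mapping(fused)
print([k["name"] for k in fused], assign, cost)
```

With these toy numbers the fused map runs on the GPU and the reduction on the CPU, since fusion amortizes one transfer across both elementwise steps; a real system would additionally model data sizes, NUMA placement, and pipelining.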
Pages: 53-67 (15 pages)