Automatic Mapping of Parallel Pattern-Based Algorithms on Heterogeneous Architectures

Cited by: 2
Authors
Truemper, Lukas [1 ,2 ]
Miller, Julian [2 ]
Terboven, Christian [2 ]
Mueller, Matthias S. [2 ]
Affiliations
[1] Huddly AS, Oslo, Norway
[2] Rhein Westfal TH Aachen, IT Ctr, Chair High Performance Comp, Aachen, Germany
Source
ARCHITECTURE OF COMPUTING SYSTEMS (ARCS 2021) | 2021 / Vol. 12800
Keywords
Mapping; Heterogeneous architectures; Global transformations; Parallel patterns; Performance portability; MODEL;
DOI
10.1007/978-3-030-81682-7_4
CLC Classification
TP3 [Computing technology, computer technology]
Subject Classification Code
0812
Abstract
Nowadays, specialized hardware is often found in clusters to improve compute performance and energy efficiency. Porting and tuning scientific codes for these heterogeneous clusters requires significant development effort. To mitigate this effort while maintaining high performance, modern parallel programming models introduce a second layer of abstraction, in which an architecture-agnostic source code can be maintained and automatically optimized for the target architecture. However, with increasing heterogeneity, mapping an application to a specific architecture itself becomes a complex decision that requires a differentiated consideration of processor features and algorithmic properties. Furthermore, architecture-agnostic global transformations are necessary to maximize the simultaneous utilization of different processors. We therefore introduce a combinatorial optimization approach that globally transforms parallel algorithms and automatically maps them to heterogeneous architectures. We derive a global transformation and mapping algorithm that is based on a static performance model. Moreover, we demonstrate the approach on five typical algorithmic kernels, showing automatic global transformations such as loop fusion, re-ordering, pipelining, and NUMA awareness, as well as optimal mapping strategies for an exemplary CPU-GPU compute node. Our algorithm achieves performance on par with hand-tuned implementations of all five kernels.
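The abstract describes two ingredients: global transformations (e.g. fusing adjacent pattern instances) and a combinatorial mapping of kernels to processors guided by a static performance model. The following Python sketch illustrates the general idea only; all names, the additive cost model, and the fixed transfer penalty are this example's assumptions, not the paper's actual algorithm or API.

```python
# Hypothetical sketch: fuse adjacent "map" patterns, then exhaustively
# assign each (fused) kernel to CPU or GPU using static cost estimates,
# charging a fixed penalty when consecutive kernels switch devices.
# All names and numbers are illustrative assumptions.
from itertools import product

def fuse_maps(kernels):
    """Fuse runs of adjacent elementwise 'map' patterns into one kernel."""
    fused = []
    for k in kernels:
        if fused and fused[-1]["pattern"] == "map" and k["pattern"] == "map":
            prev = fused[-1]
            fused[-1] = {
                "name": prev["name"] + "+" + k["name"],
                "pattern": "map",
                # assumed additive cost model for a fused loop body
                "cost": {d: prev["cost"][d] + k["cost"][d] for d in prev["cost"]},
            }
        else:
            fused.append(dict(k))
    return fused

def best_mapping(kernels, transfer_cost=2.0):
    """Brute-force the combinatorial device assignment minimizing
    estimated compute cost plus inter-device transfer cost."""
    best = None
    for assign in product(["cpu", "gpu"], repeat=len(kernels)):
        cost = sum(k["cost"][d] for k, d in zip(kernels, assign))
        cost += transfer_cost * sum(a != b for a, b in zip(assign, assign[1:]))
        if best is None or cost < best[0]:
            best = (cost, assign)
    return best

# Toy pipeline: two elementwise maps followed by a reduction.
kernels = [
    {"name": "scale", "pattern": "map",    "cost": {"cpu": 4.0, "gpu": 1.0}},
    {"name": "shift", "pattern": "map",    "cost": {"cpu": 3.0, "gpu": 1.0}},
    {"name": "sum",   "pattern": "reduce", "cost": {"cpu": 2.0, "gpu": 5.0}},
]
fused = fuse_maps(kernels)            # "scale" and "shift" fuse into one kernel
cost, assign = best_mapping(fused)
print([k["name"] for k in fused], assign, cost)
```

With these toy numbers the fused map runs on the GPU and the reduction on the CPU, since fusion amortizes one transfer across both elementwise steps; a real system would additionally model data sizes, NUMA placement, and pipelining.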
Pages: 53-67 (15 pages)