Portable Node-Level Parallelism for the PGAS Model

Times cited: 0
Authors
Jungblut, Pascal [1]
Fuerlinger, Karl [1]
Affiliations
[1] Ludwig Maximilians Univ LMU Munchen, Dept Comp Sci, MNM Team, Oettingenstr 67, D-80538 Munich, Germany
Keywords
PGAS; Parallel computing; Programming models
DOI
10.1007/s10766-021-00718-x
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
The Partitioned Global Address Space (PGAS) programming model brings intuitive shared-memory semantics to distributed-memory systems. Even with an abstract and unifying virtual global address space, however, it is challenging to exploit the full potential of different systems. Without explicit support from the implementation, node-local operations have to be optimized manually for each architecture. A goal of this work is to offer a user-friendly programming model that provides portable performance across systems. In this paper, we present an approach to integrate node-level programming abstractions with the PGAS programming model. We describe the hierarchical data distribution with local patterns and our implementation, MEPHISTO, in C++ using two existing projects. The evaluation of MEPHISTO shows that our approach achieves portable performance while requiring only minimal changes to port it from a CPU-based system to a GPU-based one using a CUDA or HIP back-end.
Pages: 867-885
Number of pages: 19