Portable Node-Level Parallelism for the PGAS Model

Cited by: 0

Authors:

Jungblut, Pascal [1]

Fuerlinger, Karl [1]

Affiliations:

[1] Ludwig Maximilians Univ LMU Munchen, Dept Comp Sci, MNM Team, Oettingenstr 67, D-80538 Munich, Germany

Source:

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING | 2021 / Vol. 49 / No. 06

Keywords:

PGAS; Parallel computing; Programming models;

DOI:

10.1007/s10766-021-00718-x

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Subject classification code:

081202;

Abstract:

The Partitioned Global Address Space (PGAS) programming model brings intuitive shared-memory semantics to distributed-memory systems. Even with an abstract and unifying virtual global address space, however, it is challenging to use the full potential of different systems. Without explicit support from the implementation, node-local operations have to be optimized manually for each architecture. A goal of this work is to offer a user-friendly programming model that provides portable performance across systems. In this paper, we present an approach to integrate node-level programming abstractions with the PGAS programming model. We describe the hierarchical data distribution with local patterns and our implementation, MEPHISTO, in C++ using two existing projects. The evaluation of MEPHISTO shows that our approach achieves portable performance while requiring only minimal changes to port it from a CPU-based system to a GPU-based one using a CUDA or HIP back-end.

Pages: 867-885

Page count: 19

Related records: 38 in total

[21] Real-time electromagnetic transient simulation algorithm for integrated power systems based on network level and component level parallelism
LaiJun Chen
Ying Chen
ShengWei Mei
Science China Technological Sciences, 2012, 55 : 3232 - 3241
[22] Optimizing Dynamic Programming on Graphics Processing Units via Adaptive Thread-Level Parallelism
Wu, Chao-Chin
Ke, Jenn-Yang
Lin, Heshan
Feng, Wu-chun
2011 IEEE 17TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2011, : 96 - 103
[23] Optimizing DNN training with pipeline model parallelism for enhanced performance in embedded systems
Al Maruf, Md
Azim, Akramul
Auluck, Nitin
Sahi, Mansi
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2024, 190
[24] Time-space tiling with tile-level parallelism for the 3D FDTD method
Fukaya, Takeshi
Iwashita, Takeshi
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING IN ASIA-PACIFIC REGION (HPC ASIA 2018), 2018, : 116 - 126
[25] Portable and efficient parallel computing using the BSP model
Goudreau, MW
Lang, K
Rao, SB
Suel, T
Tsantilas, T
IEEE TRANSACTIONS ON COMPUTERS, 1999, 48 (07) : 670 - 689
[26] GODSON-T: AN EFFICIENT MANY-CORE PROCESSOR EXPLORING THREAD-LEVEL PARALLELISM
Fan, Dongrui
Zhang, Hao
Wang, Da
Ye, Xiaochun
Song, Fenglong
Li, Guojie
Sun, Ninghui
IEEE MICRO, 2012, 32 (02) : 38 - 47
[27] Large-scale genome-wide association studies on a GPU cluster using a CUDA-accelerated PGAS programming model
Gonzalez-Dominguez, Jorge
Kaessens, Jan Christian
Wienbrandt, Lars
Schmidt, Bertil
INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2015, 29 (04) : 506 - 510
[28] A parallel and performance portable implementation of a full-field crystal plasticity model
Yenusah, Caleb O.
Morgan, Nathaniel R.
Lebensohn, Ricardo A.
Zecevic, Miroslav
Knezevic, Marko
COMPUTER PHYSICS COMMUNICATIONS, 2024, 300
[29] OpenH: A Novel Programming Model and API for Developing Portable Parallel Programs on Heterogeneous Hybrid Servers
Farrelly, Simon
Manumachu, Ravi Reddy
Lastovetsky, Alexey
IEEE ACCESS, 2024, 12 : 23666 - 23694
[30] Scalable Heterogeneous Scheduling Based Model Parallelism for Real-Time Inference of Large-Scale Deep Neural Networks
Zou, Xiaofeng
Chen, Cen
Lin, Peiying
Zhang, Luochuan
Xu, Yanwu
Zhang, Wenjie
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (04) : 2962 - 2973