Fine-Grained MPI+OpenMP Plasma Simulations: Communication Overlap with Dependent Tasks

Cited by: 3

Authors:

Richard, Jerome [1,2]

Latu, Guillaume [1]

Bigot, Julien [3]

Gautier, Thierry [4]

Affiliations:

[1] CEA, IRFM, F-13108 St Paul Les Durance, France

[2] Zebrys, Toulouse, France

[3] Univ Paris Saclay, UVSQ, Univ Paris Sud, Maison Simulat, CEA, CNRS, Gif Sur Yvette, France

[4] Univ Lyon, INRIA, CNRS, ENS Lyon, Univ Claude Bernard Lyon 1, LIP, Lyon, France

Source:

EURO-PAR 2019: PARALLEL PROCESSING | 2019 / Vol. 11725

Keywords:

Dependent tasks; OpenMP 4.5; MPI; Many-core

DOI:

10.1007/978-3-030-29400-7_30

Chinese Library Classification (CLC):

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

This paper demonstrates how OpenMP 4.5 tasks can be used to efficiently overlap computations and MPI communications based on a case-study conducted on multi-core and many-core architectures. It focuses on task granularity, dependencies and priorities, and also identifies some limitations of OpenMP. Results on 64 Skylake nodes show that while 64% of the wall-clock time is spent in MPI communications, 60% of the cores are busy in computations, which is a good result. Indeed, the chosen dataset is small enough to be a challenging case in terms of overlap and thus useful to assess worst-case scenarios in future simulations. Two key features were identified: by using task priority we improved the performance by 5.7% (mainly due to an improved overlap), and with recursive tasks we shortened the execution time by 9.7%. We also illustrate the need to have access to tools for task tracing and task visualization. These tools allowed a fine understanding and a performance increase for this task-based OpenMP+MPI code.
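
As an illustration of the approach the abstract describes, the following is a minimal sketch in C of dependent OpenMP 4.5 tasks arranged so that a communication step can overlap with independent computation. It is not the authors' code. The function names `overlap_sketch` and `fake_exchange`, the halo width `HALO`, and the arithmetic are all hypothetical, and a plain buffer copy stands in for the real MPI exchange so that the sketch runs without an MPI installation. The `priority` clause is only a hint, and runtimes typically ignore it unless `OMP_MAX_TASK_PRIORITY` is set to a positive value.

```c
#include <stdlib.h>

#define HALO 4  /* hypothetical halo width, chosen for illustration */

/* Stand-in for an MPI exchange (assumption: a real code would call
 * MPI here); it simply copies the send buffer to the receive buffer. */
static void fake_exchange(const double *sendbuf, double *recvbuf, int n)
{
    for (int i = 0; i < n; ++i)
        recvbuf[i] = sendbuf[i];
}

/* Builds a small task graph: pack -> exchange -> halo update, plus an
 * interior computation that has no dependency on the exchange and can
 * therefore run while the exchange is in progress. */
double overlap_sketch(int n)
{
    if (n < HALO)
        return 0.0;

    double *field = malloc((size_t)n * sizeof *field);
    if (field == NULL)
        return 0.0;
    for (int i = 0; i < n; ++i)
        field[i] = (double)i;

    double sendbuf[HALO], recvbuf[HALO];
    double interior_sum = 0.0, halo_sum = 0.0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Pack the boundary cells; higher priority so the
         * communication chain is started as early as possible. */
        #pragma omp task depend(out: sendbuf) priority(10)
        {
            for (int i = 0; i < HALO; ++i)
                sendbuf[i] = field[i];
        }

        /* Communication task: waits for the packed buffer. */
        #pragma omp task depend(in: sendbuf) depend(out: recvbuf) priority(10)
        {
            fake_exchange(sendbuf, recvbuf, HALO);
        }

        /* Interior computation: independent of the exchange. */
        #pragma omp task depend(out: interior_sum)
        {
            double s = 0.0;
            for (int i = HALO; i < n; ++i)
                s += 2.0 * field[i];
            interior_sum = s;
        }

        /* Halo update: runs only after the exchange has completed. */
        #pragma omp task depend(in: recvbuf) depend(out: halo_sum)
        {
            double s = 0.0;
            for (int i = 0; i < HALO; ++i)
                s += recvbuf[i];
            halo_sum = s;
        }
    }   /* implicit barrier: all tasks have finished here */

    free(field);
    return interior_sum + halo_sum;
}
```

Calling `overlap_sketch(16)` from a program compiled with an OpenMP flag such as `-fopenmp` runs the four tasks under the runtime's scheduler. Without the flag the pragmas are ignored and the same code runs serially, giving the same result.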


Pages: 419-433

Page count: 15
