Fine-Grained MPI plus OpenMP Plasma Simulations: Communication Overlap with Dependent Tasks

Cited by: 3
Authors
Richard, Jerome [1, 2]
Latu, Guillaume [1]
Bigot, Julien [3]
Gautier, Thierry [4]
Affiliations
[1] CEA, IRFM, F-13108 St Paul Les Durance, France
[2] Zebrys, Toulouse, France
[3] Univ Paris Saclay, UVSQ, Univ Paris Sud, Maison Simulat, CEA, CNRS, Gif Sur Yvette, France
[4] Univ Lyon, INRIA, CNRS, ENS Lyon, Univ Claude Bernard Lyon 1, LIP, Lyon, France
Source
EURO-PAR 2019: PARALLEL PROCESSING | 2019 / Vol. 11725
Keywords
Dependent tasks; OpenMP 4.5; MPI; Many-core
DOI
10.1007/978-3-030-29400-7_30
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper demonstrates how OpenMP 4.5 tasks can be used to efficiently overlap computations and MPI communications based on a case-study conducted on multi-core and many-core architectures. It focuses on task granularity, dependencies and priorities, and also identifies some limitations of OpenMP. Results on 64 Skylake nodes show that while 64% of the wall-clock time is spent in MPI communications, 60% of the cores are busy in computations, which is a good result. Indeed, the chosen dataset is small enough to be a challenging case in terms of overlap and thus useful to assess worst-case scenarios in future simulations. Two key features were identified: by using task priority we improved the performance by 5.7% (mainly due to an improved overlap), and with recursive tasks we shortened the execution time by 9.7%. We also illustrate the need to have access to tools for task tracing and task visualization. These tools allowed a fine understanding and a performance increase for this task-based OpenMP+MPI code.
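The overlap pattern summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the block count, block size, and the functions `compute_block`, `fake_exchange`, and `overlap_demo` are hypothetical, and `fake_exchange` is a plain memory copy standing in for a real MPI exchange so that the sketch compiles without an MPI library. Each block gets a compute task and a communication task linked by a `depend` clause; the communication task carries a higher `priority` so that the runtime may start it as soon as its input is ready while other blocks are still being computed.

```c
#include <assert.h>
#include <stddef.h>

#define NBLOCKS 4    /* illustrative number of data blocks (assumption) */
#define BLOCK   256  /* illustrative block size (assumption) */

/* Hypothetical stand-in for an MPI exchange: it only copies memory. */
static void fake_exchange(const double *send, double *recv, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        recv[i] = send[i];
}

/* Local computation on one block: dst[i] = 2 * src[i] + 1. */
static void compute_block(const double *src, double *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = 2.0 * src[i] + 1.0;
}

/* Runs the compute/exchange pipeline and returns the sum of received values. */
double overlap_demo(void)
{
    static double src[NBLOCKS][BLOCK], work[NBLOCKS][BLOCK], halo[NBLOCKS][BLOCK];

    for (int b = 0; b < NBLOCKS; ++b)
        for (size_t i = 0; i < BLOCK; ++i)
            src[b][i] = b + 1.0;

    #pragma omp parallel
    #pragma omp single
    {
        for (int b = 0; b < NBLOCKS; ++b) {
            /* Producer: computes block b and publishes work[b]. */
            #pragma omp task depend(out: work[b][0]) firstprivate(b)
            compute_block(src[b], work[b], BLOCK);

            /* Consumer: "sends" block b once it is ready. The higher
               priority is a hint to schedule communication early. */
            #pragma omp task depend(in: work[b][0]) depend(out: halo[b][0]) \
                             priority(10) firstprivate(b)
            fake_exchange(work[b], halo[b], BLOCK);
        }
        #pragma omp taskwait  /* wait for all tasks created above */
    }

    double total = 0.0;
    for (int b = 0; b < NBLOCKS; ++b)
        for (size_t i = 0; i < BLOCK; ++i) {
            assert(halo[b][i] == work[b][i]);  /* exchange saw finished data */
            total += halo[b][i];
        }
    return total;
}
```

Calling `overlap_demo()` from a `main` function and compiling with an OpenMP flag such as `-fopenmp` lets the exchange of one block run while other blocks are still being computed; without the flag the pragmas are ignored and the code runs serially with the same result. The `priority` clause is only a hint, and runtimes typically honor it only when the `OMP_MAX_TASK_PRIORITY` environment variable is set to a value at least as large as the priority used.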
Pages: 419-433
Page count: 15