FINE-GRAINED MULTITHREADING SUPPORT FOR HYBRID THREADED MPI PROGRAMMING

Cited by: 34
Authors
Balaji, Pavan [1]
Buntinas, Darius [1]
Goodell, David [1]
Gropp, William [2]
Thakur, Rajeev [1]
Affiliations
[1] Argonne Natl Lab, Math & Comp Sci Div, Argonne, IL 60439 USA
[2] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
Keywords
MPI; threads; hybrid programming; fine-grained locks;
DOI
10.1177/1094342009360206
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
As high-end computing systems continue to grow in scale, recent advances in multi- and many-core architectures have pushed such growth toward more dense architectures, that is, more processing elements per physical node, rather than more physical nodes themselves. Although a large number of scientific applications have relied so far on an MPI-everywhere model for programming high-end parallel systems, this model may not be sufficient for future machines, given their physical constraints such as decreasing amounts of memory per processing element and shared caches. As a result, application and computer scientists are exploring alternative programming models that involve using MPI between address spaces and some other threaded model, such as OpenMP, Pthreads, or Intel TBB, within an address space. Such hybrid models require efficient support from an MPI implementation for MPI messages sent from multiple threads simultaneously. In this paper, we explore the issues involved in designing such an implementation. We present four approaches to building a fully thread-safe MPI implementation, with decreasing levels of critical-section granularity (from coarse-grain locks to fine-grain locks to lock-free operations) and correspondingly increasing levels of complexity. We present performance results that demonstrate the performance implications of the different approaches.
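The difference between coarse-grain and fine-grain critical sections mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation and uses no MPI library: `CoarseEngine`, `FineEngine`, `post`, `pending`, and `run` are hypothetical names invented for this illustration, and the "messages" are plain tuples placed in in-memory queues. The sketch only shows where the locks sit; because of Python's global interpreter lock it says nothing about the performance of either design.

```python
import threading
from collections import deque


class CoarseEngine:
    """Hypothetical toy engine: one global lock guards every queue."""

    def __init__(self, num_queues):
        self._lock = threading.Lock()
        self._queues = [deque() for _ in range(num_queues)]

    def post(self, queue_id, msg):
        # Every thread serializes here, whichever queue it touches.
        with self._lock:
            self._queues[queue_id].append(msg)

    def pending(self, queue_id):
        with self._lock:
            return len(self._queues[queue_id])


class FineEngine:
    """Hypothetical toy engine: one lock per queue."""

    def __init__(self, num_queues):
        self._locks = [threading.Lock() for _ in range(num_queues)]
        self._queues = [deque() for _ in range(num_queues)]

    def post(self, queue_id, msg):
        # Only threads that use the same queue contend for this lock.
        with self._locks[queue_id]:
            self._queues[queue_id].append(msg)

    def pending(self, queue_id):
        with self._locks[queue_id]:
            return len(self._queues[queue_id])


def run(engine, num_threads=4, msgs_per_thread=1000):
    """Each thread posts to its own queue; return the per-queue counts."""

    def worker(tid):
        for i in range(msgs_per_thread):
            engine.post(tid, (tid, i))

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [engine.pending(t) for t in range(num_threads)]


if __name__ == "__main__":
    print(run(CoarseEngine(4)))
    print(run(FineEngine(4)))
```

Both engines deliver every message. The fine-grained version needs one lock per shared object, and the programmer must reason about each of them separately, which is the kind of added complexity the abstract associates with finer granularity.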
Pages: 49-57
Number of pages: 9
Related papers
29 in total
  • [1] Experience with mixed MPI/threaded programming models
    May, JM
    de Supinski, BR
    INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOL VI, PROCEEDINGS, 1999 : 2907 - 2912
  • [2] Fine-Grained MPI+OpenMP Plasma Simulations: Communication Overlap with Dependent Tasks
    Richard, Jerome
    Latu, Guillaume
    Bigot, Julien
    Gautier, Thierry
    EURO-PAR 2019: PARALLEL PROCESSING, 2019, 11725 : 419 - 433
  • [3] Fine-grained alignment of cryo-electron subtomograms based on MPI parallel optimization
    Lu, Yongchun
    Zeng, Xiangrui
    Zhao, Xiaofang
    Li, Shirui
    Li, Hua
    Gao, Xin
    Xu, Min
    BMC BIOINFORMATICS, 2019, 20 (01)
  • [4] Fine-grained alignment of cryo-electron subtomograms based on MPI parallel optimization
    Yongchun Lü
    Xiangrui Zeng
    Xiaofang Zhao
    Shirui Li
    Hua Li
    Xin Gao
    Min Xu
    BMC Bioinformatics, 20
  • [5] Parallel programming model for the Epiphany many-core coprocessor using threaded MPI
    Ross, James A.
    Richie, David A.
    Park, Song J.
    Shires, Dale R.
    MICROPROCESSORS AND MICROSYSTEMS, 2016, 43 : 95 - 103
  • [6] Fine-Grained Data Distribution Operations for Particle Codes
    Hofmann, Michael
    Ruenger, Gudula
    RECENT ADVANCES IN PARALLEL VIRTUAL MACHINE AND MESSAGE PASSING INTERFACE, PROCEEDINGS, 2009, 5759 : 54 - 63
  • [7] Hybrid Parallel Programming with MPI and Unified Parallel C
    Dinan, James
    Balaji, Pavan
    Lusk, Ewing
    Sadayappan, P.
    Thakur, Rajeev
    PROCEEDINGS OF THE 2010 COMPUTING FRONTIERS CONFERENCE (CF 2010), 2010 : 177 - 185
  • [8] Parallelization of Array Method with Hybrid Programming: OpenMP and MPI
    Velarde Martinez, Apolinar
    APPLIED SCIENCES-BASEL, 2022, 12 (15)
  • [9] Enabling Performance Efficient Runtime Support for Hybrid MPI+UPC++ Programming Models
    Hashmi, Jahanzeb Maqbool
    Hamidouche, Khaled
    Panda, Dhabaleswar K.
    PROCEEDINGS OF 2016 IEEE 18TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS; IEEE 14TH INTERNATIONAL CONFERENCE ON SMART CITY; IEEE 2ND INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS), 2016 : 1180 - 1187
  • [10] Program transformation and runtime support for threaded MPI execution on shared-memory machines
    Tang, H
    Shen, K
    Yang, T
    ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS, 2000, 22 (04) : 673 - 700