Advanced Thread Synchronization for Multithreaded MPI Implementations

Cited by: 10
Authors
Dang, Hoang-Vu [1]
Seo, Sangmin [2]
Amer, Abdelhalim [2]
Balaji, Pavan [2]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[2] Argonne Natl Lab, Math & Comp Sci Div, 9700 S Cass Ave, Argonne, IL 60439 USA
Source
2017 17TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID) | 2017
Funding
National Science Foundation (USA)
Keywords
MPI; threads; OpenMP; thread safety; lock; mutex; synchronization
DOI
10.1109/CCGRID.2017.65
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Concurrent multithreaded access to the Message Passing Interface (MPI) is gaining importance to support emerging hybrid MPI applications. The interoperability between threads and MPI, however, is complex and renders efficient implementations nontrivial. Prior studies showed that threads waiting for communication progress (waiting threads) often interfere with others (active threads) and degrade their progress. This situation occurs when both classes of threads compete for the same MPI resource and passing ownership to waiting threads does not guarantee that communication advances. The best-known practical solution prioritizes active threads and adapts first-in-first-out arbitration within each class. This approach, however, suffers from residual wasted resource acquisitions (waste) and ignores data locality, thus resulting in poor scalability. In this work, we propose thread synchronization improvements to eliminate waste while preserving data locality in a production MPI implementation. First, we leverage MPI knowledge and a fast synchronization method to eliminate waste and accelerate progress. Second, we rely on a cooperative progress model that dynamically elects and restricts a single waiting thread to drive a communication context for improved data locality. Third, we prioritize active threads and synchronize them with a locality-preserving lock that is hierarchical and exploits unbounded bias for high throughput. Results show significant improvement in synthetic microbenchmarks and two MPI+OpenMP applications.
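As an illustration of the baseline arbitration the abstract refers to (active threads prioritized, first-in-first-out order within each class), the following is a minimal Python sketch. It is not the paper's proposed method and not code from any MPI implementation; the names `TwoClassFifoLock`, `queued`, and `demo` are hypothetical and exist only for this example.

```python
import threading
import time
from collections import deque


class TwoClassFifoLock:
    """Illustrative lock (hypothetical, not from the paper): callers in the
    "active" class are served before callers in the "waiting" class, and
    callers within one class are served in FIFO order."""

    def __init__(self):
        self._mutex = threading.Lock()   # protects the fields below
        self._held = False
        self._queues = {"active": deque(), "waiting": deque()}

    def acquire(self, cls):
        with self._mutex:
            if not self._held:
                # Lock is free; release() hands off whenever a queue is
                # non-empty, so both queues are empty at this point.
                self._held = True
                return
            ticket = threading.Event()
            self._queues[cls].append(ticket)
        ticket.wait()  # woken by release(), which hands ownership over

    def release(self):
        with self._mutex:
            # Prefer the active class, then the waiting class.
            for cls in ("active", "waiting"):
                if self._queues[cls]:
                    # Direct hand-off: _held stays True for the next owner.
                    self._queues[cls].popleft().set()
                    return
            self._held = False

    def queued(self):
        with self._mutex:
            return sum(len(q) for q in self._queues.values())


def demo():
    """Queue four threads behind a held lock and record the service order."""
    lock = TwoClassFifoLock()
    order = []

    def worker(name, cls):
        lock.acquire(cls)
        order.append(name)   # appended while holding the lock
        lock.release()

    lock.acquire("active")   # main thread holds the lock so others must queue
    arrivals = [("w1", "waiting"), ("a1", "active"),
                ("w2", "waiting"), ("a2", "active")]
    threads = []
    for i, (name, cls) in enumerate(arrivals):
        t = threading.Thread(target=worker, args=(name, cls))
        t.start()
        threads.append(t)
        while lock.queued() < i + 1:   # wait until this thread has enqueued
            time.sleep(0.001)
    lock.release()
    for t in threads:
        t.join()
    return order


if __name__ == "__main__":
    print(demo())
```

In this sketch the two active threads are served before the two waiting threads even though a waiting thread arrived first. The sketch does nothing about the wasted acquisitions or data locality that the abstract identifies as shortcomings of this baseline.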
Pages: 314 - 324
Number of pages: 11
Related papers
50 in total
  • [41] Optimization Strategies for Inter-Thread Synchronization Overhead on NUMA Machine
    Wu, Song
    Zhang, Jun
    Peng, Yaqiong
    Jin, Hai
    Jiang, Wenbin
    2015 IEEE 34TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2015
  • [42] MPI for Windows NT: Two generations of implementations and experience with the message passing interface for clusters and SMP environments
    Hebert, LS
    Seefeld, WG
    Skjellum, A
    Taylor, CD
    Dimitrov, R
    INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS I-IV, PROCEEDINGS, 1998 : 309 - 316
  • [43] Evaluating Thread Coarsening and Low-cost Synchronization on Intel Xeon Phi
    Wu, Hancheng
    Becchi, Michela
    2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM IPDPS 2020, 2020 : 1018 - 1029
  • [44] CORBA AND MPI-BASED "BACKBONE" FOR COUPLING ADVANCED SIMULATION TOOLS
    Seydaliev, M.
    Caswell, D.
    CNL NUCLEAR REVIEW, 2014, 3 (02) : 83 - 90
  • [45] Performance Characteristics of Hybrid MPI/OpenMP Implementations of NAS Parallel Benchmarks SP and BT on Large-Scale Multicore Clusters
    Wu, Xingfu
    Taylor, Valerie
    COMPUTER JOURNAL, 2012, 55 (02) : 154 - 167
  • [46] Characterizing the Advanced Synchronization Capabilities of PTM Enabled Hardware
    Frech, Brandon
    St James, Julian
    Pinales, Armando
    Byagowi, Ahmad
    2023 IEEE INTERNATIONAL SYMPOSIUM ON PRECISION CLOCK SYNCHRONIZATION FOR MEASUREMENT, CONTROL, AND COMMUNICATION, ISPCS, 2023
  • [47] An Advanced Phase Synchronization Scheme for LT-1
    Jin, Guodong
    Liu, Kaiyu
    Liu, Dacheng
    Liang, Da
    Zhang, Heng
    Ou, Naiming
    Zhang, Yanyan
    Deng, Yun-Kai
    Li, Chuang
    Wang, Robert
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2020, 58 (03) : 1735 - 1746
  • [48] OpenMP and MPI implementations of an elasto-viscoplastic fast Fourier transform-based micromechanical solver for fast crystal plasticity modeling
    Eghtesad, Adnan
    Barrett, Timothy J.
    Germaschewski, Kai
    Lebensohn, Ricardo A.
    McCabe, Rodney J.
    Knezevic, Marko
    ADVANCES IN ENGINEERING SOFTWARE, 2018, 126 : 46 - 60
  • [49] YuenyeungSpTRSV: A Thread-Level and Warp-Level Fusion Synchronization-Free Sparse Triangular Solve
    Zhang, Feng
    Su, Jiya
    Liu, Weifeng
    He, Bingsheng
    Wu, Ruofan
    Du, Xiaoyong
    Wang, Rujia
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, 32 (09) : 2321 - 2337
  • [50] Advanced Structures for Grid Synchronization of Power Converters in Distributed Generation Applications
    Luna, Alvaro
    Rocabert, Joan
    Candela, Ignacio
    Rodriguez, Pedro
    Teodorescu, Remus
    Blaabjerg, Frede
    2012 IEEE ENERGY CONVERSION CONGRESS AND EXPOSITION (ECCE), 2012 : 2769 - 2776