Advanced Thread Synchronization for Multithreaded MPI Implementations

Cited by: 10

Authors:

Hoang-Vu Dang [1]

Seo, Sangmin [2]

Amer, Abdelhalim [2]

Balaji, Pavan [2]

Affiliations:

[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA

[2] Argonne Natl Lab, Math & Comp Sci Div, 9700 S Cass Ave, Argonne, IL 60439 USA

Source:

2017 17TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID) | 2017

Funding:

US National Science Foundation;

Keywords:

MPI; threads; OpenMP; thread safety; lock; mutex; synchronization;

DOI:

10.1109/CCGRID.2017.65

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Concurrent multithreaded access to the Message Passing Interface (MPI) is gaining importance to support emerging hybrid MPI applications. The interoperability between threads and MPI, however, is complex and renders efficient implementations nontrivial. Prior studies showed that threads waiting for communication progress (waiting threads) often interfere with others (active threads) and degrade their progress. This situation occurs when both classes of threads compete for the same MPI resource and ownership passing to waiting threads does not guarantee communication to advance. The best-known practical solution prioritizes active threads and adapts first-in-first-out arbitration within each class. This approach, however, suffers from residual wasted resource acquisitions (waste) and ignores data locality, thus resulting in poor scalability. In this work, we propose thread synchronization improvements to eliminate waste while preserving data locality in a production MPI implementation. First, we leverage MPI knowledge and a fast synchronization method to eliminate waste and accelerate progress. Second, we rely on a cooperative progress model that dynamically elects and restricts a single waiting thread to drive a communication context for improved data locality. Third, we prioritize active threads and synchronize them with a locality-preserving lock that is hierarchical and exploits unbounded bias for high throughput. Results show significant improvement in synthetic microbenchmarks and two MPI+OpenMP applications.


Pages: 314-324

Page count: 11
