A Graph Based Approach for MPI Deadlock Detection

Cited by: 28
Authors
Hilbrich, Tobias [1]
de Supinski, Bronis R.
Schulz, Martin
Mueller, Matthias S. [1]
Affiliations
[1] Tech Univ Dresden, ZIH, D-01062 Dresden, Germany
Source
ICS'09: PROCEEDINGS OF THE 2009 ACM SIGARCH INTERNATIONAL CONFERENCE ON SUPERCOMPUTING | 2009
Keywords
Parallel Programming; MPI; Deadlock Detection; Umpire; Visualization
DOI
10.1145/1542275.1542319
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The MPI standard defines several usage patterns that can lead to deadlock, some of which involve collective communications or non-deterministic operations such as wildcard receives. Further, some MPI programming deadlocks only occur for some MPI implementations or certain configurations. Many tools to detect MPI deadlocks exist; however, none precisely handles the increased complexity of deadlock detection created by the richness of the MPI standard, which requires a general deadlock model. We present the first general deadlock model for MPI including a novel necessary and sufficient criterion, the OR-Knot, for deadlock in MPI programs. This model enables visualization of MPI deadlocks and motivates the design of a new deadlock detection mechanism. We compare our implementation of this mechanism to the ad-hoc mechanism previously available in Umpire, which reflected MPI non-determinism and, thus, more completely detected MPI deadlocks than any other existing MPI deadlock detection tool. Overall, our results demonstrate that our mechanism improves performance by as much as two orders of magnitude while providing precise characterization of deadlocks.
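As a rough illustration of why a graph criterion stronger than a simple cycle is needed once wildcard receives are involved, the sketch below checks the classic knot condition on a wait-for graph under pure OR semantics (a blocked process can continue as soon as any one of the processes it waits for acts). This is not the paper's OR-Knot, whose exact definition is not given in this abstract; the function names and the dictionary encoding of the graph are assumptions made for the example.

```python
from collections import deque


def reachable(graph, start):
    """Return every node reachable from start via one or more edges."""
    seen = set()
    queue = deque(graph.get(start, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, ()))
    return seen


def in_knot(graph, node):
    """Classic knot test: node has successors, and every node
    reachable from it can reach it back."""
    targets = reachable(graph, node)
    if not targets:
        return False  # node waits for nobody, so it is not blocked
    return all(node in reachable(graph, t) for t in targets)


# Hypothetical wait-for graphs: an edge p -> q means "p waits for q".
# All three processes sit in wildcard receives: nobody can ever send.
all_blocked = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
# Process 0 is on a cycle (0 -> 1 -> 0) but also waits for process 2,
# which is not blocked and may still send, releasing process 0.
cycle_only = {0: [1, 2], 1: [0], 2: []}

print(in_knot(all_blocked, 0))
print(in_knot(cycle_only, 0))
```

In the second graph a cycle exists but no knot, which is the kind of case where a cycle-only check would report a deadlock that an OR-style wait does not actually produce.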
Pages: 296-305
Page count: 10