Distributed Convergence Detection Based on Global Residual Error Under Asynchronous Iterations

被引:23
作者
Magoules, Frederic [1 ]
Gbikpi-Benissan, Guillaume [2 ]
机构
[1] Univ Paris Saclay, Cent Supelec, F-91191 Gif Sur Yvette, France
[2] IRT SystemX, F-91127 Paris, France
关键词
Asynchronous iterations; convergence detection; global residual; distributed snapshot; parallel computing; TERMINATION DETECTION; PARALLEL; ALGORITHMS; MULTIPROCESSORS; SYSTEMS;
D O I
10.1109/TPDS.2017.2780856
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Convergence of classical parallel iterations is detected by performing a reduction operation at each iteration in order to compute a residual error relative to a potential solution vector. To efficiently run asynchronous iterations, blocking communication requests are avoided, which makes it hard to isolate and handle any global vector. While some termination protocols were proposed for asynchronous iterations, only very few of them are based on global residual computation and guarantee effective convergence. But the most effective and efficient existing solutions feature two reduction operations, which constitutes an important factor of termination delay. In this paper, we present new, non-intrusive, protocols to compute a residual error under asynchronous iterations, requiring only one reduction operation. Various communication models show that some heuristics can even be introduced and formally evaluated. Extensive experiments with up to 5,600 processor cores confirm the practical effectiveness and efficiency of our approach.
引用
收藏
页码:819 / 829
页数:11
相关论文
共 31 条
[1]  
[Anonymous], 1996, Distributed algorithms
[2]  
[Anonymous], 1969, Linear algebra and its applications, DOI DOI 10.1016/0024-3795(69)90028-7
[3]   Parallel treatment of a class of differential-algebraic systems [J].
Bahi, J ;
Griepentrog, E ;
Miellou, JC .
SIAM JOURNAL ON NUMERICAL ANALYSIS, 1996, 33 (05) :1969-1980
[4]  
Bahi JM, 2008, LECT NOTES COMPUT SC, V5336, P240, DOI 10.1007/978-3-540-92859-1_22
[5]   A decentralized convergence detection algorithm for asynchronous parallel iterative algorithms [J].
Bahi, JM ;
Contassot-Vivier, S ;
Couturier, R ;
Vernier, F .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2005, 16 (01) :4-13
[6]   ASYNCHRONOUS ITERATIVE METHODS FOR MULTIPROCESSORS [J].
BAUDET, GM .
JOURNAL OF THE ACM, 1978, 25 (02) :226-244
[7]  
Benjelloun S., 1989, COMPUT MECH PUBL, V2, P275
[8]   SOME ASPECTS OF PARALLEL AND DISTRIBUTED ITERATIVE ALGORITHMS - A SURVEY [J].
BERTSEKAS, DP ;
TSITSIKLIS, JN .
AUTOMATICA, 1991, 27 (01) :3-21
[9]   SYNCHRONOUS AND ASYNCHRONOUS IMPLEMENTATIONS OF RELAXATION ALGORITHMS FOR NONLINEAR NETWORK OPTIMIZATION [J].
CHAJAKIS, ED ;
ZENIOS, SA .
PARALLEL COMPUTING, 1991, 17 (08) :873-894
[10]   DISTRIBUTED SNAPSHOTS - DETERMINING GLOBAL STATES OF DISTRIBUTED SYSTEMS [J].
CHANDY, KM ;
LAMPORT, L .
ACM TRANSACTIONS ON COMPUTER SYSTEMS, 1985, 3 (01) :63-75