Finding missing synchronization in a distributed computation using controlled re-execution

被引:0
作者
Neeraj Mittal
Vijay K. Garg
机构
[1] The University of Texas at Dallas,Department of Computer Science
[2] The University of Texas at Austin,Department of Electrical and Computer Engineering
来源
Distributed Computing | 2004年 / 17卷
关键词
Distributed system; Debugging; Software-fault tolerance; Controlled re-execution; Predicate control;
D O I
暂无
中图分类号
学科分类号
摘要
Correct distributed programs are hard to write. Not surprisingly, distributed systems are especially vulnerable to software faults. Testing and debugging is an important way to improve the reliability of distributed systems. A distributed debugger equipped with the mechanism to re-execute the traced computation in a controlled fashion can greatly facilitate the detection and localization of bugs. This approach gives rise to a general problem of predicate control, which takes a computation and a safety property specified on the computation as inputs, and produces a controlled computation, with added synchronization, that maintains the given safety property as output. We devise efficient control algorithms for two classes of useful predicates, namely region predicates and disjunctive predicates. For the former, we prove that the control algorithm is optimal in the sense that it guarantees maximum concurrency possible in the controlled computation. For the latter, we prove that our control algorithm generates the least number of synchronization dependencies and therefore has optimal message-complexity. Furthermore, we provide a necessary and sufficient condition under which it is possible to efficiently compute a minimal controlling synchronization for a general predicate. We also give an algorithm to compute such a synchronization under the condition provided.
引用
收藏
页码:107 / 130
页数:23
相关论文
共 9 条
[1]  
Chandy TJ(1)Debugging Programs with Instant Replay ACM Transactions on Computer Systems 3 63-482
[2]  
Chase JM(4)undefined Distributed Computing 191-undefined
[3]  
Fidge undefined(8)undefined IEEE Computer 24 28-undefined
[4]  
Kilgore undefined(8)undefined The Computer Journal 40 489-undefined
[5]  
Lamport undefined(7)undefined Communications of the ACM 558-undefined
[6]  
LeBlanc undefined(1987)undefined IEEE Transactions on Computers C -36 471-undefined
[7]  
Mellor-Crummey undefined(1)undefined Theoretical Computer Science 13 61-undefined
[8]  
Maggiolo-Schettini undefined(10)undefined IEEE Transactions on Computers 46 1137-undefined
[9]  
Wang undefined(undefined)undefined undefined undefined undefined-undefined