DFix: Automatically Fixing Timing Bugs in Distributed Systems

Cited by: 9
Authors
Li, Guangpu [1]
Liu, Haopeng [1]
Chen, Xianglan [1, 2]
Gunawi, Haryadi S. [1]
Lu, Shan [1]
Affiliations
[1] Univ Chicago, Chicago, IL 60637 USA
[2] Univ Sci & Tech China, Hefei, Peoples R China
Source
PROCEEDINGS OF THE 40TH ACM SIGPLAN CONFERENCE ON PROGRAMMING LANGUAGE DESIGN AND IMPLEMENTATION (PLDI '19) | 2019
Funding
National Science Foundation (USA)
Keywords
Distributed system; Timing; Bug fixing;
DOI
10.1145/3314221.3314620
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Distributed systems nowadays are the backbone of computing society, and are expected to have high availability. Unfortunately, distributed timing bugs, a type of bug triggered by non-deterministic timing of messages and node crashes, widely exist. They lead to many production-run failures, and are difficult to reason about and patch. Although recently proposed techniques can automatically detect these bugs, how to automatically and correctly fix them remains an open problem. This paper presents DFix, a tool that automatically processes distributed timing bug reports, statically analyzes the buggy system, and produces patches. Our evaluation shows that DFix is effective in fixing real-world distributed timing bugs.
Pages: 994-1009
Page count: 16