DCatch: Automatically detecting distributed concurrency bugs in cloud systems

Cited by: 28
Authors
Liu H. [1]
Li G. [1]
Lukman J.F. [1]
Li J. [1]
Lu S. [1]
Gunawi H.S. [1]
Tian C. [2]
Affiliations
[1] University of Chicago, Chicago
[2] Huawei US R&D Center, San Francisco
Source
| 1600 / Association for Computing Machinery, 2 Penn Plaza, Suite 701, New York, NY 10121-0701, United States (volume) / 52 (issue)
Funding
National Science Foundation (United States);
Keywords
Bug detection; Cloud computing; Concurrency bugs; Distributed systems;
DOI
10.1145/3037697.3037735
Abstract
In the era of big data and cloud computing, the reliability of distributed systems is extremely important. Unfortunately, distributed concurrency bugs, referred to as DCbugs, are widespread. They hide in the large state space of distributed cloud systems and manifest non-deterministically depending on the timing of distributed computation and communication. Effective techniques to detect DCbugs are needed. This paper presents a pilot solution, DCatch, in the world of DCbug detection. DCatch predicts DCbugs by analyzing correct executions of distributed systems. To build DCatch, we design a set of happens-before rules that model a wide variety of communication and concurrency mechanisms in real-world distributed cloud systems. We then build runtime tracing and trace analysis tools to effectively identify concurrent conflicting memory accesses in these systems. Finally, we design tools to help prune false positives and trigger DCbugs. We have evaluated DCatch on four representative open-source distributed cloud systems: Cassandra, Hadoop MapReduce, HBase, and ZooKeeper. By monitoring correct executions of seven workloads on these systems, DCatch reports 32 DCbugs, of which 20 are truly harmful. © 2017 ACM.
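The idea of using happens-before rules to find concurrent conflicting accesses in a trace can be illustrated with a minimal sketch. This is not DCatch's implementation: it assumes a single rule (a message send happens before its matching receive, plus program order within a thread), uses vector clocks as a stand-in ordering mechanism, and all names (`Event`, `assign_vector_clocks`, `find_candidates`, the thread and variable labels) are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Event:
    thread: str   # hypothetical label, e.g. "node1/handlerA"
    kind: str     # "send", "recv", "read", or "write"
    target: str   # message id for send/recv, variable name for read/write


def assign_vector_clocks(trace):
    """Return one vector clock (dict: thread -> counter) per event, in trace order."""
    clocks = {}     # latest clock seen for each thread
    in_flight = {}  # message id -> clock captured at send time
    stamped = []
    for ev in trace:
        clock = dict(clocks.get(ev.thread, {}))
        if ev.kind == "recv":
            # Assumed rule: a send happens before its matching receive.
            for t, c in in_flight[ev.target].items():
                clock[t] = max(clock.get(t, 0), c)
        # Program order: each event advances its own thread's counter.
        clock[ev.thread] = clock.get(ev.thread, 0) + 1
        if ev.kind == "send":
            in_flight[ev.target] = dict(clock)
        clocks[ev.thread] = clock
        stamped.append(clock)
    return stamped


def happens_before(a, b):
    """True if vector clock a is strictly ordered before vector clock b."""
    threads = set(a) | set(b)
    return a != b and all(a.get(t, 0) <= b.get(t, 0) for t in threads)


def find_candidates(trace):
    """Pairs of accesses to the same variable, at least one a write, unordered by happens-before."""
    stamped = assign_vector_clocks(trace)
    accesses = [(ev, vc) for ev, vc in zip(trace, stamped)
                if ev.kind in ("read", "write")]
    candidates = []
    for (e1, c1), (e2, c2) in combinations(accesses, 2):
        conflicting = e1.target == e2.target and "write" in (e1.kind, e2.kind)
        concurrent = not happens_before(c1, c2) and not happens_before(c2, c1)
        if conflicting and concurrent:
            candidates.append((e1, e2))
    return candidates


# Two handlers on one node, triggered by unrelated messages, touch the same variable.
racy_trace = [
    Event("client", "send", "m1"),
    Event("node1/handlerA", "recv", "m1"),
    Event("node1/handlerA", "write", "jobState"),
    Event("node2/worker", "send", "m2"),
    Event("node1/handlerB", "recv", "m2"),
    Event("node1/handlerB", "read", "jobState"),
]
for first, second in find_candidates(racy_trace):
    print(first.thread, first.kind, "||", second.thread, second.kind, "on", first.target)
```

In this sketch a reported pair is only a candidate: the trace came from a correct run, so whether reordering the two accesses is actually harmful would still need to be checked, which corresponds to the pruning and triggering steps the abstract mentions.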
Pages: 677-691
Page count: 14