Efficiently Detecting Concurrency Bugs in Persistent Memory Programs

Cited by: 4
Authors
Chen, Zhangyu [1]
Hua, Yu [1]
Zhang, Yongle [2]
Ding, Luochangqi [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Wuhan, Hubei, Peoples R China
[2] Purdue Univ, W Lafayette, IN 47907 USA
Source
ASPLOS '22: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS | 2022
Funding
National Natural Science Foundation of China;
Keywords
Persistent Memory; Crash Consistency; Testing; Debugging; Concurrency;
DOI
10.1145/3503222.3507755
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Due to the salient DRAM-comparable performance, TB-scale capacity, and non-volatility, persistent memory (PM) provides new opportunities for large-scale in-memory computing with instant crash recovery. However, programming PM systems is error-prone due to the existence of crash-consistency bugs, which are challenging to diagnose especially with concurrent programming widely adopted in PM applications to exploit hardware parallelism. Existing bug detection tools for DRAM-based concurrency issues cannot detect PM crash-consistency bugs because they are oblivious to PM operations and PM consistency. On the other hand, existing PM-specific debugging tools only focus on sequential PM programs and cannot effectively detect crash-consistency issues hidden in concurrent executions. In order to effectively detect crash-consistency bugs that only manifest in concurrent executions, we propose PMRace, the first PM-specific concurrency bug detection tool. We identify and define two new types of concurrent crash-consistency bugs: PM Inter-thread Inconsistency and PM Synchronization Inconsistency. In particular, PMRace adopts PM-aware and coverage-guided fuzz testing to explore PM program executions. For PM Inter-thread Inconsistency, which denotes the data inconsistency hidden in thread interleavings, PMRace performs PM-aware interleaving exploration and thread scheduling to drive the execution towards executions that reveal such inconsistencies. For PM Synchronization Inconsistency between persisted synchronization variables and program data, PMRace identifies the inconsistency during interleaving exploration. The post-failure validation reduces the false positives that come from custom crash recovery mechanisms. PMRace has found 14 bugs (10 new bugs) in real-world concurrent PM systems including PM-version memcached.
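Illustrative note (not part of the original record): the kind of inter-thread inconsistency the abstract describes can be sketched with a toy model. The sketch below assumes one plausible reading, in which one thread persists data derived from a value that another thread has stored but not yet persisted. The `ToyPM` class, the `balance` and `receipt` keys, and the single hand-written interleaving are all hypothetical; this is not PMRace's code or API, and it simulates the interleaving sequentially rather than using real threads or real persistent memory.

```python
class ToyPM:
    """Toy persistent memory: stores land in a volatile cache first;
    persist() copies a value to durable media (a stand-in for flush + fence)."""

    def __init__(self):
        self.cache = {}    # volatile: lost on crash
        self.durable = {}  # survives a crash

    def store(self, key, value):
        self.cache[key] = value

    def load(self, key):
        # Loads see the newest value, even if it has not been persisted yet.
        return self.cache.get(key, self.durable.get(key))

    def persist(self, key):
        if key in self.cache:
            self.durable[key] = self.cache[key]

    def crash(self):
        # Only durable contents survive.
        return dict(self.durable)


def run_interleaving():
    pm = ToyPM()
    pm.durable["balance"] = 100  # state persisted before the run

    # Thread A: updates balance but has not persisted it yet.
    pm.store("balance", 50)

    # Thread B: reads A's unpersisted value and persists data derived from it.
    seen = pm.load("balance")
    pm.store("receipt", seen)
    pm.persist("receipt")

    # Crash happens before thread A persists balance.
    return pm.crash()


state = run_interleaving()
inconsistent = state["receipt"] != state["balance"]
```

After the simulated crash, the durable `receipt` reflects a `balance` that never reached durable media, so the recovered state disagrees with itself. A sequential test that persists `balance` before any other thread runs would never observe this, which is why the interleaving matters.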
Pages: 873-887
Page count: 15