Automated Localization for Unreproducible Builds

Cited by: 24
Authors
Ren, Zhilei [1]
Jiang, He [1]
Xuan, Jifeng [2]
Yang, Zijiang [3]
Affiliations
[1] Dalian Univ Technol, Sch Software, Key Lab Ubiquitous Network & Serv Software Liaoni, Dalian, Peoples R China
[2] Wuhan Univ, Sch Comp Sci, Wuhan, Hubei, Peoples R China
[3] Western Michigan Univ, Dept Comp Sci, Kalamazoo, MI 49008 USA
Source
PROCEEDINGS 2018 IEEE/ACM 40TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE) | 2018
Funding
National Natural Science Foundation of China
Keywords
Unreproducible Build; Localization; Software Maintenance
DOI
10.1145/3180155.3180224
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Reproducibility is the ability to recreate identical binaries under pre-defined build environments. Driven by the need for quality assurance and by the benefit of better detecting attacks against build environments, the practice of reproducible builds has gained popularity in many open-source software repositories such as Debian and Bitcoin. However, identifying unreproducible issues remains a labour-intensive and time-consuming challenge, because little information is available to guide the search and because the causes of unreproducible binaries are diverse. In this paper, we propose an automated framework called RepLoc to localize the problematic files for unreproducible builds. RepLoc features a query augmentation component that utilizes information extracted from the build logs, and a heuristic rule-based filtering component that narrows the search scope. By integrating the two components with a weighted file ranking module, RepLoc automatically produces a ranked list of files that helps locate the problematic files for unreproducible builds. We have implemented a prototype and conducted extensive experiments over 671 real-world unreproducible Debian packages in four different categories. When only the topmost ranked file is considered, RepLoc achieves an accuracy rate of 47.09%. When the examination is expanded to the top ten ranked files in the list produced by RepLoc, the accuracy rate becomes 79.28%. Considering that a package contains hundreds of source files, scripts, Makefiles, etc., RepLoc significantly reduces the scope of localizing problematic files. Moreover, with the help of RepLoc, we successfully identified and fixed six new unreproducible packages from Debian and Guix.
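The pipeline described in the abstract (a query augmented with build-log text, a rule-based filter, a weighted ranking, and a top-k accuracy measure) can be illustrated with a minimal sketch. This is not the RepLoc implementation: the function names `rank_files` and `top_k_accuracy`, the rule list `RULES`, the weight `alpha`, and the use of plain term-frequency cosine similarity are all assumptions chosen for illustration, and the abstract does not specify how RepLoc computes or combines its scores.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9_]+", text.lower()) if t]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical heuristic rules: patterns that often signal build-time
# nondeterminism (these are illustrative, not RepLoc's actual rule set).
RULES = [re.compile(p) for p in (r"\bdate\b", r"__DATE__|__TIME__")]


def rank_files(files, query, log_text, alpha=0.5):
    """Rank file paths by a weighted mix of similarity and rule hits.

    files: dict mapping path -> file content.
    query: short description of the observed difference.
    log_text: build-log excerpt used to augment the query (assumption).
    alpha: weight of the similarity score versus the rule-hit score.
    """
    augmented = Counter(tokenize(query) + tokenize(log_text))
    scored = []
    for path, content in files.items():
        similarity = cosine(augmented, Counter(tokenize(content)))
        rule_hit = 1.0 if any(r.search(content) for r in RULES) else 0.0
        scored.append((alpha * similarity + (1 - alpha) * rule_hit, path))
    # Highest score first; ties are broken by path for determinism.
    return [path for _, path in sorted(scored, key=lambda s: (-s[0], s[1]))]


def top_k_accuracy(rankings, truths, k):
    """Fraction of packages with a problematic file in the top k."""
    hits = sum(1 for r, t in zip(rankings, truths) if set(r[:k]) & set(t))
    return hits / len(rankings)
```

Under these assumptions, `top_k_accuracy` with `k=1` and `k=10` corresponds to the kind of measure the abstract reports for the topmost file and for the top ten files, computed over one ranking per package.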
Pages: 71-81
Page count: 11