Deep Just-In-Time Defect Localization

Cited by: 7
Authors
Qiu, Fangcheng [1]
Gao, Zhipeng [2]
Xia, Xin [3]
Lo, David [4]
Grundy, John [2]
Wang, Xinyu [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Zhejiang 310027, Peoples R China
[2] Monash Univ, Fac Informat Technol, Melbourne, Vic 3800, Australia
[3] Huawei, Software Engn Applicat Technol Lab, Huawei 518129, Peoples R China
[4] Singapore Management Univ, Sch Informat Syst, Singapore 188065, Singapore
Funding
National Key Research and Development Program of China; National Research Foundation, Singapore
Keywords
Defect localization; just-in-time; software naturalness; deep learning;
DOI
10.1109/TSE.2021.3135875
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
During software development and maintenance, defect localization is an essential part of software quality assurance. Although various defect localization techniques have been proposed, such as information retrieval (IR)-based and spectrum-based techniques, they work only after a defect has been exposed, which can be too late and too costly for bugs newly introduced in daily development. Many just-in-time (JIT) defect prediction tools have also been proposed to predict buggy commits, but these tools do not locate the suspicious positions within a buggy commit. To help developers detect bugs in time and avoid introducing them, JIT bug localization techniques have been proposed, which aim to locate suspicious buggy code after a change commit has been submitted. In this paper, we propose a novel JIT defect localization approach, named DeepDL (Deep Learning-based defect localization), to locate defective code lines within a defect-introducing change. DeepDL employs a neural language model to capture the semantics of code lines; in this way, the naturalness of each code line can be learned and converted to a suspiciousness score. The core of DeepDL is a deep learning-based neural language model, which we train on previous snapshots (history versions) of a project so that it can calculate the naturalness of a piece of code. In application, for a given new code change, DeepDL automatically assigns a suspiciousness score to each code line and sorts the code lines in descending order of this score. The code lines at the top of the list are considered potential defect locations. Our tool can help developers efficiently check buggy lines at an early stage, which reduces the risk of introducing bugs and improves developers' confidence in the reliability of their software.
We conducted an extensive experiment on 14 open source Java projects with a total of 11,615 buggy changes and evaluated the results using four evaluation metrics. The experimental results show that our method outperforms state-of-the-art approaches by a substantial margin.
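The ranking idea described in the abstract (learn naturalness from a project's history, score each changed line, sort in descending order of suspiciousness) can be illustrated with a minimal sketch. This is not DeepDL itself: the paper uses a neural language model, whereas the sketch below substitutes a simple add-one-smoothed bigram model so that it stays self-contained. The names `tokenize`, `BigramModel`, and `rank_lines` are hypothetical, and the whitespace tokenizer is a stand-in for a real Java lexer.

```python
import math
from collections import Counter


def tokenize(line):
    # Crude tokenizer (assumption): pad a few punctuation marks, split on spaces.
    for ch in "();{}":
        line = line.replace(ch, f" {ch} ")
    return line.split()


class BigramModel:
    """Add-one-smoothed bigram model; a stand-in for the paper's neural LM."""

    def __init__(self, history_lines):
        self.context_counts = Counter()  # how often a token appears as context
        self.bigram_counts = Counter()   # how often (context, next) appears
        self.vocab = set()
        for line in history_lines:
            tokens = ["<s>"] + tokenize(line) + ["</s>"]
            self.vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def line_entropy(self, line):
        # Average negative log2-probability per token transition.
        # Higher entropy = less "natural" = more suspicious.
        tokens = ["<s>"] + tokenize(line) + ["</s>"]
        v = len(self.vocab) + 1  # one extra slot for unseen tokens
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            p = (self.bigram_counts[(prev, cur)] + 1) / (self.context_counts[prev] + v)
            total -= math.log2(p)
        return total / (len(tokens) - 1)


def rank_lines(model, changed_lines):
    # Use entropy as the suspiciousness score and sort in descending order.
    scored = [(model.line_entropy(line), line) for line in changed_lines]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    history = ["int i = 0 ;", "int j = 0 ;", "return i ;"]
    change = ["int k = 0 ;", "foo bar baz qux"]
    for score, line in rank_lines(BigramModel(history), change):
        print(f"{score:.2f}  {line}")
```

In this toy setting, a line that resembles the project's history receives a lower score than one made of unseen tokens, so the unfamiliar line appears first in the list a developer would inspect.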
Pages: 5068-5086
Page count: 19