Software Defect Prediction and Localization with Attention-Based Models and Ensemble Learning

Cited by: 8
Authors
Zhang, Tianhang [1]
Du, Qingfeng [1]
Xu, Jincheng [1]
Li, Jiechu [1]
Li, Xiaojun [2]
Affiliations
[1] Tongji Univ, Sch Software Engn, Shanghai, Peoples R China
[2] Tongji Univ, Coll Civil Engn, Dept Geotech Engn, Shanghai, Peoples R China
Source
2020 27TH ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE (APSEC 2020) | 2020
Funding
National Natural Science Foundation of China;
Keywords
Software defect prediction; Ensemble learning; Attention model; Deep learning;
DOI
10.1109/APSEC51365.2020.00016
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Software defect prediction (SDP) uses a model trained on historical defect data to predict the defect proneness of code modules in a software system. An effective model can optimize the allocation of testing resources and thereby improve the quality of software products. Most previous studies represent code snippets with handcrafted features, which struggle to capture the semantic and structural information of the code context that is often crucial for defect prediction. In addition, most existing defect prediction models cannot make predictions at the level of individual code lines, so they offer developers little detailed guidance. To address these issues, we propose a model based on ensemble learning and attention mechanisms that locates suspect lines of code while making method-level defect predictions, giving developers more comprehensive prediction information. The model uses abstract syntax trees (ASTs) as the intermediate representation of code snippets. Because historical defect data is strongly class-imbalanced, an approach based on Self-Organizing Map (SOM) clustering is employed to handle noisy data. Experimental results show that, on average, the proposed model improves the F-measure by 17.7% and AUC by 37.8% compared with four other machine learning algorithms.
Pages: 81-90
Page count: 10