REPD: Source code defect prediction as anomaly detection

Cited by: 10
Authors
Afric, Petar [1]
Sikic, Lucija [1]
Kurdija, Adrian Satja [1]
Silic, Marin [1]
Affiliations
[1] Univ Zagreb, Fac Elect Engn & Comp, Zagreb, Croatia
Keywords
Defect prediction; Anomaly detection; REPD; Program analysis; Object-oriented software; Faults; Metrics
DOI
10.1016/j.jss.2020.110641
Chinese Library Classification
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
In this paper, we present a novel approach to within-project source code defect prediction. Because defect prediction datasets are typically imbalanced, with few defective examples, we treat defect prediction as anomaly detection. We present our Reconstruction Error Probability Distribution (REPD) model, which can handle both point and collective anomalies. We compare it on five traditional code feature datasets against five models: Gaussian Naive Bayes, logistic regression, k-nearest neighbors, decision tree, and Hybrid SMOTE-Ensemble. In addition, REPD is compared against the same models on 24 semantic feature datasets. To compare the performance of the competing models, we use the F1-score. Using statistical analysis, we show that our model produces significantly better results, improving the F1-score by up to 7.12%. Additionally, we analyze REPD's robustness to dataset imbalance by creating defect-undersampled and non-defect-oversampled datasets. © 2020 Elsevier Inc. All rights reserved.
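The idea of scoring examples by reconstruction error and comparing error distributions can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which reconstruction model or which distributions REPD uses, so the sketch assumes a PCA projection as a stand-in reconstruction model, assumes a Gaussian fitted to the log reconstruction error of each class, and runs on synthetic data. All names (`ReconErrorClassifier`, `fit_pca`, `recon_error`, `f1_score`) are hypothetical.

```python
import numpy as np


def fit_pca(X, k):
    """Return (mean, components) of a rank-k linear reconstruction model."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def recon_error(X, mean, comps):
    """Squared reconstruction error of each row after projecting onto comps."""
    X_hat = ((X - mean) @ comps.T) @ comps + mean
    return ((X - X_hat) ** 2).sum(axis=1)


def log_gauss(x, mu, sigma):
    """Log-density of a normal distribution, evaluated elementwise."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)


class ReconErrorClassifier:
    """Assumed setup: fit the reconstruction model on non-defective rows only,
    then fit one Gaussian per class to the log reconstruction errors."""

    def __init__(self, k=2):
        self.k = k

    def fit(self, X, y):
        self.mean, self.comps = fit_pca(X[y == 0], self.k)
        self.params = {}
        for label in (0, 1):
            e = np.log(recon_error(X[y == label], self.mean, self.comps) + 1e-12)
            self.params[label] = (e.mean(), e.std() + 1e-9)
        return self

    def predict(self, X):
        e = np.log(recon_error(X, self.mean, self.comps) + 1e-12)
        # Label a row defective when its error is more likely under the
        # defective-class error distribution than under the clean one.
        return (log_gauss(e, *self.params[1]) > log_gauss(e, *self.params[0])).astype(int)


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN), with the defective class as positive."""
    tp = int(((y_true == 1) & (y_pred == 1)).sum())
    fp = int(((y_true == 0) & (y_pred == 1)).sum())
    fn = int(((y_true == 1) & (y_pred == 0)).sum())
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Synthetic, imbalanced data: clean rows lie near a 2-D subspace of a 6-D
# feature space; defective rows are pushed off that subspace.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 6))
clean = rng.normal(size=(200, 2)) @ W + 0.05 * rng.normal(size=(200, 6))
defect = rng.normal(size=(20, 2)) @ W + 3.0 * rng.normal(size=(20, 6))
X = np.vstack([clean, defect])
y = np.array([0] * 200 + [1] * 20)

model = ReconErrorClassifier(k=2).fit(X, y)
f1 = f1_score(y, model.predict(X))
print(f"F1 on the synthetic training data: {f1:.3f}")
```

Fitting the reconstruction model on the majority class alone means the few defective examples are needed only to estimate a one-dimensional error distribution, which is one plausible reason such a setup could tolerate class imbalance. The score printed here is computed on the training data and says nothing about the results reported in the paper.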
Pages: 15