Isolation Forest Filter to Simplify Training Data for Cross-Project Defect Prediction

Cited by: 4
Authors
Cui, Can [1]
Liu, Bin [1]
Wang, Shihai [1]
Affiliation
[1] Beihang Univ, Sch Reliabil & Syst Engn, Beijing, Peoples R China
Source
2019 PROGNOSTICS AND SYSTEM HEALTH MANAGEMENT CONFERENCE (PHM-QINGDAO) | 2019
Keywords
Cross-project defect prediction (CPDP); Isolation Forest Filter; data mining; transfer learning; training data simplification; SOFTWARE; MODELS; CODE
DOI
10.1109/phm-qingdao46334.2019.8942919
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
Cross-project defect prediction (CPDP) is an active research area. When historical data is limited or a new project is about to be developed, CPDP models are very useful: they help software testers identify defect-prone entities and help software managers focus on the "important" parts when allocating manpower, budget, and time. However, the dissimilarity of data distributions between the source projects and the target project decreases the performance of CPDP models, so how to simplify the cross-project training data is an important problem. To solve this issue, an isolation forest (iForest) filter is proposed. We use 15 versions of different Java projects from the open PROMISE Data Repository and five typical predictors (naive Bayes (NB), decision tree (DT), logistic regression (LR), k-nearest neighbor (k-NN), and random forest (RF)) to build 1050 (15*14*5) software defect prediction models (SDPMs). We also compare our models with Burak Filter models and Peter Filter models. The results on five performance measures (AUC, balance, G-measure, G-mean, and F1-measure) show that our iForest filter is feasible and even better than the other two. Therefore, using the iForest filter can simplify cross-project training data and build efficient SDPMs.
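The abstract names its performance measures without defining them. As a rough illustration, the sketch below computes balance, G-measure, G-mean, and F1-measure from confusion-matrix counts, using the definitions commonly found in the defect prediction literature. It is an assumption that the paper uses exactly these formulas. The function name `defect_metrics` and the example counts are hypothetical, and AUC is left out because it needs ranked prediction scores rather than a single confusion matrix.

```python
import math


def _safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def defect_metrics(tp, fp, tn, fn):
    """Compute threshold-based measures from confusion-matrix counts.

    Assumed (commonly used) definitions, not taken from the paper itself:
      pd        = tp / (tp + fn)   probability of detection (recall)
      pf        = fp / (fp + tn)   probability of false alarm
      balance   = 1 - sqrt((0 - pf)^2 + (1 - pd)^2) / sqrt(2)
      G-measure = harmonic mean of pd and (1 - pf)
      G-mean    = geometric mean of pd and (1 - pf)
      F1        = harmonic mean of precision and pd
    """
    pd = _safe_div(tp, tp + fn)
    pf = _safe_div(fp, fp + tn)
    precision = _safe_div(tp, tp + fp)
    specificity = 1.0 - pf

    balance = 1.0 - math.sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2) / math.sqrt(2.0)
    g_measure = _safe_div(2.0 * pd * specificity, pd + specificity)
    g_mean = math.sqrt(pd * specificity)
    f1 = _safe_div(2.0 * precision * pd, precision + pd)

    return {
        "pd": pd,
        "pf": pf,
        "balance": balance,
        "g_measure": g_measure,
        "g_mean": g_mean,
        "f1": f1,
    }


# Hypothetical counts for one source/target pair and one predictor.
example = defect_metrics(tp=30, fp=20, tn=80, fn=10)
```

All four measures lie in [0, 1], with higher values being better. Balance, G-measure, and G-mean combine pd and pf, which is why they are often preferred over accuracy on class-imbalanced defect data.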
Pages: 6