Cross-Project Online Just-In-Time Software Defect Prediction

Cited by: 22
Authors
Tabassum, Sadia [1 ]
Minku, Leandro L. [1 ]
Feng, Danyi [2 ]
Affiliations
[1] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, England
[2] Xiliu Tech, Beijing 100050, Peoples R China
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Training; Software; Training data; Predictive models; Codes; Resource management; Open source software; Software defect prediction; cross-project learning; transfer learning; online learning; verification latency; concept drift; MACHINE;
DOI
10.1109/TSE.2022.3150153
Chinese Library Classification (CLC)
TP31 [Computer software];
Discipline codes
081202 ; 0835 ;
Abstract
Cross-Project (CP) Just-In-Time Software Defect Prediction (JIT-SDP) makes use of CP data to overcome the lack of data necessary to train well performing JIT-SDP classifiers at the beginning of software projects. However, such approaches have never been investigated in realistic online learning scenarios, where Within-Project (WP) software changes naturally arrive over time and can be used to automatically update the classifiers. We provide the first investigation of when and to what extent CP data are useful for JIT-SDP in such realistic scenarios. For that, we propose three different online CP JIT-SDP approaches that can be updated with incoming CP and WP training examples over time. We also collect data on 9 proprietary software projects and use 10 open source software projects to analyse these approaches. We find that training classifiers with incoming CP+WP data can lead to absolute improvements in G-mean of up to 53.89% and up to 35.02% at the initial stage of the projects compared to classifiers using WP-only and CP-only data, respectively. Using CP+WP data was also shown to be beneficial after a large number of WP data were received. Using CP data to supplement WP data helped the classifiers to reduce or prevent large drops in predictive performance that may occur over time, leading to absolute G-Mean improvements of up to 37.35% and 48.16% compared to WP-only and CP-only data during such periods, respectively. During periods of stable predictive performance, absolute improvements were of up to 29.03% and up to 41.25% compared to WP-only and CP-only classifiers, respectively. Our results highlight the importance of using both CP and WP data together in realistic online JIT-SDP scenarios.
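The G-mean figures quoted in the abstract are the geometric mean of the recalls on defect-inducing and clean software changes, evaluated as changes arrive over time. A minimal sketch of that metric and of a test-then-train (prequential) online evaluation loop is below; `HypotheticalOnlineLearner` is an illustrative stand-in for an incremental JIT-SDP classifier, not one of the paper's actual approaches:

```python
import math

def g_mean(y_true, y_pred):
    """Geometric mean of the recalls on defect-inducing (1) and clean (0) changes."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    recall_pos = tp / pos if pos else 0.0
    recall_neg = tn / neg if neg else 0.0
    return math.sqrt(recall_pos * recall_neg)

class HypotheticalOnlineLearner:
    """Illustrative stand-in: predicts the majority label seen so far."""
    def __init__(self):
        self.pos = 0
        self.neg = 0
    def predict(self, x):
        return 1 if self.pos >= self.neg else 0
    def update(self, x, y):
        if y == 1:
            self.pos += 1
        else:
            self.neg += 1

def prequential_gmean(stream, learner):
    """Test-then-train: predict each arriving change, then train on its label."""
    y_true, y_pred = [], []
    for x, y in stream:
        y_pred.append(learner.predict(x))
        learner.update(x, y)
        y_true.append(y)
    return g_mean(y_true, y_pred)
```

In the online CP setting the abstract describes, the same `update` call would be fed both incoming cross-project and within-project labeled changes, so the classifier benefits from CP data early on and from WP data as it accumulates.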
Pages: 268-287
Page count: 20