An Empirical Study on Heterogeneous Defect Prediction Approaches

Citations: 49
Authors
Chen, Haowen [1]
Jing, Xiao-Yuan [1, 2, 3]
Li, Zhiqiang [4]
Wu, Di [1]
Peng, Yi [1]
Huang, Zhiguo [1]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] Guangdong Univ Petrochem Technol, Sch Comp, Maoming 525000, Peoples R China
[3] Nanjing Univ Posts & Telecommun, Coll Automat, Nanjing 210023, Peoples R China
[4] Shaanxi Normal Univ, Sch Comp Sci, Xian 710119, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Measurement; NASA; Predictive models; Data models; Software quality; Heterogeneous defect prediction; cross-project; empirical study; metric selection; metric transformation; KERNEL; MODELS; CODE; MACHINE; FAULTS;
DOI
10.1109/TSE.2020.2968520
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Software defect prediction has always been a hot research topic in the field of software engineering owing to its capability of allocating limited resources reasonably. Compared with cross-project defect prediction (CPDP), heterogeneous defect prediction (HDP) further relaxes the limitation of defect data used for prediction, permitting different metric sets to be contained in the source and target projects. However, there is still a lack of a holistic understanding of existing HDP studies due to different evaluation strategies and experimental settings. In this paper, we provide an empirical study on HDP approaches. We review the research status systematically and compare the HDP approaches proposed from 2014 to June 2018. Furthermore, we also investigate the feasibility of HDP approaches in CPDP. Through extensive experiments on 30 projects from five datasets, we have the following findings: (1) metric transformation-based HDP approaches usually result in better prediction effects, while metric selection-based approaches have better interpretability. Overall, the HDP approach proposed by Li et al. (CTKCCA) currently has the best performance. (2) Handling class imbalance problems can boost the prediction effects, but the improvements are usually limited. In addition, utilizing mixed project data cannot improve the performance of HDP approaches consistently since the label information in the target project is not used effectively. (3) HDP approaches are feasible for cross-project defect prediction in which the source and target projects have the same metric set.
Pages: 2803-2822
Number of pages: 20