An Empirical Study on Heterogeneous Defect Prediction Approaches

Cited by: 45
Authors
Chen, Haowen [1]
Jing, Xiao-Yuan [1, 2, 3]
Li, Zhiqiang [4]
Wu, Di [1]
Peng, Yi [1]
Huang, Zhiguo [1]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] Guangdong Univ Petrochem Technol, Sch Comp, Maoming 525000, Peoples R China
[3] Nanjing Univ Posts & Telecommun, Coll Automat, Nanjing 210023, Peoples R China
[4] Shaanxi Normal Univ, Sch Comp Sci, Xian 710119, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Measurement; NASA; Predictive models; Data models; Software quality; Heterogeneous defect prediction; cross-project; empirical study; metric selection; metric transformation; KERNEL; MODELS; CODE; MACHINE; FAULTS;
DOI
10.1109/TSE.2020.2968520
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Software defect prediction has long been an active research topic in software engineering because it helps allocate limited resources reasonably. Compared with cross-project defect prediction (CPDP), heterogeneous defect prediction (HDP) further relaxes the constraints on the defect data used for prediction by allowing the source and target projects to have different metric sets. However, because existing HDP studies use different evaluation strategies and experimental settings, a holistic understanding of them is still lacking. In this paper, we present an empirical study of HDP approaches. We systematically review the state of research and compare the HDP approaches proposed from 2014 to June 2018. We also investigate the feasibility of HDP approaches in CPDP. Through extensive experiments on 30 projects from five datasets, we obtain the following findings: (1) Metric transformation-based HDP approaches usually achieve better prediction performance, while metric selection-based approaches offer better interpretability. Overall, the HDP approach proposed by Li et al. (CTKCCA) currently performs best. (2) Handling the class imbalance problem can improve prediction performance, but the improvements are usually limited. In addition, using mixed project data does not consistently improve the performance of HDP approaches, since the label information in the target project is not used effectively. (3) HDP approaches are feasible for cross-project defect prediction in which the source and target projects have the same metric set.
Pages: 2803-2822
Number of pages: 20