An Empirical Study on Heterogeneous Defect Prediction Approaches

Cited by: 45
Authors
Chen, Haowen [1]
Jing, Xiao-Yuan [1, 2, 3]
Li, Zhiqiang [4]
Wu, Di [1]
Peng, Yi [1]
Huang, Zhiguo [1]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] Guangdong Univ Petrochem Technol, Sch Comp, Maoming 525000, Peoples R China
[3] Nanjing Univ Posts & Telecommun, Coll Automat, Nanjing 210023, Peoples R China
[4] Shaanxi Normal Univ, Sch Comp Sci, Xian 710119, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Measurement; NASA; Predictive models; Data models; Software quality; Heterogeneous defect prediction; cross-project; empirical study; metric selection; metric transformation; KERNEL; MODELS; CODE; MACHINE; FAULTS
DOI
10.1109/TSE.2020.2968520
CLC number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Software defect prediction has long been an active research topic in software engineering because it helps allocate limited resources reasonably. Compared with cross-project defect prediction (CPDP), heterogeneous defect prediction (HDP) further relaxes the restrictions on the defect data used for prediction by allowing the source and target projects to have different metric sets. However, a holistic understanding of existing HDP studies is still lacking because they use different evaluation strategies and experimental settings. In this paper, we provide an empirical study of HDP approaches. We systematically review the state of research and compare the HDP approaches proposed from 2014 to June 2018. We also investigate the feasibility of HDP approaches in CPDP. Through extensive experiments on 30 projects from five datasets, we obtain the following findings: (1) metric transformation-based HDP approaches usually achieve better prediction performance, while metric selection-based approaches offer better interpretability. Overall, the HDP approach proposed by Li et al. (CTKCCA) currently performs best. (2) Handling class imbalance can improve prediction performance, but the improvements are usually limited. In addition, using mixed project data does not consistently improve the performance of HDP approaches, because the label information in the target project is not used effectively. (3) HDP approaches are feasible for cross-project defect prediction in which the source and target projects share the same metric set.
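To make the idea of differing metric sets concrete, the following is a minimal Python sketch of one simple way to relate source and target metrics. Every source/target metric pair is scored by the two-sample Kolmogorov-Smirnov (KS) statistic on its values, and pairs are then matched greedily from most to least similar. The function names, the greedy strategy, the max_ks cutoff of 0.5, and the toy metric values are illustrative assumptions. This is not the procedure of CTKCCA or of any specific approach compared in the study.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs of a and b."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are right-continuous step functions that jump only at
    # sample points, so checking every observed value finds the maximum gap.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def match_metrics(source, target, max_ks=0.5):
    """Hypothetical helper: greedily pair source and target metrics by KS similarity.

    source, target: dict mapping metric name -> list of per-module values.
    max_ks is an assumed cutoff; pairs less similar than this are discarded.
    """
    # Score every cross-project metric pair; smaller KS means more similar.
    candidates = sorted(
        (ks_statistic(s_vals, t_vals), s_name, t_name)
        for s_name, s_vals in source.items()
        for t_name, t_vals in target.items()
    )
    used_s, used_t, pairs = set(), set(), []
    for d, s_name, t_name in candidates:
        if d > max_ks:
            break  # remaining candidates are even less similar
        # Each metric may be used in at most one pair.
        if s_name not in used_s and t_name not in used_t:
            pairs.append((s_name, t_name, d))
            used_s.add(s_name)
            used_t.add(t_name)
    return pairs


# Toy projects with different metric names (made-up values for illustration).
source = {"loc": [10, 20, 30, 40], "cc": [1, 2, 2, 3]}
target = {"size": [12, 22, 28, 41], "fanout": [1, 1, 2, 3]}
pairs = match_metrics(source, target)
print(pairs)  # [('cc', 'fanout', 0.25), ('loc', 'size', 0.25)]
```

Under these assumptions, a classifier trained on the matched source columns could then be applied to the corresponding target columns. Metric transformation-based approaches take a different route: they typically map both metric sets into a common feature space instead of pairing individual metrics.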
Pages: 2803-2822
Page count: 20