Revisiting heterogeneous defect prediction methods: How far are we?

Cited by: 19
Authors
Chen, Xiang [1, 2]
Mu, Yanzhou [3]
Liu, Ke [1]
Cui, Zhanqi [4]
Ni, Chao [5]
Affiliations
[1] Nantong Univ, Sch Informat Sci & Technol, Nantong, Peoples R China
[2] Guilin Univ Elect Technol, Guangxi Key Lab Trusted Software, Guilin, Peoples R China
[3] Tianjin Univ, Coll Intelligence & Comp, Tianjin, Peoples R China
[4] Beijing Informat Sci & Technol Univ, Comp Sch, Beijing, Peoples R China
[5] Zhejiang Univ, Sch Software Technol, Ningbo, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Software defect prediction; Heterogeneous defect prediction; Unsupervised defect prediction; Non-effort-aware performance indicators; Effort-aware performance indicators; Diversity analysis; Empirical studies; Feature selection
DOI
10.1016/j.infsof.2020.106441
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Cross-project defect prediction applies to scenarios in which the target projects are new projects. Most previous studies tried to utilize training data from other projects (i.e., the source projects). However, the metrics that practitioners use to measure the program modules extracted from different projects may not be the same, which makes heterogeneous defect prediction (HDP) challenging.
Objective: Researchers have proposed many novel HDP methods with promising performance. Recently, unsupervised defect prediction (UDP) methods have received more attention and have shown competitive performance. However, to the best of our knowledge, whether HDP methods can perform significantly better than UDP methods has not yet been thoroughly investigated.
Method: In this article, we perform a comparative study to take a holistic look at this issue. Specifically, we compare five HDP methods with four UDP methods on 34 projects in five groups under the same experimental setup from three perspectives: non-effort-aware performance indicators (NPIs), effort-aware performance indicators (EPIs), and diversity analysis on identifying defective modules.
Result: We have the following findings: (1) HDP methods do not perform significantly better than some UDP methods in terms of two NPIs and four EPIs. (2) According to two satisfactory criteria recommended by previous studies, the satisfactory ratio of existing HDP methods is pessimistic. (3) The diversity of predictions for defective modules across HDP and UDP methods is greater than that within HDP methods or within UDP methods.
Conclusion: The above findings imply that HDP still has a long way to go. Given this, we present some observations about the road ahead for HDP.
Pages: 16