Cross-Project and Within-Project Semisupervised Software Defect Prediction: A Unified Approach

Cited by: 109
Authors
Wu, Fei [1]
Jing, Xiao-Yuan [1, 2]
Sun, Ying [1]
Sun, Jing [1]
Huang, Lin [1]
Cui, Fangyi [1]
Sun, Yanfei [1]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Coll Automat, Nanjing 210003, Jiangsu, Peoples R China
[2] Wuhan Univ, Sch Comp, State Key Lab Software Engn, Wuhan 430072, Hubei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cost-sensitive kernelized semisupervised dictionary learning (CKSDL); cross-project semisupervised defect prediction (CSDP); within-project semisupervised defect prediction (WSDP); NETWORKS; MACHINE;
DOI
10.1109/TR.2018.2804922
Chinese Library Classification number
TP3 [Computing Technology, Computer Technology]
Discipline classification code
0812
Abstract
When there are not enough historical defect data to build an accurate prediction model, semisupervised defect prediction (SSDP) and cross-project defect prediction (CPDP) are two feasible solutions. Existing CPDP methods assume that the available source data are well labeled. However, because labeling a large amount of defect data requires expensive human effort, usually we can only utilize suitable unlabeled source data. We call CPDP in this scenario cross-project semisupervised defect prediction (CSDP). Although some within-project semisupervised defect prediction (WSDP) methods have been developed in recent years, there is still much room for improvement in prediction performance. In this paper, we aim to provide a unified and effective solution for both the CSDP and WSDP problems. We introduce the semisupervised dictionary learning technique and propose a cost-sensitive kernelized semisupervised dictionary learning (CKSDL) approach. CKSDL can make full use of the limited labeled defect data and a large amount of unlabeled data in the kernel space. In addition, CKSDL considers misclassification costs in the dictionary learning process. Extensive experiments on 16 projects indicate that (1) CKSDL outperforms state-of-the-art WSDP methods, (2) using unlabeled cross-project defect data can help improve WSDP performance, and (3) CKSDL generally obtains significantly better prediction performance than related SSDP methods in the CSDP scenario.
Pages: 581-597
Number of pages: 17