Heterogeneous fault prediction with cost-sensitive domain adaptation

Cited by: 27
Authors
Li, Zhiqiang [1]
Jing, Xiao-Yuan [1,2]
Zhu, Xiaoke [1,3]
Affiliations
[1] Wuhan Univ, Sch Comp, State Key Lab Software Engn, Wuhan 430072, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Sch Automat, Nanjing 210023, Jiangsu, Peoples R China
[3] Henan Univ, Sch Comp & Informat Engn, Kaifeng 475001, Peoples R China
Keywords
cost-sensitive learning; class imbalance; heterogeneous domain adaptation; heterogeneous fault prediction; mixed project; software quality assurance; STATIC CODE ATTRIBUTES; DEFECT PREDICTION; MACHINE; MODELS;
DOI
10.1002/stvr.1658
Chinese Library Classification
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
In the early phases of software testing, projects may have only limited historical defect data. Learning a prediction model from such insufficient training data limits the efficacy of the learned predictor. In practice, many fault prediction datasets are publicly available. Recently, heterogeneous fault prediction (HFP) has been proposed. However, existing HFP models do not investigate how to use mixed project data to predict the target project. Furthermore, defect data are often imbalanced. The imbalanced class distribution of the source data usually leads to serious misclassification of fault-prone instances, which degrades the predictor's performance. Existing HFP methods do not consider the class imbalance problem in the training stage. In this paper, we propose a novel Cost-sensitive Label and Structure-consistent Unilateral Projection (CLSUP) approach for HFP. CLSUP can not only make better use of within-project and cross-project data but also alleviate the class imbalance problem by setting different misclassification costs for fault-prone and non-fault-prone instances. Extensive experiments on 30 projects demonstrate the effectiveness of CLSUP.
Pages: 22