Adaptive Centre-Weighted Oversampling for Class Imbalance in Software Defect Prediction

Times cited: 4
Authors
Zhao, Qi [1]
Yan, Xuefeng [1,2]
Zhou, Yong [1]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Jiangsu, Peoples R China
[2] Collaborat Innovat Ctr Novel Software Technol & I, Nanjing, Jiangsu, Peoples R China
Source
2018 IEEE INT CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, UBIQUITOUS COMPUTING & COMMUNICATIONS, BIG DATA & CLOUD COMPUTING, SOCIAL COMPUTING & NETWORKING, SUSTAINABLE COMPUTING & COMMUNICATIONS | 2018
Keywords
software defect prediction; class imbalance; oversampling; adaptive centre; weights; SMOTE; ALGORITHM;
DOI
10.1109/BDCloud.2018.00044
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In software engineering, software defect prediction helps maintain the high quality of software products and is a popular research topic. However, class imbalance degrades the overall classification accuracy of software defect prediction models and is a key issue to be resolved. This paper proposes a new method, adaptive centre-weighted oversampling (ACWO), to address imbalanced learning problems effectively. First, an appropriate neighborhood size and set of neighbors are determined for each minority class sample. Then, for each minority class sample, synthetic samples are generated from the sample itself, its neighbors, and an adaptive centre lying within its neighborhood range. Finally, each minority class sample is oversampled according to the weight assigned to it; these weights are derived from the neighborhood sizes and the Euclidean distances to the centre. The software defect prediction model is then built by combining the ACWO algorithm with a stacked denoising autoencoder neural network. Experimental results show that, in terms of precision (P), recall (R), F1 measure, G-mean, and AUC, the model based on ACWO performs better than models based on many existing class imbalance learning algorithms.
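The abstract describes ACWO only at a high level and gives no formulas, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. Every concrete rule here is hypothetical: (1) a minority sample's neighborhood extends up to its nearest majority sample, capped at `k_max` and never smaller than one neighbor; (2) the adaptive centre is the centroid of the sample and its neighbors, which guarantees it lies within the neighborhood; (3) each synthetic sample is drawn uniformly from the triangle formed by the sample, a random neighbor, and the centre; (4) the weight of a sample is its distance to the centre divided by its neighborhood size, and the weights split the `len(majority) - len(minority)` synthetic samples needed for balance. The names `acwo_sketch`, `euclid`, `k_max`, and `seed` are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclid(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def acwo_sketch(minority: List[Point], majority: List[Point],
                k_max: int = 5, seed: int = 0) -> List[Point]:
    """Hypothetical ACWO-style oversampler (assumed rules, not the paper's).

    Returns roughly len(majority) - len(minority) synthetic minority samples;
    the exact count can differ slightly because per-sample counts are rounded.
    """
    if len(minority) < 2 or not majority:
        raise ValueError("need >= 2 minority samples and >= 1 majority sample")
    rng = random.Random(seed)
    neighbours, centres, raw_weights = [], [], []
    for i, x in enumerate(minority):
        # Indices of the other minority samples, nearest first.
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: euclid(x, minority[j]))
        # Assumption: the neighborhood stops at the nearest majority sample,
        # capped at k_max and never smaller than one neighbor.
        radius = min(euclid(x, z) for z in majority)
        k = sum(1 for j in others if euclid(x, minority[j]) < radius)
        k = max(1, min(k_max, k))
        nb = [minority[j] for j in others[:k]]
        # Assumption: the adaptive centre is the centroid of x and its
        # neighbors, so it always lies inside their convex hull.
        group = [x] + nb
        centre = tuple(sum(p[d] for p in group) / len(group)
                       for d in range(len(x)))
        neighbours.append(nb)
        centres.append(centre)
        # Assumption: a small neighborhood and a large distance to the centre
        # earn more synthetic samples; 1e-9 avoids an all-zero total.
        raw_weights.append((euclid(x, centre) + 1e-9) / k)

    total = sum(raw_weights)
    n_needed = len(majority) - len(minority)
    synthetic: List[Point] = []
    for x, nb, c, w in zip(minority, neighbours, centres, raw_weights):
        for _ in range(round(n_needed * w / total)):
            n = rng.choice(nb)
            # Uniform point in the triangle (x, n, c): the spacings of two
            # sorted uniforms are Dirichlet(1, 1, 1) barycentric weights.
            a, b = sorted((rng.random(), rng.random()))
            wx, wn, wc = a, b - a, 1.0 - b
            synthetic.append(tuple(wx * xv + wn * nv + wc * cv
                                   for xv, nv, cv in zip(x, n, c)))
    return synthetic
```

Under these assumptions, the synthetic points would be appended to the minority class before training; the stacked denoising autoencoder classifier mentioned in the abstract is not sketched. Sampling inside the (sample, neighbor, centre) triangle, rather than along a line segment as in SMOTE, is one plausible reading of "the adaptive centre, its neighbors and the minority class sample are used to generate synthetic samples".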
Pages: 223-230
Page count: 8