An empirical study for software change prediction using imbalanced data

Cited by: 51
Authors
Malhotra, Ruchika [1]
Khanna, Megha [2]
Affiliations
[1] Delhi Technol Univ, Dept Software Engn, Delhi, India
[2] Delhi Technol Univ, Delhi, India
Keywords
Change proneness; Data sampling; Empirical validation; Imbalanced learning; MetaCost learners; Object-oriented metrics; STATIC CODE ATTRIBUTES; CHANGE-PRONE CLASSES; FAULT-PRONENESS; METRICS; CLASSIFICATION; FRAMEWORK; QUALITY; SUITE;
DOI
10.1007/s10664-016-9488-7
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Software change prediction is crucial for efficiently planning resource allocation during the testing and maintenance phases of software. Moreover, correctly identifying change-prone classes in the early phases of the software development life cycle helps in developing cost-effective, good-quality, and maintainable software. An effective software change prediction model should recognize both change-prone and not change-prone classes with high accuracy. In practice this is often not the case, because software practitioners have to deal with imbalanced data sets in which instances of one class far outnumber instances of the other. In such a scenario, the minority class is not predicted accurately, leading to strategic losses. This study evaluates a number of techniques for handling imbalanced data sets, using various data sampling methods and MetaCost learners on six open-source data sets. The results advocate the use of the resample with replacement sampling method for effective imbalanced learning.
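As a rough illustration of the resample with replacement idea named in the abstract, the following Python sketch draws a class-balanced sample of about the original size by sampling each class with replacement. It is a minimal sketch under stated assumptions, not the study's actual procedure: the function name `resample_with_replacement`, the equal per-class quota, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def resample_with_replacement(rows, labels, seed=0):
    """Return a class-balanced sample drawn with replacement.

    Assumption (not from the paper): every class receives the same
    quota, len(rows) // number_of_classes, so the majority class is
    effectively undersampled and the minority class oversampled.
    """
    rng = random.Random(seed)

    # Group the instances by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    per_class = len(rows) // len(by_class)
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for _ in range(per_class):
            # rng.choice may pick the same instance more than once,
            # which is what "with replacement" means.
            out_rows.append(rng.choice(members))
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: 8 not change-prone classes, 2 change-prone.
rows = [[i] for i in range(10)]
labels = ["no"] * 8 + ["yes"] * 2
new_rows, new_labels = resample_with_replacement(rows, labels)
print(Counter(new_labels))
```

The fixed seed keeps the sketch reproducible; in an experiment the resampling would normally be applied to the training split only, so that duplicated instances do not leak into the test data.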
Pages: 2806-2851
Page count: 46