Feature Selection Techniques to Counter Class Imbalance Problem for Aging Related Bug Prediction

Cited by: 13

Authors:

Kumar, Lov [1]

Sureka, Ashish [2]

Affiliations:

[1] Thapar Univ, Patiala, Punjab, India

[2] Ashoka Univ, Sonepat, Haryana, India

Source:

ISEC'18: PROCEEDINGS OF THE 11TH INNOVATIONS IN SOFTWARE ENGINEERING CONFERENCE | 2018

Keywords:

Aging Related Bugs; Imbalance Learning; Empirical Software Engineering; Feature Selection Techniques; Machine Learning; Predictive Modeling; Software Maintenance; Source Code Metrics; CLASSIFICATION; COMPLEXITY;

DOI:

10.1145/3172871.3172872

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Subject classification codes:

081202; 0835;

Abstract:

Aging-Related Bugs (ARBs) occur in long-running systems due to error conditions caused by the accumulation of problems such as memory leaks or unreleased files and locks. Aging-Related Bugs are hard to discover during software testing and challenging to replicate. Automatic identification and prediction of aging-related fault-prone files and classes in an object-oriented system can help the software quality assurance team optimize its testing efforts. In this paper, we present a study on the application of static source code metrics and machine learning techniques to predict aging-related bugs. We conduct a series of experiments on publicly available datasets from two large open-source software systems: Linux and MySQL. Class imbalance and high dimensionality are the two main technical challenges in building effective predictors for aging-related bugs. We investigate the application of five feature selection techniques (OneR, Information Gain, Gain Ratio, RELEIF and Symmetric Uncertainty) for dimensionality reduction and five strategies (Random Under-sampling, Random Over-sampling, SMOTE, SMOTEBoost and RUSBoost) to counter the effect of class imbalance in our proposed machine learning based approach. Experimental results reveal that random under-sampling performs best, followed by RUSBoost, in terms of the mean AUC metric. A statistical significance test demonstrates that there is a significant difference between the performance of the various feature selection techniques. Experimental results show that Gain Ratio and RELEIF perform best among the feature selection techniques when countering the class imbalance problem. We infer from the statistical significance test that there is no difference between the performances of the five learning algorithms.
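As an illustration of two of the techniques named in the abstract, the following is a minimal sketch of random under-sampling followed by Gain Ratio feature ranking. It is not the authors' implementation: the toy data, the function names (`entropy`, `gain_ratio`, `random_undersample`) and the fixed seed are assumptions made for this example, and the features are assumed to be already discretized (real source code metrics are continuous and would need binning first).

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of a discrete feature divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A constant feature has zero split information and carries no signal.
    return info_gain / split_info if split_info > 0 else 0.0


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows of larger classes until all classes match the smallest."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    target = min(len(idx) for idx in by_class.values())
    keep = sorted(i for idx in by_class.values() for i in rng.sample(idx, target))
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy data: two binary features per file, label 1 = ARB-prone.
rows = [(1, 0), (1, 1), (0, 0), (0, 1), (0, 0), (0, 1), (0, 0), (0, 1), (0, 0), (0, 1)]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Step 1: balance the classes (2 ARB-prone files vs. 8 clean files).
balanced_rows, balanced_labels = random_undersample(rows, labels)

# Step 2: score each feature column on the balanced data and rank them.
scores = [
    gain_ratio([row[j] for row in balanced_rows], balanced_labels)
    for j in range(len(rows[0]))
]
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print("class counts:", Counter(balanced_labels))
print("feature ranking (best first):", ranking)
```

In this toy data the first feature separates the two classes perfectly, so it ranks first whichever majority-class rows are kept. In a real pipeline the top-ranked features would then be passed to a classifier and evaluated with AUC, as the abstract describes.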

Pages: 11

References: 27 in total