A dimensionality reduction-based efficient software fault prediction using Fisher linear discriminant analysis (FLDA)

Cited by: 44

Authors:

Kalsoom, Anum [1]

Maqsood, Muazzam [1,2]

Ghazanfar, Mustansar Ali [2]

Aadil, Farhan [1,2]

Rho, Seungmin [3]

Affiliations:

[1] COMSATS Inst Informat & Technol Attock, Dept Comp Sci, Attock, Pakistan

[2] Univ Engn & Technol Taxila, Dept Software Engn, Taxila, Pakistan

[3] Sungkyul Univ, Dept Media Software, Anyang, South Korea

Source:

JOURNAL OF SUPERCOMPUTING | 2018, Vol. 74, No. 9

Funding:

National Research Foundation of Singapore

Keywords:

Software fault prediction; Fisher linear discriminant; Reliability; Fault-tolerance; Robustness; DEFECT PREDICTION; IDENTIFICATION; SELECTION; METRICS;

DOI:

10.1007/s11227-018-2326-5

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

Software quality is an important factor in the success of software companies. Traditional software quality assurance techniques face serious limitations, especially in terms of time and budget. This has led to increased use of machine learning classification techniques to predict software faults. Software fault prediction can help developers uncover software problems in the early stages of the software life cycle. The most critical challenges are the extent to which these techniques generalize to software of different sizes, the class imbalance problem, and the identification of discriminative software metrics. In this paper, we analyze the performance of nine widely used machine learning classifiers for software fault prediction: Bayes Net, naive Bayes (NB), artificial neural network, support vector machines, k-nearest neighbors, AdaBoost, Bagging, Zero R, and Random Forest. Two standard sampling techniques, SMOTE and Resample with substitution, are used to handle the class imbalance problem. We further use an FLDA-based feature selection approach in combination with SMOTE and Resample to select the most discriminative metrics. The four best-performing classifiers are then used for software fault prediction. The experiments are carried out on 15 publicly available datasets (small, medium, and large) collected from the PROMISE repository. The proposed Resample-FLDA method performs better than existing methods in terms of precision, recall, F-measure, and area under the curve.
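The abstract does not spell out how resampling and FLDA-based selection are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a two-class problem (0 = clean, 1 = faulty), treats "Resample" as class-balancing sampling with replacement, and ranks each software metric by the per-feature Fisher criterion (squared difference of class means divided by the sum of class variances). The function names `resample_with_replacement`, `fisher_scores`, and `top_k_features` are hypothetical.

```python
import random
from statistics import mean, pvariance


def resample_with_replacement(rows, labels, seed=0):
    """Draw samples with replacement so every class ends up equally sized.

    Assumption: this stands in for the 'Resample' step named in the abstract.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    per_class = len(rows) // len(by_class)
    out_rows, out_labels = [], []
    for label, members in sorted(by_class.items()):
        for _ in range(per_class):
            out_rows.append(rng.choice(members))
            out_labels.append(label)
    return out_rows, out_labels


def fisher_scores(rows, labels):
    """Per-feature Fisher criterion for a two-class (0/1) problem."""
    scores = []
    for j in range(len(rows[0])):
        c0 = [r[j] for r, y in zip(rows, labels) if y == 0]
        c1 = [r[j] for r, y in zip(rows, labels) if y == 1]
        between = (mean(c0) - mean(c1)) ** 2
        within = pvariance(c0) + pvariance(c1)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separating or uninformative.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(rows, labels, k):
    """Indices of the k features with the highest Fisher score."""
    scores = fisher_scores(rows, labels)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical toy data: two metrics per module, imbalanced labels.
demo_rows = [[1.0, 10.0], [1.2, 50.0], [0.9, 30.0], [1.1, 25.0],
             [5.0, 20.0], [5.1, 40.0]]
demo_labels = [0, 0, 0, 0, 1, 1]
balanced_rows, balanced_labels = resample_with_replacement(demo_rows, demo_labels)
selected = top_k_features(balanced_rows, balanced_labels, k=1)
```

The selected metric indices would then be used to project the dataset before training a classifier. SMOTE, the other sampling technique named above, synthesizes new minority-class samples rather than duplicating existing ones and is not shown here.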


Pages: 4568-4602

Page count: 35

