Performance tuning for machine learning-based software development effort prediction models

Times cited: 7

Authors:

Ertugrul, Egemen [1]

Baytar, Zakir [2]

Catal, Cagatay [3]

Muratli, Can [2]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China

[2] Istanbul Kultur Univ, Fac Engn, Dept Comp Engn, Istanbul, Turkey

[3] Wageningen Univ, Social Sci, Informat Technol Grp, Wageningen, Netherlands

Source:

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES | 2019 / Vol. 27 / No. 2

Keywords:

Software effort estimation; machine learning; feature binning; grid search; artificial neural networks; mean absolute residual; PROJECT EFFORT; REGRESSION

DOI:

10.3906/elk-1809-129

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Software development effort estimation is a critical activity in the project management process. In this study, machine learning algorithms were investigated in combination with feature transformation, feature selection, and parameter tuning techniques to estimate development effort accurately, and a new model was proposed as part of an expert system. We selected general-purpose algorithms and applied a parameter optimization technique (GridSearch), feature transformation techniques (binning and one-hot encoding), and a feature selection algorithm (principal component analysis). All models were trained on the ISBSG datasets and implemented with the scikit-learn package in Python. The proposed model uses a multilayer perceptron as its underlying algorithm. For feature transformation, it bins continuous features and one-hot encodes categorical data into numerical values; it then performs feature selection with principal component analysis and tunes its parameters with the GridSearch algorithm. We show that our effort prediction model outperforms the other existing models in most cases in terms of prediction accuracy, as measured by the mean absolute residual (MAR).
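The pipeline described in the abstract (binning of continuous features, one-hot encoding of categorical features, PCA, a multilayer perceptron, and grid-search tuning judged by mean absolute residual) can be sketched in scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not give the ISBSG fields, bin counts, PCA sizes, or network grid, so the column names (`functional_size`, `team_size`, `language_type`, `dev_type`), the helper names, the synthetic data, and all grid values are hypothetical. MAR is taken as the mean of |actual effort - predicted effort|, which is what scikit-learn calls mean absolute error, so the built-in `neg_mean_absolute_error` scorer drives the search. Note that PCA derives new components from the encoded columns rather than keeping a subset of the original ones; it fills the step the abstract calls feature selection.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder

NUMERIC = ["functional_size", "team_size"]   # hypothetical continuous fields
CATEGORICAL = ["language_type", "dev_type"]  # hypothetical categorical fields


def mean_absolute_residual(y_true, y_pred):
    """MAR: mean of |actual effort - predicted effort|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def make_synthetic_projects(n=200, seed=0):
    """Hypothetical synthetic stand-in for ISBSG records (not real data)."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame({
        "functional_size": rng.uniform(50, 1000, n),
        "team_size": rng.integers(1, 20, n).astype(float),
        "language_type": rng.choice(["3GL", "4GL", "ApG"], n),
        "dev_type": rng.choice(["New", "Enhancement", "Re-development"], n),
    })
    # Effort in arbitrary units, clipped so it stays positive
    effort = 0.02 * X["functional_size"] + 0.5 * X["team_size"] + rng.normal(0, 0.5, n)
    return X, np.clip(effort.to_numpy(), 0.1, None)


def build_search():
    preprocess = ColumnTransformer(
        [
            # Binning: each continuous feature -> 5 quantile bins, one-hot encoded
            ("bin", KBinsDiscretizer(n_bins=5, encode="onehot-dense", strategy="quantile"), NUMERIC),
            # One-hot encoding: each category -> its own 0/1 indicator column
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ],
        sparse_threshold=0,  # always return a dense array so PCA accepts it
    )
    pipe = Pipeline([
        ("prep", preprocess),
        ("pca", PCA()),  # projects the encoded features onto principal components
        ("mlp", MLPRegressor(learning_rate_init=0.01, max_iter=2000, random_state=0)),
    ])
    grid = {  # hypothetical search space
        "pca__n_components": [5, 10],
        "mlp__hidden_layer_sizes": [(10,), (20,)],
    }
    # MAR equals mean absolute error; the scorer is negated so that higher is better
    return GridSearchCV(pipe, grid, scoring="neg_mean_absolute_error", cv=3)


X, y = make_synthetic_projects()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
search = build_search().fit(X_train, y_train)
test_mar = mean_absolute_residual(y_test, search.predict(X_test))
print("Best parameters:", search.best_params_)
print("Test MAR:", round(test_mar, 3))
```

Keeping binning, encoding, and PCA inside the `Pipeline` means they are refit on every cross-validation fold, so the grid search does not leak information from the validation folds into the preprocessing. Because the ISBSG repository is licensed, the synthetic generator exists only to make the sketch runnable; using real data would mean replacing `make_synthetic_projects` and the two column lists.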


Pages: 1308-1324

Page count: 17
