Performance tuning for machine learning-based software development effort prediction models

Cited by: 7
Authors
Ertugrul, Egemen [1]
Baytar, Zakir [2]
Catal, Cagatay [3]
Muratli, Can [2]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
[2] Istanbul Kultur Univ, Fac Engn, Dept Comp Engn, Istanbul, Turkey
[3] Wageningen Univ, Social Sci, Informat Technol Grp, Wageningen, Netherlands
Keywords
Software effort estimation; machine learning; feature binning; grid search; artificial neural networks; mean absolute residual; project effort; regression
DOI
10.3906/elk-1809-129
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Software development effort estimation is a critical activity in the project management process. In this study, machine learning algorithms were investigated in conjunction with feature transformation, feature selection, and parameter tuning techniques to estimate development effort accurately, and a new model was proposed as part of an expert system. We chose the most general-purpose algorithms and applied a parameter optimization technique (GridSearch), feature transformation techniques (binning and one-hot encoding), and a feature selection algorithm (principal component analysis). All models were trained on the ISBSG datasets and implemented with the scikit-learn package in Python. The proposed model uses a multilayer perceptron as its underlying algorithm, bins continuous features and one-hot encodes categorical features to turn them into numerical values, performs feature selection with principal component analysis, and tunes its parameters with GridSearch. We demonstrate that our effort prediction model mostly outperforms existing models in prediction accuracy, as measured by the mean absolute residual.
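The pipeline described in the abstract (binning, one-hot encoding, principal component analysis, a multilayer perceptron, and grid search scored by mean absolute residual) might be sketched in scikit-learn roughly as follows. This is an illustrative sketch only, not the authors' configuration: the synthetic data stands in for the ISBSG datasets, and the feature layout, bin count, grid values, and solver are all assumptions made for the example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder

# Synthetic stand-in for project data (assumption; not ISBSG).
# Columns: 0 = functional size, 1 = team size, 2 = language code (categorical).
rng = np.random.default_rng(0)
n = 200
size = rng.uniform(50, 1000, n)
team = rng.uniform(1, 20, n)
lang = rng.integers(0, 3, n)
X = np.column_stack([size, team, lang])
y = 0.02 * size + 0.5 * team + 2.0 * lang + rng.normal(0.0, 1.0, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Feature transformation: bin the continuous columns, one-hot encode the
# categorical one. sparse_threshold=0.0 forces dense output for PCA.
features = ColumnTransformer(
    transformers=[
        ("bins", KBinsDiscretizer(n_bins=5, encode="onehot-dense",
                                  strategy="uniform"), [0, 1]),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), [2]),
    ],
    sparse_threshold=0.0,
)

pipeline = Pipeline(
    steps=[
        ("features", features),
        ("pca", PCA()),
        ("mlp", MLPRegressor(solver="lbfgs", max_iter=500, random_state=0)),
    ]
)

# Hypothetical grid; the values are chosen only to keep the example small.
param_grid = {
    "pca__n_components": [3, 5],
    "mlp__hidden_layer_sizes": [(8,), (16,)],
    "mlp__alpha": [1e-4, 1e-2],
}

# The mean absolute residual equals the mean absolute error, so the search
# maximizes its negative.
search = GridSearchCV(pipeline, param_grid,
                      scoring="neg_mean_absolute_error", cv=3)
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
mar = float(np.mean(np.abs(y_test - y_pred)))  # MAR = mean(|y - y_hat|)
print("best parameters:", search.best_params_)
print("MAR on held-out data:", mar)
```

Wrapping every step in one `Pipeline` means the bin edges, category set, and principal components are refit on each training fold during the search, so the cross-validated score is not contaminated by the validation fold.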
Pages: 1308-1324
Number of pages: 17