Categorical missing data imputation for software cost estimation by multinomial logistic regression

Cited by: 45
Authors
Sentas, P [1 ]
Angelis, L [1 ]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece
Keywords
software effort prediction; cost estimation; missing data; imputation; multinomial logistic regression;
DOI
10.1016/j.jss.2005.02.026
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
A common problem in software cost estimation is the manipulation of incomplete or missing data in databases used for the development of prediction models. In such cases, the most popular and simple method of handling missing data is to ignore either the projects or the attributes with missing observations. This technique causes the loss of valuable information and therefore may lead to inaccurate cost estimation models. On the other hand, there are various imputation methods used to estimate the missing values in a data set. These methods are applied mainly on numerical data and produce continuous estimates. However, it is well known that the majority of the cost data sets contain software projects with mostly categorical attributes with many missing values. It is therefore reasonable to use some estimating method producing categorical rather than continuous values. The purpose of this paper is to investigate the possibility of using such a method for estimating categorical missing values in software cost databases. Specifically, the method known as multinomial logistic regression (MLR) is suggested for imputation and is applied on projects of the ISBSG multi-organizational software database. Comparisons of MLR with other techniques for handling missing data, such as listwise deletion (LD), mean imputation (MI), expectation maximization (EM) and regression imputation (RI) under different patterns and percentages of missing data, show the high efficiency of the proposed method. © 2005 Elsevier Inc. All rights reserved.
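The idea of imputing a categorical attribute with MLR can be illustrated by a minimal sketch. It is not the authors' procedure: the attribute names (`dev_type`, `size`, `team`), the made-up toy rows, and the plain gradient-descent fit are all assumptions chosen for illustration, and no ISBSG data is used. The model is trained on the rows where the categorical attribute is observed and then predicts a category for the rows where it is missing.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of class scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_probs(W, x):
    xb = x + [1.0]  # append a constant 1.0 for the bias term
    return softmax([sum(w * v for w, v in zip(Wk, xb)) for Wk in W])

def fit_mlr(X, y, n_classes, lr=0.5, epochs=500):
    # Multinomial logistic (softmax) regression by batch gradient descent.
    n_cols = len(X[0]) + 1
    W = [[0.0] * n_cols for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_cols for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            probs = class_probs(W, xi)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == yi else 0.0)
                for j, v in enumerate(xi + [1.0]):
                    grad[k][j] += err * v
        for k in range(n_classes):
            for j in range(n_cols):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W

def impute_categorical(rows, target, predictors):
    # Train on rows where `target` is observed, predict where it is None.
    levels = sorted({r[target] for r in rows if r[target] is not None})
    complete = [r for r in rows if r[target] is not None]
    X = [[float(r[p]) for p in predictors] for r in complete]
    y = [levels.index(r[target]) for r in complete]
    W = fit_mlr(X, y, len(levels))
    result = []
    for row in rows:
        filled_row = dict(row)
        if filled_row[target] is None:
            probs = class_probs(W, [float(row[p]) for p in predictors])
            best = max(range(len(levels)), key=lambda k: probs[k])
            filled_row[target] = levels[best]
        result.append(filled_row)
    return result

# Hypothetical toy projects (invented values, predictors scaled to [0, 1]).
projects = [
    {"size": 0.0, "team": 0.0, "dev_type": "Enhancement"},
    {"size": 0.1, "team": 0.1, "dev_type": "Enhancement"},
    {"size": 1.0, "team": 0.0, "dev_type": "New development"},
    {"size": 0.9, "team": 0.1, "dev_type": "New development"},
    {"size": 0.0, "team": 1.0, "dev_type": "Re-development"},
    {"size": 0.1, "team": 0.9, "dev_type": "Re-development"},
    {"size": 0.8, "team": 0.2, "dev_type": None},  # missing category
    {"size": 0.2, "team": 0.8, "dev_type": None},  # missing category
]

filled = impute_categorical(projects, "dev_type", ["size", "team"])
for row in filled:
    print(row)
```

Unlike mean or regression imputation, the imputed value here is always one of the observed category levels, which is the property the abstract argues for. A fuller treatment would also encode categorical predictors and might draw from the predicted probabilities instead of taking the most likely class.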
Pages: 404 - 414
Page count: 11
Related papers
50 in total
  • [41] Weighting and Imputation for Missing Data in a Cost and Earnings Fishery Survey
    Lew, Daniel K.
    Himes-Cornell, Amber
    Lee, Jean
    MARINE RESOURCE ECONOMICS, 2015, 30 (02) : 219 - 230
  • [42] Missing data techniques in analogy-based software development effort estimation
    Idri, Ali
    Abnane, Ibtissam
    Abran, Alain
    JOURNAL OF SYSTEMS AND SOFTWARE, 2016, 117 : 595 - 611
  • [43] Composite Imputation Method for the Multiple Linear Regression with Missing at Random Data
    Thongsri, Thidarat
    Samart, Klairung
    INTERNATIONAL JOURNAL OF MATHEMATICS AND COMPUTER SCIENCE, 2022, 17 (01) : 51 - 62
  • [44] Development of Imputation Methods for Missing Data in Multiple Linear Regression Analysis
    Thongsri, Thidarat
    Samart, Klairung
    LOBACHEVSKII JOURNAL OF MATHEMATICS, 2022, 43 : 3390 - 3399
  • [45] MISSING DATA IN TRAFFIC ESTIMATION: A VARIATIONAL AUTOENCODER IMPUTATION METHOD
    Boquet, Guillem
    Lopez Vicario, Jose
    Morell, Antoni
    Serrano, Javier
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 2882 - 2886
  • [46] Using multiple imputation to estimate missing data in meta-regression
    Ellington, E. Hance
    Bastille-Rousseau, Guillaume
    Austin, Cayla
    Landolt, Kristen N.
    Pond, Bruce A.
    Rees, Erin E.
    Robar, Nicholas
    Murray, Dennis L.
    METHODS IN ECOLOGY AND EVOLUTION, 2015, 6 (02) : 153 - 163
  • [47] Predicting the Type of Nanostructure Using Data Mining Techniques and Multinomial Logistic Regression
    Shehadeh, Mahmoud
    Ebrahimi, Nader
    Ochigbo, Abel
    COMPLEX ADAPTIVE SYSTEMS 2012, 2012, 12 : 392 - 397
  • [48] Development of Imputation Methods for Missing Data in Multiple Linear Regression Analysis
    Thongsri, Thidarat
    Samart, Klairung
    LOBACHEVSKII JOURNAL OF MATHEMATICS, 2022, 43 (11) : 3390 - 3399
  • [49] Reduced rank multinomial logistic regression in Markov chains with application to cognitive data
    Wang, Pei
    Abner, Erin L.
    Fardo, David W.
    Schmitt, Frederick A.
    Jicha, Gregory A.
    Van Eldik, Linda J.
    Kryscio, Richard J.
    STATISTICS IN MEDICINE, 2021, 40 (11) : 2650 - 2664
  • [50] A comparison of various software tools for dealing with missing data via imputation
    Abrahantes, Jose Cortinas
    Sotto, Cristina
    Molenberghs, Geert
    Vromman, Geert
    Bierinckx, Bart
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2011, 81 (11) : 1653 - 1675