Data Mining Techniques for Software Effort Estimation: A Comparative Study

Cited by: 120
Authors
Dejaeger, Karel [1]
Verbeke, Wouter [1]
Martens, David [2]
Baesens, Bart [1, 3]
Affiliations
[1] Katholieke Univ Leuven, Dept Decis Sci & Informat Management, B-3000 Louvain, Belgium
[2] Univ Antwerp, Fac Appl Econ, B-2000 Antwerp, Belgium
[3] Univ Southampton, Sch Management, Highfield Southampton SO17 1BJ, Hants, England
Keywords
Data mining; software effort estimation; regression; COST ESTIMATION; FEEDFORWARD NETWORKS; EMPIRICAL VALIDATION; MUTUAL INFORMATION; EFFORT PREDICTION; FEATURE-SELECTION; NEURAL-NETWORKS; MODELS; CLASSIFICATION; ANALOGY;
DOI
10.1109/TSE.2011.55
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
A predictive model is required to be accurate and comprehensible in order to inspire confidence in a business setting. Both aspects have been assessed in a software effort estimation setting by previous studies. However, no univocal conclusion as to which technique is the most suited has been reached. This study addresses this issue by reporting on the results of a large scale benchmarking study. Different types of techniques are under consideration, including techniques inducing tree/rule-based models like M5 and CART, linear models such as various types of linear regression, nonlinear models (MARS, multilayered perceptron neural networks, radial basis function networks, and least squares support vector machines), and estimation techniques that do not explicitly induce a model (e.g., a case-based reasoning approach). Furthermore, the aspect of feature subset selection by using a generic backward input selection wrapper is investigated. The results are subjected to rigorous statistical testing and indicate that ordinary least squares regression in combination with a logarithmic transformation performs best. Another key finding is that by selecting a subset of highly predictive attributes such as project size, development, and environment related attributes, typically a significant increase in estimation accuracy can be obtained.
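As a rough illustration of the best-performing approach named in the abstract (ordinary least squares combined with a logarithmic transformation), the sketch below fits log(effort) against log(size) and transforms predictions back to the original scale. It is a minimal sketch, not the authors' experimental setup: the single predictor, the function names `fit_log_ols` and `predict_effort`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_log_ols(size, effort):
    """Fit log(effort) = a + b * log(size) by ordinary least squares.

    Returns the coefficient vector [a, b]. Hypothetical helper, not from the paper.
    """
    # Design matrix: intercept column plus the log-transformed predictor.
    X = np.column_stack([np.ones(len(size)), np.log(size)])
    y = np.log(effort)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_effort(coef, size):
    """Predict effort on the original scale by undoing the log transform."""
    return np.exp(coef[0] + coef[1] * np.log(size))

# Synthetic, noise-free data (an assumption): effort = 3 * size^1.1
size = np.array([10.0, 20.0, 50.0, 100.0, 200.0])
effort = 3.0 * size ** 1.1

coef = fit_log_ols(size, effort)
estimate = predict_effort(coef, np.array([40.0]))
```

On real project data the fit would include noise and more attributes; the log transformation is commonly used because effort and size data tend to be strongly right-skewed.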
Pages: 375-397
Page count: 23
Related papers
50 in total
  • [1] Software Effort Estimation Using Data Mining Techniques
    Benala, Tirimula Rao
    Mall, Rajib
    Srikavya, P.
    HariPriya, M. Vani
    ICT AND CRITICAL INFRASTRUCTURE: PROCEEDINGS OF THE 48TH ANNUAL CONVENTION OF COMPUTER SOCIETY OF INDIA - VOL I, 2014, 248 : 85 - 92
  • [2] Analysis of Data Mining Techniques for Software Effort Estimation
    Sehra, Sumeet Kaur
    Kaur, Jasneet
    Brar, Yadwinder Singh
    Kaur, Navdeep
    2014 11TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS (ITNG), 2014, : 633 - 638
  • [3] Data mining techniques for software quality prediction: a comparative study
    Ronchieri, Elisabetta
    Canaparo, Marco
    Costantini, Alessandro
    Duma, Doina Cristina
    2018 IEEE NUCLEAR SCIENCE SYMPOSIUM AND MEDICAL IMAGING CONFERENCE PROCEEDINGS (NSS/MIC), 2018,
  • [4] A Comparative Study of Estimation by Analogy using Data Mining Techniques
    Nagpal, Geeta
    Uddin, Moin
    Kaur, Arvinder
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2012, 8 (04): : 621 - 652
  • [5] Application of data mining methods for effort estimation of software projects
    Karna, Hrvoje
    Vickovic, Linda
    Gotovac, Sven
    SOFTWARE-PRACTICE & EXPERIENCE, 2019, 49 (02): : 171 - 191
  • [6] The effects of data mining techniques on software cost estimation
    Lum, Karen T.
    Baker, Daniel R.
    Hihn, Jairus M.
    IEMC - EUROPE 2008: INTERNATIONAL ENGINEERING MANAGEMENT CONFERENCE, EUROPE, CONFERENCE PROCEEDINGS: MANAGING ENGINEERING, TECHNOLOGY AND INNOVATION FOR GROWTH, 2008, : 99 - 103
  • [7] Missing Data Imputation Techniques for Software Effort Estimation: A Study of Recent Issues and Challenges
    Almutlaq, Ayman Jalal Hassan
    Jawawi, Dayang N. A.
    EMERGING TRENDS IN INTELLIGENT COMPUTING AND INFORMATICS: DATA SCIENCE, INTELLIGENT INFORMATION SYSTEMS AND SMART COMPUTING, 2020, 1073 : 1144 - 1158
  • [8] A Survey on Software Effort Estimation Techniques
    Rastogi, Himani
    Dhankar, Swati
    Kakkar, Misha
    2014 5TH INTERNATIONAL CONFERENCE CONFLUENCE THE NEXT GENERATION INFORMATION TECHNOLOGY SUMMIT (CONFLUENCE), 2014, : 826 - 830
  • [9] Effort Estimation in Information Systems Projects using Data Mining Techniques
    Villanueva-Balsera, Joaquin
    Ortega-Fernandez, Francisco
    Rodriguez-Montequin, Vicente
    Concepcion-Suarez, Ramiro
    PROCEEDINGS OF THE 13TH WSEAS INTERNATIONAL CONFERENCE ON COMPUTERS, 2009, : 652 - +
  • [10] Comparative Study of Streaming Data Mining Techniques
    Khan, Shabia Shabir
    Peer, M. A.
    Quadri, S. M. K.
    2014 INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM), 2014, : 209 - 214