An empirical analysis of data preprocessing for machine learning-based software cost estimation
Cited by: 120
Authors:
Huang, Jianglin [1]
Li, Yan-Fu [2]
Xie, Min [1]
Affiliations:
[1] City Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
[2] CentraleSupelec, Dept Ind Engn, Paris, France
Keywords:
Software cost estimation; Data preprocessing; Missing-data treatments; Scaling; Feature selection; Case selection
SUPPORT VECTOR REGRESSION; MISSING DATA; MUTUAL INFORMATION; FEATURE-SELECTION; PREDICTION; MODELS; IMPUTATION; WEIGHTS; SIZE
DOI:
10.1016/j.infsof.2015.07.004
Chinese Library Classification (CLC) number:
TP [Automation technology, computer technology]
Subject classification code:
0812
Abstract:
Context: Due to the complex nature of the software development process, traditional parametric models and statistical methods often appear inadequate to model the increasingly complicated relationship between project development cost and project features (or cost drivers). Machine learning (ML) methods, with several reported successful applications, have gained popularity for software cost estimation in recent years. Many researchers have claimed that data preprocessing is a fundamental stage of ML methods; however, very few works have focused on the effects of data preprocessing techniques. Objective: This study aims at an empirical assessment of the effectiveness of data preprocessing techniques on ML methods in the context of software cost estimation. Method: In this work, we first conduct a literature survey of recent publications that use data preprocessing techniques, followed by a systematic empirical study to analyze the strengths and weaknesses of individual data preprocessing techniques as well as their combinations. Results: Our results indicate that data preprocessing techniques may significantly influence the final prediction. They may sometimes have a negative impact on the prediction performance of ML methods. Conclusion: In order to reduce prediction errors and improve efficiency, careful selection of data preprocessing techniques is necessary, according to the characteristics of the machine learning methods as well as the datasets used for software cost estimation. (C) 2015 Elsevier B.V. All rights reserved.