Dealing with Missing Values in Software Project Datasets: A Systematic Mapping Study

Cited by: 6
Authors
Idri, Ali [1]
Abnane, Ibtissam [1]
Abran, Alain [2]
Affiliations
[1] Mohamed V Univ, Software Project Management Res Team, ENSIAS, Rabat, Morocco
[2] Ecole Technol Super, Dept Software Engn, Montreal, PQ H3C IK3L, Canada
Source
SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING | 2016 / Vol. 653
Keywords
Systematic mapping study; Software engineering; Missing values; Data sets; Imputation
DOI
10.1007/978-3-319-33810-1_1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Missing values (MV) pose a serious problem for research in software engineering (SE), which relies mainly on statistical and/or data mining analysis of SE data. Various techniques have therefore been developed to deal adequately with MV. In this paper, a systematic mapping study was carried out to summarize the existing techniques for dealing with MV in SE datasets and to classify the selected studies according to six criteria: research type, research approach, MV technique, MV type, data types, and MV objective. Publication channels and trends were also identified. As a result, 35 papers concerning MV treatment of SE data were selected. The study shows an increasing interest in machine learning (ML) techniques, especially the K-nearest neighbor (KNN) algorithm, for dealing with MV in SE datasets, and finds that most MV techniques are used to support software development effort estimation techniques.
Pages: 1 / 16
Page count: 16