A survey of methodologies for the treatment of missing values within datasets: limitations and benefits

Cited by: 70
Authors
Young, W. [1 ]
Weckman, G. [1 ]
Holland, W. [1 ]
Affiliations
[1] Ohio Univ, Stocker Ctr 270, Ind & Syst Engn, Athens, OH 45710 USA
Funding
National Science Foundation (USA);
Keywords
data mining; imputation methods; machine learning; multiple imputation; missing values;
DOI
10.1080/14639220903470205
Chinese Library Classification number
TB18 [Ergonomics];
Discipline classification code
1201 ;
Abstract
Knowledge discovery in ergonomics is complicated by the presence of missing data, because most methodologies do not tolerate incomplete sample instances. Data miners cannot always remove incomplete sample instances when they occur. Imputation methods are needed to 'fill in' estimated values for the missing instances in order to construct a complete dataset. Even with emerging methodologies, the ergonomics field seems to rely on outdated imputation techniques. This survey presents an overview of a variety of imputation methods found in current academic research, which is not limited to ergonomic studies. The objective is to strengthen the community's understanding of imputation methodologies and briefly highlight their benefits and limitations. This survey suggests that the multiple imputation method is the current state-of-the-art missing value technique. This method has proven to be robust to many of the shortcomings that plague other methods and should be considered the primary choice for missing value problems found in ergonomic studies.
Pages: 15-43
Page count: 29