Decision Tree: Review of Techniques for Missing Values at Training, Testing and Compatibility

Cited by: 17
Authors
Gavankar, Sachin [1]
Sawarkar, Sudhirkumar [1]
Affiliation
[1] Mumbai Univ, Datta Meghe Coll Engn, Dept Comp Engn, Navi Mumbai, India
Source
2015 THIRD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, MODELLING AND SIMULATION (AIMS 2015) | 2015
Keywords
data mining; induction; decision tree; missing values; training data; testing data; compatibility
DOI
10.1109/AIMS.2015.29
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Data mining relies on large amounts of data to build learning models, so the quality of the data is very important. One important data quality problem is the presence of missing values. Missing values can occur both at training time and at testing time. Many methods have been proposed to deal with missing values in training data, and many of them resort to imputation techniques. However, very few methods deal with missing values at testing/prediction time. In this paper, we discuss and summarize various strategies for dealing with this problem at both training and testing time. We also discuss the compatibility between various methods at training and testing to achieve better results.
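As an illustration of the imputation idea mentioned in the abstract, the following is a minimal sketch, not the method of the paper. It assumes categorical attributes, missing values encoded as `None`, and most-frequent-value (mode) imputation; the helper names `fit_modes` and `impute` and the toy data are hypothetical. The fill values are learned from the training rows only and then reused at prediction time, which is one simple way to keep training-time and testing-time handling compatible.

```python
from collections import Counter

MISSING = None  # assumption: missing values are encoded as None


def fit_modes(rows):
    """Learn the most frequent observed value of each attribute from training rows."""
    n_attrs = len(rows[0])
    modes = []
    for j in range(n_attrs):
        observed = [r[j] for r in rows if r[j] is not MISSING]
        modes.append(Counter(observed).most_common(1)[0][0])
    return modes


def impute(row, modes):
    """Replace each missing entry with the mode learned at training time."""
    return [modes[j] if v is MISSING else v for j, v in enumerate(row)]


# Hypothetical toy data: two categorical attributes per row.
train = [
    ["sunny", "high"],
    ["sunny", None],
    ["rainy", "high"],
    [None, "low"],
]

modes = fit_modes(train)                         # learned from training data only
train_filled = [impute(r, modes) for r in train]
test_filled = impute([None, "low"], modes)       # same fill values at prediction time
```

The filled rows could then be passed to any decision tree learner; the point of the sketch is only that the test-time record is completed with statistics from the training set rather than from the test data.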
Pages: 122-126
Page count: 5