Decision tree: Compatibility of techniques for handling missing values at training and testing

Cited by: 0
Authors
Gavankar S. [1 ]
Sawarkar S. [1 ]
Affiliation
[1] Department of Computer Engineering, Datta Meghe College of Engineering, Mumbai University, Navi Mumbai
Source
International Journal of Simulation: Systems, Science and Technology | 2016 / Vol. 17 / No. 34
Keywords
Compatibility; Data mining; Decision tree; Induction; Missing values; Testing data; Training data
DOI
10.5013/IJSSST.a.17.34.10
Abstract
Data mining relies on large amounts of data to build learning models, so the quality of the data is very important. One important data quality problem is the presence of missing values at both training and testing time. Many methods have been proposed to deal with missing values in training data, and many of them resort to imputation techniques. However, very few methods deal with missing values at testing/prediction time. In this paper, we discuss and summarize various strategies for dealing with this problem at both training and testing time. We also analyze the compatibility between the various methods used at training and at testing. Our analysis indicates that the known value strategy at testing performed best across the various missing value handling techniques applied to training data, followed by C4.5. © 2016, UK Simulation Society. All rights reserved.
Pages: 10.1-10.7
Related papers
50 in total
[21]   XGBoost in handling missing values for life insurance risk prediction [J].
Rusdah, Deandra Aulia ;
Murfi, Hendri .
SN APPLIED SCIENCES, 2020, 2 (08)
[22]   Handling Missing Values in Local Post-hoc Explainability [J].
Cinquini, Martina ;
Giannotti, Fosca ;
Guidotti, Riccardo ;
Mattei, Andrea .
EXPLAINABLE ARTIFICIAL INTELLIGENCE, XAI 2023, PT II, 2023, 1902 :256-278
[23]   Handling Missing Values with Regularized Iterative Multiple Correspondence Analysis [J].
Josse, Julie ;
Chavent, Marie ;
Liquet, Benoît ;
Husson, Francois .
JOURNAL OF CLASSIFICATION, 2012, 29 (01) :91-116
[24]   HANDLING MISSING VALUES VIA A NEURAL SELECTIVE INPUT MODEL [J].
Lopes, Noel ;
Ribeiro, Bernardete .
NEURAL NETWORK WORLD, 2012, 22 (04) :357-370
[25]   Handling missing values in exploratory multivariate data analysis methods [J].
Josse, Julie ;
Husson, Francois .
JOURNAL OF THE SFDS, 2012, 153 (02) :79-99
[26]   Handling missing values in kernel methods with application to microbiology data [J].
Belanche, Lluis A. ;
Kobayashi, Vladimer ;
Aluja, Tomas .
NEUROCOMPUTING, 2014, 141 :110-116
[27]   The training strategy for creating decision tree [J].
Liu, ZB .
2003 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-5, PROCEEDINGS, 2003, :3238-3243
[28]   Adjusted weight voting algorithm for random forests in handling missing values [J].
Xia, Jing ;
Zhang, Shengyu ;
Cai, Guolong ;
Li, Li ;
Pan, Qing ;
Yan, Jing ;
Ning, Gangmin .
PATTERN RECOGNITION, 2017, 69 :52-60
[29]   JUST COMPRESS AND RELAX: HANDLING MISSING VALUES IN BIG TENSOR ANALYSIS [J].
Marcos, J. H. ;
Sidiropoulos, N. D. .
2014 6TH INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS, CONTROL AND SIGNAL PROCESSING (ISCCSP), 2014, :218-221
[30]   DualBoost : Handling Missing Values with Feature Weights and Weak Classifiers that Abstain [J].
Wang, Weihong ;
Xu, Jie ;
Wang, Yang ;
Cai, Chen ;
Chen, Fang .
CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, :1543-1546