Missing Data Imputation Through the Use of the Random Forest Algorithm

Cited by: 0
Authors
Pantanowitz, Adam [1]
Marwala, Tshilidzi [1]
Affiliations
[1] Univ Witwatersrand, Sch Elect & Informat Engn, ZA-2050 Johannesburg, South Africa
Source
ADVANCES IN COMPUTATIONAL INTELLIGENCE | 2009 / Vol. 61
Keywords
auto-associative; imputation; missing data; neural network; random forest
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a comparison of different paradigms used for missing data imputation. The data set used is HIV seroprevalence data from an antenatal clinic study survey performed in 2001. Data imputation is performed through five methods: Random Forests; auto-associative neural networks with genetic algorithms; auto-associative neuro-fuzzy configurations; and two hybrids based on random forests and neural networks. Results indicate that Random Forests are superior in imputing missing data for the given data set in terms of both accuracy and computation time, with accuracy increases of up to 32% on average for certain variables when compared with auto-associative networks. While the concept of hybrid systems has promise, the presented systems appear to be hindered by their auto-associative neural network components.
Pages: 53 - 62
Number of pages: 10