A review and comparison of strategies for handling missing values in separate-and-conquer rule learning

Cited by: 0
Authors
Lars Wohlrab
Johannes Fürnkranz
Affiliation
[1] Technische Universität Darmstadt, Knowledge Engineering Group
Source
Journal of Intelligent Information Systems | 2011 / Vol. 36
Keywords
Machine learning; Inductive rule learning; Missing values
DOI
Not available
Abstract
In this paper, we review possible strategies for handling missing values in separate-and-conquer rule learning algorithms and compare them experimentally on a large number of datasets. In particular, through a careful study on data with controlled levels of missing values, we gain additional insights into the strategies' different biases with respect to attributes with missing values. Somewhat surprisingly, a strategy that implements a strong bias against the use of attributes with missing values exhibits the best average performance on 24 datasets from the UCI repository.
Pages: 73-98
Page count: 25