BIAS IN INFORMATION-BASED MEASURES IN DECISION TREE INDUCTION

Times cited: 21
Authors
WHITE, AP [1]
LIU, WZ [1]
Affiliation
[1] UNIV BIRMINGHAM, SCH MATH & STAT, BIRMINGHAM B15 2TT, W MIDLANDS, ENGLAND
Keywords
DECISION TREES; NOISE; INDUCTION; UNBIASED ATTRIBUTE SELECTION; INFORMATION-BASED MEASURES
DOI
10.1023/A:1022694010754
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A fresh look is taken at the problem of bias in the information-based attribute selection measures used in decision tree induction. Statistical simulation is used to demonstrate that the usual measures, such as information gain and gain ratio, as well as a new measure recently proposed by Lopez de Mantaras (1991), are all biased in favour of attributes with large numbers of values. It is concluded that approaches based on the chi-square distribution are preferable, because they compensate automatically for differences between attributes in the number of levels (values) they take.
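The simulation idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the record does not describe their simulation design, and every function name, the sample size, the number of trials and the random seed are hypothetical choices. The sketch scores a pure-noise attribute (generated independently of a two-class label) with information gain, gain ratio, and a score assumed to be 1 minus the normalized distance of Lopez de Mantaras (gain divided by the joint entropy). As a chi-square-based alternative it refers the G statistic, G = 2 N ln(2) x Gain (gain in bits), to a chi-square distribution with (k - 1)(c - 1) degrees of freedom, where k is the number of attribute values and c the number of classes.

```python
import math
import random
from collections import Counter


def entropy(counts):
    """Plug-in Shannon entropy (bits) of a table of frequency counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def info_gain(xs, ys):
    """Information gain of attribute xs about class ys: H(X) + H(Y) - H(X,Y)."""
    return (entropy(Counter(xs).values())
            + entropy(Counter(ys).values())
            - entropy(Counter(zip(xs, ys)).values()))


def gain_ratio(xs, ys):
    """Gain ratio: information gain divided by the split information H(X)."""
    split_info = entropy(Counter(xs).values())
    return info_gain(xs, ys) / split_info if split_info > 0 else 0.0


def mantaras_score(xs, ys):
    """Assumed form of 1 - normalized Lopez de Mantaras distance: gain / H(X,Y)."""
    joint = entropy(Counter(zip(xs, ys)).values())
    return info_gain(xs, ys) / joint if joint > 0 else 0.0


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form Poisson-type sum.
        return math.exp(-z) * sum(z ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc term plus half-integer gamma recurrence terms.
    tail = math.erfc(math.sqrt(z))
    for i in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-z) * z ** (i - 0.5) / math.gamma(i + 0.5)
    return tail


def g_test_pvalue(xs, ys):
    """G = 2 N ln(2) * gain (bits), referred to chi-square with (k-1)(c-1) df."""
    df = (len(set(xs)) - 1) * (len(set(ys)) - 1)
    if df == 0:
        return 1.0
    # Clamp tiny negative values caused by floating-point rounding.
    g = max(0.0, 2.0 * len(ys) * math.log(2) * info_gain(xs, ys))
    return chi2_sf(g, df)


def simulate(levels, n=100, classes=2, trials=300, seed=0):
    """Average scores of a noise attribute with `levels` values (hypothetical setup)."""
    rng = random.Random(seed)
    totals = {"gain": 0.0, "gain_ratio": 0.0, "mantaras": 0.0, "g_test_reject": 0.0}
    for _ in range(trials):
        ys = [rng.randrange(classes) for _ in range(n)]
        xs = [rng.randrange(levels) for _ in range(n)]  # independent of ys
        totals["gain"] += info_gain(xs, ys)
        totals["gain_ratio"] += gain_ratio(xs, ys)
        totals["mantaras"] += mantaras_score(xs, ys)
        # Count rejections at the 5% level (bool adds as 0 or 1).
        totals["g_test_reject"] += g_test_pvalue(xs, ys) < 0.05
    return {name: total / trials for name, total in totals.items()}


if __name__ == "__main__":
    for k in (2, 4, 8):
        print(k, {name: round(v, 4) for name, v in simulate(k).items()})
```

Running it should show the average gain, gain ratio and distance-based score all rising as the noise attribute is given more values, while the share of G-test p-values below 0.05 stays near the nominal 5% level. This is the sense in which a chi-square reference distribution compensates for the number of levels. The closed-form chi-square tail only avoids a SciPy dependency; scipy.stats.chi2.sf would serve equally well.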
Pages: 321-329
Number of pages: 9
References
11 entries in total
[1] De Mantaras RL, 1991, A distance-based attribute selection measure for decision tree induction, Machine Learning, V6, N1, P81-92
[2] Edwards E, 1964, INFORMATION TRANSMIS
[3] Keppel G, 1973, DESIGN ANAL RES HDB
[4] Kononenko I, 1984, EXPT AUTOMATIC LEARN
[5] Kullback S, 1959, INFORMATION THEORY S
[6] Liu WZ, White AP, 1994, The importance of attribute selection measures in decision tree induction, Machine Learning, V15, N1, P25-41
[7] Mingers J, 1989, Machine Learning, V3, P319, DOI 10.1007/BF00116837
[8] Mingers J, 1987, J OPER RES SOC, V38, P39, DOI 10.2307/2582520
[9] Quinlan JR, 1986, Machine Learning, V1, P81, DOI 10.1007/BF00116251
[10] Quinlan JR, 1988, MACH INTELL, V11, P305