COMPOUND DIVERSITY FUNCTIONS FOR ENSEMBLE SELECTION

Cited by: 11
Authors
Ko, Albert Hung-Ren [1 ]
Sabourin, Robert [1 ]
Britto, Alceu De Souza, Jr. [2 ]
Affiliations
[1] Univ Quebec, LIVIA, Ecole Technol Super, Montreal, PQ H3C 1K3, Canada
[2] Pontificia Univ Catolica Parana, PPGIA, BR-80215901 Curitiba, Parana, Brazil
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Diversity; ensemble of classifiers; pattern recognition; majority voting; VARIANCE; BIAS;
DOI
10.1142/S021800140900734X
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
An effective way to improve a classification method's performance is to create an ensemble of classifiers. Two elements are believed to be important in constructing an ensemble: (a) the performance of each individual classifier; and (b) diversity among the classifiers. Nevertheless, most works on diversity suggest that there is only a weak correlation between diversity and ensemble accuracy. We propose compound diversity functions, which combine diversity measures with the performance of each individual classifier, and show that there is a strong correlation between the proposed functions and ensemble accuracy. Correlations calculated across different ensemble creation methods, different problems, and different classification algorithms on 0.624 million ensembles suggest that most compound diversity functions are better than traditional diversity measures. A population-based genetic algorithm was used to search for the best ensembles on a handwritten-numeral recognition problem, evaluating 42.24 million ensembles. The statistical results indicate that compound diversity functions perform better than traditional diversity measures and are helpful in selecting the best ensembles.
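The abstract describes combining a diversity measure with individual classifier performance into one selection criterion. The sketch below illustrates this idea under an assumed form: mean pairwise disagreement among the classifiers, weighted by their mean individual accuracy. The combination shown (a simple product) is hypothetical and is not claimed to be one of the paper's exact compound functions.

```python
import numpy as np

def pairwise_disagreement(preds):
    """Mean pairwise disagreement among classifiers.

    preds: array of shape (n_classifiers, n_samples) holding
    each classifier's predicted labels.
    """
    k = preds.shape[0]
    total, pairs = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            # Fraction of samples on which classifiers i and j disagree.
            total += np.mean(preds[i] != preds[j])
            pairs += 1
    return total / pairs

def compound_diversity(preds, y_true):
    """Illustrative compound function (assumed form, not the paper's):
    diversity weighted by mean individual accuracy."""
    mean_acc = np.mean([np.mean(p == y_true) for p in preds])
    return pairwise_disagreement(preds) * mean_acc
```

In an ensemble-selection loop, a score like this would be computed for every candidate subset of classifiers (e.g. each individual in a genetic-algorithm population), and the subset with the highest score retained.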
Pages: 659-686 (28 pages)