How to use expert advice

被引:350
作者
CesaBianchi, N
Freund, Y
Haussler, D
Helmbold, DP
Schapire, RE
Warmuth, MK
机构
[1] AT&T BELL LABS,FLORHAM PK,NJ 07932
[2] UNIV CALIF SANTA CRUZ,SANTA CRUZ,CA 95064
关键词
D O I
10.1145/258128.258179
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and the expected number of mistakes made by the best expert on this sequence, where the expectation is taken with respect to the randomization in the predictions. We show that the minimum achievable difference is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We then show how this leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context. We also compare our analysis to the case in which log loss is used instead of the expected number of mistakes.
引用
收藏
页码:427 / 485
页数:59
相关论文
共 47 条
  • [1] [Anonymous], 1982, ESTIMATION DEPENDENC
  • [2] RATES OF CONVERGENCE FOR MINIMUM CONTRAST ESTIMATORS
    BIRGE, L
    MASSART, P
    [J]. PROBABILITY THEORY AND RELATED FIELDS, 1993, 97 (1-2) : 113 - 150
  • [3] LEARNABILITY AND THE VAPNIK-CHERVONENKIS DIMENSION
    BLUMER, A
    EHRENFEUCHT, A
    HAUSSLER, D
    WARMUTH, MK
    [J]. JOURNAL OF THE ACM, 1989, 36 (04) : 929 - 965
  • [4] CESABIANCHI N, 1993, 25TH P ANN ACM S THE, P382
  • [5] CESABIANCHIU N, 1996, IN PRESS MACH LEARN
  • [6] Chung T. H., 1994, Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory, COLT 94, P183, DOI 10.1145/180139.181097
  • [7] Cover T., 1965, P 4 PRAG C INF THEOR
  • [8] COVER TM, 1977, IEEE T SYST MAN CYB, V6, P421
  • [9] DAWID A, 1991, BAYESIAN STAT, V4, P109
  • [10] STATISTICAL-THEORY - THE PREQUENTIAL APPROACH
    DAWID, AP
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY, 1984, 147 : 278 - 292