Bregman Divergences and Surrogates for Learning

Cited by: 29
Authors
Nock, Richard [1]
Nielsen, Frank [2]
Affiliations
[1] Univ Antilles Guyane, CEREGMIA, UFR Droit & Sci Econ, Campus Schoelcher, BP 7209, F-97275 Schoelcher, Martinique, France
[2] Ecole Polytech, LIX, F-91128 Palaiseau, France
Keywords
Ensemble learning; boosting; Bregman divergences; linear separators; decision trees; BOUNDS;
DOI
10.1109/TPAMI.2008.225
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Bartlett et al. (2006) recently proved that a ground condition on surrogates, classification calibration, ties their consistent minimization to that of the classification risk, and left the algorithmic questions surrounding their minimization as an important open problem. In this paper, we address this problem for a broad set of surrogates that lies at the intersection of the classification-calibrated surrogates and those of Murata et al. (2004). This set coincides with the surrogates satisfying three common assumptions. Equivalent expressions for its members, some of them well known, follow for the convex and concave surrogates frequently used in the induction of linear separators and decision trees. Most notably, they share remarkable algorithmic features: for each of these two types of classifiers, we give a minimization algorithm provably converging to the minimum of any such surrogate. While seemingly different, we show that these algorithms are offshoots of the same "master" algorithm. This provides a new and broad unified account of different popular algorithms, including additive regression with the squared loss, the logistic loss, and the top-down induction performed in CART and C4.5. Moreover, we show that the induction enjoys the most popular boosting features, regardless of the surrogate. Experiments are provided on 40 readily available domains.
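As a quick orientation for the abstract's terminology, the following is a standard textbook sketch (not taken from the paper itself) of the Bregman divergence generated by a strictly convex, differentiable function F, together with two of the surrogate losses named above, written for a label y in {-1, +1} and a real-valued prediction h(x):

% Bregman divergence generated by F (standard definition, not from this record)
D_F(x \,\|\, y) \;=\; F(x) - F(y) - \langle x - y,\, \nabla F(y) \rangle

% Two surrogates mentioned in the abstract (standard forms)
\phi_{\mathrm{sq}}(y\, h(x)) \;=\; (1 - y\, h(x))^2            % squared loss
\phi_{\mathrm{log}}(y\, h(x)) \;=\; \log\!\bigl(1 + e^{-y\, h(x)}\bigr)   % logistic loss

Both surrogates are convex and classification calibrated in the sense of Bartlett et al. (2006), which is the property that ties their consistent minimization to that of the classification risk.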
Pages: 2048-2059
Number of pages: 12
References
24 items in total
[1]
Quinlan J. R., 1993, C4.5: Programs for Machine Learning
[2]
Azran A., 2004, Proceedings of the Conference on Computational Learning Theory, p. 427
[3]
Banerjee A., 2005, Journal of Machine Learning Research, vol. 6, p. 1705
[4]
Banerjee A., Guo X., Wang H. On the optimality of conditional expectation as a Bregman predictor [J]. IEEE Transactions on Information Theory, 2005, 51(7): 2664-2669
[5]
Bartlett P., 2006, Proceedings of Neural Information Processing Systems
[6]
Bartlett P. L., Jordan M. I., McAuliffe J. D. Convexity, classification, and risk bounds [J]. Journal of the American Statistical Association, 2006, 101(473): 138-156
[7]
Breiman L., Friedman J. H., Olshen R. A., Stone C. J., 1984, Classification and Regression Trees, DOI 10.1201/9781315139470
[8]
Collins M., 2000, Machine Learning, p. 158
[9]
Friedman J., Hastie T., Tibshirani R. Additive logistic regression: A statistical view of boosting - Rejoinder [J]. Annals of Statistics, 2000, 28(2): 400-407
[10]
Gentile C., 1998, Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems, p. 225