Boosting for Correlated Binary Classification

Cited by: 7
Authors
Adewale, Adeniyi J. [1]
Dinu, Irina [2]
Yasui, Yutaka [2]
Affiliations
[1] Merck Res Labs, N Wales, PA 19454 USA
[2] Univ Alberta, Sch Publ Hlth, Dept Publ Hlth Sci, Edmonton, AB T6G 2G3, Canada
Keywords
Functional gradient descent; Likelihood optimization; LogitBoost; Matched-pair; Penalized quasi-likelihood (PQL); Longitudinal data analysis; Mixed models; Regression
DOI
10.1198/jcgs.2009.07118
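Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract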
Boosting is a successful method for high-dimensional classification of independent data. However, existing variants do not address the correlations that arise in longitudinal or clustered study designs, where measurements are collected across two or more time points or within clusters. This article presents two new variants of boosting with a focus on high-dimensional classification problems with matched-pair binary responses or, more generally, any correlated binary responses. The first method is based on the generic functional gradient descent algorithm and the second method is based on a direct likelihood optimization approach. The performance and the computational requirements of the algorithms were evaluated using simulations. Whereas the performance of the two methods is similar, the computational efficiency of the algorithm based on generic functional gradient descent far exceeds that of the algorithm based on direct likelihood optimization. The former method is illustrated using data on gene expression changes in de novo and relapsed childhood acute lymphoblastic leukemia. Computer code implementing the algorithms and the relevant dataset are available online as supplemental materials.
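For readers unfamiliar with the generic functional gradient descent idea named in the abstract, the following is a minimal sketch for ordinary (independent) binary responses only. It is not the authors' algorithm and does not model within-pair or within-cluster correlation; the function names (`fit_stump`, `boost`, `predict_score`), the step size, the number of rounds, and the toy data are all illustrative assumptions. Each round fits a regression stump to the negative gradient of the Bernoulli negative log-likelihood and adds a shrunken copy of it to the current score function.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, targets):
    """Least-squares regression stump on a single feature.

    Returns (threshold, left_mean, right_mean). Assumes xs has at
    least two distinct values (hypothetical helper for this sketch).
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate split points
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1], best[2], best[3]

def boost(xs, ys, n_rounds=50, step=0.5):
    """Functional gradient descent on the Bernoulli negative log-likelihood."""
    f = [0.0] * len(xs)   # current score (log-odds) at each training point
    model = []            # list of (threshold, left_update, right_update)
    for _ in range(n_rounds):
        # Negative gradient of the loss with respect to f is y - p.
        neg_grad = [y - sigmoid(fi) for y, fi in zip(ys, f)]
        t, lm, rm = fit_stump(xs, neg_grad)
        model.append((t, step * lm, step * rm))
        f = [fi + (step * lm if x <= t else step * rm)
             for x, fi in zip(xs, f)]
    return model

def predict_score(model, x):
    """Sum of stump updates; a positive score favors class 1."""
    return sum(l if x <= t else r for t, l, r in model)

# Toy, made-up data: one feature, classes separated near x = 1.
xs = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.7, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = boost(xs, ys)
print(predict_score(model, 0.2), predict_score(model, 1.8))
```

The sign of `predict_score` gives the predicted class. Handling matched-pair or clustered responses, as the article does, would require a loss that accounts for the correlation, which this sketch deliberately omits.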
Pages: 140-153
Page count: 14