Boosting proportional hazards models using smoothing splines, with applications to high-dimensional microarray data

Cited by: 61
Authors
Li, HZ [1]
Luan, YH
Affiliations
[1] Univ Calif Davis, Rowe Program Human Genet, Davis, CA 95616 USA
[2] Shandong Univ, Sch Math & Systemat Sci, Jinan 250100, Shandong, Peoples R China
DOI
10.1093/bioinformatics/bti324
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: An important area of research in the postgenomics era is to relate high-dimensional genetic or genomic data to various clinical phenotypes of patients. Due to large variability in time to certain clinical events among patients, studying possibly censored survival phenotypes can be more informative than treating the phenotypes as categorical variables. Due to high dimensionality and censoring, building a predictive model for time to event is more difficult than the classification/linear regression problem. We propose to develop a boosting procedure using smoothing splines for estimating the general proportional hazards models. Such a procedure can potentially be used for identifying non-linear effects of genes on the risk of developing an event.
Results: Our empirical simulation studies showed that the procedure can indeed recover the true functional forms of the covariates and can identify important variables that are related to the risk of an event. Results from predicting survival after chemotherapy for patients with diffuse large B-cell lymphoma demonstrate that the proposed method can be used for identifying important genes that are related to time to death due to cancer and for building a parsimonious model for predicting the survival of future patients. In addition, there is clear evidence of non-linear effects of some genes on survival time.
Pages: 2403-2409
Number of pages: 7