Kernel Machine Approach to Testing the Significance of Multiple Genetic Markers for Risk Prediction

Cited by: 47
Authors
Cai, Tianxi [1]
Tonini, Giulia [2]
Lin, Xihong [1]
Affiliations
[1] Harvard Univ, Dept Biostat, Boston, MA 02115 USA
[2] Univ Florence, Dept Stat, Florence, Italy
Funding
National Science Foundation (USA)
Keywords
Gene-set analysis; Genetic association; Genetic pathways; Kernel machine; Kernel PCA; Risk prediction; Score test; Survival analysis; Support vector machines; Semiparametric regression; Pathway; Association; Survival; Classification
DOI
10.1111/j.1541-0420.2010.01544.x
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
There is growing evidence that genomic and proteomic research holds great potential for changing irrevocably the practice of medicine. The ability to identify important genomic and biological markers for risk assessment can have a great impact in public health from disease prevention, to detection, to treatment selection. However, the potentially large number of markers and the complexity in the relationship between the markers and the outcome of interest impose a grand challenge in developing accurate risk prediction models. The standard approach to identifying important markers often assesses the marginal effects of individual markers on a phenotype of interest. When multiple markers relate to the phenotype simultaneously via a complex structure, such a type of marginal analysis may not be effective. To overcome such difficulties, we employ a kernel machine Cox regression framework and propose an efficient score test to assess the overall effect of a set of markers, such as genes within a pathway or a network, on survival outcomes. The proposed test has the advantage of capturing the potentially nonlinear effects without explicitly specifying a particular nonlinear functional form. To approximate the null distribution of the score statistic, we propose a simple resampling procedure that can be easily implemented in practice. Numerical studies suggest that the test performs well with respect to both empirical size and power even when the number of variables in a gene set is not small compared to the sample size.
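The abstract names the ingredients of the test (a kernel over the marker set, a score statistic under a Cox null model, and resampling for the null distribution) but not their details. The following is a minimal sketch of that general idea, not the authors' procedure. It assumes a null model with no adjustment covariates, a Gaussian kernel, a quadratic-form statistic Q = M'KM built from null martingale residuals M, and a plain permutation scheme swapped in for the paper's resampling procedure. The permutation step further assumes that censoring does not depend on the markers. All function names and the default bandwidth are hypothetical.

```python
import numpy as np


def null_martingale_residuals(time, event):
    """Residuals event_i - H(time_i), with H the Nelson-Aalen cumulative
    hazard estimated under a null model that has no covariates."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    # at_risk[i] = number of subjects still under observation at time_i
    at_risk = (time[None, :] >= time[:, None]).sum(axis=1)
    # Each observed event adds 1 / (size of its risk set) to the hazard
    increments = event / at_risk
    # cum_hazard[i] = sum of increments from events at or before time_i
    cum_hazard = ((time[None, :] <= time[:, None]) * increments[None, :]).sum(axis=1)
    return event - cum_hazard


def gaussian_kernel(Z, bandwidth):
    """K[i, j] = exp(-||z_i - z_j||^2 / bandwidth)."""
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / bandwidth)


def kernel_score_test(time, event, Z, bandwidth=None, n_resamples=1000, seed=0):
    """Return (Q, p-value) for the joint effect of the marker set Z on survival.

    Assumption: the null distribution is approximated by permuting the
    residuals against the kernel matrix, which stands in for the resampling
    procedure mentioned in the abstract.
    """
    rng = np.random.default_rng(seed)
    Z = np.asarray(Z, dtype=float)
    if bandwidth is None:
        bandwidth = Z.shape[1]  # hypothetical default: number of markers
    M = null_martingale_residuals(time, event)
    K = gaussian_kernel(Z, bandwidth)
    q_obs = M @ K @ M
    q_null = np.empty(n_resamples)
    for b in range(n_resamples):
        M_perm = rng.permutation(M)  # breaks any marker-outcome link
        q_null[b] = M_perm @ K @ M_perm
    # Add-one correction keeps the p-value strictly positive
    p_value = (1 + np.sum(q_null >= q_obs)) / (1 + n_resamples)
    return q_obs, p_value
```

Because the kernel enters only through the matrix K, a nonlinear marker effect can be tested without writing down its functional form, which is the property the abstract emphasizes. Swapping `gaussian_kernel` for a linear kernel `Z @ Z.T` would give a test aimed at linear effects.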
Pages: 975-986
Page count: 12