Bayesian Variable Selection for Probit Mixed Models Applied to Gene Selection

Cited by: 15
Authors
Baragatti, Meïli [1,2]
Affiliations
[1] CNRS, IML, Marseille, France
[2] Ipsogen SA, Luminy Biotech Enterprises, Marseille, France
Source
BAYESIAN ANALYSIS | 2011, Vol. 6, No. 2
Keywords
Bayesian variable selection; random effects; probit mixed regression model; grouping technique (or blocking technique); Metropolis-within-Gibbs algorithm; logistic regression; expression; Gibbs; identification; classification; signatures
DOI
10.1214/11-BA607
Chinese Library Classification (CLC)
O1 [Mathematics]
Discipline Code
0701; 070101
Abstract
In computational biology, gene expression datasets typically contain very few samples relative to the number of measurements per sample. It is therefore appealing to merge such datasets in order to increase the number of observations and diversify the data, allowing a more reliable selection of genes relevant to the biological problem. Moreover, the increased size of a merged dataset makes it easier to re-split into training and validation sets. Merging, however, requires introducing the dataset of origin as a random effect. In this context, extending the work of Lee et al. (2003), a method is proposed to select relevant variables among tens of thousands in a probit mixed regression model, considered as part of a larger hierarchical Bayesian model. Latent variables are used to identify subsets of selected variables, and the grouping (or blocking) technique of Liu (1994) is combined with a Metropolis-within-Gibbs algorithm (Robert and Casella 2004). The method is applied to a merged dataset formed from three individual gene expression datasets, in which tens of thousands of measurements are available for each of several hundred human breast cancer samples. Even for this large dataset of around 20,000 predictors, the method is shown to be efficient and feasible. As an illustration, it is used to select the most important genes characterizing the estrogen receptor status of patients with breast cancer.
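As a concrete illustration of the sampling scheme the abstract describes, below is a minimal Python sketch, not the paper's implementation: Albert-Chib data augmentation for the probit likelihood, add/drop Metropolis moves on the inclusion indicators with the regression coefficients integrated out (the grouping/blocking idea of Liu 1994), and Gibbs updates for the coefficients and latent variables. The dataset random effect is omitted, and the prior settings g (slab variance) and rho (prior inclusion probability) are hypothetical illustration choices, not values from the paper.

import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

def log_marginal(X, z, gamma, g=10.0):
    # Log density of the latent z given inclusion vector gamma, with the
    # coefficients beta_gamma ~ N(0, g*I) integrated out analytically
    # (up to a constant). This collapsing is the grouping/blocking step.
    Xg = X[:, gamma]
    k = Xg.shape[1]
    if k == 0:
        return -0.5 * z @ z
    A = Xg.T @ Xg + np.eye(k) / g      # posterior precision of beta_gamma
    b = Xg.T @ z
    _, logdet = np.linalg.slogdet(A)
    return (-0.5 * z @ z + 0.5 * b @ np.linalg.solve(A, b)
            - 0.5 * logdet - 0.5 * k * np.log(g))

def probit_selection(X, y, n_iter=2000, g=10.0, rho=0.05):
    # g: slab variance; rho: prior inclusion probability (both hypothetical).
    n, p = X.shape
    gamma = np.zeros(p, dtype=bool)
    z = np.where(y == 1, 1.0, -1.0)    # crude initialisation of the latents
    incl = np.zeros(p)
    for _ in range(n_iter):
        # Metropolis step on gamma: flip one randomly chosen indicator,
        # judged under the collapsed likelihood p(z | gamma).
        j = rng.integers(p)
        prop = gamma.copy()
        prop[j] = ~prop[j]
        log_prior = (np.log(rho) - np.log(1 - rho)) * (1.0 if prop[j] else -1.0)
        log_acc = (log_marginal(X, z, prop, g)
                   - log_marginal(X, z, gamma, g) + log_prior)
        if np.log(rng.uniform()) < log_acc:
            gamma = prop
        # Gibbs step: draw beta_gamma from its Gaussian full conditional.
        beta = np.zeros(p)
        k = int(gamma.sum())
        if k > 0:
            Xg = X[:, gamma]
            A = Xg.T @ Xg + np.eye(k) / g
            mean = np.linalg.solve(A, Xg.T @ z)
            L = np.linalg.cholesky(np.linalg.inv(A))
            beta[gamma] = mean + L @ rng.standard_normal(k)
        # Gibbs step: draw latent z from truncated normals (Albert-Chib):
        # z > 0 where y = 1, z < 0 where y = 0.
        mu = X @ beta
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, random_state=rng)
        incl += gamma
    return incl / n_iter               # posterior inclusion frequencies

# Tiny synthetic demo: 3 of 50 variables are truly active.
n, p = 100, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = 2.0
y = (X @ beta_true + rng.standard_normal(n) > 0).astype(int)
print(probit_selection(X, y).round(2)[:10])

On gene expression data, X would hold the expression measurements and the posterior inclusion frequencies would rank the roughly 20,000 candidate genes; integrating the coefficients out of the indicator update is what keeps the sampler feasible at that scale.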
Pages: 209-229
Number of pages: 21
References (38 in total)
[11] Frühwirth-Schnatter, S. (2010). BAYESIAN STAT, p. 165.
[12] Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment on an article by Browne and Draper). Bayesian Analysis, 1(3): 515-533.
[13] George, E. I. and McCulloch, R. E. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423): 881-889.
[14] George, E. I. and McCulloch, R. E. (1997). Approaches for Bayesian variable selection. Statistica Sinica, 7: 339-373.
[15] George, E. I. and Foster, D. P. (2000). Calibration and empirical Bayes variable selection. Biometrika, 87(4): 731-747.
[16] Guyon, I., Weston, J., Barnhill, S., and Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46(1-3): 389-422.
[17] Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U., and Speed, T. P. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4(2): 249-264.
[18] Lee, K. E., Sha, N., Dougherty, E. R., Vannucci, M., and Mallick, B. K. (2003). Gene selection: a Bayesian variable selection approach. Bioinformatics, 19(1): 90-97.
[20] Nagaraja, G. M., Othman, M., Fox, B. P., Alsaber, R., Pellegrino, C. M., Zeng, Y., Khanna, R., Tamburini, P., Swaroop, A., and Kandpal, R. P. (2006). Gene expression signatures and biomarkers of noninvasive and invasive breast cancer cells: comprehensive profiles by representational difference analysis, microarrays and proteomics. Oncogene, 25(16): 2328-2338.