Deciphering the complex: Methodological overview of statistical models to derive OMICS-based biomarkers

Cited by: 96
Authors
Chadeau-Hyam, Marc [1]
Campanella, Gianluca [1]
Jombart, Thibaut [2]
Bottolo, Leonardo [3]
Portengen, Lutzen [4]
Vineis, Paolo [1, 5]
Liquet, Benoit [6]
Vermeulen, Roel C. H. [4, 7]
Affiliations
[1] Univ London Imperial Coll Sci Technol & Med, Dept Epidemiol & Biostat, Sch Publ Hlth, MRC HPA Ctr Environm & Hlth, London W2 1PG, England
[2] Univ London Imperial Coll Sci Technol & Med, Dept Infect Dis Epidemiol, MRC Ctr Outbreak Anal & Modelling, London W2 1PG, England
[3] Univ London Imperial Coll Sci Technol & Med, Dept Math, London W2 1PG, England
[4] Univ Utrecht, Inst Risk Assessment, Utrecht, Netherlands
[5] Human Genet Fdn, HuGeF, Turin, Italy
[6] Inst Publ Hlth, MRC Biostat Unit, Cambridge, England
[7] Univ Med Ctr Utrecht, Julius Ctr Hlth Sci & Primary Care, Utrecht, Netherlands
Keywords
OMICS data; biomarkers; statistical review; PARTIAL LEAST-SQUARES; BAYESIAN VARIABLE SELECTION; GENOME-WIDE ASSOCIATION; FALSE DISCOVERY RATE; PRINCIPAL COMPONENT ANALYSIS; MIXED-EFFECTS MODELS; STOCHASTIC SEARCH; REGULARIZATION PATHS; GENE-EXPRESSION; REGRESSION
DOI
10.1002/em.21797
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Recent technological advances in molecular biology have given rise to numerous large-scale datasets whose analysis imposes serious methodological challenges mainly relating to the size and complex structure of the data. Considerable experience in analyzing such data has been gained over the past decade, mainly in genetics, from the Genome-Wide Association Study era, and more recently in transcriptomics and metabolomics. Building upon the corresponding literature, we provide here a nontechnical overview of well-established methods used to analyze OMICS data within three main types of regression-based approaches: univariate models including multiple testing correction strategies, dimension reduction techniques, and variable selection models. Our methodological description focuses on methods for which ready-to-use implementations are available. We describe the main underlying assumptions, the main features, and advantages and limitations of each of the models. This descriptive summary constitutes a useful tool for driving methodological choices while analyzing OMICS data, especially in environmental epidemiology, where the emergence of the exposome concept clearly calls for unified methods to analyze marginally and jointly complex exposure and OMICS datasets. © 2013 Wiley Periodicals, Inc.
Pages: 542-557
Number of pages: 16