Statistical Learning Methods Applicable to Genome-Wide Association Studies on Unbalanced Case-Control Disease Data

Times cited: 6
Authors
Dai, Xiaotian [1]
Fu, Guifang [1]
Zhao, Shaofei [1]
Zeng, Yifei [1]
Affiliations
[1] SUNY Binghamton Univ, Dept Math Sci, Vestal, NY 13850 USA
Keywords
disease; GWAS; unbalanced case-control; genomic selection; genomic prediction; BAYESIAN VARIABLE SELECTION; GENE-GENE INTERACTION; MIXED-MODEL ANALYSIS; POPULATION-STRUCTURE; SUSCEPTIBILITY LOCI; CONJUGATE GRADIENTS; QUADRATIC-FORMS; CLASS IMBALANCE; REGRESSION; CLASSIFICATION;
DOI
10.3390/genes12050736
Chinese Library Classification number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Although imbalance between case and control groups is prevalent in genome-wide association studies (GWAS), it is often overlooked. This imbalance is becoming more significant and urgent as the rapid growth of biobanks and electronic health records has enabled the collection of thousands of phenotypes from large cohorts, in particular for diseases with low prevalence. Unbalanced binary traits pose serious challenges to traditional statistical methods in terms of both genomic selection and disease prediction. For example, the well-established linear mixed models (LMM) yield inflated type I error rates in the presence of unbalanced case-control ratios. In this article, we review multiple statistical approaches that have been developed to overcome the inaccuracy caused by the unbalanced case-control ratio, and comment on the advantages and limitations of each approach. We also explore the potential for applying several powerful and popular state-of-the-art machine-learning approaches that have not yet been applied to the GWAS field. This review paves the way for better analysis and understanding of unbalanced case-control disease data in GWAS.
Pages: 14