Comprehensive-GWAS: a pipeline for genome-wide association studies utilizing cross-validation to assess the predictivity of genetic variations

Cited by: 2
Authors
Dagasso, Gabrielle [1]
Yan, Yan [2]
Wang, Lipu [3]
Li, Longhai [4]
Kutcher, Randy [3]
Zhang, Wentao [5]
Jin, Lingling [6]
Affiliations
[1] Thompson Rivers Univ, Dept Math & Stat, Kamloops, BC, Canada
[2] Thompson Rivers Univ, Dept Comp Sci, Kamloops, BC, Canada
[3] Univ Saskatchewan, Dept Plant Sci, Saskatoon, SK, Canada
[4] Univ Saskatchewan, Dept Math & Stat, Saskatoon, SK, Canada
[5] Natl Res Council Canada, Ottawa, ON, Canada
[6] Univ Saskatchewan, Dept Comp Sci, Saskatoon, SK, Canada
Source
2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE | 2020
Keywords
SOFTWARE; MODELS;
DOI
10.1109/BIBM49941.2020.9313355
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Genome-wide association studies (GWAS) are an important approach for associating genetic variations among individuals with a particular trait. Although many GWAS programs have been developed based on different statistical models, their results can vary to a large extent. To obtain a more comprehensive and accurate set of SNPs associated with a trait, we present comprehensive-GWAS, a novel automated pipeline that implements a two-step wrapper model, connecting programs for traditional GWAS analysis with machine learning methods and adding population structure analysis. It first performs population structure analysis, then executes multiple GWAS programs and combines their results into a single SNP subset. After that, it selects relevant SNPs with high individual and/or joint effects from that SNP subset and assesses the predictivity of the model using cross-validation with LASSO. The combined and validated "true" significant SNPs are reported for each trait as a Manhattan plot, a QQ plot, and statistical results. To demonstrate the utility of the comprehensive-GWAS pipeline, it was applied to 199 wheat varieties that were genotyped with a 90K Infinium SNP array and phenotyped for traits related to Fusarium head blight (FHB) disease under greenhouse conditions in 2019 with three replications. The pipeline pinpointed genomic regions that are more likely to be responsible for FHB resistance. The results will contribute to characterizing the genetic architecture of wheat lines with the highest FHB resistance. The pipeline is publicly available at https://github.com/notTrivial/Comprehensive-GWAS.
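The two-step idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tool names, SNP IDs, p-values, the 0.05 threshold, the union rule for combining hits, and the toy genotype and phenotype data are all assumptions made for illustration, and the LASSO is a small hand-written coordinate descent rather than the software the pipeline actually calls. Step 1 merges the hits of several GWAS tools into one SNP subset; step 2 fits a LASSO on that subset and uses k-fold cross-validation to compare penalty values by prediction error.

```python
# Illustrative sketch only; every name and number below is hypothetical.

def combine_hits(results, alpha=0.05):
    """Union of SNP IDs with p-value below alpha in at least one tool."""
    combined = set()
    for tool_pvalues in results.values():
        combined.update(snp for snp, p in tool_pvalues.items() if p < alpha)
    return sorted(combined)


def soft_threshold(value, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered
    resid = [yi - y_mean for yi in y]  # residual when all betas are zero
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in Xc]
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue  # SNP does not vary in this sample; keep beta at 0
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new_beta = soft_threshold(rho, lam) / norm
            resid = [r - c * (new_beta - beta[j]) for c, r in zip(col, resid)]
            beta[j] = new_beta
    intercept = y_mean - sum(m * b for m, b in zip(x_mean, beta))
    return intercept, beta


def cv_mse(X, y, lam, k=4):
    """Mean squared prediction error over k interleaved folds (i % k)."""
    n = len(X)
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        b0, beta = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
        for i in test:
            pred = b0 + sum(x * b for x, b in zip(X[i], beta))
            total += (y[i] - pred) ** 2
    return total / n


# Step 1: combine the hits of two hypothetical GWAS tools.
tool_results = {
    "tool_1": {"snp_A": 1e-6, "snp_B": 0.03, "snp_D": 0.40},
    "tool_2": {"snp_A": 1e-4, "snp_C": 0.01, "snp_D": 0.20},
}
snps = combine_hits(tool_results)

# Toy genotypes coded 0/1/2 for 12 individuals; the phenotype depends
# only on snp_A and is noise-free so the behaviour is easy to follow.
geno = {
    "snp_A": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
    "snp_B": [0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 0, 0],
    "snp_C": [0, 2, 1, 1, 0, 2, 2, 1, 0, 0, 2, 1],
}
X = [[geno[s][i] for s in snps] for i in range(12)]
y = [1.0 + 2.0 * geno["snp_A"][i] for i in range(12)]

# Step 2: pick the penalty by cross-validation, then keep nonzero SNPs.
best_lam = min([0.05, 0.1, 0.5, 1.0], key=lambda lam: cv_mse(X, y, lam))
intercept, beta = lasso_fit(X, y, best_lam)
selected = [s for s, b in zip(snps, beta) if b != 0.0]
print(snps, selected)
```

In this toy setting the union keeps the three SNPs that any tool flags, and the LASSO retains only the SNP that actually drives the phenotype. With real data the cross-validated error would also serve as the measure of how predictive the selected SNPs are.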
Pages: 1361-1367
Number of pages: 7
Related papers
20 items in total
[1]   CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[2]   TASSEL: software for association mapping of complex traits in diverse samples [J].
Bradbury, Peter J. ;
Zhang, Zhiwu ;
Kroon, Dallas E. ;
Casstevens, Terry M. ;
Ramdoss, Yogesh ;
Buckler, Edward S. .
BIOINFORMATICS, 2007, 23 (19) :2633-2635
[3]   Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study [J].
Evanno, G ;
Regnaut, S ;
Goudet, J .
MOLECULAR ECOLOGY, 2005, 14 (08) :2611-2620
[4]  
Francis R., 2019, TABULATE ANAL VISUAL
[5]   Regularization Paths for Generalized Linear Models via Coordinate Descent [J].
Friedman, Jerome ;
Hastie, Trevor ;
Tibshirani, Rob .
JOURNAL OF STATISTICAL SOFTWARE, 2010, 33 (01) :1-22
[6]  
Hilton AJ, 1999, PLANT PATHOL, V48, P202, DOI 10.1046/j.1365-3059.1999.00339.x
[7]   GAPIT: genome association and prediction integrated tool [J].
Lipka, Alexander E. ;
Tian, Feng ;
Wang, Qishan ;
Peiffer, Jason ;
Li, Meng ;
Bradbury, Peter J. ;
Gore, Michael A. ;
Buckler, Edward S. ;
Zhang, Zhiwu .
BIOINFORMATICS, 2012, 28 (18) :2397-2399
[8]  
Lippert C, 2011, NAT METHODS, V8, P833, DOI 10.1038/nmeth.1681
[9]   Efficient Bayesian mixed-model analysis increases association power in large cohorts [J].
Loh, Po-Ru ;
Tucker, George ;
Bulik-Sullivan, Brendan K. ;
Vilhjalmsson, Bjarni J. ;
Finucane, Hilary K. ;
Salem, Rany M. ;
Chasman, Daniel I. ;
Ridker, Paul M. ;
Neale, Benjamin M. ;
Berger, Bonnie ;
Patterson, Nick ;
Price, Alkes L. .
NATURE GENETICS, 2015, 47 (03) :284-+
[10]   Finding the missing heritability of complex diseases [J].
Manolio, Teri A. ;
Collins, Francis S. ;
Cox, Nancy J. ;
Goldstein, David B. ;
Hindorff, Lucia A. ;
Hunter, David J. ;
McCarthy, Mark I. ;
Ramos, Erin M. ;
Cardon, Lon R. ;
Chakravarti, Aravinda ;
Cho, Judy H. ;
Guttmacher, Alan E. ;
Kong, Augustine ;
Kruglyak, Leonid ;
Mardis, Elaine ;
Rotimi, Charles N. ;
Slatkin, Montgomery ;
Valle, David ;
Whittemore, Alice S. ;
Boehnke, Michael ;
Clark, Andrew G. ;
Eichler, Evan E. ;
Gibson, Greg ;
Haines, Jonathan L. ;
Mackay, Trudy F. C. ;
McCarroll, Steven A. ;
Visscher, Peter M. .
NATURE, 2009, 461 (7265) :747-753