Comparative analysis of machine learning models for shortlisting SNPs to facilitate detection of marginal epistasis in GWAS

Citations: 0
Authors
Dasmandal, Tanwy [1, 2]
Sinha, Dipro [4 ]
Rai, Anil [3 ]
Mishra, Dwijesh Chandra [4 ]
Archak, Sunil [5 ]
Affiliations
[1] ICAR Indian Agr Res Inst, Grad Sch, New Delhi, India
[2] ICAR Natl Bur Fish Genet Resources, Lucknow, Uttar Pradesh, India
[3] Indian Council Agr Res, New Delhi, India
[4] ICAR Indian Agr Stat Res Inst, New Delhi, India
[5] ICAR Natl Bur Plant Genet Resources, New Delhi 110012, India
Keywords
Marginal epistasis; Machine learning; GWAS; Feature selection; SNP-SNP interactions; GENOME-WIDE ASSOCIATION; GENETIC ARCHITECTURE; COMPLEX TRAITS; COMMON; SELECTION;
DOI
10.1007/s41060-024-00647-1
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Epistasis, an essential genetic element causing phenotypic diversity, is frequently characterized as the interaction between two or more genes. Previous models could identify marginal epistatic interactions by mapping variants that have nonzero marginal epistatic effects. However, these models fall short of identifying individual interaction partners. To reduce the computational burden of existing epistasis detection algorithms without compromising the detection of exact epistatic partners, the strengths of various machine learning algorithms were exploited as a filtering strategy. Seven machine learning strategies were compared for shortlisting marginally associated SNPs: AdaBoost, artificial neural network, random forest, stepwise regression, ridge regression, lasso and elastic net. Datasets were simulated for different combinations of heritability and minor allele frequency, and the performance of the algorithms was evaluated using power and precision measures. We found that the ridge regression model outperformed the other models in shortlisting marginal epistasis-related SNPs. Thus, epistasis detection tools are expected to benefit from adding a ridge regression filtering stage for efficient detection of marginal epistasis in large genomic datasets.
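The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the simulation scheme, the function names (`simulate`, `ridge_coefficients`, `shortlist`, `power_precision`), the penalty value, and the shortlist size are all assumptions made for illustration. Power is taken here as the fraction of truly interacting SNPs that reach the shortlist, and precision as the fraction of shortlisted SNPs that are truly interacting; the paper may define these measures differently.

```python
import random


def simulate(n_samples, n_snps, maf, pairs, effect, noise_sd, seed):
    """Simulate 0/1/2 genotypes and a phenotype driven by SNP-SNP products.

    Hypothetical toy scheme: each allele is drawn independently with
    probability `maf`; `pairs` lists the interacting SNP index pairs.
    """
    rng = random.Random(seed)
    X = [[(rng.random() < maf) + (rng.random() < maf) for _ in range(n_snps)]
         for _ in range(n_samples)]
    y = [sum(effect * row[a] * row[b] for a, b in pairs) + rng.gauss(0, noise_sd)
         for row in X]
    return X, y


def ridge_coefficients(X, y, lam):
    """Solve (Xc'Xc + lam*I) beta = Xc'yc on centered data (requires lam > 0)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_bar = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    yc = [v - y_bar for v in y]
    A = [[sum(Xc[i][j] * Xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    # Gaussian elimination with partial pivoting, then back substitution.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        tail = sum(A[r][c] * beta[c] for c in range(r + 1, p))
        beta[r] = (b[r] - tail) / A[r][r]
    return beta


def shortlist(beta, k):
    """Return the indices of the k SNPs with the largest absolute coefficients."""
    order = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)
    return set(order[:k])


def power_precision(selected, causal):
    """Power = recovered causal / all causal; precision = recovered causal / selected."""
    tp = len(selected & causal)
    power = tp / len(causal)
    precision = tp / len(selected) if selected else 0.0
    return power, precision


if __name__ == "__main__":
    pairs = [(0, 1), (2, 3)]  # assumed interacting SNP pairs
    X, y = simulate(200, 20, 0.3, pairs, 1.0, 1.0, seed=1)
    beta = ridge_coefficients(X, y, lam=10.0)
    chosen = shortlist(beta, k=6)
    print(power_precision(chosen, {0, 1, 2, 3}))
```

In a setup like this, only the shortlisted SNPs would be passed on to an exhaustive pairwise epistasis search, which is where the reduction in computational burden would come from. The ridge solver is written out in plain Python to keep the sketch dependency-free; a numerical library would normally be used for real genotype matrices.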
Pages: 10