A Modification of the Lasso Method by Using the Bahadur Representation for the Genome-Wide Association Study

Cited by: 0
Authors
Utkin, Lev V. [1]
Zhuk, Yulia A. [2]
Affiliations
[1] Peter Great St Petersburg Polytech Univ, St Petersburg, Russia
[2] ITMO Univ, St Petersburg, Russia
Source
INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS | 2018, Vol. 42, No. 2
Keywords
data analysis; feature selection; Lasso; Bahadur representation; genome-wide association study;
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification codes
081202 ; 0835 ;
Abstract
This paper proposes a modification of the Lasso method, a powerful machine learning tool, for genome-wide association studies. From a machine learning point of view, the paper solves a feature selection problem in which the features are single nucleotide polymorphisms (SNPs), or DNA markers, whose association with a quantitative trait is to be established. The main idea of the modification is to take into account correlations between DNA markers and peculiarities of the phenotype values by using the Bahadur representation of joint probabilities of binary random variables. Interactions between DNA markers, known as epistasis, are also considered within the framework of the proposed modification. Numerical experiments with real datasets illustrate the proposed modification.
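As a rough illustration of the Bahadur representation mentioned in the abstract, the sketch below builds the joint distribution of two binary variables (for example, two DNA markers coded 0/1) from their marginal probabilities and their correlation. For two variables the representation is exact: the joint probability equals the independence product multiplied by a correction factor 1 + r12 * z1 * z2, where z1 and z2 are the standardized values. The function name `bahadur_pair` and the example numbers are assumptions made for illustration; the abstract does not state how the representation enters the Lasso objective, so this is not the authors' algorithm.

```python
import math


def bahadur_pair(p1, p2, r12):
    """Joint pmf of two binary variables via the Bahadur representation.

    p1, p2 : marginal probabilities P(X1 = 1), P(X2 = 1), each in (0, 1)
    r12    : correlation coefficient between X1 and X2
    Returns a dict mapping (x1, x2) to P(X1 = x1, X2 = x2).
    (Hypothetical helper for illustration only.)
    """
    def z(x, p):
        # Standardized value of a binary outcome x with P(X = 1) = p.
        return (x - p) / math.sqrt(p * (1 - p))

    pmf = {}
    for x1 in (0, 1):
        for x2 in (0, 1):
            # Probability under independence of the two variables.
            indep = (p1 if x1 else 1 - p1) * (p2 if x2 else 1 - p2)
            # Bahadur correction factor for the pairwise correlation.
            pmf[(x1, x2)] = indep * (1 + r12 * z(x1, p1) * z(x2, p2))
    return pmf


# Example with assumed values: equal marginals and positive correlation.
pmf = bahadur_pair(0.5, 0.5, 0.5)
```

With r12 = 0 the correction factor is 1 and the result reduces to the independence product. Not every combination of p1, p2 and r12 yields non-negative probabilities, so a practical implementation would need to check that the inputs are feasible.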
Pages: 175-188
Page count: 14