Machine learning approaches for the discovery of gene-gene interactions in disease data

Citations: 64
Authors
Upstill-Goddard, Rosanna [1]
Eccles, Diana [1]
Fliege, Joerg [2]
Collins, Andrew [3]
Affiliations
[1] Univ Southampton, Southampton SO16 6YD, Hants, England
[2] Univ Southampton, Sch Math, Operat Res Grp, Southampton SO16 6YD, Hants, England
[3] Univ Southampton, Genet Epidemiol & Bioinformat Res Grp, Southampton SO16 6YD, Hants, England
Keywords
machine learning; gene-gene interaction; random forest; support vector machines; multifactor-dimensionality reduction; genome-wide association study; MULTIFACTOR-DIMENSIONALITY REDUCTION; STATISTICAL EPISTASIS; NEURAL-NETWORKS; RANDOM FORESTS; OPTIMIZATION; SUSCEPTIBILITY; ASSOCIATION; CHALLENGES; STRATEGIES; POWER;
DOI
10.1093/bib/bbs024
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Because gene-phenotype relationships are complex, machine learning approaches have considerable appeal as a strategy for modelling interactions. A number of such methods have been developed and applied in recent years with modest success. Progress is hampered by the complexity of disease genetic data, including phenotypic and genetic heterogeneity, polygenic forms of inheritance and variable penetrance, combined with the analytical and computational issues arising from the enormous number of potential interactions. We review here recent and current approaches, focusing wherever possible on applications to real data (particularly in the context of genome-wide association studies) and looking ahead to the further challenges posed by next-generation sequencing data.
Pages: 251-260
Number of pages: 10