Detecting gene-gene interactions using a permutation-based random forest method

Cited by: 54
Authors
Li, Jing [1]
Malley, James D. [2]
Andrew, Angeline S. [3]
Karagas, Margaret R. [3]
Moore, Jason H. [4, 5]
Affiliations
[1] Dartmouth Coll, Geisel Sch Med, Dept Genet, Hanover, NH 03755 USA
[2] NIH, Div Computat Biosci, Ctr Informat Technol, Bldg 10, Bethesda, MD 20892 USA
[3] Dartmouth Coll, Geisel Sch Med, Dept Epidemiol, Hanover, NH 03755 USA
[4] Univ Penn, Inst Biomed Informat, Philadelphia, PA 19104 USA
[5] Univ Penn, Perelman Sch Med, Dept Biostat & Epidemiol, Philadelphia, PA 19104 USA
Source
BIODATA MINING | 2016 / Vol. 9
Funding
National Institutes of Health (USA);
Keywords
Random forest; GWAS; Machine learning; Scale invariant; MULTIFACTOR DIMENSIONALITY REDUCTION; GENOME-WIDE ASSOCIATION; EPISTATIC MODELS; RISK; DISEASE; STRICT; PURE; SNPS;
DOI
10.1186/s13040-016-0093-5
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification codes
07; 0710; 09;
Abstract
Background: Identifying gene-gene interactions is essential for understanding disease susceptibility and for detecting the genetic architectures underlying complex diseases. Here, we aimed to develop a permutation-based methodology that relies on a machine learning method, random forest (RF), to detect gene-gene interactions. Our approach, called permuted random forest (pRF), identifies the top interacting single nucleotide polymorphism (SNP) pairs by estimating how much the power of a random forest classification model is influenced by removing pairwise interactions. Results: We systematically tested our approach in a simulation study with datasets that varied in genetic constraints, including heritability, number of SNPs, and sample size. Our methodology showed high success rates for detecting the interacting SNP pair. We also applied our approach to two bladder cancer datasets, where the results were consistent with those of well-studied methodologies such as multifactor dimensionality reduction (MDR) and statistical epistasis network (SEN). Furthermore, we built permuted random forest networks (PRFN), in which nodes represent SNPs and edges indicate interactions. Conclusions: We successfully developed a scale-invariant methodology to detect pure gene-gene interactions based on permutation strategies and the machine learning method random forest. This methodology shows great potential for detecting gene-gene interactions and studying underlying genetic architectures in a scale-free way, which could help uncover the mechanisms of complex diseases.
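The scoring idea in the abstract (measure how much a trained classifier degrades when the interaction within one SNP pair is removed) can be illustrated with a small sketch. This is not the authors' implementation. The permutation scheme used here, shuffling each SNP of the pair independently within each class so that per-class marginals survive while the joint pattern is broken, is an assumption about how "removing pairwise interactions" might be done. To stay dependency-free, a genotype lookup table stands in for the random forest, and the names `permute_pair_within_class`, `fit_lookup`, `accuracy`, and `interaction_score` are hypothetical. The toy data uses binary genotypes and a pure XOR interaction between SNPs 0 and 1.

```python
import random
from collections import Counter, defaultdict


def permute_pair_within_class(X, y, a, b, rng):
    """Copy X, shuffling columns a and b independently within each class.

    Assumed scheme: per-class marginals of each SNP are preserved,
    while the joint (a, b) pattern within a class is destroyed.
    """
    Xp = [list(row) for row in X]
    rows_by_class = defaultdict(list)
    for i, label in enumerate(y):
        rows_by_class[label].append(i)
    for rows in rows_by_class.values():
        for col in (a, b):
            values = [X[i][col] for i in rows]
            rng.shuffle(values)
            for i, v in zip(rows, values):
                Xp[i][col] = v
    return Xp


def fit_lookup(X, y):
    """Stand-in classifier (NOT a random forest): majority label per genotype vector."""
    votes = defaultdict(Counter)
    for row, label in zip(X, y):
        votes[tuple(row)][label] += 1
    table = {g: c.most_common(1)[0][0] for g, c in votes.items()}
    default = Counter(y).most_common(1)[0][0]  # fallback for unseen genotypes
    return lambda row: table.get(tuple(row), default)


def accuracy(predict, X, y):
    """Fraction of rows whose predicted label matches y."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def interaction_score(predict, X, y, a, b, rng, n_perm=20):
    """Mean drop in accuracy of an already-fitted model after permuting pair (a, b)."""
    base = accuracy(predict, X, y)
    drops = [
        base - accuracy(predict, permute_pair_within_class(X, y, a, b, rng), y)
        for _ in range(n_perm)
    ]
    return sum(drops) / n_perm


# Toy data: 4 binary SNPs; the phenotype is a pure interaction of SNP 0 and SNP 1.
rng = random.Random(0)
X = [[rng.randint(0, 1) for _ in range(4)] for _ in range(400)]
y = [row[0] ^ row[1] for row in X]
model = fit_lookup(X, y)
for a, b in [(0, 1), (2, 3)]:
    print((a, b), round(interaction_score(model, X, y, a, b, rng), 3))
```

In this toy setting the interacting pair should receive a clearly larger score than the unrelated pair, and ranking all pairs by score gives the kind of "top interacting SNP pairs" list the abstract describes. Any fitted classifier wrapped as a per-row callable could replace the lookup table.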
Pages: 17