Cloud computing for detecting high-order genome-wide epistatic interaction via dynamic clustering

Cited by: 68
Authors
Guo, Xuan [1]
Meng, Yu [1]
Yu, Ning [1]
Pan, Yi [1]
Affiliation
[1] Georgia State Univ, Dept Comp Sci, Atlanta, GA 30303 USA
Keywords
Cloud computing; Genome-wide association studies; Dynamic clustering; MULTIFACTOR-DIMENSIONALITY REDUCTION; GENE-GENE INTERACTIONS; ASSOCIATION; INFERENCE; PRIORITIZATION; DISEASES;
DOI
10.1186/1471-2105-15-102
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: Taking advantage of high-throughput single nucleotide polymorphism (SNP) genotyping technology, large genome-wide association studies (GWASs) have been considered to hold promise for unravelling complex relationships between genotype and phenotype. At present, traditional single-locus-based methods are insufficient to detect interactions involving multiple loci, which are widespread in complex traits. In addition, statistical tests for high-order epistatic interactions among more than two SNPs pose computational and analytical challenges, because the computation increases exponentially as the cardinality of the SNP combinations grows.
Results: In this paper, we provide a simple, fast and powerful method using dynamic clustering and cloud computing to detect genome-wide multi-locus epistatic interactions. We have conducted systematic experiments to compare its power against some recently proposed algorithms, including TEAM, SNPRuler, EDCF and BOOST. Furthermore, we have applied our method to two real GWAS datasets, the age-related macular degeneration (AMD) and rheumatoid arthritis (RA) datasets, where we find some novel potential disease-related genetic factors that do not show up in the detection of two-locus epistatic interactions.
Conclusions: Experimental results on simulated data demonstrate that our method is more powerful than some recently proposed methods on both two- and three-locus disease models. Our method has discovered many novel high-order associations that are significantly enriched in cases from two real GWAS datasets. Moreover, the running times of the cloud implementation of our method on the AMD and RA datasets are roughly 2 hours and 50 hours, respectively, on a cluster of forty small virtual machines for detecting two-locus interactions. Therefore, we believe that our method is suitable and effective for the full-scale analysis of multiple-locus epistatic interactions in GWAS.
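The growth in computation that the abstract mentions can be illustrated with a short calculation. The sketch below is not the authors' method; it only counts how many SNP subsets an exhaustive search would have to test at each interaction order, and how many joint genotype cells each subset has. The function names and the figure of 100,000 SNPs are hypothetical, chosen for illustration, and the three-genotypes-per-SNP default assumes biallelic SNPs.

```python
from math import comb


def count_snp_combinations(num_snps: int, order: int) -> int:
    """Number of distinct SNP subsets of size `order` drawn from `num_snps` SNPs."""
    return comb(num_snps, order)


def count_genotype_cells(order: int, genotypes_per_snp: int = 3) -> int:
    """Number of joint genotype cells for one subset of `order` SNPs.

    Assumes biallelic SNPs, which have three genotypes each by default.
    """
    return genotypes_per_snp ** order


if __name__ == "__main__":
    num_snps = 100_000  # hypothetical dataset size, not taken from the paper
    for order in (1, 2, 3, 4):
        subsets = count_snp_combinations(num_snps, order)
        cells = count_genotype_cells(order)
        print(f"order={order}: {subsets:.3e} subsets, {cells} genotype cells each")
```

Each additional locus multiplies the number of candidate subsets by roughly the number of SNPs, which is why exhaustive testing beyond two loci quickly becomes impractical on a single machine.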
Pages: 16