Gene-gene interaction filtering with ensemble of filters

Cited by: 30
Authors
Yang, Pengyi [1, 2, 3]
Ho, Joshua W. K. [1, 3]
Yang, Yee Hwa [2]
Zhou, Bing B. [1, 4]
Affiliations
[1] Univ Sydney, Sch Informat Technol, Sydney, NSW 2006, Australia
[2] Univ Sydney, Sch Math & Stat, Sydney, NSW 2006, Australia
[3] Natl ICT Australia, Eveleigh, NSW 2015, Australia
[4] Univ Sydney, Ctr Distributed & High Performance Comp, Sydney, NSW 2006, Australia
Source
BMC BIOINFORMATICS | 2011 / Vol. 12
Funding
Australian Research Council;
Keywords
FEATURE-SELECTION; WIDE ASSOCIATION; EPISTASIS; RELIEFF; SNPS;
DOI
10.1186/1471-2105-12-S1-S10
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Complex diseases are commonly caused by multiple genes and their interactions with each other. Genome-wide association (GWA) studies offer the opportunity to capture those disease-associated genes and gene-gene interactions through panels of SNP markers. However, a proper filtering procedure is critical to reduce the search space prior to the computationally intensive gene-gene interaction identification step. In this study, we show that two commonly used SNP-SNP interaction filtering algorithms, ReliefF and tuned ReliefF (TuRF), are sensitive to the order of the samples in the dataset, giving rise to unstable and suboptimal results. Nevertheless, we observe that the 'unstable' results from multiple runs of these algorithms can provide valuable information about the dataset. We therefore hypothesize that aggregating results from multiple runs of the algorithm may improve the filtering performance. Results: We propose a simple and effective ensemble approach in which the results from multiple runs of an unstable filter are aggregated based on the general theory of ensemble learning. The ensemble versions of the ReliefF and TuRF algorithms, referred to as ReliefF-E and TuRF-E, are robust to sample order dependency and enable a more informative investigation of data characteristics. Using simulated and real datasets, we demonstrate that both the ensemble of ReliefF and the ensemble of TuRF can generate a much more stable SNP ranking than the original algorithms. Furthermore, the ensemble of TuRF achieved the highest success rate in comparison to many state-of-the-art algorithms as well as the traditional χ²-test and odds ratio methods in terms of retaining gene-gene interactions.
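The ensemble idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' ReliefF-E or TuRF-E implementation. It assumes a simplified Relief-style scorer whose nearest-neighbour ties are broken by sample order (which is what makes it order-sensitive here), and it assumes mean-rank aggregation, since the abstract does not state the aggregation rule. The names `relief_scores`, `scores_to_ranks`, and `ensemble_ranks` are hypothetical.

```python
import random


def relief_scores(X, y, k=3):
    """Simplified Relief-style weights for categorical features (e.g. genotypes 0/1/2).

    Assumption: every class has more than k samples. Distance ties are broken
    by sample position (sorted() is stable), so the scores can change when the
    samples are reordered.
    """
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    for i in range(n):
        def dist(j):
            # Hamming distance between sample i and sample j
            return sum(a != b for a, b in zip(X[i], X[j]))

        ranked = sorted((j for j in range(n) if j != i), key=dist)
        hits = [j for j in ranked if y[j] == y[i]][:k]      # nearest same-class samples
        misses = [j for j in ranked if y[j] != y[i]][:k]    # nearest other-class samples
        for f in range(p):
            weights[f] -= sum(X[i][f] != X[j][f] for j in hits) / (k * n)
            weights[f] += sum(X[i][f] != X[j][f] for j in misses) / (k * n)
    return weights


def scores_to_ranks(scores):
    """Convert scores to ranks, where rank 1 is the highest-scoring feature."""
    order = sorted(range(len(scores)), key=lambda f: -scores[f])
    ranks = [0] * len(scores)
    for rank, f in enumerate(order, start=1):
        ranks[f] = rank
    return ranks


def ensemble_ranks(X, y, filter_fn, n_runs=20, seed=0):
    """Run filter_fn on n_runs random sample orderings and average the ranks.

    Mean-rank aggregation is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    p = len(X[0])
    totals = [0.0] * p
    idx = list(range(len(X)))
    for _ in range(n_runs):
        rng.shuffle(idx)                       # new sample order for this run
        Xs = [X[i] for i in idx]
        ys = [y[i] for i in idx]
        for f, rank in enumerate(scores_to_ranks(filter_fn(Xs, ys))):
            totals[f] += rank
    return [t / n_runs for t in totals]


if __name__ == "__main__":
    # Toy data only: 12 samples, 5 SNP-like features, balanced binary labels.
    data_rng = random.Random(1)
    X = [[data_rng.randint(0, 2) for _ in range(5)] for _ in range(12)]
    y = [i % 2 for i in range(12)]
    print(ensemble_ranks(X, y, relief_scores, n_runs=10))
```

Features with a lower mean rank would be retained first. Because each run sees a different sample order, the averaged ranking depends far less on any one ordering than a single run does, which is the stability property the abstract attributes to the ensemble versions.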
Pages: 10