Integrative Analysis Using Module-Guided Random Forests Reveals Correlated Genetic Factors Related to Mouse Weight

Cited by: 19
Authors
Chen, Zheng [1]
Zhang, Weixiong [1, 2]
Affiliations
[1] Washington Univ, Dept Comp Sci & Engn, St Louis, MO 63130 USA
[2] Washington Univ, Sch Med, Dept Genet, St Louis, MO 63110 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
VARIABLE IMPORTANCE MEASURES; GENOME-WIDE ASSOCIATION; EXPRESSION; SELECTION; CLASSIFICATION; OBESITY; IDENTIFY; TOOL;
DOI
10.1371/journal.pcbi.1002956
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Complex traits such as obesity are manifestations of intricate interactions among multiple genetic factors. However, such relationships are difficult to identify. Thanks to recent advances in high-throughput technology, a large amount of data has been collected for various complex traits, including obesity. These data often measure different biological aspects of the traits of interest, including genotypic variations at the DNA level and gene expression alterations at the RNA level. Integration of such heterogeneous data provides promising opportunities to understand the genetic components and possibly the genetic architecture of complex traits. In this paper, we propose a machine-learning-based method, module-guided Random Forests (mgRF), to integrate genotypic and gene expression data to investigate the genetic factors and molecular mechanisms underlying complex traits. mgRF is an augmented Random Forests method enhanced by a network analysis for identifying multiple correlated variables of different types. We applied mgRF to genetic markers and gene expression data from a cohort of female mice from an F2 intercross. mgRF outperformed several existing methods in our extensive comparison. Our new approach performs better when combining genotypic and gene expression data than when using either type of data alone. The predictive variables identified by mgRF provide information on perturbed pathways that are related to body weight. More importantly, the results uncovered intricate interactions among genetic markers and genes that would have been overlooked if only one type of data had been examined. Our results shed light on the genetic mechanisms of obesity, and our approach provides a promising complementary framework to the "genetics of gene expression" analysis for integrating genotypic and gene expression information when analyzing complex traits.
Pages: 12