Integrative Analysis Using Module-Guided Random Forests Reveals Correlated Genetic Factors Related to Mouse Weight

Cited by: 19
Authors
Chen, Zheng [1]
Zhang, Weixiong [1, 2]
Affiliations
[1] Washington Univ, Dept Comp Sci & Engn, St Louis, MO 63130 USA
[2] Washington Univ, Sch Med, Dept Genet, St Louis, MO 63110 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
VARIABLE IMPORTANCE MEASURES; GENOME-WIDE ASSOCIATION; EXPRESSION; SELECTION; CLASSIFICATION; OBESITY; IDENTIFY; TOOL;
DOI
10.1371/journal.pcbi.1002956
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Complex traits such as obesity are manifestations of intricate interactions among multiple genetic factors. However, such relationships are difficult to identify. Thanks to recent advances in high-throughput technology, a large amount of data has been collected for various complex traits, including obesity. These data often measure different biological aspects of the traits of interest, including genotypic variations at the DNA level and gene expression alterations at the RNA level. Integration of such heterogeneous data provides promising opportunities to understand the genetic components and possibly the genetic architecture of complex traits. In this paper, we propose a machine-learning-based method, module-guided Random Forests (mgRF), to integrate genotypic and gene expression data to investigate the genetic factors and molecular mechanisms underlying complex traits. mgRF is an augmented Random Forests method enhanced by a network analysis for identifying multiple correlated variables of different types. We applied mgRF to genetic markers and gene expression data from an F2 intercross cohort of female mice. mgRF outperformed several existing methods in our extensive comparison. Our new approach performs better when combining genotypic and gene expression data than when using either type of data alone. The resulting predictive variables identified by mgRF provide information on perturbed pathways that are related to body weight. More importantly, the results uncovered intricate interactions among genetic markers and genes that would have been overlooked if only one type of data had been examined. Our results shed light on the genetic mechanisms of obesity, and our approach provides a promising complementary framework to the "genetics of gene expression" analysis for integrating genotypic and gene expression information in the analysis of complex traits.
Pages: 12