Mergeomics: multidimensional data integration to identify pathogenic perturbations to biological systems

Cited by: 84
Authors
Shu, Le [1]
Zhao, Yuqi [1]
Kurt, Zeyneb [1]
Byars, Sean Geoffrey [2, 3]
Tukiainen, Taru [4]
Kettunen, Johannes [4]
Orozco, Luz D. [5]
Pellegrini, Matteo [5]
Lusis, Aldons J. [6]
Ripatti, Samuli [4]
Zhang, Bin [7]
Inouye, Michael [2, 3, 8]
Makinen, Ville-Petteri [1, 9, 10, 11, 12]
Yang, Xia [1, 13]
Affiliations
[1] Univ Calif Los Angeles, Dept Integrat Biol & Physiol, Los Angeles, CA USA
[2] Univ Melbourne, Ctr Syst Genom, Melbourne, Vic, Australia
[3] Univ Melbourne, Sch BioSci, Melbourne, Vic, Australia
[4] Inst Mol Med, Helsinki, Finland
[5] Univ Calif Los Angeles, Dept Mol Cell & Dev Biol, Los Angeles, CA USA
[6] Univ Calif Los Angeles, David Geffen Sch Med, Dept Med, Los Angeles, CA 90095 USA
[7] Icahn Sch Med Mt Sinai, Dept Genet & Genom Sci, New York, NY 10029 USA
[8] Univ Melbourne, Dept Pathol, Melbourne, Vic, Australia
[9] South Australian Hlth & Med Res Inst, Adelaide, SA, Australia
[10] Univ Adelaide, Sch Biol Sci, Adelaide, SA, Australia
[11] Univ Oulu, Fac Med, Computat Med, Oulu, Finland
[12] Bioctr Oulu, Oulu, Finland
[13] Univ Calif Los Angeles, Inst Quantitat & Computat Biosci, Los Angeles, CA USA
Funding
National Health and Medical Research Council of Australia; UK Medical Research Council;
Keywords
Mergeomics; Integrative genomics; Multidimensional data integration; Functional genomics; Gene networks; Key drivers; Cholesterol; Blood glucose; GENOME-WIDE ASSOCIATION; SET ENRICHMENT ANALYSIS; GENE-EXPRESSION; DISEASE; TRAITS; LOCI; MOUSE; MICE;
DOI
10.1186/s12864-016-3198-9
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Complex diseases are characterized by multiple subtle perturbations to biological processes. New omics platforms can detect these perturbations, but translating the diverse molecular and statistical information into testable mechanistic hypotheses is challenging. Therefore, we set out to create a public tool that integrates these data across multiple datasets, platforms, study designs and species in order to detect the most promising targets for further mechanistic studies. Results: We developed Mergeomics, a computational pipeline consisting of independent modules that 1) leverage multi-omics association data to identify biological processes that are perturbed in disease, and 2) overlay the disease-associated processes onto molecular interaction networks to pinpoint hubs as potential key regulators. Unlike existing tools that are mostly dedicated to specific data types or settings, the Mergeomics pipeline accepts and integrates datasets across platforms, data types and species. We optimized and evaluated the performance of Mergeomics using simulation and multiple independent datasets, and benchmarked the results against alternative methods. We also demonstrate the versatility of Mergeomics in two case studies that include genome-wide, epigenome-wide and transcriptome-wide datasets from human and mouse studies of total cholesterol and fasting glucose. In both cases, the Mergeomics pipeline provided statistical and contextual evidence to prioritize further investigations in the wet lab. The software implementation of Mergeomics is freely available as a Bioconductor R package. Conclusion: Mergeomics is a flexible and robust computational pipeline for multidimensional data integration. It outperforms existing tools, and is easily applicable to datasets from different studies, species and omics data types for the study of complex traits.
Pages: 16