Mergeomics: multidimensional data integration to identify pathogenic perturbations to biological systems

Cited by: 84
Authors
Shu, Le [1]
Zhao, Yuqi [1]
Kurt, Zeyneb [1]
Byars, Sean Geoffrey [2,3]
Tukiainen, Taru [4]
Kettunen, Johannes [4]
Orozco, Luz D. [5]
Pellegrini, Matteo [5]
Lusis, Aldons J. [6]
Ripatti, Samuli [4]
Zhang, Bin [7]
Inouye, Michael [2,3,8]
Makinen, Ville-Petteri [1,9,10,11,12]
Yang, Xia [1,13]
Affiliations
[1] Univ Calif Los Angeles, Dept Integrat Biol & Physiol, Los Angeles, CA USA
[2] Univ Melbourne, Ctr Syst Genom, Melbourne, Vic, Australia
[3] Univ Melbourne, Sch BioSci, Melbourne, Vic, Australia
[4] Inst Mol Med, Helsinki, Finland
[5] Univ Calif Los Angeles, Dept Mol Cell & Dev Biol, Los Angeles, CA USA
[6] Univ Calif Los Angeles, David Geffen Sch Med, Dept Med, Los Angeles, CA 90095 USA
[7] Icahn Sch Med Mt Sinai, Dept Genet & Genom Sci, New York, NY 10029 USA
[8] Univ Melbourne, Dept Pathol, Melbourne, Vic, Australia
[9] South Australian Hlth & Med Res Inst, Adelaide, SA, Australia
[10] Univ Adelaide, Sch Biol Sci, Adelaide, SA, Australia
[11] Univ Oulu, Fac Med, Computat Med, Oulu, Finland
[12] Bioctr Oulu, Oulu, Finland
[13] Univ Calif Los Angeles, Institute Quantitat & Computat Biosci, Los Angeles, CA USA
Funding
National Health and Medical Research Council (Australia); Medical Research Council (UK)
Keywords
Mergeomics; Integrative genomics; Multidimensional data integration; Functional genomics; Gene networks; Key drivers; Cholesterol; Blood glucose; GENOME-WIDE ASSOCIATION; SET ENRICHMENT ANALYSIS; GENE-EXPRESSION; DISEASE; TRAITS; LOCI; MOUSE; MICE;
DOI
10.1186/s12864-016-3198-9
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Complex diseases are characterized by multiple subtle perturbations to biological processes. New omics platforms can detect these perturbations, but translating the diverse molecular and statistical information into testable mechanistic hypotheses is challenging. Therefore, we set out to create a public tool that integrates these data across multiple datasets, platforms, study designs and species in order to detect the most promising targets for further mechanistic studies.
Results: We developed Mergeomics, a computational pipeline consisting of independent modules that 1) leverage multi-omics association data to identify biological processes that are perturbed in disease, and 2) overlay the disease-associated processes onto molecular interaction networks to pinpoint hubs as potential key regulators. Unlike existing tools, which are mostly dedicated to a specific data type or setting, the Mergeomics pipeline accepts and integrates datasets across platforms, data types and species. We optimized and evaluated the performance of Mergeomics using simulation and multiple independent datasets, and benchmarked the results against alternative methods. We also demonstrate the versatility of Mergeomics in two case studies that include genome-wide, epigenome-wide and transcriptome-wide datasets from human and mouse studies of total cholesterol and fasting glucose. In both cases, the Mergeomics pipeline provided statistical and contextual evidence to prioritize further investigations in the wet lab. The software implementation of Mergeomics is freely available as a Bioconductor R package.
Conclusion: Mergeomics is a flexible and robust computational pipeline for multidimensional data integration. It outperforms existing tools, and is easily applicable to datasets from different studies, species and omics data types for the study of complex traits.
Pages: 16