Mergeomics: multidimensional data integration to identify pathogenic perturbations to biological systems

Cited by: 80
Authors
Shu, Le [1]
Zhao, Yuqi [1]
Kurt, Zeyneb [1]
Byars, Sean Geoffrey [2, 3]
Tukiainen, Taru [4]
Kettunen, Johannes [4]
Orozco, Luz D. [5]
Pellegrini, Matteo [5]
Lusis, Aldons J. [6]
Ripatti, Samuli [4]
Zhang, Bin [7]
Inouye, Michael [2, 3, 8]
Makinen, Ville-Petteri [1, 9, 10, 11, 12]
Yang, Xia [1, 13]
Affiliations
[1] Univ Calif Los Angeles, Dept Integrat Biol & Physiol, Los Angeles, CA USA
[2] Univ Melbourne, Ctr Syst Genom, Melbourne, Vic, Australia
[3] Univ Melbourne, Sch BioSci, Melbourne, Vic, Australia
[4] Inst Mol Med, Helsinki, Finland
[5] Univ Calif Los Angeles, Dept Mol Cell & Dev Biol, Los Angeles, CA USA
[6] Univ Calif Los Angeles, David Geffen Sch Med, Dept Med, Los Angeles, CA 90095 USA
[7] Icahn Sch Med Mt Sinai, Dept Genet & Genom Sci, New York, NY 10029 USA
[8] Univ Melbourne, Dept Pathol, Melbourne, Vic, Australia
[9] South Australian Hlth & Med Res Inst, Adelaide, SA, Australia
[10] Univ Adelaide, Sch Biol Sci, Adelaide, SA, Australia
[11] Univ Oulu, Fac Med, Computat Med, Oulu, Finland
[12] Bioctr Oulu, Oulu, Finland
[13] Univ Calif Los Angeles, Inst Quantitat & Computat Biosci, Los Angeles, CA USA
Source
BMC GENOMICS | 2016 / Vol. 17
Funding
Medical Research Council (UK); National Health and Medical Research Council (Australia);
Keywords
Mergeomics; Integrative genomics; Multidimensional data integration; Functional genomics; Gene networks; Key drivers; Cholesterol; Blood glucose; GENOME-WIDE ASSOCIATION; SET ENRICHMENT ANALYSIS; GENE-EXPRESSION; DISEASE; TRAITS; LOCI; MOUSE; MICE;
DOI
10.1186/s12864-016-3198-9
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Complex diseases are characterized by multiple subtle perturbations to biological processes. New omics platforms can detect these perturbations, but translating the diverse molecular and statistical information into testable mechanistic hypotheses is challenging. Therefore, we set out to create a public tool that integrates these data across multiple datasets, platforms, study designs and species in order to detect the most promising targets for further mechanistic studies.

Results: We developed Mergeomics, a computational pipeline consisting of independent modules that 1) leverage multi-omics association data to identify biological processes that are perturbed in disease, and 2) overlay the disease-associated processes onto molecular interaction networks to pinpoint hubs as potential key regulators. Unlike existing tools, which are mostly dedicated to a specific data type or setting, the Mergeomics pipeline accepts and integrates datasets across platforms, data types and species. We optimized and evaluated the performance of Mergeomics using simulation and multiple independent datasets, and benchmarked the results against alternative methods. We also demonstrate the versatility of Mergeomics in two case studies that include genome-wide, epigenome-wide and transcriptome-wide datasets from human and mouse studies of total cholesterol and fasting glucose. In both cases, the Mergeomics pipeline provided statistical and contextual evidence to prioritize further investigations in the wet lab. The software implementation of Mergeomics is freely available as a Bioconductor R package.

Conclusion: Mergeomics is a flexible and robust computational pipeline for multidimensional data integration. It outperforms existing tools, and is easily applicable to datasets from different studies, species and omics data types for the study of complex traits.
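
The second step described in the abstract, overlaying disease-associated genes onto a network to find hubs, can be illustrated with a minimal sketch. This is not the Mergeomics implementation or its API: it assumes a plain hypergeometric over-representation test on each hub's direct neighbors, and the names `hypergeom_tail`, `rank_hubs` and the toy network are hypothetical.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K are disease genes."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def rank_hubs(network, disease_genes, min_degree=2):
    """Rank hubs by enrichment of disease genes among their direct neighbors.

    network: dict mapping each gene to the set of its neighbors (assumed format).
    Returns (hub, overlap, p_value) tuples sorted by ascending p-value.
    """
    # The gene universe is every node that appears in the network.
    universe = set(network)
    for neighbors in network.values():
        universe |= neighbors
    hits = set(disease_genes) & universe

    results = []
    for hub, neighbors in network.items():
        if len(neighbors) < min_degree:
            continue  # skip poorly connected nodes; they are not hubs
        overlap = len(neighbors & hits)
        p = hypergeom_tail(overlap, len(hits), len(neighbors), len(universe))
        results.append((hub, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Toy network (hypothetical): hub A touches all three disease genes, hub E none.
toy_network = {
    "A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"},
    "E": {"F", "G", "H"}, "F": {"E"}, "G": {"E"}, "H": {"E"},
}
for hub, overlap, p in rank_hubs(toy_network, {"B", "C", "D"}):
    print(hub, overlap, round(p, 4))
```

In this toy case hub A ranks first because all of its neighbors are disease genes. A real analysis would also need multiple-testing correction and a more careful null model than this sketch uses.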
Pages: 16