Multi-omics integration accurately predicts cellular state in unexplored conditions for Escherichia coli

Cited by: 109
Authors
Kim, Minseung [1, 2]
Rai, Navneet [2]
Zorraquino, Violeta [2]
Tagkopoulos, Ilias [1, 2]
Affiliations
[1] Univ Calif Davis, Dept Comp Sci, Davis, CA 95616 USA
[2] Univ Calif Davis, Genome Ctr, Davis, CA 95616 USA
Funding
U.S. National Science Foundation
Keywords
GENE-EXPRESSION; PROTEIN; MICROARRAY; GROWTH; DATABASE; MODEL; NORMALIZATION; INFORMATION; VALIDATION; METABOLISM;
DOI
10.1038/ncomms13090
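Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09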
Abstract
A significant obstacle in training predictive cell models is the lack of integrated data sources. We develop semi-supervised normalization pipelines and perform experimental characterization (growth, transcriptional, proteome) to create Ecomics, a consistent, quality-controlled multi-omics compendium for Escherichia coli with cohesive meta-data information. We then use this resource to train a multi-scale model that integrates four omics layers to predict genome-wide concentrations and growth dynamics. The genetic and environmental ontology reconstructed from the omics data is substantially different from, and complementary to, the genetic and chemical ontologies. The integration of different layers confers an incremental increase in the prediction performance, as does the information about the known gene regulatory and protein-protein interactions. The predictive performance of the model ranges from 0.54 to 0.87 for the various omics layers, which far exceeds various baselines. This work provides an integrative framework of omics-driven predictive modelling that is broadly applicable to guide biological discovery.
Pages: 12