Investigating the predictability of essential genes across distantly related organisms using an integrative approach

Cited by: 105
Authors
Deng, Jingyuan [1, 2]
Deng, Lei [1, 3]
Su, Shengchang [4]
Zhang, Minlu [5]
Lin, Xiaodong [6]
Wei, Lan [7]
Minai, Ali A. [3]
Hassett, Daniel J. [4]
Lu, Long J. [1, 2, 5, 8]
Affiliations
[1] Cincinnati Childrens Hosp Res Fdn, Div Biomed Informat, Cincinnati, OH 45229 USA
[2] Univ Cincinnati, Dept Biomed Engn, Cincinnati, OH 45229 USA
[3] Univ Cincinnati, Dept Elect & Comp Engn, Cincinnati, OH 45229 USA
[4] Univ Cincinnati, Dept Mol Genet Biochem & Microbiol, Cincinnati, OH 45229 USA
[5] Univ Cincinnati, Dept Comp Sci, Cincinnati, OH 45229 USA
[6] Rutgers State Univ, Dept Management Sci & Informat Syst, Piscataway, NJ 08854 USA
[7] Yale Univ, Sch Med, New Haven, CT 06511 USA
[8] Univ Cincinnati, Dept Environm Hlth, Cincinnati, OH 45229 USA
Funding
National Institutes of Health (USA);
Keywords
EFFECTIVE NUMBER; PROTEIN; GENOME; IDENTIFICATION; DATABASE; DELETION; LIBRARY; CELLS;
DOI
10.1093/nar/gkq784
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Rapid and accurate identification of new essential genes in under-studied microorganisms will significantly improve our understanding of how a cell works and the ability to re-engineer microorganisms. However, predicting essential genes across distantly related organisms remains a challenge. Here, we present a machine learning-based integrative approach that reliably transfers essential gene annotations between distantly related bacteria. We focused on four bacterial species that have well-characterized essential genes, and tested the transferability between three pairs among them. For each pair, we trained our classifier to learn traits associated with essential genes in one organism, and applied it to make predictions in the other. The predictions were then evaluated by examining the agreements with the known essential genes in the target organism. Ten-fold cross-validation in the same organism yielded AUC scores between 0.86 and 0.93. Cross-organism predictions yielded AUC scores between 0.69 and 0.89. The transferability is likely affected by growth conditions, quality of the training data set and the evolutionary distance. We are thus the first to report that gene essentiality can be reliably predicted using features trained and tested in a distantly related organism. Our approach proves more robust and portable than existing approaches, significantly extending our ability to predict essential genes beyond orthologs.
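The train-in-one-organism, test-in-another evaluation described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the nearest-centroid scorer is a simple stand-in and not the classifier used in the paper, the two-column feature vectors are made-up toy values, and the names `train`, `score` and `auc` are hypothetical. Only the overall shape (fit on a source organism, score genes in a target organism, summarize agreement with known labels as an AUC) follows the abstract.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def train(features, labels):
    """Fit a stand-in model: one centroid per class (1 = essential, 0 = not)."""
    pos = [f for f, y in zip(features, labels) if y == 1]
    neg = [f for f, y in zip(features, labels) if y == 0]
    return centroid(pos), centroid(neg)


def score(model, x):
    """Higher score means closer to the essential-gene centroid."""
    pos_c, neg_c = model
    return dist(x, neg_c) - dist(x, pos_c)


def auc(labels, scores):
    """AUC as the fraction of (essential, non-essential) pairs ranked
    correctly, counting ties as half a correct pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy, made-up feature vectors for a "source" and a "target" organism.
source_X = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3]]
source_y = [1, 1, 0, 0]
target_X = [[0.7, 0.9], [0.6, 0.4], [0.3, 0.2], [0.2, 0.4]]
target_y = [1, 1, 0, 0]

# Train on the source organism, predict in the target organism,
# then compare the predictions with the target's known labels.
model = train(source_X, source_y)
target_scores = [score(model, x) for x in target_X]
cross_auc = auc(target_y, target_scores)
print(f"cross-organism AUC on toy data: {cross_auc:.2f}")
```

The pairwise AUC is quadratic in the number of genes, which is fine for a sketch; a rank-based formulation would be the usual choice at genome scale.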
Pages: 795-807
Page count: 13