HyperVR: a hybrid deep ensemble learning approach for simultaneously predicting virulence factors and antibiotic resistance genes

Citations: 10
Authors
Ji, Boya [1]
Pi, Wending [1]
Liu, Wenjuan [1]
Liu, Yannan [2]
Cui, Yujun [3]
Zhang, Xianglilan [3]
Peng, Shaoliang [1]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410006, Peoples R China
[2] Capital Med Univ, Beijing Chao Yang Hosp, Emergency Med Clin Res Ctr, Beijing 100020, Peoples R China
[3] Beijing Inst Microbiol & Epidemiol, State Key Lab Pathogen & Biosecur, Beijing 100071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
AMINO-ACID-COMPOSITION; COMPLETE GENOME SEQUENCE; ANTIMICROBIAL RESISTANCE; BACTERIAL GENOMES; ESCHERICHIA-COLI; DATABASE; EVOLUTION; ALIGNMENT;
DOI
10.1093/nargab/lqad012
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Infectious diseases are emerging at an unprecedented pace, posing serious challenges to public health and the global economy. Virulence factors (VFs) enable pathogens to adhere to, reproduce in and damage host cells, and antibiotic resistance genes (ARGs) allow pathogens to evade otherwise curative treatments. Simultaneous identification of VFs and ARGs can save pathogen surveillance time, especially for in situ detection of epidemic pathogens. However, most tools can predict only VFs or only ARGs. The few tools that predict both simultaneously usually have high false-negative rates, are sensitive to cutoff thresholds and can identify only conserved genes. For better simultaneous prediction of VFs and ARGs, we propose a hybrid deep ensemble learning approach called HyperVR. By considering both best hit scores and statistical gene sequence patterns, HyperVR combines classical machine learning and deep learning to simultaneously and accurately predict VFs, ARGs and negative genes (neither VFs nor ARGs). In predicting individual VFs and ARGs, in an in silico spike-in experiment (VFs and ARGs in real metagenomic data), and on pseudo-VFs and pseudo-ARGs (gene fragments), HyperVR outperforms the current state-of-the-art prediction tools. HyperVR uses only gene sequence information and requires no strict cutoff thresholds, making prediction straightforward and reliable.
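To make the idea of "statistical gene sequence patterns" feeding a three-class ensemble more concrete, here is a minimal Python sketch. It is not the HyperVR implementation: it assumes amino acid composition (a feature named in the keywords) as one possible sequence statistic, and it assumes simple soft voting (averaging class probabilities) as a stand-in for the ensemble step. The function names `amino_acid_composition` and `soft_vote`, the example sequence and the probability values are all hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
CLASSES = ("VF", "ARG", "negative")   # the three classes named in the abstract


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in seq (20 values)."""
    seq = seq.upper()
    # Ignore ambiguous or non-standard symbols such as X or *.
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def soft_vote(prob_rows):
    """Average per-class probabilities from several models; return top class.

    Assumption: plain averaging is used here for illustration only; the
    source does not say how HyperVR combines its base models.
    """
    n = len(prob_rows)
    mean = [sum(row[i] for row in prob_rows) / n for i in range(len(CLASSES))]
    best = max(range(len(CLASSES)), key=lambda i: mean[i])
    return CLASSES[best], mean


# Hypothetical protein fragment and hypothetical outputs of three base models.
features = amino_acid_composition("MKKLLA")
label, mean = soft_vote([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1]])
print(label)
```

In a real pipeline the composition vector would be one of several inputs (alongside, for example, alignment best hit scores) to trained classifiers, and the probabilities passed to `soft_vote` would come from those classifiers rather than being typed in by hand.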
Pages: 17