PlasmidHostFinder: Prediction of Plasmid Hosts Using Random Forest

Cited by: 14
Authors
Aytan-Aktug, Derya [1]
Clausen, Philip T. L. C. [1]
Szarvas, Judit [1]
Munk, Patrick [1]
Otani, Saria [1]
Nguyen, Marcus [2, 3]
Davis, James J. [2, 3, 4]
Lund, Ole [1]
Aarestrup, Frank M. [1]
Affiliations
[1] Tech Univ Denmark, Natl Food Inst, Lyngby, Denmark
[2] Univ Chicago, Consortium Adv Sci & Engn, Chicago, IL 60637 USA
[3] Argonne Natl Lab, Data Sci & Learning Div, 9700 S Cass Ave, Argonne, IL 60439 USA
[4] Northwestern Argonne Inst Sci & Engn, Evanston, IL USA
Funding
National Institutes of Health (USA)
Keywords
antimicrobial resistance; horizontal gene transfer; machine learning; plasmid; plasmid host; plasmid host range; random forest; EXPANSION;
DOI
10.1128/msystems.01180-21
CLC number
Q93 [Microbiology]
Discipline classification codes
071005 ; 100705 ;
Abstract
Plasmids play a major role in facilitating the spread of antimicrobial resistance between bacteria. Understanding the host range and dissemination trajectories of plasmids is critical for surveillance and prevention of antimicrobial resistance. Because plasmids are diverse and genetically plastic, automated pattern detection methods could identify plasmid host ranges better than homology-based methods. In this study, we developed a method for predicting the host range of plasmids using machine learning, specifically random forests. We trained models for each taxonomic level with 8,519 plasmids from 359 different bacterial species; the models achieved Matthews correlation coefficients of 0.662 and 0.867 at the species and order levels, respectively. Our results suggest that, despite the diverse nature and genetic plasticity of plasmids, our random forest model can accurately distinguish between plasmid hosts. This tool is available online through the Center for Genomic Epidemiology (https://cge.cbs.dtu.dk/services/PlasmidHostFinder/).
IMPORTANCE Antimicrobial resistance is a global health threat to humans and animals, causing high mortality and morbidity while effectively ending decades of success in fighting bacterial infections. Plasmids confer extra genetic capabilities on their host organisms through accessory genes that can encode antimicrobial resistance and virulence. In addition to vertical inheritance, plasmids can be transferred horizontally between bacterial taxa. Therefore, detecting the host range of plasmids is crucial for understanding and predicting the dissemination trajectories of extrachromosomal genes and bacterial evolution, as well as for taking effective countermeasures against antimicrobial resistance.
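The abstract reports model quality as Matthews correlation coefficients (MCC). As a reading aid, here is a minimal sketch of how MCC is computed for one binary host label (1 = the plasmid is assigned to a given taxon, 0 = it is not). The function name and the toy labels are illustrative assumptions, not taken from the paper or from the PlasmidHostFinder code.

```python
from math import sqrt


def matthews_corrcoef(y_true, y_pred):
    """Return the Matthews correlation coefficient for binary labels (0 or 1).

    Hypothetical helper for illustration only; not the authors' implementation.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention: MCC is reported as 0 when a row or column
        # of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Toy example (made-up labels): 3 true hosts, 5 non-hosts.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
score = matthews_corrcoef(y_true, y_pred)
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and it uses all four cells of the confusion matrix, which makes it more informative than accuracy when most plasmids do not belong to a given taxon.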
Pages: 16