Comparative analysis of machine learning algorithms on the microbial strain-specific AMP prediction

Times cited: 27
Authors
Vishnepolsky, Boris [1]
Grigolava, Maya [1]
Managadze, Grigol [1]
Gabrielian, Andrei [2]
Rosenthal, Alex [3]
Hurt, Darrell E.
Tartakovsky, Michael [4]
Pirtskhalava, Malak [5]
Affiliations
[1] Ivane Beritashvili Ctr Expt Biomed, Lab Bioinformat, Tbilisi, Georgia
[2] Natl Inst Hlth, Natl Institute Allergy & Infect Diseases, Bioinformat & Computat Biosci Branch, Bethesda, MD USA
[3] Off Cyber Infrastruct & Computat Biol, Natl Inst Allergy & Infect Dis Natl Inst Hlth Beth, Bethesda, MD USA
[4] Natl Institutes Hlth, Off Cyber Infrastruct & Computat Biol Allergy & I, Bethesda, MD USA
[5] Ivane Beritashvili Ctr Expt Biomed, Lab Bioinformat, Tbilisi, Georgia
Funding
National Institutes of Health (USA);
Keywords
antimicrobial peptides; AMP prediction; machine learning; APPLICABILITY DOMAIN; PEPTIDES; DISCOVERY; TAXONOMY; SPACE;
DOI
10.1093/bib/bbac233
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
The evolution of drug-resistant pathogenic microbial species is a major global health concern. Naturally occurring antimicrobial peptides (AMPs) are considered promising candidates to address antibiotic resistance problems. A variety of computational methods have been developed to accurately predict AMPs. The majority of such methods are not microbial strain specific (MSS): they can predict whether a given peptide is active against some microbe, but cannot accurately calculate whether such a peptide would be active against a particular microbial strain (MS). Due to insufficient data on most MS, only a few MSS predictive models have been developed so far. To overcome this problem, we developed a novel approach that improves MSS predictive models (MSSPM), based on properties computed for AMP sequences and characteristics of genomes computed for target MS. The new models can perform predictions of AMPs for MS that do not have data on peptides tested on them. We tested various types of feature engineering as well as different machine learning (ML) algorithms to compare the predictive abilities of the resulting models. Among the ML algorithms, Random Forest and AdaBoost performed best. When genome characteristics were used as additional features, the performance of all models increased relative to models relying on AMP sequence-based properties only. Our novel MSS AMP predictor is freely accessible as part of the DBAASP database resource at http://dbaasp.org/prediction/genome
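The feature-combination idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not list the actual descriptors, so the peptide properties (length, a crude net-charge proxy, hydrophobic fraction), the genome characteristics (GC content, dinucleotide frequencies), and the function names `peptide_features`, `genome_features`, and `pair_features` are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical, simplified descriptors; the paper's real feature set is not given here.
HYDROPHOBIC = set("AILMFVWY")


def peptide_features(seq):
    """Return [length, net-charge proxy, hydrophobic fraction] for a peptide."""
    seq = seq.upper()
    # Crude charge proxy: count of K/R minus count of D/E (ignores pH and termini).
    charge = sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")
    hydrophobic = sum(1 for a in seq if a in HYDROPHOBIC) / len(seq)
    return [float(len(seq)), float(charge), hydrophobic]


def genome_features(dna, k=2):
    """Return GC content followed by normalized k-mer frequencies of a genome."""
    dna = dna.upper()
    gc = (dna.count("G") + dna.count("C")) / len(dna)
    windows = max(len(dna) - k + 1, 1)
    counts = Counter(dna[i:i + k] for i in range(len(dna) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return [gc] + [counts[m] / windows for m in kmers]


def pair_features(peptide, genome):
    """Concatenate peptide and genome descriptors into one feature vector."""
    return peptide_features(peptide) + genome_features(genome)
```

Each (peptide, strain genome) pair yields one vector, which could then be passed to a classifier such as the Random Forest or AdaBoost models named in the abstract. Because the genome part is computed from the strain's sequence alone, a vector can be built even for a strain with no peptide activity data.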
Pages: 11