Comprehensive assessment of machine learning-based methods for predicting antimicrobial peptides

Cited by: 82
Authors
Xu, Jing [2 ,3 ]
Li, Fuyi [4 ]
Leier, Andre [5 ,6 ,7 ]
Xiang, Dongxu [2 ]
Shen, Hsin-Hui [8 ,9 ]
Lago, Tatiana T. Marquez [5 ,10 ,11 ]
Li, Jian [12 ,13 ]
Yu, Dong-Jun [1 ]
Song, Jiangning [12 ,14 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, 200 Xiaolingwei, Nanjing 210094, Peoples R China
[2] Monash Univ, Dept Biochem & Mol Biol, Clayton, Vic, Australia
[3] Monash Univ, Biomed Discovery Inst, Clayton, Vic, Australia
[4] Univ Melbourne, Peter Doherty Inst Infect & Immun, Dept Microbiol & Immunol, Melbourne, Vic, Australia
[5] UAB Sch Med, Dept Genet, Birmingham, AL USA
[6] UABs ONeal Comprehens Canc Ctr, Birmingham, AL USA
[7] Gregory Fleming James Cyst Fibrosis Res Ctr, Birmingham, AL USA
[8] Monash Univ, Dept Biochem & Mol Biol, Melbourne, Vic, Australia
[9] Monash Univ, Dept Mat Sci & Engn, Clayton, Vic, Australia
[10] UAB Sch Med, Dept Microbiol, Birmingham, AL USA
[11] UAB Gregory Fleming James Cyst Fibrosis Res Ct, Birmingham, AL USA
[12] Monash Univ, Monash Biomed Discovery Inst, Clayton, Vic, Australia
[13] Monash Univ, Dept Microbiol, Clayton, Vic, Australia
[14] Monash Univ, Monash Data Futures Inst, Clayton, Vic, Australia
Funding
National Natural Science Foundation of China; Australian Research Council; UK Medical Research Council;
Keywords
antimicrobial peptides; bioinformatics; machine learning; deep learning; feature engineering; predictors; AMINO-ACID-COMPOSITION; LOGISTIC-REGRESSION; WEB SERVER; CD-HIT; PROTEIN; DATABASE; CLASSIFICATION; EVOLUTIONARY; TOOL; DNA;
DOI
10.1093/bib/bbab083
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Antimicrobial peptides (AMPs) are a unique and diverse group of molecules that play a crucial role in a myriad of biological processes and cellular functions. AMP-related studies have become increasingly popular in recent years because of antimicrobial resistance, an emerging global concern. Systematic experimental identification of AMPs faces many difficulties due to the limitations of current methods. Given the importance of AMP identification, more than 30 computational methods have been developed for accurate prediction of AMPs. These approaches are highly diverse in their data set size, data quality, core algorithms, feature extraction, feature selection techniques and evaluation strategies. Here, we provide a comprehensive survey of current approaches for AMP identification and point out the differences between these methods. In addition, we evaluate the predictive performance of the surveyed tools on an independent test data set containing 1536 AMPs and 1536 non-AMPs. Furthermore, we construct six validation data sets from six common AMP databases and compare the computational methods on these data sets. The results indicate that amPEPpy achieves the best predictive performance and outperforms the other compared methods. Because predictive performance is affected by the different data sets used by different methods, we additionally perform 5-fold cross-validation to benchmark traditional machine learning methods on the same data set. These cross-validation results indicate that random forest, support vector machine and eXtreme Gradient Boosting perform comparatively better than other machine learning methods and are often the algorithms of choice in AMP prediction tools.
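As a rough illustration of the benchmarking setup the abstract describes (sequence features evaluated with 5-fold cross-validation on a shared data set), the sketch below computes amino acid composition, one of the descriptors named in the keywords, and builds class-balanced folds. It is a minimal sketch and not the authors' pipeline: the function names `aac` and `stratified_kfold` are hypothetical, the folds are assigned round-robin without shuffling, and no classifier is trained.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence):
    """Return the 20-dimensional amino acid composition of a peptide.

    Hypothetical helper: non-standard residue codes are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence has no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]


def stratified_kfold(labels, k=5):
    """Yield (train_idx, test_idx) pairs with similar class ratios per fold.

    Hypothetical helper: samples of each class are dealt round-robin
    into k folds, in input order (no shuffling).
    """
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for indices in by_class.values():
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

In practice, each training fold's feature vectors would be passed to a classifier such as a random forest or support vector machine, and the held-out fold would be used for scoring; shuffling the data before splitting is usually advisable.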
Pages: 22