Comprehensive Assessment of BERT-Based Methods for Predicting Antimicrobial Peptides

Times Cited: 1
Authors
Gao, Wanling [1 ]
Zhao, Jun [1 ]
Gui, Jianfeng [1 ]
Wang, Zehan [1 ]
Chen, Jie [2 ]
Yue, Zhenyu [1 ]
Affiliations
[1] Anhui Agr Univ, Sch Informat & Artificial Intelligence, Hefei 230036, Anhui, Peoples R China
[2] Shenzhen Univ, Natl Engn Lab Big Data Syst Comp Technol, Shenzhen 518060, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
CD-HIT; LANGUAGE; RESOURCE; DATABASE; PROTEIN; UNIREF; SETS;
DOI
10.1021/acs.jcim.4c00507
CLC Number
R914 [Medicinal Chemistry]
Subject Classification Code
100701
Abstract
In recent years, the prediction of antimicrobial peptides (AMPs) has gained prominence because of their strong antibacterial activity and low propensity to induce drug resistance, which make them potential substitutes for conventional antibiotics. To advance AMP recognition, a growing number of natural language processing methods are being applied; these methods differ in their pretraining models, pretraining data sets, word vector embeddings, feature encoding schemes, and downstream classification models. Here, we provide a comprehensive survey of current BERT-based methods for AMP prediction. We construct an independent benchmark test data set to evaluate the predictive capabilities of the surveyed tools and compare their performance across six public AMP databases. LM_pred (BFD) outperformed all other surveyed tools, owing to its large pretraining data set and distinctive vector embedding approach. To eliminate the influence of the different training data sets used by each method, we retrained all models on a common data set and performed 5-fold cross-validation. In addition, to assess applicability and generalization, we constructed a short-peptide data set and an external data set to test the retrained models. Although these BERT-based prediction methods achieve good performance, there remains room for improvement in recognition accuracy. Building on continued advances in protein language models, we propose iAMP-bert, an AMP prediction method based on the ESM-2 pretrained model. Experimental results demonstrate that iAMP-bert outperforms the other approaches. iAMP-bert is freely accessible at http://iamp.aielab.cc/.
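The abstract describes a general workflow of embedding peptide sequences with a pretrained protein language model (ESM-2) and evaluating a downstream classifier by 5-fold cross-validation. The sketch below illustrates that workflow only; it is not the authors' iAMP-bert implementation. It assumes the fair-esm and scikit-learn packages, a small ESM-2 checkpoint (esm2_t12_35M_UR50D), mean-pooled residue embeddings, and a logistic-regression head, with toy sequences and labels invented purely for illustration.

```python
# Minimal sketch (not the authors' released iAMP-bert code): embed peptides
# with a pretrained ESM-2 model via the fair-esm package, then evaluate a
# simple downstream classifier with 5-fold cross-validation.
import esm
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy sequences and AMP labels, invented purely for illustration; real
# experiments would use the benchmark data sets described in the paper.
peptides = [
    ("amp1", "GIGKFLHSAKKFGKAFVGEIMNS"),
    ("amp2", "GLFDIIKKIAESF"),
    ("amp3", "KWKLFKKIEKVGQNIRDGIIKAGPAVAVVGQATQIAK"),
    ("amp4", "FLPIIAKLLSGLL"),
    ("amp5", "ILPWKWPWWPWRR"),
    ("neg1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),
    ("neg2", "MSDNGPQNQRNAPRITFGGP"),
    ("neg3", "MTEYKLVVVGAGGVGKSALTIQ"),
    ("neg4", "MVLSPADKTNVKAAWGKVGAH"),
    ("neg5", "MAHHHHHHVGTGSGSGS"),
]
labels = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# Load a small ESM-2 checkpoint; the paper's model choice may differ.
model, alphabet = esm.pretrained.esm2_t12_35M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

with torch.no_grad():
    _, _, tokens = batch_converter(peptides)
    out = model(tokens, repr_layers=[12])
    reps = out["representations"][12]  # (batch, tokens, hidden)

# Mean-pool per-residue embeddings (skipping BOS/EOS) into one fixed-length
# feature vector per peptide.
features = np.stack([
    reps[i, 1:len(seq) + 1].mean(0).numpy()
    for i, (_, seq) in enumerate(peptides)
])

# 5-fold cross-validation with a simple downstream classifier.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, features, labels, cv=5)
print("Mean 5-fold accuracy:", scores.mean())
```

In practice the pooled embedding would feed a stronger classifier and larger, redundancy-reduced training sets (e.g., clustered with CD-HIT), but the feature-extraction and cross-validation steps follow the same pattern.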
Pages: 7772-7785
Number of Pages: 14