Comprehensive Assessment of BERT-Based Methods for Predicting Antimicrobial Peptides

Cited by: 1
Authors
Gao, Wanling [1 ]
Zhao, Jun [1 ]
Gui, Jianfeng [1 ]
Wang, Zehan [1 ]
Chen, Jie [2 ]
Yue, Zhenyu [1 ]
Affiliations
[1] Anhui Agr Univ, Sch Informat & Artificial Intelligence, Hefei 230036, Anhui, Peoples R China
[2] Shenzhen Univ, Natl Engn Lab Big Data Syst Comp Technol, Shenzhen 518060, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
CD-HIT; LANGUAGE; RESOURCE; DATABASE; PROTEIN; UNIREF; SETS;
DOI
10.1021/acs.jcim.4c00507
Chinese Library Classification (CLC)
R914 [Medicinal Chemistry]
Discipline Classification Code
100701
Abstract
In recent years, the prediction of antimicrobial peptides (AMPs) has gained prominence because of their high antibacterial activity and reduced susceptibility to drug resistance, making them potential substitutes for antibiotics. To advance AMP recognition, a growing number of natural language processing methods are being applied. These methods differ in their pretraining models, pretraining data sets, word vector embeddings, feature encoding schemes, and downstream classification models. Here, we provide a comprehensive survey of current BERT-based methods for AMP prediction. An independent benchmark test data set is constructed to evaluate the predictive capabilities of the surveyed tools. Furthermore, we compare the predictive performance of these computational methods across six public AMP databases. LM_pred (BFD) outperformed all other surveyed tools, owing to its abundant pretraining data set and its distinctive vector embedding approach. To eliminate the influence of the differing training data sets used by the original methods, we retrained each method and performed 5-fold cross-validation on a common data set. Additionally, to explore the applicability and generalization ability of the models, we constructed a short-peptide data set and an external data set to test the retrained models. Although these BERT-based prediction methods achieve good performance, there remains room for improvement in recognition accuracy. Building on the continuing advances in protein language models, we propose an AMP prediction method based on the ESM-2 pretrained model, called iAMP-bert. Experimental results demonstrate that iAMP-bert outperforms the other approaches. iAMP-bert is freely accessible at http://iamp.aielab.cc/.
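As an illustration of the workflow summarized in the abstract, the sketch below shows how peptide sequences might be embedded with an ESM-2 checkpoint and evaluated with 5-fold cross-validation. This is a minimal sketch, not the authors' iAMP-bert implementation; the checkpoint name, mean pooling, and logistic-regression classifier are illustrative assumptions.

# Minimal sketch (assumptions noted above), not the iAMP-bert code.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score

MODEL_NAME = "facebook/esm2_t6_8M_UR50D"  # small ESM-2 checkpoint (illustrative choice)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME).eval()

def embed(sequences):
    """Mean-pool the last hidden states over residues to get one vector per peptide."""
    feats = []
    with torch.no_grad():
        for seq in sequences:
            inputs = tokenizer(seq, return_tensors="pt")
            hidden = model(**inputs).last_hidden_state  # shape (1, length, dim)
            feats.append(hidden.mean(dim=1).squeeze(0).numpy())
    return np.vstack(feats)

# Toy data: peptide sequences with AMP (1) / non-AMP (0) labels.
seqs = ["GIGKFLHSAKKFGKAFVGEIMNS", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"] * 10
labels = np.array([1, 0] * 10)

X = embed(seqs)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
accs = []
for train_idx, test_idx in cv.split(X, labels):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], labels[train_idx])
    accs.append(accuracy_score(labels[test_idx], clf.predict(X[test_idx])))
print(f"5-fold CV accuracy: {np.mean(accs):.3f}")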
Pages: 7772-7785
Number of pages: 14