Comprehensive Assessment of BERT-Based Methods for Predicting Antimicrobial Peptides

Cited by: 1
Authors
Gao, Wanling [1 ]
Zhao, Jun [1 ]
Gui, Jianfeng [1 ]
Wang, Zehan [1 ]
Chen, Jie [2 ]
Yue, Zhenyu [1 ]
Affiliations
[1] Anhui Agr Univ, Sch Informat & Artificial Intelligence, Hefei 230036, Anhui, Peoples R China
[2] Shenzhen Univ, Natl Engn Lab Big Data Syst Comp Technol, Shenzhen 518060, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
CD-HIT; LANGUAGE; RESOURCE; DATABASE; PROTEIN; UNIREF; SETS;
DOI
10.1021/acs.jcim.4c00507
Chinese Library Classification (CLC)
R914 [Medicinal Chemistry]
Discipline Classification Code
100701
Abstract
In recent years, the prediction of antimicrobial peptides (AMPs) has gained prominence because of their high antibacterial activity and reduced susceptibility to drug resistance, making them potential substitutes for antibiotics. To advance AMP recognition, a growing number of natural language processing methods are being applied. These methods differ in their pretraining models, pretraining data sets, word vector embeddings, feature encoding schemes, and downstream classification models. Here, we provide a comprehensive survey of current BERT-based methods for AMP prediction. An independent benchmark test data set is constructed to evaluate the predictive capabilities of the surveyed tools. Furthermore, we compare the predictive performance of these computational methods across six public AMP databases. LM_pred (BFD) outperformed all other surveyed tools, owing to its abundant pretraining data set and its distinctive vector embedding approach. To eliminate the influence of the differing training data sets used by the original methods, we retrained each method and performed 5-fold cross-validation on a common data set. Additionally, to explore the applicability and generalization ability of the models, we constructed a short-peptide data set and an external data set to test the retrained models. Although these BERT-based prediction methods achieve good performance, there remains room for improvement in recognition accuracy. Building on the continuing advances in protein language models, we propose an AMP prediction method based on the ESM-2 pretrained model, called iAMP-bert. Experimental results demonstrate that iAMP-bert outperforms the other approaches. iAMP-bert is freely accessible at http://iamp.aielab.cc/.
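As an illustration of the workflow summarized in the abstract, the sketch below shows how peptide sequences might be embedded with an ESM-2 checkpoint and evaluated with 5-fold cross-validation. This is a minimal sketch, not the authors' iAMP-bert implementation; the checkpoint name, mean pooling, and logistic-regression classifier are illustrative assumptions.

# Minimal sketch (assumptions noted above), not the iAMP-bert code.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score

MODEL_NAME = "facebook/esm2_t6_8M_UR50D"  # small ESM-2 checkpoint (illustrative choice)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME).eval()

def embed(sequences):
    """Mean-pool the last hidden states over residues to get one vector per peptide."""
    feats = []
    with torch.no_grad():
        for seq in sequences:
            inputs = tokenizer(seq, return_tensors="pt")
            hidden = model(**inputs).last_hidden_state  # shape (1, length, dim)
            feats.append(hidden.mean(dim=1).squeeze(0).numpy())
    return np.vstack(feats)

# Toy data: peptide sequences with AMP (1) / non-AMP (0) labels.
seqs = ["GIGKFLHSAKKFGKAFVGEIMNS", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"] * 10
labels = np.array([1, 0] * 10)

X = embed(seqs)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
accs = []
for train_idx, test_idx in cv.split(X, labels):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], labels[train_idx])
    accs.append(accuracy_score(labels[test_idx], clf.predict(X[test_idx])))
print(f"5-fold CV accuracy: {np.mean(accs):.3f}")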
Pages: 7772-7785
Number of pages: 14