Comprehensive Assessment of BERT-Based Methods for Predicting Antimicrobial Peptides

Times Cited: 1
Authors
Gao, Wanling [1 ]
Zhao, Jun [1 ]
Gui, Jianfeng [1 ]
Wang, Zehan [1 ]
Chen, Jie [2 ]
Yue, Zhenyu [1 ]
Affiliations
[1] Anhui Agr Univ, Sch Informat & Artificial Intelligence, Hefei 230036, Anhui, Peoples R China
[2] Shenzhen Univ, Natl Engn Lab Big Data Syst Comp Technol, Shenzhen 518060, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
CD-HIT; LANGUAGE; RESOURCE; DATABASE; PROTEIN; UNIREF; SETS;
DOI
10.1021/acs.jcim.4c00507
CLC Number
R914 [Medicinal Chemistry]
Subject Classification Code
100701
Abstract
In recent years, the prediction of antimicrobial peptides (AMPs) has gained prominence because of their strong antibacterial activity and low propensity to induce drug resistance, which make them potential substitutes for conventional antibiotics. To advance AMP recognition, a growing number of natural language processing methods are being applied; these methods differ in their pretraining models, pretraining data sets, word vector embeddings, feature encoding schemes, and downstream classification models. Here, we provide a comprehensive survey of current BERT-based methods for AMP prediction. We construct an independent benchmark test data set to evaluate the predictive capabilities of the surveyed tools and compare their performance across six public AMP databases. LM_pred (BFD) outperformed all other surveyed tools, owing to its large pretraining data set and distinctive vector embedding approach. To eliminate the influence of the different training data sets used by each method, we retrained all models on a common data set and performed 5-fold cross-validation. In addition, to assess applicability and generalization, we constructed a short-peptide data set and an external data set to test the retrained models. Although these BERT-based prediction methods achieve good performance, there remains room for improvement in recognition accuracy. Building on continued advances in protein language models, we propose iAMP-bert, an AMP prediction method based on the ESM-2 pretrained model. Experimental results demonstrate that iAMP-bert outperforms the other approaches. iAMP-bert is freely accessible at http://iamp.aielab.cc/.
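The abstract describes a general workflow of embedding peptide sequences with a pretrained protein language model (ESM-2) and evaluating a downstream classifier by 5-fold cross-validation. The sketch below illustrates that workflow only; it is not the authors' iAMP-bert implementation. It assumes the fair-esm and scikit-learn packages, a small ESM-2 checkpoint (esm2_t12_35M_UR50D), mean-pooled residue embeddings, and a logistic-regression head, with toy sequences and labels invented purely for illustration.

```python
# Minimal sketch (not the authors' released iAMP-bert code): embed peptides
# with a pretrained ESM-2 model via the fair-esm package, then evaluate a
# simple downstream classifier with 5-fold cross-validation.
import esm
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy sequences and AMP labels, invented purely for illustration; real
# experiments would use the benchmark data sets described in the paper.
peptides = [
    ("amp1", "GIGKFLHSAKKFGKAFVGEIMNS"),
    ("amp2", "GLFDIIKKIAESF"),
    ("amp3", "KWKLFKKIEKVGQNIRDGIIKAGPAVAVVGQATQIAK"),
    ("amp4", "FLPIIAKLLSGLL"),
    ("amp5", "ILPWKWPWWPWRR"),
    ("neg1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),
    ("neg2", "MSDNGPQNQRNAPRITFGGP"),
    ("neg3", "MTEYKLVVVGAGGVGKSALTIQ"),
    ("neg4", "MVLSPADKTNVKAAWGKVGAH"),
    ("neg5", "MAHHHHHHVGTGSGSGS"),
]
labels = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# Load a small ESM-2 checkpoint; the paper's model choice may differ.
model, alphabet = esm.pretrained.esm2_t12_35M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

with torch.no_grad():
    _, _, tokens = batch_converter(peptides)
    out = model(tokens, repr_layers=[12])
    reps = out["representations"][12]  # (batch, tokens, hidden)

# Mean-pool per-residue embeddings (skipping BOS/EOS) into one fixed-length
# feature vector per peptide.
features = np.stack([
    reps[i, 1:len(seq) + 1].mean(0).numpy()
    for i, (_, seq) in enumerate(peptides)
])

# 5-fold cross-validation with a simple downstream classifier.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, features, labels, cv=5)
print("Mean 5-fold accuracy:", scores.mean())
```

In practice the pooled embedding would feed a stronger classifier and larger, redundancy-reduced training sets (e.g., clustered with CD-HIT), but the feature-extraction and cross-validation steps follow the same pattern.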
Pages: 7772-7785
Number of Pages: 14