Comprehensive Assessment of BERT-Based Methods for Predicting Antimicrobial Peptides

Cited by: 1
Authors
Gao, Wanling [1 ]
Zhao, Jun [1 ]
Gui, Jianfeng [1 ]
Wang, Zehan [1 ]
Chen, Jie [2 ]
Yue, Zhenyu [1 ]
Affiliations
[1] Anhui Agr Univ, Sch Informat & Artificial Intelligence, Hefei 230036, Anhui, Peoples R China
[2] Shenzhen Univ, Natl Engn Lab Big Data Syst Comp Technol, Shenzhen 518060, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
CD-HIT; LANGUAGE; RESOURCE; DATABASE; PROTEIN; UNIREF; SETS
DOI
10.1021/acs.jcim.4c00507
Chinese Library Classification
R914 [Medicinal Chemistry]
Discipline Code
100701
Abstract
In recent years, the prediction of antimicrobial peptides (AMPs) has gained prominence because of their high antibacterial activity and reduced susceptibility to drug resistance, which make them potential substitutes for antibiotics. To advance AMP recognition, a growing number of natural language processing methods are being applied; these methods differ in their pretraining models, pretraining data sets, word-vector embeddings, feature encoding schemes, and downstream classification models. Here, we provide a comprehensive survey of current BERT-based methods for AMP prediction. An independent benchmark test set is constructed to evaluate the predictive capabilities of the surveyed tools, and we further compare their performance across six public AMP databases. LM_pred (BFD) outperformed all other surveyed tools, owing to its abundant pretraining data set and distinctive vector-embedding approach. To eliminate the effect of the differing training data sets used by each method, we retrained all models on a common data set and performed 5-fold cross-validation. Additionally, to probe the applicability and generalization ability of the models, we constructed a short-peptide data set and an external data set on which the retrained models were tested. Although these BERT-based prediction methods achieve good performance, there remains room for improvement in recognition accuracy. Leveraging continuing advances in protein language models, we propose an AMP prediction method based on the ESM-2 pretrained model, called iAMP-bert. Experimental results demonstrate that iAMP-bert outperforms the other approaches. iAMP-bert is freely accessible at http://iamp.aielab.cc/.
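The retraining protocol summarized in the abstract (a single common data set, 5-fold cross-validation) can be sketched as below. This is a minimal illustration, not the paper's pipeline: amino-acid composition stands in for the BERT/ESM-2 sequence embeddings the surveyed methods actually use, and the peptides, labels, and classifier choice are synthetic placeholders.

```python
# Sketch: 5-fold cross-validation of a peptide classifier on one shared data set.
# Amino-acid composition (AAC) is a stand-in for language-model embeddings.
import random
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac_features(seq):
    """20-dim vector: fraction of each standard residue in the peptide."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def random_peptide(cationic_bias):
    # Synthetic "AMPs" are biased toward K/R, mimicking the cationic
    # character many real AMPs have; negatives are uniform over residues.
    pool = AMINO_ACIDS + ("KR" * 10 if cationic_bias else "")
    return "".join(random.choice(pool) for _ in range(random.randint(10, 40)))

random.seed(0)
seqs = [random_peptide(True) for _ in range(100)] + \
       [random_peptide(False) for _ in range(100)]
labels = [1] * 100 + [0] * 100  # 1 = AMP, 0 = non-AMP

X = [aac_features(s) for s in seqs]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=cv)
print("5-fold accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```

Stratified folds keep the AMP/non-AMP ratio constant across splits, which matters when benchmark sets are class-imbalanced; swapping `aac_features` for per-sequence embeddings from a pretrained protein language model leaves the evaluation loop unchanged.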
Pages: 7772-7785 (14 pages)