Comprehensive Assessment of BERT-Based Methods for Predicting Antimicrobial Peptides

Cited by: 1
Authors
Gao, Wanling [1 ]
Zhao, Jun [1 ]
Gui, Jianfeng [1 ]
Wang, Zehan [1 ]
Chen, Jie [2 ]
Yue, Zhenyu [1 ]
Affiliations
[1] Anhui Agr Univ, Sch Informat & Artificial Intelligence, Hefei 230036, Anhui, Peoples R China
[2] Shenzhen Univ, Natl Engn Lab Big Data Syst Comp Technol, Shenzhen 518060, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
CD-HIT; LANGUAGE; RESOURCE; DATABASE; PROTEIN; UNIREF; SETS
DOI
10.1021/acs.jcim.4c00507
Chinese Library Classification
R914 [Medicinal Chemistry]
Discipline Code
100701
Abstract
In recent years, the prediction of antimicrobial peptides (AMPs) has gained prominence because of their high antibacterial activity and reduced susceptibility to drug resistance, which make them potential substitutes for antibiotics. To advance AMP recognition, a growing number of natural language processing methods are being applied; these methods differ in their pretraining models, pretraining data sets, word-vector embeddings, feature encoding schemes, and downstream classification models. Here, we provide a comprehensive survey of current BERT-based methods for AMP prediction. An independent benchmark test set is constructed to evaluate the predictive capabilities of the surveyed tools, and we further compare their performance across six public AMP databases. LM_pred (BFD) outperformed all other surveyed tools, owing to its abundant pretraining data set and distinctive vector-embedding approach. To eliminate the effect of the differing training data sets used by each method, we retrained all models on a common data set and performed 5-fold cross-validation. Additionally, to probe the applicability and generalization ability of the models, we constructed a short-peptide data set and an external data set on which the retrained models were tested. Although these BERT-based prediction methods achieve good performance, there remains room for improvement in recognition accuracy. Leveraging continuing advances in protein language models, we propose an AMP prediction method based on the ESM-2 pretrained model, called iAMP-bert. Experimental results demonstrate that iAMP-bert outperforms the other approaches. iAMP-bert is freely accessible at http://iamp.aielab.cc/.
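The retraining protocol summarized in the abstract (a single common data set, 5-fold cross-validation) can be sketched as below. This is a minimal illustration, not the paper's pipeline: amino-acid composition stands in for the BERT/ESM-2 sequence embeddings the surveyed methods actually use, and the peptides, labels, and classifier choice are synthetic placeholders.

```python
# Sketch: 5-fold cross-validation of a peptide classifier on one shared data set.
# Amino-acid composition (AAC) is a stand-in for language-model embeddings.
import random
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac_features(seq):
    """20-dim vector: fraction of each standard residue in the peptide."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def random_peptide(cationic_bias):
    # Synthetic "AMPs" are biased toward K/R, mimicking the cationic
    # character many real AMPs have; negatives are uniform over residues.
    pool = AMINO_ACIDS + ("KR" * 10 if cationic_bias else "")
    return "".join(random.choice(pool) for _ in range(random.randint(10, 40)))

random.seed(0)
seqs = [random_peptide(True) for _ in range(100)] + \
       [random_peptide(False) for _ in range(100)]
labels = [1] * 100 + [0] * 100  # 1 = AMP, 0 = non-AMP

X = [aac_features(s) for s in seqs]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=cv)
print("5-fold accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```

Stratified folds keep the AMP/non-AMP ratio constant across splits, which matters when benchmark sets are class-imbalanced; swapping `aac_features` for per-sequence embeddings from a pretrained protein language model leaves the evaluation loop unchanged.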
Pages: 7772-7785 (14 pages)