Protein Language Models and Machine Learning Facilitate the Identification of Antimicrobial Peptides

Times cited: 6
Authors
Medina-Ortiz, David [1,2]
Contreras, Seba [3]
Fernandez, Diego [1]
Soto-Garcia, Nicole [1]
Moya, Ivan [1,4]
Cabas-Mora, Gabriel [1]
Olivera-Nappa, Alvaro [2,5]
Affiliations
[1] Univ Magallanes, Dept Ingn Comp, Punta Arenas 6210005, Chile
[2] Univ Chile, Ctr Biotechnol & Bioengn, CeBiB, Santiago 8370456, Chile
[3] Max Planck Inst Dynam & Self Org, Fassberg 17, D-37077 Gottingen, Germany
[4] Univ Magallanes, Dept Ingn Quim, Punta Arenas 6210005, Chile
[5] Univ Chile, Dept Ingn Quim Biotecnol & Mat, Santiago 8370456, Chile
Keywords
antimicrobial peptides; machine learning; protein language models; generative learning; peptide discovery; peptide design; prediction; classification; design
DOI
10.3390/ijms25168851
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Peptides are bioactive molecules whose functional versatility in living organisms has led to successful applications in diverse fields. In recent years, the amount of data on peptide sequences and functions deposited in open repositories has increased substantially, enabling more complex computational models to study the relationship between peptide composition and function. This work introduces AMP-Detector, a sequence-based classification model for detecting the functional biological activity of peptides, aimed at accelerating the discovery and de novo design of potential antimicrobial peptides (AMPs). AMP-Detector introduces a novel sequence-based pipeline for training binary classification models that integrates protein language models and machine learning algorithms. This pipeline produced 21 models targeting antimicrobial, antiviral, and antibacterial activity, achieving average precision exceeding 83%. Benchmark analyses revealed that our models outperformed existing methods for AMPs and delivered comparable results for other biological activity types. Applying AMP-Detector to the Peptide Atlas, we identified over 190,000 potential AMPs. We also demonstrated that AMP-Detector can be integrated with generative learning to aid de novo design, yielding over 500 novel AMPs. The combination of our methodology, robust models, and a generative design strategy offers a significant advancement in peptide-based drug discovery and represents a pivotal tool for therapeutic applications.
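The abstract outlines a three-part workflow: embed each peptide sequence, train a binary classifier on the embeddings, and evaluate it by average precision. The following minimal sketch illustrates that workflow's shape only and is not AMP-Detector's implementation. To stay self-contained and dependency-free, it replaces the protein language model with a hypothetical `embed` function based on amino acid composition, and it uses a nearest-centroid classifier instead of the machine learning algorithms used in the paper. Only `average_precision` follows the standard definition: the mean of precision@k over the ranks k at which positives appear. The toy sequences in the usage note are invented for illustration and are not data from the study.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(sequence: str) -> list[float]:
    """Hypothetical stand-in for a protein language model embedding:
    the 20-dimensional amino acid composition of the sequence."""
    counts = Counter(sequence.upper())
    length = max(len(sequence), 1)
    return [counts[aa] / length for aa in AMINO_ACIDS]


def fit_centroids(X: list[list[float]], y: list[int]) -> dict[int, list[float]]:
    """Nearest-centroid binary classifier: one mean vector per class.
    Assumes both classes (0 = non-AMP, 1 = AMP) appear in y."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def score(centroids: dict[int, list[float]], x: list[float]) -> float:
    """Positive when x is closer to the AMP centroid than to the non-AMP one."""
    def sq_dist(a: list[float], b: list[float]) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return sq_dist(x, centroids[0]) - sq_dist(x, centroids[1])


def average_precision(y_true: list[int], scores: list[float]) -> float:
    """Mean of precision@k over the ranks k at which a positive appears."""
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0
```

For example, train on the invented cationic sequence `"KKRRKKLLKK"` (label 1) and acidic sequence `"DDEEDDLLEE"` (label 0) with `fit_centroids([embed(s) for s in seqs], labels)`. Then rank new sequences by `score` and pass the scores to `average_precision`. A real pipeline would replace `embed` with embeddings from a pretrained protein language model and use a stronger classifier. It would also evaluate on held-out data.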
Pages: 19