Protein Language Models and Machine Learning Facilitate the Identification of Antimicrobial Peptides

Times cited: 6
Authors
Medina-Ortiz, David [1, 2]
Contreras, Seba [3]
Fernandez, Diego [1]
Soto-Garcia, Nicole [1]
Moya, Ivan [1, 4]
Cabas-Mora, Gabriel [1]
Olivera-Nappa, Alvaro [2, 5]
Affiliations
[1] Univ Magallanes, Dept Ingn Comp, Punta Arenas 6210005, Chile
[2] Univ Chile, Ctr Biotechnol & Bioengn, CeBiB, Santiago 8370456, Chile
[3] Max Planck Inst Dynam & Self Org, Fassberg 17, D-37077 Gottingen, Germany
[4] Univ Magallanes, Dept Ingn Quim, Punta Arenas 6210005, Chile
[5] Univ Chile, Dept Ingn Quim Biotecnol & Mat, Santiago 8370456, Chile
Keywords
antimicrobial peptides; machine learning; protein language models; generative learning; peptide discovery; peptide design; PREDICTION; CLASSIFICATION; DESIGN
DOI
10.3390/ijms25168851
CLC (Chinese Library Classification) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline code
071010; 081704
Abstract
Peptides are bioactive molecules whose functional versatility in living organisms has led to successful applications in diverse fields. In recent years, the volume of data on peptide sequences and functions deposited in open repositories has increased substantially, making it possible to apply more complex computational models to study the relationship between peptide composition and function. This work introduces AMP-Detector, a sequence-based classification model for detecting the biological activity of peptides, with a focus on accelerating the discovery and de novo design of potential antimicrobial peptides (AMPs). AMP-Detector introduces a novel sequence-based pipeline for training binary classification models that integrates protein language models with machine learning algorithms. This pipeline produced 21 models targeting antimicrobial, antiviral, and antibacterial activity, with an average precision exceeding 83%. Benchmark analyses showed that our models outperformed existing methods for AMPs and delivered comparable results for other types of biological activity. Applying AMP-Detector to the Peptide Atlas, we identified over 190,000 potential AMPs, and we demonstrated that it can be integrated with generative learning to aid de novo design, yielding over 500 novel AMPs. The combination of our methodology, robust models, and a generative design strategy offers a significant advance in peptide-based drug discovery and represents a pivotal tool for therapeutic applications.
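The abstract describes the pipeline only at a high level: sequences are turned into numeric representations by a protein language model, and a machine learning algorithm is trained on them as a binary (active vs. inactive) classifier, evaluated by precision. The following is a minimal, illustrative sketch of that general shape under stated assumptions, not the authors' implementation. The hypothetical `embed` function uses amino-acid composition as a stand-in for a real protein language model embedding (which would come from a pretrained model not reproduced here), and a logistic regression trained by gradient descent stands in for whichever classifiers the paper actually used. All function names are hypothetical, and the training strings are made-up toy sequences, not data from the paper.

```python
import math
from typing import List, Sequence, Tuple

# The 20 standard amino acids, used as the feature axes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(sequence: str) -> List[float]:
    """Hypothetical stand-in for a protein language model embedding.

    Returns the amino-acid composition: the fraction of the sequence made up
    of each standard residue. A real pipeline would call a pretrained PLM here.
    """
    seq = sequence.upper()
    length = len(seq) or 1  # avoid dividing by zero on an empty string
    return [seq.count(aa) / length for aa in AMINO_ACIDS]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 500
) -> Tuple[List[float], float]:
    """Fit a logistic regression classifier with full-batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            # Cross-entropy gradient: (predicted probability - label) * feature
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float], threshold: float = 0.5) -> int:
    """Return 1 (predicted active) if the modeled probability reaches the threshold."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Precision = TP / (TP + FP); returns 0.0 when nothing is predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


# Made-up toy strings (not real peptides or data from the paper): label 1 strings
# are K/R-rich (cationic), label 0 strings are D/E-rich (anionic).
train_seqs = ["KKLLKKLLKK", "RRWWRRWWRR", "KRKRLLKRKR", "DDEEDDEEAA", "EEDDSSEEDD", "DEDEGGDEDE"]
train_labels = [1, 1, 1, 0, 0, 0]
w, b = train_logistic([embed(s) for s in train_seqs], train_labels)

test_seqs = ["KKRRLLKKRR", "DDEEGGDDEE"]
test_labels = [1, 0]
test_preds = [predict(w, b, embed(s)) for s in test_seqs]
print(test_preds, precision(test_labels, test_preds))
```

Swapping `embed` for real language-model embeddings and `train_logistic` for another learner would leave the rest of the sketch unchanged, which is the main reason to keep the representation step and the classifier step separate. The abstract reports precision averaged over its 21 models; this sketch computes precision for a single toy model only.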
Pages: 19