Structure-aware machine learning strategies for antimicrobial peptide discovery

Cited by: 7
Authors
Aguilera-Puga, Mariana D. C. [1]
Plisson, Fabien [1]
Affiliations
[1] Natl Polytech Inst CINVESTAV IPN, Ctr Res & Adv Studies, Dept Biotechnol & Biochem, Irapuato Unit, Irapuato 36824, Guanajuato, Mexico
Keywords
Explainable machine learning; Peptide design; Oversampling; Structural bias; Protein structure prediction; AlphaFold2; HOST-DEFENSE PEPTIDES; SEQUENCE SPACE; PACKAGE
DOI
10.1038/s41598-024-62419-y
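Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract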
Machine learning models are revolutionizing how we discover and design bioactive peptides. These models often lack protein structure awareness because they rely heavily on sequence data. They excel at identifying sequences of a particular biological nature or activity, but they frequently fail to capture the underlying mechanism(s) of action. To address both problems at once, we studied the mechanisms of action and structural landscape of antimicrobial peptides classified as (i) membrane-disrupting peptides, (ii) membrane-penetrating peptides, and (iii) protein-binding peptides. By analyzing critical features such as dipeptides and physicochemical descriptors, we developed models that predict these categories with high accuracy (86-88%). However, our initial models (1.0 and 2.0) were biased towards alpha-helical and coiled structures, which influenced their predictions. To address this structural bias, we implemented subset selection and data reduction strategies. The former gave three structure-specific models for peptides likely to fold into alpha-helices (models 1.1 and 2.1), coils (1.3 and 2.3), or mixed structures (1.4 and 2.4). The latter depleted over-represented structures, leading to the structure-agnostic predictors 1.5 and 2.5. Additionally, our research highlights the sensitivity of important features to different structure classes across models.
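Two ideas in the abstract, dipeptide features and the depletion of over-represented structure classes, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `dipeptide_composition` and `deplete_overrepresented`, the `(sequence, structure_class)` record layout, and the choice to downsample every class to the size of the smallest one are assumptions made for illustration only.

```python
import random
from collections import Counter, defaultdict
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(sequence):
    """Return the relative frequency of each of the 400 dipeptides.

    Hypothetical helper: overlapping residue pairs are counted and
    divided by the number of pairs in the sequence.
    """
    seq = sequence.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return {dp: (counts[dp] / total if total else 0.0) for dp in DIPEPTIDES}


def deplete_overrepresented(records, seed=0):
    """Randomly downsample each structure class to the smallest class size.

    `records` is an iterable of (sequence, structure_class) tuples.
    Assumption: plain random undersampling stands in for the paper's
    data reduction strategy, whose details are not given in the abstract.
    """
    by_class = defaultdict(list)
    for sequence, structure_class in records:
        by_class[structure_class].append((sequence, structure_class))
    if not by_class:
        return []
    smallest = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    return balanced
```

For instance, `dipeptide_composition("AAKA")` assigns a frequency of one third to each of `AA`, `AK`, and `KA`, and zero to the other 397 dipeptides. Feeding `deplete_overrepresented` a toy set with five helical and two coiled peptides would return two of each.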
Pages: 16
Related papers
77 in total
[11]   AmPEP: Sequence-based prediction of antimicrobial peptides using distribution patterns of amino acid properties and random forest [J].
Bhadra, Pratiti ;
Yan, Jielu ;
Li, Jinyan ;
Fong, Simon ;
Siu, Shirley W. I. .
SCIENTIFIC REPORTS, 2018, 8
[12]   Towards an experimental classification system for membrane active peptides [J].
Brand, G. D. ;
Ramada, M. H. S. ;
Genaro-Mattos, T. C. ;
Bloch, C., Jr. .
SCIENTIFIC REPORTS, 2018, 8
[13]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[14]   Breiman, Leo. Classification and Regression Trees, 2017, DOI 10.1201/9781315139470
[15]   Antimicrobial peptides: Pore formers or metabolic inhibitors in bacteria? [J].
Brogden, KA .
NATURE REVIEWS MICROBIOLOGY, 2005, 3 (03) :238-250
[16]   Computer-Aided Design of Antimicrobial Peptides: Are We Generating Effective Drug Candidates? [J].
Cardoso, Marlon H. ;
Orozco, Raquel Q. ;
Rezende, Samilla B. ;
Rodrigues, Gisele ;
Oshiro, Karen G. N. ;
Candido, Elizabete S. ;
Franco, Octavio L. .
FRONTIERS IN MICROBIOLOGY, 2020, 10
[17]   SMOTE: Synthetic minority over-sampling technique [J].
Chawla, Nitesh V. ;
Bowyer, Kevin W. ;
Hall, Lawrence O. ;
Kegelmeyer, W. Philip .
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2002, 16 :321-357
[18]   iFeature: a Python package and web server for features extraction and selection from protein and peptide sequences [J].
Chen, Zhen ;
Zhao, Pei ;
Li, Fuyi ;
Leier, Andre ;
Marquez-Lago, Tatiana T. ;
Wang, Yanan ;
Webb, Geoffrey I. ;
Smith, A. Ian ;
Daly, Roger J. ;
Chou, Kuo-Chen ;
Song, Jiangning .
BIOINFORMATICS, 2018, 34 (14) :2499-2502
[19]   Efficient Exploration of Sequence Space by Sequence-Guided Protein Engineering and Design [J].
Clifton, Ben E. ;
Kozome, Dan ;
Laurino, Paola .
BIOCHEMISTRY, 2022, 62 (02) :210-220
[20]   Cortes, C. MACHINE LEARNING, 1995, 20 :273, DOI 10.1023/A:1022627411411