Structure-aware machine learning strategies for antimicrobial peptide discovery

Times cited: 7
Authors
Aguilera-Puga, Mariana D. C. [1]
Plisson, Fabien [1]
Affiliation
[1] Natl Polytech Inst CINVESTAV IPN, Ctr Res & Adv Studies, Dept Biotechnol & Biochem, Irapuato Unit, Irapuato 36824, Guanajuato, Mexico
Keywords
Explainable machine learning; Peptide design; Oversampling; Structural bias; Protein structure prediction; AlphaFold2; Host-defense peptides; Sequence space; Package
DOI
10.1038/s41598-024-62419-y
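Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09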
Abstract
Machine learning models are revolutionizing our approaches to discovering and designing bioactive peptides. These models often lack protein structure awareness, as they rely heavily on sequential data. The models excel at identifying sequences of a particular biological nature or activity, but they frequently fail to capture the intricate mechanism(s) of action behind them. To solve both problems at once, we studied the mechanisms of action and structural landscape of antimicrobial peptides as (i) membrane-disrupting peptides, (ii) membrane-penetrating peptides, and (iii) protein-binding peptides. By analyzing critical features such as dipeptides and physicochemical descriptors, we developed models with high accuracy (86-88%) in predicting these categories. However, our initial models (1.0 and 2.0) exhibited a bias towards alpha-helical and coiled structures, which influenced their predictions. To address this structural bias, we implemented subset selection and data reduction strategies. The former yielded three structure-specific models for peptides likely to fold into alpha-helices (models 1.1 and 2.1), coils (1.3 and 2.3), or mixed structures (1.4 and 2.4). The latter depleted over-represented structures, leading to the structure-agnostic predictors 1.5 and 2.5. Additionally, our research highlights how sensitive the important features are to different structure classes across models.
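The abstract names two concrete steps, dipeptide features and a data reduction that depletes over-represented structure classes, without giving their exact recipes. The Python sketch below illustrates both under stated assumptions: dipeptide composition is computed as the fraction of each of the 400 overlapping canonical residue pairs, and data reduction is approximated by random undersampling of each predicted structure class down to the size of the rarest class. The record format, the function names, and the undersample-to-minimum rule are hypothetical choices for illustration, not the authors' pipeline.

```python
import itertools
import random
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues
# All 400 ordered residue pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return a 400-element vector with the fraction of each overlapping dipeptide.

    Pairs containing non-canonical residues are skipped (an assumption).
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:  # sequences shorter than 2 residues or fully non-canonical
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def undersample_by_structure(records, seed=0):
    """Randomly drop peptides from over-represented structure classes.

    Hypothetical data reduction rule: every structure class keeps as many
    peptides as the rarest class. `records` is a list of
    (sequence, activity_label, structure_class) tuples, e.g. with structure
    classes such as "helix", "coil" or "mixed".
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[2]].append(rec)
    if not groups:
        return []
    n_min = min(len(g) for g in groups.values())
    rng = random.Random(seed)  # seeded so the reduction is reproducible
    balanced = []
    for cls in sorted(groups):
        balanced.extend(rng.sample(groups[cls], n_min))
    return balanced
```

For example, `dipeptide_composition("KKLL")` assigns 1/3 to each of KK, KL and LL and 0 to the other 397 pairs. Undersampling to the smallest class is the bluntest balancing rule; a softer cap per class would keep more training data at the cost of some residual structural bias.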
Pages: 16