Elucidation of genome-wide understudied proteins targeted by PROTAC-induced degradation using interpretable machine learning

Cited by: 8
Authors
Xie, Li [1]
Xie, Lei [1, 2, 3]
Affiliations
[1] CUNY, Hunter Coll, Dept Comp Sci, New York, NY 10017 USA
[2] CUNY, Grad Ctr, Ph D Program Comp Sci, New York, NY 10017 USA
[3] Cornell Univ, Helen & Robert Appel Alzheimers Dis Res Inst, Feil Family Brain & Mind Res Inst, Weill Cornell Med, New York, NY 10044 USA
Keywords
AMINO-ACID-COMPOSITION; SELECTIVE DEGRADATION; S-NITROSYLATION; PREDICTION; CONJUGATION; DISCOVERY; INSIGHTS; FEATURES; DATABASE; DOCKING
DOI
10.1371/journal.pcbi.1010974
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Proteolysis-targeting chimeras (PROTACs) are hetero-bifunctional molecules that induce the degradation of target proteins by recruiting an E3 ligase. PROTACs have the potential to inactivate disease-related genes that are considered undruggable by small molecules, making them a promising therapy for the treatment of incurable diseases. However, only a few hundred proteins have been experimentally tested for their amenability to PROTACs, and it remains unclear which other proteins in the entire human genome can be targeted by PROTACs. In this study, we have developed PrePROTAC, an interpretable machine learning model based on a transformer-based protein sequence descriptor and random forest classification. PrePROTAC predicts genome-wide targets that can be degraded by CRBN, one of the E3 ligases. In the benchmark studies, PrePROTAC achieved a ROC-AUC of 0.81, an average precision of 0.84, and over 40% sensitivity at a false positive rate of 0.05. When evaluated on an external test set comprising proteins from structural folds different from those in the training set, the performance of PrePROTAC did not drop significantly, indicating its generalizability. Furthermore, we developed an embedding SHapley Additive exPlanations (eSHAP) method, which extends conventional SHAP analysis for original features to an embedding space through in silico mutagenesis. This method allowed us to identify key residues in the protein structure that play critical roles in PROTAC activity. The identified key residues were consistent with existing knowledge. Using PrePROTAC, we identified over 600 novel understudied proteins that are potentially degradable by CRBN and proposed PROTAC compounds for three novel drug targets associated with Alzheimer's disease.
Author summary
Many human diseases remain incurable because disease-causing genes cannot be selectively and effectively targeted by small molecules. The proteolysis-targeting chimera (PROTAC), an organic compound that binds to both a target and a degradation-mediating E3 ligase, has emerged as a promising approach to selectively target disease-driving genes that are not druggable by small molecules. However, not all proteins can be accommodated by E3 ligases and effectively degraded, so knowledge of a protein's degradability is crucial for PROTAC design. Only hundreds of proteins have been experimentally tested for their amenability to PROTACs, which leaves it uncertain which other proteins in the entire human genome can be targeted by PROTACs. In this paper, we propose an interpretable machine learning model, PrePROTAC, which takes advantage of powerful protein language modeling. PrePROTAC achieves high accuracy when evaluated on an external dataset drawn from gene families different from those of the proteins in the training data, suggesting the generalizability of the model. We apply PrePROTAC to the human genome and identify more than 600 understudied proteins that are potentially responsive to PROTACs. Furthermore, we design PROTAC compounds for three novel drug targets associated with Alzheimer's disease.
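The benchmark figures quoted in the abstract (ROC-AUC and sensitivity at a fixed false positive rate) can be illustrated with a minimal sketch. The labels and scores below are made-up toy values, not data from the paper, and the function names roc_auc and sensitivity_at_fpr are hypothetical; this is not the authors' evaluation code.

```python
# Toy illustration of two metrics named in the abstract.
# All values here are invented for demonstration only.

def split_by_label(labels, scores):
    """Separate classifier scores into positives (label 1) and negatives (label 0)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    return pos, neg

def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs in which
    the positive scores higher; ties count as half a win."""
    pos, neg = split_by_label(labels, scores)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_at_fpr(labels, scores, max_fpr=0.05):
    """Highest true positive rate over all score thresholds whose
    false positive rate does not exceed max_fpr."""
    pos, neg = split_by_label(labels, scores)
    best = 0.0
    for t in sorted(set(scores)):
        # A sample is predicted positive when its score is >= t.
        fpr = sum(n >= t for n in neg) / len(neg)
        if fpr <= max_fpr:
            tpr = sum(p >= t for p in pos) / len(pos)
            best = max(best, tpr)
    return best

# Hypothetical labels (1 = degradable, 0 = not) and predicted scores.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.35, 0.2, 0.1]

print(roc_auc(labels, scores))
print(sensitivity_at_fpr(labels, scores, max_fpr=0.05))
```

Reporting sensitivity at a low false positive rate, as the abstract does, shows how many true targets a model recovers when very few false alarms are tolerated, which matters when predictions are used to prioritize costly experiments.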
Pages: 26