Improving the generalizability of protein-ligand binding predictions with AI-Bind

Times cited: 45
Authors
Chatterjee, Ayan [1]
Walters, Robin [2]
Shafi, Zohair [2]
Ahmed, Omair Shafi [2]
Sebek, Michael [1, 3]
Gysi, Deisy [1, 3, 4]
Yu, Rose [5]
Eliassi-Rad, Tina [1, 2, 6, 7]
Barabasi, Albert-Laszlo [1, 3, 8]
Menichetti, Giulia [1, 3, 9]
Affiliations
[1] Northeastern Univ, Network Sci Inst, Boston, MA 02115 USA
[2] Northeastern Univ, Khoury Coll Comp Sci, Boston, MA USA
[3] Northeastern Univ, Dept Phys, Boston, MA 02115 USA
[4] Harvard Med Sch, Brigham & Womens Hosp, Dept Med, Boston, MA USA
[5] Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA USA
[6] Santa Fe Inst, Santa Fe, NM USA
[7] Northeastern Univ, Inst Experiential AI, Boston, MA USA
[8] Cent European Univ, Dept Network & Data Sci, Budapest, Hungary
[9] Harvard Med Sch, Brigham & Womens Hosp, Dept Med, Channing Div Network Med, Boston, MA 02115 USA
Funding
European Union Horizon 2020; US National Institutes of Health (NIH)
Keywords
NF-KAPPA-B; CHEMISTRY; SEQUENCE; DOCKING; NETWORK
DOI
10.1038/s41467-023-37572-z
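Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences (general)]
Subject classification codes
07; 0710; 09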
Abstract
Identifying novel drug-target interactions is a critical and rate-limiting step in drug discovery. While deep learning models have been proposed to accelerate the identification process, here we show that state-of-the-art models fail to generalize to novel (i.e., never-before-seen) structures. We unveil the mechanisms responsible for this shortcoming, demonstrating how models rely on shortcuts that leverage the topology of the protein-ligand bipartite network, rather than learning the node features. Here we introduce AI-Bind, a pipeline that combines network-based sampling strategies with unsupervised pre-training to improve binding predictions for novel proteins and ligands. We validate AI-Bind predictions via docking simulations and comparison with recent experimental evidence, and step up the process of interpreting machine learning prediction of protein-ligand binding by identifying potential active binding sites on the amino acid sequence. AI-Bind is a high-throughput approach to identify drug-target combinations with the potential of becoming a powerful tool in drug discovery.

State-of-the-art machine learning models in drug discovery fail to reliably predict the binding properties of poorly annotated proteins and small molecules. Here, the authors present AI-Bind, a machine learning pipeline to improve generalizability and interpretability of binding predictions.
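The abstract's central claim is that a model can score well by exploiting the topology of the protein-ligand bipartite network instead of learning molecular features. The sketch below is a hypothetical illustration of such a shortcut, not the AI-Bind code: it scores a pair using only node degrees in the training network, approximately the expected edge count k_p * k_l / E under a bipartite configuration model. The function name `degree_baseline_scores` and the toy identifiers are assumptions made for illustration. Because an unseen protein or ligand has degree 0, every pair involving it receives a score of 0, which shows why a topology-only predictor cannot generalize to never-before-seen structures.

```python
from collections import Counter


def degree_baseline_scores(train_edges, candidate_pairs):
    """Hypothetical topology-only scorer (illustrative, not AI-Bind).

    Scores each (protein, ligand) pair by k_p * k_l / E, the approximate
    expected number of edges between the two nodes under a bipartite
    configuration model built from the known binding pairs. No sequence
    or chemical-structure features are used.
    """
    if not train_edges:
        raise ValueError("train_edges must contain at least one binding pair")
    protein_degree = Counter(p for p, _ in train_edges)
    ligand_degree = Counter(l for _, l in train_edges)
    n_edges = len(train_edges)
    # Counter returns 0 for unseen nodes, so novel proteins/ligands score 0.
    return {
        (p, l): protein_degree[p] * ligand_degree[l] / n_edges
        for p, l in candidate_pairs
    }


if __name__ == "__main__":
    # Toy binding network with hypothetical protein (P*) and ligand (L*) IDs.
    train = [("P1", "L1"), ("P1", "L2"), ("P1", "L3"), ("P2", "L1")]
    candidates = [("P2", "L2"), ("P3", "L1")]
    for pair, score in degree_baseline_scores(train, candidates).items():
        print(pair, score)
```

Running it prints a score of 0.25 for (P2, L2) and 0.0 for (P3, L1), since P3 never appears in the training network and therefore carries no degree information.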
Pages: 15