Protein Binding Site Representation in Latent Space

Cited by: 1
Authors
Lohmann, Frederieke [1]
Allenspach, Stephan [1]
Atz, Kenneth [1]
Schiebroek, Carl C. G. [1]
Hiss, Jan A. [1,2]
Schneider, Gisbert [1,2]
Affiliations
[1] Swiss Fed Inst Technol, Dept Chem & Appl Biosci, Vladimir Prelog Weg 4, CH-8093 Zurich, Switzerland
[2] Swiss Fed Inst Technol, Dept Biosyst Sci & Engn, Klingelbergstr 48, CH-4056 Basel, Switzerland
Funding
Swiss National Science Foundation
Keywords
drug discovery; interpretability; machine learning; neural network; protein structure; collection; affinities
DOI
10.1002/minf.202400205
Chinese Library Classification (CLC) number
R914 [Medicinal chemistry]
Discipline classification code
100701
Abstract
Interpretability and reliability of deep learning models are important for computer-based drug discovery. Aiming to understand feature perception by such a model, we investigate a graph neural network for affinity prediction of protein-ligand complexes. We assess a latent representation of ligand binding sites and investigate the underlying geometric structure of this latent space and its relation to protein function. We introduce an automated computational pipeline for dimensionality reduction, clustering, hypothesis testing, and visualization of latent space. The results indicate that the learned protein latent space is inherently structured and not randomly distributed. Several of the identified protein binding site clusters in latent space correspond to functional protein families. Ligand size was found to be a determinant of cluster geometry. The computational pipeline proved applicable to latent space analysis and interpretation and can be adapted to other datasets and deep learning models.
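The abstract names the pipeline stages but not the specific algorithms. As a purely illustrative sketch, not the authors' implementation, the Python below chains three of the stages on hypothetical embeddings: PCA by power iteration for dimensionality reduction, k-means with k-means++ seeding for clustering, and a Monte Carlo test of whether the clustering is tighter than chance. The choice of PCA, k-means, the Gaussian null with matched per-axis spread, and all function names (`pca`, `kmeans`, `gaussian_null_test`) are assumptions made for illustration; the synthetic 8-dimensional vectors stand in for real binding-site latent representations, and visualization is omitted.

```python
import math
import random
import statistics


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def pca(X, k, iters=200, seed=0):
    """Project the rows of X onto k principal axes found by power iteration."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d).
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            # Gram-Schmidt against earlier axes so each new axis is orthogonal.
            for c in comps:
                dot = sum(x * y for x, y in zip(w, c))
                w = [x - dot * y for x, y in zip(w, c)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        comps.append(v)
    return [[sum(r[j] * c[j] for j in range(d)) for c in comps] for r in Xc]


def kmeans(points, k, n_init=3, iters=100, seed=0):
    """Lloyd's algorithm with k-means++ seeding; returns (labels, inertia) of the best run."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        # k-means++: pick each new center with probability proportional to D^2.
        centers = [list(rng.choice(points))]
        while len(centers) < k:
            d2 = [min(sq_dist(p, c) for c in centers) for p in points]
            if sum(d2) == 0:
                centers.append(list(rng.choice(points)))
            else:
                centers.append(list(rng.choices(points, weights=d2)[0]))
        for _ in range(iters):
            labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
            new = []
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                # Keep the old center if a cluster empties out.
                new.append([sum(col) / len(members) for col in zip(*members)] if members else centers[c])
            if new == centers:
                break
            centers = new
        inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if best is None or inertia < best[1]:
            best = (labels, inertia)
    return best


def gaussian_null_test(Z, k, n_null=30, seed=0):
    """One-sided Monte Carlo test: is k-means inertia on Z lower than on
    Gaussian data with the same per-axis spread (assumed null model)?"""
    rng = random.Random(seed)
    labels, observed = kmeans(Z, k, seed=seed)
    sds = [statistics.pstdev([row[j] for row in Z]) for j in range(len(Z[0]))]
    hits = 0
    for _ in range(n_null):
        G = [[rng.gauss(0.0, s) for s in sds] for _ in Z]
        if kmeans(G, k, seed=seed)[1] <= observed:
            hits += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return labels, observed, (hits + 1) / (n_null + 1)


# Hypothetical stand-in for learned binding-site embeddings: three planted
# groups ("families") of 30 vectors each in an 8-dimensional latent space.
rng = random.Random(42)
centers = [[0.0] * 8, [6.0] * 8, [0.0, 6.0] * 4]
X, family = [], []
for f, c in enumerate(centers):
    for _ in range(30):
        X.append([x + rng.gauss(0.0, 0.5) for x in c])
        family.append(f)

Z = pca(X, k=2)
labels, inertia, p_value = gaussian_null_test(Z, k=3)
print(f"inertia={inertia:.1f}, p={p_value:.3f}")
```

The null draws Gaussian samples with the same variance along each PCA axis rather than shuffling each axis independently, because in PCA coordinates a shuffle could preserve a bimodal first component and so retain the very cluster structure under test. With the well-separated planted groups above, the p-value should sit at its floor of 1/(n_null + 1); for real embeddings, the number of clusters and the null model would need to be chosen and justified separately.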
Pages: 7