Comparative evaluation of methods for the prediction of protein-ligand binding sites

Cited by: 1
Authors
Utges, Javier S. [1]
Barton, Geoffrey J. [1]
Affiliations
[1] Univ Dundee, Sch Life Sci, Div Computat Biol, Dow St, Dundee DD1 5EH, Scotland
Source
JOURNAL OF CHEMINFORMATICS | 2024 / Volume 16 / Issue 01
Funding
Wellcome Trust (UK); Biotechnology and Biological Sciences Research Council (UK);
Keywords
Ligand binding site prediction; Binding pocket; Benchmark; Reference dataset; Machine learning; Drug discovery; STRUCTURAL CLASSIFICATION; CRYSTAL-STRUCTURE; SC-PDB; CATALYTIC MECHANISM; SECONDARY STRUCTURE; SCORING FUNCTIONS; PDBBIND DATABASE; ACTIVE-SITES; MOAD MOTHER; IDENTIFICATION;
DOI
10.1186/s13321-024-00923-z
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
The accurate identification of protein-ligand binding sites is of critical importance in understanding and modulating protein function. Accordingly, ligand binding site prediction has remained a research focus for over three decades, with over 50 methods developed and a paradigm shift from geometry-based approaches to machine learning. In this work, we collate 13 ligand binding site predictors, spanning 30 years, focusing on the latest machine learning-based methods such as VN-EGNN, IF-SitePred, GrASP, PUResNet, and DeepPocket, and compare them to the established P2Rank, PRANK and fpocket and to earlier methods like PocketFinder, Ligsite and Surfnet. We benchmark the methods against the human subset of our new curated reference dataset, LIGYSIS. LIGYSIS is a comprehensive protein-ligand complex dataset comprising 30,000 proteins with bound ligands, which aggregates biologically relevant unique protein-ligand interfaces across biological units of multiple structures from the same protein. LIGYSIS is an improvement for testing methods over earlier datasets like sc-PDB, PDBbind, binding MOAD, COACH420 and HOLO4K, which either include 1:1 protein-ligand complexes or consider asymmetric units. Re-scoring of fpocket predictions by PRANK and DeepPocket displays the highest recall (60%), whilst IF-SitePred presents the lowest recall (39%). We demonstrate the detrimental effect that redundant prediction of binding sites has on performance, as well as the beneficial impact of stronger pocket scoring schemes, with improvements of up to 14% in recall (IF-SitePred) and 30% in precision (Surfnet). Finally, we propose top-N+2 recall as the universal benchmark metric for ligand binding site prediction and urge authors to share not only the source code of their methods, but also of their benchmark.

Scientific contributions
This study conducts the largest benchmark of ligand binding site prediction methods to date, comparing 13 original methods and 15 variants using 10 informative metrics. The LIGYSIS dataset is introduced, which aggregates biologically relevant protein-ligand interfaces across multiple structures of the same protein. The study highlights the detrimental effect of redundant binding site prediction and demonstrates significant improvement in recall and precision through stronger scoring schemes. Finally, top-N+2 recall is proposed as a universal benchmark metric for ligand binding site prediction, with a recommendation for open-source sharing of both methods and benchmarks.
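
The top-N+2 recall metric named in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in this record: that N is the number of observed sites for a protein, that only the N+2 highest-ranked predictions are considered, and that a prediction matches an observed site when the distance between their centres falls within a threshold. The function name, the data layout and the 12.0 default threshold are hypothetical placeholders, not the authors' implementation.

```python
from math import dist


def top_n_plus_2_recall(proteins, threshold=12.0, extra=2):
    """Fraction of observed sites recovered by the top N+extra predictions.

    proteins: iterable of (observed, predicted) pairs, where
      observed  is a list of (x, y, z) centres of known binding sites and
      predicted is a list of (x, y, z) centres sorted best-ranked first.
    threshold: hypothetical centre-to-centre cut-off (placeholder value).
    extra: number of predictions allowed beyond N (2 for top-N+2).
    """
    recovered = 0
    total = 0
    for observed, predicted in proteins:
        # N is taken per protein as the number of observed sites (assumption).
        top = predicted[: len(observed) + extra]
        for site in observed:
            total += 1
            if any(dist(site, pocket) <= threshold for pocket in top):
                recovered += 1
    return recovered / total if total else 0.0


# One protein with two observed sites; the only prediction near the
# second site is ranked fifth, outside the top N+2 = 4.
example = [
    (
        [(0.0, 0.0, 0.0), (50.0, 0.0, 0.0)],
        [
            (1.0, 0.0, 0.0),
            (100.0, 0.0, 0.0),
            (200.0, 0.0, 0.0),
            (300.0, 0.0, 0.0),
            (50.0, 0.0, 0.0),
        ],
    )
]
print(top_n_plus_2_recall(example))
```

In this toy case only the first observed site is recovered, because the fifth-ranked prediction is cut off; this is the kind of ranking sensitivity that a scoring-scheme comparison would expose.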
Pages: 35