MISATO: machine learning dataset of protein-ligand complexes for structure-based drug discovery

Cited by: 22
Authors
Siebenmorgen, Till [1, 2]
Menezes, Filipe [1, 2]
Benassou, Sabrina [3]
Merdivan, Erinc [4]
Didi, Kieran [5]
Mourao, Andre Santos Dias [1, 2]
Kitel, Radoslaw [6]
Lio, Pietro [5]
Kesselheim, Stefan [3]
Piraud, Marie [4]
Theis, Fabian J. [4, 7, 8]
Sattler, Michael [1, 2]
Popowicz, Grzegorz M. [1, 2]
Affiliations
[1] Helmholtz Munich, Inst Struct Biol, Mol Targets & Therapeut Ctr, Neuherberg, Germany
[2] Tech Univ Munich, Bayer NMR Zent, TUM Sch Nat Sci, Dept Biosci, Garching, Germany
[3] Forschungszentrum Julich, Julich Supercomp Ctr, Julich, Germany
[4] Helmholtz Munich, Helmholtz AI, Neuherberg, Germany
[5] Univ Cambridge, Comp Lab, Cambridge, England
[6] Jagiellonian Univ, Fac Chem, Krakow, Poland
[7] Helmholtz Munich, Inst Computat Biol, Computat Hlth Ctr, Neuherberg, Germany
[8] Tech Univ Munich, TUM Sch Computat Informat & Technol, Garching, Germany
Source
NATURE COMPUTATIONAL SCIENCE | 2024 / Volume 4 / Issue 05
Keywords
SCORING FUNCTION; FORCE-FIELD; BINDING; AFFINITY; EFFICIENT; MODELS; PARAMETERIZATION; GENERATION; PREDICTION; ACCURACY
DOI
10.1038/s43588-024-00627-2
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule-ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein-ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein-ligand complexes in explicit water is included, accumulating over 170 μs. We give examples of machine learning (ML) baseline models proving an improvement of accuracy by employing our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.
Pages: 367-378
Page count: 14