MISATO: machine learning dataset of protein-ligand complexes for structure-based drug discovery

Times cited: 22
Authors
Siebenmorgen, Till [1,2]
Menezes, Filipe [1,2]
Benassou, Sabrina [3]
Merdivan, Erinc [4]
Didi, Kieran [5]
Mourao, Andre Santos Dias [1,2]
Kitel, Radoslaw [6]
Lio, Pietro [5]
Kesselheim, Stefan [3]
Piraud, Marie [4]
Theis, Fabian J. [4,7,8]
Sattler, Michael [1,2]
Popowicz, Grzegorz M. [1,2]
Affiliations
[1] Helmholtz Munich, Inst Struct Biol, Mol Targets & Therapeut Ctr, Neuherberg, Germany
[2] Tech Univ Munich, Bayer NMR Zent, TUM Sch Nat Sci, Dept Biosci, Garching, Germany
[3] Forschungszentrum Julich, Julich Supercomp Ctr, Julich, Germany
[4] Helmholtz Munich, Helmholtz AI, Neuherberg, Germany
[5] Univ Cambridge, Comp Lab, Cambridge, England
[6] Jagiellonian Univ, Fac Chem, Krakow, Poland
[7] Helmholtz Munich, Inst Computat Biol, Computat Hlth Ctr, Neuherberg, Germany
[8] Tech Univ Munich, TUM Sch Computat Informat & Technol, Garching, Germany
Source
NATURE COMPUTATIONAL SCIENCE | 2024 / Vol. 4 / Issue 05
Keywords
SCORING FUNCTION; FORCE-FIELD; BINDING; AFFINITY; EFFICIENT; MODELS; PARAMETERIZATION; GENERATION; PREDICTION; ACCURACY
DOI
10.1038/s43588-024-00627-2
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule-ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein-ligand complexes, as well as extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein-ligand complexes in explicit water is included, accumulating over 170 μs. We give examples of machine learning (ML) baseline models demonstrating an improvement in accuracy when trained on our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.
Pages: 367-378
Page count: 14