nablaDFT: Large-Scale Conformational Energy and Hamiltonian Prediction benchmark and dataset

Cited by: 9
Authors
Khrabrov, Kuzma [1]
Shenbin, Ilya [3]
Ryabov, Alexander [4, 5]
Tsypin, Artem [1]
Telepov, Alexander [1]
Alekseev, Anton [3, 7]
Grishin, Alexander [1]
Strashnov, Pavel [1]
Zhilyaev, Petr [4]
Nikolenko, Sergey [3, 6]
Kadurin, Artur [1, 2]
Affiliations
[1] AIRI, Kutuzovskiy Prospect House 32 Bldg K1, Moscow 121170, Russia
[2] Kuban State Univ, Stavropolskaya St 149, Krasnodar 350040, Russia
[3] Russian Acad Sci, Steklov Math Inst, St Petersburg Dept, Nab R Fontanki 27, St Petersburg 191011, Russia
[4] Skolkovo Inst Sci & Technol, Ctr Mat Technol, Bolshoy Blvd 30,Bld 1, Moscow 121205, Russia
[5] Natl Res Univ, Moscow Inst Phys & Technol, Inst Sky Lane 9, Dolgoprudnyi 141700, Moscow Region, Russia
[6] ISP RAS Res Ctr Trusted Artificial Intelligence, Alexander Solzhenitsyn St 25, Moscow 109004, Russia
[7] St Petersburg Univ, 7-9 Univ Skaya Embankment, St Petersburg 199034, Russia
Keywords
CHEMICAL UNIVERSE; DENSITY FUNCTIONALS; VIRTUAL EXPLORATION; ACCURATE; SYSTEMS;
DOI
10.1039/d2cp03966d
Chinese Library Classification
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Discipline classification codes
070304; 081704
Abstract
Electronic wave function calculation is a fundamental task of computational quantum chemistry. Knowledge of the wave function parameters allows one to compute physical and chemical properties of molecules and materials. Unfortunately, it is infeasible to compute the wave functions analytically even for simple molecules. Classical quantum chemistry approaches such as the Hartree-Fock method or density functional theory (DFT) make it possible to compute an approximation of the wave function, but they are very computationally expensive. One way to lower the computational cost is to use machine learning models that provide sufficiently good approximations far more cheaply. In this work we: (1) introduce a new curated large-scale dataset of electronic structures of drug-like molecules, (2) establish a novel benchmark for the estimation of molecular properties in the multi-molecule setting, and (3) evaluate a wide range of methods on this benchmark. We show that the accuracy of recently developed machine learning models deteriorates significantly when switching from the single-molecule to the multi-molecule setting. We also show that these models do not generalize well across different chemistry classes. In addition, we provide experimental evidence that larger datasets lead to better ML models in the field of quantum chemistry.
Pages: 25853-25863
Number of pages: 11