nablaDFT: Large-Scale Conformational Energy and Hamiltonian Prediction benchmark and dataset

Cited by: 9
Authors
Khrabrov, Kuzma [1]
Shenbin, Ilya [3]
Ryabov, Alexander [4, 5]
Tsypin, Artem [1]
Telepov, Alexander [1]
Alekseev, Anton [3, 7]
Grishin, Alexander [1]
Strashnov, Pavel [1]
Zhilyaev, Petr [4]
Nikolenko, Sergey [3, 6]
Kadurin, Artur [1, 2]
Affiliations
[1] AIRI, Kutuzovskiy Prospect House 32 Bldg K1, Moscow 121170, Russia
[2] Kuban State Univ, Stavropolskaya St 149, Krasnodar 350040, Russia
[3] Russian Acad Sci, Steklov Math Inst, St Petersburg Dept, Nab R Fontanki 27, St Petersburg 191011, Russia
[4] Skolkovo Inst Sci & Technol, Ctr Mat Technol, Bolshoy Blvd 30,Bld 1, Moscow 121205, Russia
[5] Natl Res Univ, Moscow Inst Phys & Technol, Inst Sky Lane 9, Dolgoprudnyi 141700, Moscow Region, Russia
[6] ISP RAS Res Ctr Trusted Artificial Intelligence, Alexander Solzhenitsyn St 25, Moscow 109004, Russia
[7] St Petersburg Univ, 7-9 Univ Skaya Embankment, St Petersburg 199034, Russia
Keywords
CHEMICAL UNIVERSE; DENSITY FUNCTIONALS; VIRTUAL EXPLORATION; ACCURATE; SYSTEMS;
DOI
10.1039/d2cp03966d
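Chinese Library Classification (CLC) number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Discipline classification code
070304; 081704
Abstract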
Electronic wave function calculation is a fundamental task of computational quantum chemistry. Knowledge of the wave function parameters allows one to compute physical and chemical properties of molecules and materials. Unfortunately, it is infeasible to compute the wave functions analytically even for simple molecules. Classical quantum chemistry approaches such as the Hartree-Fock method or density functional theory (DFT) can compute an approximation of the wave function but are computationally very expensive. One way to lower the computational complexity is to use machine learning models that can provide sufficiently good approximations at a much lower computational cost. In this work we: (1) introduce a new curated large-scale dataset of electronic structures of drug-like molecules, (2) establish a novel benchmark for the estimation of molecular properties in the multi-molecule setting, and (3) evaluate a wide range of methods with this benchmark. We show that the accuracy of recently developed machine learning models deteriorates significantly when switching from the single-molecule to the multi-molecule setting. We also show that these models generalize poorly across different chemistry classes. In addition, we provide experimental evidence that larger datasets lead to better ML models in the field of quantum chemistry.
Pages: 25853-25863
Page count: 11