Gaussian Process-Based Refinement of Dispersion Corrections

Cited by: 44
Authors
Proppe, Jonny [1, 2, 3]
Gugler, Stefan [3 ]
Reiher, Markus [3 ]
Affiliations
[1] Univ Toronto, Dept Chem, Toronto, ON M5S, Canada
[2] Univ Toronto, Dept Comp Sci, Toronto, ON M5S, Canada
[3] Swiss Fed Inst Technol, Lab Phys Chem, Vladimir Prelog Weg 2, CH-8093 Zurich, Switzerland
Funding
Swiss National Science Foundation;
Keywords
DENSITY-FUNCTIONAL THEORY; BASIS-SET CONVERGENCE; PREDICTION UNCERTAINTY; RARE-GAS; ENERGIES; ACCURATE; APPROXIMATIONS; POTENTIALS; DATABASE; VALENCE;
DOI
10.1021/acs.jctc.9b00627
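Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline classification code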
070304 ; 081704 ;
Abstract
We employ Gaussian process (GP) regression to adjust for systematic errors in D3-type dispersion corrections. We refer to the associated, statistically improved model as D3-GP. It is trained on differences between interaction energies obtained from PBE-D3(BJ)/ma-def2-QZVPP and DLPNO-CCSD(T)/CBS calculations. We generated a data set containing interaction energies for 1248 molecular dimers, which resemble the dispersion-dominated systems contained in the S66 data set. Our systems represent not only equilibrium structures but also dimers with various relative orientations and conformations at both shorter and longer distances. A reparametrization of the D3(BJ) model based on 66 of these dimers suggests that two of its three empirical parameters, a(1) and s(8), are zero, whereas a(2) = 5.6841 bohr. For the remaining 1182 dimers, we find that this new set of parameters is superior to all previously published D3(BJ) parameter sets. To train our D3-GP model, we engineered two different vectorial representations of (supra-)molecular systems, both derived from the matrix of atom-pairwise D3(BJ) interaction terms: (a) a distance-resolved interaction energy histogram, histD3(BJ), and (b) eigenvalues of the interaction matrix ordered according to their decreasing absolute value, eigD3(BJ). Hence, the GP learns a mapping from D3(BJ) information only, which renders D3-GP-type dispersion corrections comparable to those obtained with the original D3 approach. They improve systematically if the underlying training set is selected carefully. Here, we harness the prediction variance obtained from GP regression to select optimal training sets in an automated fashion. The larger the variance, the more information the corresponding data point may add to the training set.
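The two representations named above can be illustrated with a minimal sketch. It assumes the standard Becke-Johnson damped pair term, E(AB) = -[s(6) C6/(R^6 + f^6) + s(8) C8/(R^8 + f^8)] with f = a(1) sqrt(C8/C6) + a(2), and uses the reparametrized values quoted in the abstract as defaults (a(1) = s(8) = 0, a(2) = 5.6841 bohr). The function names, the choice s(6) = 1, the bin edges, and the precomputed C6/C8 input matrices are hypothetical; the abstract does not specify how the authors bin, pad, or restrict the matrix, so this is not their implementation.

```python
import numpy as np

def d3bj_pair_matrix(coords, c6, c8, s6=1.0, s8=0.0, a1=0.0, a2=5.6841):
    """Symmetric matrix of atom-pairwise D3(BJ) terms (atomic units).

    coords: (n, 3) positions in bohr; c6, c8: (n, n) pair coefficients
    (hypothetical inputs, assumed to be precomputed elsewhere).
    """
    coords = np.asarray(coords, dtype=float)
    c6 = np.asarray(c6, dtype=float)
    c8 = np.asarray(c8, dtype=float)
    n = len(coords)
    # All interatomic distances.
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    off = ~np.eye(n, dtype=bool)  # skip the diagonal (no self-interaction)
    # Becke-Johnson damping length f = a1 * R0 + a2 with R0 = sqrt(C8 / C6).
    f = a1 * np.sqrt(c8[off] / c6[off]) + a2
    e = np.zeros((n, n))
    e[off] = -(s6 * c6[off] / (r[off] ** 6 + f ** 6)
               + s8 * c8[off] / (r[off] ** 8 + f ** 8))
    return e, r

def eig_representation(e):
    """Eigenvalues of the pair matrix, ordered by decreasing absolute value."""
    w = np.linalg.eigvalsh(e)
    return w[np.argsort(-np.abs(w))]

def hist_representation(e, r, edges):
    """Pair energies summed into distance bins (each pair counted once)."""
    iu = np.triu_indices_from(e, k=1)
    h, _ = np.histogram(r[iu], bins=edges, weights=e[iu])
    return h
```

In this sketch, the histogram entries add up to the total pairwise dispersion energy whenever every interatomic distance falls inside the chosen bin range, and both vectors depend on D3(BJ) quantities only.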
For a given set of molecular systems, variance-based sampling can approximately determine the smallest subset being subjected to reference calculations such that all dispersion corrections for the remaining systems fall below a predefined accuracy threshold. To render the entire D3-GP workflow as efficient as possible, we present an improvement over our variance-based, sequential active-learning scheme [J. Chem. Theory Comput. 2018, 14, 5238]. Our refined learning algorithm selects multiple (instead of single) systems that can be subjected to reference calculations simultaneously. We refer to the underlying selection strategy as batchwise variance-based sampling (BVS). BVS-guided active learning is an essential component of our D3-GP workflow, which is implemented in a black-box fashion. Once provided with reference data for new molecular systems, the underlying GP model automatically learns to adapt to these and similar systems. This approach leads overall to a self-improving model (D3-GP) that predicts system-focused and GP-refined D3-type dispersion corrections for any given system of reference data.
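The idea of selecting several systems at once by prediction variance can be sketched as follows. This is not the authors' BVS algorithm, whose details the abstract does not give. It is a generic greedy scheme that relies on one property of GP regression with fixed hyperparameters: the predictive variance depends only on the inputs, so a candidate can be "added" to the training set before its reference value exists. The RBF kernel, unit prior variance, length scale, noise level, and all function names are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel with unit prior variance (assumed)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def predictive_variance(x_train, x_pool, noise=1e-6, length_scale=1.0):
    """GP posterior variance at each pool point; needs no target values."""
    if len(x_train) == 0:
        return np.ones(len(x_pool))  # prior variance everywhere
    k_tt = rbf_kernel(x_train, x_train, length_scale)
    k_tt += noise * np.eye(len(x_train))
    k_tp = rbf_kernel(x_train, x_pool, length_scale)
    return 1.0 - np.sum(k_tp * np.linalg.solve(k_tt, k_tp), axis=0)

def select_batch(x_train, x_pool, batch_size, threshold, **kernel_args):
    """Greedily pick up to batch_size pool indices for reference calculations.

    After each pick the candidate is treated as if already in the training
    set, so near-duplicates of it lose their variance and are not chosen.
    Selection stops early once no remaining variance exceeds threshold.
    """
    x_pool = np.asarray(x_pool, dtype=float)
    x_cur = np.asarray(x_train, dtype=float).reshape(-1, x_pool.shape[1])
    chosen = []
    for _ in range(batch_size):
        var = predictive_variance(x_cur, x_pool, **kernel_args)
        var[np.array(chosen, dtype=int)] = -np.inf  # never pick twice
        best = int(np.argmax(var))
        if var[best] <= threshold:
            break
        chosen.append(best)
        x_cur = np.vstack([x_cur, x_pool[best]])
    return chosen
```

For a one-dimensional pool `[[0.0], [3.0], [3.05], [0.2]]` with `[[0.0]]` already in the training set, ranking by the initial variances alone would pick the two nearly identical points at 3.0 and 3.05, whereas the greedy update picks 3.05 first and then 0.2, because conditioning on 3.05 removes most of the variance at 3.0. All systems in the returned batch could then be sent to reference calculations simultaneously.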
Pages: 6046-6060
Number of pages: 15
Related papers
84 in total
[61]   Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning [J].
Rupp, Matthias ;
Tkatchenko, Alexandre ;
Mueller, Klaus-Robert ;
von Lilienfeld, O. Anatole .
PHYSICAL REVIEW LETTERS, 2012, 108 (05)
[62]   Binding energy curves from nonempirical density functionals II. van der Waals bonds in rare-gas and alkaline-earth diatomics [J].
Ruzsinszky, A ;
Perdew, JP ;
Csonka, GI .
JOURNAL OF PHYSICAL CHEMISTRY A, 2005, 109 (48) :11015-11021
[63]   Local response dispersion method. II. Generalized multicenter interactions [J].
Sato, Takeshi ;
Nakai, Hiromi .
JOURNAL OF CHEMICAL PHYSICS, 2010, 133 (19)
[64]   Density functional method including weak interactions: Dispersion coefficients based on the local response approximation [J].
Sato, Takeshi ;
Nakai, Hiromi .
JOURNAL OF CHEMICAL PHYSICS, 2009, 131 (22)
[65]   Taking the Human Out of the Loop: A Review of Bayesian Optimization [J].
Shahriari, Bobak ;
Swersky, Kevin ;
Wang, Ziyu ;
Adams, Ryan P. ;
de Freitas, Nando .
PROCEEDINGS OF THE IEEE, 2016, 104 (01) :148-175
[66]   Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes [J].
Simm, Gregor N. ;
Reiher, Markus .
JOURNAL OF CHEMICAL THEORY AND COMPUTATION, 2018, 14 (10) :5238-5248
[67]   Error Assessment of Computational Models in Chemistry [J].
Simm, Gregor N. ;
Proppe, Jonny ;
Reiher, Markus .
CHIMIA, 2017, 71 (04) :202-208
[68]   Revised Damping Parameters for the D3 Dispersion Correction to Density Functional Theory [J].
Smith, Daniel G. A. ;
Burns, Lori A. ;
Patkowski, Konrad ;
Sherrill, C. David .
JOURNAL OF PHYSICAL CHEMISTRY LETTERS, 2016, 7 (12) :2197-2203
[69]   Less is more: Sampling chemical space with active learning [J].
Smith, Justin S. ;
Nebgen, Ben ;
Lubbers, Nicholas ;
Isayev, Olexandr ;
Roitberg, Adrian E. .
JOURNAL OF CHEMICAL PHYSICS, 2018, 148 (24)
[70]   Srivastava N, 2014, J MACH LEARN RES, V15, P1929