Comparison of the Performance of Machine Learning Models in Representing High-Dimensional Free Energy Surfaces and Generating Observables

Cited by: 19
Authors
Cendagorta, Joseph R. [1]
Tolpin, Jocelyn [4]
Schneider, Elia [1]
Topper, Robert Q. [5]
Tuckerman, Mark E. [1, 2, 3]
Affiliations
[1] NYU, Dept Chem, 4 Washington Pl, New York, NY 10003 USA
[2] NYU, Courant Inst Math Sci, 251 Mercer St, New York, NY 10003 USA
[3] NYU Shanghai, NYU ECNU Ctr Computat Chem, Shanghai 200062, Peoples R China
[4] Harvard Univ, Dept Stat, Cambridge, MA 02138 USA
[5] Cooper Union Adv Sci & Art, Dept Chem, New York, NY 10003 USA
Source
JOURNAL OF PHYSICAL CHEMISTRY B | 2020 / Vol. 124 / Issue 18
Funding
National Science Foundation (USA);
Keywords
MOLECULAR-DYNAMICS; FORCE-FIELDS; ENERGETICS; POTENTIALS; EFFICIENT; TUTORIAL;
DOI
10.1021/acs.jpcb.0c01218
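Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline classification codes
070304 ; 081704 ;
Abstract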
Free energy surfaces of chemical and physical systems are often generated using a popular class of enhanced sampling methods that target a set of collective variables (CVs) chosen to distinguish the characteristic features of these surfaces. While some of these approaches are typically limited to low (~1-3)-dimensional CV subspaces, methods such as driven adiabatic free-energy dynamics/temperature-accelerated molecular dynamics have been shown to be capable of generating free energy surfaces of quite high dimension by sampling the associated marginal probability distribution via full sweeps over the CV landscape. These approaches repeatedly visit conformational basins, producing a scattering of points within the basins on each visit. Consequently, they are particularly amenable to synergistic combination with regression machine learning methods for filling in the surfaces between the sampled points and for providing a compact and continuous (or semicontinuous) representation of the surfaces that can be easily stored and used for further computation of observable properties. Given the central role of machine learning techniques in this combined approach, it is timely to provide a detailed comparison of the performance of different machine learning strategies and models, including neural networks, kernel ridge regression, support vector machines, and weighted neighbor schemes, for their ability to learn these high-dimensional surfaces as a function of the amount of sampled training data and, once trained, to subsequently generate accurate ensemble averages corresponding to observable properties of the systems. In this article, we perform such a comparison on a set of oligopeptides, in both gas and aqueous phases, corresponding to CV spaces of 2-10 dimensions, and assess the models' ability to provide a global representation of the free energy surfaces and to generate accurate ensemble averages.
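The workflow summarized in the abstract (fit a regression model to scattered free energy samples, then use the fitted surface to compute an ensemble average) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the article: the two-dimensional double-well function `toy_free_energy` stands in for sampled data, the kernel ridge regression is a plain NumPy implementation with arbitrarily chosen `gamma` and `lam`, and the observable (the square of the first CV) is hypothetical.

```python
import numpy as np

def toy_free_energy(s):
    """Hypothetical 2D double-well surface in kcal/mol (stand-in for sampled data)."""
    x, y = s[:, 0], s[:, 1]
    return (x**2 - 1.0)**2 + 0.5 * y**2

def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix between two sets of CV points."""
    d2 = ((a[:, None, :] - b[None, :, :])**2).sum(axis=-1)
    return np.exp(-gamma * d2)

def fit_krr(s_train, a_train, gamma=2.0, lam=1e-6):
    """Kernel ridge regression: solve (K + lam * I) alpha = A."""
    k = rbf_kernel(s_train, s_train, gamma)
    return np.linalg.solve(k + lam * np.eye(len(s_train)), a_train)

def predict_krr(s_new, s_train, alpha, gamma=2.0):
    """Evaluate the fitted surface at new CV points."""
    return rbf_kernel(s_new, s_train, gamma) @ alpha

def ensemble_average(obs, free_energy, beta):
    """Boltzmann-weighted average of obs on a uniform grid of CV points."""
    w = np.exp(-beta * (free_energy - free_energy.min()))
    return float((obs * w).sum() / w.sum())

# Scattered "sampled" points (assumed uniform here; real sampling is not uniform).
rng = np.random.default_rng(0)
s_train = rng.uniform(-2.0, 2.0, size=(400, 2))
a_train = toy_free_energy(s_train)
alpha = fit_krr(s_train, a_train)

# Evaluate the learned surface on a regular grid inside the sampled region.
g = np.linspace(-1.8, 1.8, 41)
grid = np.array([(x, y) for x in g for y in g])
a_exact = toy_free_energy(grid)
a_model = predict_krr(grid, s_train, alpha)

beta = 1.0 / 0.596          # 1/(kB*T) in mol/kcal, roughly 300 K
obs = grid[:, 0]**2         # hypothetical observable: first CV squared
avg_exact = ensemble_average(obs, a_exact, beta)
avg_model = ensemble_average(obs, a_model, beta)
rmse = float(np.sqrt(np.mean((a_model - a_exact)**2)))
print(f"surface RMSE = {rmse:.4f} kcal/mol")
print(f"<x^2> exact = {avg_exact:.4f}, from fitted surface = {avg_model:.4f}")
```

Comparing `avg_model` with `avg_exact` while varying the number of training points is a small-scale analogue of the comparison the abstract describes; a grid sum like this one is only practical in low dimensions, so higher-dimensional CV spaces would need a different integration scheme.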
Pages: 3647-3660
Number of pages: 14