Comparison of the Performance of Machine Learning Models in Representing High-Dimensional Free Energy Surfaces and Generating Observables

Cited by: 19
Authors
Cendagorta, Joseph R. [1]
Tolpin, Jocelyn [4]
Schneider, Elia [1]
Topper, Robert Q. [5]
Tuckerman, Mark E. [1, 2, 3]
Affiliations
[1] NYU, Dept Chem, 4 Washington Pl, New York, NY 10003 USA
[2] NYU, Courant Inst Math Sci, 251 Mercer St, New York, NY 10003 USA
[3] NYU Shanghai, NYU ECNU Ctr Computat Chem, Shanghai 200062, Peoples R China
[4] Harvard Univ, Dept Stat, Cambridge, MA 02138 USA
[5] Cooper Union Adv Sci & Art, Dept Chem, New York, NY 10003 USA
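Source
JOURNAL OF PHYSICAL CHEMISTRY B | 2020 / Vol. 124 / Issue 18
Funding
National Science Foundation (USA);
Keywords
MOLECULAR-DYNAMICS; FORCE-FIELDS; ENERGETICS; POTENTIALS; EFFICIENT; TUTORIAL;
DOI
10.1021/acs.jpcb.0c01218
Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline classification codes
070304; 081704;
Abstract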
Free energy surfaces of chemical and physical systems are often generated using a popular class of enhanced sampling methods that target a set of collective variables (CVs) chosen to distinguish the characteristic features of these surfaces. While some of these approaches are typically limited to low-dimensional (approximately 1-3) CV subspaces, methods such as driven adiabatic free-energy dynamics/temperature-accelerated molecular dynamics have been shown to be capable of generating free energy surfaces of quite high dimension by sampling the associated marginal probability distribution via full sweeps over the CV landscape. These approaches repeatedly visit conformational basins, producing a scattering of points within the basins on each visit. Consequently, they are particularly amenable to synergistic combination with regression machine learning methods for filling in the surfaces between the sampled points and for providing a compact and continuous (or semicontinuous) representation of the surfaces that can be easily stored and used for further computation of observable properties. Given the central role of machine learning techniques in this combined approach, it is timely to provide a detailed comparison of the performance of different machine learning strategies and models, including neural networks, kernel ridge regression, support vector machines, and weighted neighbor schemes, for their ability to learn these high-dimensional surfaces as a function of the amount of sampled training data and, once trained, to subsequently generate accurate ensemble averages corresponding to observable properties of the systems. In this article, we perform such a comparison on a set of oligopeptides, in both gas and aqueous phases, corresponding to CV spaces of 2-10 dimensions and assess their ability to provide a global representation of the free energy surfaces and to generate accurate ensemble averages.
Pages: 3647-3660
Number of pages: 14
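
The workflow summarized in the abstract (fit a regression model to scattered free energy samples, then use the fitted surface to compute ensemble averages) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the 2D double-well toy surface, the random sampling that stands in for enhanced-sampling output, the Gaussian kernel, the hyperparameters `length_scale` and `ridge`, and the choice of observable. Kernel ridge regression is used because it is one of the model classes the abstract names; the article's own implementation and settings may differ.

```python
import numpy as np

def rbf_kernel(a, b, length_scale):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    d2 = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * length_scale**2))

def fit_krr(x_train, f_train, length_scale, ridge):
    """Solve (K + ridge * I) alpha = f for the kernel ridge regression weights."""
    k = rbf_kernel(x_train, x_train, length_scale)
    return np.linalg.solve(k + ridge * np.eye(len(x_train)), f_train)

def predict_krr(x_query, x_train, alpha, length_scale):
    """Evaluate the fitted surface at the query points."""
    return rbf_kernel(x_query, x_train, length_scale) @ alpha

def ensemble_average(observable, free_energy, kT):
    """Boltzmann-weighted average of an observable over grid points."""
    w = np.exp(-(free_energy - free_energy.min()) / kT)
    return np.sum(observable * w) / np.sum(w)

def toy_surface(s):
    """Hypothetical 2D double-well free energy surface, in units of kT."""
    return (s[:, 0]**2 - 1.0)**2 + 0.5 * s[:, 1]**2

# Scattered "sampled" CV points (uniform random here, purely for illustration).
rng = np.random.default_rng(0)
x_train = rng.uniform(-1.5, 1.5, size=(400, 2))
f_train = toy_surface(x_train)

# Assumed hyperparameters; in practice these would be tuned by validation.
length_scale, ridge, kT = 0.5, 1e-6, 1.0
alpha = fit_krr(x_train, f_train, length_scale, ridge)

# Evaluate the fitted and exact surfaces on a regular grid inside the sampled region.
axis = np.linspace(-1.4, 1.4, 41)
grid = np.array([[u, v] for u in axis for v in axis])
f_fit = predict_krr(grid, x_train, alpha, length_scale)
f_exact = toy_surface(grid)

rmse = np.sqrt(np.mean((f_fit - f_exact)**2))
observable = grid[:, 0]**2  # example observable: square of the first CV
avg_fit = ensemble_average(observable, f_fit, kT)
avg_exact = ensemble_average(observable, f_exact, kT)

print("surface RMSE (kT):", rmse)
print("<s1^2> from fitted surface:", avg_fit)
print("<s1^2> from exact surface:", avg_exact)
```

Comparing `avg_fit` with `avg_exact` mirrors, on a toy scale, the second criterion in the abstract: a model is judged not only on how well it reproduces the surface but also on the ensemble averages it yields.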