Comparison of the Performance of Machine Learning Models in Representing High-Dimensional Free Energy Surfaces and Generating Observables

Cited by: 19
Authors
Cendagorta, Joseph R. [1 ]
Tolpin, Jocelyn [4 ]
Schneider, Elia [1 ]
Topper, Robert Q. [5 ]
Tuckerman, Mark E. [1 ,2 ,3 ]
Affiliations
[1] NYU, Dept Chem, 4 Washington Pl, New York, NY 10003 USA
[2] NYU, Courant Inst Math Sci, 251 Mercer St, New York, NY 10003 USA
[3] NYU Shanghai, NYU ECNU Ctr Computat Chem, Shanghai 200062, Peoples R China
[4] Harvard Univ, Dept Stat, Cambridge, MA 02138 USA
[5] Cooper Union Adv Sci & Art, Dept Chem, New York, NY 10003 USA
Source
JOURNAL OF PHYSICAL CHEMISTRY B | 2020 / Vol. 124 / Issue 18
Funding
National Science Foundation (USA);
Keywords
MOLECULAR-DYNAMICS; FORCE-FIELDS; ENERGETICS; POTENTIALS; EFFICIENT; TUTORIAL;
DOI
10.1021/acs.jpcb.0c01218
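Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline classification code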
070304 ; 081704 ;
Abstract
Free energy surfaces of chemical and physical systems are often generated using a popular class of enhanced sampling methods that target a set of collective variables (CVs) chosen to distinguish the characteristic features of these surfaces. While some of these approaches are typically limited to low (~1-3)-dimensional CV subspaces, methods such as driven adiabatic free-energy dynamics/temperature-accelerated molecular dynamics have been shown to be capable of generating free energy surfaces of quite high dimension by sampling the associated marginal probability distribution via full sweeps over the CV landscape. These approaches repeatedly visit conformational basins, producing a scattering of points within the basins on each visit. Consequently, they are particularly amenable to synergistic combination with regression machine learning methods for filling in the surfaces between the sampled points and for providing a compact and continuous (or semicontinuous) representation of the surfaces that can be easily stored and used for further computation of observable properties. Given the central role of machine learning techniques in this combined approach, it is timely to provide a detailed comparison of the performance of different machine learning strategies and models, including neural networks, kernel ridge regression, support vector machines, and weighted neighbor schemes, for their ability to learn these high-dimensional surfaces as a function of the amount of sampled training data and, once trained, to subsequently generate accurate ensemble averages corresponding to observable properties of the systems. In this article, we perform such a comparison on a set of oligopeptides, in both gas and aqueous phases, corresponding to CV spaces of 2-10 dimensions and assess their ability to provide a global representation of the free energy surfaces and to generate accurate ensemble averages.
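The workflow the abstract describes (fit a regression model to scattered free energy samples, then use the fitted surface to compute ensemble averages) can be illustrated with a minimal sketch. This is not the authors' code or data: the two-dimensional double-well surface `toy_free_energy`, the kernel width `gamma`, the ridge parameter `lam`, and all function names are hypothetical choices made only for illustration, and kernel ridge regression is used here simply because it is one of the model families the abstract lists.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    # Gaussian kernel matrix between rows of a (n, d) and rows of b (m, d).
    d2 = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def fit_krr(x, y, gamma=2.0, lam=1e-4):
    # Kernel ridge regression: solve (K + lam * I) alpha = y.
    k = rbf_kernel(x, x, gamma)
    return np.linalg.solve(k + lam * np.eye(len(x)), y)

def predict_krr(x_train, alpha, x_new, gamma=2.0):
    # Evaluate the fitted surface at new CV points.
    return rbf_kernel(x_new, x_train, gamma) @ alpha

def toy_free_energy(s):
    # Hypothetical 2D double-well surface in units of kT (an assumption,
    # standing in for free energies sampled at scattered CV points).
    return 4.0 * (s[:, 0]**2 - 1.0)**2 + 2.0 * s[:, 1]**2

def ensemble_average(obs, f, beta=1.0):
    # Boltzmann-weighted average of an observable over grid points.
    w = np.exp(-beta * (f - f.min()))
    return np.sum(obs * w) / np.sum(w)

# Scattered "sampled" points in CV space and their free energies.
rng = np.random.default_rng(0)
s_train = rng.uniform(-1.5, 1.5, size=(400, 2))
f_train = toy_free_energy(s_train)
alpha = fit_krr(s_train, f_train)

# Evaluate the learned surface on a regular grid inside the sampled region.
g = np.linspace(-1.4, 1.4, 41)
grid = np.stack(np.meshgrid(g, g, indexing="ij"), axis=-1).reshape(-1, 2)
f_model = predict_krr(s_train, alpha, grid)
f_exact = toy_free_energy(grid)

# Compare an ensemble average computed from the model and from the toy surface.
obs = grid[:, 0]**2
avg_model = ensemble_average(obs, f_model)
avg_exact = ensemble_average(obs, f_exact)
print(avg_model, avg_exact)
```

Summing over a grid is only practical in low dimension; for the 2-10 dimensional CV spaces mentioned above, some other integration scheme would be needed, which this sketch does not attempt.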
Pages: 3647-3660
Number of pages: 14