Improving molecular force fields across configurational space by combining supervised and unsupervised machine learning

Cited by: 18
Authors
Fonseca, Gregory [1]
Poltavsky, Igor [1]
Vassilev-Galindo, Valentin [1]
Tkatchenko, Alexandre [1]
Affiliations
[1] Univ Luxembourg, Dept Phys & Mat Sci, L-1511 Luxembourg, Luxembourg
Source
JOURNAL OF CHEMICAL PHYSICS | 2021 / Vol. 154 / No. 12
Funding
European Research Council;
Keywords
ATOMISTIC SIMULATIONS; DYNAMICS; APPROXIMATION; POTENTIALS; CORROSION;
DOI
10.1063/5.0035530
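Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline classification code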
070304 ; 081704 ;
Abstract
The training set of atomic configurations is key to the performance of any Machine Learning Force Field (MLFF) and, as such, the training set selection determines the applicability of the MLFF model for predictive molecular simulations. However, most atomistic reference datasets are inhomogeneously distributed across configurational space (CS), and thus, choosing the training set randomly or according to the probability distribution of the data leads to models whose accuracy is mainly defined by the most common close-to-equilibrium configurations in the reference data. In this work, we combine unsupervised and supervised ML methods to bypass the inherent bias of the data for common configurations, effectively widening the applicability range of the MLFF to the fullest capabilities of the dataset. To achieve this goal, we first cluster the CS into subregions similar in terms of geometry and energetics. We iteratively test a given MLFF performance on each subregion and fill the training set of the model with the representatives of the most inaccurate parts of the CS. The proposed approach has been applied to a set of small organic molecules and alanine tetrapeptide, demonstrating an up to twofold decrease in the root mean squared errors for force predictions on non-equilibrium geometries of these molecules. Furthermore, our ML models demonstrate superior stability over the default training approaches, allowing reliable study of processes involving highly out-of-equilibrium molecular configurations. These results hold for both kernel-based methods (sGDML and GAP/SOAP models) and deep neural networks (SchNet model).
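The selection loop described in the abstract (cluster the configurational space, evaluate the model per cluster, add representatives of the worst cluster to the training set) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the 1-D toy "configurations", the `toy_force` function, the tiny k-means on the coordinate alone, and the 1-nearest-neighbour stand-in model are hypothetical and are not the descriptors, clustering method, or MLFF models (sGDML, GAP/SOAP, SchNet) used in the paper.

```python
import math
import random


def toy_force(x):
    """Hypothetical 1-D 'force' used only to generate reference labels."""
    return -x + math.sin(3.0 * x)


def kmeans_1d(xs, k, iters=20):
    """Tiny 1-D k-means; returns one cluster label per point."""
    lo, hi = min(xs), max(xs)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in xs]
        for c in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def predict(train, x):
    """Stand-in model: 1-nearest-neighbour lookup (not an MLFF)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def cluster_rmse(train, data, labels, k):
    """Root mean squared error of the stand-in model on each cluster."""
    errs = []
    for c in range(k):
        sq = [(predict(train, x) - y) ** 2
              for (x, y), lab in zip(data, labels) if lab == c]
        errs.append(math.sqrt(sum(sq) / len(sq)) if sq else 0.0)
    return errs


def select_training_set(data, labels, k, n_init, n_total, batch, rng):
    """Grow the training set from the currently worst-predicted cluster."""
    idx = set(rng.sample(range(len(data)), n_init))  # random initial set
    while len(idx) < n_total:
        train = [data[i] for i in idx]
        errs = cluster_rmse(train, data, labels, k)
        worst = max(range(k), key=lambda c: errs[c])
        pool = [i for i in range(len(data))
                if labels[i] == worst and i not in idx]
        if not pool:  # nothing left to add from the worst cluster
            break
        # Take the worst-predicted points of that cluster as representatives.
        pool.sort(key=lambda i: -abs(predict(train, data[i][0]) - data[i][1]))
        idx.update(pool[:min(batch, n_total - len(idx))])
    return sorted(idx)


# Biased toy dataset: most samples sit near "equilibrium" (x close to 0).
rng = random.Random(0)
xs = ([rng.gauss(0.0, 0.3) for _ in range(475)]
      + [rng.uniform(-3.0, 3.0) for _ in range(25)])
data = [(x, toy_force(x)) for x in xs]

K, N_INIT, N_TOTAL, BATCH = 5, 20, 60, 5
labels = kmeans_1d(xs, K)
selected = select_training_set(data, labels, K, N_INIT, N_TOTAL, BATCH, rng)
print(len(selected), "training points selected")
```

In a realistic setting the stand-in model would be retrained on the enlarged set at every iteration, and the clustering would act on molecular descriptors and energies rather than a single coordinate.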
Pages: 12
Related papers
50 in total
  • [1] Improving facies prediction by combining supervised and unsupervised learning methods
    Ippolito, Marco
    Ferguson, John
    Jenson, Fred
    JOURNAL OF PETROLEUM SCIENCE AND ENGINEERING, 2021, 200
  • [2] Combining supervised and unsupervised machine learning algorithms to predict the learners' learning styles
    El Aissaoui, Ouafae
    El Alami El Madani, Yasser
    Oughdir, Lahcen
    El Allioui, Youssouf
    SECOND INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING IN DATA SCIENCES (ICDS2018), 2019, 148 : 87 - 96
  • [3] On the design space between molecular mechanics and machine learning force fields
    Wang, Yuanqing
    Takaba, Kenichiro
    Chen, Michael S.
    Wieder, Marcus
    Xu, Yuzhi
    Zhu, Tong
    Zhang, John Z. H.
    Nagle, Arnav
    Yu, Kuang
    Wang, Xinyan
    Cole, Daniel J.
    Rackers, Joshua A.
    Cho, Kyunghyun
    Greener, Joe G.
    Eastman, Peter
    Martiniani, Stefano
    Tuckerman, Mark E.
    APPLIED PHYSICS REVIEWS, 2025, 12 (02):
  • [4] Combining unsupervised and supervised machine learning in analysis of the CHD patient database
    Smuc, T
    Gamberger, D
    Krstacic, G
    ARTIFICIAL INTELLIGENCE IN MEDICINE, PROCEEDINGS, 2001, 2101 : 109 - 112
  • [5] Linking protolith rocks to altered equivalents by combining unsupervised and supervised machine learning
    Hood, Shawn B.
    Cracknell, Matthew J.
    Gazley, Michael F.
    JOURNAL OF GEOCHEMICAL EXPLORATION, 2018, 186 : 270 - 280
  • [6] Combining Supervised and Unsupervised Machine Learning Methods for Phenotypic Functional Genomics Screening
    Omta, Wienand A.
    van Heesbeen, Roy G.
    Shen, Ian
    de Nobel, Jacob
    Robers, Desmond
    van Der Velden, Lieke M.
    Medema, Rene H.
    Siebes, Arno P. J. M.
    Feelders, Ad J.
    Brinkkemper, Sjaak
    Klumperman, Judith S.
    Spruit, Marco Rene
    Brinkhuis, Matthieu J. S.
    Egan, David A.
    SLAS DISCOVERY, 2020, 25 (06) : 655 - 664
  • [7] Improving machine learning force fields for molecular dynamics simulations with fine-grained force metrics
    Wang, Zun
    Wu, Hongfei
    Sun, Lixin
    He, Xinheng
    Liu, Zhirong
    Shao, Bin
    Wang, Tong
    Liu, Tie-Yan
    JOURNAL OF CHEMICAL PHYSICS, 2023, 159 (03):
  • [8] Combining supervised and unsupervised learning for data clustering
    Corsini, Paolo
    Lazzerini, Beatrice
    Marcelloni, Francesco
    NEURAL COMPUTING & APPLICATIONS, 2006, 15 (3-4) : 289 - 297
  • [9] Combining supervised and unsupervised learning for data clustering
    Paolo Corsini
    Beatrice Lazzerini
    Francesco Marcelloni
    Neural Computing & Applications, 2006, 15 : 289 - 297
  • [10] Kernel Approaches to Unsupervised and Supervised Machine Learning
    Kung, Sun-Yuan
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2009, 2009, 5879 : 1 - 32