Identifying and embedding transferability in data-driven representations of chemical space

Cited by: 4
Authors
Gould, Tim [1]
Chan, Bun [2]
Dale, Stephen G. [1, 3]
Vuckovic, Stefan [4]
Affiliations
[1] Griffith Univ, Queensland Micro & Nanotechnol Ctr, Nathan, Qld 4111, Australia
[2] Nagasaki Univ, Grad Sch Engn, Bunkyo 1-14, Nagasaki 8528521, Japan
[3] Natl Univ Singapore, Inst Funct Intelligent Mat, 4 Sci Dr 2, Singapore 117544, Singapore
[4] Univ Fribourg, Dept Chem, Fribourg, Switzerland
Funding
Japan Society for the Promotion of Science; Swiss National Science Foundation; Australian Research Council
Keywords
DENSITY-FUNCTIONAL THEORY; EXCHANGE; THERMOCHEMISTRY; APPROXIMATIONS; DFT; AI
DOI
10.1039/d4sc02358g
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Transferability, especially in the context of model generalization, is a paradigm of all scientific disciplines. However, the rapid advancement of machine learned model development threatens this paradigm, as it can be difficult to understand how transferability is embedded (or missed) in complex models developed using large training data sets. Two related open problems are how to identify, without relying on human intuition, what makes training data transferable; and how to embed transferability into training data. To solve both problems for ab initio chemical modelling, an indispensable tool in everyday chemistry research, we introduce a transferability assessment tool (TAT) and demonstrate it on a controllable data-driven model for developing density functional approximations (DFAs). We reveal that human intuition in the curation of training data introduces chemical biases that can hamper the transferability of data-driven DFAs. We use our TAT to motivate three transferability principles, one of which introduces the key concept of transferable diversity. Finally, we propose data curation strategies for general-purpose machine learning models in chemistry that identify and embed the transferability principles.
Pages: 11122-11133
Page count: 12