Identifying and embedding transferability in data-driven representations of chemical space

Times cited: 4
Authors
Gould, Tim [1]
Chan, Bun [2]
Dale, Stephen G. [1, 3]
Vuckovic, Stefan [4]
Affiliations
[1] Griffith Univ, Queensland Micro & Nanotechnol Ctr, Nathan, Qld 4111, Australia
[2] Nagasaki Univ, Grad Sch Engn, Bunkyo 1-14, Nagasaki 8528521, Japan
[3] Natl Univ Singapore, Inst Funct Intelligent Mat, 4 Sci Dr 2, Singapore 117544, Singapore
[4] Univ Fribourg, Dept Chem, Fribourg, Switzerland
Funding
Japan Society for the Promotion of Science; Swiss National Science Foundation; Australian Research Council
Keywords
DENSITY-FUNCTIONAL THEORY; EXCHANGE; THERMOCHEMISTRY; APPROXIMATIONS; DFT; AI
DOI
10.1039/d4sc02358g
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Transferability, especially in the context of model generalization, is a paradigm of all scientific disciplines. However, the rapid advancement of machine-learned model development threatens this paradigm, as it can be difficult to understand how transferability is embedded (or missed) in complex models developed using large training data sets. Two related open problems are how to identify, without relying on human intuition, what makes training data transferable, and how to embed transferability into training data. To solve both problems for ab initio chemical modelling, an indispensable tool in everyday chemistry research, we introduce a transferability assessment tool (TAT) and demonstrate it on a controllable data-driven model for developing density functional approximations (DFAs). We reveal that human intuition in the curation of training data introduces chemical biases that can hamper the transferability of data-driven DFAs. We use our TAT to motivate three transferability principles, one of which introduces the key concept of transferable diversity. Finally, we propose data curation strategies for general-purpose machine learning models in chemistry that identify and embed the transferability principles.
Pages: 11122-11133
Number of pages: 12