Transferability, especially in the context of model generalization, is a paradigm of all scientific disciplines. However, the rapid advancement of machine-learned model development threatens this paradigm, as it can be difficult to understand how transferability is embedded (or missed) in complex models developed using large training data sets. Two related open problems are how to identify, without relying on human intuition, what makes training data transferable, and how to embed transferability into training data. To solve both problems for ab initio chemical modelling, an indispensable tool in everyday chemistry research, we introduce a transferability assessment tool (TAT), which rigorously measures and subsequently improves transferability, and demonstrate it on a controllable data-driven model for developing density functional approximations (DFAs). We reveal that human intuition in the curation of training data introduces chemical biases that can hamper the transferability of data-driven DFAs. We use our TAT to motivate three transferability principles, one of which introduces the key concept of transferable diversity. Finally, we propose data curation strategies for general-purpose machine learning models in chemistry that identify and embed the transferability principles.