A critical examination of robustness and generalizability of machine learning prediction of materials properties

Cited by: 47
Authors
Li, Kangming [1]
DeCost, Brian [2]
Choudhary, Kamal [2, 3]
Greenwood, Michael [4]
Hattrick-Simpers, Jason [1]
Affiliations
[1] Univ Toronto, Dept Mat Sci & Engn, 27 Kings Coll Cir, Toronto, ON, Canada
[2] Natl Inst Stand & Technol, Mat Measurement Lab, 100 Bur Dr, Gaithersburg, MD USA
[3] Theiss Res, La Jolla, CA 92037 USA
[4] Nat Resources Canada, Canmet MATERIALS, 183 Longwood Rd South, Hamilton, ON, Canada
DOI
10.1038/s41524-023-01012-9
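Chinese Library Classification (CLC) number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Discipline classification codes
070304; 081704
Abstract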
Recent advances in machine learning (ML) have led to substantial performance improvement in material database benchmarks, but an excellent benchmark score may not imply good generalization performance. Here we show that ML models trained on Materials Project 2018 can have severely degraded performance on new compounds in Materials Project 2021 due to the distribution shift. We discuss how to foresee the issue with a few simple tools. Firstly, the uniform manifold approximation and projection (UMAP) can be used to investigate the relation between the training and test data within the feature space. Secondly, the disagreement between multiple ML models on the test data can illuminate out-of-distribution samples. We demonstrate that the UMAP-guided and query by committee acquisition strategies can greatly improve prediction accuracy by adding only 1% of the test data. We believe this work provides valuable insights for building databases and models that enable better robustness and generalizability.
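The committee-disagreement idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy one-dimensional data, the polynomial committee members, the bootstrap resampling, and the helper names `committee_disagreement` and `select_queries` are all assumptions made for illustration. It only shows how the spread of predictions across several models can flag pool samples that lie outside the training distribution, and how roughly 1% of the pool might then be picked for labeling.

```python
import numpy as np


def committee_disagreement(x_train, y_train, x_pool, degrees=(1, 2, 3), n_boot=5, seed=0):
    """Standard deviation of committee predictions at each pool point.

    Hypothetical helper: the committee is a set of polynomial fits of
    different degrees, each trained on a bootstrap resample.
    """
    rng = np.random.default_rng(seed)
    preds = []
    for degree in degrees:
        for _ in range(n_boot):
            # Bootstrap resample so that members differ even at equal degree.
            idx = rng.integers(0, len(x_train), size=len(x_train))
            coeffs = np.polyfit(x_train[idx], y_train[idx], degree)
            preds.append(np.polyval(coeffs, x_pool))
    return np.std(np.stack(preds), axis=0)


def select_queries(disagreement, fraction=0.01):
    """Indices of the pool points with the largest disagreement."""
    n_query = max(1, int(round(fraction * len(disagreement))))
    return np.argsort(disagreement)[::-1][:n_query]


# Toy data (assumed): training inputs cover [0, 1]; the pool mixes
# 150 in-distribution points with 50 shifted points in [2, 3].
rng = np.random.default_rng(42)
x_train = rng.uniform(0.0, 1.0, size=100)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.05, size=100)
x_pool = np.concatenate([
    rng.uniform(0.0, 1.0, size=150),
    rng.uniform(2.0, 3.0, size=50),
])

disagreement = committee_disagreement(x_train, y_train, x_pool)
selected = select_queries(disagreement, fraction=0.01)
print("pool points selected for labeling:", np.sort(x_pool[selected]))
```

In an acquisition loop, the selected points would be labeled and appended to the training set before refitting. The committee composition and the disagreement measure are design choices, and the standard deviation used here is just one simple option.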
Pages: 9
Related papers
64 in total
  • [1] A review of uncertainty quantification in deep learning: Techniques, applications and challenges
    Abdar, Moloud
    Pourpanah, Farhad
    Hussain, Sadiq
    Rezazadegan, Dana
    Liu, Li
    Ghavamzadeh, Mohammad
    Fieguth, Paul
    Cao, Xiaochun
    Khosravi, Abbas
    Acharya, U. Rajendra
    Makarenkov, Vladimir
    Nahavandi, Saeid
    [J]. INFORMATION FUSION, 2021, 76 : 243 - 297
  • [2] Screening for high-performance piezoelectrics using high-throughput density functional theory
    Armiento, Rickard
    Kozinsky, Boris
    Fornari, Marco
    Ceder, Gerbrand
    [J]. PHYSICAL REVIEW B, 2011, 84 (01)
  • [3] A critical examination of compound stability predictions from machine-learned formation energies
    Bartel, Christopher J.
    Trewartha, Amalie
    Wang, Qi
    Dunn, Alexander
    Jain, Anubhav
    Ceder, Gerbrand
    [J]. NPJ COMPUTATIONAL MATERIALS, 2020, 6 (01)
  • [4] On representing chemical environments
    Bartok, Albert P.
    Kondor, Risi
    Csanyi, Gabor
    [J]. PHYSICAL REVIEW B, 2013, 87 (18)
  • [5] Machine learning for molecular and materials science
    Butler, Keith T.
    Davies, Daniel W.
    Cartwright, Hugh
    Isayev, Olexandr
    Walsh, Aron
    [J]. NATURE, 2018, 559 (7715) : 547 - 555
  • [6] A universal graph deep learning interatomic potential for the periodic table
    Chen, Chi
    Ong, Shyue Ping
    [J]. NATURE COMPUTATIONAL SCIENCE, 2022, 2 (11): 718 - +
  • [7] AtomSets as a hierarchical transfer learning framework for small and large materials datasets
    Chen, Chi
    Ong, Shyue Ping
    [J]. NPJ COMPUTATIONAL MATERIALS, 2021, 7 (01)
  • [8] Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals
    Chen, Chi
    Ye, Weike
    Zuo, Yunxing
    Zheng, Chen
    Ong, Shyue Ping
    [J]. CHEMISTRY OF MATERIALS, 2019, 31 (09) : 3564 - 3572
  • [9] XGBoost: A Scalable Tree Boosting System
    Chen, Tianqi
    Guestrin, Carlos
    [J]. KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016: 785 - 794
  • [10] Unified graph neural network force-field for the periodic table: solid state applications
    Choudhary, Kamal
    DeCost, Brian
    Major, Lily
    Butler, Keith
    Thiyagalingam, Jeyan
    Tavazza, Francesca
    [J]. DIGITAL DISCOVERY, 2023, 2 (02): 346 - 355