A critical examination of robustness and generalizability of machine learning prediction of materials properties

Cited by: 47
Authors
Li, Kangming [1]
DeCost, Brian [2]
Choudhary, Kamal [2, 3]
Greenwood, Michael [4]
Hattrick-Simpers, Jason [1]
Affiliations
[1] Univ Toronto, Dept Mat Sci & Engn, 27 Kings Coll Cir, Toronto, ON, Canada
[2] Natl Inst Stand & Technol, Mat Measurement Lab, 100 Bur Dr, Gaithersburg, MD USA
[3] Theiss Res, La Jolla, CA 92037 USA
[4] Nat Resources Canada, Canmet MATERIALS, 183 Longwood Rd South, Hamilton, ON, Canada
DOI
10.1038/s41524-023-01012-9
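Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Subject classification codes
070304; 081704;
Abstract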
Recent advances in machine learning (ML) have led to substantial performance improvement in material database benchmarks, but an excellent benchmark score may not imply good generalization performance. Here we show that ML models trained on Materials Project 2018 can have severely degraded performance on new compounds in Materials Project 2021 due to the distribution shift. We discuss how to foresee the issue with a few simple tools. Firstly, the uniform manifold approximation and projection (UMAP) can be used to investigate the relation between the training and test data within the feature space. Secondly, the disagreement between multiple ML models on the test data can illuminate out-of-distribution samples. We demonstrate that the UMAP-guided and query by committee acquisition strategies can greatly improve prediction accuracy by adding only 1% of the test data. We believe this work provides valuable insights for building databases and models that enable better robustness and generalizability.
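The query-by-committee idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a committee of linear least-squares models fitted on bootstrap resamples using NumPy, and it uses the standard deviation of the committee's predictions as the disagreement measure. The function names `fit_committee`, `committee_disagreement`, and `select_queries` are hypothetical, chosen for this example only.

```python
import numpy as np


def _with_bias(X):
    """Append a constant column so the linear model has an intercept."""
    return np.hstack([X, np.ones((len(X), 1))])


def fit_committee(X, y, n_members=10, seed=0):
    """Fit linear least-squares models on bootstrap resamples of (X, y).

    Returns an array of weights with shape (n_members, n_features + 1).
    """
    rng = np.random.default_rng(seed)
    Xb = _with_bias(X)
    members = []
    for _ in range(n_members):
        # Sample training rows with replacement (bootstrap).
        idx = rng.integers(0, len(X), size=len(X))
        w, *_ = np.linalg.lstsq(Xb[idx], y[idx], rcond=None)
        members.append(w)
    return np.array(members)


def committee_disagreement(members, X):
    """Standard deviation of the committee's predictions for each row of X."""
    preds = _with_bias(X) @ members.T  # shape (n_samples, n_members)
    return preds.std(axis=1)


def select_queries(disagreement, fraction=0.01):
    """Indices of the most-disputed samples, covering `fraction` of the pool."""
    n_query = max(1, int(round(fraction * len(disagreement))))
    return np.argsort(disagreement)[::-1][:n_query]
```

In this sketch, samples that lie far from the training data in feature space tend to receive larger disagreement scores, so `select_queries` would pick them first for labeling. The 1% default for `fraction` mirrors the figure quoted in the abstract; the model family and committee size are arbitrary choices for illustration.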
Pages: 9