A strategy to apply machine learning to small datasets in materials science

Cited by: 0
Authors
Zhang, Ying [1]
Ling, Chen [1]
Affiliations
[1] Toyota Res Inst North Amer, 1555 Woodridge Ave, Ann Arbor, MI 48105 USA
Keywords
THERMAL-CONDUCTIVITY; MATERIALS INFORMATICS; MODELS; PREDICTIONS;
DOI
10.1038/s41524-018-0081-z
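Chinese Library Classification (CLC) number
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline classification code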
070304 ; 081704 ;
Abstract
There is growing interest in applying machine learning techniques to materials science research. However, although it is recognized that materials datasets are typically smaller and sometimes more diverse than those in other fields, the influence of the availability of materials data on training machine learning models has not yet been studied, which hinders the establishment of accurate predictive rules from small materials datasets. Here we analyzed the fundamental interplay between the availability of materials data and the predictive capability of machine learning models. Instead of affecting the model precision directly, the effect of data size is mediated by the degree of freedom (DoF) of the model, resulting in an association between precision and DoF. The appearance of the precision-DoF association signals underfitting and is characterized by a large prediction bias, which in turn restricts accurate prediction in unknown domains. We proposed incorporating a crude estimation of the property into the feature space to establish ML models from small materials datasets, which increases prediction accuracy without the cost of a higher DoF. In three case studies, predicting the band gap of binary semiconductors, lattice thermal conductivity, and elastic properties of zeolites, the integration of crude estimation boosted the predictive capability of machine learning models to state-of-the-art levels, demonstrating the generality of the proposed strategy for constructing accurate machine learning models from small materials datasets.
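The idea of adding a crude property estimate to the feature space can be illustrated with a minimal sketch. It is not the authors' models or data. The dataset is synthetic, and the names fit_linear, predict_linear, rmse, signal, and crude are hypothetical, introduced only for illustration. The crude estimate is assumed to be a cheap approximation that follows the true trend but carries a systematic offset, a scale error, and noise. A plain linear model is fitted to 30 training points twice, once on the raw descriptors and once with the crude estimate appended as a single extra feature.

```python
import numpy as np


def fit_linear(X, y):
    """Fit ordinary least squares with an intercept; return the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply coefficients from fit_linear to new rows of X."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def rmse(y_true, y_pred):
    """Root-mean-square error between two 1-D arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


rng = np.random.default_rng(0)
n_train, n_test = 30, 200  # deliberately small training set
n = n_train + n_test

# Synthetic stand-in for materials descriptors (assumption, not real data).
X = rng.uniform(-1.0, 1.0, size=(n, 2))

# Hypothetical "true" property: nonlinear in the descriptors, plus small noise.
signal = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2
y = signal + 0.05 * rng.normal(size=n)

# Hypothetical crude estimate: right trend, wrong scale and offset, noisier.
crude = 0.8 * signal + 0.3 + 0.1 * rng.normal(size=n)

# Augmented feature space: raw descriptors plus the crude estimate.
X_aug = np.column_stack([X, crude])

train, test = slice(0, n_train), slice(n_train, n)

coef_base = fit_linear(X[train], y[train])     # 3 coefficients
coef_aug = fit_linear(X_aug[train], y[train])  # 4 coefficients

rmse_baseline = rmse(y[test], predict_linear(coef_base, X[test]))
rmse_augmented = rmse(y[test], predict_linear(coef_aug, X_aug[test]))

print(f"baseline RMSE:  {rmse_baseline:.3f}")
print(f"augmented RMSE: {rmse_augmented:.3f}")
```

In this toy setting the augmented model adds one coefficient, while the nonlinear structure is supplied by the crude estimate rather than by a more flexible model. Whether a comparable gain appears on a real dataset depends on how well the crude estimate tracks the target property.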
Pages: 8
Related papers
50 in total
  • [1] A strategy to apply machine learning to small datasets in materials science
    Ying Zhang
    Chen Ling
    npj Computational Materials, 4
  • [2] Small data machine learning in materials science
    Xu, Pengcheng
    Ji, Xiaobo
    Li, Minjie
    Lu, Wencong
    NPJ COMPUTATIONAL MATERIALS, 2023, 9 (01)
  • [3] Small data machine learning in materials science
    Pengcheng Xu
    Xiaobo Ji
    Minjie Li
    Wencong Lu
    npj Computational Materials, 9
  • [4] Practical feature filter strategy to machine learning for small datasets in chemistry
    Hu, Yang
    Sandt, Roland
    Spatschek, Robert
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [5] Improving machine-learning models in materials science through large datasets
    Schmidt, Jonathan
    Cerqueira, Tiago F. T.
    Romero, Aldo H.
    Loew, Antoine
    Jager, Fabian
    Wang, Hai-Chen
    Botti, Silvana
    Marques, Miguel A. L.
    MATERIALS TODAY PHYSICS, 2024, 48
  • [6] Machine learning strategies for small sample size in materials science
    Tao, Qiuling
    Yu, JinXin
    Mu, Xiangyu
    Jia, Xue
    Shi, Rongpei
    Yao, Zhifu
    Wang, Cuiping
    Zhang, Haijun
    Liu, Xingjun
    SCIENCE CHINA-MATERIALS, 2025, 68 (02) : 387 - 405
  • [7] A machine learning approach for corrosion small datasets
    Totok Sutojo
    Supriadi Rustad
    Muhamad Akrom
    Abdul Syukur
    Guruh Fajar Shidik
    Hermawan Kresno Dipojono
    npj Materials Degradation, 7
  • [8] A machine learning approach for corrosion small datasets
    Sutojo, Totok
    Rustad, Supriadi
    Akrom, Muhamad
    Syukur, Abdul
    Shidik, Guruh Fajar
    Dipojono, Hermawan Kresno
    NPJ MATERIALS DEGRADATION, 2023, 7 (01)
  • [9] Machine learning in materials science
    Wei, Jing
    Chu, Xuan
    Sun, Xiang-Yu
    Xu, Kun
    Deng, Hui-Xiong
    Chen, Jigen
    Wei, Zhongming
    Lei, Ming
    INFOMAT, 2019, 1 (03) : 338 - 358
  • [10] Machine Learning Methods with Noisy, Incomplete or Small Datasets
    Caiafa, Cesar F.
    Sun, Zhe
    Tanaka, Toshihisa
    Marti-Puig, Pere
    Sole-Casals, Jordi
    APPLIED SCIENCES-BASEL, 2021, 11 (09):