Reliable and explainable machine-learning methods for accelerated material discovery

Authors
Bhavya Kailkhura
Brian Gallagher
Sookyung Kim
Anna Hiszpanski
T. Yong-Jin Han
Affiliations
[1] Lawrence Livermore National Laboratory, Center for Applied Scientific Computing, Computing Directorate
[2] Lawrence Livermore National Laboratory, Materials Science Division, Physical and Life Sciences Directorate
Source
npj Computational Materials, volume 5
Abstract
Despite the impressive performance of machine learning (ML) in commercial applications, several unique challenges arise when applying ML in materials science. In this context, the contributions of this work are twofold. First, we identify common pitfalls of existing ML techniques when learning from underrepresented/imbalanced materials data. Specifically, we show that with imbalanced data, standard methods for assessing the quality of ML models break down and lead to misleading conclusions. Furthermore, we find that the model’s own confidence score cannot be trusted, and that model introspection methods (using simpler models) do not help because they result in a loss of predictive performance (the reliability-explainability trade-off). Second, to overcome these challenges, we propose a general-purpose explainable and reliable machine-learning framework. Specifically, we propose a generic pipeline that employs an ensemble of simpler models to reliably predict material properties. We also propose a transfer learning technique and show that the performance loss due to the models’ simplicity can be overcome by exploiting correlations among different material properties. We further propose a new evaluation metric and a trust score to better quantify the confidence in the predictions. To improve interpretability, we add a rationale generator component to our framework, which provides both model-level and decision-level explanations. Finally, we demonstrate the versatility of our technique on two applications: (1) predicting properties of crystalline compounds and (2) identifying potentially stable solar cell materials. We also point to some outstanding issues yet to be resolved for a successful application of ML in materials science.
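The abstract's claim that standard quality measures break down on imbalanced data can be illustrated with a toy classification case. The sketch below is not the evaluation metric proposed in the paper, and the labels are made up for illustration: it only shows how a model that never predicts the rare class can still score a high overall accuracy, while a per-class view exposes the failure. The helper names `accuracy` and `per_class_recall` are hypothetical.

```python
# Toy illustration (assumed data, not from the paper): 95 samples of a
# common class (0) and 5 samples of a rare class (1).
y_true = [0] * 95 + [1] * 5

# A trivial "model" that always predicts the common class.
y_pred = [0] * 100


def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall computed separately for each class present in y_true."""
    recalls = {}
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls[label] = sum(y_pred[i] == label for i in idx) / len(idx)
    return recalls


overall = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)
# Balanced accuracy: the unweighted mean of the per-class recalls.
balanced = sum(recalls.values()) / len(recalls)

print(f"overall accuracy:  {overall:.2f}")
print(f"per-class recall:  {recalls}")
print(f"balanced accuracy: {balanced:.2f}")
```

Here the overall accuracy is dominated by the common class, whereas the rare-class recall of zero shows that the model is useless exactly where the data are underrepresented.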