Predicting Materials Properties with Little Data Using Shotgun Transfer Learning

Cited by: 297
Authors
Yamada, Hironao [1 ]
Liu, Chang [1 ,2 ]
Wu, Stephen [1 ,3 ]
Koyama, Yukinori [2 ]
Ju, Shenghong [4 ]
Shiomi, Junichiro [2 ,4 ]
Morikawa, Junko [2 ,5 ]
Yoshida, Ryo [1 ,2 ,3 ]
Affiliations
[1] Inst Stat Math, Res Org Informat & Syst, Tachikawa, Tokyo 1908562, Japan
[2] Natl Inst Mat Sci, Tsukuba, Ibaraki 3050047, Japan
[3] Grad Univ Adv Studies, Tachikawa, Tokyo 1908562, Japan
[4] Univ Tokyo, Bunkyo Ku, Tokyo 1138656, Japan
[5] Tokyo Inst Technol, Meguro Ku, Tokyo 1528550, Japan
Funding
Japan Science and Technology Agency (JST); Japan Society for the Promotion of Science (JSPS);
Keywords
THERMAL-DIFFUSIVITY; DESIGN;
DOI
10.1021/acscentsci.9b00804
Chinese Library Classification
O6 [Chemistry];
Discipline code
0703;
Abstract
There is a growing demand for machine learning (ML) to derive fast-to-evaluate surrogate models of materials properties. In recent years, a broad array of materials property databases have emerged as part of a digital transformation of materials science. However, recent technological advances in ML are not fully exploited because of the insufficient volume and diversity of materials data. An ML framework called "transfer learning" has considerable potential to overcome this problem of limited materials data. Transfer learning relies on the concept that various property types, such as physical, chemical, electronic, thermodynamic, and mechanical properties, are physically interrelated. For a given target property to be predicted from a limited supply of training data, models of related proxy properties are pretrained using sufficient data; these models capture common features relevant to the target task. Repurposing such machine-acquired features on the target task yields outstanding prediction performance even with exceedingly small data sets, much as highly experienced human experts can make rational inferences even on tasks with which they have little direct experience. In this study, to facilitate widespread use of transfer learning, we develop a pretrained model library called XenonPy.MDL. In this first release, the library comprises more than 140,000 pretrained models for various properties of small molecules, polymers, and inorganic crystalline materials. Along with these pretrained models, we describe several notable successes of transfer learning in different scenarios, such as building models from only dozens of data points and improving extrapolative prediction through strategic model transfer. Remarkably, transfer learning has autonomously identified nontrivial transferability across properties that transcends the traditional disciplines of materials science; for example, our analysis has revealed underlying bridges between small molecules and polymers and between organic and inorganic chemistry.
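The pretrain-then-repurpose workflow described in the abstract can be sketched in a few lines. The toy example below (synthetic data and a hypothetical one-hidden-layer network; it does not use the actual XenonPy.MDL API) pretrains on an abundant proxy property, freezes the learned hidden features, and fits only a small linear head on 30 target samples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for two physically related properties: a "proxy"
# property with abundant data and a "target" property with only 30
# labeled materials. All names and data are illustrative.
n_features = 10
W_latent = rng.normal(size=(n_features, 4))

def proxy(X):
    return np.tanh(X @ W_latent).sum(axis=1)

X_src = rng.normal(size=(2000, n_features))
y_src = proxy(X_src)                       # abundant proxy-property data
X_tgt = rng.normal(size=(30, n_features))
y_tgt = 2.0 * proxy(X_tgt) + 1.0           # target related to the proxy

# "Pretrain": one-hidden-layer network fitted to the proxy property by
# plain batch gradient descent on squared error.
H = 16
W1 = rng.normal(scale=0.3, size=(n_features, H))
w2 = rng.normal(scale=0.3, size=H)
lr = 0.02
for _ in range(1000):
    Z = np.tanh(X_src @ W1)                # hidden features
    err = Z @ w2 - y_src
    w2_new = w2 - lr * Z.T @ err / len(y_src)
    W1 -= lr * X_src.T @ ((err[:, None] * w2) * (1.0 - Z**2)) / len(y_src)
    w2 = w2_new

# "Transfer": freeze W1 and refit only a ridge-regression head on the
# 30 target samples, reusing the machine-acquired hidden features.
def ridge(Phi, y, lam=1e-2):
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)

beta = ridge(np.tanh(X_tgt @ W1), y_tgt)

X_new = rng.normal(size=(5, n_features))
y_pred = np.tanh(X_new @ W1) @ beta        # predictions for new materials
print(y_pred.shape)                        # (5,)
```

Only the head (`beta`) sees the scarce target data; the hidden layer encodes features learned from the data-rich proxy task, which is the essence of the transfer scheme the paper describes.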
Pages: 1717-1730
Page count: 14