New opportunities for materials informatics: Resources and data mining techniques for uncovering hidden relationships

Times cited: 184
Authors
Jain, Anubhav [1]
Hautier, Geoffroy [2]
Ong, Shyue Ping [3]
Persson, Kristin [1, 4]
Affiliations
[1] Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Energy & Environm Technol Div, Berkeley, CA 94720 USA
[2] Catholic Univ Louvain, Inst Condensed Matter & Nanosci IMCN, B-1348 Louvain La Neuve, Belgium
[3] Univ Calif San Diego, Dept NanoEngn, La Jolla, CA 92093 USA
[4] Univ Calif Berkeley, Mat Sci & Engn, Berkeley, CA 94720 USA
Keywords
DENSITY-FUNCTIONAL THEORY; CRYSTAL-STRUCTURE; NEURAL-NETWORKS; OXIDE COMPOUNDS; DESIGN; CATHODES; INFRASTRUCTURE; SEMICONDUCTORS; PRINCIPLES; PREDICTION
DOI
10.1557/jmr.2016.80
Chinese Library Classification (CLC) number
T [Industrial technology]
Discipline classification code
08
Abstract
Data mining has revolutionized sectors as diverse as pharmaceutical drug discovery, finance, medicine, and marketing, and has the potential to similarly advance materials science. In this paper, we describe advances in simulation-based materials databases, open-source software tools, and machine learning algorithms that are converging to create new opportunities for materials informatics. We discuss the data mining techniques of exploratory data analysis, clustering, linear models, kernel ridge regression, tree-based regression, and recommendation engines. We present these techniques in the context of several materials application areas, including compound prediction, Li-ion battery design, piezoelectric materials, photocatalysts, and thermoelectric materials. Finally, we demonstrate how new data and tools are making it easier and more accessible than ever to perform data mining through a new analysis that learns trends in the valence and conduction band character of compounds in the Materials Project database using data on over 2500 compounds.
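Kernel ridge regression, one of the techniques named in the abstract, predicts a property as a kernel-weighted sum over training examples. The Python sketch below is illustrative only and is not the paper's code or data. It assumes a Gaussian (RBF) kernel, a hypothetical one-dimensional descriptor with a synthetic sine-shaped target standing in for real materials data, and arbitrary values for the kernel width `gamma` and ridge penalty `lam`. Fitting solves (K + lam*I) alpha = y for the dual coefficients alpha. A prediction at a new point x is then sum_i alpha_i * k(x, x_i).

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def krr_fit(X, y, gamma=1.0, lam=1e-3):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def krr_predict(X_train, alpha, X_new, gamma=1.0):
    """Each prediction is a kernel-weighted sum over the training points."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha


if __name__ == "__main__":
    # Synthetic toy data: a hypothetical descriptor -> property mapping,
    # NOT data from the Materials Project or from the paper.
    rng = np.random.default_rng(0)
    X = np.linspace(0.0, 3.0, 30)[:, None]
    y = np.sin(2.0 * X[:, 0]) + 0.05 * rng.standard_normal(30)
    alpha = krr_fit(X, y, gamma=2.0, lam=1e-2)
    X_test = np.array([[0.5], [1.5], [2.5]])
    print(krr_predict(X, alpha, X_test, gamma=2.0))
```

A larger `lam` gives a smoother, more strongly regularized fit. A larger `gamma` narrows each training point's region of influence. In practice, both would typically be chosen by cross-validation rather than fixed by hand as they are here.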
Pages: 977-994
Number of pages: 18
Cited references (127 in total)
[1] Inorganic structures in space group P3m1: coordinate analysis and systematic prediction of new ferroelectrics [J].
Abrahams, S. C.
ACTA CRYSTALLOGRAPHICA SECTION B-STRUCTURAL SCIENCE CRYSTAL ENGINEERING AND MATERIALS, 2008, 64: 426-437
[2] The Cambridge Structural Database: a quarter of a million crystal structures and rising [J].
Allen, FH
ACTA CRYSTALLOGRAPHICA SECTION B-STRUCTURAL SCIENCE, 2002, 58 (3, Part 1): 380-388
[3] The introduction of structure types into the inorganic crystal structure database ICSD [J].
Allmann, Rudolf; Hinek, Roland
ACTA CRYSTALLOGRAPHICA SECTION A, 2007, 63: 412-417
[4] Modern methods for robust regression.
Andersen, R.
2008, DOI: 10.4135/9781412985109
[5] A survey of cross-validation procedures for model selection [J].
Arlot, Sylvain; Celisse, Alain
STATISTICS SURVEYS, 2010, 4: 40-79
[6] Avdeev, M.
SOLID STATE IONICS, 2012, V2-5
[7] An object-oriented scripting interface to a legacy electronic structure code [J].
Bahn, SR; Jacobsen, KW
COMPUTING IN SCIENCE & ENGINEERING, 2002, 4 (03): 56-66
[8] Identifying the 'inorganic gene' for high-temperature piezoelectric perovskites through statistical learning [J].
Balachandran, Prasanna V.; Broderick, Scott R.; Rajan, Krishna
PROCEEDINGS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2011, 467 (2132): 2271-2290
[9] FactSage thermochemical software and databases - recent developments [J].
Bale, C. W.; Belisle, E.; Chartrand, P.; Decterov, S. A.; Eriksson, G.; Hack, K.; Jung, I. -H.; Kang, Y. -B.; Melancon, J.; Pelton, A. D.; Robelin, C.; Petersen, S.
CALPHAD-COMPUTER COUPLING OF PHASE DIAGRAMS AND THERMOCHEMISTRY, 2009, 33 (02): 295-311
[10] Generalized neural-network representation of high-dimensional potential-energy surfaces [J].
Behler, Joerg; Parrinello, Michele
PHYSICAL REVIEW LETTERS, 2007, 98 (14)