Resolving Transition Metal Chemical Space: Feature Selection for Machine Learning and Structure-Property Relationships

Times cited: 199
Authors
Janet, Jon Paul [1]
Kulik, Heather J. [1]
Affiliations
[1] MIT, Dept Chem Engn, Cambridge, MA 02139 USA
Funding
National Science Foundation (USA)
Keywords
POTENTIAL-ENERGY SURFACES; DENSITY-FUNCTIONAL THEORY; REDOX POTENTIALS; ELECTRONIC-STRUCTURE; QUANTUM-CHEMISTRY; NEURAL-NETWORKS; DESIGN; SPIN; COMPLEXES; MOLECULES;
DOI
10.1021/acs.jpca.7b08750
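Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Discipline classification codes
070304; 081704
Abstract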
Machine learning (ML) of quantum mechanical properties shows promise for accelerating chemical discovery. For transition metal chemistry, where accurate calculations are computationally costly and available training data sets are small, the molecular representation becomes a critical ingredient in ML model predictive accuracy. We introduce a series of revised autocorrelation functions (RACs) that encode relationships between heuristic atomic properties (e.g., size, connectivity, and electronegativity) on a molecular graph. We alter the starting point, scope, and nature of the quantities evaluated in standard autocorrelations (ACs) to make these RACs amenable to inorganic chemistry. On an organic molecule set, we first demonstrate that standard ACs outperform other presently available topological descriptors for ML model training, with mean unsigned errors (MUEs) for atomization energies on set-aside test molecules as low as 6 kcal/mol. For inorganic chemistry, our RACs yield 1 kcal/mol ML MUEs on set-aside test molecules in spin-state splitting, in comparison to 15-20x higher errors for feature sets that encode whole-molecule structural information. Systematic feature selection methods including univariate filtering, recursive feature elimination, and direct optimization (e.g., random forest and LASSO) are compared. Random-forest- or LASSO-selected subsets 4-5x smaller than the full RAC set produce sub- to 1 kcal/mol spin-splitting MUEs, with good transferability to metal-ligand bond length prediction (0.004-0.005 Å MUE) and redox potential on a smaller data set (0.2-0.3 eV MUE). Evaluation of feature selection results across property sets reveals the relative importance of local, electronic descriptors (e.g., electronegativity, atomic number) in spin splitting and distal, steric effects in redox potential and bond lengths.
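To make the descriptor idea in the abstract concrete, the following is a minimal sketch of a graph autocorrelation, assuming the standard form in which products of an atomic property are summed over all ordered atom pairs separated by a fixed number of bonds. The function names, the adjacency-list input, and the `start_atoms` argument (used here to imitate restricting the starting point to a chosen atom such as a metal center) are illustrative assumptions, not the authors' implementation.

```python
from collections import deque


def graph_distances(adjacency, start):
    """Breadth-first search: bond-path distance from `start` to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def autocorrelation(adjacency, prop, depth, start_atoms=None):
    """Sum prop[i] * prop[j] over ordered pairs (i, j) exactly `depth` bonds apart.

    If `start_atoms` is given (hypothetical option), i runs only over those
    atoms; otherwise i runs over every atom in the graph.
    """
    starts = range(len(adjacency)) if start_atoms is None else start_atoms
    total = 0.0
    for i in starts:
        for j, d in graph_distances(adjacency, i).items():
            if d == depth:
                total += prop[i] * prop[j]
    return total


# Toy graph: a water-like molecule, atom 0 (O) bonded to atoms 1 and 2 (H).
adjacency = [[1, 2], [0], [0]]
atomic_number = [8, 1, 1]

full_scope = [autocorrelation(adjacency, atomic_number, d) for d in range(3)]
from_atom_0 = autocorrelation(adjacency, atomic_number, 1, start_atoms=[0])
```

On this toy graph the depth-0 term is the sum of squared atomic numbers, and restricting the start to atom 0 keeps only the products that involve that atom. Other heuristic properties (electronegativity, connectivity, size) could be substituted for `atomic_number` in the same way.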
Pages: 8939-8954
Page count: 16