Metrics for Benchmarking and Uncertainty Quantification: Quality, Applicability, and Best Practices for Machine Learning in Chemistry

Times cited: 38
Authors
Vishwakarma, Gaurav [1]
Sonpal, Aditya [1]
Hachmann, Johannes [1, 2, 3]
Affiliations
[1] Univ Buffalo, State Univ New York, Dept Chem & Biol Engn, Buffalo, NY 14260 USA
[2] Univ Buffalo, State Univ New York, Computat & Data Enabled Sci & Engn Grad Program, Buffalo, NY 14260 USA
[3] New York State Ctr Excellence Mat Informat, Buffalo, NY 14203 USA
Source
TRENDS IN CHEMISTRY | 2021 / Volume 3 / Issue 02
Keywords
ABSOLUTE ERROR MAE; MOLECULAR DESCRIPTOR; ACCURACY; MODEL; RELIABILITY; DOMAIN; INDEX; RMSE
DOI
10.1016/j.trechm.2020.12.004
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
This review aims to draw attention to two issues of concern when we set out to make machine learning work in the chemical and materials domain, that is, statistical loss function metrics for the validation and benchmarking of data-derived models, and the uncertainty quantification of predictions made by them. They are often overlooked or underappreciated topics as chemists typically only have limited training in statistics. Aside from helping to assess the quality, reliability, and applicability of a given model, these metrics are also key to comparing the performance of different models and thus for developing guidelines and best practices for the successful application of machine learning in chemistry.
Pages: 146-156
Page count: 11