Pairwise Difference Regression: A Machine Learning Meta-algorithm for Improved Prediction and Uncertainty Quantification in Chemical Search

Cited by: 24
Authors
Tynes, Michael [3, 4]
Gao, Wenhao [1, 2]
Burrill, Daniel J. [3, 4]
Batista, Enrique R. [3, 4]
Perez, Danny [3]
Yang, Ping [3]
Lubbers, Nicholas [1]
Affiliations
[1] Los Alamos Natl Lab, Comp Computat & Stat Sci Div, Comp, Los Alamos, NM 87545 USA
[2] MIT, Dept Chem Engn, Cambridge, MA 02139 USA
[3] Los Alamos Natl Lab, Theoret Div, Los Alamos, NM 87545 USA
[4] Los Alamos Natl Lab, Ctr Nonlinear Studies, Los Alamos, NM 87545 USA
Keywords
OPTIMIZATION; DESIGN; CLASSIFICATION; DISCOVERY;
DOI
10.1021/acs.jcim.1c00670
Chinese Library Classification (CLC) number
R914 [Medicinal Chemistry];
Discipline classification code
100701;
Abstract
Machine learning (ML) plays a growing role in the design and discovery of chemicals, aiming to reduce the need to perform expensive experiments and simulations. ML for such applications is promising but difficult, as models must generalize to vast chemical spaces from small training sets and must have reliable uncertainty quantification metrics to identify and prioritize unexplored regions. Ab initio computational chemistry and chemical intuition alike often take advantage of differences between chemical conditions, rather than their absolute structure or state, to generate more reliable results. We have developed an analogous comparison-based approach for ML regression, called pairwise difference regression (PADRE), which is applicable to arbitrary underlying learning models and operates on pairs of input data points. During training, the model learns to predict differences between all possible pairs of input points. During prediction, the test points are paired with all training set points, giving rise to a set of predictions that can be treated as a distribution of which the mean is treated as a final prediction and the dispersion is treated as an uncertainty measure. Pairwise difference regression was shown to reliably improve the performance of the random forest algorithm across five chemical ML tasks. Additionally, the pair-derived dispersion is both well correlated with model error and performs well in active learning. We also show that this method is competitive with state-of-the-art neural network techniques. Thus, pairwise difference regression is a promising tool for candidate selection algorithms used in chemical discovery.
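The training and prediction steps described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. Several choices here are assumptions made only for illustration: the pair representation (both inputs concatenated with their elementwise difference), the use of the population standard deviation as the dispersion measure, and a tiny k-nearest-neighbour regressor standing in for the random forest reported in the abstract. The names `pair_features`, `KNNRegressor`, `padre_fit`, and `padre_predict` are hypothetical.

```python
import statistics


def pair_features(a, b):
    # Assumed pair representation: both inputs plus their elementwise difference.
    return list(a) + list(b) + [ai - bi for ai, bi in zip(a, b)]


class KNNRegressor:
    """Stand-in base learner (k nearest neighbours, squared Euclidean distance).

    Any regressor could be used here; the abstract reports results with a
    random forest, which this sketch does not implement.
    """

    def __init__(self, k=1):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        # Sort training rows by distance to x and average the k closest targets.
        scored = sorted(
            (sum((xi - zi) ** 2 for xi, zi in zip(x, z)), t)
            for z, t in zip(self.X, self.y)
        )
        nearest = scored[: self.k]
        return sum(t for _, t in nearest) / len(nearest)


def padre_fit(X, y, base):
    """Train the base learner to predict y_i - y_j for all ordered pairs (i, j)."""
    pairs, diffs = [], []
    for xi, yi in zip(X, y):
        for xj, yj in zip(X, y):
            pairs.append(pair_features(xi, xj))
            diffs.append(yi - yj)
    return base.fit(pairs, diffs)


def padre_predict(model, X_train, y_train, x_new):
    """Pair x_new with every training point and aggregate the estimates.

    Each pair yields one estimate: predicted difference plus the known
    training target. Returns (mean, dispersion) over those estimates.
    """
    estimates = [
        model.predict_one(pair_features(x_new, xt)) + yt
        for xt, yt in zip(X_train, y_train)
    ]
    return statistics.mean(estimates), statistics.pstdev(estimates)


if __name__ == "__main__":
    # Toy one-dimensional data, invented for this example: y = 2 * x.
    X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    y = [2.0 * row[0] for row in X]
    model = padre_fit(X, y, KNNRegressor(k=1))
    mean, spread = padre_predict(model, X, y, [2.5])
    print(f"prediction={mean:.3f}, dispersion={spread:.3f}")
```

Because every ordered pair is used, a training set of n points produces n² training rows, so this brute-force form is only practical for small data sets. In a candidate-selection loop, the returned dispersion would serve as the uncertainty score used to rank unexplored points.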
Pages: 3846-3857
Number of pages: 12