Assessing Graph-based Deep Learning Models for Predicting Flash Point

Cited by: 16
Authors
Sun, Xiaoyu [1]
Krakauer, Nathaniel J. [1]
Politowicz, Alexander [1]
Chen, Wei-Ting [1]
Li, Qiying [1]
Li, Zuoyi [1]
Shao, Xianjia [1]
Sunaryo, Alfred [1]
Shen, Mingren [1]
Wang, James [1]
Morgan, Dane [1]
Affiliations
[1] Univ Wisconsin, Dept Mat Sci & Engn, 244 MSE, Madison, WI 53562 USA
Keywords
Flash point; Domain of applicability; Quantitative structure-property relationship; Neural network; Robust model prediction; Machine learning; Organosilicon compounds
DOI
10.1002/minf.201900101
Chinese Library Classification number
R914 [Medicinal chemistry]
Subject classification code
100701
Abstract
Flash points of organic molecules play an important role in preventing flammability hazards. Large databases of measured values exist, but millions of compounds remain unmeasured. To rapidly extend existing data to new compounds, many researchers have used quantitative structure-property relationship (QSPR) analysis to predict flash points. In recent years, graph-based deep learning (GBDL) has emerged as a powerful alternative to traditional QSPR. In this paper, GBDL models were applied to flash point prediction for the first time. We assessed the performance of two GBDL models, a message-passing neural network (MPNN) and a graph convolutional neural network (GCNN), by comparing them against 12 previous QSPR studies that used more traditional methods. Our results show that MPNN outperforms GCNN and performs slightly worse than, but comparably to, previous QSPR studies. The average R² and mean absolute error (MAE) of MPNN are, respectively, 2.3 % lower and 2.0 K higher than those of previous comparable studies. To further explore GBDL models, we collected the largest flash point dataset to date, which contains 10575 unique molecules. The optimized MPNN gives a test-data R² of 0.803 and an MAE of 17.8 K on the complete dataset. We also extracted 5 datasets from our integrated dataset based on molecular type (acids, organometallics, organogermaniums, organosilicons, and organotins) and explored the quality of the model in these classes.
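The abstract reports model quality as R² and MAE. As a minimal sketch of how those two scores are commonly computed, the snippet below assumes R² means the coefficient of determination (the record does not say which definition the authors used). The function names and the sample flash points are hypothetical and are not data from the paper.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between measured and predicted values (e.g. in K)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition of R²)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all measured values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical flash points in kelvin, for illustration only.
measured = [300.0, 320.0, 340.0, 360.0]
predicted = [310.0, 310.0, 350.0, 350.0]

print("MAE (K):", mean_absolute_error(measured, predicted))
print("R²:", r2_score(measured, predicted))
```

MAE keeps the unit of the target, which is why the abstract quotes it in kelvin, while R² is dimensionless.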
Pages: 13