Next generation pure component property estimation models: With and without machine learning techniques

Cited by: 57
Authors
Alshehri, Abdulelah S. [1, 2]
Tula, Anjan K. [3]
You, Fengqi [1]
Gani, Rafiqul [4]
Affiliations
[1] Cornell Univ, Robert Frederick Smith Sch Chem & Biomol Engn, Ithaca, NY USA
[2] King Saud Univ, Coll Engn, Dept Chem Engn, Riyadh, Saudi Arabia
[3] Zhejiang Univ, Coll Control Sci & Engn, Hangzhou, Peoples R China
[4] Korea Adv Inst Sci & Technol KAIST, Dept Chem & Biomol Engn, Daejeon, South Korea
Keywords
data analysis; group-contribution; machine learning; pure component property prediction; GAUSSIAN-PROCESSES; ORGANIC-COMPOUNDS; DESIGN; PREDICTION; SELECTION; PRODUCT;
DOI
10.1002/aic.17469
Chinese Library Classification (CLC) number
TQ [Chemical industry];
Discipline classification code
0817;
Abstract
Physicochemical properties of pure components serve as the basis for the design and simulation of chemical products and processes. This work presents models, based on the molecular structural information of chemicals, for the following 25 pure component properties: (critical-) temperature, pressure, volume, acentric factor; (normal-) boiling point, melting point, auto-ignition temperature; flash point; (standard-) enthalpy of formation, Gibbs energy of formation, enthalpy of fusion, enthalpy of vaporization, liquid molar volume; (environmental-) (lethal dose-) LC50 and LD50, photochemical oxidation potential, bioconcentration factor, permissible exposure limit; (physicochemical-) acid dissociation constant, water solubility, octanol-water partition coefficient, Hildebrand solubility parameter, Hansen solubility parameters. Using functional groups for molecular representation, two parallel sets of property estimation models are presented: in one, the group contributions for each property are regressed through traditional regression techniques; in the other, through machine learning techniques. Both techniques apply an a priori data analysis before regression of the model parameters. A dataset of more than 24,000 chemicals covering the 25 pure component properties was used to develop the two sets of property models. The efficacy of the developed models and their use are highlighted, together with a discussion of their overall performance, application range, and predictive capabilities, with implications for product and/or process engineering problem solutions.
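To make the group-contribution idea in the abstract concrete, the following is a minimal sketch of the traditional-regression variant, assuming a simple first-order linear form in which a property is the sum of group occurrence counts times per-group contributions. The abstract does not give the paper's actual model equations, group set, or data, so the group names, the occurrence matrix, and the contribution values here are hypothetical and the target values are synthetic.

```python
import numpy as np

# Hypothetical functional groups; a real model would use a far larger set.
GROUPS = ["CH3", "CH2", "OH"]

# Toy occurrence matrix (assumed for illustration): one row per molecule,
# one column per group, entries are how often the group appears.
# Rows: ethane, propane, butane, ethanol, 1-propanol
counts = np.array([
    [2, 0, 0],
    [2, 1, 0],
    [2, 2, 0],
    [1, 1, 1],
    [1, 2, 1],
], dtype=float)

# Synthetic contributions, used only to generate illustrative targets.
# These are NOT values from the paper.
true_contrib = np.array([1.5, 0.8, 3.2])
y = counts @ true_contrib


def fit_group_contributions(counts, y):
    """Regress per-group contributions by ordinary least squares."""
    coeffs, *_ = np.linalg.lstsq(counts, y, rcond=None)
    return coeffs


def predict(counts, contrib):
    """Estimate the property as the sum of count * contribution."""
    return counts @ contrib


contrib = fit_group_contributions(counts, y)

# Estimate the property for a molecule outside the fitting set
# (1-butanol: one CH3, three CH2, one OH).
new_molecule = np.array([[1.0, 3.0, 1.0]])
pred = predict(new_molecule, contrib)

for name, value in zip(GROUPS, contrib):
    print(f"{name}: {value:.3f}")
print(f"predicted property: {pred[0]:.3f}")
```

The machine learning variant mentioned in the abstract would keep the same group-count representation as input but replace the least-squares step with a learned regressor; which regressor the authors use is not stated in this record beyond the keyword list, so it is not sketched here.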
Pages: 16