Support vector machine regression (LS-SVM): an alternative to artificial neural networks (ANNs) for the analysis of quantum chemistry data?

Cited by: 171
Authors
Balabin, Roman M. [1 ]
Lomakina, Ekaterina I. [2 ]
Affiliations
[1] ETH, Dept Chem & Appl Biosci, CH-8093 Zurich, Switzerland
[2] ETH, Dept Comp Sci, CH-8093 Zurich, Switzerland
Keywords
NEAR-INFRARED SPECTROSCOPY; COMBINED 1ST-PRINCIPLES CALCULATION; ALKANES RAMAN-SPECTROSCOPY; POTENTIAL-ENERGY SURFACES; DENSITY-FUNCTIONAL THEORY; NIR SPECTROSCOPY; N-PENTANE; GASOLINE CLASSIFICATION; ENTHALPY DIFFERENCE; BASE STOCK;
DOI
10.1039/c1cp00051a
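Chinese Library Classification (CLC) number
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline classification code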
070304 ; 081704 ;
Abstract
A multilayer feed-forward artificial neural network (MLP-ANN) with a single hidden layer that contains a finite number of neurons can be regarded as a universal non-linear approximator. Today, the ANN method and the multiple linear regression (MLR) model are widely used in quantum chemistry (QC) data analysis (e.g., thermochemistry) to improve the accuracy of theoretical methods (e.g., Gaussian G2-G4, B3LYP/B3-LYP, X1, or W1). This study uses an alternative approach based on support vector machines (SVMs): least squares support vector machine (LS-SVM) regression. It was applied to ab initio (first-principles) and density functional theory (DFT) quantum chemistry data, so the QC + SVM methodology is an alternative to the QC + ANN one. The task was to estimate Møller-Plesset (MPn) or DFT (B3LYP, BLYP, BMK) energies calculated with large basis sets (e.g., 6-311G(3df,3pd)) from energies calculated with smaller ones (6-311G, 6-311G*, 6-311G**) plus molecular descriptors. A molecular set (BRM-208) containing 208 organic molecules was constructed and used for LS-SVM training, cross-validation, and testing. The MP2, MP3, MP4(DQ), MP4(SDQ), and MP4/MP4(SDTQ) ab initio methods were tested, and Hartree-Fock (HF/SCF) results are reported for comparison. Constitutional descriptors (CD: total number of atoms and mole fractions of different atoms) and quantum-chemical descriptors (QD: HOMO-LUMO gap, dipole moment, average polarizability, and quadrupole moment) were used to build the LS-SVM calibration model. Prediction accuracies (mean absolute deviations, MADs) of 1.62 ± 0.51 and 0.85 ± 0.24 kcal mol⁻¹ (1 kcal mol⁻¹ = 4.184 kJ mol⁻¹) were reached for SVM-based approximations of ab initio and DFT energies, respectively. The LS-SVM model was more accurate than the MLR model. A comparison with the artificial neural network approach shows that the accuracy of the LS-SVM method is similar to that of the ANN.
The extrapolation and interpolation results show that LS-SVM is superior to the ANN method by almost an order of magnitude in terms of the stability, generality, and robustness of the final model. The LS-SVM model needs a much smaller number of samples (a much smaller sample set) to make accurate predictions. Potential energy surface (PES) approximations for molecular dynamics (MD) studies are discussed as a promising application of the LS-SVM calibration approach.
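As an illustration of the regression technique named in the abstract, the following is a minimal sketch of LS-SVM regression with an RBF kernel, together with a mean absolute deviation (MAD) check. It is not the authors' code and does not use the BRM-208 set: the data are a synthetic one-dimensional toy function, and the function names, the regularization parameter `gamma`, and the kernel width `sigma` are assumptions chosen for illustration only.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    # Squared Euclidean distances between every row of A and every row of B.
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    # LS-SVM training reduces to one linear system:
    #   [ 0   1^T         ] [ b     ]   [ 0 ]
    #   [ 1   K + I/gamma ] [ alpha ] = [ y ]
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, dual weights alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    # f(x) = sum_i alpha_i * K(x, x_i) + b
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

def mean_absolute_deviation(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

# Synthetic toy data (assumption): a noisy sine curve stands in for the
# correction that the paper learns from descriptors.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(60, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=60)

b, alpha = lssvm_fit(X, y, gamma=100.0, sigma=1.0)

X_test = np.linspace(-2.5, 2.5, 25)[:, None]
y_pred = lssvm_predict(X, b, alpha, X_test, sigma=1.0)
mad = mean_absolute_deviation(np.sin(X_test[:, 0]), y_pred)
print(f"MAD on the toy test grid: {mad:.4f}")
```

In a setting like the one the abstract describes, the rows of `X` would presumably hold the small-basis-set energy and the molecular descriptors, and `y` the large-basis-set energy; `gamma` and `sigma` would then be tuned by cross-validation.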
Pages: 11710-11718
Number of pages: 9