Prediction of HPLC Retention Index Using Artificial Neural Networks and IGroup E-State Indices

Cited by: 25
Authors
Albaugh, Daniel R. [1]
Hall, L. Mark [2]
Hill, Dennis W. [1]
Kertesz, Tzipporah M. [1]
Parham, Marc [3]
Hall, Lowell H. [4]
Grant, David F. [1]
Affiliations
[1] Univ Connecticut, Dept Pharmaceut Sci, Sch Pharm, Storrs, CT 06269 USA
[2] Hall Associates Consulting, Quincy, MA 02170 USA
[3] Interact Anal, Bedford, MA 01730 USA
[4] Eastern Nazarene Coll, Quincy, MA 02170 USA
Keywords
STRUCTURE-INFORMATION; MOLECULAR CONNECTIVITY; ORGANIC-COMPOUNDS; BINDING; MODELS; MLR; VALIDATION; PESTICIDES; PHASE; TIMES;
DOI
10.1021/ci9000162
Chinese Library Classification (CLC) number
R914 [Medicinal Chemistry];
Discipline classification code
100701 ;
Abstract
A back-propagation artificial neural network (ANN) was used to create a 10-fold leave-10%-out cross-validated ensemble model of high-performance liquid chromatography retention index (HPLC-RI) for a data set of 498 diverse druglike compounds. A 10-fold multiple linear regression (MLR) ensemble model of the same data was developed for comparison. Molecular structure was described using IGroup E-state indices, a novel set of structure-information representation (SIR) descriptors, along with molecular connectivity chi and kappa indices and other previously reported SIR descriptors. The same input descriptors were used to develop models with both learning algorithms. The MLR model yielded marginally acceptable statistics, with training correlation r(2) = 0.65 and mean absolute error (MAE) = 83 RI units. External validation on 104 compounds not used for model development yielded validation v(2) = 0.49 and MAE = 73 RI units. The distribution of residuals for the fit and validation data sets suggests a nonlinear relationship between retention index and molecular structure as described by the SIR indices. Not surprisingly, the ANN model was significantly more accurate for both training and validation, with training set r(2) = 0.93, MAE = 30 RI units and validation v(2) = 0.84, MAE = 41 RI units. For the ANN model, 91% of validation predictions were within 100 RI units of the experimental value.
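The 10-fold leave-10%-out ensemble procedure and the reported statistics can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only the MLR side, uses synthetic descriptors and retention indices (only the set sizes 498 and 104 come from the abstract), and assumes that ensemble predictions are the plain average of the ten fold models and that r(2) and v(2) follow the coefficient-of-determination formula. All function names are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Apply one fitted MLR model to a descriptor matrix."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def train_ensemble(X, y, n_folds=10, seed=0):
    """Fit one model per fold, each on the ~90% of rows left after holding that fold out."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    models = []
    for held_out in folds:
        keep = np.setdiff1d(np.arange(len(X)), held_out)
        models.append(fit_mlr(X[keep], y[keep]))
    return models

def predict_ensemble(models, X):
    """Assumption: the ensemble prediction is the mean of the fold-model predictions."""
    return np.mean([predict_mlr(c, X) for c in models], axis=0)

def mae(y_true, y_pred):
    """Mean absolute error, in the same units as y (RI units in the abstract)."""
    return float(np.mean(np.abs(y_true - y_pred)))

def r_squared(y_true, y_pred):
    """Coefficient of determination (assumed definition of r(2) and v(2))."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic stand-in data: 5 made-up descriptors, linear signal plus noise.
rng = np.random.default_rng(42)
n_train, n_val = 498, 104  # set sizes taken from the abstract
X_all = rng.normal(size=(n_train + n_val, 5))
weights = 100.0 * np.array([1.0, -0.5, 0.3, 0.8, -0.2])  # hypothetical
y_all = 500.0 + X_all @ weights + rng.normal(scale=10.0, size=n_train + n_val)

X_train, y_train = X_all[:n_train], y_all[:n_train]
X_val, y_val = X_all[n_train:], y_all[n_train:]

models = train_ensemble(X_train, y_train)
val_pred = predict_ensemble(models, X_val)

print("validation MAE:", round(mae(y_val, val_pred), 1))
print("validation v2:", round(r_squared(y_val, val_pred), 3))
print("fraction within 100 units:", float(np.mean(np.abs(y_val - val_pred) <= 100.0)))
```

Because the synthetic data are linear by construction, the MLR ensemble fits them well here; the abstract's point is that the real retention data are not linear in the descriptors, which is why the ANN ensemble outperformed MLR.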
Pages: 788-799
Page count: 12
Related papers
37 in total
[1]   The compressed feature matrix - a fast method for feature based substructure search [J].
Abolmaali, SFB ;
Wegner, JK ;
Zell, A .
JOURNAL OF MOLECULAR MODELING, 2003, 9 (04) :235-241
[2]   Quantitative study of the structure-retention index relationship in the imine family [J].
Acevedo-Martínez, J ;
Escalona-Arranz, JC ;
Villar-Rojas, A ;
Téllez-Palmero, F ;
Pérez-Rosés, R ;
González, L ;
Carrasco-Velar, R .
JOURNAL OF CHROMATOGRAPHY A, 2006, 1102 (1-2) :238-244
[3]   The Emergent neural modeling system [J].
Aisa, Brad ;
Mingus, Brian ;
O'Reilly, Randy .
NEURAL NETWORKS, 2008, 21 (08) :1146-1152
[4]  
DANIEL D, 1980, FITTING EQUATIONS DA
[5]   NEURAL NETWORKS AND THE BIAS VARIANCE DILEMMA [J].
GEMAN, S ;
BIENENSTOCK, E ;
DOURSAT, R .
NEURAL COMPUTATION, 1992, 4 (01) :1-58
[6]   The predicting study for chromatographic retention index of saturated alcohols by MLR and ANN [J].
Guo, WQ ;
Lu, Y ;
Zheng, XM .
TALANTA, 2000, 51 (03) :479-488
[7]   Quantitative structure-property relationship study of retention time of some pesticides in gas chromatography [J].
Hadjmohammadi, M. R. ;
Fatemi, M. H. ;
Kamel, K. .
JOURNAL OF CHROMATOGRAPHIC SCIENCE, 2007, 45 (07) :400-404
[8]  
*HALL ASS CONS, 2008, WINMOLCONN
[9]  
Hall L. H., 1991, Reviews in Computational Chemistry, P367, DOI 10.1002/9780470125793.ch9
[10]   QSAR modeling based on structure-information for properties of interest in human health [J].
Hall, LH ;
Hall, LM .
SAR AND QSAR IN ENVIRONMENTAL RESEARCH, 2005, 16 (1-2) :13-41