Statistical external validation and consensus modeling: A QSPR case study for Koc prediction

Cited by: 211
Authors
Gramatica, Paola [1]
Giani, Elisa [1]
Papa, Ester [1]
Affiliations
[1] Univ Insubria, Dept Struct & Funct Biol, QSAR Res Unit Environm Chem & Ecotoxicol, I-21100 Varese, Italy
Keywords
theoretical molecular descriptors; genetic algorithms; splitting; soil sorption coefficient; Koc; QSAR; OECD principles
DOI
10.1016/j.jmgm.2006.06.005
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The soil sorption partition coefficient (log Koc) of a heterogeneous set of 643 organic non-ionic compounds, spanning a range of more than 6 log units, is predicted by a statistically validated QSAR modeling approach. The applied multiple linear regression (ordinary least squares, OLS) is based on a variety of theoretical molecular descriptors selected by the genetic algorithms-variable subset selection (GA-VSS) procedure. The models were validated for predictivity by different internal and external validation approaches. For external validation, self-organizing maps (SOM) were applied to split the original data set: the best four-dimensional model, developed on a reduced training set of 93 chemicals, has a predictivity of 78% when applied to 550 validation chemicals (prediction set). The selected molecular descriptors, which could be interpreted through their mechanistic meaning, were compared with the more common physico-chemical descriptors log Kow and log Sw. The chemical applicability domain of each model was verified by the leverage approach in order to propose only reliable data. The best predicted data were obtained by consensus modeling from 10 different models in the genetic algorithm model population. © 2006 Elsevier Inc. All rights reserved.
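The workflow summarized in the abstract (OLS models, a leverage check of the applicability domain, external validation on a prediction set, and consensus averaging) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the data are synthetic random numbers rather than real descriptors or log Koc values, the descriptor subsets are arbitrary stand-ins for models from a GA population, the warning leverage h* = 3(p + 1)/n is a commonly used cutoff that the abstract does not state, and the external Q2 formula is one common definition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the paper's data): 4 descriptors, linear response.
n_train, n_pred, n_desc = 93, 550, 4
true_coef = np.array([0.8, -0.5, 0.3, 0.1])
X_train = rng.normal(size=(n_train, n_desc))
X_pred = rng.normal(size=(n_pred, n_desc))
y_train = X_train @ true_coef + 2.0 + rng.normal(scale=0.3, size=n_train)
y_obs = X_pred @ true_coef + 2.0 + rng.normal(scale=0.3, size=n_pred)


def add_intercept(X):
    """Prepend a column of ones so the OLS model has an intercept."""
    return np.column_stack([np.ones(len(X)), X])


def fit_ols(X, y):
    """Ordinary least squares coefficients (intercept first)."""
    coef, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
    return coef


def predict(coef, X):
    return add_intercept(X) @ coef


def leverages(X_ref, X_new):
    """Leverage h_i = x_i (A^T A)^-1 x_i^T of each new row w.r.t. the training matrix A."""
    A, B = add_intercept(X_ref), add_intercept(X_new)
    xtx_inv = np.linalg.inv(A.T @ A)
    return np.einsum("ij,jk,ik->i", B, xtx_inv, B)


def q2_ext(y_true, y_hat, y_train_mean):
    """External Q2: 1 - PRESS / SS about the training mean (one common definition)."""
    press = np.sum((y_true - y_hat) ** 2)
    ss = np.sum((y_true - y_train_mean) ** 2)
    return 1.0 - press / ss


# Hypothetical descriptor subsets standing in for models from a GA population.
subsets = [(0, 1, 2, 3), (0, 1, 2), (0, 1, 3)]
preds = []
for cols in subsets:
    cols = list(cols)
    coef = fit_ols(X_train[:, cols], y_train)
    preds.append(predict(coef, X_pred[:, cols]))

# Consensus prediction: plain average over the individual model predictions.
consensus = np.mean(preds, axis=0)
q2_consensus = q2_ext(y_obs, consensus, y_train.mean())

# Applicability domain of the full model; h* = 3(p + 1)/n is an assumed cutoff.
h_train = leverages(X_train, X_train)
h_pred = leverages(X_train, X_pred)
h_star = 3 * (n_desc + 1) / n_train
in_domain = h_pred <= h_star

print(f"Q2_ext (consensus): {q2_consensus:.3f}")
print(f"Prediction compounds inside the domain: {in_domain.sum()} of {n_pred}")
```

In this sketch, predictions for compounds with leverage above h* would be flagged as extrapolations rather than reported as reliable, which mirrors the intent of the leverage check described in the abstract.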
Pages: 755-766
Number of pages: 12
Related papers
52 in total
  • [1] General and class specific models for prediction of soil sorption using various physicochemical descriptors
    Andersson, PL
    Maran, U
    Fara, D
    Karelson, M
    Hermens, JLM
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2002, 42 (06) : 1450 - 1459
  • [2] Consensus kNN QSAR: A versatile method for predicting the estrogenic activity of organic compounds in silico. A comparative study with five estrogen receptors and a large, diverse set of ligands
    Asikainen, AH
    Ruuskanen, J
    Tuppurainen, KA
    [J]. ENVIRONMENTAL SCIENCE & TECHNOLOGY, 2004, 38 (24) : 6724 - 6729
  • [3] Atkinson AC., 1985, Plots, transformations and regression: an introduction to graphical methods of diagnostic regression analysis
  • [5] Reliable QSAR for estimating Koc for persistent organic pollutants: correlation with molecular connectivity indices
    Baker, JR
    Mihelcic, JR
    Sabljic, A
    [J]. CHEMOSPHERE, 2001, 45 (02) : 213 - 221
  • [6] Estimating Koc for persistent organic pollutants: limitations of correlations with Kow
    Baker, JR
    Mihelcic, JR
    Shea, E
    [J]. CHEMOSPHERE, 2000, 41 (06) : 813 - 817
  • [7] TOPOLOGICAL INDEXES AND REAL NUMBER VERTEX INVARIANTS BASED ON GRAPH EIGENVALUES OR EIGENVECTORS
    BALABAN, AT
    CIUBOTARIU, D
    MEDELEANU, M
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1991, 31 (04) : 517 - 523
  • [8] 2D QSAR consensus prediction for high-throughput virtual screening. An application to COX-2 inhibition modeling and screening of the NCI database
    Baurin, N
    Mozziconacci, JC
    Arnoult, E
    Chavatte, P
    Marot, C
    Morin-Allory, L
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2004, 44 (01) : 276 - 285
  • [9] BONCHEV D, 1983, INFOR THEORETIC INDI
  • [10] Robust QSAR models using Bayesian regularized neural networks
    Burden, FR
    Winkler, DA
    [J]. JOURNAL OF MEDICINAL CHEMISTRY, 1999, 42 (16) : 3183 - 3187