The better predictive model: High q² for the training set or low root mean square error of prediction for the test set?

Times cited: 95
Authors
Aptula, AO
Jeliazkova, NG
Schultz, TW
Cronin, MTD
Affiliations
[1] Liverpool John Moores Univ, Sch Pharm & Chem, Liverpool L3 3AF, Merseyside, England
[2] Bulgarian Acad Sci, Inst Parallel Proc, BU-1113 Sofia, Bulgaria
[3] Univ Tennessee, Coll Vet Med, Dept Comparat Med, Knoxville, TN 37996 USA
Source
QSAR & COMBINATORIAL SCIENCE | 2005 / Vol. 24 / Issue 3
Keywords
phenol toxicity; model complexity; validation; QSAR; RMSE; q²
DOI
10.1002/qsar.200430909
Chinese Library Classification (CLC) number
R914 [Medicinal chemistry]
Subject classification code
100701
Abstract
The process of validation of computational models (e.g., QSARs) may become the most important step in their development. Different requirements for the reliability and predictability of QSAR models have been described in the literature. Despite these formal recommendations, there are few practical rules as to when to cease adding variables to a QSAR (i.e., what is an appropriate level of complexity for the model). In this work, the influence of model complexity on statistical fit and error has been investigated using toxicity data for 200 phenols to the ciliated protozoan Tetrahymena pyriformis, with an external test set of a further 50 compounds. The results showed that several important factors play a role in defining a good and reliable QSAR. These include: q² is not a good criterion for model predictivity; outliers should not necessarily be deleted, as this may reduce the chemical space of the model; the number of descriptors in a multivariate model should be chosen carefully to avoid under- and over-fitting; and an appropriate number of dimensions is required for PLS modelling.
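The two statistics contrasted in the title can be made concrete with a small sketch. It assumes a single-descriptor ordinary least-squares model for simplicity. The paper's models are multivariate and include PLS, and the helper names `fit_line`, `q2_loo` and `rmsep` are hypothetical illustrations, not the authors' code. Here q² is the leave-one-out statistic 1 - PRESS / SS_total computed on the training set only. RMSEP is the root mean square error of predictions for an external test set that was never used in fitting.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x for one descriptor (illustrative assumption)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("descriptor has no variance")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def q2_loo(xs, ys):
    """Leave-one-out q^2 = 1 - PRESS / SS_total, using the training set only."""
    my = mean(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit without compound i, then predict compound i.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    ss_total = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / ss_total


def rmsep(xs_train, ys_train, xs_test, ys_test):
    """Root mean square error of prediction for an external test set."""
    a, b = fit_line(xs_train, ys_train)
    sq_errors = [(y - (a + b * x)) ** 2 for x, y in zip(xs_test, ys_test)]
    return sqrt(sum(sq_errors) / len(sq_errors))


if __name__ == "__main__":
    # Hypothetical toy data, not the phenol toxicity data from the paper.
    x_train, y_train = [1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.1]
    x_test, y_test = [6, 7], [6.0, 7.2]
    print("q2 (LOO, training):", q2_loo(x_train, y_train))
    print("RMSEP (test):", rmsep(x_train, y_train, x_test, y_test))
```

The design choice mirrors the distinction in the title: q² is computed only by resampling the training compounds, whereas RMSEP is computed on compounds that never influenced the fitted coefficients. A high q² therefore does not by itself guarantee a low RMSEP.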
Pages: 385-396
Number of pages: 12