Assessing QSAR Limitations - A Regulatory Perspective

Cited by: 50
Authors
Tong, Weida [1]
Hong, Huixiao [2]
Xie, Qian [2]
Shi, Leming [1]
Fang, Hong [2]
Perkins, Roger [2]
Affiliations
[1] NCTR, Ctr Toxicoinformat, Jefferson, AR 72079 USA
[2] Z Tech Inc, Div Bioinformat, Jefferson, AR 72079 USA
Keywords
SAR/QSAR; model limitation; model uncertainty; applicability domain; model validation; chance correlation; decision forest; consensus modeling
DOI
10.2174/1573409053585663
Chinese Library Classification (CLC) number
R914 [Medicinal Chemistry]
Subject classification code
100701
Abstract
Wider acceptance of QSARs would result in a constellation of benefits and savings to both private and public sectors. For this to occur, particularly in regulatory applications, a model's limitations need to be identified. We define a model's limitations as encompassing assessment of overall prediction accuracy, applicability domain and chance correlation. This review presents a general guideline for assessing a model's limitations, with emphasis on, and examples of, application with consensus modeling methods. More specifically, we discuss the commonalities and differences between external validation and cross-validation for assessing a model's limitations. We illustrate two common ways of assessing overall prediction accuracy, depending on whether or not the intended application domain is predefined. Since even a high-quality model will predict different chemicals with different confidence, we further use the novel Decision Forest consensus modeling method to demonstrate a means of determining prediction confidence (i.e., the certainty of an individual chemical's prediction) and domain extrapolation (i.e., the prediction accuracy for a chemical that is outside the chemistry space defined by the training chemicals). We show that prediction confidence and domain extrapolation are related measures that together determine the applicability domain of a model, and that prediction confidence is the more important measure. Lastly, we emphasize the importance of assessing chance correlation and illustrate it with several examples of models that have a high degree of chance correlation even though cross-validation indicates high prediction accuracy. Generally, a dataset with a skewed distribution, small size and/or low signal-to-noise ratio tends to produce a model with high chance correlation. We conclude that it is imperative to assess all three aspects (i.e., overall accuracy, applicability domain and chance correlation) of a model for the regulatory acceptance of QSARs.
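The chance-correlation point in the abstract can be illustrated with a label-permutation check. What follows is a minimal sketch and not the authors' procedure: it assumes a 1-nearest-neighbour classifier scored by leave-one-out cross-validation (standing in for Decision Forest), purely random descriptors, and a hypothetical 18:2 inactive-to-active split. All function names and parameters are illustrative.

```python
import random


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if j == i:
                continue  # hold out the chemical being predicted
            d = sum((a - b) ** 2 for a, b in zip(xi, xj))
            if d < best_d:
                best_j, best_d = j, d
        correct += (y[best_j] == y[i])
    return correct / len(X)


def chance_correlation_test(X, y, n_permutations=200, seed=0):
    """Compare accuracy on the real labels with accuracy on shuffled labels.

    Returns (observed accuracy, list of null accuracies, empirical p-value).
    """
    rng = random.Random(seed)
    observed = loo_accuracy(X, y)
    null = []
    for _ in range(n_permutations):
        shuffled = list(y)
        rng.shuffle(shuffled)  # breaks any descriptor-activity relationship
        null.append(loo_accuracy(X, shuffled))
    # Add-one correction keeps the empirical p-value strictly above zero.
    p_value = (1 + sum(a >= observed for a in null)) / (1 + n_permutations)
    return observed, null, p_value


def make_noise_dataset(n_inactive=18, n_active=2, n_descriptors=5, seed=1):
    """Hypothetical small, skewed dataset whose descriptors are pure noise."""
    rng = random.Random(seed)
    n = n_inactive + n_active
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_descriptors)] for _ in range(n)]
    y = [0] * n_inactive + [1] * n_active
    return X, y


if __name__ == "__main__":
    X, y = make_noise_dataset()
    observed, null, p_value = chance_correlation_test(X, y)
    print(f"LOO accuracy on real labels:     {observed:.2f}")
    print(f"Mean LOO accuracy, shuffled y:   {sum(null) / len(null):.2f}")
    print(f"Empirical p-value:               {p_value:.3f}")
```

If the accuracy on the real labels is not clearly higher than the accuracies obtained with shuffled labels, the apparent cross-validation performance more likely reflects class imbalance than any structure-activity signal.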
Pages: 195-205
Page count: 11