Assessing QSAR Limitations - A Regulatory Perspective

Cited by: 50

Authors:

Tong, Weida ^{[1]}

Hong, Huixiao ^{[2]}

Xie, Qian ^{[2]}

Shi, Leming ^{[1]}

Fang, Hong ^{[2]}

Perkins, Roger ^{[2]}

Affiliations:

[1] NCTR, Ctr Toxicoinformat, Jefferson, AR 72079 USA

[2] Z Tech Inc, Div Bioinformat, Jefferson, AR 72079 USA

Source:

CURRENT COMPUTER-AIDED DRUG DESIGN | 2005 / Vol. 1 / Issue 02

Keywords:

SAR/QSAR; model limitation; model uncertainty; applicability domain; model validation; chance correlation; decision forest; consensus modeling;

DOI:

10.2174/1573409053585663

Chinese Library Classification (CLC) number:

R914 [Medicinal Chemistry];

Discipline classification code:

100701 ;

Abstract:

Wider acceptance of QSARs would result in a constellation of benefits and savings to both private and public sectors. For this to occur, particularly in regulatory applications, a model's limitations need to be identified. We define a model's limitations as encompassing assessment of overall prediction accuracy, applicability domain and chance correlation. A general guideline is presented in this review for assessing a model's limitations with emphasis on and examples of application with consensus modeling methods. More specifically, we discuss the commonalities and differences between external validation and cross-validation for assessing a model's limitations. We illustrate two common ways of assessing overall prediction accuracy, depending on whether or not the intended application domain is predefined. Since even a high quality model will have different confidence in accuracy for predicting different chemicals, we further demonstrate using the novel Decision Forest consensus modeling method a means to determine prediction confidence (i.e., certainty for an individual chemical's prediction) and domain extrapolation (i.e., the prediction accuracy for a chemical that is outside the chemistry space defined by the training chemicals). We show that prediction confidence and domain extrapolation are related measures that together determine the applicability domain of a model, and that prediction confidence is the more important measure. Lastly, the importance of assessing chance correlation is emphasized, and illustrated with several examples of models having a high degree of chance correlations despite cross-validation indicating high prediction accuracy. Generally, a dataset with a skewed distribution, small data size and/or low signal/noise ratio tends to produce a model with high chance correlation. 
We conclude that it is imperative to assess all three aspects (i.e., overall accuracy, applicability domain and chance correlation) of a model for the regulatory acceptance of QSARs.
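
The chance-correlation check described in the abstract can be sketched with a label-permutation (y-randomization) test: the cross-validated accuracy on the real activity labels is compared with the accuracies obtained after the labels are shuffled. This is only an illustrative sketch, not the authors' Decision Forest procedure. A 1-nearest-neighbour classifier stands in for the model, the descriptor values are synthetic, and the names `loo_accuracy` and `permutation_p_value` are hypothetical.

```python
import random


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier.

    X is a list of descriptor tuples, y a list of class labels.
    (1-NN is a stand-in model, assumed here for illustration only.)
    """
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue  # hold out chemical i
            d = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            if d < best_d:
                best_j, best_d = j, d
        correct += (y[best_j] == y[i])
    return correct / len(X)


def permutation_p_value(X, y, n_permutations=200, seed=0):
    """Estimate how often shuffled labels match or beat the real accuracy.

    Returns (observed accuracy, list of null accuracies, empirical p-value).
    A large p-value suggests the apparent accuracy may be a chance correlation.
    """
    rng = random.Random(seed)
    observed = loo_accuracy(X, y)
    shuffled = list(y)
    null_scores = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real descriptor-activity link
        null_scores.append(loo_accuracy(X, shuffled))
    hits = sum(score >= observed for score in null_scores)
    p_value = (hits + 1) / (n_permutations + 1)  # add-one correction
    return observed, null_scores, p_value


# Synthetic example: two well-separated groups of 10 "chemicals" each.
X_demo = [(0.1 * i, 0.0) for i in range(10)] + [(10 + 0.1 * i, 0.0) for i in range(10)]
y_demo = [0] * 10 + [1] * 10

if __name__ == "__main__":
    acc, null, p = permutation_p_value(X_demo, y_demo)
    print("observed accuracy:", acc)
    print("mean null accuracy:", sum(null) / len(null))
    print("empirical p-value:", p)
```

On a small, skewed or noisy dataset the null accuracies can come close to the observed one, which is the situation the abstract warns about. The same comparison would apply to any model in place of the 1-NN stand-in.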

Pages: 195-205

Page count: 11

References (35 in total):

[1] Selection bias in gene extraction on the basis of microarray gene-expression data [J].